Bug 142333

Summary: [GTK] WebCore::TransformationMatrix::multiply segfaults when loading last.fm
Product: WebKit Reporter: Joanmarie Diggs <jdiggs>
Component: Layout and RenderingAssignee: Zan Dobersek <zan>
Status: RESOLVED WONTFIX    
Severity: Normal CC: arjunak234, benjamin, bugzilla, cgarcia, mcatanzaro, rdieter, zan
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1200217
https://bugzilla.redhat.com/show_bug.cgi?id=1208989
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65690
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65715
Attachments:
Description Flags
Patch
none
backtrace from debug build
none
WIP none

Joanmarie Diggs
Reported 2015-03-05 03:07:08 PST
9 times out of 10, when I load last.fm in MiniBrowser I see the following crash: Program received signal SIGSEGV, Segmentation fault. 0x00007ffa9758a143 in WebCore::TransformationMatrix::multiply(WebCore::TransformationMatrix const&) () from /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37 (gdb) bt #0 0x00007ffa9758a143 in WebCore::TransformationMatrix::multiply(WebCore::TransformationMatrix const&) () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37 #1 0x00007ffa97534228 in WebCore::GraphicsLayerTransform::combineTransforms(WebCore::TransformationMatrix const&) () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37 #2 0x00007ffa9757b814 in WebCore::TextureMapperLayer::computeTransformsRecursive() () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37 #3 0x00007ffa9757efc0 in WebCore::TextureMapperLayer::paint() () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37 #4 0x00007ffa96d48904 in WebKit::LayerTreeHostGtk::compositeLayersToContext(WebKit::LayerTreeHostGtk::CompositePurpose) () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37 #5 0x00007ffa96d4901d in WebKit::LayerTreeHostGtk::flushAndRenderLayers() () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37 #6 0x00007ffa96d49094 in WebKit::LayerTreeHostGtk::layerFlushTimerFired() () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37 #7 0x00007ffa95618599 in WTF::GMainLoopSource::voidCallback() () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18 #8 0x00007ffa9561684a in WTF::GMainLoopSource::voidSourceCallback(WTF::GMainLoopSource*) () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18 #9 0x00007ffa924cc23a in g_main_context_dispatch (context=0xd3f470) at gmain.c:3122 #10 0x00007ffa924cc23a in g_main_context_dispatch (context=context@entry=0xd3f470) at gmain.c:3737 #11 0x00007ffa924cc5d0 in g_main_context_iterate (context=0xd3f470, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3808 #12 0x00007ffa924cc8f2 in g_main_loop_run (loop=0xf5d7e0) at gmain.c:4002 #13 0x00007ffa96d47512 in int WebKit::ChildProcessMain<WebKit::WebProcess, WebKit::WebProcessMain>(int, char**) () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37 #14 0x00007ffa8c886790 in __libc_start_main (main= 0x400ab0 <main>, argc=2, argv=0x7ffc15f4bd88, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc15f4bd78) at libc-start.c:289 #15 0x0000000000400b09 in _start () The remaining 1 time, I see bug 142309. I'll try to come up with a stand-alone (independent of last.fm) test case for this bug as I work on the accessibility one.
Attachments
Patch (5.00 KB, patch)
2015-03-05 06:57 PST, Carlos Garcia Campos
no flags
backtrace from debug build (13.90 KB, text/plain)
2015-03-05 09:54 PST, Joanmarie Diggs
no flags
WIP (1.81 KB, patch)
2015-04-04 09:55 PDT, Zan Dobersek
no flags
Joanmarie Diggs
Comment 1 2015-03-05 04:32:46 PST
Turns out it is simple to get this crash. Load this: <html> <body> <div style="-webkit-transform: translate3d(0, 0, 0);">foo</div> </body> </html> Reliably crashes for me. Both machines are F22, MiniBrowser with current master, not using the WebKitGtk jhbuild deps at the moment.
Carlos Garcia Campos
Comment 2 2015-03-05 06:57:44 PST
Martin Robinson
Comment 3 2015-03-05 07:37:07 PST
Comment on attachment 247950 [details] Patch Hrm. Why is LayerTreeHostGtk::compositeLayersToContext being called before the native surface is set? That seems really suspicious.
Carlos Garcia Campos
Comment 4 2015-03-05 07:39:44 PST
(In reply to comment #3) > Comment on attachment 247950 [details] > Patch > > Hrm. Why is LayerTreeHostGtk::compositeLayersToContext being called before > the native surface is set? That seems really suspicious. Because the native handler ID is now set when the web view is realized, since r177075.
Martin Robinson
Comment 5 2015-03-05 07:51:44 PST
Comment on attachment 247950 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=247950&action=review > Source/WebKit2/WebProcess/WebPage/gtk/LayerTreeHostGtk.cpp:-130 > - // The creation of the TextureMapper needs an active OpenGL context. > - if (!makeContextCurrent()) > - return; I think failing to call makeContextCurrent here is definitely a problem. If the OpenGL context has changed the commands will be issued to the incorrect context.
Carlos Garcia Campos
Comment 6 2015-03-05 08:11:22 PST
(In reply to comment #5) > Comment on attachment 247950 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=247950&action=review > > > Source/WebKit2/WebProcess/WebPage/gtk/LayerTreeHostGtk.cpp:-130 > > - // The creation of the TextureMapper needs an active OpenGL context. > > - if (!makeContextCurrent()) > > - return; > > I think failing to call makeContextCurrent here is definitely a problem. If > the OpenGL context has changed the commands will be issued to the incorrect > context. I assumed by the comment that it was there only to create the TextureMapper.
Carlos Garcia Campos
Comment 7 2015-03-05 08:42:18 PST
Comment on attachment 247950 [details] Patch I'm not sure this patch fixes this issue, because the bts in the debug bot are a bit different, so I've opened a new bug report. See https://bugs.webkit.org/show_bug.cgi?id=142345
Joanmarie Diggs
Comment 8 2015-03-05 09:54:30 PST
Created attachment 247960 [details] backtrace from debug build Hope this helps.
Joanmarie Diggs
Comment 9 2015-03-16 05:25:49 PDT
FWIW, I just downgraded a system to F21 (current stable) and I cannot reproduce this crash on the downgraded machine.
Michael Catanzaro
Comment 10 2015-03-16 08:41:11 PDT
I think this crash happens 100% of the time visiting youtube.com.
Carlos Garcia Campos
Comment 11 2015-03-16 09:31:15 PDT
(In reply to comment #10) > I think this crash happens 100% of the time visiting youtube.com. but only in Fedora unstable, right?
Michael Catanzaro
Comment 12 2015-03-16 10:46:08 PDT
(In reply to comment #11) > but only in Fedora unstable, right? Joanie says yes, and that's what I was using when I hit the crash on youtube... so probably, yes. :(
Joanmarie Diggs
Comment 13 2015-03-16 12:30:22 PDT
(In reply to comment #11) > (In reply to comment #10) > > I think this crash happens 100% of the time visiting youtube.com. > > but only in Fedora unstable, right? Yeah, I don't see it in F21. And presumably it's only in the 64-bit F22 -- haven't tried any other 64-bit unstable distro releases -- as my quick-and-dirty hackaround so that I could work on bug 142309 was this: --- a/Source/WebCore/platform/graphics/transforms/TransformationMatrix.cpp +++ b/Source/WebCore/platform/graphics/transforms/TransformationMatrix.cpp @@ -1241,7 +1241,7 @@ TransformationMatrix& TransformationMatrix::multiply(const TransformationMatrix& } #undef MATRIX_MULTIPLY_ONE_LINE -#elif defined(TRANSFORMATION_MATRIX_USE_X86_64_SSE2) +#elif defined(TRANSFORMATION_MATRIX_USE_X86_64_SSE2) && !PLATFORM(GTK) // x86_64 has 16 XMM registers which is enough to do the multiplication fully in registers. __m128d matrixBlockA = _mm_load_pd(&(m_matrix[0][0])); __m128d matrixBlockC = _mm_load_pd(&(m_matrix[1][0])); (And yes, with that in place youtube.com works just fine on F22.)
Michael Catanzaro
Comment 14 2015-03-16 13:54:21 PDT
OK thanks for the workaround, we will "gladly" carry that as long as necessary to make youtube work.
Joanmarie Diggs
Comment 15 2015-03-16 19:56:53 PDT
As an experiment, on my F21/stable machine, I upgraded gcc to the F22 packages leaving everything else alone. I now see this crash on that machine. From yum history: Packages Altered: Updated cpp-4.9.2-6.fc21.x86_64 @updates Update 5.0.0-0.17.fc22.x86_64 @fedora/22 Updated gcc-4.9.2-6.fc21.x86_64 @updates Update 5.0.0-0.17.fc22.x86_64 @fedora/22 Updated gcc-c++-4.9.2-6.fc21.x86_64 @updates Update 5.0.0-0.17.fc22.x86_64 @fedora/22 That's all it took. HTH.....
Michael Catanzaro
Comment 16 2015-04-03 08:37:36 PDT
Well, I have no clue what's wrong, or whether it's our code or GCC that is broken, but this is by far our biggest problem right now ("I think this crash happens 100% of the time visiting youtube.com"), so at least let's commit a workaround ASAP. Any objections to conditionally compiling the x86_64 matrix multiplication fastpath only when using GCC < 5.0? It will obviously make things slower, but that beats a crash. Hi Ben, CCing you because this is your code (from r138866), so fates permitting, maybe you have some idea what might be wrong. None of us understand it. :) The best backtrace we have is https://bugzilla.redhat.com/attachment.cgi?id=999725 which isn't great but does show the assembler where the crash occurs.
Zan Dobersek
Comment 17 2015-04-03 09:19:32 PDT
Does the crash occur in debug/non-optimized builds? Given that the problem appears to be due to changes in GCC, it'd be nice to report this to the project's Bugzilla or maybe find the corresponding bug, but I don't wish gathering all the necessary information upon anyone ...
Joanmarie Diggs
Comment 18 2015-04-03 09:49:03 PDT
(In reply to comment #17) > Does the crash occur in debug/non-optimized builds? See comment #8.
Michael Catanzaro
Comment 19 2015-04-03 09:54:46 PDT
Um, the backtrace in comment #8 is better :p
Zan Dobersek
Comment 20 2015-04-04 09:51:08 PDT
The reason behind the crash is that Matrix4 is not aligned to 16 bytes anymore, despite the (aligned (16)) attribute, which apparently GCC 5 doesn't respect anymore. The alignment of 16 bytes is mandatory for data processed with SSE instructions.
Zan Dobersek
Comment 21 2015-04-04 09:55:28 PDT
Created attachment 250129 [details] WIP One possible solution, using the alignas(16) specifier.
Michael Catanzaro
Comment 22 2015-04-04 15:02:52 PDT
Nice find. The significant disadvantage of your patch is that now you have to remember to use alignas each time you declare a matrix. Since I don't see any mention of this in the GCC 5 release notes, I presume the change is unintentional. I've filed a GCC bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1208989
Michael Catanzaro
Comment 23 2015-04-09 10:46:04 PDT
Should be fixed in GCC; will reopen if we run into problems with the updated version.
Note You need to log in before you can comment on or make changes to this bug.