Bug 142333 - [GTK] WebCore::TransformationMatrix::multiply segfaults when loading last.fm
Summary: [GTK] WebCore::TransformationMatrix::multiply segfaults when loading last.fm
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Zan Dobersek
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-05 03:07 PST by Joanmarie Diggs (irc: joanie)
Modified: 2015-04-29 12:09 PDT (History)
7 users (show)

See Also:


Attachments
Patch (5.00 KB, patch)
2015-03-05 06:57 PST, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
backtrace from debug build (13.90 KB, text/plain)
2015-03-05 09:54 PST, Joanmarie Diggs (irc: joanie)
no flags Details
WIP (1.81 KB, patch)
2015-04-04 09:55 PDT, Zan Dobersek
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Joanmarie Diggs (irc: joanie) 2015-03-05 03:07:08 PST
9 times out of 10, when I load last.fm in MiniBrowser I see the following crash:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffa9758a143 in WebCore::TransformationMatrix::multiply(WebCore::TransformationMatrix const&) ()
   from /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
(gdb) bt
#0  0x00007ffa9758a143 in WebCore::TransformationMatrix::multiply(WebCore::TransformationMatrix const&) ()
    at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#1  0x00007ffa97534228 in WebCore::GraphicsLayerTransform::combineTransforms(WebCore::TransformationMatrix const&) () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#2  0x00007ffa9757b814 in WebCore::TextureMapperLayer::computeTransformsRecursive() ()
    at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#3  0x00007ffa9757efc0 in WebCore::TextureMapperLayer::paint() ()
    at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#4  0x00007ffa96d48904 in WebKit::LayerTreeHostGtk::compositeLayersToContext(WebKit::LayerTreeHostGtk::CompositePurpose) () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#5  0x00007ffa96d4901d in WebKit::LayerTreeHostGtk::flushAndRenderLayers() ()
    at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#6  0x00007ffa96d49094 in WebKit::LayerTreeHostGtk::layerFlushTimerFired() ()
    at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#7  0x00007ffa95618599 in WTF::GMainLoopSource::voidCallback() ()
    at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
#8  0x00007ffa9561684a in WTF::GMainLoopSource::voidSourceCallback(WTF::GMainLoopSource*) ()
    at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
#9  0x00007ffa924cc23a in g_main_context_dispatch (context=0xd3f470) at gmain.c:3122
#10 0x00007ffa924cc23a in g_main_context_dispatch (context=context@entry=0xd3f470) at gmain.c:3737
#11 0x00007ffa924cc5d0 in g_main_context_iterate (context=0xd3f470, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3808
#12 0x00007ffa924cc8f2 in g_main_loop_run (loop=0xf5d7e0) at gmain.c:4002
#13 0x00007ffa96d47512 in int WebKit::ChildProcessMain<WebKit::WebProcess, WebKit::WebProcessMain>(int, char**) () at /home/jd/checkout/WebKitGtk/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#14 0x00007ffa8c886790 in __libc_start_main (main=
    0x400ab0 <main>, argc=2, argv=0x7ffc15f4bd88, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc15f4bd78) at libc-start.c:289
#15 0x0000000000400b09 in _start ()

The remaining 1 time, I see bug 142309. I'll try to come up with a stand-alone (independent of last.fm) test case for this bug as I work on the accessibility one.
Comment 1 Joanmarie Diggs (irc: joanie) 2015-03-05 04:32:46 PST
Turns out it is simple to get this crash. Load this:

<html>
<body>
<div style="-webkit-transform: translate3d(0, 0, 0);">foo</div>
</body>
</html>

Reliably crashes for me. Both machines are F22, MiniBrowser with current master, not using the WebKitGtk jhbuild deps at the moment.
Comment 2 Carlos Garcia Campos 2015-03-05 06:57:44 PST
Created attachment 247950 [details]
Patch
Comment 3 Martin Robinson 2015-03-05 07:37:07 PST
Comment on attachment 247950 [details]
Patch

Hrm. Why is LayerTreeHostGtk::compositeLayersToContext being called before the native surface is set? That seems really suspicious.
Comment 4 Carlos Garcia Campos 2015-03-05 07:39:44 PST
(In reply to comment #3)
> Comment on attachment 247950 [details]
> Patch
> 
> Hrm. Why is LayerTreeHostGtk::compositeLayersToContext being called before
> the native surface is set? That seems really suspicious.

Because the native handler ID is now set when the web view is realized, since r177075.
Comment 5 Martin Robinson 2015-03-05 07:51:44 PST
Comment on attachment 247950 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=247950&action=review

> Source/WebKit2/WebProcess/WebPage/gtk/LayerTreeHostGtk.cpp:-130
> -    // The creation of the TextureMapper needs an active OpenGL context.
> -    if (!makeContextCurrent())
> -        return;

I think failing to call makeContextCurrent here is definitely a problem. If the OpenGL context has changed the commands will be issued to the incorrect context.
Comment 6 Carlos Garcia Campos 2015-03-05 08:11:22 PST
(In reply to comment #5)
> Comment on attachment 247950 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=247950&action=review
> 
> > Source/WebKit2/WebProcess/WebPage/gtk/LayerTreeHostGtk.cpp:-130
> > -    // The creation of the TextureMapper needs an active OpenGL context.
> > -    if (!makeContextCurrent())
> > -        return;
> 
> I think failing to call makeContextCurrent here is definitely a problem. If
> the OpenGL context has changed the commands will be issued to the incorrect
> context.

I assumed by the comment that it was there only to create the TextureMapper.
Comment 7 Carlos Garcia Campos 2015-03-05 08:42:18 PST
Comment on attachment 247950 [details]
Patch

I'm not sure this patch fixes this issue, because the bts in the debug bot are a bit different, so I've opened a new bug report. See https://bugs.webkit.org/show_bug.cgi?id=142345
Comment 8 Joanmarie Diggs (irc: joanie) 2015-03-05 09:54:30 PST
Created attachment 247960 [details]
backtrace from debug build

Hope this helps.
Comment 9 Joanmarie Diggs (irc: joanie) 2015-03-16 05:25:49 PDT
FWIW, I just downgraded a system to F21 (current stable) and I cannot reproduce this crash on the downgraded machine.
Comment 10 Michael Catanzaro 2015-03-16 08:41:11 PDT
I think this crash happens 100% of the time visiting youtube.com.
Comment 11 Carlos Garcia Campos 2015-03-16 09:31:15 PDT
(In reply to comment #10)
> I think this crash happens 100% of the time visiting youtube.com.

but only in Fedora unstable, right?
Comment 12 Michael Catanzaro 2015-03-16 10:46:08 PDT
(In reply to comment #11) 
> but only in Fedora unstable, right?

Joanie says yes, and that's what I was using when I hit the crash on youtube... so probably, yes. :(
Comment 13 Joanmarie Diggs (irc: joanie) 2015-03-16 12:30:22 PDT
(In reply to comment #11)
> (In reply to comment #10)
> > I think this crash happens 100% of the time visiting youtube.com.
> 
> but only in Fedora unstable, right?

Yeah, I don't see it in F21. And presumably it's only in the 64-bit F22 -- haven't tried any other 64-bit unstable distro releases -- as my quick-and-dirty hackaround so that I could work on bug 142309 was this:

--- a/Source/WebCore/platform/graphics/transforms/TransformationMatrix.cpp
+++ b/Source/WebCore/platform/graphics/transforms/TransformationMatrix.cpp
@@ -1241,7 +1241,7 @@ TransformationMatrix& TransformationMatrix::multiply(const TransformationMatrix&
     }
 #undef MATRIX_MULTIPLY_ONE_LINE
 
-#elif defined(TRANSFORMATION_MATRIX_USE_X86_64_SSE2)
+#elif defined(TRANSFORMATION_MATRIX_USE_X86_64_SSE2) && !PLATFORM(GTK)
     // x86_64 has 16 XMM registers which is enough to do the multiplication fully in registers.
     __m128d matrixBlockA = _mm_load_pd(&(m_matrix[0][0]));
     __m128d matrixBlockC = _mm_load_pd(&(m_matrix[1][0]));

(And yes, with that in place youtube.com works just fine on F22.)
Comment 14 Michael Catanzaro 2015-03-16 13:54:21 PDT
OK thanks for the workaround, we will "gladly" carry that as long as necessary to make youtube work.
Comment 15 Joanmarie Diggs (irc: joanie) 2015-03-16 19:56:53 PDT
As an experiment, on my F21/stable machine, I upgraded gcc to the F22 packages leaving everything else alone. I now see this crash on that machine. From yum history:

Packages Altered:
    Updated cpp-4.9.2-6.fc21.x86_64                @updates
    Update      5.0.0-0.17.fc22.x86_64             @fedora/22
    Updated gcc-4.9.2-6.fc21.x86_64                @updates
    Update      5.0.0-0.17.fc22.x86_64             @fedora/22
    Updated gcc-c++-4.9.2-6.fc21.x86_64            @updates
    Update          5.0.0-0.17.fc22.x86_64         @fedora/22

That's all it took. HTH.....
Comment 16 Michael Catanzaro 2015-04-03 08:37:36 PDT
Well, I have no clue what's wrong, or whether it's our code or GCC that is broken, but this is by far our biggest problem right now ("I think this crash happens 100% of the time visiting youtube.com"), so at least let's commit a workaround ASAP. Any objections to conditionally compiling the x86_64 matrix multiplication fastpath only when using GCC < 5.0? It will obviously make things slower, but that beats a crash.

Hi Ben, CCing you because this is your code (from r138866), so fates permitting, maybe you have some idea what might be wrong. None of us understand it. :)

The best backtrace we have is https://bugzilla.redhat.com/attachment.cgi?id=999725 which isn't great but does show the assembler where the crash occurs.
Comment 17 Zan Dobersek 2015-04-03 09:19:32 PDT
Does the crash occur in debug/non-optimized builds?

Given that the problem appears to be due to changes in GCC, it'd be nice to report this to the project's Bugzilla or maybe find the corresponding bug, but I don't wish gathering all the necessary information upon anyone ...
Comment 18 Joanmarie Diggs (irc: joanie) 2015-04-03 09:49:03 PDT
(In reply to comment #17)
> Does the crash occur in debug/non-optimized builds?

See comment #8.
Comment 19 Michael Catanzaro 2015-04-03 09:54:46 PDT
Um, the backtrace in comment #8 is better :p
Comment 20 Zan Dobersek 2015-04-04 09:51:08 PDT
The reason behind the crash is that Matrix4 is not aligned to 16 bytes anymore, despite the (aligned (16)) attribute, which apparently GCC 5 doesn't respect anymore.

The alignment of 16 bytes is mandatory for data processed with SSE instructions.
Comment 21 Zan Dobersek 2015-04-04 09:55:28 PDT
Created attachment 250129 [details]
WIP

One possible solution, using the alignas(16) specifier.
Comment 22 Michael Catanzaro 2015-04-04 15:02:52 PDT
Nice find.

The significant disadvantage of your patch is that now you have to remember to use alignas each time you declare a matrix.

Since I don't see any mention of this in the GCC 5 release notes, I presume the change is unintentional. I've filed a GCC bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1208989
Comment 23 Michael Catanzaro 2015-04-09 10:46:04 PDT
Should be fixed in GCC; will reopen if we run into problems with the updated version.