Bug 181138

Summary: [GTK] Crash destroying WebCore::FileMonitor
Product: WebKit Reporter: Michael Catanzaro <mcatanzaro>
Component: WebKitGTKAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: bfulgham, bugs-noreply, cgarcia, magomez, mcatanzaro, wilander
Priority: P2    
Version: Other   
Hardware: PC   
OS: Linux   
See Also: https://bugs.webkit.org/show_bug.cgi?id=179909
https://bugs.webkit.org/show_bug.cgi?id=181231
Attachments:
Description Flags
Speculative fix mcatanzaro: review+

Description Michael Catanzaro 2017-12-22 14:22:41 PST
I thought we had fixed this in bug #179909, but apparently something is still wrong. Layout tests fast/scrolling/rtl-scrollbars-animation-property.html and intersection-observer/intersection-observer-entry-interface.html are both flaky crashes:

Thread 1 (Thread 0x7ff833fff700 (LWP 7884)):
#0  0x00007ff89a537b75 in g_type_check_instance_is_fundamentally_a () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/glib-2.54.2/gobject/gtype.c:4023
#1  0x00007ff89a5182b5 in g_object_unref () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/glib-2.54.2/gobject/gobject.c:3227
#2  0x00007ff89d74ca97 in _ZN3WTF8FunctionIFvvEE15CallableWrapperIZN7WebCore11FileMonitorC4ERKNS_6StringEONS_3RefINS_9WorkQueueENS_13DumbPtrTraitsISA_EEEEONS0_IFvNS5_14FileChangeTypeEEEEEUlvE_E4callEv () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#3  0x00007ff89e980e57 in _ZN3WTF7RunLoop11performWorkEv () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#4  0x00007ff89e9be4c9 in _ZZN3WTF7RunLoopC4EvENUlPvE_4_FUNES1_ () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#5  0x00007ff89a43781a in g_main_dispatch () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/glib-2.54.2/glib/gmain.c:3148
#6  g_main_context_dispatch () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/glib-2.54.2/glib/gmain.c:3813
#7  0x00007ff89a437ba8 in g_main_context_iterate () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/glib-2.54.2/glib/gmain.c:3886
#8  0x00007ff89a437ec2 in g_main_loop_run () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/glib-2.54.2/glib/gmain.c:4082
#9  0x00007ff89e9bee90 in _ZN3WTF7RunLoop3runEv () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#10 0x00007ff89e9825ab in _ZN3WTF6Thread10entryPointEPNS0_16NewThreadContextE () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#11 0x00007ff89e9bd249 in _ZN3WTFL19wtfThreadEntryPointEPv () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
#12 0x00007ff8977ba494 in start_thread (arg=0x7ff833fff700) at pthread_create.c:333
#13 0x00007ff893f9d93f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Updating expectations accordingly.

fast/scrolling/rtl-scrollbars-animation-property.html has been crashing for a while, first one I see is r223831. But, interestingly, the first crash I see for intersection-observer/intersection-observer-entry-interface.html is at r225145, so this one didn't start crashing until after the change in bug #179909 landed.
Comment 1 Michael Catanzaro 2017-12-26 10:19:44 PST
*** Bug 180806 has been marked as a duplicate of this bug. ***
Comment 2 Michael Catanzaro 2017-12-26 10:21:04 PST
Also: fast/html/menuitem-element.html
Comment 3 Michael Catanzaro 2018-01-01 20:15:54 PST
Also:

fast/attachment/attachment-action.html
fast/canvas/2d.currentPoint.html
fast/text/international/system-language/arabic-glyph-cache-fill-combine.html

I expect this must affect a huge number of tests.
Comment 4 Carlos Garcia Campos 2018-01-02 02:22:44 PST
I can't reproduce this, but I think we can avoid all race conditions by always creating and destroying the platform file monitor in the work queue thread synchronously. 

However, I wonder if we should revert the patch that added support for resource load stats in the glib based ports. Does it make sense to have it without the machine learning part? The layout tests were passing when we added the support, but that's no longer the case, all of them are timing out now and John skipped them in r223253.

We should fix FileMonitor in any case, because even if it's only used by resource load stats right now, it's a generic implementation that could be used by other parts eventually.
Comment 5 Carlos Garcia Campos 2018-01-02 02:28:02 PST
Created attachment 330311 [details]
Speculative fix

Since resource load stats tests are skipped, we should disable the feature in WTR. That would fix these crashes for sure, but let's do it a few days after this patch lands to ensure it fixes the crashes.
Comment 6 Michael Catanzaro 2018-01-02 08:04:39 PST
(In reply to Carlos Garcia Campos from comment #4)
> However, I wonder if we should revert the patch that added support for
> resource load stats in the glib based ports. Does it make sense to have it
> without the machine learning part? The layout tests were passing when we
> added the support, but that's no longer the case, all of them are timing out
> now and John skipped them in r223253.

This has been bothering me for a while. We should revert it, yes, if we don't have time to fix it.

I don't know how essential the machine learning portion is. We should ask the Apple developers who designed it for advice. We can probably implement it ourselves with sufficient time; problem is, machine learning is tricky and requires a lot more experience with math than I have myself, so we'll surely need guidance.

> We should fix FileMonitor in any case, because even if it's only used by
> resource load stats right now, it's a generic implementation that could be
> used by other parts eventually.

Yes.
Comment 7 Michael Catanzaro 2018-01-02 08:05:42 PST
I'll review the patch in the afternoon.
Comment 8 Brent Fulgham 2018-01-02 08:29:38 PST
(In reply to Carlos Garcia Campos from comment #4) 
> However, I wonder if we should revert the patch that added support for
> resource load stats in the glib based ports. Does it make sense to have it
> without the machine learning part

You should be able to use the threshold-based classifier without needing to use the ML based engine. It works pretty well (or at least it did back when we were using it). I think you get that version automatically when you build without the libsvm features enabled.
Comment 9 Carlos Garcia Campos 2018-01-02 08:34:32 PST
(In reply to Brent Fulgham from comment #8)
> (In reply to Carlos Garcia Campos from comment #4) 
> > However, I wonder if we should revert the patch that added support for
> > resource load stats in the glib based ports. Does it make sense to have it
> > without the machine learning part
> 
> You should be able to use the threshold-based classifier without needing to
> use the ML based engine. It works pretty well (or at least it did back when
> we were using it). I think you get that version automatically when you build
> without the libsvm features enabled.

Thanks Brent, I guess that's what we had before it broke in r223253. How can we test it apart from running the layout tests?
Comment 10 Michael Catanzaro 2018-01-02 08:44:26 PST
(In reply to Carlos Garcia Campos from comment #9)
> Thanks Brent, I guess that's what we had before it broke in r223253. How can
> we test it apart from running the layout tests?

From a quick skim of that patch, it looks like we just have to implement TestController::isStatisticsRegisteredAsSubFrameUnder and TestController::isStatisticsRegisteredAsRedirectingTo to get the tests working again.
Comment 11 Michael Catanzaro 2018-01-02 15:31:18 PST
Comment on attachment 330311 [details]
Speculative fix

View in context: https://bugs.webkit.org/attachment.cgi?id=330311&action=review

I think the code changes are right.

It would be good to add a comment to document the thread-safety semantics of FileMonitor, since this has proven to be a tricky bit of code. Right now, it's not clear to the reader why the create/destroy operations need to be synchronous.

> Source/WebCore/platform/glib/FileMonitorGLib.cpp:60
> +    // The monitor can be destroyed in the work queue thread.
> +    if (&m_handlerQueue->runLoop() == &RunLoop::current())
> +        return;

Good catch.
Comment 12 Michael Catanzaro 2018-01-02 15:35:55 PST
Please also remove all the crash expectations with this patch. Otherwise we will not notice whether it is still crashing or not.
Comment 13 Michael Catanzaro 2018-01-02 17:53:01 PST
(In reply to Michael Catanzaro from comment #10)
> From a quick skim of that patch, it looks like we just have to implement
> TestController::isStatisticsRegisteredAsSubFrameUnder and
> TestController::isStatisticsRegisteredAsRedirectingTo to get the tests
> working again.

Bug #181231
Comment 14 Carlos Garcia Campos 2018-01-03 00:23:40 PST
Committed r226355: <https://trac.webkit.org/changeset/226355>