Bug 223069 - REGRESSION(r271560): [Linux] release assert in Thread::initializePlatformThreading
Summary: REGRESSION(r271560): [Linux] release assert in Thread::initializePlatformThre...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alberto Garcia
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-11 07:25 PST by seb128
Modified: 2021-03-25 05:12 PDT (History)
12 users (show)

See Also:


Attachments
Patch (1.42 KB, patch)
2021-03-25 04:33 PDT, Alberto Garcia
cgarcia: review+
Details | Formatted Diff | Diff
Patch v2 (1.40 KB, patch)
2021-03-25 04:44 PDT, Alberto Garcia
cgarcia: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description seb128 2021-03-11 07:25:24 PST
The webkit2gtk Ubuntu hirsute update (2.30.5 to 2.31.90) is blocked because the ruby-gnome tests started failing

Steps to trigger
- install Ubuntu hirsute
- enable sources and proposed repositories
- $ sudo apt install quilt xvfb xauth pkg-config libgtk-3-dev dbus-x11 gstreamer1.0-plugins-good gnome-icon-theme libxml2-utils ruby-webkit2-gtk
- $ apt source ruby-gnome; cd ruby-gnome-3.4.3
- $ xvfb-run ruby webkit2-gtk/test/run-test.rb
...
Loaded suite test
Started
Aborted (core dumped)

Gdb backtrace after rebuildwing webkit without optimization

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
        set = 
            {__val = {0, 140737488338759, 140737488338480, 140736484799089, 139642271679776, 140737488338688, 140737488338512, 140736484799131, 140737352090288, 67108868, 140737348587584, 0, 140736495473024, 140737348252777, 140737348252848, 140736495473024}}
        pid = <optimized out>
        tid = <optimized out>
#1  0x00007ffff7a9a864 in __GI_abort () at abort.c:79
        save_stage = 1
        act = 
          {__sigaction_handler = {sa_handler = 0x7ffff7e0c2b0, sa_sigaction = 0x7ffff7e0c2b0}, sa_mask = {__val = {0, 140736495473024, 140737348252777, 140737348252848, 140736495473024, 140737488338624, 140737488339296, 140737488338688, 3623148696358275072, 140736484799134, 140737488338640, 140737488339296, 140737349401473, 140737232721296, 128, 140737488338816}}, sa_flags = 67108868, sa_restorer = 0x7ffff7ab5040 <__restore_rt>}
        sigs = 
            {__val = {32, 0, 140737488338784, 140736435747119, 140737488338864, 17177430016, 140737488338960, 12884967423, 140737488338896, 140736435765782, 1099511627778, 140736495472776, 8029613824315247090, 140733461823492, 12884901891, 3}}
#2  0x00007fffc1427942 in CRASH_WITH_INFO(...) () at DerivedSources/ForwardingHeaders/wtf/Assertions.h:713
#3  0x00007fffc43a8af2 in WTF::Thread::initializePlatformThreading() ()
    at ../Source/WTF/wtf/posix/ThreadingPOSIX.cpp:217
#4  0x00007fffc42ffdc8 in operator()() () at ../Source/WTF/wtf/Threading.cpp:379
#5  0x00007fffc430017c in __invoke_impl<void, WTF::initialize()::<lambda()> >(void) ()
    at /usr/include/c++/10/bits/invoke.h:60
#6  0x00007fffc4300128 in __invoke<WTF::initialize()::<lambda()> >(void) ()
    at /usr/include/c++/10/bits/invoke.h:95
#7  0x00007fffc42fffb5 in operator()() () at /usr/include/c++/10/mutex:717
#8  0x00007fffc42fffdf in operator()() () at /usr/include/c++/10/mutex:722
#9  0x00007fffc42ffff4 in _FUN() () at /usr/include/c++/10/mutex:722
#10 0x00007ffff7a6345f in __pthread_once_slow
    (once_control=0x7fffc4d1d240 <WTF::initialize()::onceKey>, init_routine=0x7ffff0c35590 <__once_proxy>)
    at pthread_once.c:116
        _buffer = 
          {__routine = 0x7ffff7a634b0 <clear_once_control>, __arg = 0x7fffc4d1d240 <WTF::initialize()::onceKey>, __canceltype = 44121, __prev = 0x7fffffffc310}
        val = <optimized out>
        newval = <optimized out>
#11 0x00007fffc42ff016 in __gthread_once() () at /usr/include/x86_64-linux-gnu/c++/10/bits/gthr-default.h:700
#12 0x00007fffc430008d in call_once<WTF::initialize()::<lambda()> >(void) () at /usr/include/c++/10/mutex:729
#13 0x00007fffc42ffe41 in WTF::initialize() () at ../Source/WTF/wtf/Threading.cpp:372
#14 0x00007fffc3a2feaa in operator()() () at ../Source/JavaScriptCore/runtime/InitializeThreading.cpp:58
#15 0x00007fffc3a354d8 in __invoke_impl<void, JSC::initialize()::<lambda()> >(void) ()
    at /usr/include/c++/10/bits/invoke.h:60
#16 0x00007fffc3a351c3 in __invoke<JSC::initialize()::<lambda()> >(void) ()
    at /usr/include/c++/10/bits/invoke.h:95
#17 0x00007fffc3a34d6f in operator()() () at /usr/include/c++/10/mutex:717
#18 0x00007fffc3a34d99 in operator()() () at /usr/include/c++/10/mutex:722
#19 0x00007fffc3a34dae in _FUN() () at /usr/include/c++/10/mutex:722
#20 0x00007ffff7a6345f in __pthread_once_slow
    (once_control=0x7fffc4d1ca64 <JSC::initialize()::onceFlag>, init_routine=0x7ffff0c35590 <__once_proxy>)
    at pthread_once.c:116
        _buffer = 
          {__routine = 0x7ffff7a634b0 <clear_once_control>, __arg = 0x7fffc4d1ca64 <JSC::initialize()::onceFlag>, __canceltype = 1435833040, __prev = 0x7fffffffc4e0}
        val = <optimized out>
        newval = <optimized out>
#21 0x00007fffc3a2ddc4 in __gthread_once() () at /usr/include/x86_64-linux-gnu/c++/10/bits/gthr-default.h:700
#22 0x00007fffc3a34e47 in call_once<JSC::initialize()::<lambda()> >(void) () at /usr/include/c++/10/mutex:729
#23 0x00007fffc3a30056 in JSC::initialize() () at ../Source/JavaScriptCore/runtime/InitializeThreading.cpp:57
#24 0x00007fffc5d4e383 in WebKit::InitializeWebKit2() () at ../Source/WebKit/Shared/WebKit2Initialize.cpp:43
#25 0x00007fffc606b00b in operator()() () at ../Source/WebKit/UIProcess/API/glib/WebKitInitialize.cpp:65
#26 0x00007fffc606b1fe in __invoke_impl<void, WebKit::webkitInitialize()::<lambda()> >(void) ()
    at /usr/include/c++/10/bits/invoke.h:60
#27 0x00007fffc606b1cd in __invoke<WebKit::webkitInitialize()::<lambda()> >(void) ()
    at /usr/include/c++/10/bits/invoke.h:95
#28 0x00007fffc606b099 in operator()() () at /usr/include/c++/10/mutex:717
#29 0x00007fffc606b0c3 in operator()() () at /usr/include/c++/10/mutex:722
#30 0x00007fffc606b0d8 in _FUN() () at /usr/include/c++/10/mutex:722
#31 0x00007ffff7a6345f in __pthread_once_slow
    (once_control=0x7fffcbfe44f8 <WebKit::webkitInitialize()::onceFlag>, init_routine=0x7ffff0c35590 <__once_proxy>) at pthread_once.c:116
        _buffer = 
          {__routine = 0x7ffff7a634b0 <clear_once_control>, __arg = 0x7fffcbfe44f8 <WebKit::webkitInitialize()::onceFlag>, __canceltype = 0, __prev = 0x0}
        val = <optimized out>
        newval = <optimized out>
#32 0x00007fffc606aecc in __gthread_once() () at /usr/include/x86_64-linux-gnu/c++/10/bits/gthr-default.h:700
#33 0x00007fffc606b171 in call_once<WebKit::webkitInitialize()::<lambda()> >(void) ()
    at /usr/include/c++/10/mutex:729
#34 0x00007fffc606b05f in WebKit::webkitInitialize() ()
    at ../Source/WebKit/UIProcess/API/glib/WebKitInitialize.cpp:64
#35 0x00007fffc606d9eb in webkit_input_method_context_class_init() ()
    at ../Source/WebKit/UIProcess/API/glib/WebKitInputMethodContext.cpp:207
#36 0x00007fffc606d682 in webkit_input_method_context_class_intern_init() ()
    at ../Source/WebKit/UIProcess/API/glib/WebKitInputMethodContext.cpp:137
#37 0x00007ffff39dba82 in g_type_class_ref () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
#38 0x00007ffff3b85058 in  () at /usr/lib/x86_64-linux-gnu/ruby/vendor_ruby/2.7.0/glib2.so
#39 0x00007ffff7d0450f in rb_ensure () at /lib/x86_64-linux-gnu/libruby-2.7.so.2.7

Downgrading the webkitgtk binaries to the hirsute 2.30.5 fixes the issue so it seems to be an issue with webkitgtk itself and not other distribution components
Comment 1 Alberto Garcia 2021-03-16 12:25:26 PDT
Ok, so here's what happens:

JSC uses SIGUSR1 to suspend and resume threads. After r271560 (see bug 220641) this signal is configurable because some users want to use SIGUSR1 for other purposes, so the patch added the JSC_SIGNAL_FOR_GC environment variable.

So far so good. The problem is that the new code in WebKit now specifically checks that no one else is handling that signal already:

https://github.com/WebKit/WebKit/blob/a807a4f6d013ac51005bb5c3153bc670fee47065/Source/WTF/wtf/posix/ThreadingPOSIX.cpp#L211

The RELEASE_ASSERT immediately after that check aborts the program.

In this specific case the problem appears when running a set of tests using Ruby. WebKit is loaded directly by the Ruby process (using gobject-introspection), and the reason why this crashes is that Ruby is installing handlers for SIGUSR1, SIGUSR2 among others:

https://sources.debian.org/src/ruby2.7/2.7.2-4/signal.c/#L1551

With a different signal there's no crash:

$ JSC_SIGNAL_FOR_GC=30 xvfb-run ruby webkit2-gtk/test/run-test.rb
Comment 2 seb128 2021-03-16 23:27:49 PDT
Thanks, the JSC_SIGNAL_FOR_GC trick is fixing the issue ... should it be reported to ruby-gnome instead now?
Comment 3 Alberto Garcia 2021-03-17 02:38:13 PDT
I suspect that this can be a common problem and therefore WebKit should not abort on a situation like this. I wouldn't report the bug to ruby-gnome yet.

Some possible alternatives:

1) Keep the previous behavior and don't abort if SIGUSR1 was already being handled. Optionally emit a warning.
2) Try different signals (USR1, USR2, ...) before aborting.
Comment 4 Yusuke Suzuki 2021-03-17 03:22:32 PDT
I have a question.

(In reply to Alberto Garcia from comment #3)
> I suspect that this can be a common problem and therefore WebKit should not
> abort on a situation like this. I wouldn't report the bug to ruby-gnome yet.
> 
> Some possible alternatives:
> 
> 1) Keep the previous behavior and don't abort if SIGUSR1 was already being
> handled. Optionally emit a warning.

In this case, what signal handler is installed? If JSC signal handler is not set, then JSC will not work. On the other hand, is it OK to discard ruby's signal handler?
Comment 5 Alberto Garcia 2021-03-17 04:59:45 PDT
(In reply to Yusuke Suzuki from comment #4)
> > Some possible alternatives:
> >
> > 1) Keep the previous behavior and don't abort if SIGUSR1 was
> > already being handled. Optionally emit a warning.

> In this case, what signal handler is installed? If JSC signal
> handler is not set, then JSC will not work. On the other hand, is it
> OK to discard ruby's signal handler?

In this case the JSC handler would be installed, overriding the
previous one.

What I'm trying to avoid is that an app that was working fine suddenly
breaks after upgrading WebKit (as this bug report shows). Even if the
cost is that an app that was already broken remains broken.

In the case of Ruby: I just took a look at the code now and as far as
I can see the only reason why it is handling SIGUSR1 (among others) is
to run any potential handler for those signals in the Ruby script
being executed.

So yes, if a Ruby script needs to handle the same signal that JSC is
using then there is a problem. I understand that this is true in
general for any WebKit embedder.
Comment 6 Yusuke Suzuki 2021-03-17 17:58:43 PDT
(In reply to Alberto Garcia from comment #5)
> (In reply to Yusuke Suzuki from comment #4)
> > > Some possible alternatives:
> > >
> > > 1) Keep the previous behavior and don't abort if SIGUSR1 was
> > > already being handled. Optionally emit a warning.
> 
> > In this case, what signal handler is installed? If JSC signal
> > handler is not set, then JSC will not work. On the other hand, is it
> > OK to discard ruby's signal handler?
> 
> In this case the JSC handler would be installed, overriding the
> previous one.
> 
> What I'm trying to avoid is that an app that was working fine suddenly
> breaks after upgrading WebKit (as this bug report shows). Even if the
> cost is that an app that was already broken remains broken.
> 
> In the case of Ruby: I just took a look at the code now and as far as
> I can see the only reason why it is handling SIGUSR1 (among others) is
> to run any potential handler for those signals in the Ruby script
> being executed.
> 
> So yes, if a Ruby script needs to handle the same signal that JSC is
> using then there is a problem. I understand that this is true in
> general for any WebKit embedder.

I think it is OK to remove this RELEASE_ASSERT for now, and we should also open a bug in Ruby binding implementations to configure JSC signal from SIGUSR1 to different one instead (e.g. SIGPWR, SIGXCPU etc.).
Comment 7 Alberto Garcia 2021-03-18 01:58:59 PDT
(In reply to Yusuke Suzuki from comment #6)
> > So yes, if a Ruby script needs to handle the same signal that JSC
> > is using then there is a problem. I understand that this is true
> > in general for any WebKit embedder.
> I think it is OK to remove this RELEASE_ASSERT for now, and we
> should also open a bug in Ruby binding implementations to configure
> JSC signal from SIGUSR1 to different one instead (e.g. SIGPWR,
> SIGXCPU etc.).

If we just remove the RELEASE_ASSERT then the JSC signal handler won't
be installed, which is different from the previous behavior, is that
ok?

My initial idea was to replace the handler (as WebKit used to do) and
emit a warning / log message.
Comment 8 Michael Catanzaro 2021-03-24 06:10:50 PDT
(In reply to Alberto Garcia from comment #7)
> My initial idea was to replace the handler (as WebKit used to do) and
> emit a warning / log message.

That seems better than not registering our SIGUSR1 handler.

Alternatively, we could just crash, which seems like a more reliable way to ensure apps know they must not handle SIGUSR1 if they link to WebKit.
Comment 9 Alberto Garcia 2021-03-24 07:26:27 PDT
(In reply to Michael Catanzaro from comment #8)
> Alternatively, we could just crash, which seems like a more reliable way to
> ensure apps know they must not handle SIGUSR1 if they link to WebKit.

"Just crashing" is what's happening now.
Comment 10 Alberto Garcia 2021-03-25 04:33:11 PDT
Created attachment 424233 [details]
Patch

This patch replaces any existing handler and emits a warning. I verified that the ruby-gnome tests pass successfully with it.
Comment 11 Carlos Garcia Campos 2021-03-25 04:36:20 PDT
Comment on attachment 424233 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424233&action=review

> Source/WTF/wtf/posix/ThreadingPOSIX.cpp:213
> +            WTFLogAlways("Overriding existing handler for signal %d. "
> +                "Set JSC_SIGNAL_FOR_GC if you want WebKit to use a different signal", signal);

I would use a single line even if it's a bit long.
Comment 12 Alberto Garcia 2021-03-25 04:44:08 PDT
Created attachment 424234 [details]
Patch v2

Same patch with a single line for the warning message
Comment 13 Alberto Garcia 2021-03-25 05:12:28 PDT
Committed r275014: <https://commits.webkit.org/r275014>