Bug 244026 - WTF::StackTrace::captureStackTrace broken on aarch64 (at least when called from ResourceError::internalError)
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-17 03:30 PDT by Alberto Garcia
Modified: 2022-10-05 05:22 PDT
CC List: 5 users

See Also:


Attachments
Patch (1.07 KB, patch)
2022-09-21 02:52 PDT, Carlos Garcia Campos

Description Alberto Garcia 2022-08-17 03:30:06 PDT
This has been reported in Purism's PureOS and in Debian while doing some basic browsing. Apparently the Debian stable (bullseye) builds are the only ones that fail.

https://source.puri.sm/Librem5/debs/epiphany/-/issues/38
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1016811

Process 14397 (WebKitNetworkPr) of user 1000 dumped core.

Stack trace of thread 14397:
#0  0x0000ffff914f92ac __GI_raise (libc.so.6 + 0x372ac)
#1  0x0000ffff914e5ea0 __GI_abort (libc.so.6 + 0x23ea0)
#2  0x0000ffff93f89c50 _Z16WTFCrashWithInfoiPKcS0_i (libjavascriptcoregtk-4.0.so.18 + 0x198c50)
#3  0x0000ffff94f2a138 _ZN3WTF10StackTrace17captureStackTraceEii (libjavascriptcoregtk-4.0.so.18 + 0x1139138)
#4  0x0000ffff94f05a30 WTFReleaseLogStackTrace (libjavascriptcoregtk-4.0.so.18 + 0x1114a30)
#5  0x0000ffff97f02988 _ZN7WebCore13internalErrorERKN3WTF3URLE (libwebkit2gtk-4.0.so.37 + 0x214a988)
#6  0x0000ffff966038e0 _ZN6WebKit29NetworkConnectionToWebProcess12preconnectToESt8optionalIN3WTF16ObjectIdentifierIN7WebCore14ResourceLoaderEEEEONS_29NetworkResourceLoadParametersE (libwebkit2gtk-4.0.so.37 + 0x84b8e0)
#7  0x0000ffff964e0eb8 _ZN6WebKit29NetworkConnectionToWebProcess46didReceiveNetworkConnectionToWebProcessMessageERN3IPC10ConnectionERNS1_7DecoderE (libwebkit2gtk-4.0.so.37 + 0x728eb8)
#8  0x0000ffff9676eb70 _ZN3IPC10Connection15dispatchMessageESt10unique_ptrINS_7DecoderESt14default_deleteIS2_EE (libwebkit2gtk-4.0.so.37 + 0x9b6b70)
#9  0x0000ffff9676ef08 _ZN3IPC10Connection26dispatchOneIncomingMessageEv (libwebkit2gtk-4.0.so.37 + 0x9b6f08)
#10 0x0000ffff94f28ad0 _ZN3WTF7RunLoop11performWorkEv (libjavascriptcoregtk-4.0.so.18 + 0x1137ad0)
#11 0x0000ffff94f819f4 _ZZN3WTF7RunLoopC1EvEN3$_18__invokeEPv (libjavascriptcoregtk-4.0.so.18 + 0x11909f4)
#12 0x0000ffff94f80d88 _ZN3WTF7RunLoop3$_08__invokeEP8_GSourcePFiPvES4_ (libjavascriptcoregtk-4.0.so.18 + 0x118fd88)
#13 0x0000ffff91a53ab4 g_main_dispatch (libglib-2.0.so.0 + 0x53ab4)
#14 0x0000ffff91a53e5c g_main_context_iterate (libglib-2.0.so.0 + 0x53e5c)
#15 0x0000ffff91a541b0 g_main_loop_run (libglib-2.0.so.0 + 0x541b0)
#16 0x0000ffff94f81384 _ZN3WTF7RunLoop3runEv (libjavascriptcoregtk-4.0.so.18 + 0x1190384)
#17 0x0000ffff9674287c _ZN6WebKit20AuxiliaryProcessMainINS_22NetworkProcessMainSoupEEEiiPPc (libwebkit2gtk-4.0.so.37 + 0x98a87c)
#18 0x0000ffff914e6218 __libc_start_main (libc.so.6 + 0x24218)
#19 0x0000000000400874 $x (WebKitNetworkProcess + 0x874)
#20 0x0000000000400874 $x (WebKitNetworkProcess + 0x874)

Process 14382 (WebKitWebProces) of user 1000 dumped core.

Stack trace of thread 2:
#0  0x0000ffff854082ac __GI_raise (libc.so.6 + 0x372ac)
#1  0x0000ffff853f4ea0 __GI_abort (libc.so.6 + 0x23ea0)
#2  0x0000ffff87e98c50 _Z16WTFCrashWithInfoiPKcS0_i (libjavascriptcoregtk-4.0.so.18 + 0x198c50)
#3  0x0000ffff88e39138 _ZN3WTF10StackTrace17captureStackTraceEii (libjavascriptcoregtk-4.0.so.18 + 0x1139138)
#4  0x0000ffff88e14a30 WTFReleaseLogStackTrace (libjavascriptcoregtk-4.0.so.18 + 0x1114a30)
#5  0x0000ffff8be11988 _ZN7WebCore13internalErrorERKN3WTF3URLE (libwebkit2gtk-4.0.so.37 + 0x214a988)
#6  0x0000ffff8a9c1824 _ZN6WebKit17WebLoaderStrategy30internallyFailedLoadTimerFiredEv (libwebkit2gtk-4.0.so.37 + 0xcfa824)
#7  0x0000ffff88e90aa0 _ZZN3WTF7RunLoop9TimerBaseC1ERS0_EN3$_38__invokeEPv (libjavascriptcoregtk-4.0.so.18 + 0x1190aa0)
#8  0x0000ffff88e8fd88 _ZN3WTF7RunLoop3$_08__invokeEP8_GSourcePFiPvES4_ (libjavascriptcoregtk-4.0.so.18 + 0x118fd88)
#9  0x0000ffff85962ab4 g_main_dispatch (libglib-2.0.so.0 + 0x53ab4)
#10 0x0000ffff85962e5c g_main_context_iterate (libglib-2.0.so.0 + 0x53e5c)
#11 0x0000ffff859631b0 g_main_loop_run (libglib-2.0.so.0 + 0x541b0)
#12 0x0000ffff88e90384 _ZN3WTF7RunLoop3runEv (libjavascriptcoregtk-4.0.so.18 + 0x1190384)
#13 0x0000ffff8aa7b2b4 _ZN6WebKit20AuxiliaryProcessMainINS_17WebProcessMainGtkEEEiiPPc (libwebkit2gtk-4.0.so.37 + 0xdb42b4)
#14 0x0000ffff853f5218 __libc_start_main (libc.so.6 + 0x24218)
#15 0x0000000000400874 $x (WebKitWebProcess + 0x874)
#16 0x0000000000400874 $x (WebKitWebProcess + 0x874)
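
The frames above use mangled C++ names. To compare them with the gdb output in comment 1, they can be decoded with the C++ ABI demangler. This is only a minimal sketch, assuming a GCC or Clang toolchain that ships <cxxabi.h>; the helper name demangle() is made up for illustration and is not part of WebKit.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper: decode an Itanium C++ ABI symbol name.
// Names that do not start with "_Z" (plain C symbols such as
// g_main_loop_run) are returned unchanged.
std::string demangle(const std::string& mangled)
{
    if (mangled.rfind("_Z", 0) != 0)
        return mangled;

    int status = 0;
    // __cxa_demangle returns a malloc()ed buffer, so release it with free().
    std::unique_ptr<char, void (*)(void*)> result(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        std::free);

    // A non-zero status means the name could not be demangled.
    return (status == 0 && result) ? std::string(result.get()) : mangled;
}
```

For example, demangle("_ZN3WTF10StackTrace17captureStackTraceEii") gives WTF::StackTrace::captureStackTrace(int, int), which is frame #3 in both traces.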
Comment 1 Alberto Garcia 2022-09-20 08:39:44 PDT
Another stack trace:

/usr/lib/aarch64-linux-gnu/webkit2gtk-4.0/WebKitNetworkProcess
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000ffff7cfdfaa0 in __GI_abort () at abort.c:79
#2  0x0000ffff7fa8ac50 in WTFCrashWithInfo(int, char const*, char const*, int) () at WTF/Headers/wtf/Assertions.h:741
#3  0x0000ffff80a2d5a8 in captureStackTrace () at ../Source/WTF/wtf/StackTrace.cpp:79
#4  0x0000ffff80a08ea0 in WTFReleaseLogStackTrace () at ../Source/WTF/wtf/Assertions.cpp:592
#5  0x0000ffff83c06550 in internalError () at ../Source/WebCore/platform/network/ResourceErrorBase.cpp:97
#6  0x0000ffff820e8d1c in preconnectTo () at ../Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:735
#7  0x0000ffff81fc62f4 in callMemberFunctionImpl<WebKit::NetworkConnectionToWebProcess, void (WebKit::NetworkConnectionToWebProcess::*)(std::optional<WTF::ObjectIdentifier<WebCore::ResourceLoader> >, WebKit::NetworkResourceLoadParameters&&), std::tuple<std::optional<WTF::ObjectIdentifier<WebCore::ResourceLoader> >, WebKit::NetworkResourceLoadParameters>, 0, 1> () at ../Source/WebKit/Platform/IPC/HandleMessage.h:125
#8  callMemberFunction<WebKit::NetworkConnectionToWebProcess, void (WebKit::NetworkConnectionToWebProcess::*)(std::optional<WTF::ObjectIdentifier<WebCore::ResourceLoader> >, WebKit::NetworkResourceLoadParameters&&), std::tuple<std::optional<WTF::ObjectIdentifier<WebCore::ResourceLoader> >, WebKit::NetworkResourceLoadParameters>, std::integer_sequence<unsigned long, 0, 1> > () at ../Source/WebKit/Platform/IPC/HandleMessage.h:131
#9  handleMessage<Messages::NetworkConnectionToWebProcess::PreconnectTo, WebKit::NetworkConnectionToWebProcess, void (WebKit::NetworkConnectionToWebProcess::*)(std::optional<WTF::ObjectIdentifier<WebCore::ResourceLoader> >, WebKit::NetworkResourceLoadParameters&&)> () at ../Source/WebKit/Platform/IPC/HandleMessage.h:196
#10 didReceiveNetworkConnectionToWebProcessMessage () at DerivedSources/WebKit/NetworkConnectionToWebProcessMessageReceiver.cpp:479
#11 0x0000ffff822543d0 in dispatchMessage () at ../Source/WebKit/Platform/IPC/Connection.cpp:1134
#12 0x0000ffff82254768 in dispatchOneIncomingMessage () at ../Source/WebKit/Platform/IPC/Connection.cpp:1203
#13 0x0000ffff80a2bf40 in operator() () at ../Source/WTF/wtf/Function.h:82
#14 performWork () at ../Source/WTF/wtf/RunLoop.cpp:133
#15 0x0000ffff80a85190 in operator() () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:80
#16 __invoke () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:79
#17 0x0000ffff80a84524 in operator() () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:53
#18 __invoke () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:45
#19 0x0000ffff7d551ab4 in g_main_context_dispatch () from /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0
#20 0x0000ffff7d551e5c in ?? () from /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0
#21 0x0000ffff7d5521b0 in g_main_loop_run () from /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0
#22 0x0000ffff80a84b20 in run () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:108
#23 0x0000ffff822280d8 in run () at ../Source/WebKit/Shared/AuxiliaryProcessMain.h:70
#24 AuxiliaryProcessMain<WebKit::NetworkProcessMainSoup> () at ../Source/WebKit/Shared/AuxiliaryProcessMain.h:96
#25 0x0000ffff7cfdfe18 in __libc_start_main (main=0x400878 <__wrap_main>, argc=3, argv=0xfffff1c90058, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:308
#26 0x0000000000400874 in _start ()


/usr/lib/aarch64-linux-gnu/webkit2gtk-4.0/WebKitWebProcess
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000ffff99831aa0 in __GI_abort () at abort.c:79
#2  0x0000ffff9c2dcc50 in WTFCrashWithInfo(int, char const*, char const*, int) () at WTF/Headers/wtf/Assertions.h:741
#3  0x0000ffff9d27f5a8 in captureStackTrace () at ../Source/WTF/wtf/StackTrace.cpp:79
#4  0x0000ffff9d25aea0 in WTFReleaseLogStackTrace () at ../Source/WTF/wtf/Assertions.cpp:592
#5  0x0000ffffa0458550 in internalError () at ../Source/WebCore/platform/network/ResourceErrorBase.cpp:97
#6  0x0000ffff9edead30 in internallyFailedLoadTimerFired () at ../Source/WebKit/WebProcess/Network/WebLoaderStrategy.cpp:495
#7  0x0000ffff9d2d723c in operator() () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:177
#8  __invoke () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:169
#9  0x0000ffff9d2d6524 in operator() () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:53
#10 __invoke () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:45
#11 0x0000ffff99da3ab4 in g_main_context_dispatch () from /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0
#12 0x0000ffff99da3e5c in ?? () from /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0
#13 0x0000ffff99da41b0 in g_main_loop_run () from /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0
#14 0x0000ffff9d2d6b20 in run () at ../Source/WTF/wtf/glib/RunLoopGLib.cpp:108
#15 0x0000ffff9eea47c4 in run () at ../Source/WebKit/Shared/AuxiliaryProcessMain.h:70
#16 AuxiliaryProcessMain<WebKit::WebProcessMainGtk> () at ../Source/WebKit/Shared/AuxiliaryProcessMain.h:96
#17 0x0000ffff99831e18 in __libc_start_main (main=0x400878 <__wrap_main>, argc=3, argv=0xfffff7b85168, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:308
#18 0x0000000000400874 in _start ()

The network process trace hits this line:

https://github.com/WebKit/WebKit/blob/webkitgtk-2.36.6/Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp#L735
Comment 2 Michael Catanzaro 2022-09-20 11:01:59 PDT
Huh, there's a lot going on here.

First, WTFReleaseLogStackTrace is broken. It's a long function with a bunch of code, but the first line calls WTF::StackTrace::captureStackTrace, which is fatal and does not return, so the rest is all pointless. WTFReleaseLogStackTrace is clearly not intended to be fatal. Note that ResourceError::internalError is the only place where it is ever used for WPE/GTK. The only other uses are in PixelBufferConformerCV.cpp, which is platform-specific. So that's why we didn't notice.

As for the errors themselves, there are two different traces:

(1) Web process crash in WebLoaderStrategy::internallyFailedLoadTimerFired. It seems the web process is designed to call ResourceError::internalError whenever the network process crashes. So this crash is just a symptom of the network process crash. I don't think we need to investigate this further: fixing WTFReleaseLogStackTrace and fixing the network process crash would suffice.

(2) Network process crash when calling NetworkConnectionToWebProcess::preconnectTo. We should look closer to decide what to do here. Although fixing WTFReleaseLogStackTrace would avoid the crash, I think we should go further and ensure that ResourceError::internalError does not get called. Note this only happens when ENABLE_SERVER_PRECONNECT is disabled, so the crash is specific to libsoup 2 builds. Probably we should drop the request in NetworkConnectionToWebProcess::preconnectTo with some different error, but another option would be to find every caller and guard it behind ENABLE_SERVER_PRECONNECT.
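
To make the first option concrete, here is a rough sketch of dropping the request when the feature is compiled out, instead of reporting an internal error. Everything in it is a stand-in: the macro is defined locally to mimic a libsoup 2 build, and PreconnectResult and this simplified preconnectTo() signature are hypothetical, not the real NetworkConnectionToWebProcess API.

```cpp
#include <string>

// Assumption: local stand-in for the build flag; 0 mimics a libsoup 2 build.
#ifndef ENABLE_SERVER_PRECONNECT
#define ENABLE_SERVER_PRECONNECT 0
#endif

// Hypothetical result type, used only for this illustration.
enum class PreconnectResult { Started, Dropped };

// Hypothetical, simplified entry point. When preconnect support is compiled
// out, the request is dropped quietly rather than reported as an internal
// error (which is what ends up logging a stack trace).
PreconnectResult preconnectTo(const std::string& url)
{
    (void)url; // A real implementation would use the URL to open a connection.
#if ENABLE_SERVER_PRECONNECT
    return PreconnectResult::Started;
#else
    return PreconnectResult::Dropped;
#endif
}
```

The second option would put the same #if around each call site instead, so that the message is never sent in the first place.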
Comment 3 Michael Catanzaro 2022-09-20 11:19:38 PDT
(In reply to Michael Catanzaro from comment #2)
> Huh, there's a lot going on here.
> 
> First, WTFReleaseLogStackTrace is broken. It's a long function with a bunch
> of code, but the first line calls WTF::StackTrace::captureStackTrace, which
> is fatal and does not return, so the rest is all pointless.

Oh, it looks like this is not expected, but rather a bug in StackTrace::captureStackTrace:

    WTFGetBacktrace(&trace->m_skippedFrame0, &numberOfFrames);
    if (numberOfFrames) {
        RELEASE_ASSERT(numberOfFrames >= framesToSkip);

That calls backtrace() from execinfo.h, see the manpage backtrace(3). I wonder if something goes wrong there only on aarch64.
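
For illustration, a self-contained sketch of the same pattern that tolerates a short backtrace instead of asserting. It assumes glibc's backtrace() from <execinfo.h>; usableFrameCount() and captureFrames() are hypothetical names, and this is not the change that eventually landed in WebKit.

```cpp
#include <execinfo.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of frames left after dropping framesToSkip.
// Clamps to zero when backtrace() returned fewer frames than expected,
// which is the case the RELEASE_ASSERT treats as fatal.
int usableFrameCount(int capturedFrames, int framesToSkip)
{
    return std::max(0, capturedFrames - std::max(0, framesToSkip));
}

// Capture up to maxFrames return addresses with backtrace(3) after skipping
// the first framesToSkip. Returns an empty vector if too few were captured.
std::vector<void*> captureFrames(int maxFrames, int framesToSkip)
{
    maxFrames = std::max(0, maxFrames);
    framesToSkip = std::max(0, framesToSkip);

    std::vector<void*> buffer(static_cast<std::size_t>(maxFrames) + framesToSkip);
    if (buffer.empty())
        return {};

    int captured = backtrace(buffer.data(), static_cast<int>(buffer.size()));
    int usable = usableFrameCount(captured, framesToSkip);
    if (!usable)
        return {};

    return std::vector<void*>(buffer.begin() + framesToSkip,
                              buffer.begin() + framesToSkip + usable);
}
```

With this shape, a platform where backtrace() yields only one or two frames produces an empty trace rather than an abort.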
Comment 4 Carlos Garcia Campos 2022-09-21 02:40:10 PDT
I think we should never try a preconnect when ENABLE_SERVER_PRECONNECT is disabled.
Comment 5 Carlos Garcia Campos 2022-09-21 02:52:36 PDT
Created attachment 462492 [details]
Patch

Could someone try this patch?
Comment 6 Sebastian Krzyszkowiak 2022-09-22 19:28:35 PDT
(In reply to Carlos Garcia Campos from comment #5)
> Created attachment 462492 [details]
> Patch
> 
> Could someone try this patch?

I have just tried it on top of 2.36.7 and it doesn't help - the network process still crashes in the same way.
Comment 7 Carlos Garcia Campos 2022-09-23 00:43:10 PDT
Then I need to know where preconnectTo is called, and unfortunately that's not in the backtraces.
Comment 8 Zan Dobersek 2022-09-23 01:09:49 PDT
It's the Messages::NetworkConnectionToWebProcess::PreconnectTo message sent from WebLoaderStrategy::preconnectTo(). But the point of the crash is the release assert in the stack trace capturing code, which assumes a number of frames that libc's backtrace() can't extract on this specific platform/configuration.
Comment 9 Michael Catanzaro 2022-09-23 07:59:44 PDT
Let's split into two bugs:

 * Created bug #245576 for the preconnect problem that causes the internal error to be logged and the stack trace to be printed
 * Retitled this bug to focus on not crashing when printing the stack trace
Comment 10 Alberto Garcia 2022-09-26 00:45:50 PDT
(In reply to Carlos Garcia Campos from comment #5)
> Could someone try this patch?

It doesn't solve the problem with 2.38.0 either.
Comment 11 Alberto Garcia 2022-09-29 09:42:51 PDT
According to Sebastian this patch fixes the crash: https://github.com/WebKit/WebKit/pull/4790
Comment 12 Alberto Garcia 2022-09-30 06:32:06 PDT
More news: there might be a compiler problem here, because all these crashes are happening when WebKit is compiled with clang. With gcc it seems stable (or more stable at least).

I'm talking about gcc 10.2.1 and clang 11.0
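
When comparing builds like this, it can help to have the binary report which compiler produced it. A small sketch using the compilers' predefined macros; the function name compilerDescription() is hypothetical, and nothing like it is implied to exist in WebKit.

```cpp
#include <string>

// Hypothetical helper: describe the compiler that built this translation unit.
// Clang also defines __GNUC__, so __clang__ has to be checked first.
std::string compilerDescription()
{
#if defined(__clang__)
    return "clang " + std::to_string(__clang_major__) + "."
        + std::to_string(__clang_minor__);
#elif defined(__GNUC__)
    return "gcc " + std::to_string(__GNUC__) + "."
        + std::to_string(__GNUC_MINOR__) + "."
        + std::to_string(__GNUC_PATCHLEVEL__);
#else
    return "unknown compiler";
#endif
}
```

Logging this string next to a crash report makes it easier to tell whether a failure follows the toolchain or the platform.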
Comment 13 Zan Dobersek 2022-10-03 04:34:50 PDT
The RELEASE_ASSERT triggering these crashes is in the process of being removed:
https://bugs.webkit.org/show_bug.cgi?id=245826
https://github.com/WebKit/WebKit/pull/4830
Comment 14 Michael Catanzaro 2022-10-05 05:22:10 PDT
(In reply to Zan Dobersek from comment #13)
> The RELEASE_ASSERT triggering these crashes is in the process of being
> removed:
> https://bugs.webkit.org/show_bug.cgi?id=245826
> https://github.com/WebKit/WebKit/pull/4830

This change has landed.