WPENetworkProcess segfaults occasionaly after logging assertion errors
https://bugs.webkit.org/show_bug.cgi?id=276819
Summary WPENetworkProcess segfaults occasionaly after logging assertion errors
Nils K
Reported 2024-07-19 09:46:51 PDT
I have a few machines which run a digital signage/kiosk application constantly and have seen several of these coredumps in the logs by now. They show a fixed set of content and there does not seem to be any pattern when the crashes occur. However, each log is proceeded by the same set of log messages previous to the segv which can be seen below. Versions: cog 0.18.3 WPE WebKit 2.44.1. libsoup3 3.4.4 (cog and webkit are from https://copr.fedorainfracloud.org/coprs/philn/wpewebkit/packages/, all other packages are from Fedora 40 repo) System log: Jul 18 14:56:59 Digital-Signage-Player WPENetworkProce[754]: GTask soup_session_send_async (source object: 0x55668269e060, source tag: 0x7f119bc44c50) finalized without ever returning (using g_task_return_*()). This potentially indicates a bug in the program. Jul 18 14:56:59 Digital-Signage-Player WPENetworkProce[754]: g_atomic_ref_count_dec: assertion 'old_value > 0' failed Jul 18 14:56:59 Digital-Signage-Player WPENetworkProce[754]: async_send_request_return_result: assertion 'item->task != NULL' failed Jul 18 14:57:00 Digital-Signage-Player WPENetworkProce[754]: g_object_ref: assertion 'G_IS_OBJECT (object)' failed Jul 18 14:57:00 Digital-Signage-Player WPENetworkProce[754]: soup_session_feature_request_queued: assertion 'SOUP_IS_SESSION_FEATURE (feature)' failed Jul 18 14:57:00 Digital-Signage-Player WPENetworkProce[754]: g_object_ref: assertion 'G_IS_OBJECT (object)' failed Jul 18 14:57:00 Digital-Signage-Player WPENetworkProce[754]: soup_session_feature_request_queued: assertion 'SOUP_IS_SESSION_FEATURE (feature)' failed Jul 18 14:57:00 Digital-Signage-Player kernel: WPENetworkProce[754]: segfault at 5563d41aa770 ip 00007f119a93e9b1 sp 00007ffdd56ea158 error 4 in libgobject-2.0.so.0.8000.3[7f119a910000+38000] likely on CPU 0 (core 0, socket 0) Jul 18 14:57:00 Digital-Signage-Player kernel: Code: 00 00 00 4c 89 ce e9 4e e6 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 85 ff 74 47 48 8b 07 48 85 c0 74 3f <48> 8b 00 48 3d fc 03 00 00 77 2c 48 8d 15 5d 47 02 00 48 c1 e8 02 Jul 18 14:57:00 Digital-Signage-Player systemd[1]: Created slice system-systemd\x2dcoredump.slice - Slice /system/systemd-coredump. Jul 18 14:57:00 Digital-Signage-Player systemd[1]: Started systemd-coredump@0-22650-0.service - Process Core Dump (PID 22650/UID 0). Jul 18 14:57:03 Digital-Signage-Player systemd-coredump[22651]: [🡕] Process 754 (WPENetworkProce) of user 999 dumped core. Backtrace: #0 g_type_check_instance_is_fundamentally_a (type_instance=0x5566826a0110, fundamental_type=80) at ../gobject/gtype.c:4153 #1 0x00007f119a922e27 in g_object_ref (_object=_object@entry=0x5566826a0110) at ../gobject/gobject.c:4238 #2 0x00007f119bc3e0ea in soup_session_append_queue_item (session=0x55668269e060, msg=0x5566835975c0, async=<optimized out>, cancellable=<optimized out>) at ../libsoup/soup-session.c:1361 #3 0x00007f119bc44d03 in soup_session_send_async (user_data=0x7f118c090200, callback=0x7f119c5bf9c0 <WebKit::NetworkDataTaskSoup::sendRequestCallback(_SoupSession*, _GAsyncResult*, WebKit::NetworkDataTaskSoup::SendRequestData*)>, cancellable=0x5566835868d0, io_priority=10, msg=0x5566835975c0, session=0x55668269e060) at ../libsoup/soup-session.c:3079 #4 soup_session_send_async (session=0x55668269e060, msg=0x5566835975c0, io_priority=10, cancellable=0x5566835868d0, callback=0x7f119c5bf9c0 <WebKit::NetworkDataTaskSoup::sendRequestCallback(_SoupSession*, _GAsyncResult*, WebKit::NetworkDataTaskSoup::SendRequestData*)>, user_data=0x7f118c090200) at ../libsoup/soup-session.c:3065 #5 0x00007f119c5c3a8b in WebKit::NetworkDataTaskSoup::resume() [clone .part.0] () from /lib64/libWPEWebKit-2.0.so.1 #6 0x00007f119c4f4145 in WebKit::NetworkResourceLoader::startNetworkLoad(WebCore::ResourceRequest&&, WebKit::NetworkResourceLoader::FirstLoad) () from /lib64/libWPEWebKit-2.0.so.1 #7 0x00007f119c5aa393 in WebKit::NetworkCache::Cache::completeRetrieve(WTF::Function<void (std::unique_ptr<WebKit::NetworkCache::Entry, std::default_delete<WebKit::NetworkCache::Entry> >, WebKit::NetworkCache::Cache::RetrieveInfo const&)>&&, std::unique_ptr<WebKit::NetworkCache::Entry, std::default_delete<WebKit::NetworkCache::Entry> >, WebKit::NetworkCache::Cache::RetrieveInfo&) () from /lib64/libWPEWebKit-2.0.so.1 #8 0x00007f119c5b331d in auto WebKit::NetworkCache::Cache::retrieve(WebCore::ResourceRequest const&, WebKit::NetworkCache::GlobalFrameID const&, std::optional<WebKit::NavigatingToAppBoundDomain>, bool, WTF::OptionSet<WebCore::AdvancedPrivacyProtections>, WTF::Function<void (std::unique_ptr<WebKit::NetworkCache::Entry, std::default_delete<WebKit::NetworkCache::Entry> >, WebKit::NetworkCache::Cache::RetrieveInfo const&)>&&)::{lambda(auto:1, auto:2)#1}::operator()<std::unique_ptr<WebKit::NetworkCache::Storage::Record, std::default_delete<WebKit::NetworkCache::Storage::Record> >, WebKit::NetworkCache::Storage::Timings>(std::unique_ptr<WebKit::NetworkCache::Storage::Record, std::default_delete<WebKit::NetworkCache::Storage::Record> >, WebKit::NetworkCache::Storage::Timings) () from /lib64/libWPEWebKit-2.0.so.1 #9 0x00007f119c5b400c in WTF::CompletionHandler<bool (std::unique_ptr<WebKit::NetworkCache::Storage::Record, std::default_delete<WebKit::NetworkCache::Storage::Record> >, WebKit::NetworkCache::Storage::Timings const&)>::operator()(std::unique_ptr<WebKit::NetworkCache::Storage::Record, std::default_delete<WebKit::NetworkCache::Storage::Record> >, WebKit::NetworkCache::Storage::Timings const&) () from /lib64/libWPEWebKit-2.0.so.1 #10 0x00007f119c5bdcba in WebKit::NetworkCache::Cache::retrieve(WebCore::ResourceRequest const&, WebKit::NetworkCache::GlobalFrameID const&, std::optional<WebKit::NavigatingToAppBoundDomain>, bool, WTF::OptionSet<WebCore::AdvancedPrivacyProtections>, WTF::Function<void (std::unique_ptr<WebKit::NetworkCache::Entry, std::default_delete<WebKit::NetworkCache::Entry> >, WebKit::NetworkCache::Cache::RetrieveInfo const&)>&&) () from /lib64/libWPEWebKit-2.0.so.1 #11 0x00007f119c4fdb80 in WebKit::NetworkResourceLoader::retrieveCacheEntry(WebCore::ResourceRequest const&) () from /lib64/libWPEWebKit-2.0.so.1 #12 0x00007f119c4e2bd8 in auto WebKit::NetworkLoadChecker::checkRequest(WebCore::ResourceRequest&&, WebCore::ContentSecurityPolicyClient*, WTF::CompletionHandler<void (std::variant<WebCore::ResourceRequest, WebKit::NetworkLoadChecker::RedirectionTriplet, WebCore::ResourceError>&&)>&&)::{lambda(auto:1&&)#1}::operator()<std::experimental::fundamentals_v3::expected<WebKit::NetworkLoadChecker::ContentExtensionResult, WebCore::ResourceError> >(std::experimental::fundamentals_v3::expected<WebKit::NetworkLoadChecker::ContentExtensionResult, WebCore::ResourceError>&&) () from /lib64/libWPEWebKit-2.0.so.1 #13 0x00007f119c4ebf0f in WebKit::NetworkLoadChecker::checkRequest(WebCore::ResourceRequest&&, WebCore::ContentSecurityPolicyClient*, WTF::CompletionHandler<void (std::variant<WebCore::ResourceRequest, WebKit::NetworkLoadChecker::RedirectionTriplet, WebCore::ResourceError>&&)>&&) () from /lib64/libWPEWebKit-2.0.so.1 #14 0x00007f119c4fe95e in WebKit::NetworkResourceLoader::startRequest(WebCore::ResourceRequest const&) () from /lib64/libWPEWebKit-2.0.so.1 #15 0x00007f119c4a3df9 in WebKit::NetworkConnectionToWebProcess::scheduleResourceLoad(WebKit::NetworkResourceLoadParameters&&, std::optional<WTF::ObjectIdentifierGeneric<WebKit::NetworkResourceLoadIdentifierType, WTF::ObjectIdentifierMainThreadAccessTraits> >) () from /lib64/libWPEWebKit-2.0.so.1 #16 0x00007f119f5284d2 in void IPC::handleMessage<Messages::NetworkConnectionToWebProcess::ScheduleResourceLoad, WebKit::NetworkConnectionToWebProcess, WebKit::NetworkConnectionToWebProcess, void (WebKit::NetworkResourceLoadParameters&&, std::optional<WTF::ObjectIdentifierGeneric<WebKit::NetworkResourceLoadIdentifierType, WTF::ObjectIdentifierMainThreadAccessTraits> >)>(IPC::Connection&, IPC::Decoder&, WebKit::NetworkConnectionToWebProcess*, void (WebKit::NetworkConnectionToWebProcess::*)(WebKit::NetworkResourceLoadParameters&&, std::optional<WTF::ObjectIdentifierGeneric<WebKit::NetworkResourceLoadIdentifierType, WTF::ObjectIdentifierMainThreadAccessTraits> >)) [clone .constprop.0] () from /lib64/libWPEWebKit-2.0.so.1 #17 0x00007f119c38d4ec in WebKit::NetworkConnectionToWebProcess::didReceiveNetworkConnectionToWebProcessMessage(IPC::Connection&, IPC::Decoder&) () from /lib64/libWPEWebKit-2.0.so.1 #18 0x00007f119c63611e in IPC::Connection::dispatchMessage(WTF::UniqueRef<IPC::Decoder>) [clone .part.0] () from /lib64/libWPEWebKit-2.0.so.1 #19 0x00007f119c63676d in WTF::Detail::CallableWrapper<IPC::Connection::enqueueIncomingMessage(WTF::UniqueRef<IPC::Decoder>)::{lambda()#2}, void>::call() () from /lib64/libWPEWebKit-2.0.so.1 #20 0x00007f119d6d7b7c in WTF::RunLoop::RunLoop()::{lambda(void*)#1}::_FUN(void*) [clone .lto_priv.0] () from /lib64/libWPEWebKit-2.0.so.1 #21 0x00007f119d71e53d in WTF::RunLoop::{lambda(_GSource*, int (*)(void*), void*)#1}::_FUN(_GSource*, int (*)(void*), void*) [clone .lto_priv.0] () from /lib64/libWPEWebKit-2.0.so.1 #22 0x00007f119bccce8c in g_main_dispatch (context=0x556682675c00) at ../glib/gmain.c:3344 #23 g_main_context_dispatch_unlocked (context=0x556682675c00) at ../glib/gmain.c:4152 #24 0x00007f119bd2ec98 in g_main_context_iterate_unlocked.isra.0 (context=0x556682675c00, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4217 #25 0x00007f119bcd2f37 in g_main_loop_run (loop=0x556682675d50) at ../glib/gmain.c:4419 #26 0x00007f119d6ec008 in WTF::RunLoop::run() () from /lib64/libWPEWebKit-2.0.so.1 #27 0x00007f119c585dbd in WebKit::NetworkProcessMain(int, char**) () from /lib64/libWPEWebKit-2.0.so.1 #28 0x00007f119c039088 in __libc_start_call_main (main=main@entry=0x55666e1e9070 <main>, argc=argc@entry=3, argv=argv@entry=0x7ffdd56eee98) at ../sysdeps/nptl/libc_start_call_main.h:58 #29 0x00007f119c03914b in __libc_start_main_impl (main=0x55666e1e9070 <main>, argc=3, argv=0x7ffdd56eee98, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdd56eee88) at ../csu/libc-start.c:360 #30 0x000055666e1e90a5 in _start ()
Attachments
Michael Catanzaro
Comment 1 2024-07-19 20:13:31 PDT
I would also try using G_DEBUG=fatal-criticals to get a backtrace to the first critical. It looks like a lot had already gone wrong before it crashed.
Nils K
Comment 2 2024-07-20 05:18:30 PDT
(In reply to Michael Catanzaro from comment #1) > I would also try using G_DEBUG=fatal-criticals to get a backtrace to the > first critical. It looks like a lot had already gone wrong before it crashed. I added the variable to my local device but as it only happens occasionally I fear it might take quite a while before I can provide a backtrace from that. If this has no other side-effects I will consider to also roll this out to our already deployed devices, however, I cannot provide an estimate when the next update will take place.
Claudio Saavedra
Comment 3 2024-07-22 02:31:44 PDT
(In reply to nilskemail+webkit from comment #2) > (In reply to Michael Catanzaro from comment #1) > > I would also try using G_DEBUG=fatal-criticals to get a backtrace to the > > first critical. It looks like a lot had already gone wrong before it crashed. > > I added the variable to my local device but as it only happens occasionally > I fear it might take quite a while before I can provide a backtrace from > that. > If this has no other side-effects I will consider to also roll this out to > our already deployed devices, however, I cannot provide an estimate when the > next update will take place. If you add that to deployed devices you risk having more crashes for unrelated reasons. Of course we would appreciate knowing if there are other things causing critical warnings but beware of the impact it might have for your deployments.
Nils K
Comment 4 2024-07-22 03:50:37 PDT
(In reply to Claudio Saavedra from comment #3) > If you add that to deployed devices you risk having more crashes for > unrelated reasons. Of course we would appreciate knowing if there are other > things causing critical warnings but beware of the impact it might have for > your deployments. Is there a common log entry for such cases? Searching for "CRITICAL" or "assert" in our logs only brought up log entries relating to the asserts from this bug or Gstreamer-CRITICAL from https://bugs.webkit.org/show_bug.cgi?id=260723. So at least for the current version of cog/webkit it seems fine to enable this (at least on a handful of devices)
Nils K
Comment 5 2024-09-15 08:46:58 PDT
I rolled out a new deployment with G_DEBUG=fatal-criticals yesterday and got a backtrace to the first critical: #0 _g_log_abort (breakpoint=<optimized out>) at ../glib/gmessages.c:426 #1 g_logv (log_domain=0x7f52f51135f4 "GLib-GIO", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7ffc54fb7ba0) at ../glib/gmessages.c:1273 #2 0x00007f52f68c3ea3 in g_log (log_domain=log_domain@entry=0x7f52f51135f4 "GLib-GIO", log_level=log_level@entry=G_LOG_LEVEL_CRITICAL, format=<optimized out>) at ../glib/gmessages.c:1315 #3 0x00007f52f505e413 in g_task_finalize (object=0x26f4a7f0) at ../gio/gtask.c:741 #4 0x00007f52f4f70b6a in g_object_unref (_object=0x26f4a7f0) at ../gobject/gobject.c:4484 #5 0x00007f52f68864cd in g_atomic_rc_box_release_full (mem_block=0x26fc04b0, clear_func=0x7f52f6829fa0 <soup_message_queue_item_destroy>) at ../glib/garcbox.c:230 #6 g_atomic_rc_box_release_full (mem_block=0x26fc04b0, clear_func=0x7f52f6829fa0 <soup_message_queue_item_destroy>) at ../glib/garcbox.c:213 #7 0x00007f52f6834c2d in run_until_read_done (msg=0x26eee4c0, result=<optimized out>, item=0x26fc04b0) at ../libsoup/soup-session.c:2820 #8 0x00007f52f505ac6c in g_task_return_now (task=0x261d78e0) at ../gio/gtask.c:1361 #9 0x00007f52f505aca5 in complete_in_idle_cb (task=task@entry=0x261d78e0) at ../gio/gtask.c:1375 #10 0x00007f52f68c469d in g_idle_dispatch (source=0x2614b3b0, callback=0x7f52f505ac90 <complete_in_idle_cb>, user_data=0x261d78e0) at ../glib/gmain.c:6150 #11 0x00007f52f68bde8c in g_main_dispatch (context=0x2609b9f0) at ../glib/gmain.c:3344 #12 g_main_context_dispatch_unlocked (context=0x2609b9f0) at ../glib/gmain.c:4152 #13 0x00007f52f691fc98 in g_main_context_iterate_unlocked.isra.0 (context=0x2609b9f0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4217 #14 0x00007f52f68c3f37 in g_main_loop_run (loop=0x2609bb40) at ../glib/gmain.c:4419 #15 0x00007f52f88a52d9 in WTF::RunLoop::run() () from /lib64/libWPEWebKit-2.0.so.1 #16 0x00007f52f6fec605 in WebKit::NetworkProcessMain(int, char**) () from /lib64/libWPEWebKit-2.0.so.1 #17 0x00007f52f6039088 in __libc_start_call_main (main=main@entry=0x2017a0 <main>, argc=argc@entry=4, argv=argv@entry=0x7ffc54fb8118) at ../sysdeps/nptl/libc_start_call_main.h:58 #18 0x00007f52f603914b in __libc_start_main_impl (main=0x2017a0 <main>, argc=4, argv=0x7ffc54fb8118, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc54fb8108) at ../csu/libc-start.c:360 #19 0x00000000002016d5 in _start ()
Michael Catanzaro
Comment 6 2024-09-15 12:36:38 PDT
Clearly we have a refcounting error on that GTask causing it to be finalized before the task completes. I suspect a bug somewhere in libsoup. I doubt it's a WebKit-level bug, because WebKit doesn't have access to this GTask. If we're really unlucky it could be https://gitlab.gnome.org/GNOME/glib/-/issues/1346, since I see the GTask is owned by SoupMessageQueueItem and that gets passed across threads, but hopefully not. The rest of my comment is a tangent, not really directly relevant: I'm confused by the ownership semantics here. I wonder what's up with this code in soup_session_send_async(): item = soup_session_append_queue_item (session, msg, TRUE, cancellable); // ... item->task = g_task_new (session, item->cancellable, callback, user_data); // ... g_task_set_task_data (item->task, item, (GDestroyNotify) soup_message_queue_item_unref); Now there's a circular reference between the GTask and the SoupMessageQueueItem, each of which owns the other. That seems weird. It doesn't directly explain this bug, but maybe it's related. I see it usually should not be a problem in practice, because I see the cycle normally gets broken in async_send_request_return_result(), but maybe something is wrong somewhere else. For an example of why this pattern is fragile, look at async_send_request_return_result() in soup-session.c and observe that this function can no longer safely use the SoupMessageQueueItem after it unrefs the GTask, but this is very subtle and easy to miss. (The function looks completely fine currently, but it would be easy to introduce a bug by mistake in the future.) Unfortunately I fail to see where is the bug. (In reply to Claudio Saavedra from comment #3) > If you add that to deployed devices you risk having more crashes for > unrelated reasons. Of course we would appreciate knowing if there are other > things causing critical warnings but beware of the impact it might have for > your deployments. (P.S. Claudio's warning is correct and fair, but I recommend running with fatal criticals in production anyway, unless you just don't care about security or correctness. You really don't want to proceed past a use after free error, especially not in the unsandboxed network process, and both of your backtraces indicate this is happening.)
Claudio Saavedra
Comment 7 2024-09-15 23:39:50 PDT
I commented this here, not sure if it's related but... https://gitlab.gnome.org/GNOME/libsoup/-/merge_requests/387#note_1949816
Note You need to log in before you can comment on or make changes to this bug.