WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
199572
REGRESSION(
r246671
): [WPE][GTK] Crash in NetworkProcess since the DNS cache landed
https://bugs.webkit.org/show_bug.cgi?id=199572
Summary
REGRESSION(r246671): [WPE][GTK] Crash in NetworkProcess since the DNS cache l...
Claudio Saavedra
Reported
2019-07-08 05:02:55 PDT
I've noticed that several tests have started crashing the network process sporadically and all they have in common is that the crashing starts right after the DNS cache was landed. Opening this bug to track this potential regression. Some stacktraces to exemplify this:
https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20(Tests)/r247200%20(10899)/imported/w3c/web-platform-tests/html/dom/interfaces.worker-crash-log.txt
https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20(Tests)/r247200%20(10899)/imported/w3c/web-platform-tests/cors/redirect-preflight-2-crash-log.txt
https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20(Tests)/r247200%20(10899)/imported/w3c/web-platform-tests/websockets/interfaces/WebSocket/readyState/005-crash-log.txt
https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20(Tests)/r247200%20(10899)/imported/w3c/web-platform-tests/websockets/interfaces/WebSocket/url/006-crash-log.txt
https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20(Tests)/r247200%20(10899)/imported/w3c/web-platform-tests/websockets/keeping-connection-open/001-crash-log.txt
https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20(Tests)/r247200%20(10899)/imported/w3c/web-platform-tests/websockets/opening-handshake/005-crash-log.txt
https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20(Tests)/r247204%20(10901)/imported/w3c/web-platform-tests/content-security-policy/connect-src/connect-src-websocket-blocked.sub-crash-log.txt
The stacktraces don't tell much at first sight, but since it seems to be happening deep in glib, there might be some memory corruption somewhere, not sure if related to thread-safety issues in the dns cache resolver (or libsoup even?).
Attachments
Patch
(3.81 KB, patch)
2019-07-08 08:01 PDT
,
Claudio Saavedra
no flags
Details
Formatted Diff
Diff
Patch
(2.24 KB, patch)
2019-07-08 09:00 PDT
,
Claudio Saavedra
no flags
Details
Formatted Diff
Diff
Patch for landing
(2.25 KB, patch)
2019-07-08 09:11 PDT
,
Claudio Saavedra
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Claudio Saavedra
Comment 1
2019-07-08 05:06:16 PDT
The historical of the crashes:
https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=imported%2Fw3c%2Fweb-platform-tests%2Fcontent-security-policy%2Fconnect-src%2Fconnect-src-websocket-blocked.sub.html%20imported%2Fw3c%2Fweb-platform-tests%2Fwebsockets%2Fopening-handshake%20imported%2Fw3c%2Fweb-platform-tests%2Fwebsockets%2Fkeeping-connection-open%20imported%2Fw3c%2Fweb-platform-tests%2Fwebsockets%2Finterfaces%2FWebSocket%2Furl%2F006%20imported%2Fw3c%2Fweb-platform-tests%2Fwebsockets%2Finterfaces%2FWebSocket%2FreadyState%2F005
Michael Catanzaro
Comment 2
2019-07-08 06:27:51 PDT
At least most of these (anything involving magazine_chain_pop_head) are definitely memory corruption. Probably the rest are too. The backtraces are only going to lead you on a false chase. What we need are backtraces from asan or valgrind to point to the source of the memory corruption, not these gdb traces that only point to where it eventually blew up.
Michael Catanzaro
Comment 3
2019-07-08 06:30:06 PDT
BTW if we don't solve it soon, then we should roll it out. The DNS cache is only a small thing, not worth a serious regression.
Claudio Saavedra
Comment 4
2019-07-08 07:55:53 PDT
==27300== Thread 1: ==27300== Invalid read of size 8 ==27300== at 0xE4B5029: g_error_free (gerror.c:493) ==27300== by 0x6428719: webkitCachedResolverLookupByNameAsync(_GResolver*, char const*, _GCancellable*, void (*)(_GObject*, _GAsyncResult*, void*), void*)::{lambda(_GObject*, _GAsyncResult*, void*)#1}::_FUN(_ GObject*, _GAsyncResult*, void*) (in /home/claudio/git/gnome/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37.38.1) ==27300== by 0xED40AF8: g_task_return_now (gtask.c:1148) ==27300== by 0xED40B38: complete_in_idle_cb (gtask.c:1162) ==27300== by 0xE4C99B7: g_main_dispatch (gmain.c:3182) ==27300== by 0xE4C99B7: g_main_context_dispatch (gmain.c:3847) ==27300== by 0xE4C9D77: g_main_context_iterate.isra.26 (gmain.c:3920) ==27300== by 0xE4CA061: g_main_loop_run (gmain.c:4116) ==27300== by 0xAAEEA2F: WTF::RunLoop::run() (in /home/claudio/git/gnome/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18.14.1) ==27300== by 0x6430329: int WebKit::AuxiliaryProcessMain<WebKit::NetworkProcess, WebKit::NetworkProcessMain>(int, char**) (in /home/claudio/git/gnome/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37.38. 1) ==27300== by 0xFC7B09A: (below main) (libc-start.c:308) ==27300== Address 0x5c749bd8 is 8 bytes inside a block of size 16 free'd ==27300== at 0x48369AB: free (vg_replace_malloc.c:530) ==27300== by 0xE205350: lookup_resolved (soup-address.c:762) ==27300== by 0xED40AF8: g_task_return_now (gtask.c:1148) ==27300== by 0xED41575: g_task_return (gtask.c:1206) ==27300== by 0x6428744: webkitCachedResolverLookupByNameAsync(_GResolver*, char const*, _GCancellable*, void (*)(_GObject*, _GAsyncResult*, void*), void*)::{lambda(_GObject*, _GAsyncResult*, void*)#1}::_FUN(_ GObject*, _GAsyncResult*, void*) (in /home/claudio/git/gnome/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37.38.1) ==27300== by 0xED40AF8: g_task_return_now (gtask.c:1148) ==27300== by 0xED40B38: complete_in_idle_cb (gtask.c:1162) ==27300== by 0xE4C99B7: g_main_dispatch (gmain.c:3182) ==27300== by 0xE4C99B7: g_main_context_dispatch (gmain.c:3847) ==27300== by 0xE4C9D77: g_main_context_iterate.isra.26 (gmain.c:3920) ==27300== by 0xE4CA061: g_main_loop_run (gmain.c:4116) ==27300== by 0xAAEEA2F: WTF::RunLoop::run() (in /home/claudio/git/gnome/WebKit/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18.14.1) ==27300== by 0x6430329: int WebKit::AuxiliaryProcessMain<WebKit::NetworkProcess, WebKit::NetworkProcessMain>(int, char**) (in /home/claudio/git/gnome/WebKit/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37.38. 1) ==27300== by 0xFC7B09A: (below main) (libc-start.c:308) ==27300== Block was alloc'd at ==27300== at 0x483577F: malloc (vg_replace_malloc.c:299) ==27300== by 0xE4CF2E0: g_malloc (gmem.c:99) ==27300== by 0xE4E6782: g_slice_alloc (gslice.c:1024) ==27300== by 0xE4B4E2C: g_error_new_valist (gerror.c:410) ==27300== by 0xED42077: g_task_return_new_error (gtask.c:1811) ==27300== by 0xED44C75: do_lookup_by_name (gthreadedresolver.c:123) ==27300== by 0xED416A2: g_task_thread_pool_thread (gtask.c:1331) ==27300== by 0xE4F19B2: g_thread_pool_thread_proxy (gthreadpool.c:307) ==27300== by 0xE4F1054: g_thread_proxy (gthread.c:784) ==27300== by 0xDD5BFA2: start_thread (pthread_create.c:486) ==27300== by 0xFD504CE: clone (clone.S:95)
Claudio Saavedra
Comment 5
2019-07-08 08:01:12 PDT
Created
attachment 373625
[details]
Patch
Michael Catanzaro
Comment 6
2019-07-08 08:22:57 PDT
Comment on
attachment 373625
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=373625&action=review
Score +1 for valgrind
> Source/WebKit/NetworkProcess/glib/WebKitCachedResolver.cpp:111 > - g_task_return_error(task.get(), error.get()); > + g_task_return_error(task.get(), error);
My preference is to keep using GUniqueOutPtr -- so that we never hold ownership in a raw pointer -- and use error.release() here instead. But this is fine too.
> Source/WebKit/NetworkProcess/glib/WebKitCachedResolver.cpp:173 > - g_task_return_error(task.get(), error.get()); > + g_task_return_error(task.get(), error);
Ditto.
Carlos Garcia Campos
Comment 7
2019-07-08 08:36:13 PDT
Comment on
attachment 373625
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=373625&action=review
>> Source/WebKit/NetworkProcess/glib/WebKitCachedResolver.cpp:111 >> + g_task_return_error(task.get(), error); > > My preference is to keep using GUniqueOutPtr -- so that we never hold ownership in a raw pointer -- and use error.release() here instead. But this is fine too.
Good catch, thanks Claudio! I would also keep the smart pointer leaking it only when transferred.
> Source/WebKit/NetworkProcess/glib/WebKitCachedResolver.cpp:166 > + GError* error;
This is missing the nullptr initialization.
Michael Catanzaro
Comment 8
2019-07-08 08:59:39 PDT
Comment on
attachment 373625
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=373625&action=review
>>> Source/WebKit/NetworkProcess/glib/WebKitCachedResolver.cpp:111 >>> + g_task_return_error(task.get(), error); >> >> My preference is to keep using GUniqueOutPtr -- so that we never hold ownership in a raw pointer -- and use error.release() here instead. But this is fine too. > > Good catch, thanks Claudio! I would also keep the smart pointer leaking it only when transferred.
Claudio noticed that GUniqueOutPtr::release returns a GUniquePtr, so it would have to be error.release().release(), which is silly. So I think the smart pointer is undesirable here. I wonder if we have much code that really benefits from that behavior, or if we could change GUniqueOutPtr::release to return the raw pointer directly, as I would have expected.
>> Source/WebKit/NetworkProcess/glib/WebKitCachedResolver.cpp:166 >> + GError* error; > > This is missing the nullptr initialization.
Oh no, very good catch.
Claudio Saavedra
Comment 9
2019-07-08 09:00:28 PDT
Created
attachment 373629
[details]
Patch
Michael Catanzaro
Comment 10
2019-07-08 09:07:57 PDT
Comment on
attachment 373629
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=373629&action=review
cq+ is going to fail because you don't have the Reviewed by
> Source/WebKit/ChangeLog:10 > + but passed onto the caller. > + * NetworkProcess/glib/WebKitCachedResolver.cpp:
Missing blank line here.
> Source/WebKit/NetworkProcess/glib/WebKitCachedResolver.cpp:111 > - g_task_return_error(task.get(), error.get()); > + g_task_return_error(task.get(), error.release().release());
Carlos, what do you think?
Claudio Saavedra
Comment 11
2019-07-08 09:11:35 PDT
Created
attachment 373631
[details]
Patch for landing
Carlos Garcia Campos
Comment 12
2019-07-08 09:12:18 PDT
(In reply to Michael Catanzaro from
comment #8
)
> Comment on
attachment 373625
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=373625&action=review
> > >>> Source/WebKit/NetworkProcess/glib/WebKitCachedResolver.cpp:111 > >>> + g_task_return_error(task.get(), error); > >> > >> My preference is to keep using GUniqueOutPtr -- so that we never hold ownership in a raw pointer -- and use error.release() here instead. But this is fine too. > > > > Good catch, thanks Claudio! I would also keep the smart pointer leaking it only when transferred. > > Claudio noticed that GUniqueOutPtr::release returns a GUniquePtr, so it > would have to be error.release().release(), which is silly. So I think the > smart pointer is undesirable here. > > I wonder if we have much code that really benefits from that behavior, or if > we could change GUniqueOutPtr::release to return the raw pointer directly, > as I would have expected.
I think the idea here was to convert an OutPtr (that exists only because outPtr is not possible with std::unique_ptr) into a Ptr which is actually a std::unique_ptr. I don't know if we are actually using it that way. We could probably add leakPtr() for the release().release() case.
> >> Source/WebKit/NetworkProcess/glib/WebKitCachedResolver.cpp:166 > >> + GError* error; > > > > This is missing the nullptr initialization. > > Oh no, very good catch.
Claudio Saavedra
Comment 13
2019-07-08 09:40:09 PDT
Committed
r247209
: <
https://trac.webkit.org/changeset/247209
>
Michael Catanzaro
Comment 14
2019-07-08 11:06:24 PDT
(In reply to Carlos Garcia Campos from
comment #12
)
> I think the idea here was to convert an OutPtr (that exists only because > outPtr is not possible with std::unique_ptr) into a Ptr which is actually a > std::unique_ptr. I don't know if we are actually using it that way. We could > probably add leakPtr() for the release().release() case.
There were only three places using release before now. Two of them also want the raw pointer (they're doing release().release()) and the other is test code. Making a GUniquePtr from the raw pointer in that case is trivial: diff --git a/Tools/TestWebKitAPI/Tests/WTF/glib/GUniquePtr.cpp b/Tools/TestWebKitAPI/Tests/WTF/glib/GUniquePtr.cpp index 11f02c7b0f7..288591c8e3f 100644 --- a/Tools/TestWebKitAPI/Tests/WTF/glib/GUniquePtr.cpp +++ b/Tools/TestWebKitAPI/Tests/WTF/glib/GUniquePtr.cpp @@ -184,7 +184,7 @@ TEST(WTF_GUniquePtr, OutPtr) { GUniqueOutPtr<char> a; returnOutChar(&a.outPtr()); - GUniquePtr<char> b = a.release(); + GUniquePtr<char> b(a.release()); ASSERT_STREQ(actual.str().c_str(), takeLogStr().c_str()); actual << "g_free(" << b.get() << ");"; } So my vote is to just change release().
Michael Catanzaro
Comment 15
2019-07-08 11:06:35 PDT
Bug #199579
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug