WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
176490
[GTK] Crash in WebCore::PlatformDisplayWayland::~PlatformDisplayWayland
https://bugs.webkit.org/show_bug.cgi?id=176490
Summary
[GTK] Crash in WebCore::PlatformDisplayWayland::~PlatformDisplayWayland
Michael Catanzaro
Reported
2017-09-06 18:31:51 PDT
This is an Epiphany search provider crash: #0 0x00007f22ec477af7 in wl_list_insert_list (list=0x2524078, other=0x2524068) at /home/mcatanzaro/Projects/GNOME/wayland/src/wayland-util.c:89 No locals. #1 0x00007f22e6e81389 in wl_priv_signal_emit (signal=0x2524068, data=0x2524000) at /home/mcatanzaro/Projects/GNOME/wayland/src/wayland-server.c:1955 l = 0x2524000 pos = 0x7f230640dfb0 <initial+944> #2 0x00007f22e6e80215 in wl_display_destroy (display=0x2524000) at /home/mcatanzaro/Projects/GNOME/wayland/src/wayland-server.c:1077 s = 0x7f2307aa8000 next = 0x0 global = 0x7f22e0ff7000 gnext = 0x252c620 #3 0x00007f22fe89cf3c in WebCore::PlatformDisplayWayland::~PlatformDisplayWayland ( this=0x7f22e0ff7000, __in_chrg=<optimized out>) at ../../Source/WebCore/platform/graphics/wayland/PlatformDisplayWayland.cpp:71 No locals. #4 0x00007f22fe89cf84 in WebCore::PlatformDisplayWayland::~PlatformDisplayWayland ( this=0x7f22e0ff7000, __in_chrg=<optimized out>) at ../../Source/WebCore/platform/graphics/wayland/PlatformDisplayWayland.cpp:72 No locals. #5 0x00007f22feec6bf8 in std::default_delete<WebCore::PlatformDisplay>::operator() ( this=0x7f2304293908 <WebCore::PlatformDisplay::sharedDisplay()::display>, __ptr=0x7f22e0ff7000) at /usr/include/c++/7/bits/unique_ptr.h:78 No locals. #6 0x00007f22feec6869 in std::unique_ptr<WebCore::PlatformDisplay, std::default_delete<WebCore::PlatformDisplay> >::~unique_ptr (this=0x7f2304293908 <WebCore::PlatformDisplay::sharedDisplay()::display>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:268 __ptr = @0x7f2304293908: 0x7f22e0ff7000 #7 0x00007f230607bc68 in __run_exit_handlers () from /lib64/libc.so.6 No symbol table info available. #8 0x00007f230607bcba in exit () from /lib64/libc.so.6 No symbol table info available. #9 0x00007f2306061511 in __libc_start_main () from /lib64/libc.so.6 No symbol table info available. #10 0x00000000004041ca in _start () No symbol table info available. I'll continue to vote in favor of getting rid of these buggy exit handlers altogether. I doubt we really need to be calling wl_display_destroy().
Attachments
Patch
(4.61 KB, patch)
2017-10-26 04:32 PDT
,
Michael Catanzaro
no flags
Details
Formatted Diff
Diff
Patch
(1.85 KB, patch)
2018-03-10 18:03 PST
,
Michael Catanzaro
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Michael Catanzaro
Comment 1
2017-09-11 17:16:30 PDT
Unless graphics folks have any ideas, we need to remove the PlatformDisplay exit handlers.
Michael Catanzaro
Comment 2
2017-10-25 20:31:42 PDT
(In reply to Michael Catanzaro from
comment #1
)
> Unless graphics folks have any ideas, we need to remove the PlatformDisplay > exit handlers.
The exit handlers are not needed for program correctness as far as I'm aware, and they've been causing many, many crashes over the years. So I think it's time for them to go. At least this particular one.
Michael Catanzaro
Comment 3
2017-10-26 04:32:54 PDT
Created
attachment 324987
[details]
Patch
Carlos Garcia Campos
Comment 4
2017-10-26 04:57:11 PDT
I think we need to understand why it's crashing and only if it's not possible to fix it we can remove the exit handlers. I don't remember why, but I think we shut down the egl displays for a reason. IIRC mesa has its own exit handlers to clean up stuff too.
Michael Catanzaro
Comment 5
2017-10-26 17:14:07 PDT
(In reply to Carlos Garcia Campos from
comment #4
)
> I think we need to understand why it's crashing and only if it's not > possible to fix it we can remove the exit handlers. I don't remember why, > but I think we shut down the egl displays for a reason.
If nobody knows why, we should stop doing it and see if anyone notices a problem.
Christian Stadelmann
Comment 6
2017-10-30 05:52:32 PDT
I am seeing this crash with liferea when running liferea headless to get GIR symbols, see
https://github.com/lwindolf/liferea/issues/552
for details.
Michael Catanzaro
Comment 7
2018-02-23 11:10:37 PST
Comment on
attachment 324987
[details]
Patch Carlos, I don't understand why it's crashing, and the graphics people don't seem to have been able to debug it. Strongly suggest pushing this in now. If the exit handlers are needed to work around OS or driver bugs, then they should be reintroduced once they can be done without crashing.
Michael Catanzaro
Comment 8
2018-03-05 05:15:59 PST
Crash stats:
https://retrace.fedoraproject.org/faf/problems/bthash/?bth=7712e893327fdae3c939ee81aa54d082ebbdc717
https://retrace.fedoraproject.org/faf/problems/bthash/?bth=8adf67e9e11bb1765726e37b1f3d7028ca66a8db
Carlos Alberto Lopez Perez
Comment 9
2018-03-05 05:29:00 PST
(In reply to Carlos Garcia Campos from
comment #4
)
> I think we need to understand why it's crashing and only if it's not > possible to fix it we can remove the exit handlers. I don't remember why, > but I think we shut down the egl displays for a reason. IIRC mesa has its > own exit handlers to clean up stuff too.
Does
http://trac.webkit.org/r201595
help remembering?
Michael Catanzaro
Comment 10
2018-03-05 05:39:39 PST
(In reply to Carlos Alberto Lopez Perez from
comment #9
)
> Does
http://trac.webkit.org/r201595
help remembering?
Note: it was an exit-time destructor before that patch, too (since the PlatformDisplay was not NeverDestroyed). The goal there was to move it to run earlier at exit time, to avoid a specific crash. But as long as we do work at exit time, there will always be more crashes....
Michael Catanzaro
Comment 11
2018-03-10 09:51:53 PST
(In reply to Michael Catanzaro from
comment #8
)
> Crash stats: > >
https://retrace.fedoraproject.org/faf/problems/bthash/
> ?bth=7712e893327fdae3c939ee81aa54d082ebbdc717 > >
https://retrace.fedoraproject.org/faf/problems/bthash/
> ?bth=8adf67e9e11bb1765726e37b1f3d7028ca66a8db
I really underestimated the total number of crashes here. Those links are currently showing 704 total crashes. I missed this one:
https://retrace.fedoraproject.org/faf/reports/1793321/
which shows 2024 reports. That's 2728 total reports of this crash from Fedora alone.
Michael Catanzaro
Comment 12
2018-03-10 10:16:24 PST
(In reply to Michael Catanzaro from
comment #11
)
> which shows 2024 reports. That's 2728 total reports of this crash from > Fedora alone.
I should add: only 688 unique reports (reports from users who haven't reported the crash before). So about four reports per unique user.
Michael Catanzaro
Comment 13
2018-03-10 17:24:35 PST
ephy-search-provider crashes 100% for me... I didn't notice for so long because the search provider was broken for most of the 3.26 cycle, and the update fixing it just hit Fedora 27. Some debugging: Since this is the search provider, we wind up taking this path in PlatformDisplay::createPlatformDisplay: #if PLATFORM(WAYLAND) if (auto platformDisplay = PlatformDisplayWayland::create()) return platformDisplay; #endif Notably, that's the only path where m_nativeDisplayOwned is true, so it's the only path where the native display is destroyed. That's why the crash only occurs for the search provider. So we destroy it here: PlatformDisplayWayland::~PlatformDisplayWayland() { if (m_nativeDisplayOwned == NativeDisplayOwned::Yes) wl_display_destroy(m_display); } That's definitely wrong, because this wl_display was created with wl_display_connect(), which is Wayland client API, but wl_display_destroy() is Wayland server API. So we must never call wl_display_destroy(). The right function to use is wl_display_disconnect(), which parallels the call to wl_display_connect() that we used to create the wl_display in PlatformDisplayWayland::create. It's confusing, though, because the UI process is both a Wayland client of the desktop compositor, and also and a Wayland server for the web process. Now, with that fixed locally, there's no longer any crash inside wl_display_destroy(), and wl_display_disconnect() works fine. But I'm seeing a similar crash immediately after that, when the wl_compositor is destroyed. I suspect this indicates the root of the problem: we're trying to use the same wl_display as both a Wayland client and Wayland server, and I guess that isn't right. It doesn't happen when there's a GtkWindow because then GDK's wl_display is used as the client display instead. Thread 1 "epiphany-search" received signal SIGSEGV, Segmentation fault. 0x00007fffed7fc919 in wl_map_insert_at (map=<optimized out>, flags=<optimized out>, i=<optimized out>, data=<optimized out>) at src/wayland-util.c:247 247 src/wayland-util.c: No such file or directory. (gdb) bt #0 0x00007fffed7fc919 in wl_map_insert_at (map=<optimized out>, flags=<optimized out>, i=<optimized out>, data=<optimized out>) at src/wayland-util.c:247 #1 0x00007fffed7f8ad3 in proxy_destroy (proxy=0x5555557a2d60) at src/wayland-client.c:506 #2 0x00007fffed7f8ad3 in wl_proxy_destroy (proxy=0x5555557a2d60) at src/wayland-client.c:537 #3 0x00007ffff3c0c536 in wl_compositor_destroy (wl_compositor=<optimized out>) at /usr/include/wayland-client-protocol.h:1194 #4 0x00007ffff3c0c536 in WebCore::WlPtrDeleter<wl_compositor>::operator()(wl_compositor*) const (this=<optimized out>, ptr=<optimized out>) at /buildstream/build/Source/WebCore/platform/graphics/wayland/WlUniquePtr.h:55 #5 0x00007ffff3c0c536 in std::unique_ptr<wl_compositor, WebCore::WlPtrDeleter<wl_compositor> >::~unique_ptr() (this=0x7fffdf9f5040, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:268 #6 0x00007ffff3c0c536 in WebCore::PlatformDisplayWayland::~PlatformDisplayWayland() (this=0x7fffdf9f5000, __in_chrg=<optimized out>) at /buildstream/build/Source/WebCore/platform/graphics/wayland/PlatformDisplayWayland.cpp:68 #7 0x00007ffff3c0c569 in WebCore::PlatformDisplayWayland::~PlatformDisplayWayland() (this=0x7fffdf9f5000, __in_chrg=<optimized out>) at /buildstream/build/Source/WebCore/platform/graphics/wayland/PlatformDisplayWayland.cpp:72 #8 0x00007ffff6056248 in __run_exit_handlers (status=1, listp=0x7ffff63cf6f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:83 #9 0x00007ffff605629a in __GI_exit (status=<optimized out>) at exit.c:105 #10 0x00007ffff603ff31 in __libc_start_main (main= 0x55555555d974 <main>, argc=1, argv=0x7fffffffea58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffea48) at ../csu/libc-start.c:344 #11 0x00005555555599fa in _start ()
Michael Catanzaro
Comment 14
2018-03-10 17:37:19 PST
(In reply to Michael Catanzaro from
comment #13
)
> we're trying to use the > same wl_display as both a Wayland client and Wayland server, and I guess > that isn't right.
That's not true, I got confused by wl_compositor. That's client API too. Perhaps it should be destroyed before its wl_display.
Michael Catanzaro
Comment 15
2018-03-10 18:03:31 PST
Created
attachment 335523
[details]
Patch
Michael Catanzaro
Comment 16
2018-03-10 18:05:33 PST
Comment on
attachment 335523
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=335523&action=review
> Source/WebCore/platform/graphics/wayland/PlatformDisplayWayland.cpp:73 > + wl_display_disconnect(m_display);
Note: this does, in fact, destroy the wl_display. WlUniquePtr<wl_display> is a bit of a footgun, as it's not safe to use on any wl_display created with wl_display_connect().
Michael Catanzaro
Comment 17
2018-03-11 16:35:41 PDT
Comment on
attachment 335523
[details]
Patch (cq? for cgarcia)
Carlos Garcia Campos
Comment 18
2018-03-12 01:49:04 PDT
(In reply to Michael Catanzaro from
comment #16
)
> Comment on
attachment 335523
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=335523&action=review
> > > Source/WebCore/platform/graphics/wayland/PlatformDisplayWayland.cpp:73 > > + wl_display_disconnect(m_display); > > Note: this does, in fact, destroy the wl_display. > > WlUniquePtr<wl_display> is a bit of a footgun, as it's not safe to use on > any wl_display created with wl_display_connect().
Good catch! this is exactly what I wanted, once we understand the problem we can easily fix the actual bug. WlUniquePtr<wl_display> is only used once in Waylandcompositor, which is a singleton, so I think we can simply remove it to avoid confusions and use destroy/disconnect directly.
Carlos Garcia Campos
Comment 19
2018-03-12 01:49:34 PDT
Comment on
attachment 335523
[details]
Patch Thank you!
WebKit Commit Bot
Comment 20
2018-03-12 02:14:23 PDT
Comment on
attachment 335523
[details]
Patch Clearing flags on attachment: 335523 Committed
r229525
: <
https://trac.webkit.org/changeset/229525
>
WebKit Commit Bot
Comment 21
2018-03-12 02:14:25 PDT
All reviewed patches have been landed. Closing bug.
Michael Catanzaro
Comment 22
2018-03-17 16:25:09 PDT
One week later, we're now up to 3958 reports. I think this might be our biggest WebKit crasher ever....
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug