Summary: | [GTK] Crash in WebCore::PlatformDisplayWayland::~PlatformDisplayWayland | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Michael Catanzaro <mcatanzaro> | ||||||
Component: | WebKitGTK | Assignee: | Michael Catanzaro <mcatanzaro> | ||||||
Status: | RESOLVED FIXED | ||||||||
Severity: | Normal | CC: | bugs-noreply, cgarcia, clopez, commit-queue, magomez, mcatanzaro, wk, zan | ||||||
Priority: | P2 | ||||||||
Version: | Other | ||||||||
Hardware: | PC | ||||||||
OS: | Linux | ||||||||
See Also: |
https://bugs.webkit.org/show_bug.cgi?id=157973 https://bugzilla.gnome.org/show_bug.cgi?id=794228 https://bugs.webkit.org/show_bug.cgi?id=184405 |
||||||||
Attachments: |
|
Description
Michael Catanzaro
2017-09-06 18:31:51 PDT
Unless graphics folks have any ideas, we need to remove the PlatformDisplay exit handlers. (In reply to Michael Catanzaro from comment #1) > Unless graphics folks have any ideas, we need to remove the PlatformDisplay > exit handlers. The exit handlers are not needed for program correctness as far as I'm aware, and they've been causing many, many crashes over the years. So I think it's time for them to go. At least this particular one. Created attachment 324987 [details]
Patch
I think we need to understand why it's crashing and only if it's not possible to fix it we can remove the exit handlers. I don't remember why, but I think we shut down the egl displays for a reason. IIRC mesa has its own exit handlers to clean up stuff too. (In reply to Carlos Garcia Campos from comment #4) > I think we need to understand why it's crashing and only if it's not > possible to fix it we can remove the exit handlers. I don't remember why, > but I think we shut down the egl displays for a reason. If nobody knows why, we should stop doing it and see if anyone notices a problem. I am seeing this crash with liferea when running liferea headless to get GIR symbols, see https://github.com/lwindolf/liferea/issues/552 for details. Comment on attachment 324987 [details]
Patch
Carlos, I don't understand why it's crashing, and the graphics people don't seem to have been able to debug it. Strongly suggest pushing this in now. If the exit handlers are needed to work around OS or driver bugs, then they should be reintroduced once they can be done without crashing.
Crash stats: https://retrace.fedoraproject.org/faf/problems/bthash/?bth=7712e893327fdae3c939ee81aa54d082ebbdc717 https://retrace.fedoraproject.org/faf/problems/bthash/?bth=8adf67e9e11bb1765726e37b1f3d7028ca66a8db (In reply to Carlos Garcia Campos from comment #4) > I think we need to understand why it's crashing and only if it's not > possible to fix it we can remove the exit handlers. I don't remember why, > but I think we shut down the egl displays for a reason. IIRC mesa has its > own exit handlers to clean up stuff too. Does http://trac.webkit.org/r201595 help remembering? (In reply to Carlos Alberto Lopez Perez from comment #9) > Does http://trac.webkit.org/r201595 help remembering? Note: it was an exit-time destructor before that patch, too (since the PlatformDisplay was not NeverDestroyed). The goal there was to move it to run earlier at exit time, to avoid a specific crash. But as long as we do work at exit time, there will always be more crashes.... (In reply to Michael Catanzaro from comment #8) > Crash stats: > > https://retrace.fedoraproject.org/faf/problems/bthash/ > ?bth=7712e893327fdae3c939ee81aa54d082ebbdc717 > > https://retrace.fedoraproject.org/faf/problems/bthash/ > ?bth=8adf67e9e11bb1765726e37b1f3d7028ca66a8db I really underestimated the total number of crashes here. Those links are currently showing 704 total crashes. I missed this one: https://retrace.fedoraproject.org/faf/reports/1793321/ which shows 2024 reports. That's 2728 total reports of this crash from Fedora alone. (In reply to Michael Catanzaro from comment #11) > which shows 2024 reports. That's 2728 total reports of this crash from > Fedora alone. I should add: only 688 unique reports (reports from users who haven't reported the crash before). So about four reports per unique user. ephy-search-provider crashes 100% for me... I didn't notice for so long because the search provider was broken for most of the 3.26 cycle, and the update fixing it just hit Fedora 27. Some debugging: Since this is the search provider, we wind up taking this path in PlatformDisplay::createPlatformDisplay: #if PLATFORM(WAYLAND) if (auto platformDisplay = PlatformDisplayWayland::create()) return platformDisplay; #endif Notably, that's the only path where m_nativeDisplayOwned is true, so it's the only path where the native display is destroyed. That's why the crash only occurs for the search provider. So we destroy it here: PlatformDisplayWayland::~PlatformDisplayWayland() { if (m_nativeDisplayOwned == NativeDisplayOwned::Yes) wl_display_destroy(m_display); } That's definitely wrong, because this wl_display was created with wl_display_connect(), which is Wayland client API, but wl_display_destroy() is Wayland server API. So we must never call wl_display_destroy(). The right function to use is wl_display_disconnect(), which parallels the call to wl_display_connect() that we used to create the wl_display in PlatformDisplayWayland::create. It's confusing, though, because the UI process is both a Wayland client of the desktop compositor, and also and a Wayland server for the web process. Now, with that fixed locally, there's no longer any crash inside wl_display_destroy(), and wl_display_disconnect() works fine. But I'm seeing a similar crash immediately after that, when the wl_compositor is destroyed. I suspect this indicates the root of the problem: we're trying to use the same wl_display as both a Wayland client and Wayland server, and I guess that isn't right. It doesn't happen when there's a GtkWindow because then GDK's wl_display is used as the client display instead. Thread 1 "epiphany-search" received signal SIGSEGV, Segmentation fault. 0x00007fffed7fc919 in wl_map_insert_at (map=<optimized out>, flags=<optimized out>, i=<optimized out>, data=<optimized out>) at src/wayland-util.c:247 247 src/wayland-util.c: No such file or directory. (gdb) bt #0 0x00007fffed7fc919 in wl_map_insert_at (map=<optimized out>, flags=<optimized out>, i=<optimized out>, data=<optimized out>) at src/wayland-util.c:247 #1 0x00007fffed7f8ad3 in proxy_destroy (proxy=0x5555557a2d60) at src/wayland-client.c:506 #2 0x00007fffed7f8ad3 in wl_proxy_destroy (proxy=0x5555557a2d60) at src/wayland-client.c:537 #3 0x00007ffff3c0c536 in wl_compositor_destroy (wl_compositor=<optimized out>) at /usr/include/wayland-client-protocol.h:1194 #4 0x00007ffff3c0c536 in WebCore::WlPtrDeleter<wl_compositor>::operator()(wl_compositor*) const (this=<optimized out>, ptr=<optimized out>) at /buildstream/build/Source/WebCore/platform/graphics/wayland/WlUniquePtr.h:55 #5 0x00007ffff3c0c536 in std::unique_ptr<wl_compositor, WebCore::WlPtrDeleter<wl_compositor> >::~unique_ptr() (this=0x7fffdf9f5040, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:268 #6 0x00007ffff3c0c536 in WebCore::PlatformDisplayWayland::~PlatformDisplayWayland() (this=0x7fffdf9f5000, __in_chrg=<optimized out>) at /buildstream/build/Source/WebCore/platform/graphics/wayland/PlatformDisplayWayland.cpp:68 #7 0x00007ffff3c0c569 in WebCore::PlatformDisplayWayland::~PlatformDisplayWayland() (this=0x7fffdf9f5000, __in_chrg=<optimized out>) at /buildstream/build/Source/WebCore/platform/graphics/wayland/PlatformDisplayWayland.cpp:72 #8 0x00007ffff6056248 in __run_exit_handlers (status=1, listp=0x7ffff63cf6f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:83 #9 0x00007ffff605629a in __GI_exit (status=<optimized out>) at exit.c:105 #10 0x00007ffff603ff31 in __libc_start_main (main= 0x55555555d974 <main>, argc=1, argv=0x7fffffffea58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffea48) at ../csu/libc-start.c:344 #11 0x00005555555599fa in _start () (In reply to Michael Catanzaro from comment #13) > we're trying to use the > same wl_display as both a Wayland client and Wayland server, and I guess > that isn't right. That's not true, I got confused by wl_compositor. That's client API too. Perhaps it should be destroyed before its wl_display. Created attachment 335523 [details]
Patch
Comment on attachment 335523 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=335523&action=review > Source/WebCore/platform/graphics/wayland/PlatformDisplayWayland.cpp:73 > + wl_display_disconnect(m_display); Note: this does, in fact, destroy the wl_display. WlUniquePtr<wl_display> is a bit of a footgun, as it's not safe to use on any wl_display created with wl_display_connect(). Comment on attachment 335523 [details]
Patch
(cq? for cgarcia)
(In reply to Michael Catanzaro from comment #16) > Comment on attachment 335523 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=335523&action=review > > > Source/WebCore/platform/graphics/wayland/PlatformDisplayWayland.cpp:73 > > + wl_display_disconnect(m_display); > > Note: this does, in fact, destroy the wl_display. > > WlUniquePtr<wl_display> is a bit of a footgun, as it's not safe to use on > any wl_display created with wl_display_connect(). Good catch! this is exactly what I wanted, once we understand the problem we can easily fix the actual bug. WlUniquePtr<wl_display> is only used once in Waylandcompositor, which is a singleton, so I think we can simply remove it to avoid confusions and use destroy/disconnect directly. Comment on attachment 335523 [details]
Patch
Thank you!
Comment on attachment 335523 [details] Patch Clearing flags on attachment: 335523 Committed r229525: <https://trac.webkit.org/changeset/229525> All reviewed patches have been landed. Closing bug. One week later, we're now up to 3958 reports. I think this might be our biggest WebKit crasher ever.... |