Bug 177526 - [GTK] Segfault in WebPageProxy::setFindClient()
Summary: [GTK] Segfault in WebPageProxy::setFindClient()
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
Depends on:
Reported: 2017-09-27 03:43 PDT by Cédric Bellegarde
Modified: 2017-09-28 06:50 PDT (History)
2 users (show)

See Also:

backtrace (51.15 KB, text/plain)
2017-09-27 03:43 PDT, Cédric Bellegarde
no flags Details
Full backtrace (44.05 KB, text/plain)
2017-09-27 14:36 PDT, Cédric Bellegarde
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Cédric Bellegarde 2017-09-27 03:43:21 PDT
Created attachment 321950 [details]

Not sure it's really related to Gtk build.

Backtrace attached.
Comment 1 Michael Catanzaro 2017-09-27 09:49:40 PDT
The crash is certainly in the GTK API layer. webkitWebViewGetPage is returning *nullptr, which should never ever happen. If you find a way to reproduce this, that would be awesome.

webkitWebViewGetPage returns the result of webkitWebViewBaseGetPage, dereferenced. I'm a bit surprised it doesn't segfault right there, but I suppose a compiler optimization might be responsible for avoiding that.

webkitWebViewBaseGetPage just returns priv->pageProxy.get(), which is initialized in webkitWebViewBaseCreateWebPage. And pageProxy is a RefPtr, so it's cleared in finalize (not dispose) by the priv destructor. So, once set, it should be valid until the WebKitWebViewBase has been finalized. So I think that, somehow, webkit_web_view_get_find_controller was called before webkitWebViewBaseCreateWebPage.

Now, in the case we care about, where the WebKitWebViewBase is a WebKitWebView, webkitWebViewBaseCreateWebPage is only called by webkitWebViewCreatePage, which is only called by webkitWebContextCreatePageForWebView, which is always called in webkitWebViewConstructed. So this looks safe: there should be no way to get into this state where a WebKitWebView has no WebPageProxy, unless the WebKitWebViewBase has already been destroyed.

I was a bit suspicious that might happen when the WebKitWebView is finalized, but no: in WTFGType.h (where finalize is defined) we see that the private struct destructor is called before the parent class is finalized, not after. And that is the right order.

So I think it's most likely that a refcounting error elsewhere has caused pageProxy to be destroyed even while WebKitWebViewBase is still valid and still has it in a RefPtr.

What I also do not understand is how the backtrace goes from WebKit::WebPageProxy::didCommitLoadForFrame to g_object_unref -> webkitFindControllerDispose. I looked at all the load-changed signal handlers in eolie and didn't find anything suspicious. Maybe a 'bt full' could shed more light on this, or an unoptimized build... if you can manage to reproduce it again.
Comment 2 Michael Catanzaro 2017-09-27 09:52:02 PDT
(In reply to Michael Catanzaro from comment #1)
> So I
> think that, somehow, webkit_web_view_get_find_controller was called before
> webkitWebViewBaseCreateWebPage.

(Strike that sentence, that's extremely unlikely.)
Comment 3 Cédric Bellegarde 2017-09-27 14:36:28 PDT
Created attachment 322016 [details]
Full backtrace
Comment 4 Michael Catanzaro 2017-09-27 15:44:23 PDT
Are there any criticals printed? Still no clue how to reproduce it? Without a reproducer, I think all we can do is speculate.

Speculation: what's probably happening is the WebKitFindController is being reffed somewhere (possibly by PyGObject, for reasons unknown), then the WebKitWebView is destroyed, then the WebKitFindController is unreffed. That should be perfectly legal, but we don't handle that case properly as WebKitFindController's dispose function relies on its WebKitWebView still being alive. (I'm sure we've seen a similar design issue elsewhere in our API before, but I don't remember where.) Anyway, that we could fix by having WebKitFindController set up a GWeakPtr to learn when its WebKitWebView is destroyed; in that case, we just skip the call to WebPageProxy::setFindClient in webkitFindControllerDispose.

But I still don't understand from the backtrace how this is happening. It looks like the WebKitWebView is still alive and emits the load-changed signal, then all of a sudden PyGObject decides to destroy the WebKitFindController. I guess it must be destroying lots of things and have succeeded in destroying the WebKitWebView even before it has finished emitted the load-changed signal. Absolutely no clue what's going on there... maybe you could try roping in the PyGObject developers?
Comment 5 Cédric Bellegarde 2017-09-27 23:05:10 PDT
I guess this is related to python GC freeing memory because it's a really random bug :-/

There is no way to reproduce this, it really happens randomly, but really often on quit with a smaller traceback:

#0  0x00007f4998131eb6 in std::swap<API::FindClient*>(API::FindClient*&, API::FindClient*&) (__b=<optimized out>, __a=<optimized out>) at /usr/include/c++/7/bits/move.h:198
#1  0x00007f4998131eb6 in std::unique_ptr<API::FindClient, std::default_delete<API::FindClient> >::reset(API::FindClient*) (__p=<optimized out>, this=<optimized out>)
    at /usr/include/c++/7/bits/unique_ptr.h:374
#2  0x00007f4998131eb6 in std::unique_ptr<API::FindClient, std::default_delete<API::FindClient> >::operator=(std::unique_ptr<API::FindClient, std::default_delete<API::FindClient> >&&) (__u=..., this=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:283
#3  0x00007f4998131eb6 in WebKit::WebPageProxy::setFindClient(std::unique_ptr<API::FindClient, std::default_delete<API::FindClient> >&&) (this=0x0, findClient=...)
    at /usr/src/debug/webkitgtk4-2.18.0-2.fc27.x86_64/Source/WebKit/UIProcess/WebPageProxy.cpp:546
#4  0x00007f4998307428 in webkitFindControllerDispose(GObject*) (object=
    0x563de8b95b20 [WebKitFindController])
    at /usr/src/debug/webkitgtk4-2.18.0-2.fc27.x86_64/Source/WebKit/UIProcess/API/glib/WebKitFindController.cpp:138
#5  0x00007f49a67144f8 in g_object_unref (_object=0x563de8b95b20) at gobject.c:3277
#6  0x00007f49a6ba04e9 in pygobject_clear (self=0x7f49452461b0)
    at pygobject-object.c:1199
#7  0x00007f49a6ba04e9 in pygobject_dealloc (self=0x7f49452461b0)
    at pygobject-object.c:1092
#8  0x00007f49aedd37ee in subtype_dealloc () at /lib64/libpython3.6m.so.1.0
#9  0x00007f49aed6ba88 in dict_dealloc () at /lib64/libpython3.6m.so.1.0
#10 0x00007f49a6b9fc57 in pygobject_clear (self=0x7f4945242e58)
    at pygobject-object.c:1204
#11 0x00007f49aed777cc in collect () at /lib64/libpython3.6m.so.1.0
#12 0x00007f49aee369bd in collect_with_callback () at /lib64/libpython3.6m.so.1.0
#13 0x00007f49aee6dfd1 in PyGC_Collect () at /lib64/libpython3.6m.so.1.0
#14 0x00007f49aee835bd in Py_FinalizeEx () at /lib64/libpython3.6m.so.1.0
#15 0x00007f49aee83728 in Py_Exit () at /lib64/libpython3.6m.so.1.0
#16 0x00007f49aee83817 in handle_system_exit () at /lib64/libpython3.6m.so.1.0
#17 0x00007f49aee83886 in PyErr_PrintEx () at /lib64/libpython3.6m.so.1.0
#18 0x00007f49aee83c82 in PyRun_SimpleFileExFlags () at /lib64/libpython3.6m.so.1.0
#19 0x00007f49aee84c03 in Py_Main () at /lib64/libpython3.6m.so.1.0
#20 0x0000563de6256d45 in main ()
Comment 6 Cédric Bellegarde 2017-09-27 23:09:26 PDT
Does this looks related for you?
Comment 7 Cédric Bellegarde 2017-09-27 23:40:35 PDT
Ok, looks it's related.

Porting https://bugzilla.gnome.org/show_bug.cgi?id=546802#c13 patch to current pygobject version seems to fix the issue. Not able to reproduce the bug since patch applied.

Thanks a lot for help ;) Will close this one later today if Eolie does not segfault again due to this bug :-)
Comment 8 Cédric Bellegarde 2017-09-28 00:48:56 PDT
Ok closing, this also fixed this bug: https://github.com/aperezdc/webkit2gtk-python-webextension-example/issues/4

I'm so happy :-)
Comment 9 Michael Catanzaro 2017-09-28 06:50:19 PDT
OK great!

I still think we should consider using GWeakPtr to keep track of the WebKitWebView, but I guess most programmers are unlikely to intentionally keep around a WebKitFindController after its WebKitWebView is dead, so closing this seems fine.