Created attachment 336253 [details] Backtrace from WebProcess core I just upgraded to Ubuntu bionic which currently has WebKitGTK 2.20.0, and WebProcess is now crashing when Geary's WebExtension is attempting to print a console message. Geary is using the standard vala to_string enum magic to get a string version of the WebKitConsoleMessageLevel returned by webkit_console_message_get_level(): https://git.gnome.org/browse/geary/tree/src/client/web-process/web-process-extension.vala#n57 This translates into the following C code: > g_enum_get_value (g_type_class_ref (webkit_console_message_level_get_type ()), webkit_console_message_get_level (message)); The call to g_type_class_ref() is the one causing the segfault. Despite that it looks like WebKitGTK bug since Geary does this in a lot of different places and it isn't crashing anywhere else. Maybe webkit_console_message_level_get_type() is returning a bogus value? Back trace from a resulting core is attached, couldn't get a full trace since gdb is crashing doing a "bt full".
NB this is a moderately severe issue, since it stops any email that causes a console message to be printed to not be displays.
Turns out this also happens with the WebKitConsoleMessageSource enum as well as the level.
I suspect https://trac.webkit.org/changeset/226366/webkit in combination with a Vala bug. Note my changelog entry: "Since one of the signal arguments is an enum, we now have to run glib-mkenums for the web process API. And that has resulted in this patch also adding GType goo for WebKitConsoleMessageLevel and WebKitConsoleMessageSource that was previously missing. Any applications that for some unlikely reason want to use these enums in properties or signals will be happy." So webkit_console_message_level_get_type() is brand new in 2.20. Did not exist at all in 2.18. So: (In reply to Michael Gratton from comment #0) > g_enum_get_value (g_type_class_ref (webkit_console_message_level_get_type ()), webkit_console_message_get_level (message)); Vala cannot possibly have generated this code in 2.18, because it would not have compiled. And if it did generate this code, then it has its own conflicting definition for webkit_console_message_level_get_type() with its own generated GType goo, infringing on WebKit's namespace, which would be a Vala bug. The first thing I would do to check is build Geary twice with the same version of Vala, once with 2.18 and once with 2.20, and diff the generated C code to see what it's doing with respect to webkit_console_message_level_get_type().
Okay, so I just found out the WebProcess crashes here when I run the locally built binary in the source tree, but does not crash when I build a debian package from the same source tree, with the same valac and libraries, install that and run it. When using the same meson build options for both, the only thing that is different in the generated C source is that the deb build's source doesn't contain vala line number annotations. The generated C line is identical to that below. The only difference with the binaries seems to be that is that the installed deb's binary has debug symbol stripped. So I'm not sure where to go with this. It's definitely only happening with enums from WebkitGTK+ though. Will try to track down some source generated against 2.18 and see what it looks like, need to reboot to try to get libvirt working for that though. o_O
Okay, so the difference in the source code when compiling against 2.18 vs against 2.20 is that for the former, valac will generate a helper function which uses a C-switch on the value returned from the message obejct, rather than trying to use g_enum_get_value and getting the name from that. E.g. for the level: > static const char* > _webkit_console_message_level_to_string (WebKitConsoleMessageLevel value) > { > switch (value) { > case WEBKIT_CONSOLE_MESSAGE_LEVEL_INFO: > return "WEBKIT_CONSOLE_MESSAGE_LEVEL_INFO"; > ... > } > return NULL; > } The g_enum_get_value() method does get used in quite a lot of other places however, it's only these newly blessed enums that are causing a crash with my local build. I'm sort of out of ideas about what could be causing the issue with the plain local build vs the deb local build though.
OK that's more complicated than what I was expecting, I'll take a look...
> g_enum_get_value (g_type_class_ref (webkit_console_message_level_get_type ()), webkit_console_message_get_level (message)); I added this debug code to ephy-web-extension-main.c: GType type; GEnumClass *class; GEnumValue *value; type = webkit_console_message_level_get_type (); class = g_type_class_ref (type); g_warning ("class=%p", class); value = g_enum_get_value (class, WEBKIT_CONSOLE_MESSAGE_LEVEL_DEBUG); g_warning ("value=%d name=%s nick=%s", value->value, value->value_name, value->value_nick); g_type_class_unref (class); It works properly: ** (WebKitWebProcess:18): WARNING **: 00:57:43.265: class=0x5635884a8820 ** (WebKitWebProcess:18): WARNING **: 00:57:43.265: value=4 name=WEBKIT_CONSOLE_MESSAGE_LEVEL_DEBUG nick=debug But I had to manually add this to the top: #define WEBKIT2_COMPILATION 1 #include <webkit2/WebKitWebProcessEnumTypes.h> due to bug #183998. I guess there's some (very slim) possibility that the missing #include is somehow confusing Vala... could you grep through the generated code and make sure Vala isn't trying to define its own webkit_console_message_level_get_type() anywhere? Could you post a copy of the generated web-process-extension.c?
BTW I tried adding Geary to world in gnome-build-meta so that I can build it against the latest WebKit, but I noticed it depends on libcanberra and I didn't really want to add that as well, so I gave up. :) Would be interesting to see what the generated source is doing.
Created attachment 336549 [details] Generated web extension code 2.20 This is the code valac is generating against WebKitGTK 2.20. Strangely it declares webkit_console_message_level_to_string(), but neither uses nor defines the function body.
Created attachment 336550 [details] Generated web extension code 2.18 This is the code valac is generating against WebKitGTK 2.18. This also webkit_console_message_level_to_string(), and again neither uses nor defines the function body, but also declares, defines and uses _webkit_console_message_level_to_string(). They differ by return type.
Created attachment 336551 [details] Generated web extension header 2.20 And for completeness, the header generated against 2.20.
Oh, it's a runtime issue. Running the installed deb build from /usr doesn't exhibit the problem, but running the same binary from ~/.../geary/debian (i.e the deb build directory) does. Is the WebProcess making assumptions about where extension libs are loaded/installed/executed from?
(In reply to Michael Gratton from comment #12) > Oh, it's a runtime issue. > > Running the installed deb build from /usr doesn't exhibit the problem, but > running the same binary from ~/.../geary/debian (i.e the deb build > directory) does. > > Is the WebProcess making assumptions about where extension libs are > loaded/installed/executed from? It loads web extensions from the directory the application sets with webkit_web_context_set_web_extensions_directory().
I don't see anything obviously wrong with the generated code.... (In reply to Michael Gratton from comment #12) > Oh, it's a runtime issue. > > Running the installed deb build from /usr doesn't exhibit the problem, but > running the same binary from ~/.../geary/debian (i.e the deb build > directory) does. > > Is the WebProcess making assumptions about where extension libs are > loaded/installed/executed from? Are you sure that it's really linked against 2.20 when you're running it from the build directory? Check with ldd.
(In reply to Michael Gratton from comment #12) > Oh, it's a runtime issue. > > Running the installed deb build from /usr doesn't exhibit the problem, but > running the same binary from ~/.../geary/debian (i.e the deb build > directory) does. NB: https://bugzilla.redhat.com/show_bug.cgi?id=1560266
> Are you sure that it's really linked against 2.20 when you're running it from > the build directory? Check with ldd. Yeah, looks like it: > mjg@payens:~/Projects/GNOME/geary$ ldd build/src/libgeary-web-process.so | grep webkit > libwebkit2gtk-4.0.so.37 => /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37 (0x00007f1cbed5c000) > mjg@payens:~/Projects/GNOME/geary$ dpkg -S /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37 > libwebkit2gtk-4.0-37:amd64: /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37 > mjg@payens:~/Projects/GNOME/geary$ dpkg -l | grep libwebkit2gtk-4.0-37 > ii libwebkit2gtk-4.0-37:amd64 2.20.1-1 amd64 Web content engine library for GTK+ Although it also looks like upgrading to 2.20.1 has fixed the issue, so maybe the missing header was somehow causing it?
Surprising, but OK, it was surely caused by the missing header, then. Somehow. I guess to figure out how, you could look at how the code generated by Vala changed between 2.20.0 and 2.20.1.