Bug 182528 - [GTK] WebProcess deadlock loading a web view from decide-policy signal callback
Summary: [GTK] WebProcess deadlock loading a web view from decide-policy signal callback
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Nightly Build
Hardware: PC Linux
: P3 Major
Assignee: Nobody
URL:
Keywords: Gtk
Depends on:
Blocks:
 
Reported: 2018-02-05 22:34 PST by Michael Gratton
Modified: 2018-02-07 15:51 PST (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Gratton 2018-02-05 22:34:18 PST
A WebProcess deadlock occurs when loading content in a WebKitWebView using webkit_web_view_load_html(), when the call chain causing the load is originates from a signal from another WebView and the single shared process model is being used.

Geary uses a WebKitWebView for both displaying email and composing new messages. If the user opens a composer while viewing an existing message, the existing WebView(s) hangs around while the composer's WebView is constructed, causing them to share a WebProcess:

1. First WebKitWebView is constructed, message body is loaded by call to webkit_web_view_load_html()
2. User opens a message composer, a new WebKitWebView is constructed and message template is loaded using webkit_web_view_load_html()
3. Initial WebKitWebView is destroyed after the composer's view replaces it in the main window, and hence they share a WebProcess.

This works fine if (2) is invoked by a GUI control, say by clicking on the Compose button. If however if the user opens a composer by clicking a mailto link in a message currently displayed by the first web view, then at step (2) the WebProcess freezes and never recovers. In this case, step (2) is started by a call chain invoked from a WebView signal (decide-policy, in this case) emitted from the first.

A workaround is to use g_idle_add() to invoke the composer from the signal handler, so that the signal handler returns before webkit_web_view_load_html() is called on the second view. The deadlock also does not occur when using the multi-process model, but that is inappropriate for Geary. I haven't tested if alternate loading methods (from URI, from bytes, etc) also have the same problem or not.

I guess "don't do this" is a reasonable solution to this bug. However since the the call to webkit_web_view_load_html() was for a different WebView instance, and since the loading and rendering is done out of process, I feel like this is a reasonable thing to expect to work.
Comment 1 Carlos Garcia Campos 2018-02-06 00:03:53 PST
I'm not sure this is really possible. The decide-policy signal is async form the UI process point of view, but not from the web process point of view. It's a sync IP message that is waiting for a response to decide whether to continue with the load or not. Any other message sent to the web process can't be processed. One possible solution could be to make the decision before creating the new web view. I guess you are actually canceling the load to forward it to the new web view. So you could first call ignore() and then create the new web view.
Comment 2 Michael Catanzaro 2018-02-06 05:52:15 PST
We should at least document this limitation.

(In reply to Michael Gratton from comment #0)
> The deadlock also
> does not occur when using the multi-process model, but that is inappropriate
> for Geary.

Why are you using the less-tested and less-robust single secondary process 
model? Is it to reduce memory overhead? For a complex application like Geary, I would recommend the multiple secondary process model.
Comment 3 Michael Gratton 2018-02-06 21:03:41 PST
(In reply to Carlos Garcia Campos from comment #1)
> One possible solution could be to make the decision
> before creating the new web view. I guess you are actually canceling the
> load to forward it to the new web view. So you could first call ignore() and
> then create the new web view.

Oh, calling ignore() before invoking the call chain that opens the second web view does indeed also fix the issue, without needing the idle callback. Thanks - that's a much better general solution.

Anyway, yeah it's a somewhat understandable technical limitation once you know what is happening under the hood. So as Michael suggests maybe worth documenting any such possibly blocking signals/calls?


(In reply to Michael Catanzaro from comment #2)
> 
> Why are you using the less-tested and less-robust single secondary process 
> model? Is it to reduce memory overhead? For a complex application like
> Geary, I would recommend the multiple secondary process model.

Because if we were using multiple secondary processes and the user loaded a large conversation, an instance would be spawned for every message at once, which seems like it could be kind of slow and yeah, memory intensive, but also lead to a janky user experience. We could delay creating web views until the user expanded the bodies (we should probably be doing this anyway), but then you still have the potential need to launch a bunch if the conversation has a number of starred or unread messages (which are expanded by default), or one for almost every message anyway if the user searches for a common word like "and", which would cause most of the message bodies to be expanded at once.

Conversations in the order of 10's of messages aren't uncommon, and 100+ are possible, and Ephy's general unresponsiveness when restoring a session with even something like 10-20 tabs under the multi-process model put me off it.

Maybe just consider me a tester for the single-shared-process model? :)
Comment 4 Michael Catanzaro 2018-02-07 05:37:20 PST
Yeah, that's too many web views to use multiple secondary processes. I was wrongly thinking it was one web view for the conversation and one per composer.

(In reply to Michael Gratton from comment #3)
> Conversations in the order of 10's of messages aren't uncommon, and 100+ are
> possible, and Ephy's general unresponsiveness when restoring a session with
> even something like 10-20 tabs under the multi-process model put me off it.

That's a *very* small number of tabs to be noticing slowness; something seems wrong there.
Comment 5 Michael Gratton 2018-02-07 14:48:00 PST
(In reply to Michael Catanzaro from comment #4)
> Yeah, that's too many web views to use multiple secondary processes. I was
> wrongly thinking it was one web view for the conversation and one per
> composer.

Yeah, sharing them is just co-incidential considering how they are constructed. I'd like to have one-per-composer, and would do so if there was API for it - something like the inverse of webkit_web_view_new_with_related_view(). Oh wait, if I constructed a separate WebContext per composer, would that do it?

> (In reply to Michael Gratton from comment #3)
> > Conversations in the order of 10's of messages aren't uncommon, and 100+ are
> > possible, and Ephy's general unresponsiveness when restoring a session with
> > even something like 10-20 tabs under the multi-process model put me off it.
> 
> That's a *very* small number of tabs to be noticing slowness; something
> seems wrong there.

Well 10 is definitely unfair - trying it just now with 5 tabs in one window and 8 in another was okay. Sorry for taking a dig at Ephy like that by the way, that wasn't the intention. :/
Comment 6 Michael Catanzaro 2018-02-07 15:51:34 PST
(In reply to Michael Gratton from comment #5)
> Oh wait, if I constructed a
> separate WebContext per composer, would that do it?

Yes, that should work!