Bug 110743

Summary: [WK2] WebPageProxy loadURL() won't work when called just after terminateProcess()
Product: WebKit Reporter: Adenilson Cavalcanti Silva <savagobr>
Component: WebKit2Assignee: Benjamin Poulain <benjamin>
Status: RESOLVED FIXED    
Severity: Normal CC: andersca, ap, benjamin, cgarcia, gyuyoung.kim, hausmann, jesus, rafael.lobo, rakuco, sergio, webkit.review.bot, zan
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 109980    
Bug Blocks:    
Attachments:
Description Flags
Makes WebPageProxy cleanup at terminateProcess() step
none
Do all cleanup on terminate (both WebPageProxy and WebProcessProxy).
none
With new test (load, crash, load)
none
Fixed failing unit test
benjamin: review-, benjamin: commit-queue-
Addressing reviewers comments: CL, member function naming, using private access, etc.
none
Patch benjamin: review+

Description Adenilson Cavalcanti Silva 2013-02-25 03:31:09 PST
Issue #109980 has a unit test that demonstrate this scenario.

loadURL() tests for the state of WebProcess to decide if there is a need to re-spawn it (in cases where a crash happened).

The issue is that the check depends in a variable that will not be updated when terminateProcess() has executed, thus, loading a page will fail. What would be expected is:

terminateProcess() --> loadURL() --> reattachtoWebProcess()

But what really is happening:

terminateProcess() --> loadURL() --> processDidCrash()


This is due to the fact that the complete WebProcess's resources teardown happens in processDidCrash(). Ideally, all cleanup should be done at terminateProcess().
Comment 1 Adenilson Cavalcanti Silva 2013-02-26 12:27:23 PST
An update on this issue: I started by moving most of the code in WebPageProxy processDidCrash() to a private function named processTeardown().

It is used by both processDidCrash() and at terminateProcess(). The basic idea here is that when terminateProcess() was called, it should do the cleanup and processDidCrash() should *not* execute in a later step.

Further investigation revealed that WebProcessProxy will call WebPageProxy::processDidCrash() when the connection to WebProcess is closed. So it seems that is required to know  at WebProcessProxy the 2 cases:

a) When WebPageProxy::terminateProcess() has executed: because in this case, we will do the process cleanup just after calling ChildProcesProxy::terminate() and WebPageProxy::processDidCrash() should *not* be executed, otherwise cleanup would be done twice.

and

b) When the process crashed by external factors: in this case no process cleanup has being done so far and *there is* the need to execute WebPageProxy::processDidCrash().

There is an added complication I've observed. Some ports (e.g. EFL) after being notified by WebPageProxy::processDidCrash() that the process has exited, will call WebPageProxy::loadURL(). This happens exactly in the middle of WebProcess associated resources tear down.

I think it makes sense to notify the port of WebProcess unavailability only in cases of real crash, not in the case of explicit termination (i.e. when WebPageProxy::terminateProcess() was called).

Finally, it would be safer to notify the port of WebProcess crash only *after* cleaning up the associated resources (so that if the port calls WebPageProxy::loadURL() or reload(), we wouldn't be in the an undefined state).
Comment 2 Benjamin Poulain 2013-02-26 18:01:48 PST
It looks like we need terminateProcess() to:
1) Ensure any future message from the WebProcess will be discarded.
2) Kill the Process
3) Reset to a clean state.
4) Maybe call processDidCrash, or maybe schedule its call, I am not sure.

When terminateProcess() finishes, any command on the PageProxy should be valid.
Comment 3 Anders Carlsson 2013-02-26 18:08:50 PST
(In reply to comment #2)
> It looks like we need terminateProcess() to:
> 4) Maybe call processDidCrash, or maybe schedule its call, I am not sure.

We shouldn’t call processDidCrash - if the client explicitly asked for the process to be terminated, we shouldn’t consider it to be a crash.
Comment 4 Adenilson Cavalcanti Silva 2013-02-28 19:03:17 PST
Created attachment 190867 [details]
Makes WebPageProxy cleanup at terminateProcess() step
Comment 5 Adenilson Cavalcanti Silva 2013-02-28 19:26:57 PST
An explanation on this patch... first, the good: it does the resources cleanup associated with a client call to process termination and implements a way for WebProcessProxy to know if it is a crash or if it is an explicit termination, thus notifying the client when needed.

The bad: it won't make the tests on #109980 to succeed because when when the WebPageProxy::reattachToWebProcess() is executed, WebProcessProxy *hasn't* completed its own cleanup.
Comment 6 Rafael Brandao 2013-03-01 15:52:28 PST
Comment on attachment 190867 [details]
Makes WebPageProxy cleanup at terminateProcess() step

View in context: https://bugs.webkit.org/attachment.cgi?id=190867&action=review

> Source/WebKit2/UIProcess/WebProcessProxy.cpp:404
> +        m_isTerminated = false;

if !m_isTerminated, then m_isTerminated is already false.
Comment 7 Adenilson Cavalcanti Silva 2013-04-07 19:57:37 PDT
Created attachment 196821 [details]
Do all cleanup on terminate (both WebPageProxy and WebProcessProxy).
Comment 8 Adenilson Cavalcanti Silva 2013-04-07 20:02:16 PDT
The good: does all cleanup (including WebProcessProxy) so that WebPageProxy::reattachToWebProcess() will work (and so loadURL()).

It will break the other unit tests that simulate crashes calling WebPageProxy::terminateProcess(), since we no longer notify the client of a 'crash' (since it really is not a crash, but was requested by the user).

Question: what to do with the other unit tests that depend on execution of a crash handler?
Comment 9 Adenilson Cavalcanti Silva 2013-04-08 05:46:27 PDT
Created attachment 196846 [details]
With new test (load, crash, load)

WebProcess termination now is done using a new function 'userRequestedTerminate' that performs complete shutdown of WebProcessProxy before returning. This ensures that at time reattachToWebProcess() is executed, WebProcessProxy is in a sane state.
    
Next, cleanup of WebPageProxy is done at terminateProcess(), using a new function named 'processTeardown()', that is shared with processDidCrash(). Finally, client is not notified of process crash, since it was terminated per client's request.
    
This changes makes the new test succeed, but will cause a regression in all tests that depend on a callback to be executed at process termination.
Comment 10 Alexey Proskuryakov 2013-04-08 09:38:38 PDT
Comment on attachment 196846 [details]
With new test (load, crash, load)

View in context: https://bugs.webkit.org/attachment.cgi?id=196846&action=review

> will cause a regression in all tests that depend on a callback to be executed at process termination.

What are these other tests? We clearly need to do something about them, can't just make them fail.

> Tools/TestWebKitAPI/Tests/WebKit2/LoadPageOnCrash.cpp:59
> +    void crashWebProcess()
> +    {
> +        WKPageTerminate(webView.page());
> +    }

Unsure why you called this function crashWebProcess. It doesn't make it crash, does it?
Comment 11 Adenilson Cavalcanti Silva 2013-04-08 10:43:18 PDT
Alexey

Thanks for commenting. Answering the questions:

a) What are the tests? All that use WKPageTerminate API, which are 3 tests (MouseMoveAfterCrash, ReloadPageAfterCrash, ResizeWindowAfterCrash).

I think they can work *if* we execute the code in the crash callback within the scope of primary test function.

b) why 'crashWebProcess': it terminates WebProcess thus simulating a crash (the other 3 tests follow a similar nomenclature). If you wish, I can change the function name.
Comment 12 Adenilson Cavalcanti Silva 2013-04-08 13:09:17 PDT
Created attachment 196945 [details]
Fixed failing unit test

Fixed failing unit test.
Comment 13 Benjamin Poulain 2013-04-08 15:09:38 PDT
Comment on attachment 196945 [details]
Fixed failing unit test

View in context: https://bugs.webkit.org/attachment.cgi?id=196945&action=review

> Source/WebKit2/ChangeLog:14
> +        WebPageProxy should do its resources clean up as soon as executing the call to terminate.
> +
> +        Its state will be undefined if depending on the execution of processDidCrash(), thus this
> +        patch moves the cleanup code to a shared private function that is used for both the cases
> +        i.e. user termination and real crash. WebProcess shutdown is done using a new method that
> +        ensures that all cleanup was done before returning. Finally, for user requested termination,
> +        clients are no longer notified of crash.

I think the changelog should be clearer. You are fixing a very specific issue. You first state what was the bug, then explain why it happened, then you put the text explaining how you fixed the problem.

> Source/WebKit2/UIProcess/WebPageProxy.cpp:1571
> +    m_process->userRequestedTerminate();
> +    processTeardown();

Why are those called in this order?

You should first clean up the state, then invoke any callback.

> Source/WebKit2/UIProcess/WebPageProxy.cpp:3803
> +    if (!m_isValid)
> +        return;

When does that happen?
    1) terminateProcess()
    2) processTeardown()
    3) processDidCrash()
?
Because I would think you cannot get processDidCrash() after processTeardown()  (because of m_process->removeMessageReceiver).

Can this be tested?

> Source/WebKit2/UIProcess/WebPageProxy.cpp:3810
> +    // After removing the message receivers and cleaning up resources, we
> +    // notify the client/port that a crash happened. This ensures
> +    // that we are in a defined state and calls to loadURL() or reload()
> +    // should work.

I don't think this comment is necessary.

> Source/WebKit2/UIProcess/WebPageProxy.h:604
> +    void processTeardown();

This should be private.

> Source/WebKit2/UIProcess/WebProcessProxy.h:119
> +    void userRequestedTerminate();
> +

The name should not have "user", as it is a callback for any programmatic termination. Not only the one triggered by the user.

Why not terminateProcess()?
Comment 14 Benjamin Poulain 2013-04-08 15:10:30 PDT
Please ping me when you have an update. Let's iterate quickly on this until fixed.
Comment 15 Adenilson Cavalcanti Silva 2013-04-09 11:22:39 PDT
Benjamin

Thanks for reviewing this patch. I will try to address the comments bellow.

> I think the changelog should be clearer. You are fixing a very specific >issue. You first state what was the bug, then explain why it happened, then >you put the text explaining how you fixed the problem.
> 

Concerning the CL... agreed. The fix is one thing (i.e. load fails after call to terminate) and the points of failure are 2:
a) Cleaning up of WebPageProxy
b) Reattach executing *before* complete WebProcessProxy cleanup.

I will upload a new version of this patch with a hopefully better explanation in the CL. 

> > Source/WebKit2/UIProcess/WebPageProxy.cpp:1571
> > +    m_process->userRequestedTerminate();
> > +    processTeardown();
> 
> Why are those called in this order?
> 

Executing WebProcess termination must be done *before* any cleanup of WebPageProxy. If you think about it, in normal crashes, this is exactly what has happened already.

This ensures that reattach/respawn of WebProcess will succeed, since when we return to WebPageProxy, WebProcessProxy is in a well defined state (and not in the middle of its own cleanup).

> 
> When does that happen?
>     1) terminateProcess()
>     2) processTeardown()
>     3) processDidCrash()

The previous behavior was to execute processDidCrash(), called from WebProcessProxy.

With this patch, using the new member function 'userRequestedTerminate', WebProcessProxy will *not* callback any method into WebPageProxy. This includes processDidCrash().

So, the execution flow for user requested termination was:

WebPageProxy................ WebProcessProxy
terminateProcess() --------->  processTerminate()
processDidCrash() <----------  DidClose()


*Now* the execution flow for this case is:

WebPageProxy................ WebProcessProxy
terminateProcess() --------->  userRequestedTerminate() //kill everything
processTearDown()

For normal crashes (i.e. not user requested termination), the execution flow hasn't changed. WebProcessProxy will call processDidCrash() into WebPageProxy which will then notify the client/port of the crash. 

Another thing that this patch changes is the ordering of steps for normal crashes. Before it was: notify the client/port of a crash and next try to do the cleanup.

Now it is: first do the cleanup, next notify the client/port of a crash. This helps to ensure that WebPageProxy is in a defined state if someone calls its member functions (e.g. loadURL()). In those cases, since the cleanup hasn't executed yet, the test for WebProcess state would fail (and reattach is never executed).

> 
> > Source/WebKit2/UIProcess/WebPageProxy.cpp:3810
> > +    // After removing the message receivers and cleaning up resources, we
> > +    // notify the client/port that a crash happened. This ensures
> > +    // that we are in a defined state and calls to loadURL() or reload()
> > +    // should work.
> 
> I don't think this comment is necessary.
> 

Ack.


> > Source/WebKit2/UIProcess/WebPageProxy.h:604
> > +    void processTeardown();
> 
> This should be private.
> 

Agreed.

> > Source/WebKit2/UIProcess/WebProcessProxy.h:119
> > +    void userRequestedTerminate();
> > +
> 
> The name should not have "user", as it is a callback for any programmatic termination. Not only the one triggered by the user.
> 

Isn't this executed when the user/client code call WKPageTerminate()?
Comment 16 Adenilson Cavalcanti Silva 2013-04-11 10:46:17 PDT
Created attachment 197640 [details]
Addressing reviewers comments: CL, member function naming, using private access, etc.

Tested on Qt/Linux, waiting to finish compile on Mac to run unit tests before asking for cq.
Comment 17 Benjamin Poulain 2013-04-11 15:13:14 PDT
Comment on attachment 197640 [details]
Addressing reviewers comments: CL, member function naming, using private access, etc.

View in context: https://bugs.webkit.org/attachment.cgi?id=197640&action=review

> Source/WebKit2/ChangeLog:17
> +        closing connections. Otherwise, WebPageProxy can even be able to
> +        detect that WebProcess is no longer running, but a call to respawn
> +        the process will fail.

"Otherwise, WebPageProxy can even be able to detect that WebProcess [...]"

> Source/WebKit2/ChangeLog:19
> +        To fix this issues, this patch moves the cleanup code to a shared private function

"this issue" or "these issues".

> Tools/TestWebKitAPI/Tests/WebKit2/LoadPageOnCrash.cpp:83
> +    EXPECT_TRUE(testHelper->firstSuccessfulLoad);

This can never be false.
Comment 18 Anders Carlsson 2013-04-11 15:15:57 PDT
Comment on attachment 197640 [details]
Addressing reviewers comments: CL, member function naming, using private access, etc.

View in context: https://bugs.webkit.org/attachment.cgi?id=197640&action=review

> Source/WebKit2/UIProcess/WebPageProxy.cpp:3810
> +void WebPageProxy::processTeardown()

This function name is ambiguous; it sounds like it will process the teardown.
Comment 19 Adenilson Cavalcanti Silva 2013-04-11 19:09:18 PDT
Heads up on test results:

Adenilsons-Mac-mini:Webkit adenilsoncavalcanti$ egrep -n Crash ../vanilla.txt | grep After
292:    Test: MouseMoveAfterCrash -> Passed
302:    Test: ReloadPageAfterCrash -> Passed
303:    Test: ResizeWindowAfterCrash -> Passed

Adenilsons-Mac-mini:Webkit adenilsoncavalcanti$ egrep -n Crash ../patched.txt | grep After
293:    Test: LoadPageAfterCrash -> Passed
294:    Test: MouseMoveAfterCrash -> Passed
304:    Test: ReloadPageAfterCrash -> Passed
305:    Test: ResizeWindowAfterCrash -> Passed


All other API tests are ok too.
Comment 20 Benjamin Poulain 2013-04-11 21:32:25 PDT
Comment on attachment 197640 [details]
Addressing reviewers comments: CL, member function naming, using private access, etc.

View in context: https://bugs.webkit.org/attachment.cgi?id=197640&action=review

>> Source/WebKit2/UIProcess/WebPageProxy.cpp:3810
>> +void WebPageProxy::processTeardown()
> 
> This function name is ambiguous; it sounds like it will process the teardown.

processTeardown-> resetStateAfterProcessExit?

Anders, any suggestion?

> Source/WebKit2/UIProcess/WebPageProxy.cpp:3899
> -
>      if (!m_isValid) {

Can m_isValid be true at all at this point?
Comment 21 Benjamin Poulain 2013-04-12 14:21:43 PDT
Created attachment 197886 [details]
Patch
Comment 22 Benjamin Poulain 2013-04-12 16:32:24 PDT
Landed: https://trac.webkit.org/r148312
Comment 23 Rafael Brandao 2013-04-19 17:22:56 PDT
(In reply to comment #22)
> Landed: https://trac.webkit.org/r148312

So now ReloadPageAfterCrash is useless. The "didCrash" callback is never called and the test is now simply tests if the page reloads. In fact, if you simply comment "WKPageTerminate" from that test, it will also pass.
Comment 24 Benjamin Poulain 2013-04-19 18:12:28 PDT
(In reply to comment #23)
> (In reply to comment #22)
> > Landed: https://trac.webkit.org/r148312
> 
> So now ReloadPageAfterCrash is useless. The "didCrash" callback is never called and the test is now simply tests if the page reloads. In fact, if you simply comment "WKPageTerminate" from that test, it will also pass.

Yep.

I suggested to Adenilson to make a injected bundle to crash the WebProcess. That way we can start making process-crash tests again.
Comment 25 Rafael Brandao 2013-04-19 20:44:08 PDT
How are we going to test WKPageTerminate if we never get a callback such as "didTerminate"? What if instead of having "processDidCrash" callback we had a single "processDidTerminate" or something like that with a boolean argument confirming if it was a crash or not? Given that the user of this API can figure out by himself if it has crashed by purpose or not checking whether he has used WKPageTerminate I think we don't even need that argument at all: he could just ignore "processDidTerminate" if he is indeed expecting the page to terminate or act as a "processDidCrash" otherwise at user level without changing the behavior we had before.
Comment 26 Benjamin Poulain 2013-04-19 20:49:22 PDT
(In reply to comment #25)
> How are we going to test WKPageTerminate if we never get a callback such as "didTerminate"? What if instead of having "processDidCrash" callback we had a single "processDidTerminate" or something like that with a boolean argument confirming if it was a crash or not? Given that the user of this API can figure out by himself if it has crashed by purpose or not checking whether he has used WKPageTerminate I think we don't even need that argument at all: he could just ignore "processDidTerminate" if he is indeed expecting the page to terminate or act as a "processDidCrash" otherwise at user level without changing the behavior we had before.

It is not clear to me how that would help WKPageTerminate. What kind of tests do you have in mind.

A callback processDidTerminate does not really fit into the API.
Comment 27 Jesus Sanchez-Palencia 2013-04-21 11:45:29 PDT
(In reply to comment #26)
> (In reply to comment #25)
> > How are we going to test WKPageTerminate if we never get a callback such as "didTerminate"? What if instead of having "processDidCrash" callback we had a single "processDidTerminate" or something like that with a boolean argument confirming if it was a crash or not? Given that the user of this API can figure out by himself if it has crashed by purpose or not checking whether he has used WKPageTerminate I think we don't even need that argument at all: he could just ignore "processDidTerminate" if he is indeed expecting the page to terminate or act as a "processDidCrash" otherwise at user level without changing the behavior we had before.
> 
> It is not clear to me how that would help WKPageTerminate. What kind of tests do you have in mind.
> 
> A callback processDidTerminate does not really fit into the API.


I agree with Benjamin here. Plus, from an API perspective, both options look a bit messy. A boolean that will ending up as a state representation, or a few assumptions around using or ignoring callbacks.

If what we want in the end is only a way to test a page-crashed-and-reload-something-else flow, the Injected Bundle path looks cleaner.
Comment 28 Rafael Brandao 2013-04-21 16:36:02 PDT
(In reply to comment #27)
> (In reply to comment #26)
> > (In reply to comment #25)
> > > How are we going to test WKPageTerminate if we never get a callback such as "didTerminate"? What if instead of having "processDidCrash" callback we had a single "processDidTerminate" or something like that with a boolean argument confirming if it was a crash or not? Given that the user of this API can figure out by himself if it has crashed by purpose or not checking whether he has used WKPageTerminate I think we don't even need that argument at all: he could just ignore "processDidTerminate" if he is indeed expecting the page to terminate or act as a "processDidCrash" otherwise at user level without changing the behavior we had before.
> > 
> > It is not clear to me how that would help WKPageTerminate. What kind of tests do you have in mind.
> > 
> > A callback processDidTerminate does not really fit into the API.
> 
> 
> I agree with Benjamin here. Plus, from an API perspective, both options look a bit messy. A boolean that will ending up as a state representation, or a few assumptions around using or ignoring callbacks.
> 
> If what we want in the end is only a way to test a page-crashed-and-reload-something-else flow, the Injected Bundle path looks cleaner.

It's not only about whether we have a crash or not, but how can we test WKPageTerminate. The callback looks like a straight-forward solution. It's still not clear to me how WKPageTerminate can be tested today... or maybe we just don't want to test it.
Comment 29 Benjamin Poulain 2013-04-21 18:47:49 PDT
(In reply to comment #28)
> It's not only about whether we have a crash or not, but how can we test WKPageTerminate. The callback looks like a straight-forward solution. It's still not clear to me how WKPageTerminate can be tested today... or maybe we just don't want to test it.

Seriously I have trouble following you. To make thing easier, what kind of tests do you envision if we had a callback?
Comment 30 Benjamin Poulain 2013-04-21 18:47:50 PDT
(In reply to comment #28)
> It's not only about whether we have a crash or not, but how can we test WKPageTerminate. The callback looks like a straight-forward solution. It's still not clear to me how WKPageTerminate can be tested today... or maybe we just don't want to test it.

Seriously I have trouble following you. To make thing easier, what kind of tests do you envision if we had a callback?