Bug 209232

Summary: Explicitly activate the new DrawingAreaProxy on PSON navigation
Product: WebKit Reporter: youenn fablet <youennf>
Component: Layout and RenderingAssignee: youenn fablet <youennf>
Status: RESOLVED FIXED    
Severity: Normal CC: bfulgham, cdumez, cgarcia, ews-watchlist, ggaren, japhet, koivisto, nham, simon.fraser, thorton, webkit-bug-importer, yurys, zalan
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=209741
Attachments:
Description Flags
Patch
none
Patch
none
Patch for landing
none
Patch
none
simple example none

Description youenn fablet 2020-03-18 08:06:37 PDT
When PSONing, a web page proxy listens to two DrawingAreaProxy.
While this may not be an issue when everything is working and rendering is properly suspended in WebProcess, this is fragile design, as shown by regression in https://bugs.webkit.org/show_bug.cgi?id=208918.

It seems that, ideally, when the provisional DrawingAreaProxy starts receiving some messages, WebPageProxy should stop listening to the old one.
Or maybe do some processing that ensures that rendering is swapped at the exact time draw area proxies are swapped.
Comment 1 Chris Dumez 2020-03-18 08:15:41 PDT
(In reply to youenn fablet from comment #0)
> When PSONing, a web page proxy listens to two DrawingAreaProxy.
> While this may not be an issue when everything is working and rendering is
> properly suspended in WebProcess, this is fragile design, as shown by
> regression in https://bugs.webkit.org/show_bug.cgi?id=208918.
> 
> It seems that, ideally, when the provisional DrawingAreaProxy starts
> receiving some messages, WebPageProxy should stop listening to the old one.

This would not be right. While the provisional page exists, the WebPageProxy is still the one displayed on screen and interactive. It definitely should keep listening to IPC from its DrawingArea. At any point, the provisional load can be cancelled and it would be like nothing had happened.
Comment 2 Chris Dumez 2020-03-18 08:17:21 PDT
Also, the WebPageProxy does not really have 2 DrawingAreaProxy objects. The WebPageProxy has one and the ProvisionalPageProxy has one. Both proxies use different identifiers for their IPC.
Comment 3 youenn fablet 2020-03-18 12:37:14 PDT
(In reply to Chris Dumez from comment #1)
> (In reply to youenn fablet from comment #0)
> > When PSONing, a web page proxy listens to two DrawingAreaProxy.
> > While this may not be an issue when everything is working and rendering is
> > properly suspended in WebProcess, this is fragile design, as shown by
> > regression in https://bugs.webkit.org/show_bug.cgi?id=208918.
> > 
> > It seems that, ideally, when the provisional DrawingAreaProxy starts
> > receiving some messages, WebPageProxy should stop listening to the old one.
> 
> This would not be right. While the provisional page exists, the WebPageProxy
> is still the one displayed on screen and interactive. It definitely should
> keep listening to IPC from its DrawingArea. At any point, the provisional
> load can be cancelled and it would be like nothing had happened.

Currently, both drawing areas are linked to the same WebPageProxy.
It they both receive EnterAcceleratedCompositingMode IPC messages roughly at the same time, they will call WebPageProxy:: enterAcceleratedCompositingMode and the last one will win. It would be nice to fix this.

There is nothing preventing a buggy or bad-behaving web process to send this IPC message and keep the control of what is being displayed on the supposed-to-be process-swapped tab.

It makes sense to me that the drawing areas would forward their messages to their WebPageProxy when they are set to WebPageProxy::m_drawingArea but not otherwise.
In the current design, that would probably require buffering some state.
I could check again but I believe enterAcceleratedCompositingMode messages are processed for the provisional drawing area before it is assigned to WebPageProxy::m_drawingArea.
Comment 4 youenn fablet 2020-03-19 01:18:42 PDT
Created attachment 393955 [details]
Patch
Comment 5 Chris Dumez 2020-03-19 07:50:24 PDT
Comment on attachment 393955 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=393955&action=review

> Source/WebKit/WebProcess/WebPage/WebPage.cpp:5854
> +    unfreezeLayerTree(LayerTreeFreezeReason::ProcessSwap);

I think this will likely introduce flashing when navigating cross-site. This is the reason we unfroze the layer tree in didCompletePageTransition(), because we knew that at that point there was something meaningful to paint in the new drawing area.
Comment 6 youenn fablet 2020-03-19 09:23:55 PDT
(In reply to Chris Dumez from comment #5)
> Comment on attachment 393955 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=393955&action=review
> 
> > Source/WebKit/WebProcess/WebPage/WebPage.cpp:5854
> > +    unfreezeLayerTree(LayerTreeFreezeReason::ProcessSwap);
> 
> I think this will likely introduce flashing when navigating cross-site. This
> is the reason we unfroze the layer tree in didCompletePageTransition(),
> because we knew that at that point there was something meaningful to paint
> in the new drawing area.

What is causing the flashing? How can I check that? Is it mostly iOS?
I added a timer to delay by 1 second the unfreezing and I did not see it.
Comment 7 Chris Dumez 2020-03-19 09:28:32 PDT
Comment on attachment 393955 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=393955&action=review

>>> Source/WebKit/WebProcess/WebPage/WebPage.cpp:5854
>>> +    unfreezeLayerTree(LayerTreeFreezeReason::ProcessSwap);
>> 
>> I think this will likely introduce flashing when navigating cross-site. This is the reason we unfroze the layer tree in didCompletePageTransition(), because we knew that at that point there was something meaningful to paint in the new drawing area.
> 
> What is causing the flashing? How can I check that? Is it mostly iOS?
> I added a timer to delay by 1 second the unfreezing and I did not see it.

I don't remember if it is for macOS or iOS. I think Antti wrote this code so I cc'd him. Maybe it is OK to drop the ProcessSwap freeze reason here since we hopefully still have the "PageTransition" freeze reason until didCompletePageTransition(). I am not sure but I am hoping Antti knows. Basically, to test this, you'd want to navigate cross-site (e.g. clicking search results links on google) and make sure you do not see flashes to white (or black if dark mode) when navigating. It may be more or less obvious depending on the sites so try a lot of links.
Comment 8 Geoffrey Garen 2020-03-19 09:30:50 PDT
Also swipe back and forward cross-site, with and without page cache disabled.
Comment 9 youenn fablet 2020-03-19 10:23:30 PDT
I tried on MacOS and it seems fine.
Will try on iOS as well.
Comment 10 Antti Koivisto 2020-03-19 10:59:09 PDT
Comment on attachment 393955 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=393955&action=review

>>>> Source/WebKit/WebProcess/WebPage/WebPage.cpp:5854
>>>> +    unfreezeLayerTree(LayerTreeFreezeReason::ProcessSwap);
>>> 
>>> I think this will likely introduce flashing when navigating cross-site. This is the reason we unfroze the layer tree in didCompletePageTransition(), because we knew that at that point there was something meaningful to paint in the new drawing area.
>> 
>> What is causing the flashing? How can I check that? Is it mostly iOS?
>> I added a timer to delay by 1 second the unfreezing and I did not see it.
> 
> I don't remember if it is for macOS or iOS. I think Antti wrote this code so I cc'd him. Maybe it is OK to drop the ProcessSwap freeze reason here since we hopefully still have the "PageTransition" freeze reason until didCompletePageTransition(). I am not sure but I am hoping Antti knows. Basically, to test this, you'd want to navigate cross-site (e.g. clicking search results links on google) and make sure you do not see flashes to white (or black if dark mode) when navigating. It may be more or less obvious depending on the sites so try a lot of links.

The problem was that we would unfreeze the initial empty document on a newly created  process, which would then paint and produce white (or black) flash. The code exist to prevent this initial empty document from unfreezing.

Is there now something else that stops this now? I would think both didCommitLoad and didCompletePageTransition do get called on initial empty document.

I suspect that all the process prewarming we do might be making the case harder to hit but we still shouldn't regress.
Comment 11 Antti Koivisto 2020-03-19 11:02:06 PDT
Moving the code should be fine in itself but maybe it still needs to test for the initial empty document?
Comment 12 youenn fablet 2020-03-20 01:25:46 PDT
> Is there now something else that stops this now? I would think both
> didCommitLoad and didCompletePageTransition do get called on initial empty
> document.

In the old code, we were checking for m_mainFrame to protect from unfreezing too early. This check was active within WebPage constructor.

WebFrameLoaderClient::dispatchDidCommitLoad is not called within WebPage constructor as FrameLoader::dispatchDidCommitLoad has a similar check to m_mainFrame (checking for creatingInitialEmptyDocument state).

This patch seems similar to the old code from that point of view.
This might be different for page cache but, doing some testing, I do not see any white flash.

A potential issue with the patch is that we delay a bit the unfreezing of the rendering tree in the new process. That could lead to a small delay in first paint.
Comment 13 youenn fablet 2020-03-20 01:44:04 PDT
Created attachment 394072 [details]
Patch
Comment 14 Antti Koivisto 2020-03-20 02:07:08 PDT
Ok, makes sense
Comment 15 Simon Fraser (smfr) 2020-03-20 09:46:19 PDT
Could we fix the title according to comment #2?
Comment 16 youenn fablet 2020-03-20 09:48:51 PDT
(In reply to Simon Fraser (smfr) from comment #15)
> Could we fix the title according to comment #2?

Do you have any suggestion?

Before the patch, both drawing areas were listening to IPC.
This patch makes it so that either one of them will listen, not both.
Comment 17 Simon Fraser (smfr) 2020-03-20 09:59:09 PDT
Comment on attachment 394072 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=394072&action=review

> Source/WebKit/ChangeLog:3
> +        On PSON, WebPageProxy listens to two DrawAreaProxy at the same time

DrawingAreaProxies

Maybe "Explicitly activate the new DrawingAreaProxy on PSON navigations"

> Source/WebKit/UIProcess/DrawingAreaProxy.h:61
> +    void activate();

This name is a bit generic. Maybe becomeActiveDrawingAreaForPage()

> Source/WebKit/WebProcess/WebPage/WebPage.cpp:5854
> +    auto& coreFrame = *frame->coreFrame();

This should be a Ref<>
Comment 18 youenn fablet 2020-03-20 10:14:27 PDT
Comment on attachment 394072 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=394072&action=review

>> Source/WebKit/ChangeLog:3
>> +        On PSON, WebPageProxy listens to two DrawAreaProxy at the same time
> 
> DrawingAreaProxies
> 
> Maybe "Explicitly activate the new DrawingAreaProxy on PSON navigations"

OK

>> Source/WebKit/UIProcess/DrawingAreaProxy.h:61
>> +    void activate();
> 
> This name is a bit generic. Maybe becomeActiveDrawingAreaForPage()

m_drawingArea->activate() reads better to me in WebPageProxy than m_drawingArea->becomeActiveDrawingAreaForPage().
Could change it to startListening or something related to starting receiving IPC messages.

>> Source/WebKit/WebProcess/WebPage/WebPage.cpp:5854
>> +    auto& coreFrame = *frame->coreFrame();
> 
> This should be a Ref<>

Let's revert this change actually.
There is no perf gain and, calling frame->coreFrame() makes sure that we will nullptr crash instead of a potential UAF.
I'll keep the ASSERT though.
Comment 19 Chris Dumez 2020-03-20 10:20:41 PDT
Comment on attachment 394072 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=394072&action=review

>>> Source/WebKit/UIProcess/DrawingAreaProxy.h:61
>>> +    void activate();
>> 
>> This name is a bit generic. Maybe becomeActiveDrawingAreaForPage()
> 
> m_drawingArea->activate() reads better to me in WebPageProxy than m_drawingArea->becomeActiveDrawingAreaForPage().
> Could change it to startListening or something related to starting receiving IPC messages.

I also don't like activate(), I am no idea what it means to "activate" a DrawingAreaProxy.
Comment 20 youenn fablet 2020-03-21 02:48:00 PDT
Created attachment 394164 [details]
Patch for landing
Comment 21 Geoffrey Garen 2020-03-21 11:08:19 PDT
Comment on attachment 394164 [details]
Patch for landing

Some name options:

startDrawing
startReceivingMessages
didInstallInPage

Side note: Is it still OK for the DrawingArea destructor to unconditionally call removeMessageReceiver(), even though it can no longer guarantee that it will call addMessageReceiver()? Maybe a better design would not assign WebPageProxy or WebProcessProxy at all in the constructor, and then this function would be called "setPage", and it would be clearer why we delay adding messages receiver until we have a webpage that we are installed into.
Comment 22 youenn fablet 2020-03-23 01:57:30 PDT
(In reply to Geoffrey Garen from comment #21)
> Comment on attachment 394164 [details]
> Patch for landing
> 
> Some name options:
> 
> startDrawing
> startReceivingMessages
> didInstallInPage
> 
> Side note: Is it still OK for the DrawingArea destructor to unconditionally
> call removeMessageReceiver(), even though it can no longer guarantee that it
> will call addMessageReceiver()?

Right, we might hit a debug assert.

> Maybe a better design would not assign
> WebPageProxy or WebProcessProxy at all in the constructor, and then this
> function would be called "setPage", and it would be clearer why we delay
> adding messages receiver until we have a webpage that we are installed into.

Sounds good.

My initial idea was to remove the concept of a provisional DrawingAreaProxy, but I decided to not go there since this is used in ViewGestureController.
Maybe some further clean-up could be done in that area as well.
Comment 23 youenn fablet 2020-03-23 02:03:26 PDT
> > Maybe a better design would not assign
> > WebPageProxy or WebProcessProxy at all in the constructor, and then this
> > function would be called "setPage", and it would be clearer why we delay
> > adding messages receiver until we have a webpage that we are installed into.
> 
> Sounds good.
> 
> My initial idea was to remove the concept of a provisional DrawingAreaProxy,
> but I decided to not go there since this is used in ViewGestureController.
> Maybe some further clean-up could be done in that area as well.

Well, we still want the DrawingAreaProxy to send some IPC messages.
For now, I'll just ensure we do not remove the listener if we did not add it in the first place.
I'll try to look at whether we can remove the concept of a provisional drawing area proxy once we know for sure this patch does not trigger any PLT/flash regression.
Comment 24 youenn fablet 2020-03-23 02:32:47 PDT
Created attachment 394247 [details]
Patch
Comment 25 EWS 2020-03-23 03:16:27 PDT
Committed r258829: <https://trac.webkit.org/changeset/258829>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 394247 [details].
Comment 26 Radar WebKit Bug Importer 2020-03-23 03:17:16 PDT
<rdar://problem/60765704>
Comment 27 Yury Semikhatsky 2020-03-23 19:09:35 PDT
Created attachment 394341 [details]
simple example

https://trac.webkit.org/r258829 broke requestAnimationFrame on GTK. In the attached example click on the button to trigger cross-domain navigation. There is expected to be alert fired from requestAnimationFrame which is not happening after the change.
Comment 28 Yury Semikhatsky 2020-03-23 19:10:15 PDT
Reopening because of the GTK breakage.
Comment 29 youenn fablet 2020-03-24 04:11:25 PDT
Maybe we should revert the patch then.
Carlos, could you take a quick look at the reason we are breaking GTK here?
Comment 30 Carlos Garcia Campos 2020-03-26 03:21:38 PDT
(In reply to youenn fablet from comment #29)
> Maybe we should revert the patch then.
> Carlos, could you take a quick look at the reason we are breaking GTK here?

Sure, I'll check.
Comment 31 Carlos Garcia Campos 2020-03-26 07:21:49 PDT
So, the web process drawing area sends an update message to the UI process before the drawing area proxy starts receiving messages and keeps waiting for the DidUpdate forever. I need to investigate more how to fix this. I thought painting was disabled in the drawing area at that point. I'll investigate tomorrow.
Comment 32 youenn fablet 2020-03-31 07:51:30 PDT
Thanks Carlos for the GTK fix, closing this bug.