Bug 183787

Summary: ScrollViewInsetTests.RestoreInitialContentOffsetAfterCrash API test is failing with async delegates
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: WebCore Misc.Assignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, commit-queue, dbates, ews-watchlist, japhet, rniwa, ryanhaddad, webkit-bug-importer, wenson_hsieh
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 183886, 184206, 184210    
Bug Blocks: 180568    
Attachments:
Description Flags
Patch
none
Patch
none
Archive of layout-test-results from ews107 for mac-sierra-wk2
none
Archive of layout-test-results from ews122 for ios-simulator-wk2
none
Patch
none
Patch cdumez: commit-queue+

Description Chris Dumez 2018-03-20 11:47:44 PDT
DataInteractionTests.AdditionalLinkAndImageIntoContentEditable API test is failing with async delegates.
Comment 2 Chris Dumez 2018-03-20 11:51:21 PDT
I see this error which could be related:
2018-03-20 11:50:30.865 TestWebKitAPI[34074:5841571] Encountered error: Error Domain=WKErrorDomain Code=4 "A JavaScript exception occurred" UserInfo={WKJavaScriptExceptionLineNumber=1, WKJavaScriptExceptionMessage=TypeError: null is not an object (evaluating 'editor.querySelector('a').href'), WKJavaScriptExceptionColumnNumber=26, WKJavaScriptExceptionSourceURL=file:///Volumes/Data/WebKit/OpenSource/WebKitBuild/Release-iphonesimulator/TestWebKitAPI.resources/selected-text-image-link-and-editable.html, NSLocalizedDescription=A JavaScript exception occurred} while evaluating script: editor.querySelector('a').href
Comment 3 Wenson Hsieh 2018-03-20 11:52:45 PDT
(In reply to Chris Dumez from comment #2)
> I see this error which could be related:
> 2018-03-20 11:50:30.865 TestWebKitAPI[34074:5841571] Encountered error:
> Error Domain=WKErrorDomain Code=4 "A JavaScript exception occurred"
> UserInfo={WKJavaScriptExceptionLineNumber=1,
> WKJavaScriptExceptionMessage=TypeError: null is not an object (evaluating
> 'editor.querySelector('a').href'), WKJavaScriptExceptionColumnNumber=26,
> WKJavaScriptExceptionSourceURL=file:///Volumes/Data/WebKit/OpenSource/
> WebKitBuild/Release-iphonesimulator/TestWebKitAPI.resources/selected-text-
> image-link-and-editable.html, NSLocalizedDescription=A JavaScript exception
> occurred} while evaluating script: editor.querySelector('a').href

That is because we fail to perform the drop, so when test subsequently queries the DOM and expects a dropped anchor element, we throw an exception.
Comment 4 Chris Dumez 2018-03-20 19:24:19 PDT
Potentially timing related.

Adding [webView waitForNextPresentationUpdate] calls after the navigation has completed and before the drag and drop actually starts seems to fix most API tests.

I need to make both the navigation AND the response policy delegates synchronous again to get the API tests to pass again. Making only one of those 2 synchronous does not fix the tests.
Comment 5 Chris Dumez 2018-03-20 19:37:35 PDT
Looking at ScrollViewInsetTests.RestoreInitialContentOffsetAfterCrash test that seems simpler and appears to be failing for the same reason:

In the success case:
RemoteLayerTreeDrawingArea::flushLayers()
RemoteLayerTreeDrawingArea::flushLayers() m_isFlushingSuspended
0x7f918900f018 - WebPageProxy::didCommitLoadForFrame()
PageClientImpl::didCommitLoadForMainFrame()
_didCommitLoadForMainFrame sets _firstPaintAfterCommitLoadTransactionID & _needsResetViewStateAfterCommitLoadForMainFrame
RemoteLayerTreeDrawingArea::flushLayers()
0x7f918900f018 - WebPageProxy::didFinishLoadForFrame()
PASS ScrollViewInsetTests.RestoreInitialContentOffsetAfterCrash

In the error case:
Failure:
RemoteLayerTreeDrawingArea::flushLayers()
RemoteLayerTreeDrawingArea::flushLayers() sends CommitLayer tree IPC
RemoteLayerTreeDrawingAreaProxy::commitLayerTree()
RemoteLayerTreeDrawingArea::flushLayers()
RemoteLayerTreeDrawingArea::flushLayers() m_waitingForBackingStoreSwap
0x7fa90b80ae18 - WebPageProxy::didCommitLayerTree()
PageClientImpl::didCommitLayerTree()
_didCommitLayerTree
0x7fa90b80ae18 - WebPageProxy::didCommitLoadForFrame()
PageClientImpl::didCommitLoadForMainFrame()
_didCommitLoadForMainFrame sets _firstPaintAfterCommitLoadTransactionID & _needsResetViewStateAfterCommitLoadForMainFrame
0x7fa90b80ae18 - WebPageProxy::didFinishLoadForFrame()
RemoteLayerTreeDrawingArea::flushLayers()
FAIL ScrollViewInsetTests.RestoreInitialContentOffsetAfterCrash

The test stores scrollView.contentOffset right after the load and expects scrollView.contentOffset.y to be -400. However, it is -299 after my change.
scrollView.contentOffset gets updated in WKWebView._didCommitLayerTree.

Note that in the success case, _didCommitLayerTree does not get called before the test finishes running. Therefore, the test was truly storing the *initial* contentOffset in initialContentOffset. However, with the async policy delegates, _didCommitLayerTree gets called and overwrites scrollView.contentOffset *before* didFinishNavigation is called. As a result, the test stores the updated contentOffset in initialContentOffset and therefore fails.

This looks pretty racy to me.
Comment 6 Chris Dumez 2018-03-20 20:34:22 PDT
So overall, the issue seems to be that we're now committing the layer tree during the navigation policy decision:
policyChecker().checkNavigationPolicy() BEGIN
RemoteLayerTreeDrawingArea::flushLayers()
RemoteLayerTreeDrawingArea::flushLayers() sends CommitLayer tree IPC
RemoteLayerTreeDrawingArea::flushLayers()
RemoteLayerTreeDrawingArea::flushLayers() m_waitingForBackingStoreSwap
policyChecker().checkNavigationPolicy() END
Comment 7 Chris Dumez 2018-03-20 20:45:49 PDT
(In reply to Chris Dumez from comment #6)
> So overall, the issue seems to be that we're now committing the layer tree
> during the navigation policy decision:
> policyChecker().checkNavigationPolicy() BEGIN
> RemoteLayerTreeDrawingArea::flushLayers()
> RemoteLayerTreeDrawingArea::flushLayers() sends CommitLayer tree IPC
> RemoteLayerTreeDrawingArea::flushLayers()
> RemoteLayerTreeDrawingArea::flushLayers() m_waitingForBackingStoreSwap
> policyChecker().checkNavigationPolicy() END

This happens because when we create the view, it creates a WebPage which ends up calling RemoteLayerTreeDrawingArea::scheduleCompositingLayerFlush() before we've started any real load:
1   0x10ee13853 WebKit::RemoteLayerTreeDrawingArea::scheduleCompositingLayerFlush()
2   0x11aaf565d WebCore::RenderLayerCompositor::attachRootLayer(WebCore::RenderLayerCompositor::RootLayerAttachment)
3   0x11aaf061a WebCore::RenderLayerCompositor::ensureRootLayer()
4   0x11aacbbf8 WebCore::RenderLayerCompositor::updateCompositingLayers(WebCore::CompositingUpdateType, WebCore::RenderLayer*)
5   0x11a2f6e90 WebCore::Document::resolveStyle(WebCore::Document::ResolveStyleType)
6   0x11a2f8720 WebCore::Document::createRenderTree()
7   0x11a2f876b WebCore::Document::didBecomeCurrentDocumentInFrame()
8   0x11a77b02b WebCore::Frame::setDocument(WTF::RefPtr<WebCore::Document, WTF::DumbPtrTraits<WebCore::Document> >&&)
9   0x11a6a9864 WebCore::DocumentWriter::begin(WebCore::URL const&, bool, WebCore::Document*)
10  0x11a69711b WebCore::DocumentLoader::commitData(char const*, unsigned long)
11  0x11a696973 WebCore::DocumentLoader::finishedLoading()
12  0x11a69d809 WebCore::DocumentLoader::maybeLoadEmpty()
13  0x11a69dab1 WebCore::DocumentLoader::startLoadingMainResource()
14  0x11a6b2566 WebCore::FrameLoader::init()
15  0x10eebd6a1 WebKit::WebFrame::createWithCoreMainFrame(WebKit::WebPage*, WebCore::Frame*)
16  0x10eef12fd WebKit::WebPage::WebPage(unsigned long long, WebKit::WebPageCreationParameters&&)
17  0x10eef03b8 WebKit::WebPage::create(unsigned long long, WebKit::WebPageCreationParameters&&)
18  0x10ef9f5cf WebKit::WebProcess::createWebPage(unsigned long long, WebKit::WebPageCreationParameters&&)

This schedules a timer that calls flushLayers() after a short amount of time. Now that the navigation policy decision is async, this timer can fire during the policy decision and we thus end up committing the layer tree for about:blank, before we've actually started loading the test URL.
Comment 8 Chris Dumez 2018-03-20 22:21:58 PDT
Ok, so I believe the difference in behavior may come from the fact that when the navigation policy delegate is synchronous:
1. about:blank is in the view, there is a timer started to flush layers.
2. We do the synchronous navigation policy for the test page
3. We start provisional load which calls WebFrameLoaderClient::provisionalLoadStarted() which calls didStartPageTransition() which freezes the layer tree.
So when the timer to flush layers fires during the provisional load, we return early because the layer tree is frozen.

When the navigation policy delegate is asynchronous. The timer to flush layers fires during the policy decision, before we've frozen the layer tree (because provisional has not started yet). As a result, we commit the layer tree for about:blank.

I'll experiment tomorrow with freezing the layer tree earlier, when we start the navigation policy decision, to see if this restores previous behavior.
Comment 9 Chris Dumez 2018-03-20 22:27:31 PDT
(In reply to Chris Dumez from comment #8)
> Ok, so I believe the difference in behavior may come from the fact that when
> the navigation policy delegate is synchronous:
> 1. about:blank is in the view, there is a timer started to flush layers.
> 2. We do the synchronous navigation policy for the test page
> 3. We start provisional load which calls
> WebFrameLoaderClient::provisionalLoadStarted() which calls
> didStartPageTransition() which freezes the layer tree.
> So when the timer to flush layers fires during the provisional load, we
> return early because the layer tree is frozen.
> 
> When the navigation policy delegate is asynchronous. The timer to flush
> layers fires during the policy decision, before we've frozen the layer tree
> (because provisional has not started yet). As a result, we commit the layer
> tree for about:blank.
> 
> I'll experiment tomorrow with freezing the layer tree earlier, when we start
> the navigation policy decision, to see if this restores previous behavior.

Yep, looks like this is it. If I freeze the layer tree before the navigation policy decision instead of after, then it restores previous behavior and the API tests start passing again. I'll work on a patch tomorrow.
Comment 10 Chris Dumez 2018-03-21 09:48:01 PDT
Created attachment 336201 [details]
Patch
Comment 11 Wenson Hsieh 2018-03-21 09:59:10 PDT
Comment on attachment 336201 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=336201&action=review

> Source/WebKit/WebProcess/WebCoreSupport/WebFrameLoaderClient.cpp:904
> +    if (!m_frame->isMainFrame())

This seems to be the case, but just wanted to make sure — it's not possible for m_frame->isMainFrame() to be true in WebFrameLoaderClient::dispatchDecidePolicyForNavigationAction but false in WebFrameLoaderClient::didDecidePolicyForNavigationAction, correct?

> Tools/TestWebKitAPI/Tests/ios/ScrollViewInsetTests.mm:36
> +static bool navigationComplete = false;

Nit - I think we now generally avoid using static state variables in API tests if possible (though many older tests still do this). Could we fold this into AsyncPolicyDelegateForInsetTest?
Comment 12 Chris Dumez 2018-03-21 10:08:45 PDT
Created attachment 336202 [details]
Patch
Comment 13 EWS Watchlist 2018-03-21 11:22:21 PDT
Comment on attachment 336202 [details]
Patch

Attachment 336202 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/7053708

New failing tests:
svg/as-image/svg-image-with-data-uri-reloading.html
fast/shadow-dom/clear-active-state-in-shadow.html
Comment 14 EWS Watchlist 2018-03-21 11:22:22 PDT
Created attachment 336212 [details]
Archive of layout-test-results from ews107 for mac-sierra-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-sierra-wk2  Platform: Mac OS X 10.12.6
Comment 15 EWS Watchlist 2018-03-21 11:46:31 PDT
Comment on attachment 336202 [details]
Patch

Attachment 336202 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/7053757

New failing tests:
svg/as-image/svg-image-with-data-uri-reloading.html
Comment 16 EWS Watchlist 2018-03-21 11:46:33 PDT
Created attachment 336214 [details]
Archive of layout-test-results from ews122 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews122  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.6
Comment 17 Chris Dumez 2018-03-21 12:08:04 PDT
Created attachment 336219 [details]
Patch
Comment 18 Chris Dumez 2018-03-21 13:05:15 PDT
Created attachment 336224 [details]
Patch
Comment 19 WebKit Commit Bot 2018-03-21 14:19:11 PDT
Comment on attachment 336224 [details]
Patch

Clearing flags on attachment: 336224

Committed r229828: <https://trac.webkit.org/changeset/229828>
Comment 20 WebKit Commit Bot 2018-03-21 14:19:13 PDT
All reviewed patches have been landed.  Closing bug.
Comment 21 Radar WebKit Bug Importer 2018-03-21 14:20:42 PDT
<rdar://problem/38722914>
Comment 22 Ryan Haddad 2018-03-21 17:10:10 PDT
It appears that this may have caused an API test timeout on iOS:

Tests that timed out:
  WebKit.NoHistoryItemScrollToFragment

https://build.webkit.org/builders/Apple%20iOS%2011%20Simulator%20Release%20WK2%20%28Tests%29/builds/3857
Comment 23 Chris Dumez 2018-03-21 18:50:26 PDT
(In reply to Ryan Haddad from comment #22)
> It appears that this may have caused an API test timeout on iOS:
> 
> Tests that timed out:
>   WebKit.NoHistoryItemScrollToFragment
> 
> https://build.webkit.org/builders/
> Apple%20iOS%2011%20Simulator%20Release%20WK2%20%28Tests%29/builds/3857

Looking now.
Comment 24 Chris Dumez 2018-03-21 18:52:15 PDT
(In reply to Chris Dumez from comment #23)
> (In reply to Ryan Haddad from comment #22)
> > It appears that this may have caused an API test timeout on iOS:
> > 
> > Tests that timed out:
> >   WebKit.NoHistoryItemScrollToFragment
> > 
> > https://build.webkit.org/builders/
> > Apple%20iOS%2011%20Simulator%20Release%20WK2%20%28Tests%29/builds/3857
> 
> Looking now.

via https://bugs.webkit.org/show_bug.cgi?id=183886