Bug 222590

Summary: Maybe-regression(STP121): window.open flakily returning null
Product: WebKit Reporter: Sam Sneddon [:gsnedders] <gsnedders>
Component: FramesAssignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, ap, cdumez, darin, ggaren, robert, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
ews-feeder: commit-queue-
Patch
none
Patch
none
Patch none

Description Sam Sneddon [:gsnedders] 2021-03-02 03:49:27 PST
e.g. see: https://wpt.fyi/results/html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/creating_browsing_context_test_01.html?q=seq%28status%3Aok%7Cstatus%3Apass%20status%3Aok%7Cstatus%3Apass%20status%3Aok%7Cstatus%3Apass%20status%3A%21ok%26status%3A%21pass%20status%3A%21ok%26status%3A%21pass%20status%3A%21ok%26status%3A%21pass%20%29&run_id=5635506173902848&run_id=5722139221032960&run_id=5714889316237312&run_id=5731985836212224&run_id=5670691686842368&run_id=6262793999220736

That it reproduces 100% of the time on Azure Pipelines but only ~15% of the time locally, which makes me suspect this is some sort of race condition where CI always loses due to relatively low performance.

The code here is pretty simple:

  var currentUrl = 'http://' + window.location.host + '/common/blank.html';
  var win = window.open(currentUrl, '', 'height=1,width=1');
  this.add_cleanup(function() { win.close(); });
  win.onload = this.step_func_done(function () {
    ...

This is throwing when trying to set win.onload as win is null. It shouldn't be, it should be a WindowProxy. (Note that WPT is run with WebKitJavaScriptCanOpenWindowsAutomatically set to 1.)

We also see window.open returning null in html/browsers/windows/browsing-context-names/choose-_blank-001.html, which is even simpler:

  var window1 = window.open('about:blank', '_blank');
  var window2 = window.open('about:blank', '_blank');
  var window3 = window.open('about:blank', '_blank');
  ...
Comment 1 Radar WebKit Bug Importer 2021-03-09 03:50:15 PST
<rdar://problem/75211786>
Comment 2 Chris Dumez 2021-03-10 16:33:01 PST
The test is not flaky when run as part as of test suite it seems..
Comment 3 Chris Dumez 2021-03-10 16:36:40 PST
OK, If I load http://w3c-test.org/html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/creating_browsing_context_test_01.html?label=experimental&label=master&aligned in STP 121 I do see one of the subtests failing. I also see the popup blocking notification in the URL bar so it is likely a Safari-side issue..
Comment 4 Chris Dumez 2021-03-10 16:40:30 PST
(In reply to Chris Dumez from comment #3)
> OK, If I load
> http://w3c-test.org/html/browsers/the-window-object/apis-for-creating-and-
> navigating-browsing-contexts-by-name/creating_browsing_context_test_01.
> html?label=experimental&label=master&aligned in STP 121 I do see one of the
> subtests failing. I also see the popup blocking notification in the URL bar
> so it is likely a Safari-side issue..

Actually, it was because I had popups disabled for this site. The per-site settings from my system Safari do not apply to STP it seems. Once I allowed popups on w3c-test.org, I no longer expected any issue.
Comment 5 Chris Dumez 2021-03-10 16:41:44 PST
@Sam Sneddon: How do you reproduce? (even flakily).
Comment 7 Sam Sneddon [:gsnedders] 2021-03-11 10:55:45 PST
(In reply to Chris Dumez from comment #5)
> @Sam Sneddon: How do you reproduce? (even flakily).

From a WPT checkout:

./wpt run --webdriver-binary=/Applications/Safari\ Technology\ Preview.app/Contents/MacOS/safaridriver --rerun 100 --pause-on-unexpected --log-mach - --log-mach-level info safari html/browsers/windows/browsing-context-names/choose-_blank-001.html

(assuming popups have already been allowed)
Comment 8 Chris Dumez 2021-03-15 14:51:45 PDT
Created attachment 423246 [details]
Patch
Comment 9 Darin Adler 2021-03-15 15:08:43 PDT
Comment on attachment 423246 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=423246&action=review

> Source/WebKit/ChangeLog:15
> +        DidCommitLoadForFrame IPC (as well as preceeding load-related messages to maintain relative ordering).

typo: "preceding"

I understand the benefit of setting the flag. What is the risk of setting this flag on DidCommitLoadForFrame?

> Tools/TestWebKitAPI/Tests/WebKit/ModalAlertsSPI.cpp:156
> +        WKRetainPtr<WKContextRef> context = adoptWK(WKContextCreateWithConfiguration(nullptr));

auto?

> Tools/TestWebKitAPI/Tests/WebKit/ModalAlertsSPI.cpp:165
> +        WKRetainPtr<WKStringRef> htmlString = Util::toWK("<script>open('about:blank', '_blank')</script>");

auto?

> Tools/TestWebKitAPI/Tests/WebKit/ModalAlertsSPI.cpp:166
> +        auto blankURL = adoptWK(WKURLCreateWithUTF8CString("about:blank"));

Seems unused.
Comment 10 Chris Dumez 2021-03-15 15:12:24 PDT
(In reply to Darin Adler from comment #9)
> Comment on attachment 423246 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=423246&action=review
> 
> > Source/WebKit/ChangeLog:15
> > +        DidCommitLoadForFrame IPC (as well as preceeding load-related messages to maintain relative ordering).
> 
> typo: "preceding"
> 
> I understand the benefit of setting the flag. What is the risk of setting
> this flag on DidCommitLoadForFrame?

I guess the risk is that DidCommitLoadForFrame now gets processed out of order with regards to another async IPC message.

> 
> > Tools/TestWebKitAPI/Tests/WebKit/ModalAlertsSPI.cpp:156
> > +        WKRetainPtr<WKContextRef> context = adoptWK(WKContextCreateWithConfiguration(nullptr));
> 
> auto?
> 
> > Tools/TestWebKitAPI/Tests/WebKit/ModalAlertsSPI.cpp:165
> > +        WKRetainPtr<WKStringRef> htmlString = Util::toWK("<script>open('about:blank', '_blank')</script>");
> 
> auto?
> 
> > Tools/TestWebKitAPI/Tests/WebKit/ModalAlertsSPI.cpp:166
> > +        auto blankURL = adoptWK(WKURLCreateWithUTF8CString("about:blank"));
> 
> Seems unused.
Comment 11 Chris Dumez 2021-03-15 15:13:30 PDT
Created attachment 423251 [details]
Patch
Comment 12 Geoffrey Garen 2021-03-15 15:13:54 PDT
r=me too

I guess this fix is right but it makes me think that our current design is a little sketchy.

Under what conditions will sync IPC get processed before preceding async IPC? Perhaps a better approach would be to offer an option to annotate the *sync* IPC as allowing preceding async IPC to process first. Then we could adopt that option for CreateNewPage, or maybe for all sync IPC from WebContent. It feels clearer and more future-proof to annotate the IPC that depends on preceding IPC.
Comment 13 Chris Dumez 2021-03-15 15:19:49 PDT
(In reply to Geoffrey Garen from comment #12)
> r=me too
> 
> I guess this fix is right but it makes me think that our current design is a
> little sketchy.

I 100% agree that the current design is super fragile.

> Under what conditions will sync IPC get processed before preceding async
> IPC? 

My understanding of the code is that it happens when the destination process is waiting on a sync IPC reply. To avoid deadlocks, we dispatch sync IPC when a process is waiting for a sync IPC reply. In such cases, the sync IPC message may just the line and get processes other async IPC that was sent shortly before it.

> Perhaps a better approach would be to offer an option to annotate the
> *sync* IPC as allowing preceding async IPC to process first. Then we could
> adopt that option for CreateNewPage, or maybe for all sync IPC from
> WebContent. It feels clearer and more future-proof to annotate the IPC that
> depends on preceding IPC.
Comment 14 Darin Adler 2021-03-15 17:52:22 PDT
Comment on attachment 423251 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=423251&action=review

> Tools/TestWebKitAPI/Tests/WebKit/ModalAlertsSPI.cpp:155
> +        openedWebView = nil;

Should we be nulling this out after running tests? Why leave the last WebView allocated after testing? I think we’d to do that at the end of the ModalAlertsSPI test above, and also here, we could null things out after instead of before.

Also, should this be nullptr instead of nil?
Comment 15 Chris Dumez 2021-03-15 20:04:57 PDT
Created attachment 423287 [details]
Patch
Comment 16 Chris Dumez 2021-03-16 08:57:55 PDT
Created attachment 423336 [details]
Patch
Comment 17 Chris Dumez 2021-03-16 11:52:50 PDT
Created attachment 423372 [details]
Patch
Comment 18 Geoffrey Garen 2021-03-16 14:08:13 PDT
Comment on attachment 423372 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=423372&action=review

> Source/WebKit/Platform/IPC/Connection.cpp:152
> +        // Preserve ordering my dispatching pending async messages first.

my => by
Comment 19 Chris Dumez 2021-03-16 14:09:49 PDT
(In reply to Geoffrey Garen from comment #18)
> Comment on attachment 423372 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=423372&action=review
> 
> > Source/WebKit/Platform/IPC/Connection.cpp:152
> > +        // Preserve ordering my dispatching pending async messages first.
> 
> my => by

Will fix. Sadly it looks like I still have a few layout test failures to figure out :/ This is going me quite a bit of trouble...
Comment 20 Chris Dumez 2021-03-16 16:54:26 PDT
Created attachment 423415 [details]
Patch
Comment 21 Chris Dumez 2021-03-16 18:46:25 PDT
Created attachment 423422 [details]
Patch
Comment 22 Chris Dumez 2021-03-17 08:22:39 PDT
Created attachment 423488 [details]
Patch
Comment 23 Chris Dumez 2021-03-17 09:27:42 PDT
Comment on attachment 423488 [details]
Patch

Requesting review again since I have changed the approach to match Geoff's proposal.
Comment 24 Geoffrey Garen 2021-03-17 09:43:40 PDT
Comment on attachment 423488 [details]
Patch

r=me

I... can't believe this worked? :P But I love it!

Another way to think of MaintainOrderingWithAsyncMessages is that it's a way to say "this sync IPC is actually sent as more than one message, with the final message being marked sync -- but it would be crazy to take all those messages and package them up into one giant sync message, so we use this flag instead".

I bet we can remove DispatchMessageEvenWhenWaitingForSyncReply now, and replace it with MaintainOrderingWithAsyncMessages. The benefit would be that we annotate the message that needs the behavior, rather than the unrelated messages that just get caught in the fray -- so our code becomes more precise, which hopefully makes it more robust.
Comment 25 Darin Adler 2021-03-17 10:03:07 PDT
Comment on attachment 423488 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=423488&action=review

> Source/WebKit/ChangeLog:16
> +        To address the issue, introduce a new IPC::SendSyncOption::MaintainOrderingWithAsyncMessages flag and
> +        use it on WebPageProxy::CreateNewPage sync IPC so that it gets processed in order with surrounding async
> +        messages.

Very exciting, so much better solution.

To extend and refine this in the future:

I think we should explore how far we can go in changing all sync messages to have this behavior. Would that hurt performance in any significant way? If not, I’m not sure we need sync messages that jump the line at all. Having one less wrong choice you could make could be a plus for future WebKit programmers.

Or it’s possible that all sync messages in some particular connection might want this behavior, like all from web process to UI process.

No rush to do this, but I think this could be a help long term.
Comment 26 EWS 2021-03-17 10:09:12 PDT
Committed r274565: <https://commits.webkit.org/r274565>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 423488 [details].