Bug 237160 - REGRESSION(r290356-r290351?): [ iOS EWS ] 3 imported/w3c/web-platform-tests/service-workers/service-worker/* tests are constant text failures.
Summary: REGRESSION(r290356-r290351?): [ iOS EWS ] 3 imported/w3c/web-platform-tests/s...
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
Keywords: InRadar
: 237158 237161 237162 (view as bug list)
Depends on:
Reported: 2022-02-24 14:39 PST by Dawn Morningstar
Modified: 2022-06-23 15:33 PDT (History)
12 users (show)

See Also:

expectations (1.64 KB, patch)
2022-02-24 15:03 PST, Dawn Morningstar
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
WIP patch (18.33 KB, patch)
2022-03-02 12:55 PST, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (38.38 KB, patch)
2022-03-02 14:43 PST, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (38.27 KB, patch)
2022-03-02 14:51 PST, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (38.29 KB, patch)
2022-03-03 07:21 PST, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (2.53 KB, patch)
2022-03-17 01:16 PDT, youenn fablet
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dawn Morningstar 2022-02-24 14:39:49 PST

Comment 1 Radar WebKit Bug Importer 2022-02-24 14:40:16 PST
Comment 2 Dawn Morningstar 2022-02-24 14:43:22 PST
Failure was reproducible at iOS15 Production TOT - r290435.

Regression range appears to be r290356-r290351.
Comment 3 Dawn Morningstar 2022-02-24 14:50:40 PST
Marking expect
Comment 4 Dawn Morningstar 2022-02-24 14:52:42 PST
(In reply to Matteo Flores from comment #3)
> Marking expect

Marking expectations since these tests are affecting ews negatively.*
Comment 5 Dawn Morningstar 2022-02-24 15:03:11 PST
Created attachment 453143 [details]
Comment 6 Robert Jenner 2022-02-24 15:08:24 PST
Comment on attachment 453143 [details]

Clearing flags on attachment: 453143

Committed r290469 (247768@trunk): <https://commits.webkit.org/247768@trunk>
Comment 7 Robert Jenner 2022-02-24 15:09:07 PST
*** Bug 237161 has been marked as a duplicate of this bug. ***
Comment 8 Robert Jenner 2022-02-24 15:09:26 PST
*** Bug 237162 has been marked as a duplicate of this bug. ***
Comment 9 Robert Jenner 2022-02-24 15:10:58 PST
Since all three of these appear to go together, I have marked my bugs as duplicates. Though, expectations were still landed on them here:



So when this is resolved, those expectations will need to be removed.
Comment 10 youenn fablet 2022-03-02 08:21:49 PST
The only meaningful revision that could be related is https://trac.webkit.org/changeset/290352/webkit
Comment 11 Chris Dumez 2022-03-02 08:26:07 PST
Diff 1:
--- /Volumes/Data/worker/ios-simulator-15-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/service-workers/service-worker/clients-matchall-client-types.https-expected.txt
+++ /Volumes/Data/worker/ios-simulator-15-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/service-workers/service-worker/clients-matchall-client-types.https-actual.txt
@@ -1,4 +1,4 @@
 PASS Verify matchAll() with window client type
-FAIL Verify matchAll() with {window, sharedworker, worker} client types promise_test: Unhandled rejection with value: object "TypeError: null is not an object (evaluating 'w.port.onmessage = _ => resolve(w)')"
+FAIL Verify matchAll() with {window, sharedworker, worker} client types assert_equals: result count expected 1 but got 0

Diff 2:
--- /Volumes/Data/worker/ios-simulator-15-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/service-workers/service-worker/partitioned-service-worker.tentative.https-expected.txt
+++ /Volumes/Data/worker/ios-simulator-15-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/service-workers/service-worker/partitioned-service-worker.tentative.https-actual.txt
@@ -1,6 +1,6 @@
 The 3p iframe's postMessage:
-3p iframe's has_pending: false source: From3pFrame. 1p iframe's has_pending: true source: From1pFrame
+3p iframe's has_pending: false source: From3pFrame.
-PASS Services workers under different top-level sites are partitioned.
+FAIL Services workers under different top-level sites are partitioned. assert_equals: The 1p frames were serviced by different service workers. expected 0.2187869892862967 but got 0.9404173978803455

Diff 3:
--- /Volumes/Data/worker/ios-simulator-15-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/service-workers/service-worker/partitioned-service-worker-getRegistrations.tentative.https-expected.txt
+++ /Volumes/Data/worker/ios-simulator-15-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/service-workers/service-worker/partitioned-service-worker-getRegistrations.tentative.https-actual.txt
@@ -1,4 +1,4 @@
 This test loads a SW in a first-party context and gets the SW's (randomly) generated ID. It does the same thing for the SW but in a third-party context and then confirms that the IDs are different.
-PASS ServiceWorker's getRegistration() is partitioned
+FAIL ServiceWorker's getRegistration() is partitioned assert_true: 1p SW didn't shutdown expected true got false
Comment 12 Chris Dumez 2022-03-02 08:26:42 PST
The first diff doesn't really look like a regression but the other 2 look more concerning. I will investigate.
Comment 13 Chris Dumez 2022-03-02 08:39:09 PST
FYI, if you go to:

in Shipping Safari (on macOS), you'll see that they were already flaky. They fail on first load then succeed on reload (or if the service worker is already running).

I think my patch at r290352 just made this more obvious and we're now failing more consistently. Still worth investigating though. I think this means we're failing to implement part of the spec and somehow we were getting lucky and still passing in test in the context of our layout tests.
Comment 14 Chris Dumez 2022-03-02 10:44:29 PST
Ok, so here is what seems to be happening (I looked at partitioned-service-worker-getRegistrations.tentative.https.html but I am almost certain the issue is the same for the other one).

The test relies on an ID js variable that lives inside the service worker. The test expects this ID to not change during the execution of the test.

The test initially launches a service worker for its origin. Thanks to my change in r290352, we now run this service worker in the same WebProcess as the test (better for perf and memory usage).
The issue is that the test then loads another origin (in an iframe and in a new window). As a result, WebProcessProxy::didStartProvisionalLoadForMainFrame(), we determine that the WebContent process is no longer "origin-clean" (it has loaded more than one eTLD+1). When this happens, we have logic to terminate the service worker that's inside the current process and relaunch it is a new "origin-clean" process.

Because we relaunch the service worker, the ID variable changes and it confuses the test.

Technically, we could say that the test is wrong to assume the service worker will not exit/relaunch during the execution of the test, since service workers can famously be killed at almost any point.
However, the test is relying on FetchEvent.waitUntil() (with a promise that it doesn't resolve until the very end of the test). Per the specification, I believe we're not supposed to exit service workers that have pending events.

So while r290352 made the issue obvious, this is not really a regression from r290352. We were getting lucky before and the service worker was launching in another WebContent process (likely a dedicated one) from the beginning. As a result, the service worker would not relaunch when the test would start loading different origins.
Comment 15 Chris Dumez 2022-03-02 12:55:06 PST
Created attachment 453650 [details]
WIP patch
Comment 16 Chris Dumez 2022-03-02 14:43:03 PST
Created attachment 453662 [details]
Comment 17 Chris Dumez 2022-03-02 14:51:18 PST
Created attachment 453663 [details]
Comment 18 youenn fablet 2022-03-02 23:55:06 PST
Comment on attachment 453663 [details]

For shared workers, I would tend to use a different process:
- this is slightly better from a security point of view.
- web sites may start to rely on the 10 seconds delay for shared worker which might become flaky behavior.
- we do not have the same launch speed requirement for shared workers as we have for service workers.
Comment 19 Chris Dumez 2022-03-03 07:03:36 PST
Comment on attachment 453663 [details]

Let’s discuss the shared workers separately.
Comment 20 EWS 2022-03-03 07:04:40 PST
Tools/Scripts/svn-apply failed to apply attachment 453663 [details] to trunk.
Please resolve the conflicts and upload a new patch.
Comment 21 Chris Dumez 2022-03-03 07:21:43 PST
Created attachment 453733 [details]
Comment 22 EWS 2022-03-03 08:10:47 PST
Committed r290776 (248020@main): <https://commits.webkit.org/248020@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 453733 [details].
Comment 23 Truitt Savell 2022-03-15 10:58:55 PDT
After the changes in https://github.com/WebKit/WebKit/commit/2fe065c2c83cfd99bcad7214e74ac72c6a49a4e6

two of the tests are now working again. but imported/w3c/web-platform-tests/service-workers/service-worker/fetch-request-no-freshness-headers.https.html is now broken on Mac out and still partially flaky on iOS


--- /Volumes/Data/worker/bigsur-release-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/service-workers/service-worker/fetch-request-no-freshness-headers.https-expected.txt
+++ /Volumes/Data/worker/bigsur-release-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/service-workers/service-worker/fetch-request-no-freshness-headers.https-actual.txt
@@ -1,3 +1,3 @@
-PASS The headers of FetchEvent shouldn't contain freshness headers.
+FAIL The headers of FetchEvent shouldn't contain freshness headers. assert_false: if-none-match header must not be set in the FetchEvent's request. (url = https://localhost:9443/service-workers/service-worker/resources/fetch-request-no-freshness-headers-script.py) expected false got true
Comment 24 youenn fablet 2022-03-17 01:03:03 PDT
Alone the issue does not reproduce for LayoutTests/imported/w3c/web-platform-tests/service-
But it reproduces when running LayoutTests/imported/w3c/web-platform-tests/service-
workers/service-worker/fetch-request-no-freshness-headers.https.html and imported/w3c/web-platform-tests/service-workers/service-worker/fetch-frame-resource.https.html together.

When run alone, the service worker is in the same process as the test page and the same memory cache singleton is used. In that case, because we have a check that service worker identifiers do not match, we do not reuse the memory cache.

When the two tests are run, the service worker is in its own process and the memory cache kicks in for the web page. This adds the If-None-Match header before sending it to the service worker which will see this header.
Comment 25 youenn fablet 2022-03-17 01:16:16 PDT
Reopening to attach new patch.
Comment 26 youenn fablet 2022-03-17 01:16:20 PDT
Created attachment 454939 [details]
Comment 27 EWS 2022-03-18 08:09:55 PDT
Committed r291485 (248599@main): <https://commits.webkit.org/248599@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 454939 [details].
Comment 28 Brent Fulgham 2022-06-23 15:33:57 PDT
*** Bug 237158 has been marked as a duplicate of this bug. ***