Bug 187123

Summary: IndexedDB operations in a Page fail after a StorageProcess crash
Product: WebKit Reporter: Dawei Fenton (:realdawei) <realdawei>
Component: Tools / TestsAssignee: Brady Eidson <beidson>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, aestes, alecflett, ap, beidson, commit-queue, ews-watchlist, jbedard, jsbell, lforschler, ryanhaddad, thorton, tsavell, webkit-bug-importer, youennf
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Test order to reproduce
none
Patch
none
Patch
none
Patch
achristensen: review+, ews-watchlist: commit-queue-
Archive of layout-test-results from ews204 for win-future
none
Patch for EWS/Landing
none
Archive of layout-test-results from ews204 for win-future none

Description Dawei Fenton (:realdawei) 2018-06-27 16:19:26 PDT
There are a bunch of tests in a the imported/w3c/web-platform-tests/IndexedDB/* when when one fails the majority will fail in the same way (generally over 300 tests will fail).

Possible issue could be state being retained in the TestRunner.

Sample Run:
https://build.webkit.org/builders/Apple%20High%20Sierra%20Release%20WK2%20%28Tests%29/builds/5203/steps/layout-test/logs/stdio
Comment 1 Jonathan Bedard 2018-06-27 16:24:11 PDT
Created attachment 343766 [details]
Test order to reproduce
Comment 2 Dawei Fenton (:realdawei) 2018-06-27 17:54:32 PDT
(In reply to Jonathan Bedard from comment #1)
> Created attachment 343766 [details]
> Test order to reproduce

after further testing it turns out if these tests are performed in the same run (in any order) the test will fail.  When any one of tests is removed it will pass.

http/wpt/service-workers/persistent-importScripts.html
http/wpt/cache-storage/cache-put-keys.https.any.worker.html
imported/w3c/web-platform-tests/IndexedDB/close-in-upgradeneeded.html

sample output:
Regressions: Unexpected text-only failures (1)
  imported/w3c/web-platform-tests/IndexedDB/close-in-upgradeneeded.html [ Failure ]
Comment 3 Jonathan Bedard 2018-06-28 08:38:36 PDT
Given that we haven't seen this until recently, I would suspect that <https://trac.webkit.org/changeset/232516/webkit> is the root cause.
Comment 4 Radar WebKit Bug Importer 2018-06-29 02:31:50 PDT
<rdar://problem/41626526>
Comment 5 Brady Eidson 2018-07-05 12:35:22 PDT
Pages and contexts that use IDB (Documents, Workers, etc) do not recover well from a StorageProcess crash.

They don't crash, but they're unaware that the StorageProcess is gone and therefore their messages to do IDB stuff are lost in the ether and never answered.

I have a fix to make Pages recover from this and send contexts into a mode where requests after the storage process crash will at least error out immediately.
Comment 6 Brady Eidson 2018-07-05 12:44:24 PDT
Created attachment 344354 [details]
Patch
Comment 7 Brady Eidson 2018-07-05 12:45:02 PDT
Haven't even finished building this patch locally, but it should be final or near final. Trying EWS.
Comment 8 Brady Eidson 2018-07-05 13:30:19 PDT
(whoops forgot test expectations)
Comment 9 Brady Eidson 2018-07-05 13:38:18 PDT
Created attachment 344359 [details]
Patch
Comment 10 Alex Christensen 2018-07-05 13:45:17 PDT
Comment on attachment 344359 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=344359&action=review

> Source/WebCore/Modules/indexeddb/client/IDBConnectionToServer.cpp:67
> +void IDBConnectionToServer::callResultFunctionLater(ResultFunction function, const IDBResourceIdentifier& requestIdentifier)

I think this name should indicate that it will be called with an error.

> Source/WebCore/Modules/indexeddb/client/IDBConnectionToServer.cpp:364
> +        callOnMainThread([this, protectedThis = makeRef(*this), transactionIdentifier] {

Why can't these use callResultFunctionLater?

> Source/WebCore/Modules/indexeddb/client/IDBConnectionToServer.cpp:502
> +    if (m_serverConnectionIsValid)
> +        m_delegate->abortOpenAndUpgradeNeeded(databaseConnectionIdentifier, transactionIdentifier);

Couldn't the delegate check if its connection is valid instead of all these checks here?

> LayoutTests/storage/indexeddb/modern/opendatabase-after-storage-crash.html:49
> +		// Padding it makes this reliable.

how reliable?
Comment 11 Brady Eidson 2018-07-05 13:54:03 PDT
(In reply to Alex Christensen from comment #10)
> Comment on attachment 344359 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=344359&action=review
> 
> > Source/WebCore/Modules/indexeddb/client/IDBConnectionToServer.cpp:67
> > +void IDBConnectionToServer::callResultFunctionLater(ResultFunction function, const IDBResourceIdentifier& requestIdentifier)
> 
> I think this name should indicate that it will be called with an error.

Sure.

> 
> > Source/WebCore/Modules/indexeddb/client/IDBConnectionToServer.cpp:364
> > +        callOnMainThread([this, protectedThis = makeRef(*this), transactionIdentifier] {
> 
> Why can't these use callResultFunctionLater?

Because they have custom call signatures, not IDBResults.

I played with full lambdas, full templetization, and mixes - This was the cleanest/most compact form.

> 
> > Source/WebCore/Modules/indexeddb/client/IDBConnectionToServer.cpp:502
> > +    if (m_serverConnectionIsValid)
> > +        m_delegate->abortOpenAndUpgradeNeeded(databaseConnectionIdentifier, transactionIdentifier);
> 
> Couldn't the delegate check if its connection is valid instead of all these
> checks here?

That'd require a patch over twice as large, as we'd have to cover 2 delegates (WK1 and WK2)

> 
> > LayoutTests/storage/indexeddb/modern/opendatabase-after-storage-crash.html:49
> > +		// Padding it makes this reliable.
> 
> how reliable?

100% reliableā„¢
Comment 12 Brady Eidson 2018-07-05 13:58:13 PDT
Created attachment 344362 [details]
Patch
Comment 13 Alex Christensen 2018-07-05 14:01:43 PDT
Comment on attachment 344362 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=344362&action=review

> Source/WebCore/Modules/indexeddb/client/IDBConnectionToServer.cpp:67
> +void IDBConnectionToServer::callResultFunctionWithErrorLater(ResultFunction function, const IDBResourceIdentifier& requestIdentifier)

It could be made more compact by passing the const IDBRequestData&
Comment 14 Brady Eidson 2018-07-05 14:10:07 PDT
(In reply to Alex Christensen from comment #13)
> Comment on attachment 344362 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=344362&action=review
> 
> > Source/WebCore/Modules/indexeddb/client/IDBConnectionToServer.cpp:67
> > +void IDBConnectionToServer::callResultFunctionWithErrorLater(ResultFunction function, const IDBResourceIdentifier& requestIdentifier)
> 
> It could be made more compact by passing the const IDBRequestData&

Fewer characters in the signature, more in the function body. /me shrugs
Comment 15 EWS Watchlist 2018-07-05 15:43:41 PDT
Comment on attachment 344362 [details]
Patch

Attachment 344362 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/8449520

New failing tests:
http/tests/security/contentSecurityPolicy/userAgentShadowDOM/allow-video.html
http/tests/security/local-video-source-from-remote.html
http/tests/preload/onload_event.html
Comment 16 EWS Watchlist 2018-07-05 15:43:53 PDT
Created attachment 344375 [details]
Archive of layout-test-results from ews204 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews204  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 17 Brady Eidson 2018-07-05 19:51:15 PDT
The commit queue is in quite a state right now, that's for sure.
Comment 18 Brady Eidson 2018-07-05 19:53:24 PDT
The EWS bots were clearly just in a bad state... let's try this again.
Comment 19 Brady Eidson 2018-07-05 19:54:44 PDT
Created attachment 344397 [details]
Patch for EWS/Landing
Comment 20 EWS Watchlist 2018-07-05 21:48:01 PDT
Comment on attachment 344397 [details]
Patch for EWS/Landing

Attachment 344397 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/8452509

New failing tests:
http/tests/security/video-poster-cross-origin-crash2.html
Comment 21 EWS Watchlist 2018-07-05 21:48:13 PDT
Created attachment 344401 [details]
Archive of layout-test-results from ews204 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews204  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 22 Brady Eidson 2018-07-05 22:20:07 PDT
Nope this patch did not cause that failurw
Comment 23 WebKit Commit Bot 2018-07-05 22:48:42 PDT
Comment on attachment 344397 [details]
Patch for EWS/Landing

Clearing flags on attachment: 344397

Committed r233562: <https://trac.webkit.org/changeset/233562>
Comment 24 WebKit Commit Bot 2018-07-05 22:48:44 PDT
All reviewed patches have been landed.  Closing bug.
Comment 25 Truitt Savell 2018-07-06 10:03:04 PDT
It looks like the test storage/indexeddb/modern/opendatabase-after-storage-crash.html
is failing both with timeouts and text failures 

test history: 
https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=storage%2Findexeddb%2Fmodern%2Fopendatabase-after-storage-crash.html
Comment 26 Truitt Savell 2018-07-09 10:08:46 PDT
UPDATE:
It looks like the test storage/indexeddb/modern/opendatabase-after-storage-crash.html
is still a flakey failure, though the timeouts seem to be largely resolved. 

test history: 
https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=storage%2Findexeddb%2Fmodern%2Fopendatabase-after-storage-crash.html
Comment 27 Alexey Proskuryakov 2018-07-17 15:34:47 PDT
The test is still failing about 50% of the time, tracked in bug 187648.