WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
211080
Stop using waiting for a BinarySemaphore on the main thread in the NetworkProcess
https://bugs.webkit.org/show_bug.cgi?id=211080
Summary
Stop using waiting for a BinarySemaphore on the main thread in the NetworkPro...
Alex Christensen
Reported
2020-04-27 09:33:51 PDT
Stop using waiting for a BinarySemaphore on the main thread in the NetworkProcess
Attachments
Patch
(4.27 KB, patch)
2020-04-27 09:36 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(12.35 KB, patch)
2020-04-27 13:43 PDT
,
Alex Christensen
cdumez
: review+
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Alex Christensen
Comment 1
2020-04-27 09:36:57 PDT
Created
attachment 397690
[details]
Patch
Alex Christensen
Comment 2
2020-04-27 09:37:00 PDT
<
rdar://problem/62377357
>
Chris Dumez
Comment 3
2020-04-27 09:45:05 PDT
Comment on
attachment 397690
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=397690&action=review
> Source/WebKit/ChangeLog:9 > + There was an out-of-date comment suggesting we needed to use a semaphore, but we've since added these in the destructor:
Yes, when this code was written, this store lived in the UIProcess, not the network process. Seems have changed a lot since.
Chris Dumez
Comment 4
2020-04-27 12:30:56 PDT
------------------------------ Test suite failed Failed TestWebKitAPI.ResourceLoadStatistics.ChildProcessesNotLaunchedDatabase /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:448 Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] Actual: false Expected: true TestWebKitAPI.ResourceLoadStatistics.ShouldNotGrandfatherOnStartupDatabase _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL. /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:330 Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] Actual: false Expected: true
Alex Christensen
Comment 5
2020-04-27 12:32:08 PDT
When running an individual test, everything's fine. When I run multiple tests, I see lots of these: ERROR: SQLite database failed to set journal_mode to WAL, error: database is locked ./platform/sql/SQLiteDatabase.cpp(176) : void WebCore::SQLiteDatabase::useWALJournalMode() ERROR: SQLite database failed to checkpoint: database is locked ./platform/sql/SQLiteDatabase.cpp(186) : void WebCore::SQLiteDatabase::useWALJournalMode() ERROR: SQLite database failed to checkpoint: disk I/O error ./platform/sql/SQLiteDatabase.cpp(186) : void WebCore::SQLiteDatabase::useWALJournalMode() ASSERTION FAILED: resetResult == 0
Chris Dumez
Comment 6
2020-04-27 12:34:05 PDT
(In reply to Chris Dumez from
comment #4
)
> ------------------------------ > Test suite failed > Failed > TestWebKitAPI.ResourceLoadStatistics.ChildProcessesNotLaunchedDatabase > > > /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/ > TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:448 > Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] > Actual: false > Expected: true > > > TestWebKitAPI.ResourceLoadStatistics.ShouldNotGrandfatherOnStartupDatabase > _RegisterApplication(), FAILED TO establish the default connection > to the WindowServer, _CGSDefaultConnection() is NULL. > > > /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/ > TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:330 > Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] > Actual: false > Expected: true
Interestingly, the 2 corresponding API tests that do not use the Database backend (ResourceLoadStatistics.ChildProcessesNotLaunched & ResourceLoadStatistics.ShouldNotGrandfatherOnStartup) do not fail, so the failure seems specific to using the database backend for ITP.
Chris Dumez
Comment 7
2020-04-27 12:37:01 PDT
(In reply to Alex Christensen from
comment #5
)
> When running an individual test, everything's fine. When I run multiple > tests, I see lots of these: > > ERROR: SQLite database failed to set journal_mode to WAL, error: database is > locked > ./platform/sql/SQLiteDatabase.cpp(176) : void > WebCore::SQLiteDatabase::useWALJournalMode() > ERROR: SQLite database failed to checkpoint: database is locked > ./platform/sql/SQLiteDatabase.cpp(186) : void > WebCore::SQLiteDatabase::useWALJournalMode() > ERROR: SQLite database failed to checkpoint: disk I/O error > ./platform/sql/SQLiteDatabase.cpp(186) : void > WebCore::SQLiteDatabase::useWALJournalMode() > ASSERTION FAILED: resetResult == 0
useWALJournalMode() gets called when opening a SQLite Database. This may indicate that we are trying to open a database file that is still open (or wasn't closed properly). Given that the API test that fail use the SQLite backend, this is likely related. If I had to bet, I would say that the previous code was making sure that the database file was properly flushed/closed before something else was happening, and that's not the case anymore after your change.
Chris Dumez
Comment 8
2020-04-27 12:41:42 PDT
(In reply to Chris Dumez from
comment #7
)
> (In reply to Alex Christensen from
comment #5
) > > When running an individual test, everything's fine. When I run multiple > > tests, I see lots of these: > > > > ERROR: SQLite database failed to set journal_mode to WAL, error: database is > > locked > > ./platform/sql/SQLiteDatabase.cpp(176) : void > > WebCore::SQLiteDatabase::useWALJournalMode() > > ERROR: SQLite database failed to checkpoint: database is locked > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > WebCore::SQLiteDatabase::useWALJournalMode() > > ERROR: SQLite database failed to checkpoint: disk I/O error > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > WebCore::SQLiteDatabase::useWALJournalMode() > > ASSERTION FAILED: resetResult == 0 > > > useWALJournalMode() gets called when opening a SQLite Database. This may > indicate that we are trying to open a database file that is still open (or > wasn't closed properly). Given that the API test that fail use the SQLite > backend, this is likely related. If I had to bet, I would say that the > previous code was making sure that the database file was properly > flushed/closed before something else was happening, and that's not the case > anymore after your change.
When NetworkSession::setResourceLoadStatisticsEnabled() gets called (and it likely gets called during testing to toggle ITP), it destroys/reconstructs the WebResourceLoadStatisticsStore. setResourceLoadStatisticsEnabled(false) would call NetworkSession::destroyResourceLoadStatistics() which would call WebResourceLoadStatisticsStore::didDestroyNetworkSession() which would trigger the code you modified. Then if the test calls setResourceLoadStatisticsEnabled(true) right away, it will try to reopen the same DB file without synchronization (because the new WorkQueue will be used).
Alex Christensen
Comment 9
2020-04-27 12:44:09 PDT
Ah, I can probably make that test-only function use a CompletionHandler to fix it.
Chris Dumez
Comment 10
2020-04-27 12:45:07 PDT
(In reply to Chris Dumez from
comment #8
)
> (In reply to Chris Dumez from
comment #7
) > > (In reply to Alex Christensen from
comment #5
) > > > When running an individual test, everything's fine. When I run multiple > > > tests, I see lots of these: > > > > > > ERROR: SQLite database failed to set journal_mode to WAL, error: database is > > > locked > > > ./platform/sql/SQLiteDatabase.cpp(176) : void > > > WebCore::SQLiteDatabase::useWALJournalMode() > > > ERROR: SQLite database failed to checkpoint: database is locked > > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > > WebCore::SQLiteDatabase::useWALJournalMode() > > > ERROR: SQLite database failed to checkpoint: disk I/O error > > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > > WebCore::SQLiteDatabase::useWALJournalMode() > > > ASSERTION FAILED: resetResult == 0 > > > > > > useWALJournalMode() gets called when opening a SQLite Database. This may > > indicate that we are trying to open a database file that is still open (or > > wasn't closed properly). Given that the API test that fail use the SQLite > > backend, this is likely related. If I had to bet, I would say that the > > previous code was making sure that the database file was properly > > flushed/closed before something else was happening, and that's not the case > > anymore after your change. > > When NetworkSession::setResourceLoadStatisticsEnabled() gets called (and it > likely gets called during testing to toggle ITP), it destroys/reconstructs > the WebResourceLoadStatisticsStore. > setResourceLoadStatisticsEnabled(false) would call > NetworkSession::destroyResourceLoadStatistics() which would call > WebResourceLoadStatisticsStore::didDestroyNetworkSession() which would > trigger the code you modified. Then if the test calls > setResourceLoadStatisticsEnabled(true) right away, it will try to reopen the > same DB file without synchronization (because the new WorkQueue will be > used).
A few early fix ideas: 1. Make NetworkSession::setResourceLoadStatisticsEnabled() async and take a completion handler that only gets called after flushing / closing has happened on the background queue 2. Use a single WorkQueue for all WebResourceLoadStatisticsStore objects instead of one per WebResourceLoadStatisticsStore (so things get serialized properly even after constructing the WebResourceLoadStatisticsStore) 3. Add some synchronization in NetworkSession::setResourceLoadStatisticsEnabled()
Alex Christensen
Comment 11
2020-04-27 12:49:35 PDT
recreateResourceLoadStatisticStore was the problem. Adding CompletionHandler...
Chris Dumez
Comment 12
2020-04-27 12:55:53 PDT
(In reply to Alex Christensen from
comment #11
)
> recreateResourceLoadStatisticStore was the problem. Adding > CompletionHandler...
Yes, a completion handler would work. I may be a bit intrusive as it may require a lot of piping the completion handler through (maybe even though IPC and SPI). Note that option 2 I suggested earlier may be less work (We'd have a serial ITP WorkQueue on the NetworkProcess and use that for all WebResourceLoadStatisticsStore objects). This way things would get serialized nicely (flushing+close then re-opening).
Alex Christensen
Comment 13
2020-04-27 13:43:58 PDT
Created
attachment 397729
[details]
Patch
Chris Dumez
Comment 14
2020-04-27 13:58:06 PDT
Comment on
attachment 397729
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=397729&action=review
r=me but let's wait for EWS to be green to land.
> Source/WebKit/NetworkProcess/NetworkProcess.cpp:270 > + networkSession.destroyResourceLoadStatistics([callbackAggregator = callbackAggregator.copyRef()] { });
Yes, this is probably a good idea to do.
Alex Christensen
Comment 15
2020-04-27 15:33:46 PDT
http://trac.webkit.org/r260791
Alex Christensen
Comment 16
2020-04-28 11:10:45 PDT
This change is suspected to cause iOS crashes like those in API tests in
https://build.webkit.org/builders/Apple%20iOS%2013%20Simulator%20Release%20WK2%20%28Tests%29/builds/4034
Investigating now.
Alex Christensen
Comment 17
2020-04-28 12:01:17 PDT
False alarm. Those tests were caused by
https://trac.webkit.org/changeset/260764/webkit
which is unrelated and fixed in
http://trac.webkit.org/r260836
which is not needed for this fix.
Chris Dumez
Comment 18
2020-04-29 14:34:18 PDT
This introduced a pretty bad regression unfortunately:
https://bugs.webkit.org/show_bug.cgi?id=211207
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug