RESOLVED FIXED 211080
Stop using waiting for a BinarySemaphore on the main thread in the NetworkProcess
https://bugs.webkit.org/show_bug.cgi?id=211080
Summary Stop using waiting for a BinarySemaphore on the main thread in the NetworkPro...
Alex Christensen
Reported 2020-04-27 09:33:51 PDT
Stop using waiting for a BinarySemaphore on the main thread in the NetworkProcess
Attachments
Patch (4.27 KB, patch)
2020-04-27 09:36 PDT, Alex Christensen
no flags
Patch (12.35 KB, patch)
2020-04-27 13:43 PDT, Alex Christensen
cdumez: review+
Alex Christensen
Comment 1 2020-04-27 09:36:57 PDT
Alex Christensen
Comment 2 2020-04-27 09:37:00 PDT
Chris Dumez
Comment 3 2020-04-27 09:45:05 PDT
Comment on attachment 397690 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=397690&action=review > Source/WebKit/ChangeLog:9 > + There was an out-of-date comment suggesting we needed to use a semaphore, but we've since added these in the destructor: Yes, when this code was written, this store lived in the UIProcess, not the network process. Seems have changed a lot since.
Chris Dumez
Comment 4 2020-04-27 12:30:56 PDT
------------------------------ Test suite failed Failed TestWebKitAPI.ResourceLoadStatistics.ChildProcessesNotLaunchedDatabase /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:448 Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] Actual: false Expected: true TestWebKitAPI.ResourceLoadStatistics.ShouldNotGrandfatherOnStartupDatabase _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL. /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:330 Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] Actual: false Expected: true
Alex Christensen
Comment 5 2020-04-27 12:32:08 PDT
When running an individual test, everything's fine. When I run multiple tests, I see lots of these: ERROR: SQLite database failed to set journal_mode to WAL, error: database is locked ./platform/sql/SQLiteDatabase.cpp(176) : void WebCore::SQLiteDatabase::useWALJournalMode() ERROR: SQLite database failed to checkpoint: database is locked ./platform/sql/SQLiteDatabase.cpp(186) : void WebCore::SQLiteDatabase::useWALJournalMode() ERROR: SQLite database failed to checkpoint: disk I/O error ./platform/sql/SQLiteDatabase.cpp(186) : void WebCore::SQLiteDatabase::useWALJournalMode() ASSERTION FAILED: resetResult == 0
Chris Dumez
Comment 6 2020-04-27 12:34:05 PDT
(In reply to Chris Dumez from comment #4) > ------------------------------ > Test suite failed > Failed > TestWebKitAPI.ResourceLoadStatistics.ChildProcessesNotLaunchedDatabase > > > /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/ > TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:448 > Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] > Actual: false > Expected: true > > > TestWebKitAPI.ResourceLoadStatistics.ShouldNotGrandfatherOnStartupDatabase > _RegisterApplication(), FAILED TO establish the default connection > to the WindowServer, _CGSDefaultConnection() is NULL. > > > /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/ > TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:330 > Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] > Actual: false > Expected: true Interestingly, the 2 corresponding API tests that do not use the Database backend (ResourceLoadStatistics.ChildProcessesNotLaunched & ResourceLoadStatistics.ShouldNotGrandfatherOnStartup) do not fail, so the failure seems specific to using the database backend for ITP.
Chris Dumez
Comment 7 2020-04-27 12:37:01 PDT
(In reply to Alex Christensen from comment #5) > When running an individual test, everything's fine. When I run multiple > tests, I see lots of these: > > ERROR: SQLite database failed to set journal_mode to WAL, error: database is > locked > ./platform/sql/SQLiteDatabase.cpp(176) : void > WebCore::SQLiteDatabase::useWALJournalMode() > ERROR: SQLite database failed to checkpoint: database is locked > ./platform/sql/SQLiteDatabase.cpp(186) : void > WebCore::SQLiteDatabase::useWALJournalMode() > ERROR: SQLite database failed to checkpoint: disk I/O error > ./platform/sql/SQLiteDatabase.cpp(186) : void > WebCore::SQLiteDatabase::useWALJournalMode() > ASSERTION FAILED: resetResult == 0 useWALJournalMode() gets called when opening a SQLite Database. This may indicate that we are trying to open a database file that is still open (or wasn't closed properly). Given that the API test that fail use the SQLite backend, this is likely related. If I had to bet, I would say that the previous code was making sure that the database file was properly flushed/closed before something else was happening, and that's not the case anymore after your change.
Chris Dumez
Comment 8 2020-04-27 12:41:42 PDT
(In reply to Chris Dumez from comment #7) > (In reply to Alex Christensen from comment #5) > > When running an individual test, everything's fine. When I run multiple > > tests, I see lots of these: > > > > ERROR: SQLite database failed to set journal_mode to WAL, error: database is > > locked > > ./platform/sql/SQLiteDatabase.cpp(176) : void > > WebCore::SQLiteDatabase::useWALJournalMode() > > ERROR: SQLite database failed to checkpoint: database is locked > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > WebCore::SQLiteDatabase::useWALJournalMode() > > ERROR: SQLite database failed to checkpoint: disk I/O error > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > WebCore::SQLiteDatabase::useWALJournalMode() > > ASSERTION FAILED: resetResult == 0 > > > useWALJournalMode() gets called when opening a SQLite Database. This may > indicate that we are trying to open a database file that is still open (or > wasn't closed properly). Given that the API test that fail use the SQLite > backend, this is likely related. If I had to bet, I would say that the > previous code was making sure that the database file was properly > flushed/closed before something else was happening, and that's not the case > anymore after your change. When NetworkSession::setResourceLoadStatisticsEnabled() gets called (and it likely gets called during testing to toggle ITP), it destroys/reconstructs the WebResourceLoadStatisticsStore. setResourceLoadStatisticsEnabled(false) would call NetworkSession::destroyResourceLoadStatistics() which would call WebResourceLoadStatisticsStore::didDestroyNetworkSession() which would trigger the code you modified. Then if the test calls setResourceLoadStatisticsEnabled(true) right away, it will try to reopen the same DB file without synchronization (because the new WorkQueue will be used).
Alex Christensen
Comment 9 2020-04-27 12:44:09 PDT
Ah, I can probably make that test-only function use a CompletionHandler to fix it.
Chris Dumez
Comment 10 2020-04-27 12:45:07 PDT
(In reply to Chris Dumez from comment #8) > (In reply to Chris Dumez from comment #7) > > (In reply to Alex Christensen from comment #5) > > > When running an individual test, everything's fine. When I run multiple > > > tests, I see lots of these: > > > > > > ERROR: SQLite database failed to set journal_mode to WAL, error: database is > > > locked > > > ./platform/sql/SQLiteDatabase.cpp(176) : void > > > WebCore::SQLiteDatabase::useWALJournalMode() > > > ERROR: SQLite database failed to checkpoint: database is locked > > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > > WebCore::SQLiteDatabase::useWALJournalMode() > > > ERROR: SQLite database failed to checkpoint: disk I/O error > > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > > WebCore::SQLiteDatabase::useWALJournalMode() > > > ASSERTION FAILED: resetResult == 0 > > > > > > useWALJournalMode() gets called when opening a SQLite Database. This may > > indicate that we are trying to open a database file that is still open (or > > wasn't closed properly). Given that the API test that fail use the SQLite > > backend, this is likely related. If I had to bet, I would say that the > > previous code was making sure that the database file was properly > > flushed/closed before something else was happening, and that's not the case > > anymore after your change. > > When NetworkSession::setResourceLoadStatisticsEnabled() gets called (and it > likely gets called during testing to toggle ITP), it destroys/reconstructs > the WebResourceLoadStatisticsStore. > setResourceLoadStatisticsEnabled(false) would call > NetworkSession::destroyResourceLoadStatistics() which would call > WebResourceLoadStatisticsStore::didDestroyNetworkSession() which would > trigger the code you modified. Then if the test calls > setResourceLoadStatisticsEnabled(true) right away, it will try to reopen the > same DB file without synchronization (because the new WorkQueue will be > used). A few early fix ideas: 1. Make NetworkSession::setResourceLoadStatisticsEnabled() async and take a completion handler that only gets called after flushing / closing has happened on the background queue 2. Use a single WorkQueue for all WebResourceLoadStatisticsStore objects instead of one per WebResourceLoadStatisticsStore (so things get serialized properly even after constructing the WebResourceLoadStatisticsStore) 3. Add some synchronization in NetworkSession::setResourceLoadStatisticsEnabled()
Alex Christensen
Comment 11 2020-04-27 12:49:35 PDT
recreateResourceLoadStatisticStore was the problem. Adding CompletionHandler...
Chris Dumez
Comment 12 2020-04-27 12:55:53 PDT
(In reply to Alex Christensen from comment #11) > recreateResourceLoadStatisticStore was the problem. Adding > CompletionHandler... Yes, a completion handler would work. I may be a bit intrusive as it may require a lot of piping the completion handler through (maybe even though IPC and SPI). Note that option 2 I suggested earlier may be less work (We'd have a serial ITP WorkQueue on the NetworkProcess and use that for all WebResourceLoadStatisticsStore objects). This way things would get serialized nicely (flushing+close then re-opening).
Alex Christensen
Comment 13 2020-04-27 13:43:58 PDT
Chris Dumez
Comment 14 2020-04-27 13:58:06 PDT
Comment on attachment 397729 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=397729&action=review r=me but let's wait for EWS to be green to land. > Source/WebKit/NetworkProcess/NetworkProcess.cpp:270 > + networkSession.destroyResourceLoadStatistics([callbackAggregator = callbackAggregator.copyRef()] { }); Yes, this is probably a good idea to do.
Alex Christensen
Comment 15 2020-04-27 15:33:46 PDT
Alex Christensen
Comment 16 2020-04-28 11:10:45 PDT
This change is suspected to cause iOS crashes like those in API tests in https://build.webkit.org/builders/Apple%20iOS%2013%20Simulator%20Release%20WK2%20%28Tests%29/builds/4034 Investigating now.
Alex Christensen
Comment 17 2020-04-28 12:01:17 PDT
False alarm. Those tests were caused by https://trac.webkit.org/changeset/260764/webkit which is unrelated and fixed in http://trac.webkit.org/r260836 which is not needed for this fix.
Chris Dumez
Comment 18 2020-04-29 14:34:18 PDT
This introduced a pretty bad regression unfortunately: https://bugs.webkit.org/show_bug.cgi?id=211207
Note You need to log in before you can comment on or make changes to this bug.