Summary: | Stop using waiting for a BinarySemaphore on the main thread in the NetworkProcess | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Alex Christensen <achristensen> | ||||||
Component: | New Bugs | Assignee: | Alex Christensen <achristensen> | ||||||
Status: | RESOLVED FIXED | ||||||||
Severity: | Normal | CC: | cdumez, darin, webkit-bug-importer | ||||||
Priority: | P2 | Keywords: | InRadar | ||||||
Version: | WebKit Nightly Build | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Bug Depends on: | 211207 | ||||||||
Bug Blocks: | |||||||||
Attachments: |
|
Description
Alex Christensen
2020-04-27 09:33:51 PDT
Created attachment 397690 [details]
Patch
Comment on attachment 397690 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=397690&action=review > Source/WebKit/ChangeLog:9 > + There was an out-of-date comment suggesting we needed to use a semaphore, but we've since added these in the destructor: Yes, when this code was written, this store lived in the UIProcess, not the network process. Seems have changed a lot since. ------------------------------ Test suite failed Failed TestWebKitAPI.ResourceLoadStatistics.ChildProcessesNotLaunchedDatabase /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:448 Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] Actual: false Expected: true TestWebKitAPI.ResourceLoadStatistics.ShouldNotGrandfatherOnStartupDatabase _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL. /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:330 Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] Actual: false Expected: true When running an individual test, everything's fine. When I run multiple tests, I see lots of these: ERROR: SQLite database failed to set journal_mode to WAL, error: database is locked ./platform/sql/SQLiteDatabase.cpp(176) : void WebCore::SQLiteDatabase::useWALJournalMode() ERROR: SQLite database failed to checkpoint: database is locked ./platform/sql/SQLiteDatabase.cpp(186) : void WebCore::SQLiteDatabase::useWALJournalMode() ERROR: SQLite database failed to checkpoint: disk I/O error ./platform/sql/SQLiteDatabase.cpp(186) : void WebCore::SQLiteDatabase::useWALJournalMode() ASSERTION FAILED: resetResult == 0 (In reply to Chris Dumez from comment #4) > ------------------------------ > Test suite failed > Failed > TestWebKitAPI.ResourceLoadStatistics.ChildProcessesNotLaunchedDatabase > > > /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/ > TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:448 > Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] > Actual: false > Expected: true > > > TestWebKitAPI.ResourceLoadStatistics.ShouldNotGrandfatherOnStartupDatabase > _RegisterApplication(), FAILED TO establish the default connection > to the WindowServer, _CGSDefaultConnection() is NULL. > > > /Volumes/Data/worker/macOS-Mojave-Release-Build-EWS/build/Tools/ > TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:330 > Value of: [message isEqualToString:@"PopulatedWithoutGrandfathering"] > Actual: false > Expected: true Interestingly, the 2 corresponding API tests that do not use the Database backend (ResourceLoadStatistics.ChildProcessesNotLaunched & ResourceLoadStatistics.ShouldNotGrandfatherOnStartup) do not fail, so the failure seems specific to using the database backend for ITP. (In reply to Alex Christensen from comment #5) > When running an individual test, everything's fine. When I run multiple > tests, I see lots of these: > > ERROR: SQLite database failed to set journal_mode to WAL, error: database is > locked > ./platform/sql/SQLiteDatabase.cpp(176) : void > WebCore::SQLiteDatabase::useWALJournalMode() > ERROR: SQLite database failed to checkpoint: database is locked > ./platform/sql/SQLiteDatabase.cpp(186) : void > WebCore::SQLiteDatabase::useWALJournalMode() > ERROR: SQLite database failed to checkpoint: disk I/O error > ./platform/sql/SQLiteDatabase.cpp(186) : void > WebCore::SQLiteDatabase::useWALJournalMode() > ASSERTION FAILED: resetResult == 0 useWALJournalMode() gets called when opening a SQLite Database. This may indicate that we are trying to open a database file that is still open (or wasn't closed properly). Given that the API test that fail use the SQLite backend, this is likely related. If I had to bet, I would say that the previous code was making sure that the database file was properly flushed/closed before something else was happening, and that's not the case anymore after your change. (In reply to Chris Dumez from comment #7) > (In reply to Alex Christensen from comment #5) > > When running an individual test, everything's fine. When I run multiple > > tests, I see lots of these: > > > > ERROR: SQLite database failed to set journal_mode to WAL, error: database is > > locked > > ./platform/sql/SQLiteDatabase.cpp(176) : void > > WebCore::SQLiteDatabase::useWALJournalMode() > > ERROR: SQLite database failed to checkpoint: database is locked > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > WebCore::SQLiteDatabase::useWALJournalMode() > > ERROR: SQLite database failed to checkpoint: disk I/O error > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > WebCore::SQLiteDatabase::useWALJournalMode() > > ASSERTION FAILED: resetResult == 0 > > > useWALJournalMode() gets called when opening a SQLite Database. This may > indicate that we are trying to open a database file that is still open (or > wasn't closed properly). Given that the API test that fail use the SQLite > backend, this is likely related. If I had to bet, I would say that the > previous code was making sure that the database file was properly > flushed/closed before something else was happening, and that's not the case > anymore after your change. When NetworkSession::setResourceLoadStatisticsEnabled() gets called (and it likely gets called during testing to toggle ITP), it destroys/reconstructs the WebResourceLoadStatisticsStore. setResourceLoadStatisticsEnabled(false) would call NetworkSession::destroyResourceLoadStatistics() which would call WebResourceLoadStatisticsStore::didDestroyNetworkSession() which would trigger the code you modified. Then if the test calls setResourceLoadStatisticsEnabled(true) right away, it will try to reopen the same DB file without synchronization (because the new WorkQueue will be used). Ah, I can probably make that test-only function use a CompletionHandler to fix it. (In reply to Chris Dumez from comment #8) > (In reply to Chris Dumez from comment #7) > > (In reply to Alex Christensen from comment #5) > > > When running an individual test, everything's fine. When I run multiple > > > tests, I see lots of these: > > > > > > ERROR: SQLite database failed to set journal_mode to WAL, error: database is > > > locked > > > ./platform/sql/SQLiteDatabase.cpp(176) : void > > > WebCore::SQLiteDatabase::useWALJournalMode() > > > ERROR: SQLite database failed to checkpoint: database is locked > > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > > WebCore::SQLiteDatabase::useWALJournalMode() > > > ERROR: SQLite database failed to checkpoint: disk I/O error > > > ./platform/sql/SQLiteDatabase.cpp(186) : void > > > WebCore::SQLiteDatabase::useWALJournalMode() > > > ASSERTION FAILED: resetResult == 0 > > > > > > useWALJournalMode() gets called when opening a SQLite Database. This may > > indicate that we are trying to open a database file that is still open (or > > wasn't closed properly). Given that the API test that fail use the SQLite > > backend, this is likely related. If I had to bet, I would say that the > > previous code was making sure that the database file was properly > > flushed/closed before something else was happening, and that's not the case > > anymore after your change. > > When NetworkSession::setResourceLoadStatisticsEnabled() gets called (and it > likely gets called during testing to toggle ITP), it destroys/reconstructs > the WebResourceLoadStatisticsStore. > setResourceLoadStatisticsEnabled(false) would call > NetworkSession::destroyResourceLoadStatistics() which would call > WebResourceLoadStatisticsStore::didDestroyNetworkSession() which would > trigger the code you modified. Then if the test calls > setResourceLoadStatisticsEnabled(true) right away, it will try to reopen the > same DB file without synchronization (because the new WorkQueue will be > used). A few early fix ideas: 1. Make NetworkSession::setResourceLoadStatisticsEnabled() async and take a completion handler that only gets called after flushing / closing has happened on the background queue 2. Use a single WorkQueue for all WebResourceLoadStatisticsStore objects instead of one per WebResourceLoadStatisticsStore (so things get serialized properly even after constructing the WebResourceLoadStatisticsStore) 3. Add some synchronization in NetworkSession::setResourceLoadStatisticsEnabled() recreateResourceLoadStatisticStore was the problem. Adding CompletionHandler... (In reply to Alex Christensen from comment #11) > recreateResourceLoadStatisticStore was the problem. Adding > CompletionHandler... Yes, a completion handler would work. I may be a bit intrusive as it may require a lot of piping the completion handler through (maybe even though IPC and SPI). Note that option 2 I suggested earlier may be less work (We'd have a serial ITP WorkQueue on the NetworkProcess and use that for all WebResourceLoadStatisticsStore objects). This way things would get serialized nicely (flushing+close then re-opening). Created attachment 397729 [details]
Patch
Comment on attachment 397729 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=397729&action=review r=me but let's wait for EWS to be green to land. > Source/WebKit/NetworkProcess/NetworkProcess.cpp:270 > + networkSession.destroyResourceLoadStatistics([callbackAggregator = callbackAggregator.copyRef()] { }); Yes, this is probably a good idea to do. This change is suspected to cause iOS crashes like those in API tests in https://build.webkit.org/builders/Apple%20iOS%2013%20Simulator%20Release%20WK2%20%28Tests%29/builds/4034 Investigating now. False alarm. Those tests were caused by https://trac.webkit.org/changeset/260764/webkit which is unrelated and fixed in http://trac.webkit.org/r260836 which is not needed for this fix. This introduced a pretty bad regression unfortunately: https://bugs.webkit.org/show_bug.cgi?id=211207 |