...
Created attachment 380807 [details] Patch
Comment on attachment 380807 [details] Patch It looks like some tests are timing out. My best guess is that some calls to RunLoop::dispatch() implicitly rely on ordering, and when we don’t use the explicit function queue, we break that ordering. I think it’s worth testing dispatch_async as the implementation of wakeUp. (We tested that once, but only with a broken patch that called isMain(), which is incorrect). That way, we preserve ordering. I’d change wakeUp to take an argument indicating whether this == s_mainRunLoop. Then, if true, the CF implementation should use dispatch_async to call the function that executes all tasks, rather than CFRunLoopWakeup. If that’s not fast enough, or it breaks things, then we should just add an inline function that calls dispatch_async or RunLoop::dispatch, depending on platform. And then call that just from IDB code.
Created attachment 380845 [details] patch - mach message Out of curiosity, I disassembled libdispatch to see what method it uses to signal the RunLoop. It looks like libdispatch uses a "version 1" RunLoop source <https://developer.apple.com/documentation/corefoundation/cfrunloopsource?language=objc>, which triggers by mach message, rather than CFRunLoopWakeup(). I've duplicated that behavior here. Can you test the performance of this approach?
(In reply to Geoffrey Garen from comment #3) (I verified that this patch passes the css3 layout tests.)
Created attachment 380846 [details] patch - mach message
Created attachment 380851 [details] patch - getUserMedia.html It looks like the iOS API test failure for TestWebKitAPI.WebKit2.CaptureMute is real. The test is racy, and its timing has changed slightly. Here's a patch that should make the test more deterministic.
(In reply to Geoffrey Garen from comment #3) > Created attachment 380845 [details] > patch - mach message > > Out of curiosity, I disassembled libdispatch to see what method it uses to > signal the RunLoop. It looks like libdispatch uses a "version 1" RunLoop > source > <https://developer.apple.com/documentation/corefoundation/ > cfrunloopsource?language=objc>, which triggers by mach message, rather than > CFRunLoopWakeup(). I've duplicated that behavior here. > > Can you test the performance of this approach? It's about 15% speed up on the test PerformanceTests/IndexedDB/basic/objectstore-get.html.
> > Can you test the performance of this approach? > > It's about 15% speed up on the test > PerformanceTests/IndexedDB/basic/objectstore-get.html. So, still slower than dispatch_async?
Oh, actually, it's expected that the patch above doesn't achieve all 20%, because it doesn't include the fix for callOnMainThread. I'll upload a patch that includes callOnMainThread and the regression test change all in one.
Created attachment 380952 [details] patch - mach message
(In reply to Geoffrey Garen from comment #9) > Oh, actually, it's expected that the patch above doesn't achieve all 20%, > because it doesn't include the fix for callOnMainThread. I'll upload a patch > that includes callOnMainThread and the regression test change all in one. I previously made the change of the callOnMainThread locally and tested. I tried with the latest patch and it seemed it's still not as fast as using dispatch_async directly. It's close though. Also I debugged the layout test failure of the dispatch_async patch. The cause is that we try to block main thread when waiting for reply of sync messages. Our implementation is: 1. Start a task on main thread 2. While not receiving reply, ask RunLoop to run. RunLoop can listen for input sources and processes data from sources(receiving and handling reply) at this time. (Thread is blocked.) 3. Continue to finish the task It looks like when we do dispatch_async, the RunLoop doesn't think it is a source and thus time out at step2.
(In reply to Sihui Liu from comment #11) > (In reply to Geoffrey Garen from comment #9) > > Oh, actually, it's expected that the patch above doesn't achieve all 20%, > > because it doesn't include the fix for callOnMainThread. I'll upload a patch > > that includes callOnMainThread and the regression test change all in one. > > I previously made the change of the callOnMainThread locally and tested. I > tried with the latest patch and it seemed it's still not as fast as using > dispatch_async directly. It's close though. > > Also I debugged the layout test failure of the dispatch_async patch. The > cause is that we try to block main thread when waiting for reply of sync > messages. Our implementation is: > 1. Start a task on main thread > 2. While not receiving reply, ask RunLoop to run. RunLoop can listen for > input sources and processes data from sources(receiving and handling reply) > at this time. (Thread is blocked.) > 3. Continue to finish the task > It looks like when we do dispatch_async, the RunLoop doesn't think it is a > source and thus time out at step2. Is this in Connection::waitForSyncReply? Or something else? I suspect that dispatch_async is not willing to execute inside a nested RunLoop. I believe that's a fundamental difference between libdispatch and CFRunLoop. (libdispatch requires all tasks to dispatch in order, and that means prohibiting re-entrant dispatch. Relatedly, libdispatch explicitly documents that dispatch_sync(dispatch_get_current_queue(), ...) is always a deadlock.) So, I guess we can't use dispatch_async. Let's try to finish up the mach_message / version 1 RunLoopSource solution, since it delivers most of the speedup. Can you do that? Outstanding issues: 1. inspector/css/pseudo-creation.html fails. It used to fail sometimes (e.g. https://build.webkit.org/builders/Apple%20High%20Sierra%20Debug%20WK1%20(Tests)/builds/11323), and now it fails all the time. So, the code has gotten better (more deterministic), but we broke a test. It looks like something about InspectorProtocol.eventHandler is racy relative to promise resolution. We need to fix this, or perhaps disable the test. 2. TestWebKitAPI.WebKitLegacy.ScrollingDoesNotPauseMedia times out. It must either time out in its call to Util::run, or its call to callOnMainThreadAndWait. I think it's probably callOnMainThreadAndWait. If so, this test has a simple bug. It calls "callOnMainThreadAndWait" *from the main thread*. I'm not sure exactly why this ever worked, but I think you can fix it by just switching to callOnMainThread.
Created attachment 381101 [details] Patch
Created attachment 381103 [details] Patch
Comment on attachment 381103 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=381103&action=review > Source/WTF/ChangeLog:9 > + We used CFRunLoopWakeUp to wake up runloop to process source, which seems to be slow according to profiling. To > + avoid calling CFRunLoopWakeUp, use CFRunloopSource1 instead of CFRunloopSource0. What is CFRunloopSource1? Why is this faster? If it's not the current default, what is the reason it's not and what potential downsides are there?
Comment on attachment 381103 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=381103&action=review Looks great, and thanks for fixing those tests. One comment on callOonMainThread. > Source/WTF/wtf/cocoa/MainThreadCocoa.mm:136 > - [staticMainThreadCaller performSelectorOnMainThread:@selector(call) withObject:nil waitUntilDone:NO]; > + dispatch_async(dispatch_get_main_queue(), ^{ > + [staticMainThreadCaller call]; > + }); You can remove some indirection here by deleting staticMainThreadCaller and JSWTFMainThreadCaller and just calling WTF::dispatchFunctionsFromMainThread() directly. staticMainThreadCaller and JSWTFMainThreadCaller only existed for the sake of doing the async dispatch, which you're doing a different way now. Side note: Is dispatch_async better than RunLoop::main().dispatch() in this case? Also, is it OK that callOnMainThread will not run in nested run loops?
Created attachment 381106 [details] Patch
(In reply to Sam Weinig from comment #15) > Comment on attachment 381103 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=381103&action=review > > > Source/WTF/ChangeLog:9 > > + We used CFRunLoopWakeUp to wake up runloop to process source, which seems to be slow according to profiling. To > > + avoid calling CFRunLoopWakeUp, use CFRunloopSource1 instead of CFRunloopSource0. > > What is CFRunloopSource1? Why is this faster? If it's not the current > default, what is the reason it's not and what potential downsides are there? It's version 1 source in CFRunloop: https://developer.apple.com/documentation/corefoundation/cfrunloopsource-rhr According to our profiling on test like OpenSource/PerformanceTests/IndexedDB/basic/objectstore-get.html, we spent a lot of time on waking up the runloop when receiving IPC message. And we tried libdispatch, which uses version 1 source for dispatching to runloop, and is much faster. So we think we could probably adopt this for better performance. I don't know why version 0 source is our default..
(In reply to Geoffrey Garen from comment #16) > Comment on attachment 381103 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=381103&action=review > > Looks great, and thanks for fixing those tests. One comment on > callOonMainThread. > > > Source/WTF/wtf/cocoa/MainThreadCocoa.mm:136 > > - [staticMainThreadCaller performSelectorOnMainThread:@selector(call) withObject:nil waitUntilDone:NO]; > > + dispatch_async(dispatch_get_main_queue(), ^{ > > + [staticMainThreadCaller call]; > > + }); > > You can remove some indirection here by deleting staticMainThreadCaller and > JSWTFMainThreadCaller and just calling > WTF::dispatchFunctionsFromMainThread() directly. staticMainThreadCaller and > JSWTFMainThreadCaller only existed for the sake of doing the async dispatch, > which you're doing a different way now. > staticMainThreadCaller is used in the #if USE(WEB_THREAD) case, which is a few lines above. I would change this to WTF::dispatchFunctionsFromMainThread(). > Side note: Is dispatch_async better than RunLoop::main().dispatch() in this > case? Also, is it OK that callOnMainThread will not run in nested run loops? The reason why I didn't use RunLoop::main().dispatch() because RunLoop::main() needs to run after RunLoop::initializeMainRunLoop() and callOnMainThread does not seem to care about whether we have initialized mainRunLoop or not. Why can't callOnMainThread run in nested run loops?
Created attachment 381124 [details] Patch
(In reply to Sihui Liu from comment #18) > (In reply to Sam Weinig from comment #15) > > Comment on attachment 381103 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=381103&action=review > > > > > Source/WTF/ChangeLog:9 > > > + We used CFRunLoopWakeUp to wake up runloop to process source, which seems to be slow according to profiling. To > > > + avoid calling CFRunLoopWakeUp, use CFRunloopSource1 instead of CFRunloopSource0. > > > > What is CFRunloopSource1? Why is this faster? If it's not the current > > default, what is the reason it's not and what potential downsides are there? > > It's version 1 source in CFRunloop: > https://developer.apple.com/documentation/corefoundation/cfrunloopsource-rhr > > According to our profiling on test like > OpenSource/PerformanceTests/IndexedDB/basic/objectstore-get.html, we spent a > lot of time on waking up the runloop when receiving IPC message. And we > tried libdispatch, which uses version 1 source for dispatching to runloop, > and is much faster. So we think we could probably adopt this for better > performance. Does that mean this is not a speedup on it's own?
(In reply to Sam Weinig from comment #21) > (In reply to Sihui Liu from comment #18) > > (In reply to Sam Weinig from comment #15) > > > Comment on attachment 381103 [details] > > > Patch > > > > > > View in context: > > > https://bugs.webkit.org/attachment.cgi?id=381103&action=review > > > > > > > Source/WTF/ChangeLog:9 > > > > + We used CFRunLoopWakeUp to wake up runloop to process source, which seems to be slow according to profiling. To > > > > + avoid calling CFRunLoopWakeUp, use CFRunloopSource1 instead of CFRunloopSource0. > > > > > > What is CFRunloopSource1? Why is this faster? If it's not the current > > > default, what is the reason it's not and what potential downsides are there? > > > > It's version 1 source in CFRunloop: > > https://developer.apple.com/documentation/corefoundation/cfrunloopsource-rhr > > > > According to our profiling on test like > > OpenSource/PerformanceTests/IndexedDB/basic/objectstore-get.html, we spent a > > lot of time on waking up the runloop when receiving IPC message. And we > > tried libdispatch, which uses version 1 source for dispatching to runloop, > > and is much faster. So we think we could probably adopt this for better > > performance. > > Does that mean this is not a speedup on it's own? It's a speedup on it's own. The patch gives us about 15% speedup on the test we profiled (PerformanceTests/IndexedDB/basic/objectstore-get.html). It saves time by eliminating CFRunLoopWakeUps from the trace.
> staticMainThreadCaller is used in the #if USE(WEB_THREAD) case, which is a > few lines above. You can change that case to stop using staticMainThreadCaller too. > > Side note: Is dispatch_async better than RunLoop::main().dispatch() in this > > case? Also, is it OK that callOnMainThread will not run in nested run loops? > > The reason why I didn't use RunLoop::main().dispatch() because > RunLoop::main() needs to run after RunLoop::initializeMainRunLoop() and > callOnMainThread does not seem to care about whether we have initialized > mainRunLoop or not. callOnMainThread depends on initializeMainThread() to initialize staticMainThreadCaller. > Why can't callOnMainThread run in nested run loops? Because dispatch_async doesn't run in nested run loops. (I confirmed this theory at https://www.thecave.com/2015/08/10/dispatch-async-to-main-queue-doesnt-work-with-modal-window-on-mac-os-x/.) If there isn't a specific advantage to dispatch_async here, maybe RunLoop::main() is better, to avoid this minor change in behavior.
Created attachment 381180 [details] Patch
(In reply to Sihui Liu from comment #22) > (In reply to Sam Weinig from comment #21) > > (In reply to Sihui Liu from comment #18) > > > (In reply to Sam Weinig from comment #15) > > > > Comment on attachment 381103 [details] > > > > Patch > > > > > > > > View in context: > > > > https://bugs.webkit.org/attachment.cgi?id=381103&action=review > > > > > > > > > Source/WTF/ChangeLog:9 > > > > > + We used CFRunLoopWakeUp to wake up runloop to process source, which seems to be slow according to profiling. To > > > > > + avoid calling CFRunLoopWakeUp, use CFRunloopSource1 instead of CFRunloopSource0. > > > > > > > > What is CFRunloopSource1? Why is this faster? If it's not the current > > > > default, what is the reason it's not and what potential downsides are there? > > > > > > It's version 1 source in CFRunloop: > > > https://developer.apple.com/documentation/corefoundation/cfrunloopsource-rhr > > > > > > According to our profiling on test like > > > OpenSource/PerformanceTests/IndexedDB/basic/objectstore-get.html, we spent a > > > lot of time on waking up the runloop when receiving IPC message. And we > > > tried libdispatch, which uses version 1 source for dispatching to runloop, > > > and is much faster. So we think we could probably adopt this for better > > > performance. > > > > Does that mean this is not a speedup on it's own? > > It's a speedup on it's own. The patch gives us about 15% speedup on the test > we profiled (PerformanceTests/IndexedDB/basic/objectstore-get.html). It > saves time by eliminating CFRunLoopWakeUps from the trace. Very nice!
Comment on attachment 381180 [details] Patch r=me
Created attachment 381217 [details] Patch for landing
The commit-queue encountered the following flaky tests while processing attachment 381217 [details]: inspector/console/webcore-logging.html bug 203118 (authors: drousso@apple.com and eric.carlson@apple.com) The commit-queue is continuing to process your patch.
The commit-queue encountered the following flaky tests while processing attachment 381217 [details]: imported/w3c/web-platform-tests/websockets/bufferedAmount-unchanged-by-sync-xhr.any.worker.html bug 202003 (author: youennf@gmail.com) The commit-queue is continuing to process your patch.
Comment on attachment 381217 [details] Patch for landing Clearing flags on attachment: 381217 Committed r251261: <https://trac.webkit.org/changeset/251261>
All reviewed patches have been landed. Closing bug.
<rdar://problem/56387205>
It looks like the changes in https://trac.webkit.org/changeset/251261/webkit broke inspector/console/webcore-logging.html History: https://results.webkit.org/?suite=layout-tests&test=inspector%2Fconsole%2Fwebcore-logging.html Diff: --- /Volumes/Data/slave/mojave-release-tests-wk1/build/layout-test-results/inspector/console/webcore-logging-expected.txt +++ /Volumes/Data/slave/mojave-release-tests-wk1/build/layout-test-results/inspector/console/webcore-logging-actual.txt @@ -27,7 +27,9 @@ -- Running test case: Console.Logging.MediaLogging PASS: Media logging disabled. PASS: Media logging has been enabled. -PASS: Media log message should have source 'media'. +FAIL: Media log message should have source 'media'. + Expected: truthy + Actual: false -- Running test case: Console.Logging.LogAsJSONWithoutRepeat PASS: Media logging disabled. This reproduces easily by running the test in iterations. It fails on 251261 but not 251260.
This also broke inspector/runtime/getProperties.html History https://results.webkit.org/?suite=layout-tests&test=inspector%2Fruntime%2FgetProperties.html
Reverted r251261 for reason: This broke multiple tests Committed r251514: <https://trac.webkit.org/changeset/251514>
(In reply to Truitt Savell from comment #35) > Reverted r251261 for reason: > > This broke multiple tests > > Committed r251514: <https://trac.webkit.org/changeset/251514> inspector/console/webcore-logging.html was fixed by r251482 (and got reverted too), which fixed the test and made another meaningful bug fix. inspector/runtime/getProperties.html is caused by that the test file doesn't handle error case correctly. (The diff is one error console message.) scrollingcoordinator/ios/non-stable-viewport-scroll.html is flaky by its nature, as I commented in rdar://problem/56410921 and I replied in Slack. Are there any other failure caused by this patch? If yes, what are they? If no, what led this to be reverted?
(In reply to Sihui Liu from comment #36) > (In reply to Truitt Savell from comment #35) > > Reverted r251261 for reason: > > > > This broke multiple tests > > > > Committed r251514: <https://trac.webkit.org/changeset/251514> > > inspector/console/webcore-logging.html was fixed by r251482 (and got > reverted too), which fixed the test and made another meaningful bug fix. > > inspector/runtime/getProperties.html is caused by that the test file doesn't > handle error case correctly. (The diff is one error console message.) > > scrollingcoordinator/ios/non-stable-viewport-scroll.html is flaky by its > nature, as I commented in rdar://problem/56410921 and I replied in Slack. > > Are there any other failure caused by this patch? If yes, what are they? > If no, what led this to be reverted? Those are the tests that lead us to roll this patch out
> inspector/console/webcore-logging.html was fixed by r251482 (and got > reverted too), which fixed the test and made another meaningful bug fix. Sounds great! > inspector/runtime/getProperties.html is caused by that the test file doesn't > handle error case correctly. (The diff is one error console message.) Does this mean that inspector/runtime/getProperties.html does fail when this patch is applied? If so, I'd suggest updating the test to handle error or updating the expected result to include the console message when re-landing. > scrollingcoordinator/ios/non-stable-viewport-scroll.html is flaky by its > nature, as I commented in rdar://problem/56410921 and I replied in Slack. Was this test failing sometimes before this patch landed? If so, seems OK to ignore. If not, we probably need to skip this test until it gets fixed.
> inspector/runtime/getProperties.html is caused by that the test file doesn't handle error case correctly. (The diff is one error console message.) For this case, let's use if (window.internals) window.internals.settings.setUnhandledPromiseRejectionToConsoleEnabled(false); and then update the test result not to expect a console message.
> scrollingcoordinator/ios/non-stable-viewport-scroll.html is flaky by its > nature, as I commented in rdar://problem/56410921 and I replied in Slack. For this case, the test is marked as flaky, so I think it's OK to land without fixing it, since doing so won't make the bots red. (We analyzed this test a bit, and fixing it will be non-trivial. More information from <rdar://problem/56410921>: "Talked to Simon, the test is flaky without my patch. We found some graphics work is done after callback of uiController.doAfterPresentationUpdate and RenderLayerCompositor::layerTreeAsText, but layerTreeAsText should show us the final result. With my patch, some of those work (result is not reflected in the -expected.txt file) has bigger chance to finish sooner than layerTreeAsText, so the test has become more flaky." )
Seems like this patch was a significant (~11%) StyleBench perf regression: https://perf.webkit.org/v3/#/charts?since=1571129192465&paneList=((20-1649-36468600-null-(5-2.5-500))) Rollout brought the perf back where it was. This should be looked into before relanding.
(In reply to Antti Koivisto from comment #41) > Seems like this patch was a significant (~11%) StyleBench perf regression: > > https://perf.webkit.org/v3/#/charts?since=1571129192465&paneList=((20-1649- > 36468600-null-(5-2.5-500))) > > Rollout brought the perf back where it was. This should be looked into > before relanding. That's surprising! How can I run this test?
(In reply to Sihui Liu from comment #42) > (In reply to Antti Koivisto from comment #41) > > Seems like this patch was a significant (~11%) StyleBench perf regression: > > > > https://perf.webkit.org/v3/#/charts?since=1571129192465&paneList=((20-1649- > > 36468600-null-(5-2.5-500))) > > > > Rollout brought the perf back where it was. This should be looked into > > before relanding. > > That's surprising! How can I run this test? OpenSource/PerformanceTests/StyleBench/index.html
cd PerformanceTests/StyleBench python -m SimpleHTTPServer 8000 navigate to localhost:8000 Alternatively individual subtests can be run with web server by opening PerformanceTests/StyleBench/InteractiveRunner.html
> Alternatively individual subtests can be run with web server by opening *without web server
There is also Tools/Scripts/run-perf-tests <path>
Are there any show stoppers for landing this? It might help with first paint performance (type 1 sources don't prevent runloop observers from firing and so don't delay layer flushes). I don't think StyleBench regression is a show stopper, it is not a zero regression test. It sounded like it was a bug in how measurements are made we understood what was wrong.
(In reply to Antti Koivisto from comment #47) > Are there any show stoppers for landing this? It might help with first paint > performance (type 1 sources don't prevent runloop observers from firing and > so don't delay layer flushes). > > I don't think StyleBench regression is a show stopper, it is not a zero > regression test. It sounded like it was a bug in how measurements are made > we understood what was wrong. No,the we could try landing this again.
I still think we should try to commit this but I'm not sure if this will help first paint perf due to excessive batching before an observer fires (namely the one that causes layer flush to occur). As far as I can tell, it looks like servicing a type 0 source will still turn the entire run loop and fire observers. After a type 0 source is handled: 1. The GCD main queue port is polled with 0 timeout; if this succeeds, go to (5) skipping before/after waiting observers 2. Before waiting observers fired 3. wakeup/timer/source1 port set polled with a 0 timeout 4. after waiting observers fired 5. GCD main queue / wakeup / timer / source1 callback runs This patch mainly seems to optimize things by using a source1 mach port to wake up the run loop instead of CFRunLoopWakeUp. This is more efficient because this patch just sends a mach message, while CFRunLoopWakeUp both takes a lock and sends a mach message. But both before and after this patch, we can end up handling a bunch of IPC messages in a row before a run loop observer fires because of the way WTF::RunLoop::performWork batches up work.
(In reply to Ben Nham from comment #49) > I still think we should try to commit this but I'm not sure if this will > help first paint perf due to excessive batching before an observer fires > (namely the one that causes layer flush to occur). As far as I can tell, it > looks like servicing a type 0 source will still turn the entire run loop and > fire observers. It does not fire "before waiting" observers, which the CoreAnimation commit observer is: https://github.com/opensource-apple/CF/blob/master/CFRunLoop.c#L2403 https://github.com/opensource-apple/CF/blob/master/CFRunLoop.c#L2425 After a type 0 source is handled: > > 1. The GCD main queue port is polled with 0 timeout; if this succeeds, go to > (5) skipping before/after waiting observers > 2. Before waiting observers fired No, because of "if (!poll". > 3. wakeup/timer/source1 port set polled with a 0 timeout > 4. after waiting observers fired > 5. GCD main queue / wakeup / timer / source1 callback runs > > This patch mainly seems to optimize things by using a source1 mach port to > wake up the run loop instead of CFRunLoopWakeUp. This is more efficient > because this patch just sends a mach message, while CFRunLoopWakeUp both > takes a lock and sends a mach message. But both before and after this patch, > we can end up handling a bunch of IPC messages in a row before a run loop > observer fires because of the way WTF::RunLoop::performWork batches up work.
I see, I missed that. So the hypothesis is basically that an IPC that causes a layout to occur in a source 0 callback, which then calls flushLayers, and then CA's auto-commit observer doesn't run as soon as it should because of this source 0 behavior?
(In reply to Ben Nham from comment #51) > I see, I missed that. So the hypothesis is basically that an IPC that causes > a layout to occur in a source 0 callback, which then calls flushLayers, and > then CA's auto-commit observer doesn't run as soon as it should because of > this source 0 behavior? More simply put, a flood of IPC messages can postpone rendering for many event loop cycles.
Created attachment 390927 [details] Patch
Created attachment 391575 [details] Patch
Created attachment 391705 [details] Patch
Created attachment 401478 [details] Patch - just the RunLoop parts Let's see if we can land the refactoring part of this patch without the behavior change, to shrink the diff and help move to just one function queue.
Comment on attachment 401478 [details] Patch - just the RunLoop parts Looks green. Do we need a new bug for this change?
Comment on attachment 401478 [details] Patch - just the RunLoop parts View in context: https://bugs.webkit.org/attachment.cgi?id=401478&action=review > Source/WTF/ChangeLog:3 > + Use CFRunLoopSource1 for faster task dispatch The tittle does not describe this patch.
Comment on attachment 401478 [details] Patch - just the RunLoop parts Yeah, I'll obsolete this patch and post in a separate bug.
Refactoring patch is now posted @ https://bugs.webkit.org/show_bug.cgi?id=213043.
Created attachment 414072 [details] Patch
Comment on attachment 414072 [details] Patch Re-based patch.
Created attachment 414073 [details] Patch
Note using version 1 source may cause issue in our test runner, details in <rdar://problem/59637925>.
Created attachment 414266 [details] Patch
Created attachment 414303 [details] Patch
Created attachment 414710 [details] microbenchmark Uploading a microbenchmark for posterity. Interestingly, the fastest case for V0 and V1 is about equal. But on average V0 is 2X slower; and in the worst case, it's 5X slower, causing a full 2ms of latency!
Comment on attachment 414303 [details] Patch Tests pass and microbenchmark confirms a speedup. So, I think this is ready for review.
Committed r270132: <https://trac.webkit.org/changeset/270132> All reviewed patches have been landed. Closing bug and clearing flags on attachment 414303 [details].
Reverted r270132 for reason: Caused at least one regression test failure (webkit.org/b/219401) Committed r270311: <https://trac.webkit.org/changeset/270311>
(In reply to Ryan Haddad from comment #70) > Caused at least one regression test failure (webkit.org/b/219401) It appears that these WPT worklet tests that had become flaky on iOS are OK again after the revert, so these were also affected: https://results.webkit.org/?suite=layout-tests&suite=layout-tests&suite=layout-tests&test=imported%2Fw3c%2Fweb-platform-tests%2Fworklets%2Fanimation-worklet-csp.https.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fworklets%2Fanimation-worklet-referrer.https.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fworklets%2Flayout-worklet-referrer.https.html https://build.webkit.org/results/Apple-iOS-14-Simulator-Release-WK2-Tests/r270300%20(1074)/results.html
Created attachment 415939 [details] Patch
Previous test failures should be resolved by: https://bugs.webkit.org/show_bug.cgi?id=219717 https://bugs.webkit.org/show_bug.cgi?id=219656
Committed r270661: <https://trac.webkit.org/changeset/270661> All reviewed patches have been landed. Closing bug and clearing flags on attachment 415939 [details].
Looks like there are still more failures/timeouts because of this change. The failures are a bit difficult to enumerate because some are flaky, but here are the ones that clearly start with this change: https://results.webkit.org/?suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&test=fast%2Ftext%2Fcanvas-color-fonts%2FCOLR.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-001.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-003.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-001.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-002.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-003.html
Two more that appear to be iOS specific: https://results.webkit.org/?suite=layout-tests&suite=layout-tests&test=imported%2Fw3c%2Fweb-platform-tests%2Fworkers%2Fmodules%2Fshared-worker-import-csp.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fworklets%2Flayout-worklet-csp.https.html
Another: https://results.webkit.org/?suite=layout-tests&test=inspector%2Fformatting%2Fsource-map-javascript-2.html
Reverted r270661 for reason: Caused layout test failures and timeouts Committed r270746: <https://trac.webkit.org/changeset/270746>
(In reply to Ryan Haddad from comment #78) > Reverted r270661 for reason: > > Caused layout test failures and timeouts Here is one link that shows all the ones I've found so far: https://results.webkit.org/?suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&test=fast%2Ftext%2Fcanvas-color-fonts%2FCOLR.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-001.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-003.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-001.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-002.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-003.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fworkers%2Fmodules%2Fshared-worker-import-csp.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fworklets%2Flayout-worklet-csp.https.html&test=inspector%2Fformatting%2Fsource-map-javascript-2.html&test=inspector%2Fruntime%2Fparse.html
Comment on attachment 415939 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415939&action=review > Source/WTF/wtf/cf/RunLoopCF.cpp:76 > + mach_msg_header_t header; > + header.msgh_bits = MACH_MSGH_BITS(MACH_MSG_TYPE_COPY_SEND, 0); > + header.msgh_size = sizeof(mach_msg_header_t); > + header.msgh_remote_port = CFMachPortGetPort(m_port.get()); > + header.msgh_local_port = MACH_PORT_NULL; > + header.msgh_id = 0; > + mach_msg_return_t result = mach_msg(&header, MACH_SEND_MSG | MACH_SEND_TIMEOUT, header.msgh_size, 0, MACH_PORT_NULL, 0, MACH_PORT_NULL); > + RELEASE_ASSERT(result == MACH_MSG_SUCCESS || result == MACH_SEND_TIMED_OUT); > + if (result == MACH_SEND_TIMED_OUT) > + mach_msg_destroy(&header); Could this be simplified by using CFMessagePort instead of CFMachPort?
> Could this be simplified by using CFMessagePort instead of CFMachPort? Unfortunately, CFMessagePortSendRequest still locks a pthread_mutex_t before sending, and pthread_mutex_t is most of the overhead we are trying to avoid.
(In reply to Geoffrey Garen from comment #81) > > Could this be simplified by using CFMessagePort instead of CFMachPort? > > Unfortunately, CFMessagePortSendRequest still locks a pthread_mutex_t before > sending, and pthread_mutex_t is most of the overhead we are trying to avoid. Ah, got it, thanks! If there is a little bit of flakiness associated with this patch, could this possibly be addressed by dispatching off a run loop observer for the main thread run loop, in order to make the ordering deterministic?
> If there is a little bit of flakiness associated with this patch, could this > possibly be addressed by dispatching off a run loop observer for the main > thread run loop, in order to make the ordering deterministic? I had the same thought! :P I do think that approach *could* work. I spent a few days trying to make that work, but I ended up with too many test failures. It's possible that I did not succeed in making behavior more deterministic; or maybe I succeeded in making behavior more deterministic, but deterministically failing. One gotcha is that run loop observers don't run at *exactly* the same point as run loop sources. Another gotcha is that the relative order of run loop observers is different in the case where you first call CFRunLoopRun() vs the case where you were already sleeping inside CFRunLoopRun(). My next move is to A/B test to see if the change is worth it. If it is, I think I can make things more deterministic by standardizing how we schedule tasks. (I started this in https://trac.webkit.org/changeset/271493/webkit.)