202874 – Use a Version 1 CFRunLoopSource for faster task dispatch

REOPENED 202874

Use a Version 1 CFRunLoopSource for faster task dispatch

https://bugs.webkit.org/show_bug.cgi?id=202874

Summary Use a Version 1 CFRunLoopSource for faster task dispatch

Sihui Liu

Reported 2019-10-11 19:10:30 PDT

...

Attachments
Patch (4.63 KB, patch) 2019-10-11 19:22 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
patch - mach message (2.62 KB, patch) 2019-10-13 08:48 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
patch - mach message (2.76 KB, patch) 2019-10-13 08:52 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
patch - getUserMedia.html (3.04 KB, patch) 2019-10-13 14:20 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
patch - mach message (8.26 KB, patch) 2019-10-14 20:26 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
Patch (14.05 KB, patch) 2019-10-16 13:12 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (13.75 KB, patch) 2019-10-16 13:27 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (13.76 KB, patch) 2019-10-16 14:11 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (13.77 KB, patch) 2019-10-16 15:31 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (17.71 KB, patch) 2019-10-17 01:30 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch for landing (17.39 KB, patch) 2019-10-17 12:58 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (8.09 KB, patch) 2020-02-17 09:59 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (10.82 KB, patch) 2020-02-24 14:19 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (10.79 KB, patch) 2020-02-25 17:10 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch - just the RunLoop parts (5.23 KB, patch) 2020-06-09 14:57 PDT, Geoffrey Garen	koivisto: review+ koivisto: commit-queue-	Details Formatted Diff Diff
Patch (4.19 KB, patch) 2020-11-13 11:54 PST, Geoffrey Garen	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (4.37 KB, patch) 2020-11-13 12:00 PST, Geoffrey Garen	no flags	Details Formatted Diff Diff
Patch (7.82 KB, patch) 2020-11-16 12:17 PST, Geoffrey Garen	no flags	Details Formatted Diff Diff
Patch (5.93 KB, patch) 2020-11-16 20:40 PST, Geoffrey Garen	no flags	Details Formatted Diff Diff
microbenchmark (40.26 KB, application/zip) 2020-11-20 13:28 PST, Geoffrey Garen	no flags	Details
Patch (4.75 KB, patch) 2020-12-10 15:35 PST, Geoffrey Garen	no flags	Details Formatted Diff Diff
Show Obsolete (18) View All Add attachment proposed patch, testcase, etc.

Sihui Liu

Comment 1 2019-10-11 19:22:52 PDT

Created attachment 380807 [details] Patch

Geoffrey Garen

Comment 2 2019-10-12 11:36:10 PDT

Comment on attachment 380807 [details] Patch It looks like some tests are timing out. My best guess is that some calls to RunLoop::dispatch() implicitly rely on ordering, and when we don’t use the explicit function queue, we break that ordering. I think it’s worth testing dispatch_async as the implementation of wakeUp. (We tested that once, but only with a broken patch that called isMain(), which is incorrect). That way, we preserve ordering. I’d change wakeUp to take an argument indicating whether this == s_mainRunLoop. Then, if true, the CF implementation should use dispatch_async to call the function that executes all tasks, rather than CFRunLoopWakeup. If that’s not fast enough, or it breaks things, then we should just add an inline function that calls dispatch_async or RunLoop::dispatch, depending on platform. And then call that just from IDB code.

Geoffrey Garen

Comment 3 2019-10-13 08:48:05 PDT

Created attachment 380845 [details] patch - mach message Out of curiosity, I disassembled libdispatch to see what method it uses to signal the RunLoop. It looks like libdispatch uses a "version 1" RunLoop source <https://developer.apple.com/documentation/corefoundation/cfrunloopsource?language=objc>, which triggers by mach message, rather than CFRunLoopWakeup(). I've duplicated that behavior here. Can you test the performance of this approach?

Geoffrey Garen

Comment 4 2019-10-13 08:48:55 PDT

(In reply to Geoffrey Garen from comment #3) (I verified that this patch passes the css3 layout tests.)

Geoffrey Garen

Comment 5 2019-10-13 08:52:03 PDT

Created attachment 380846 [details] patch - mach message

Geoffrey Garen

Comment 6 2019-10-13 14:20:42 PDT

Created attachment 380851 [details] patch - getUserMedia.html It looks like the iOS API test failure for TestWebKitAPI.WebKit2.CaptureMute is real. The test is racy, and its timing has changed slightly. Here's a patch that should make the test more deterministic.

Sihui Liu

Comment 7 2019-10-13 23:35:16 PDT

(In reply to Geoffrey Garen from comment #3) > Created attachment 380845 [details] > patch - mach message > > Out of curiosity, I disassembled libdispatch to see what method it uses to > signal the RunLoop. It looks like libdispatch uses a "version 1" RunLoop > source > <https://developer.apple.com/documentation/corefoundation/ > cfrunloopsource?language=objc>, which triggers by mach message, rather than > CFRunLoopWakeup(). I've duplicated that behavior here. > > Can you test the performance of this approach? It's about 15% speed up on the test PerformanceTests/IndexedDB/basic/objectstore-get.html.

Geoffrey Garen

Comment 8 2019-10-14 09:42:25 PDT

> > Can you test the performance of this approach? > > It's about 15% speed up on the test > PerformanceTests/IndexedDB/basic/objectstore-get.html. So, still slower than dispatch_async?

Geoffrey Garen

Comment 9 2019-10-14 20:24:49 PDT

Oh, actually, it's expected that the patch above doesn't achieve all 20%, because it doesn't include the fix for callOnMainThread. I'll upload a patch that includes callOnMainThread and the regression test change all in one.

Geoffrey Garen

Comment 10 2019-10-14 20:26:46 PDT

Created attachment 380952 [details] patch - mach message

Sihui Liu

Comment 11 2019-10-15 00:49:17 PDT

(In reply to Geoffrey Garen from comment #9) > Oh, actually, it's expected that the patch above doesn't achieve all 20%, > because it doesn't include the fix for callOnMainThread. I'll upload a patch > that includes callOnMainThread and the regression test change all in one. I previously made the change of the callOnMainThread locally and tested. I tried with the latest patch and it seemed it's still not as fast as using dispatch_async directly. It's close though. Also I debugged the layout test failure of the dispatch_async patch. The cause is that we try to block main thread when waiting for reply of sync messages. Our implementation is: 1. Start a task on main thread 2. While not receiving reply, ask RunLoop to run. RunLoop can listen for input sources and processes data from sources(receiving and handling reply) at this time. (Thread is blocked.) 3. Continue to finish the task It looks like when we do dispatch_async, the RunLoop doesn't think it is a source and thus time out at step2.

Geoffrey Garen

Comment 12 2019-10-15 10:37:57 PDT

(In reply to Sihui Liu from comment #11) > (In reply to Geoffrey Garen from comment #9) > > Oh, actually, it's expected that the patch above doesn't achieve all 20%, > > because it doesn't include the fix for callOnMainThread. I'll upload a patch > > that includes callOnMainThread and the regression test change all in one. > > I previously made the change of the callOnMainThread locally and tested. I > tried with the latest patch and it seemed it's still not as fast as using > dispatch_async directly. It's close though. > > Also I debugged the layout test failure of the dispatch_async patch. The > cause is that we try to block main thread when waiting for reply of sync > messages. Our implementation is: > 1. Start a task on main thread > 2. While not receiving reply, ask RunLoop to run. RunLoop can listen for > input sources and processes data from sources(receiving and handling reply) > at this time. (Thread is blocked.) > 3. Continue to finish the task > It looks like when we do dispatch_async, the RunLoop doesn't think it is a > source and thus time out at step2. Is this in Connection::waitForSyncReply? Or something else? I suspect that dispatch_async is not willing to execute inside a nested RunLoop. I believe that's a fundamental difference between libdispatch and CFRunLoop. (libdispatch requires all tasks to dispatch in order, and that means prohibiting re-entrant dispatch. Relatedly, libdispatch explicitly documents that dispatch_sync(dispatch_get_current_queue(), ...) is always a deadlock.) So, I guess we can't use dispatch_async. Let's try to finish up the mach_message / version 1 RunLoopSource solution, since it delivers most of the speedup. Can you do that? Outstanding issues: 1. inspector/css/pseudo-creation.html fails. It used to fail sometimes (e.g. https://build.webkit.org/builders/Apple%20High%20Sierra%20Debug%20WK1%20(Tests)/builds/11323), and now it fails all the time. So, the code has gotten better (more deterministic), but we broke a test. It looks like something about InspectorProtocol.eventHandler is racy relative to promise resolution. We need to fix this, or perhaps disable the test. 2. TestWebKitAPI.WebKitLegacy.ScrollingDoesNotPauseMedia times out. It must either time out in its call to Util::run, or its call to callOnMainThreadAndWait. I think it's probably callOnMainThreadAndWait. If so, this test has a simple bug. It calls "callOnMainThreadAndWait" *from the main thread*. I'm not sure exactly why this ever worked, but I think you can fix it by just switching to callOnMainThread.

Sihui Liu

Comment 13 2019-10-16 13:12:41 PDT

Created attachment 381101 [details] Patch

Sihui Liu

Comment 14 2019-10-16 13:27:40 PDT

Created attachment 381103 [details] Patch

Sam Weinig

Comment 15 2019-10-16 13:35:05 PDT

Comment on attachment 381103 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=381103&action=review > Source/WTF/ChangeLog:9 > + We used CFRunLoopWakeUp to wake up runloop to process source, which seems to be slow according to profiling. To > + avoid calling CFRunLoopWakeUp, use CFRunloopSource1 instead of CFRunloopSource0. What is CFRunloopSource1? Why is this faster? If it's not the current default, what is the reason it's not and what potential downsides are there?

Geoffrey Garen

Comment 16 2019-10-16 13:36:11 PDT

Comment on attachment 381103 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=381103&action=review Looks great, and thanks for fixing those tests. One comment on callOonMainThread. > Source/WTF/wtf/cocoa/MainThreadCocoa.mm:136 > - [staticMainThreadCaller performSelectorOnMainThread:@selector(call) withObject:nil waitUntilDone:NO]; > + dispatch_async(dispatch_get_main_queue(), ^{ > + [staticMainThreadCaller call]; > + }); You can remove some indirection here by deleting staticMainThreadCaller and JSWTFMainThreadCaller and just calling WTF::dispatchFunctionsFromMainThread() directly. staticMainThreadCaller and JSWTFMainThreadCaller only existed for the sake of doing the async dispatch, which you're doing a different way now. Side note: Is dispatch_async better than RunLoop::main().dispatch() in this case? Also, is it OK that callOnMainThread will not run in nested run loops?

Sihui Liu

Comment 17 2019-10-16 14:11:56 PDT

Created attachment 381106 [details] Patch

Sihui Liu

Comment 18 2019-10-16 14:51:29 PDT

(In reply to Sam Weinig from comment #15) > Comment on attachment 381103 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=381103&action=review > > > Source/WTF/ChangeLog:9 > > + We used CFRunLoopWakeUp to wake up runloop to process source, which seems to be slow according to profiling. To > > + avoid calling CFRunLoopWakeUp, use CFRunloopSource1 instead of CFRunloopSource0. > > What is CFRunloopSource1? Why is this faster? If it's not the current > default, what is the reason it's not and what potential downsides are there? It's version 1 source in CFRunloop: https://developer.apple.com/documentation/corefoundation/cfrunloopsource-rhr According to our profiling on test like OpenSource/PerformanceTests/IndexedDB/basic/objectstore-get.html, we spent a lot of time on waking up the runloop when receiving IPC message. And we tried libdispatch, which uses version 1 source for dispatching to runloop, and is much faster. So we think we could probably adopt this for better performance. I don't know why version 0 source is our default..

Sihui Liu

Comment 19 2019-10-16 15:06:02 PDT

(In reply to Geoffrey Garen from comment #16) > Comment on attachment 381103 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=381103&action=review > > Looks great, and thanks for fixing those tests. One comment on > callOonMainThread. > > > Source/WTF/wtf/cocoa/MainThreadCocoa.mm:136 > > - [staticMainThreadCaller performSelectorOnMainThread:@selector(call) withObject:nil waitUntilDone:NO]; > > + dispatch_async(dispatch_get_main_queue(), ^{ > > + [staticMainThreadCaller call]; > > + }); > > You can remove some indirection here by deleting staticMainThreadCaller and > JSWTFMainThreadCaller and just calling > WTF::dispatchFunctionsFromMainThread() directly. staticMainThreadCaller and > JSWTFMainThreadCaller only existed for the sake of doing the async dispatch, > which you're doing a different way now. > staticMainThreadCaller is used in the #if USE(WEB_THREAD) case, which is a few lines above. I would change this to WTF::dispatchFunctionsFromMainThread(). > Side note: Is dispatch_async better than RunLoop::main().dispatch() in this > case? Also, is it OK that callOnMainThread will not run in nested run loops? The reason why I didn't use RunLoop::main().dispatch() because RunLoop::main() needs to run after RunLoop::initializeMainRunLoop() and callOnMainThread does not seem to care about whether we have initialized mainRunLoop or not. Why can't callOnMainThread run in nested run loops?

Sihui Liu

Comment 20 2019-10-16 15:31:50 PDT

Created attachment 381124 [details] Patch

Sam Weinig

Comment 21 2019-10-16 16:46:16 PDT

(In reply to Sihui Liu from comment #18) > (In reply to Sam Weinig from comment #15) > > Comment on attachment 381103 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=381103&action=review > > > > > Source/WTF/ChangeLog:9 > > > + We used CFRunLoopWakeUp to wake up runloop to process source, which seems to be slow according to profiling. To > > > + avoid calling CFRunLoopWakeUp, use CFRunloopSource1 instead of CFRunloopSource0. > > > > What is CFRunloopSource1? Why is this faster? If it's not the current > > default, what is the reason it's not and what potential downsides are there? > > It's version 1 source in CFRunloop: > https://developer.apple.com/documentation/corefoundation/cfrunloopsource-rhr > > According to our profiling on test like > OpenSource/PerformanceTests/IndexedDB/basic/objectstore-get.html, we spent a > lot of time on waking up the runloop when receiving IPC message. And we > tried libdispatch, which uses version 1 source for dispatching to runloop, > and is much faster. So we think we could probably adopt this for better > performance. Does that mean this is not a speedup on it's own?

Sihui Liu

Comment 22 2019-10-16 17:22:19 PDT

(In reply to Sam Weinig from comment #21) > (In reply to Sihui Liu from comment #18) > > (In reply to Sam Weinig from comment #15) > > > Comment on attachment 381103 [details] > > > Patch > > > > > > View in context: > > > https://bugs.webkit.org/attachment.cgi?id=381103&action=review > > > > > > > Source/WTF/ChangeLog:9 > > > > + We used CFRunLoopWakeUp to wake up runloop to process source, which seems to be slow according to profiling. To > > > > + avoid calling CFRunLoopWakeUp, use CFRunloopSource1 instead of CFRunloopSource0. > > > > > > What is CFRunloopSource1? Why is this faster? If it's not the current > > > default, what is the reason it's not and what potential downsides are there? > > > > It's version 1 source in CFRunloop: > > https://developer.apple.com/documentation/corefoundation/cfrunloopsource-rhr > > > > According to our profiling on test like > > OpenSource/PerformanceTests/IndexedDB/basic/objectstore-get.html, we spent a > > lot of time on waking up the runloop when receiving IPC message. And we > > tried libdispatch, which uses version 1 source for dispatching to runloop, > > and is much faster. So we think we could probably adopt this for better > > performance. > > Does that mean this is not a speedup on it's own? It's a speedup on it's own. The patch gives us about 15% speedup on the test we profiled (PerformanceTests/IndexedDB/basic/objectstore-get.html). It saves time by eliminating CFRunLoopWakeUps from the trace.

Geoffrey Garen

Comment 23 2019-10-16 21:16:51 PDT

> staticMainThreadCaller is used in the #if USE(WEB_THREAD) case, which is a > few lines above. You can change that case to stop using staticMainThreadCaller too. > > Side note: Is dispatch_async better than RunLoop::main().dispatch() in this > > case? Also, is it OK that callOnMainThread will not run in nested run loops? > > The reason why I didn't use RunLoop::main().dispatch() because > RunLoop::main() needs to run after RunLoop::initializeMainRunLoop() and > callOnMainThread does not seem to care about whether we have initialized > mainRunLoop or not. callOnMainThread depends on initializeMainThread() to initialize staticMainThreadCaller. > Why can't callOnMainThread run in nested run loops? Because dispatch_async doesn't run in nested run loops. (I confirmed this theory at https://www.thecave.com/2015/08/10/dispatch-async-to-main-queue-doesnt-work-with-modal-window-on-mac-os-x/.) If there isn't a specific advantage to dispatch_async here, maybe RunLoop::main() is better, to avoid this minor change in behavior.

Sihui Liu

Comment 24 2019-10-17 01:30:06 PDT

Created attachment 381180 [details] Patch

Sam Weinig

Comment 25 2019-10-17 07:59:00 PDT

(In reply to Sihui Liu from comment #22) > (In reply to Sam Weinig from comment #21) > > (In reply to Sihui Liu from comment #18) > > > (In reply to Sam Weinig from comment #15) > > > > Comment on attachment 381103 [details] > > > > Patch > > > > > > > > View in context: > > > > https://bugs.webkit.org/attachment.cgi?id=381103&action=review > > > > > > > > > Source/WTF/ChangeLog:9 > > > > > + We used CFRunLoopWakeUp to wake up runloop to process source, which seems to be slow according to profiling. To > > > > > + avoid calling CFRunLoopWakeUp, use CFRunloopSource1 instead of CFRunloopSource0. > > > > > > > > What is CFRunloopSource1? Why is this faster? If it's not the current > > > > default, what is the reason it's not and what potential downsides are there? > > > > > > It's version 1 source in CFRunloop: > > > https://developer.apple.com/documentation/corefoundation/cfrunloopsource-rhr > > > > > > According to our profiling on test like > > > OpenSource/PerformanceTests/IndexedDB/basic/objectstore-get.html, we spent a > > > lot of time on waking up the runloop when receiving IPC message. And we > > > tried libdispatch, which uses version 1 source for dispatching to runloop, > > > and is much faster. So we think we could probably adopt this for better > > > performance. > > > > Does that mean this is not a speedup on it's own? > > It's a speedup on it's own. The patch gives us about 15% speedup on the test > we profiled (PerformanceTests/IndexedDB/basic/objectstore-get.html). It > saves time by eliminating CFRunLoopWakeUps from the trace. Very nice!

Geoffrey Garen

Comment 26 2019-10-17 10:40:20 PDT

Comment on attachment 381180 [details] Patch r=me

Sihui Liu

Comment 27 2019-10-17 12:58:29 PDT

Created attachment 381217 [details] Patch for landing

WebKit Commit Bot

Comment 28 2019-10-17 14:09:57 PDT

The commit-queue encountered the following flaky tests while processing attachment 381217 [details]: inspector/console/webcore-logging.html bug 203118 (authors: drousso@apple.com and eric.carlson@apple.com) The commit-queue is continuing to process your patch.

WebKit Commit Bot

Comment 29 2019-10-17 14:10:03 PDT

The commit-queue encountered the following flaky tests while processing attachment 381217 [details]: imported/w3c/web-platform-tests/websockets/bufferedAmount-unchanged-by-sync-xhr.any.worker.html bug 202003 (author: youennf@gmail.com) The commit-queue is continuing to process your patch.

WebKit Commit Bot

Comment 30 2019-10-17 15:05:30 PDT

Comment on attachment 381217 [details] Patch for landing Clearing flags on attachment: 381217 Committed r251261: <https://trac.webkit.org/changeset/251261>

WebKit Commit Bot

Comment 31 2019-10-17 15:05:33 PDT

All reviewed patches have been landed. Closing bug.

Radar WebKit Bug Importer

Comment 32 2019-10-17 15:06:29 PDT

<rdar://problem/56387205>

Truitt Savell

Comment 33 2019-10-18 09:34:03 PDT

It looks like the changes in https://trac.webkit.org/changeset/251261/webkit broke inspector/console/webcore-logging.html History: https://results.webkit.org/?suite=layout-tests&test=inspector%2Fconsole%2Fwebcore-logging.html Diff: --- /Volumes/Data/slave/mojave-release-tests-wk1/build/layout-test-results/inspector/console/webcore-logging-expected.txt +++ /Volumes/Data/slave/mojave-release-tests-wk1/build/layout-test-results/inspector/console/webcore-logging-actual.txt @@ -27,7 +27,9 @@ -- Running test case: Console.Logging.MediaLogging PASS: Media logging disabled. PASS: Media logging has been enabled. -PASS: Media log message should have source 'media'. +FAIL: Media log message should have source 'media'. + Expected: truthy + Actual: false -- Running test case: Console.Logging.LogAsJSONWithoutRepeat PASS: Media logging disabled. This reproduces easily by running the test in iterations. It fails on 251261 but not 251260.

Truitt Savell

Comment 34 2019-10-18 09:39:55 PDT

This also broke inspector/runtime/getProperties.html History https://results.webkit.org/?suite=layout-tests&test=inspector%2Fruntime%2FgetProperties.html

Truitt Savell

Comment 35 2019-10-23 17:06:03 PDT

Reverted r251261 for reason: This broke multiple tests Committed r251514: <https://trac.webkit.org/changeset/251514>

Sihui Liu

Comment 36 2019-10-23 17:45:12 PDT

(In reply to Truitt Savell from comment #35) > Reverted r251261 for reason: > > This broke multiple tests > > Committed r251514: <https://trac.webkit.org/changeset/251514> inspector/console/webcore-logging.html was fixed by r251482 (and got reverted too), which fixed the test and made another meaningful bug fix. inspector/runtime/getProperties.html is caused by that the test file doesn't handle error case correctly. (The diff is one error console message.) scrollingcoordinator/ios/non-stable-viewport-scroll.html is flaky by its nature, as I commented in rdar://problem/56410921 and I replied in Slack. Are there any other failure caused by this patch? If yes, what are they? If no, what led this to be reverted?

Truitt Savell

Comment 37 2019-10-24 08:06:24 PDT

(In reply to Sihui Liu from comment #36) > (In reply to Truitt Savell from comment #35) > > Reverted r251261 for reason: > > > > This broke multiple tests > > > > Committed r251514: <https://trac.webkit.org/changeset/251514> > > inspector/console/webcore-logging.html was fixed by r251482 (and got > reverted too), which fixed the test and made another meaningful bug fix. > > inspector/runtime/getProperties.html is caused by that the test file doesn't > handle error case correctly. (The diff is one error console message.) > > scrollingcoordinator/ios/non-stable-viewport-scroll.html is flaky by its > nature, as I commented in rdar://problem/56410921 and I replied in Slack. > > Are there any other failure caused by this patch? If yes, what are they? > If no, what led this to be reverted? Those are the tests that lead us to roll this patch out

Geoffrey Garen

Comment 38 2019-10-24 12:47:36 PDT

> inspector/console/webcore-logging.html was fixed by r251482 (and got > reverted too), which fixed the test and made another meaningful bug fix. Sounds great! > inspector/runtime/getProperties.html is caused by that the test file doesn't > handle error case correctly. (The diff is one error console message.) Does this mean that inspector/runtime/getProperties.html does fail when this patch is applied? If so, I'd suggest updating the test to handle error or updating the expected result to include the console message when re-landing. > scrollingcoordinator/ios/non-stable-viewport-scroll.html is flaky by its > nature, as I commented in rdar://problem/56410921 and I replied in Slack. Was this test failing sometimes before this patch landed? If so, seems OK to ignore. If not, we probably need to skip this test until it gets fixed.

Geoffrey Garen

Comment 39 2019-10-24 13:25:49 PDT

> inspector/runtime/getProperties.html is caused by that the test file doesn't handle error case correctly. (The diff is one error console message.) For this case, let's use if (window.internals) window.internals.settings.setUnhandledPromiseRejectionToConsoleEnabled(false); and then update the test result not to expect a console message.

Geoffrey Garen

Comment 40 2019-10-24 13:46:06 PDT

> scrollingcoordinator/ios/non-stable-viewport-scroll.html is flaky by its > nature, as I commented in rdar://problem/56410921 and I replied in Slack. For this case, the test is marked as flaky, so I think it's OK to land without fixing it, since doing so won't make the bots red. (We analyzed this test a bit, and fixing it will be non-trivial. More information from <rdar://problem/56410921>: "Talked to Simon, the test is flaky without my patch. We found some graphics work is done after callback of uiController.doAfterPresentationUpdate and RenderLayerCompositor::layerTreeAsText, but layerTreeAsText should show us the final result. With my patch, some of those work (result is not reflected in the -expected.txt file) has bigger chance to finish sooner than layerTreeAsText, so the test has become more flaky." )

Antti Koivisto

Comment 41 2019-10-25 01:53:33 PDT

Seems like this patch was a significant (~11%) StyleBench perf regression: https://perf.webkit.org/v3/#/charts?since=1571129192465&paneList=((20-1649-36468600-null-(5-2.5-500))) Rollout brought the perf back where it was. This should be looked into before relanding.

Sihui Liu

Comment 42 2019-10-25 06:22:34 PDT

(In reply to Antti Koivisto from comment #41) > Seems like this patch was a significant (~11%) StyleBench perf regression: > > https://perf.webkit.org/v3/#/charts?since=1571129192465&paneList=((20-1649- > 36468600-null-(5-2.5-500))) > > Rollout brought the perf back where it was. This should be looked into > before relanding. That's surprising! How can I run this test?

Geoffrey Garen

Comment 43 2019-10-25 08:14:19 PDT

(In reply to Sihui Liu from comment #42) > (In reply to Antti Koivisto from comment #41) > > Seems like this patch was a significant (~11%) StyleBench perf regression: > > > > https://perf.webkit.org/v3/#/charts?since=1571129192465&paneList=((20-1649- > > 36468600-null-(5-2.5-500))) > > > > Rollout brought the perf back where it was. This should be looked into > > before relanding. > > That's surprising! How can I run this test? OpenSource/PerformanceTests/StyleBench/index.html

Antti Koivisto

Comment 44 2019-10-25 09:07:53 PDT

cd PerformanceTests/StyleBench python -m SimpleHTTPServer 8000 navigate to localhost:8000 Alternatively individual subtests can be run with web server by opening PerformanceTests/StyleBench/InteractiveRunner.html

Antti Koivisto

Comment 45 2019-10-25 09:08:32 PDT

> Alternatively individual subtests can be run with web server by opening *without web server

Ryosuke Niwa

Comment 46 2019-10-25 11:26:46 PDT

There is also Tools/Scripts/run-perf-tests <path>

Antti Koivisto

Comment 47 2020-02-13 13:43:54 PST

Are there any show stoppers for landing this? It might help with first paint performance (type 1 sources don't prevent runloop observers from firing and so don't delay layer flushes). I don't think StyleBench regression is a show stopper, it is not a zero regression test. It sounded like it was a bug in how measurements are made we understood what was wrong.

Sihui Liu

Comment 48 2020-02-13 14:05:31 PST

(In reply to Antti Koivisto from comment #47) > Are there any show stoppers for landing this? It might help with first paint > performance (type 1 sources don't prevent runloop observers from firing and > so don't delay layer flushes). > > I don't think StyleBench regression is a show stopper, it is not a zero > regression test. It sounded like it was a bug in how measurements are made > we understood what was wrong. No,the we could try landing this again.

Ben Nham

Comment 49 2020-02-14 11:42:32 PST

I still think we should try to commit this but I'm not sure if this will help first paint perf due to excessive batching before an observer fires (namely the one that causes layer flush to occur). As far as I can tell, it looks like servicing a type 0 source will still turn the entire run loop and fire observers. After a type 0 source is handled: 1. The GCD main queue port is polled with 0 timeout; if this succeeds, go to (5) skipping before/after waiting observers 2. Before waiting observers fired 3. wakeup/timer/source1 port set polled with a 0 timeout 4. after waiting observers fired 5. GCD main queue / wakeup / timer / source1 callback runs This patch mainly seems to optimize things by using a source1 mach port to wake up the run loop instead of CFRunLoopWakeUp. This is more efficient because this patch just sends a mach message, while CFRunLoopWakeUp both takes a lock and sends a mach message. But both before and after this patch, we can end up handling a bunch of IPC messages in a row before a run loop observer fires because of the way WTF::RunLoop::performWork batches up work.

Simon Fraser (smfr)

Comment 50 2020-02-14 11:47:22 PST

(In reply to Ben Nham from comment #49) > I still think we should try to commit this but I'm not sure if this will > help first paint perf due to excessive batching before an observer fires > (namely the one that causes layer flush to occur). As far as I can tell, it > looks like servicing a type 0 source will still turn the entire run loop and > fire observers. It does not fire "before waiting" observers, which the CoreAnimation commit observer is: https://github.com/opensource-apple/CF/blob/master/CFRunLoop.c#L2403 https://github.com/opensource-apple/CF/blob/master/CFRunLoop.c#L2425 After a type 0 source is handled: > > 1. The GCD main queue port is polled with 0 timeout; if this succeeds, go to > (5) skipping before/after waiting observers > 2. Before waiting observers fired No, because of "if (!poll". > 3. wakeup/timer/source1 port set polled with a 0 timeout > 4. after waiting observers fired > 5. GCD main queue / wakeup / timer / source1 callback runs > > This patch mainly seems to optimize things by using a source1 mach port to > wake up the run loop instead of CFRunLoopWakeUp. This is more efficient > because this patch just sends a mach message, while CFRunLoopWakeUp both > takes a lock and sends a mach message. But both before and after this patch, > we can end up handling a bunch of IPC messages in a row before a run loop > observer fires because of the way WTF::RunLoop::performWork batches up work.

Ben Nham

Comment 51 2020-02-14 11:57:19 PST

I see, I missed that. So the hypothesis is basically that an IPC that causes a layout to occur in a source 0 callback, which then calls flushLayers, and then CA's auto-commit observer doesn't run as soon as it should because of this source 0 behavior?

Simon Fraser (smfr)

Comment 52 2020-02-14 13:19:49 PST

(In reply to Ben Nham from comment #51) > I see, I missed that. So the hypothesis is basically that an IPC that causes > a layout to occur in a source 0 callback, which then calls flushLayers, and > then CA's auto-commit observer doesn't run as soon as it should because of > this source 0 behavior? More simply put, a flood of IPC messages can postpone rendering for many event loop cycles.

Sihui Liu

Comment 53 2020-02-17 09:59:53 PST

Created attachment 390927 [details] Patch

Sihui Liu

Comment 54 2020-02-24 14:19:34 PST

Created attachment 391575 [details] Patch

Sihui Liu

Comment 55 2020-02-25 17:10:17 PST

Created attachment 391705 [details] Patch

Geoffrey Garen

Comment 56 2020-06-09 14:57:17 PDT

Created attachment 401478 [details] Patch - just the RunLoop parts Let's see if we can land the refactoring part of this patch without the behavior change, to shrink the diff and help move to just one function queue.

Sihui Liu

Comment 57 2020-06-09 21:53:56 PDT

Comment on attachment 401478 [details] Patch - just the RunLoop parts Looks green. Do we need a new bug for this change?

Antti Koivisto

Comment 58 2020-06-09 23:32:39 PDT

Comment on attachment 401478 [details] Patch - just the RunLoop parts View in context: https://bugs.webkit.org/attachment.cgi?id=401478&action=review > Source/WTF/ChangeLog:3 > + Use CFRunLoopSource1 for faster task dispatch The tittle does not describe this patch.

Geoffrey Garen

Comment 59 2020-06-10 12:50:25 PDT

Comment on attachment 401478 [details] Patch - just the RunLoop parts Yeah, I'll obsolete this patch and post in a separate bug.

Geoffrey Garen

Comment 60 2020-06-10 12:55:09 PDT

Refactoring patch is now posted @ https://bugs.webkit.org/show_bug.cgi?id=213043.

Geoffrey Garen

Comment 61 2020-11-13 11:54:53 PST

Created attachment 414072 [details] Patch

Geoffrey Garen

Comment 62 2020-11-13 11:56:19 PST

Comment on attachment 414072 [details] Patch Re-based patch.

Geoffrey Garen

Comment 63 2020-11-13 12:00:26 PST

Created attachment 414073 [details] Patch

Sihui Liu

Comment 64 2020-11-13 16:05:59 PST

Note using version 1 source may cause issue in our test runner, details in <rdar://problem/59637925>.

Geoffrey Garen

Comment 65 2020-11-16 12:17:58 PST

Created attachment 414266 [details] Patch

Geoffrey Garen

Comment 66 2020-11-16 20:40:28 PST

Created attachment 414303 [details] Patch

Geoffrey Garen

Comment 67 2020-11-20 13:28:14 PST

Created attachment 414710 [details] microbenchmark Uploading a microbenchmark for posterity. Interestingly, the fastest case for V0 and V1 is about equal. But on average V0 is 2X slower; and in the worst case, it's 5X slower, causing a full 2ms of latency!

Geoffrey Garen

Comment 68 2020-11-20 13:29:37 PST

Comment on attachment 414303 [details] Patch Tests pass and microbenchmark confirms a speedup. So, I think this is ready for review.

EWS

Comment 69 2020-11-20 14:04:47 PST

Committed r270132: <https://trac.webkit.org/changeset/270132> All reviewed patches have been landed. Closing bug and clearing flags on attachment 414303 [details].

Ryan Haddad

Comment 70 2020-12-01 10:00:55 PST

Reverted r270132 for reason: Caused at least one regression test failure (webkit.org/b/219401) Committed r270311: <https://trac.webkit.org/changeset/270311>

Ryan Haddad

Comment 71 2020-12-01 13:47:01 PST

(In reply to Ryan Haddad from comment #70) > Caused at least one regression test failure (webkit.org/b/219401) It appears that these WPT worklet tests that had become flaky on iOS are OK again after the revert, so these were also affected: https://results.webkit.org/?suite=layout-tests&suite=layout-tests&suite=layout-tests&test=imported%2Fw3c%2Fweb-platform-tests%2Fworklets%2Fanimation-worklet-csp.https.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fworklets%2Fanimation-worklet-referrer.https.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fworklets%2Flayout-worklet-referrer.https.html https://build.webkit.org/results/Apple-iOS-14-Simulator-Release-WK2-Tests/r270300%20(1074)/results.html

Geoffrey Garen

Comment 72 2020-12-10 15:35:37 PST

Created attachment 415939 [details] Patch

Geoffrey Garen

Comment 73 2020-12-10 15:36:28 PST

Previous test failures should be resolved by: https://bugs.webkit.org/show_bug.cgi?id=219717 https://bugs.webkit.org/show_bug.cgi?id=219656

EWS

Comment 74 2020-12-10 16:17:48 PST

Committed r270661: <https://trac.webkit.org/changeset/270661> All reviewed patches have been landed. Closing bug and clearing flags on attachment 415939 [details].

Ryan Haddad

Comment 75 2020-12-12 21:22:36 PST

Looks like there are still more failures/timeouts because of this change. The failures are a bit difficult to enumerate because some are flaky, but here are the ones that clearly start with this change: https://results.webkit.org/?suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&test=fast%2Ftext%2Fcanvas-color-fonts%2FCOLR.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-001.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-003.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-001.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-002.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-003.html

Ryan Haddad

Comment 76 2020-12-12 21:27:20 PST

Two more that appear to be iOS specific: https://results.webkit.org/?suite=layout-tests&suite=layout-tests&test=imported%2Fw3c%2Fweb-platform-tests%2Fworkers%2Fmodules%2Fshared-worker-import-csp.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fworklets%2Flayout-worklet-csp.https.html

Ryan Haddad

Comment 77 2020-12-12 21:40:57 PST

Another: https://results.webkit.org/?suite=layout-tests&test=inspector%2Fformatting%2Fsource-map-javascript-2.html

Ryan Haddad

Comment 78 2020-12-12 21:44:20 PST

Reverted r270661 for reason: Caused layout test failures and timeouts Committed r270746: <https://trac.webkit.org/changeset/270746>

Ryan Haddad

Comment 79 2020-12-12 21:46:35 PST

(In reply to Ryan Haddad from comment #78) > Reverted r270661 for reason: > > Caused layout test failures and timeouts Here is one link that shows all the ones I've found so far: https://results.webkit.org/?suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&suite=layout-tests&test=fast%2Ftext%2Fcanvas-color-fonts%2FCOLR.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-001.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-003.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-001.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-002.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fcss%2Fcss-transitions%2Fproperties-value-inherit-003.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fworkers%2Fmodules%2Fshared-worker-import-csp.html&test=imported%2Fw3c%2Fweb-platform-tests%2Fworklets%2Flayout-worklet-csp.https.html&test=inspector%2Fformatting%2Fsource-map-javascript-2.html&test=inspector%2Fruntime%2Fparse.html

Per Arne Vollan

Comment 80 2020-12-17 03:30:51 PST

Comment on attachment 415939 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415939&action=review > Source/WTF/wtf/cf/RunLoopCF.cpp:76 > + mach_msg_header_t header; > + header.msgh_bits = MACH_MSGH_BITS(MACH_MSG_TYPE_COPY_SEND, 0); > + header.msgh_size = sizeof(mach_msg_header_t); > + header.msgh_remote_port = CFMachPortGetPort(m_port.get()); > + header.msgh_local_port = MACH_PORT_NULL; > + header.msgh_id = 0; > + mach_msg_return_t result = mach_msg(&header, MACH_SEND_MSG | MACH_SEND_TIMEOUT, header.msgh_size, 0, MACH_PORT_NULL, 0, MACH_PORT_NULL); > + RELEASE_ASSERT(result == MACH_MSG_SUCCESS || result == MACH_SEND_TIMED_OUT); > + if (result == MACH_SEND_TIMED_OUT) > + mach_msg_destroy(&header); Could this be simplified by using CFMessagePort instead of CFMachPort?

Geoffrey Garen

Comment 81 2020-12-17 08:42:25 PST

> Could this be simplified by using CFMessagePort instead of CFMachPort? Unfortunately, CFMessagePortSendRequest still locks a pthread_mutex_t before sending, and pthread_mutex_t is most of the overhead we are trying to avoid.

Per Arne Vollan

Comment 82 2021-01-20 14:14:12 PST

(In reply to Geoffrey Garen from comment #81) > > Could this be simplified by using CFMessagePort instead of CFMachPort? > > Unfortunately, CFMessagePortSendRequest still locks a pthread_mutex_t before > sending, and pthread_mutex_t is most of the overhead we are trying to avoid. Ah, got it, thanks! If there is a little bit of flakiness associated with this patch, could this possibly be addressed by dispatching off a run loop observer for the main thread run loop, in order to make the ordering deterministic?

Geoffrey Garen

Comment 83 2021-01-21 15:27:31 PST

> If there is a little bit of flakiness associated with this patch, could this > possibly be addressed by dispatching off a run loop observer for the main > thread run loop, in order to make the ordering deterministic? I had the same thought! :P I do think that approach *could* work. I spent a few days trying to make that work, but I ended up with too many test failures. It's possible that I did not succeed in making behavior more deterministic; or maybe I succeeded in making behavior more deterministic, but deterministically failing. One gotcha is that run loop observers don't run at *exactly* the same point as run loop sources. Another gotcha is that the relative order of run loop observers is different in the case where you first call CFRunLoopRun() vs the case where you were already sleeping inside CFRunLoopRun(). My next move is to A/B test to see if the change is worth it. If it is, I think I can make things more deterministic by standardizing how we schedule tasks. (I started this in https://trac.webkit.org/changeset/271493/webkit.)

Note You need to log in before you can comment on or make changes to this bug.

Status REOPENED

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Geoffrey Garen

Reported

2019-10-11 19:10 PDT

Modified

2021-01-21 15:27 PST History

CC List

20 users Show

URL

Keywords InRadar

Depends on

213043 219656 219717 219785

Blocks

Dependencies

tree graph

andersca
benjamin
cdumez
cmarcelo
commit-queue
dbates
ews-watchlist
ggaren
koivisto
nham
pvollan
rniwa
sam
sihui_liu
simon.fraser
thorton
tsavell
webkit-bug-importer
wenson_hsieh
youennf