RESOLVED FIXED 217290
Using WASM function size as the cap for choosing a register allocator causes performance regressions.
https://bugs.webkit.org/show_bug.cgi?id=217290
Summary Using WASM function size as the cap for choosing a register allocator causes ...
Alekandr Guryanov
Reported 2020-10-04 01:29:54 PDT
Created attachment 410455 [details] Safari 13.1.1 Hi. I am the author of the js-dos project (dosbox in js). After updating to Safari 14 I see a dramatic performance loss in js-dos dhry2 test. It reproduced on all devices: iphone 6, iphone 7, iphone 8, iphone x, macos catalina. All Safari starting from 14 affected (I tested 14.0, 14.0.1 and Technology Preview). The test is a variation of dhry 2 test for dos. It runs inside js-dosbox and reports operation count. Time is measured with Date.now(). So it only does math operations and estimate time. You can run this test using link. https://dos.zone/en/play/https%3A%2F%2Fdoszone-uploads.s3.dualstack.eu-central-1.amazonaws.com%2Foriginal%2F2X%2Fb%2Fb4b5275904d86a4ab8a20917b2b7e34f0df47bf7.jsdos My results are in attached screenshots. Results from (MacBook Pro (13-inch, 2017, Two Thunderbolt 3 ports, Catalina 10.15.17): Safari 13.1.1 - (640_000, VAX: 48) Safari 14.0.0 - (80_000, VAX: 5.5) Safari 14.0.1 - (80_000, VAX: 5.5) Safari TP - (80_000, VAX: 5.9) Firefox - (640_000, VAX: 36.9) -- I also tried to test in recent builds of webkit (267874, 267919, 267940). But, always have this error: MacBook-Pro-Alexander:~ caiiiycuk$ /Users/caiiiycuk/Downloads/267940/run-webkit-archive ; exit; Setting DYLD FRAMEWORK and LIBRARY paths to /Users/caiiiycuk/Downloads/267940/Release 2020-10-04 15:21:50.936 SafariForWebKitDevelopment[4346:301066] -[BrowserWKView _setInspectorDelegate:]: unrecognized selector sent to instance 0x7fef794c9c30 2020-10-04 15:21:50.936 SafariForWebKitDevelopment[4346:301066] -[BrowserWKView _setInspectorDelegate:]: unrecognized selector sent to instance 0x7fef794c9c30 2020-10-04 15:21:50.947 SafariForWebKitDevelopment[4346:301066] .sdef warning for argument 'FileType' of command 'save' in suite 'Standard Suite': 'saveable file format' is not a valid type name. 2020-10-04 15:21:50.995 SafariForWebKitDevelopment[4346:301134] +[NSXPCSharedListener endpointForReply:withListenerName:]: an error occurred while attempting to obtain endpoint for listener 'SandboxBroker': Connection invalid 2020-10-04 15:21:51.083 com.apple.WebKit.WebContent.Development[4348:301125] Error loading /Library/Apple/System/Library/StagedFrameworks/Safari/Safari.framework/Safari: dlopen(/Library/Apple/System/Library/StagedFrameworks/Safari/Safari.framework/Safari, 265): Symbol not found: _OBJC_CLASS_$_WBSContentBlockerStatisticsSQLiteStore Referenced from: /Library/Apple/System/Library/StagedFrameworks/Safari/SafariSharedUI.framework/Versions/A/SafariSharedUI Expected in: /System/Library/PrivateFrameworks/SafariShared.framework/Versions/A/SafariShared in /Library/Apple/System/Library/StagedFrameworks/Safari/SafariSharedUI.framework/Versions/A/SafariSharedUI InjectedBundle::load failed - Could not load the executable from the bundle. 2020-10-04 15:22:01.457 SafariForWebKitDevelopment[4346:301066] -[BrowserWKView _setInspectorDelegate:]: unrecognized selector sent to instance 0x7fef7d529cc0
Attachments
Safari 13.1.1 (142.44 KB, image/png)
2020-10-04 01:29 PDT, Alekandr Guryanov
no flags
Safari 14.0.0 (133.23 KB, image/png)
2020-10-04 01:30 PDT, Alekandr Guryanov
no flags
Safari 14.0.1 (128.71 KB, image/png)
2020-10-04 01:30 PDT, Alekandr Guryanov
no flags
Safari TP (151.85 KB, image/png)
2020-10-04 01:31 PDT, Alekandr Guryanov
no flags
Patch (8.98 KB, patch)
2020-10-23 13:22 PDT, Keith Miller
no flags
all-in-one-test (695.87 KB, application/zip)
2020-10-26 02:02 PDT, Alekandr Guryanov
no flags
Performance on Safari 14.4 (61.36 KB, image/jpeg)
2021-02-25 19:30 PST, Alekandr Guryanov
no flags
Patch (2.99 KB, patch)
2021-11-08 14:17 PST, Robin Morisset
no flags
Alekandr Guryanov
Comment 1 2020-10-04 01:30:24 PDT
Created attachment 410456 [details] Safari 14.0.0
Alekandr Guryanov
Comment 2 2020-10-04 01:30:49 PDT
Created attachment 410457 [details] Safari 14.0.1
Alekandr Guryanov
Comment 3 2020-10-04 01:31:12 PDT
Created attachment 410458 [details] Safari TP
Alekandr Guryanov
Comment 4 2020-10-04 02:14:27 PDT
Minimal test: https://js-dos.com/v7/build/examples/dhry2.html There are 2 buttons: 1) Start worker - run test in worker 2) Start direct - run test without worker Results are nearly same.
Radar WebKit Bug Importer
Comment 5 2020-10-04 14:23:54 PDT
Saam Barati
Comment 6 2020-10-15 17:06:41 PDT
Thanks for your report. Will try to reproduce
Keith Miller
Comment 7 2020-10-23 11:16:18 PDT
My guess is that this is related to: https://bugs.webkit.org/show_bug.cgi?id=212105
Keith Miller
Comment 8 2020-10-23 11:29:13 PDT
(In reply to Keith Miller from comment #7) > My guess is that this is related to: > https://bugs.webkit.org/show_bug.cgi?id=212105 I was able to bisect this to d8a0fb4bf380350af94c8cf7c399471e1186b93c..4fbc651e99458aeb44610e57d25d88fe7c1c498a and that bug looks like the most likely culprit.
Keith Miller
Comment 9 2020-10-23 13:22:01 PDT
Michael Saboff
Comment 10 2020-10-23 13:36:58 PDT
Comment on attachment 412207 [details] Patch r=me
EWS
Comment 11 2020-10-23 15:53:34 PDT
Committed r268942: <https://trac.webkit.org/changeset/268942> All reviewed patches have been landed. Closing bug and clearing flags on attachment 412207 [details].
Saam Barati
Comment 12 2020-10-23 17:50:39 PDT
Comment on attachment 412207 [details] Patch Can we get a microbenchmark for this?
Keith Miller
Comment 13 2020-10-23 20:29:03 PDT
(In reply to Saam Barati from comment #12) > Comment on attachment 412207 [details] > Patch > > Can we get a microbenchmark for this? I could try to create one but it might not be worth the time. Alekandr, would you be willing to let us copy your Wasm module into our repo as a test case?
Alekandr Guryanov
Comment 14 2020-10-24 06:00:45 PDT
(In reply to Keith Miller from comment #13) > (In reply to Saam Barati from comment #12) > > Comment on attachment 412207 [details] > > Patch > > > > Can we get a microbenchmark for this? > > I could try to create one but it might not be worth the time. Alekandr, > would you be willing to let us copy your Wasm module into our repo as a test > case? Sure! Do you need only a wasm module? If you want I can create an all-in-one wasm file with dhry2 test embedded, that you need to simply run.
Keith Miller
Comment 15 2020-10-24 21:13:35 PDT
(In reply to Alekandr Guryanov from comment #14) > (In reply to Keith Miller from comment #13) > > (In reply to Saam Barati from comment #12) > > > Comment on attachment 412207 [details] > > > Patch > > > > > > Can we get a microbenchmark for this? > > > > I could try to create one but it might not be worth the time. Alekandr, > > would you be willing to let us copy your Wasm module into our repo as a test > > case? > > Sure! Do you need only a wasm module? If you want I can create an all-in-one > wasm file with dhry2 test embedded, that you need to simply run. Sure, if it's not too much work, an all-in-one is probably a bit easier to work with.
Keith Miller
Comment 16 2020-10-24 21:13:54 PDT
> Sure, if it's not too much work, an all-in-one is probably a bit easier to > work with. Thanks!
Alekandr Guryanov
Comment 17 2020-10-26 02:02:43 PDT
Created attachment 412294 [details] all-in-one-test This is a wasm file with an embedded dhry2 test (function names are not stripped).
Alekandr Guryanov
Comment 18 2021-02-25 19:25:12 PST
Even on Safari 14.4 performance is still very bad.
Alekandr Guryanov
Comment 19 2021-02-25 19:30:35 PST
Created attachment 421600 [details] Performance on Safari 14.4
Robin Morisset
Comment 20 2021-11-08 14:01:38 PST
Sorry for taking so long to come back to this bug. The problem is that 25k for the option --maximumTmpsForGraphColoring is too small for some benchmarks (including the Dhry2 in js-dosbox example in the original comment), causing the performance regressions; but anything significantly above 25k caused jetsams on some other webpages (which was the original reason for this limit to exist). The jetsams were coming from two data-structures: the interference graph used by AirAllocateRegistersByGraphColoring.cpp and the interference graph used by AirAllocateStackByGraphColoring.cpp. I massively reduced their respective memory consumption in: - https://trac.webkit.org/changeset/277714/webkit - https://trac.webkit.org/changeset/278186/webkit One of the pages that used to jetsam with a larger value of --maximumTmpsForGraphColoring was mruby-wasm.aotoki.dev: - the first of the above patches reduced the memory consumption of the register allocator from 2 x 262MB to 2 x 20MB (2x because of trying to compile two functions in parallel) - the second patch reduced the memory consumption of the stack allocator on the same functions from 221MB + 399MB to 6.5MB + 10.5MB Since I saw similar reductions on every benchmark I tried, I believed it is now safe to try to increase --maximumTmpsForGraphColoring. I verified on the example you provided, there are exactly two functions with > 25k temporaries (54845 and 54856). Increasing --maximumTmpsForGraphColoring from 25k to 60k increased the "VAX" score on my machine (ToT minibrowser on an M1 MBP 2020) from about 27 to about 70.
Robin Morisset
Comment 21 2021-11-08 14:17:23 PST
Yusuke Suzuki
Comment 22 2021-11-08 15:39:53 PST
Comment on attachment 443607 [details] Patch r=me
Alekandr Guryanov
Comment 23 2021-11-08 22:01:52 PST
(In reply to Robin Morisset from comment #20) > I verified on the example you provided, there are exactly two functions with > > 25k temporaries (54845 and 54856). Increasing > --maximumTmpsForGraphColoring from 25k to 60k increased the "VAX" score on > my machine (ToT minibrowser on an M1 MBP 2020) from about 27 to about 70. So this also means if I split this two functions into smaller somehow (like split them) it will increase performance on same value, correct?
Robin Morisset
Comment 24 2021-11-09 13:19:11 PST
Comment on attachment 443607 [details] Patch Thanks Yusuke for the review.
Robin Morisset
Comment 25 2021-11-09 13:22:27 PST
(In reply to Alekandr Guryanov from comment #23) > (In reply to Robin Morisset from comment #20) > > > I verified on the example you provided, there are exactly two functions with > > > 25k temporaries (54845 and 54856). Increasing > > --maximumTmpsForGraphColoring from 25k to 60k increased the "VAX" score on > > my machine (ToT minibrowser on an M1 MBP 2020) from about 27 to about 70. > > So this also means if I split this two functions into smaller somehow (like > split them) it will increase performance on same value, correct? Correct. If you want to improve the performance on the currently shipping version of Safari, keeping these functions split should help a lot. I believe these giant functions typically occur because of very aggressive inlining by LLVM/Emscripten, so preventing some of it somehow might be the easiest way to achieve this.
EWS
Comment 26 2021-11-09 13:56:14 PST
Committed r285533 (244050@main): <https://commits.webkit.org/244050@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 443607 [details].
Brent Fulgham
Comment 27 2022-02-04 13:36:57 PST
This change should be present in STP 139, iOS 15.4 Beta, and macOS 12.3 Beta.
Note You need to log in before you can comment on or make changes to this bug.