Bug 217290

Summary:

Using WASM function size as the cap for choosing a register allocator causes performance regressions.

Product:

WebKit

Reporter:

Alekandr Guryanov <caiiiycuk>

Component:

WebAssembly

Assignee:

Robin Morisset <rmorisset>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

alonzakai, bfulgham, chi187, ews-watchlist, fpizlo, keith_miller, mark.lam, msaboff, rmorisset, saam, tzagallo, webkit-bug-importer, ysuzuki

Priority:

Keywords:

InRadar

Version:

Safari Technology Preview

Hardware:

Unspecified

OS:

Unspecified

Attachments:

Description	Flags
Safari 13.1.1	none
Safari 14.0.0	none
Safari 14.0.1	none
Safari TP	none
Patch	none
all-in-one-test	none
Performance on Safari 14.4	none
Patch	none

Alekandr Guryanov

Reported 2020-10-04 01:29:54 PDT

Created attachment 410455 [details] Safari 13.1.1 Hi. I am the author of the js-dos project (dosbox in js). After updating to Safari 14 I see a dramatic performance loss in js-dos dhry2 test. It reproduced on all devices: iphone 6, iphone 7, iphone 8, iphone x, macos catalina. All Safari starting from 14 affected (I tested 14.0, 14.0.1 and Technology Preview). The test is a variation of dhry 2 test for dos. It runs inside js-dosbox and reports operation count. Time is measured with Date.now(). So it only does math operations and estimate time. You can run this test using link. https://dos.zone/en/play/https%3A%2F%2Fdoszone-uploads.s3.dualstack.eu-central-1.amazonaws.com%2Foriginal%2F2X%2Fb%2Fb4b5275904d86a4ab8a20917b2b7e34f0df47bf7.jsdos My results are in attached screenshots. Results from (MacBook Pro (13-inch, 2017, Two Thunderbolt 3 ports, Catalina 10.15.17): Safari 13.1.1 - (640_000, VAX: 48) Safari 14.0.0 - (80_000, VAX: 5.5) Safari 14.0.1 - (80_000, VAX: 5.5) Safari TP - (80_000, VAX: 5.9) Firefox - (640_000, VAX: 36.9) -- I also tried to test in recent builds of webkit (267874, 267919, 267940). But, always have this error: MacBook-Pro-Alexander:~ caiiiycuk$ /Users/caiiiycuk/Downloads/267940/run-webkit-archive ; exit; Setting DYLD FRAMEWORK and LIBRARY paths to /Users/caiiiycuk/Downloads/267940/Release 2020-10-04 15:21:50.936 SafariForWebKitDevelopment[4346:301066] -[BrowserWKView _setInspectorDelegate:]: unrecognized selector sent to instance 0x7fef794c9c30 2020-10-04 15:21:50.936 SafariForWebKitDevelopment[4346:301066] -[BrowserWKView _setInspectorDelegate:]: unrecognized selector sent to instance 0x7fef794c9c30 2020-10-04 15:21:50.947 SafariForWebKitDevelopment[4346:301066] .sdef warning for argument 'FileType' of command 'save' in suite 'Standard Suite': 'saveable file format' is not a valid type name. 2020-10-04 15:21:50.995 SafariForWebKitDevelopment[4346:301134] +[NSXPCSharedListener endpointForReply:withListenerName:]: an error occurred while attempting to obtain endpoint for listener 'SandboxBroker': Connection invalid 2020-10-04 15:21:51.083 com.apple.WebKit.WebContent.Development[4348:301125] Error loading /Library/Apple/System/Library/StagedFrameworks/Safari/Safari.framework/Safari: dlopen(/Library/Apple/System/Library/StagedFrameworks/Safari/Safari.framework/Safari, 265): Symbol not found: _OBJC_CLASS_$_WBSContentBlockerStatisticsSQLiteStore Referenced from: /Library/Apple/System/Library/StagedFrameworks/Safari/SafariSharedUI.framework/Versions/A/SafariSharedUI Expected in: /System/Library/PrivateFrameworks/SafariShared.framework/Versions/A/SafariShared in /Library/Apple/System/Library/StagedFrameworks/Safari/SafariSharedUI.framework/Versions/A/SafariSharedUI InjectedBundle::load failed - Could not load the executable from the bundle. 2020-10-04 15:22:01.457 SafariForWebKitDevelopment[4346:301066] -[BrowserWKView _setInspectorDelegate:]: unrecognized selector sent to instance 0x7fef7d529cc0

Attachments
Safari 13.1.1 (142.44 KB, image/png) 2020-10-04 01:29 PDT, Alekandr Guryanov	no flags	Details
Safari 14.0.0 (133.23 KB, image/png) 2020-10-04 01:30 PDT, Alekandr Guryanov	no flags	Details
Safari 14.0.1 (128.71 KB, image/png) 2020-10-04 01:30 PDT, Alekandr Guryanov	no flags	Details
Safari TP (151.85 KB, image/png) 2020-10-04 01:31 PDT, Alekandr Guryanov	no flags	Details
Patch (8.98 KB, patch) 2020-10-23 13:22 PDT, Keith Miller	no flags	Details Formatted Diff Diff
all-in-one-test (695.87 KB, application/zip) 2020-10-26 02:02 PDT, Alekandr Guryanov	no flags	Details
Performance on Safari 14.4 (61.36 KB, image/jpeg) 2021-02-25 19:30 PST, Alekandr Guryanov	no flags	Details
Patch (2.99 KB, patch) 2021-11-08 14:17 PST, Robin Morisset	no flags	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Alekandr Guryanov

Comment 1 2020-10-04 01:30:24 PDT

Created attachment 410456 [details] Safari 14.0.0

Alekandr Guryanov

Comment 2 2020-10-04 01:30:49 PDT

Created attachment 410457 [details] Safari 14.0.1

Alekandr Guryanov

Comment 3 2020-10-04 01:31:12 PDT

Created attachment 410458 [details] Safari TP

Alekandr Guryanov

Comment 4 2020-10-04 02:14:27 PDT

Minimal test: https://js-dos.com/v7/build/examples/dhry2.html There are 2 buttons: 1) Start worker - run test in worker 2) Start direct - run test without worker Results are nearly same.

Radar WebKit Bug Importer

Comment 5 2020-10-04 14:23:54 PDT

<rdar://problem/69934870>

Saam Barati

Comment 6 2020-10-15 17:06:41 PDT

Thanks for your report. Will try to reproduce

Keith Miller

Comment 7 2020-10-23 11:16:18 PDT

My guess is that this is related to: https://bugs.webkit.org/show_bug.cgi?id=212105

Keith Miller

Comment 8 2020-10-23 11:29:13 PDT

(In reply to Keith Miller from comment #7) > My guess is that this is related to: > https://bugs.webkit.org/show_bug.cgi?id=212105 I was able to bisect this to d8a0fb4bf380350af94c8cf7c399471e1186b93c..4fbc651e99458aeb44610e57d25d88fe7c1c498a and that bug looks like the most likely culprit.

Keith Miller

Comment 9 2020-10-23 13:22:01 PDT

Created attachment 412207 [details] Patch

Michael Saboff

Comment 10 2020-10-23 13:36:58 PDT

Comment on attachment 412207 [details] Patch r=me

EWS

Comment 11 2020-10-23 15:53:34 PDT

Committed r268942: <https://trac.webkit.org/changeset/268942> All reviewed patches have been landed. Closing bug and clearing flags on attachment 412207 [details].

Saam Barati

Comment 12 2020-10-23 17:50:39 PDT

Comment on attachment 412207 [details] Patch Can we get a microbenchmark for this?

Keith Miller

Comment 13 2020-10-23 20:29:03 PDT

(In reply to Saam Barati from comment #12) > Comment on attachment 412207 [details] > Patch > > Can we get a microbenchmark for this? I could try to create one but it might not be worth the time. Alekandr, would you be willing to let us copy your Wasm module into our repo as a test case?

Alekandr Guryanov

Comment 14 2020-10-24 06:00:45 PDT

(In reply to Keith Miller from comment #13) > (In reply to Saam Barati from comment #12) > > Comment on attachment 412207 [details] > > Patch > > > > Can we get a microbenchmark for this? > > I could try to create one but it might not be worth the time. Alekandr, > would you be willing to let us copy your Wasm module into our repo as a test > case? Sure! Do you need only a wasm module? If you want I can create an all-in-one wasm file with dhry2 test embedded, that you need to simply run.

Keith Miller

Comment 15 2020-10-24 21:13:35 PDT

(In reply to Alekandr Guryanov from comment #14) > (In reply to Keith Miller from comment #13) > > (In reply to Saam Barati from comment #12) > > > Comment on attachment 412207 [details] > > > Patch > > > > > > Can we get a microbenchmark for this? > > > > I could try to create one but it might not be worth the time. Alekandr, > > would you be willing to let us copy your Wasm module into our repo as a test > > case? > > Sure! Do you need only a wasm module? If you want I can create an all-in-one > wasm file with dhry2 test embedded, that you need to simply run. Sure, if it's not too much work, an all-in-one is probably a bit easier to work with.

Keith Miller

Comment 16 2020-10-24 21:13:54 PDT

> Sure, if it's not too much work, an all-in-one is probably a bit easier to > work with. Thanks!

Alekandr Guryanov

Comment 17 2020-10-26 02:02:43 PDT

Created attachment 412294 [details] all-in-one-test This is a wasm file with an embedded dhry2 test (function names are not stripped).

Alekandr Guryanov

Comment 18 2021-02-25 19:25:12 PST

Even on Safari 14.4 performance is still very bad.

Alekandr Guryanov

Comment 19 2021-02-25 19:30:35 PST

Created attachment 421600 [details] Performance on Safari 14.4

Robin Morisset

Comment 20 2021-11-08 14:01:38 PST

Sorry for taking so long to come back to this bug. The problem is that 25k for the option --maximumTmpsForGraphColoring is too small for some benchmarks (including the Dhry2 in js-dosbox example in the original comment), causing the performance regressions; but anything significantly above 25k caused jetsams on some other webpages (which was the original reason for this limit to exist). The jetsams were coming from two data-structures: the interference graph used by AirAllocateRegistersByGraphColoring.cpp and the interference graph used by AirAllocateStackByGraphColoring.cpp. I massively reduced their respective memory consumption in: - https://trac.webkit.org/changeset/277714/webkit - https://trac.webkit.org/changeset/278186/webkit One of the pages that used to jetsam with a larger value of --maximumTmpsForGraphColoring was mruby-wasm.aotoki.dev: - the first of the above patches reduced the memory consumption of the register allocator from 2 x 262MB to 2 x 20MB (2x because of trying to compile two functions in parallel) - the second patch reduced the memory consumption of the stack allocator on the same functions from 221MB + 399MB to 6.5MB + 10.5MB Since I saw similar reductions on every benchmark I tried, I believed it is now safe to try to increase --maximumTmpsForGraphColoring. I verified on the example you provided, there are exactly two functions with > 25k temporaries (54845 and 54856). Increasing --maximumTmpsForGraphColoring from 25k to 60k increased the "VAX" score on my machine (ToT minibrowser on an M1 MBP 2020) from about 27 to about 70.

Robin Morisset

Comment 21 2021-11-08 14:17:23 PST

Created attachment 443607 [details] Patch

Yusuke Suzuki

Comment 22 2021-11-08 15:39:53 PST

Comment on attachment 443607 [details] Patch r=me

Alekandr Guryanov

Comment 23 2021-11-08 22:01:52 PST

(In reply to Robin Morisset from comment #20) > I verified on the example you provided, there are exactly two functions with > > 25k temporaries (54845 and 54856). Increasing > --maximumTmpsForGraphColoring from 25k to 60k increased the "VAX" score on > my machine (ToT minibrowser on an M1 MBP 2020) from about 27 to about 70. So this also means if I split this two functions into smaller somehow (like split them) it will increase performance on same value, correct?

Robin Morisset

Comment 24 2021-11-09 13:19:11 PST

Comment on attachment 443607 [details] Patch Thanks Yusuke for the review.

Robin Morisset

Comment 25 2021-11-09 13:22:27 PST

(In reply to Alekandr Guryanov from comment #23) > (In reply to Robin Morisset from comment #20) > > > I verified on the example you provided, there are exactly two functions with > > > 25k temporaries (54845 and 54856). Increasing > > --maximumTmpsForGraphColoring from 25k to 60k increased the "VAX" score on > > my machine (ToT minibrowser on an M1 MBP 2020) from about 27 to about 70. > > So this also means if I split this two functions into smaller somehow (like > split them) it will increase performance on same value, correct? Correct. If you want to improve the performance on the currently shipping version of Safari, keeping these functions split should help a lot. I believe these giant functions typically occur because of very aggressive inlining by LLVM/Emscripten, so preventing some of it somehow might be the easiest way to achieve this.

EWS

Comment 26 2021-11-09 13:56:14 PST

Committed r285533 (244050@main): <https://commits.webkit.org/244050@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 443607 [details].

Brent Fulgham

Comment 27 2022-02-04 13:36:57 PST

This change should be present in STP 139, iOS 15.4 Beta, and macOS 12.3 Beta.

Note You need to log in before you can comment on or make changes to this bug.