WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
225848
Make AirAllocateRegistersByGraphColoring use less memory
https://bugs.webkit.org/show_bug.cgi?id=225848
Summary
Make AirAllocateRegistersByGraphColoring use less memory
Robin Morisset
Reported
2021-05-15 18:46:29 PDT
The main benefit would be to allow us to increase Options::maximumTmpsForGraphColoring without causing more jetsams. It might also help our memory usage in general, and is likely to make the register allocator somewhat faster as well.
Attachments
WIP
(62.72 KB, patch)
2021-05-15 18:47 PDT
,
Robin Morisset
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
WIP
(62.81 KB, patch)
2021-05-15 19:16 PDT
,
Robin Morisset
no flags
Details
Formatted Diff
Diff
Patch
(84.17 KB, patch)
2021-05-17 10:26 PDT
,
Robin Morisset
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(84.17 KB, patch)
2021-05-17 11:29 PDT
,
Robin Morisset
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(84.25 KB, patch)
2021-05-17 11:40 PDT
,
Robin Morisset
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(84.25 KB, patch)
2021-05-17 11:45 PDT
,
Robin Morisset
no flags
Details
Formatted Diff
Diff
Patch
(84.25 KB, patch)
2021-05-17 11:50 PDT
,
Robin Morisset
no flags
Details
Formatted Diff
Diff
Patch
(84.89 KB, patch)
2021-05-17 14:36 PDT
,
Robin Morisset
rmorisset
: review-
rmorisset
: commit-queue-
Details
Formatted Diff
Diff
Patch
(110.08 KB, patch)
2021-05-18 16:26 PDT
,
Robin Morisset
no flags
Details
Formatted Diff
Diff
memory use of the interference graphs in JetStream2
(12.04 KB, text/plain)
2021-05-18 16:26 PDT
,
Robin Morisset
no flags
Details
memory use of the interference graphs while visiting mruby-wasm.aotoki.dev
(1.37 KB, text/plain)
2021-05-18 16:27 PDT
,
Robin Morisset
no flags
Details
Show Obsolete
(8)
View All
Add attachment
proposed patch, testcase, etc.
Robin Morisset
Comment 1
2021-05-15 18:47:07 PDT
Created
attachment 428760
[details]
WIP for EWS
Robin Morisset
Comment 2
2021-05-15 19:16:44 PDT
Created
attachment 428764
[details]
WIP Fix unused variable (and style nits)
Robin Morisset
Comment 3
2021-05-17 10:26:36 PDT
Created
attachment 428844
[details]
Patch
Robin Morisset
Comment 4
2021-05-17 11:29:00 PDT
Created
attachment 428849
[details]
Patch Fixing nits in the test file.
Robin Morisset
Comment 5
2021-05-17 11:40:22 PDT
Created
attachment 428850
[details]
Patch Trying to fix the test file again.
Robin Morisset
Comment 6
2021-05-17 11:45:00 PDT
Comment on
attachment 428850
[details]
Patch test file still has warnings
Robin Morisset
Comment 7
2021-05-17 11:45:52 PDT
Created
attachment 428852
[details]
Patch
Robin Morisset
Comment 8
2021-05-17 11:50:06 PDT
Created
attachment 428853
[details]
Patch
Robin Morisset
Comment 9
2021-05-17 14:31:56 PDT
This patch is unfortunately not enough to run mruby-wasm.aotoki.dev with an increased Options::maxTmpsForGraphColoring on a phone (it still jetsams). I've added some instrumentation and found out why: that webpage has two functions with roughly 30k Tmps (which is OK, tsf-wasm has functions that big with no issues) and >20M edges in the interference graph (which is the source of the problem, for comparison tsf-wasm only has roughly 1M edges in its largest functions). This patch reduces the memory consumption of the interference graph for each of these functions from roughly 250MB to 85MB (assuming an average occupancy rate for the relevant HashSet / SmallSet), and saves a few more MB from the rest of the datastructures in the register allocator (e.g. UseCounts). But considering that both functions are compiled in parallel, and that we keep alive during that time both the Air code and the B3 procedure (with its roughly 450k B3::Value per function..), it is not surprising that we still jetsam. I have some ideas for how to save some more memory in this case, but eventually we might just have to detect while building the interference graph when things are getting out of hand and aborting to the faster/cheaper register allocator. Oh, also this pretty dense interference graph leads to a ton of spilling: after the first iteration the number of edges is down to 3M, but the number of Tmps has increased to 300k. So it is not even clear that we are gaining that much by trying to do fancy graph coloring in this case.
Robin Morisset
Comment 10
2021-05-17 14:36:20 PDT
Created
attachment 428871
[details]
Patch
Robin Morisset
Comment 11
2021-05-18 10:32:54 PDT
Comment on
attachment 428871
[details]
Patch Removing from review as I've got a better version coming up.
Robin Morisset
Comment 12
2021-05-18 16:26:14 PDT
Created
attachment 428992
[details]
Patch Now with an even better datastructure for the interference graph.
Robin Morisset
Comment 13
2021-05-18 16:26:57 PDT
Created
attachment 428993
[details]
memory use of the interference graphs in JetStream2
Robin Morisset
Comment 14
2021-05-18 16:27:34 PDT
Created
attachment 428994
[details]
memory use of the interference graphs while visiting mruby-wasm.aotoki.dev
Robin Morisset
Comment 15
2021-05-18 19:58:33 PDT
Comment on
attachment 428992
[details]
Patch Thanks for the review! I'm sending this to the commit queue, but I'll post more performance numbers here when I get them.
EWS
Comment 16
2021-05-18 20:24:21 PDT
Committed
r277714
(
237893@main
): <
https://commits.webkit.org/237893@main
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 428992
[details]
.
Radar WebKit Bug Importer
Comment 17
2021-05-18 20:25:17 PDT
<
rdar://problem/78186936
>
Robin Morisset
Comment 18
2021-05-20 13:46:05 PDT
I am starting to get numbers for this patch combined with the typo fix in
https://bugs.webkit.org/show_bug.cgi?id=225977
. On an iPhone7, I've got RAMification numbers. JITTests as a whole got better by 0.63% (p<0.02). Per test, here are the statistically significant improvements, with the score improvement first (based on end and peak memory), and followed by the peak memory improvement: - 3d-raytrace-SP: 1.03%, 2.51% - Box2D: 1.85, 2.23 - OfflineAssembler: 11.21, 19.70 - UniPoker: 0.66, 0.83 - crypto: 0.76, 0.82 - crypto-aes-SP: 1.85, 3.58 - crypto-md5-SP: 5.46, 10.93 - crypto-sha1-SP: 1.01, 1.99 - date-format-tofte-SP: 0.99, 1.63 - date-formap-xparb-SP: 0.59, 0.77 - delta-blue: 0.94, 1.28 - gaussian-blur: 0.37, 0.37 - gcc-loops-wasm: 2.71, 4.83 - lebab-wtb: 4.15, 4.36 - body-SP: 0.49, 0.49 - quicksort-wasm: 4.24, 4.24 - raytrace: 2.97, 5.54 - regexp: 3.92, 7.67 - stanford-crypto-aes: 0.91, 1.62 - stanford-crypto-sha256: 0.12, 0.31 - tsf-wasm: not significant at the score level because of a massive outlier in end memory, 13.52% for peak memory All of these are p<0.02 except for UniPoker which is p<0.05, and there is not one significant memory regression (there is a large one in typescript, but at p>0.1 it looks like pure noise). As expected we see that the wins are mostly on peak memory, and are largest on those subtests which have especially large interference graphs (e.g. tsf-wasm and OfflineAssembler).
Robin Morisset
Comment 19
2021-05-21 00:18:56 PDT
Here are the numbers for JetStream2 on an iPhone 7. This patch (+ the typo fix) increases the total score by 1.27% (p<0.02). It has a significant (p<0.05) effect on the following subtests: - FlightPlanner: regression of 3.73% ! - HashSetWasm: speedup of 7.15% - ML: speedup of 2.80% (p<0.02) - gaussian-blur: speedup of 3.72% (p<0.02) - lelab-wtb: speedup of 1.40% - n-body-SP: speedup of 2.51% - tagcloud-SP: speedup of 23.27% (p<0.02) - tsf-wasm: the runtime is 17.69% faster (p<0.02), but there is no significant effect on the score as a whole because of a gigantic outlier in startup time. While I don't fully understand why some benchmarks win so much more than others, and the FlightPlanner regression is a head-scratcher, overall it looks like a very clear gain. I don't have good numbers for other machines at this point, but it unfortunately looks like more powerful machines don't benefit as much in JetStream2, even though they get the same memory reduction.
Robin Morisset
Comment 20
2021-05-21 23:49:43 PDT
Last batch of performance numbers for this patch: I've tested Speedometer 2 on both an iPhone 7 and an M1 Mac (MacBookAir10,1). The iPhone 7 shows a 0.53% improvement overall (p<0.02), with the following significant effects on subtests: - 1.35% better on Elm - 0.61% better on EmberJS-Debug - 0.81% better on Inferno - 1.17% better on React-Redux - 1.05% worse (!) on Vanilla-ES2015 All of these are statistically significant at p<0.02, and there is no other result significant at p<0.05. I conclude that while the effect is less dramatic than on JetStream2, there is still a clear benefit to minimizing the memory pressure caused by the compiler threads in that case. Unfortunately, the picture on the M1 Mac is different: no effect on the overall score, and the only statistically significant effects on subtests are regressions: - 0.27% worse on EmberJS-Debug (p<0.05) - 1.58% worse on Vanilla-ES2015-Babel-WebPack (p<0.02) - 1.20% worse on Vanilla-ES2015 (p<0.02) I have no clue of what can explain these results.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug