Bug 194081 - B3 should use associativity to optimize expression trees
Summary: B3 should use associativity to optimize expression trees
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Robin Morisset
URL:
Keywords: InRadar
Depends on:
Blocks: 154319
  Show dependency treegraph
 
Reported: 2019-01-30 23:00 PST by Robin Morisset
Modified: 2019-04-04 16:52 PDT (History)
8 users (show)

See Also:


Attachments
Patch (17.83 KB, patch)
2019-02-12 14:27 PST, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (17.83 KB, patch)
2019-02-12 14:41 PST, Robin Morisset
rmorisset: review-
Details | Formatted Diff | Diff
Patch (28.55 KB, patch)
2019-03-29 11:43 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (27.91 KB, patch)
2019-03-29 11:57 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (28.15 KB, patch)
2019-03-29 12:28 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (45.83 KB, patch)
2019-04-03 15:31 PDT, Robin Morisset
fpizlo: review+
Details | Formatted Diff | Diff
Patch (45.48 KB, patch)
2019-04-03 18:25 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (45.47 KB, patch)
2019-04-03 18:26 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Robin Morisset 2019-01-30 23:00:43 PST
Currently, it is only using associativity for Add (and not even that comprehensively if I read the code correctly). So for example (3+x) + (y+4) is correctly reduced to x+y+7, but (3*x)*(y*7) is not reduced to x*y*12.
It should apply to Mul, BitOr and BitAnd.
Comment 1 Saam Barati 2019-01-30 23:38:31 PST
(In reply to Robin Morisset from comment #0)
> Currently, it is only using associativity for Add (and not even that
> comprehensively if I read the code correctly). So for example (3+x) + (y+4)
> is correctly reduced to x+y+7, but (3*x)*(y*7) is not reduced to x*y*12.
> It should apply to Mul, BitOr and BitAnd.

That’s surprising.
Comment 2 Robin Morisset 2019-02-12 14:27:20 PST
Created attachment 361843 [details]
Patch
Comment 3 EWS Watchlist 2019-02-12 14:31:59 PST
Attachment 361843 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:648:  One line control clauses should not use braces.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:934:  An else should appear on the same line as the preceding }  [whitespace/newline] [4]
ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:1043:  An else should appear on the same line as the preceding }  [whitespace/newline] [4]
ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:1096:  An else should appear on the same line as the preceding }  [whitespace/newline] [4]
ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:2182:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:2270:  One line control clauses should not use braces.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:2273:  Missing space before {  [whitespace/braces] [5]
ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:2278:  One line control clauses should not use braces.  [whitespace/braces] [4]
Total errors found: 8 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Robin Morisset 2019-02-12 14:41:34 PST
Created attachment 361846 [details]
Patch

Just fixed the style nits.
Comment 5 Filip Pizlo 2019-03-25 14:31:17 PDT
Comment on attachment 361846 [details]
Patch

Can you add testb3 tests for at least some cases of this?
Comment 6 Robin Morisset 2019-03-27 14:48:33 PDT
(In reply to Filip Pizlo from comment #5)
> Comment on attachment 361846 [details]
> Patch
> 
> Can you add testb3 tests for at least some cases of this?

I did not add tests because several of the already existing tests stress this code (and found bugs in earlier versions of this patch), but sure I can add a few more.

I looked a the performance impact of this patch:
https://perf-safari.apple.com/v3/#/analysis/task/6346
It is quite variable, with some large wins and losses, but the overall result is neutral.
After thinking some more about it, I can see several issues with my approach:
- The compile time can be quadratic in pathological cases, due to walking the graph forward, it would be a lot better to walk it backwards.
- We can accidentally duplicate quite a lot of code as we don't have use counts available
- We make the depth of the expression depth (and thus the latency to get its result) linear in its size instead of logarithmic. In other words, we completely sacrifice all ILP.

These problems combined have convinced me that putting this in B3ReduceStrength was the wrong approach. I should have a separate pass that does nothing but these associativity reorderings, trying to produce perfectly balanced expression trees, quickly and without code duplication. The fact that even the currently flawed approach got some serious speed-ups is encouraging that this would be a win.
Comment 7 Robin Morisset 2019-03-29 11:43:01 PDT
Created attachment 366286 [details]
Patch

New approach as a new pass, and with some extra tests.
I'll provide perf results as soon as I have them.
Comment 8 Robin Morisset 2019-03-29 11:57:59 PDT
Created attachment 366288 [details]
Patch

This time with a patch file that actually applies cleanly (made from svn diff and not git diff that seems to make patch files for new files that svn cannot read).
Comment 9 EWS Watchlist 2019-03-29 12:02:59 PDT
Attachment 366288 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:81:  Missing space before ( in switch(  [whitespace/parens] [5]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:87:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:131:  Non-label code inside switch statements should be indented.  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:130:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:131:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:136:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:137:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:138:  When wrapping a line, only indent 4 spaces.  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:139:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:140:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:141:  When wrapping a line, only indent 4 spaces.  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:142:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:143:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:144:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:145:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:186:  More than one command on the same line  [whitespace/newline] [4]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:214:  Missing space around : in range-based for statement  [whitespace/colon] [4]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:217:  Missing space around : in range-based for statement  [whitespace/colon] [4]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:248:  Missing space around : in range-based for statement  [whitespace/colon] [4]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:267:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:268:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
Total errors found: 21 in 10 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Robin Morisset 2019-03-29 12:28:15 PDT
Created attachment 366294 [details]
Patch

Fix style nits and missing import in testB3, that for some reason only the gtk build complained about.
Comment 11 Robin Morisset 2019-03-29 12:40:40 PDT
The missing import was visibly not enough to appease the gtk-wk2 bot.. I will have to look deeper into what the linker error message means.
Comment 12 Robin Morisset 2019-03-30 08:54:25 PDT
The results from the bots on MacBookPro just came in and they are bad (overall neutral, but all of the subtests with highly significant changes were regressed). I should understand why before trying to land this.
Comment 13 Robin Morisset 2019-04-02 14:36:41 PDT
Comment on attachment 366294 [details]
Patch

I've got the performance numbers from this patch on JetStream2 and they're overall neutral.
So I looked in more detail at when this optimization fires in a non-trivial way on all of JetStream2, and there are only a few subtests where it triggers:
- Unipoker: balances some BitOr trees (insignificant slowdown)
- tsf-wasm: balances a ton of BitOr trees (insignificant 1.4% speedup)
- stanford-crypto-sha256 and stanford-crypto-pbkdf2: balances a few BitXor trees (insignificant slowdown on first iteration, strongly significant 2% speedup on average iteration and 3.5% speedup on worst iteration)
- octane-zlib: balances a couple huge Add trees (no change)
- ML: balances one small BitXor tree (no change)
- mandreel: balances a couple (Add/BitAnd) small trees (0.7% insignificant slowdown on first iteration, strongly significant 1% improvement on average iteration and 2% on worst-case iteration)
- HashSet-wasm: balances a few BitOr trees (insignificant speedup)
- base64-SP: only balances one tiny BitXor tree (significant 2% slowdown on average iteration(?), insignificant change otherwise)

So at best we get a couple percent on three benchmarks, and are otherwise neutral.
Most likely we are just seeing noise even in these benchmarks where it balances trees. Surprisingly (to me) it does not find simplification opportunities beyond simple balancing.

I thus put this patch out-of-review, as it does not seem obviously worth it at this point. I plan to revisit it once we have better value analysis in B3. It should enable significantly more simplifications, and make more large trees appear by turning CheckAdd/Sub/Mul into their non-checked variants.
Comment 14 Filip Pizlo 2019-04-02 14:55:31 PDT
(In reply to Robin Morisset from comment #13)
> Comment on attachment 366294 [details]
> Patch
> 
> I've got the performance numbers from this patch on JetStream2 and they're
> overall neutral.
> So I looked in more detail at when this optimization fires in a non-trivial
> way on all of JetStream2, and there are only a few subtests where it
> triggers:
> - Unipoker: balances some BitOr trees (insignificant slowdown)
> - tsf-wasm: balances a ton of BitOr trees (insignificant 1.4% speedup)
> - stanford-crypto-sha256 and stanford-crypto-pbkdf2: balances a few BitXor
> trees (insignificant slowdown on first iteration, strongly significant 2%
> speedup on average iteration and 3.5% speedup on worst iteration)
> - octane-zlib: balances a couple huge Add trees (no change)
> - ML: balances one small BitXor tree (no change)
> - mandreel: balances a couple (Add/BitAnd) small trees (0.7% insignificant
> slowdown on first iteration, strongly significant 1% improvement on average
> iteration and 2% on worst-case iteration)
> - HashSet-wasm: balances a few BitOr trees (insignificant speedup)
> - base64-SP: only balances one tiny BitXor tree (significant 2% slowdown on
> average iteration(?), insignificant change otherwise)
> 
> So at best we get a couple percent on three benchmarks, and are otherwise
> neutral.
> Most likely we are just seeing noise even in these benchmarks where it
> balances trees. Surprisingly (to me) it does not find simplification
> opportunities beyond simple balancing.
> 
> I thus put this patch out-of-review, as it does not seem obviously worth it
> at this point. I plan to revisit it once we have better value analysis in
> B3. It should enable significantly more simplifications, and make more large
> trees appear by turning CheckAdd/Sub/Mul into their non-checked variants.

It's worth the benefit if you make a microbenchmark and it speeds up a lot (>=20%, say) and the rest of these results hold.
Comment 15 Robin Morisset 2019-04-03 15:31:43 PDT
Created attachment 366657 [details]
Patch

I wrote three microbenchmarks. None of them had any speedups. Upon investigating, I found that CSE tends to leave around dead values that are only eliminated much further (in the ReduceStrength just after my optimizations). This was a problem, as my optimization pass gives up when a node got a use count greater than one, and these dead values were using various subtrees.

So I ripped out killDeadCode from B3ReduceStrength, made it its own pass and added it after CSE. This worked: 2 of the three microbenchmarks now show wins of respectively 16% and 42%.

More surprisingly and very happily, this dead code elimination by itself was a massive win on a few other microbenchmarks: 50% on instanceof-always-hit-two and 16% on two variations of licm-dragons.
I suspect it is by enabling B3TailDuplication to do its work, as it runs after CSE, and needs tiny blocks to be able to do anything.

I sent this patch to the bots for more serious performance results on JetStream2, I will update this bug as soon as the results come in.

Beyond that the only difference with the previous patch, is that my pass now skips trees of size 3, as B3ReduceStrength can deal with them. This should help avoid a possible hit to compile time.
Comment 16 Filip Pizlo 2019-04-03 16:50:12 PDT
Comment on attachment 366657 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366657&action=review

r=me with the one nit that EliminateDeadCode doesn't need a class.

> Source/JavaScriptCore/b3/B3EliminateDeadCode.h:46
> +// This class and its run function are public, so that they can be called directly from B3ReduceStrength
> +class EliminateDeadCode {
> +public:
> +    EliminateDeadCode(Procedure& proc)
> +        : m_proc(proc)
> +    {
> +    }
> +
> +    bool run();
> +
> +private:
> +    Procedure& m_proc;
> +};

Seems like this should just be a function - maybe bool eliminateDeadCodeImpl(Procedure&).
Comment 17 Robin Morisset 2019-04-03 18:25:08 PDT
Created attachment 366686 [details]
Patch

Thanks for the review!
I replaced the class EliminateDeadCode by a function eliminateDeadCodeImpl as suggested, and fixed a dataLog that I had forgotten from debugging.
Comment 18 Robin Morisset 2019-04-03 18:26:12 PDT
Created attachment 366687 [details]
Patch

Now without OOPS in changelog.
Comment 19 WebKit Commit Bot 2019-04-03 20:37:29 PDT
Comment on attachment 366687 [details]
Patch

Clearing flags on attachment: 366687

Committed r243851: <https://trac.webkit.org/changeset/243851>
Comment 20 WebKit Commit Bot 2019-04-03 20:37:31 PDT
All reviewed patches have been landed.  Closing bug.
Comment 21 Radar WebKit Bug Importer 2019-04-03 20:38:20 PDT
<rdar://problem/49590146>
Comment 22 Saam Barati 2019-04-04 16:14:54 PDT
What was the result on JetStream 2?
Comment 23 Robin Morisset 2019-04-04 16:52:00 PDT
(In reply to Saam Barati from comment #22)
> What was the result on JetStream 2?

All bots failed :-(
I am trying again with different configurations/machines, maybe I'll be more lucky this time.