194081 – B3 should use associativity to optimize expression trees

RESOLVED FIXED 194081

B3 should use associativity to optimize expression trees

https://bugs.webkit.org/show_bug.cgi?id=194081

Summary B3 should use associativity to optimize expression trees

Robin Morisset

Reported 2019-01-30 23:00:43 PST

Currently, it is only using associativity for Add (and not even that comprehensively if I read the code correctly). So for example (3+x) + (y+4) is correctly reduced to x+y+7, but (3*x)*(y*7) is not reduced to x*y*12. It should apply to Mul, BitOr and BitAnd.

Attachments
Patch (17.83 KB, patch) 2019-02-12 14:27 PST, Robin Morisset	no flags	Details Formatted Diff Diff
Patch (17.83 KB, patch) 2019-02-12 14:41 PST, Robin Morisset	rmorisset: review-	Details Formatted Diff Diff
Patch (28.55 KB, patch) 2019-03-29 11:43 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Patch (27.91 KB, patch) 2019-03-29 11:57 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Patch (28.15 KB, patch) 2019-03-29 12:28 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Patch (45.83 KB, patch) 2019-04-03 15:31 PDT, Robin Morisset	fpizlo: review+	Details Formatted Diff Diff
Patch (45.48 KB, patch) 2019-04-03 18:25 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Patch (45.47 KB, patch) 2019-04-03 18:26 PDT, Robin Morisset	no flags	Details Formatted Diff Diff
Show Obsolete (7) View All Add attachment proposed patch, testcase, etc.

Saam Barati

Comment 1 2019-01-30 23:38:31 PST

(In reply to Robin Morisset from comment #0) > Currently, it is only using associativity for Add (and not even that > comprehensively if I read the code correctly). So for example (3+x) + (y+4) > is correctly reduced to x+y+7, but (3*x)*(y*7) is not reduced to x*y*12. > It should apply to Mul, BitOr and BitAnd. That’s surprising.

Robin Morisset

Comment 2 2019-02-12 14:27:20 PST

Created attachment 361843 [details] Patch

EWS Watchlist

Comment 3 2019-02-12 14:31:59 PST

Attachment 361843 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:648: One line control clauses should not use braces. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:934: An else should appear on the same line as the preceding } [whitespace/newline] [4] ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:1043: An else should appear on the same line as the preceding } [whitespace/newline] [4] ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:1096: An else should appear on the same line as the preceding } [whitespace/newline] [4] ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:2182: Place brace on its own line for function definitions. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:2270: One line control clauses should not use braces. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:2273: Missing space before { [whitespace/braces] [5] ERROR: Source/JavaScriptCore/b3/B3ReduceStrength.cpp:2278: One line control clauses should not use braces. [whitespace/braces] [4] Total errors found: 8 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.

Robin Morisset

Comment 4 2019-02-12 14:41:34 PST

Created attachment 361846 [details] Patch Just fixed the style nits.

Filip Pizlo

Comment 5 2019-03-25 14:31:17 PDT

Comment on attachment 361846 [details] Patch Can you add testb3 tests for at least some cases of this?

Robin Morisset

Comment 6 2019-03-27 14:48:33 PDT

(In reply to Filip Pizlo from comment #5) > Comment on attachment 361846 [details] > Patch > > Can you add testb3 tests for at least some cases of this? I did not add tests because several of the already existing tests stress this code (and found bugs in earlier versions of this patch), but sure I can add a few more. I looked a the performance impact of this patch: https://perf-safari.apple.com/v3/#/analysis/task/6346 It is quite variable, with some large wins and losses, but the overall result is neutral. After thinking some more about it, I can see several issues with my approach: - The compile time can be quadratic in pathological cases, due to walking the graph forward, it would be a lot better to walk it backwards. - We can accidentally duplicate quite a lot of code as we don't have use counts available - We make the depth of the expression depth (and thus the latency to get its result) linear in its size instead of logarithmic. In other words, we completely sacrifice all ILP. These problems combined have convinced me that putting this in B3ReduceStrength was the wrong approach. I should have a separate pass that does nothing but these associativity reorderings, trying to produce perfectly balanced expression trees, quickly and without code duplication. The fact that even the currently flawed approach got some serious speed-ups is encouraging that this would be a win.

Robin Morisset

Comment 7 2019-03-29 11:43:01 PDT

Created attachment 366286 [details] Patch New approach as a new pass, and with some extra tests. I'll provide perf results as soon as I have them.

Robin Morisset

Comment 8 2019-03-29 11:57:59 PDT

Created attachment 366288 [details] Patch This time with a patch file that actually applies cleanly (made from svn diff and not git diff that seems to make patch files for new files that svn cannot read).

EWS Watchlist

Comment 9 2019-03-29 12:02:59 PDT

Attachment 366288 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:81: Missing space before ( in switch( [whitespace/parens] [5] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:87: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:131: Non-label code inside switch statements should be indented. [whitespace/indent] [4] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:130: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:131: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:136: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:137: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:138: When wrapping a line, only indent 4 spaces. [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:139: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:140: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:141: When wrapping a line, only indent 4 spaces. [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:142: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:143: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:144: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:145: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:186: More than one command on the same line [whitespace/newline] [4] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:214: Missing space around : in range-based for statement [whitespace/colon] [4] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:217: Missing space around : in range-based for statement [whitespace/colon] [4] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:248: Missing space around : in range-based for statement [whitespace/colon] [4] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:267: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] ERROR: Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp:268: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Total errors found: 21 in 10 files If any of these errors are false positives, please file a bug against check-webkit-style.

Robin Morisset

Comment 10 2019-03-29 12:28:15 PDT

Created attachment 366294 [details] Patch Fix style nits and missing import in testB3, that for some reason only the gtk build complained about.

Robin Morisset

Comment 11 2019-03-29 12:40:40 PDT

The missing import was visibly not enough to appease the gtk-wk2 bot.. I will have to look deeper into what the linker error message means.

Robin Morisset

Comment 12 2019-03-30 08:54:25 PDT

The results from the bots on MacBookPro just came in and they are bad (overall neutral, but all of the subtests with highly significant changes were regressed). I should understand why before trying to land this.

Robin Morisset

Comment 13 2019-04-02 14:36:41 PDT

Comment on attachment 366294 [details] Patch I've got the performance numbers from this patch on JetStream2 and they're overall neutral. So I looked in more detail at when this optimization fires in a non-trivial way on all of JetStream2, and there are only a few subtests where it triggers: - Unipoker: balances some BitOr trees (insignificant slowdown) - tsf-wasm: balances a ton of BitOr trees (insignificant 1.4% speedup) - stanford-crypto-sha256 and stanford-crypto-pbkdf2: balances a few BitXor trees (insignificant slowdown on first iteration, strongly significant 2% speedup on average iteration and 3.5% speedup on worst iteration) - octane-zlib: balances a couple huge Add trees (no change) - ML: balances one small BitXor tree (no change) - mandreel: balances a couple (Add/BitAnd) small trees (0.7% insignificant slowdown on first iteration, strongly significant 1% improvement on average iteration and 2% on worst-case iteration) - HashSet-wasm: balances a few BitOr trees (insignificant speedup) - base64-SP: only balances one tiny BitXor tree (significant 2% slowdown on average iteration(?), insignificant change otherwise) So at best we get a couple percent on three benchmarks, and are otherwise neutral. Most likely we are just seeing noise even in these benchmarks where it balances trees. Surprisingly (to me) it does not find simplification opportunities beyond simple balancing. I thus put this patch out-of-review, as it does not seem obviously worth it at this point. I plan to revisit it once we have better value analysis in B3. It should enable significantly more simplifications, and make more large trees appear by turning CheckAdd/Sub/Mul into their non-checked variants.

Filip Pizlo

Comment 14 2019-04-02 14:55:31 PDT

(In reply to Robin Morisset from comment #13) > Comment on attachment 366294 [details] > Patch > > I've got the performance numbers from this patch on JetStream2 and they're > overall neutral. > So I looked in more detail at when this optimization fires in a non-trivial > way on all of JetStream2, and there are only a few subtests where it > triggers: > - Unipoker: balances some BitOr trees (insignificant slowdown) > - tsf-wasm: balances a ton of BitOr trees (insignificant 1.4% speedup) > - stanford-crypto-sha256 and stanford-crypto-pbkdf2: balances a few BitXor > trees (insignificant slowdown on first iteration, strongly significant 2% > speedup on average iteration and 3.5% speedup on worst iteration) > - octane-zlib: balances a couple huge Add trees (no change) > - ML: balances one small BitXor tree (no change) > - mandreel: balances a couple (Add/BitAnd) small trees (0.7% insignificant > slowdown on first iteration, strongly significant 1% improvement on average > iteration and 2% on worst-case iteration) > - HashSet-wasm: balances a few BitOr trees (insignificant speedup) > - base64-SP: only balances one tiny BitXor tree (significant 2% slowdown on > average iteration(?), insignificant change otherwise) > > So at best we get a couple percent on three benchmarks, and are otherwise > neutral. > Most likely we are just seeing noise even in these benchmarks where it > balances trees. Surprisingly (to me) it does not find simplification > opportunities beyond simple balancing. > > I thus put this patch out-of-review, as it does not seem obviously worth it > at this point. I plan to revisit it once we have better value analysis in > B3. It should enable significantly more simplifications, and make more large > trees appear by turning CheckAdd/Sub/Mul into their non-checked variants. It's worth the benefit if you make a microbenchmark and it speeds up a lot (>=20%, say) and the rest of these results hold.

Robin Morisset

Comment 15 2019-04-03 15:31:43 PDT

Created attachment 366657 [details] Patch I wrote three microbenchmarks. None of them had any speedups. Upon investigating, I found that CSE tends to leave around dead values that are only eliminated much further (in the ReduceStrength just after my optimizations). This was a problem, as my optimization pass gives up when a node got a use count greater than one, and these dead values were using various subtrees. So I ripped out killDeadCode from B3ReduceStrength, made it its own pass and added it after CSE. This worked: 2 of the three microbenchmarks now show wins of respectively 16% and 42%. More surprisingly and very happily, this dead code elimination by itself was a massive win on a few other microbenchmarks: 50% on instanceof-always-hit-two and 16% on two variations of licm-dragons. I suspect it is by enabling B3TailDuplication to do its work, as it runs after CSE, and needs tiny blocks to be able to do anything. I sent this patch to the bots for more serious performance results on JetStream2, I will update this bug as soon as the results come in. Beyond that the only difference with the previous patch, is that my pass now skips trees of size 3, as B3ReduceStrength can deal with them. This should help avoid a possible hit to compile time.

Filip Pizlo

Comment 16 2019-04-03 16:50:12 PDT

Comment on attachment 366657 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=366657&action=review r=me with the one nit that EliminateDeadCode doesn't need a class. > Source/JavaScriptCore/b3/B3EliminateDeadCode.h:46 > +// This class and its run function are public, so that they can be called directly from B3ReduceStrength > +class EliminateDeadCode { > +public: > + EliminateDeadCode(Procedure& proc) > + : m_proc(proc) > + { > + } > + > + bool run(); > + > +private: > + Procedure& m_proc; > +}; Seems like this should just be a function - maybe bool eliminateDeadCodeImpl(Procedure&).

Robin Morisset

Comment 17 2019-04-03 18:25:08 PDT

Created attachment 366686 [details] Patch Thanks for the review! I replaced the class EliminateDeadCode by a function eliminateDeadCodeImpl as suggested, and fixed a dataLog that I had forgotten from debugging.

Robin Morisset

Comment 18 2019-04-03 18:26:12 PDT

Created attachment 366687 [details] Patch Now without OOPS in changelog.

WebKit Commit Bot

Comment 19 2019-04-03 20:37:29 PDT

Comment on attachment 366687 [details] Patch Clearing flags on attachment: 366687 Committed r243851: <https://trac.webkit.org/changeset/243851>

WebKit Commit Bot

Comment 20 2019-04-03 20:37:31 PDT

All reviewed patches have been landed. Closing bug.

Radar WebKit Bug Importer

Comment 21 2019-04-03 20:38:20 PDT

<rdar://problem/49590146>

Saam Barati

Comment 22 2019-04-04 16:14:54 PDT

What was the result on JetStream 2?

Robin Morisset

Comment 23 2019-04-04 16:52:00 PDT

(In reply to Saam Barati from comment #22) > What was the result on JetStream 2? All bots failed :-( I am trying again with different configurations/machines, maybe I'll be more lucky this time.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component JavaScriptCore

Assignee

Robin Morisset

Reported

2019-01-30 23:00 PST

Modified

2019-04-04 16:52 PDT History

CC List

8 users Show

URL

Keywords InRadar

Depends on

Blocks

154319

Dependencies

tree graph