Bug 133136 - Latest emscripten life benchmark is 4x slower because the DFG doesn't realize that arithmetic on booleans is a thing
Summary: Latest emscripten life benchmark is 4x slower because the DFG doesn't realize...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-05-20 14:20 PDT by Alon Zakai
Modified: 2014-05-30 11:53 PDT (History)
9 users (show)

See Also:


Attachments
it begins (28.95 KB, patch)
2014-05-23 15:53 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
moar! (50.10 KB, patch)
2014-05-24 14:27 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
almost done (77.42 KB, patch)
2014-05-25 11:25 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (115.33 KB, patch)
2014-05-25 19:16 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (114.99 KB, patch)
2014-05-25 20:17 PDT, Filip Pizlo
oliver: review+
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2 (727.82 KB, application/zip)
2014-05-25 21:46 PDT, Build Bot
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Alon Zakai 2014-05-20 14:20:53 PDT
The emscripten-asmjs benchmarks on arewefastyet.com have been updated (new builds with latest emscripten). JSC is faster on all of those but one (which is expected, as the new benchmarks used a newer LLVM, emscripten improvements, etc.), but one of them, the life benchmark, became 4x slower. Source of benchmark is at

https://github.com/haytjes/arewefastyet/blob/master/benchmarks/asmjs-ubench/life.js

This is a microbenchmark and perhaps not that interesting, and the new builds speed up JSC everywhere else, especially on the large benchmarks. Still, perhaps something is being hit here that might show up in other places.

Measurement was done by arewefastyet itself ( http://arewefastyet.com/#machine=12&view=breakdown&suite=asmjs-ubench ) and happens on both 64 and 32 bit OS X, which I think means with and without FTL.
Comment 1 Filip Pizlo 2014-05-23 14:29:48 PDT
Oh, LMAO!

You're doing a clever thing where you add a boolean to a number, and we're being very dumb about it - we end up:

- Predicting that the result of boolean + number will be a string.  That's false.

- Deciding that the best way to execute boolean + number is with the deepest of addition slow paths.  That's dumb.

I'll fix it.
Comment 2 Filip Pizlo 2014-05-23 15:53:46 PDT
Created attachment 231997 [details]
it begins

I'm going to make a comprehensive fix here.  In most places where we would speculate integer or number, we can accept booleans also.  We should of course be smart about it - like, if we have it on good authority that the input to a math operation is *only* integers then we shouldn't emit code to allow booleans.  I think that also we need to tweak the slow path counting in the baseline JIT - the baseline will think that booleans require taking slow path even though they basically don't.  It'll be interesting to figure out what to do about that.
Comment 3 Filip Pizlo 2014-05-24 14:27:28 PDT
Created attachment 232027 [details]
moar!

Some key concepts:

- Except for the prediction propagation and type fixup phases, which are super early in the pipeline, nobody has to know about the fact that booleans may flow into numerical operations because there will just be a BooleanToNumber node that will take a value and, if that value is a boolean, will convert it to the equivalent numerical value.  It will have a BooleanUse mode where it will also speculate that the input is a boolean but it can also do UntypedUse in which case it will pass through any non-booleans.  This operation is very easy to model in all of the compiler tiers.

- No changes to the baseline JIT.  The Baseline JIT will still believe that boolean inputs require taking the slow path and it will still report that it took slow path for any such operations.  The DFG will now be smart enough to ignore baseline JIT slow path profiling on operations that were known to have had boolean inputs.  That's a little quirky, but it's probably easier than modifying the baseline JIT to track booleans correctly.

Most of this change is very mechanical and I'm not quite done yet.  Soon though!
Comment 4 Filip Pizlo 2014-05-25 11:25:10 PDT
Created attachment 232042 [details]
almost done

It compiles.  Now I just need to get it to work.
Comment 5 Filip Pizlo 2014-05-25 18:42:24 PDT
It works.  I will post a patch soon.  I think I need to think about some good tests.
Comment 6 Filip Pizlo 2014-05-25 19:16:56 PDT
Created attachment 232049 [details]
the patch
Comment 7 WebKit Commit Bot 2014-05-25 19:19:18 PDT
Attachment 232049 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/dfg/DFGConstantFoldingPhase.cpp:83:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
ERROR: Source/JavaScriptCore/dfg/DFGConstantFoldingPhase.cpp:83:  Multi line control clauses should use braces.  [whitespace/braces] [4]
Total errors found: 2 in 77 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Filip Pizlo 2014-05-25 19:19:53 PDT
(In reply to comment #7)
> Attachment 232049 [details] did not pass style-queue:
> 
> 
> ERROR: Source/JavaScriptCore/dfg/DFGConstantFoldingPhase.cpp:83:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]

Fixed.

> ERROR: Source/JavaScriptCore/dfg/DFGConstantFoldingPhase.cpp:83:  Multi line control clauses should use braces.  [whitespace/braces] [4]

Nope.

> Total errors found: 2 in 77 files
> 
> 
> If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Darin Adler 2014-05-25 19:45:28 PDT
Comment on attachment 232049 [details]
the patch

EFL EWS failure seems to be due to an additional warning that the compiler has:

JavaScriptCore/dfg/DFGFixupPhase.cpp:1588:15: error: variable 'currentValue' set but not used [-Werror=unused-but-set-variable]

I think it’s possible it detected a real problem; perhaps m_insertionSet.insertNode should be using currentValue instead of m_currentNode.
Comment 10 Filip Pizlo 2014-05-25 19:57:58 PDT
(In reply to comment #9)
> (From update of attachment 232049 [details])
> EFL EWS failure seems to be due to an additional warning that the compiler has:
> 
> JavaScriptCore/dfg/DFGFixupPhase.cpp:1588:15: error: variable 'currentValue' set but not used [-Werror=unused-but-set-variable]
> 
> I think it’s possible it detected a real problem; perhaps m_insertionSet.insertNode should be using currentValue instead of m_currentNode.

Ah!  It's actually the opposite - the variable shouldn't be there.  It's a vestige of an earlier version of the patch.  That branch in which it is assigned is also a vestige.
Comment 11 Filip Pizlo 2014-05-25 20:17:52 PDT
Created attachment 232050 [details]
the patch
Comment 12 Build Bot 2014-05-25 21:46:13 PDT
Comment on attachment 232050 [details]
the patch

Attachment 232050 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/5333382259015680

New failing tests:
media/W3C/video/src/src_reflects_attribute_not_source_elements.html
Comment 13 Build Bot 2014-05-25 21:46:16 PDT
Created attachment 232056 [details]
Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-10  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 14 Filip Pizlo 2014-05-26 10:40:23 PDT
(In reply to comment #13)
> Created an attachment (id=232056) [details]
> Archive of layout-test-results from webkit-ews-10 for mac-mountainlion-wk2
> 
> The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
> Bot: webkit-ews-10  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5

This failure doesn't look related to anything I did.
Comment 15 Filip Pizlo 2014-05-26 10:44:31 PDT
Landed in http://trac.webkit.org/changeset/169354
Comment 16 Csaba Osztrogonác 2014-05-27 01:01:14 PDT
(In reply to comment #15)
> Landed in http://trac.webkit.org/changeset/169354

It broke 10 JSC stress tests on the Apple 32 bit buildbot:

** The following JSC stress test failures have been introduced:
	stress/tricky-array-bounds-checks.js.always-trigger-copy-phase
	stress/tricky-array-bounds-checks.js.default
	stress/tricky-array-bounds-checks.js.default-ftl
	stress/tricky-array-bounds-checks.js.dfg-eager
	stress/tricky-array-bounds-checks.js.dfg-eager-no-cjit-validate
	stress/tricky-array-bounds-checks.js.ftl-eager
	stress/tricky-array-bounds-checks.js.ftl-eager-no-cjit
	stress/tricky-array-bounds-checks.js.ftl-no-cjit-no-inline-validate
	stress/tricky-array-bounds-checks.js.ftl-no-cjit-validate
	stress/tricky-array-bounds-checks.js.no-cjit-validate-phases

Results for JSC stress tests:
    10 failures found.
Comment 18 Geoffrey Garen 2014-05-27 11:24:57 PDT
<rdar://problem/17040598>
Comment 19 Alon Zakai 2014-05-30 11:46:03 PDT
Great stuff! Massive speedup on arewefastyet due to this.

Just fyi, v8 has a similar issue, and has apparently decided not to optimize it (the results of which can also be seen on arewefastyet). We are therefore looking into workarounds in emscripten for v8, which will probably involve emitting |0 or such on boolean-generating expressions in the contexts where v8 requires it (which will increase code size, but hopefully not too badly). Please let me know if there are any concerns on your side regarding that, but I assume especially since the optimizations here, that |0 etc. on booleans would not be a problem for you?
Comment 20 Filip Pizlo 2014-05-30 11:53:05 PDT
(In reply to comment #19)
> Great stuff! Massive speedup on arewefastyet due to this.
> 
> Just fyi, v8 has a similar issue, and has apparently decided not to optimize it (the results of which can also be seen on arewefastyet). We are therefore looking into workarounds in emscripten for v8, which will probably involve emitting |0 or such on boolean-generating expressions in the contexts where v8 requires it (which will increase code size, but hopefully not too badly). Please let me know if there are any concerns on your side regarding that, but I assume especially since the optimizations here, that |0 etc. on booleans would not be a problem for you?

Nope, it shouldn't be a problem.

The downside is that you're emitting more code, which is unfortunate.  It seems silly that V8 isn't optimizing a fairly sensible feature of JavaScript.  After all, in C++ I can do:

bool x = ...
int y = x + 1;

And it works just like in JS. ;-)