Bug 113862

Summary:

Negative zero checks cause unnecessary speculation failures on SunSpider on ARMv7

Product:

WebKit

Reporter:

Roman Zhuykov <zhroma>

Component:

JavaScriptCore

Assignee:

Nobody <webkit-unassigned>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

commit-queue, fpizlo, ggaren, msaboff

Priority:

Version:

528+ (Nightly build)

Hardware:

Unspecified

OS:

Unspecified

Attachments:

Description	Flags
Proposed patch	fpizlo: review-
testcase	none
Proposed patch with layout test	fpizlo: review+
New patch	ggaren: review+, ggaren: commit-queue-
patch with smaller layout tests	none

Roman Zhuykov

Reported 2013-04-03 03:04:27 PDT

While testing SunSpider math-spectral-norm on x86-64 and ARMv7 Linux I found, that there are a lot of "Speculation failures" on ARM, but not on x86. All these speculation failures happen because of the following: The hottest function in math-spectral-norm is A, it is inlined in some other functions on DFG. Function arguments are integer numbers. function A(i,j) { return 1/((i+j)*(i+j+1)/2+i+1); } When DFG JIT creates code for the first division (by 2) -- it adds negative-zero checks. On x86 when ArithDiv divides integers x/y speculation failure happens only when (x == 0 && y < 0), and such check is inserted into assembly only when "!nodeCanIgnoreNegativeZero(node->arithNodeFlags())". But on ARM, FixupPhase thansforms division into 4 operations (convert x do double dx, convert y to double dy, division dz=dx/dy, convert dz to integer z if possible). And the last operation DoubleAsInt32 always makes a speculation failure when its integer result is zero. In the above function A every time when i and j are both zeros -- this check gives a speculation failure. The main idea is that we should check nodeCanIgnoreNegativeZero() for DoubleAsInt32 nodes, and create zero-check only when needed. Also, I found that NodeNeedsNegZero flag is wrongly set to 1 for this division. The result of this division is then added to constant (+i+1). The logic behind the code in BackwardPropagationPhase assumes that when one of ArithAdd operands is non-negative-zero constant we shouldn't check negative zero for the other operand. But the "isNotNegZero" function has wrong implementation: it actually checks for "isNegativeZero". Also in ArithSub node condition that allows to clear NodeNeedsNegZero flag can be extended. Fixing all these issues allows us to have right NeedsNegZero flags and remove some unnecessary negative-zero checks. This gives a 22% speed up on the math-spectral-norm on ARMv7 Linux.

Attachments
Proposed patch (10.93 KB, patch) 2013-04-03 03:08 PDT, Roman Zhuykov	fpizlo: review-	Details Formatted Diff Diff
testcase (207 bytes, text/plain) 2013-04-03 06:45 PDT, Roman Zhuykov	no flags	Details
Proposed patch with layout test (13.16 KB, patch) 2013-04-08 05:58 PDT, Roman Zhuykov	fpizlo: review+	Details Formatted Diff Diff
New patch (23.32 KB, patch) 2013-04-10 07:02 PDT, Roman Zhuykov	ggaren: review+ ggaren: commit-queue-	Details Formatted Diff Diff
patch with smaller layout tests (23.23 KB, patch) 2013-04-25 13:57 PDT, Roman Zhuykov	no flags	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Roman Zhuykov

Comment 1 2013-04-03 03:08:26 PDT

Created attachment 196312 [details] Proposed patch

Roman Zhuykov

Comment 2 2013-04-03 06:45:11 PDT

Created attachment 196342 [details] testcase For problem with wrong "isNotNegZero" implementation I just found the following testcase: function main() { var x = 1; var y = -1; var s; var i; for (i = 0; i < 100000; i++) { if (i == 99999) x = 0; s = 1/(x/y+(-0)); } return s; } print(1/(0/-1+(-0))); print(main()); On x86-64 Linux this prints (wrong): -Infinity Infinity While on ARMv7 Linux (right): -Infinity -Infinity My patch fixes this.

Filip Pizlo

Comment 3 2013-04-05 12:08:56 PDT

Comment on attachment 196312 [details] Proposed patch This looks good. But, I am withholding r+ until you add layout tests for the cases where we generated incorrect code.

Roman Zhuykov

Comment 4 2013-04-08 05:58:25 PDT

Created attachment 196848 [details] Proposed patch with layout test Added testcase to layout tests. Is now OK?

Filip Pizlo

Comment 5 2013-04-08 07:25:51 PDT

(In reply to comment #4) > Created an attachment (id=196848) [details] > Proposed patch with layout test > > Added testcase to layout tests. Is now OK? Yup, looks good to me! Did you mean to set r?

Roman Zhuykov

Comment 6 2013-04-08 07:34:17 PDT

Yes, could you please review?

Roman Zhuykov

Comment 7 2013-04-10 07:02:27 PDT

Created attachment 197263 [details] New patch Comparing SunSpider on x86-64 and ARMv7 I found that ArithDiv nodes cause unnecessary speculation failures only in math-spectral-norm, but ArithMod nodes cause them in 8 other tests from SunSpider. Fixing all of them gives significant SunSpider speedup. On x86 when ArithMod divides integers "x%y" speculation failure happens only when "result == 0 && x < 0", but on ARMv7 it checks only "result == 0". The attached patch fixes the problem. As for ArithDiv, speculation checks for ArithMod nodes should be created on all platforms only when NodeNeedsNegZero is set. In special case on ARMv7s, when second operand is power-of-two-constant, there were no speculation checks. The same assembly can be used on ARMv7, but we must insert negative checks on both platforms. Also I found that sometimes speculationCheck is called with Overflow ExitKind, although is should be NegativeZero. And the last -- I see that compileArithNegate calls nodeCanIgnoreNegativeZero(node->arithNodeFlags()), and it wrongly always returns true. So, now there are two new node types where arithNodeFlags should not mask NodeNeedsNegZero -- DoubleAsInt32, ArithNegate. And certainly ArithMul, ArithDiv, ArithMod. Now there are three layout tests in the patch: Negative-zero-divide shows early discussed problem with isNotNegZero implementation, and it fails on x86 without my patch. Negative-zero-modulo shows problem with ArithMod, where the second operand is power-of-two-constant, it should fail on ARMv7s without my patch. Negative-zero-negate shows problem with ArithNegate, it fails on all platforms without my patch. Here are performance results for ARMv7. TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.050x as fast 1473.7ms +/- 0.7% 1403.9ms +/- 0.6% significant ============================================================================= 3d: 1.081x as fast 267.3ms +/- 1.8% 247.3ms +/- 2.0% significant cube: - 96.4ms +/- 2.3% 96.3ms +/- 2.5% morph: - 44.6ms +/- 0.8% 44.6ms +/- 1.1% raytrace: 1.187x as fast 126.3ms +/- 2.4% 106.4ms +/- 2.2% significant access: - 149.3ms +/- 1.6% 149.0ms +/- 1.3% binary-trees: - 27.8ms +/- 2.4% 27.7ms +/- 2.7% fannkuch: - 50.3ms +/- 1.0% 50.3ms +/- 0.7% nbody: - 47.2ms +/- 2.5% 47.0ms +/- 2.2% nsieve: - 24.0ms +/- 1.4% 24.0ms +/- 1.4% bitops: - 91.0ms +/- 0.9% 91.0ms +/- 0.9% 3bit-bits-in-byte: - 16.7ms +/- 2.1% 16.7ms +/- 2.1% bits-in-byte: - 23.5ms +/- 1.6% 23.5ms +/- 1.6% bitwise-and: - 23.0ms +/- 1.5% 22.7ms +/- 1.5% nsieve-bits: ?? 27.8ms +/- 1.1% 28.1ms +/- 1.4% not conclusive: might be *1.011x as slow* controlflow: - 25.0ms +/- 2.7% 24.7ms +/- 2.0% recursive: - 25.0ms +/- 2.7% 24.7ms +/- 2.0% crypto: 1.166x as fast 175.2ms +/- 1.6% 150.3ms +/- 1.5% significant aes: 1.179x as fast 80.5ms +/- 1.4% 68.3ms +/- 1.3% significant md5: 1.166x as fast 56.8ms +/- 1.9% 48.7ms +/- 2.5% significant sha1: 1.138x as fast 37.9ms +/- 1.9% 33.3ms +/- 1.0% significant date: 1.030x as fast 170.8ms +/- 1.4% 165.8ms +/- 1.2% significant format-tofte: - 87.5ms +/- 1.6% 87.2ms +/- 1.3% format-xparb: 1.060x as fast 83.3ms +/- 1.6% 78.6ms +/- 1.4% significant math: 1.080x as fast 121.5ms +/- 0.9% 112.5ms +/- 1.3% significant cordic: - 33.6ms +/- 1.5% 33.1ms +/- 1.6% partial-sums: - 51.9ms +/- 0.8% 51.8ms +/- 1.1% spectral-norm: 1.30x as fast 36.0ms +/- 1.3% 27.6ms +/- 2.2% significant regexp: ?? 69.7ms +/- 0.5% 69.9ms +/- 0.6% not conclusive: might be *1.003x as slow* dna: ?? 69.7ms +/- 0.5% 69.9ms +/- 0.6% not conclusive: might be *1.003x as slow* string: 1.027x as fast 403.9ms +/- 0.5% 393.4ms +/- 0.8% significant base64: - 38.8ms +/- 1.2% 38.8ms +/- 1.7% fasta: 1.010x as fast 68.0ms +/- 0.7% 67.3ms +/- 0.7% significant tagcloud: - 94.7ms +/- 1.4% 94.0ms +/- 1.2% unpack-code: 1.019x as fast 148.0ms +/- 0.7% 145.2ms +/- 0.6% significant validate-input: 1.131x as fast 54.4ms +/- 0.7% 48.1ms +/- 0.8% significant

Roman Zhuykov

Comment 8 2013-04-15 05:36:07 PDT

Could you please review the new patch?

Geoffrey Garen

Comment 9 2013-04-25 12:33:07 PDT

Comment on attachment 197263 [details] New patch View in context: https://bugs.webkit.org/attachment.cgi?id=197263&action=review r=me cq- because of the layout test issue. Any committer can fix the layout test and then land, or you can post a new patch, and I can cq+ it. > LayoutTests/fast/js/regress/script-tests/negative-zero-divide.js:6 > + for (i = 0; i < 100000; i++) { Please change this to 100. That's our de facto limit for getting code to run in the DFG, without making regression tests run too long. > LayoutTests/fast/js/regress/script-tests/negative-zero-divide.js:7 > + if (i == 99999) And 99 here. > LayoutTests/fast/js/regress/script-tests/negative-zero-modulo.js:6 > + for (i = 0; i < 100000; i++) { > + if (i == 99999) Ditto. > LayoutTests/fast/js/regress/script-tests/negative-zero-negate.js:6 > + for (i = 0; i < 100000; i++) { > + if (i == 99999) Ditto.

Roman Zhuykov

Comment 10 2013-04-25 13:57:58 PDT

Created attachment 199746 [details] patch with smaller layout tests (In reply to comment #9) > (From update of attachment 197263 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=197263&action=review > > r=me > > cq- because of the layout test issue. Any committer can fix the layout test and then land, or you can post a new patch, and I can cq+ it. > > > LayoutTests/fast/js/regress/script-tests/negative-zero-divide.js:6 > > + for (i = 0; i < 100000; i++) { > > Please change this to 100. That's our de facto limit for getting code to run in the DFG, without making regression tests run too long. Thank you for your review. Unfortunately, "i < 100" is not enough. In script-tests directory there are 95 tests, 71 of them have loops with more than 10000 iterations, 55 tests have more than 100000 loop iterations. I've just tested my negative-zero-negate.js on ARMv7 Linux. The test starts to fail (without my patch) only with "i < 1094" or more loop iterations. Checking with DFG_ENABLE_DEBUG_VERBOSE=1 shows, that there are no DFG compilations when I use "i < 1093" or smaller values. I have created the new patch with "i < 2001". This is fast enough, and give some reserve if DFG-starting parameters would change later. Is patch now OK?

Filip Pizlo

Comment 11 2013-04-25 14:00:13 PDT

(In reply to comment #10) > Created an attachment (id=199746) [details] > patch with smaller layout tests > > (In reply to comment #9) > > (From update of attachment 197263 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=197263&action=review > > > > r=me > > > > cq- because of the layout test issue. Any committer can fix the layout test and then land, or you can post a new patch, and I can cq+ it. > > > > > LayoutTests/fast/js/regress/script-tests/negative-zero-divide.js:6 > > > + for (i = 0; i < 100000; i++) { > > > > Please change this to 100. That's our de facto limit for getting code to run in the DFG, without making regression tests run too long. > > Thank you for your review. Unfortunately, "i < 100" is not enough. In script-tests directory there are 95 tests, 71 of them have loops with more than 10000 iterations, 55 tests have more than 100000 loop iterations. I've just tested my negative-zero-negate.js on ARMv7 Linux. The test starts to fail (without my patch) only with "i < 1094" or more loop iterations. Checking with DFG_ENABLE_DEBUG_VERBOSE=1 shows, that there are no DFG compilations when I use "i < 1093" or smaller values. I have created the new patch with "i < 2001". This is fast enough, and give some reserve if DFG-starting parameters would change later. Is patch now OK? The usual trick is to use a helper function. Repeated function invocation triggers compilation sooner, than repeated loop iteration.

WebKit Commit Bot

Comment 12 2013-04-25 15:51:32 PDT

Comment on attachment 199746 [details] patch with smaller layout tests Clearing flags on attachment: 199746 Committed r149152: <http://trac.webkit.org/changeset/149152>

WebKit Commit Bot

Comment 13 2013-04-25 15:51:34 PDT

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.