WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
157168
REGRESSION(
r200208
): It made 2 JSC stress tests fail on x86
https://bugs.webkit.org/show_bug.cgi?id=157168
Summary
REGRESSION(r200208): It made 2 JSC stress tests fail on x86
Csaba Osztrogonác
Reported
2016-04-29 02:25:59 PDT
ARMv7 Thumb2 (JSCOnly port): ----------------------------- before:
https://build.webkit.org/builders/JSCOnly%20Linux%20ARMv7%20Thumb2%20Release/builds/415
after:
https://build.webkit.org/builders/JSCOnly%20Linux%20ARMv7%20Thumb2%20Release/builds/416
GTK x86: --------- before:
https://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/60939
after:
https://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/60940
new failures: stress/math-pow-stable-results.js.always-trigger-copy-phase stress/math-pow-stable-results.js.default stress/math-pow-stable-results.js.dfg-eager stress/math-pow-stable-results.js.dfg-eager-no-cjit-validate stress/math-pow-stable-results.js.dfg-maximal-flush-validate-no-cjit stress/math-pow-stable-results.js.no-cjit-validate-phases stress/math-pow-stable-results.js.no-llint stress/math-pow-with-constants.js.always-trigger-copy-phase stress/math-pow-with-constants.js.default stress/math-pow-with-constants.js.dfg-maximal-flush-validate-no-cjit stress/math-pow-with-constants.js.no-cjit-validate-phases stress/math-pow-with-constants.js.no-llint It's strange that no regression on Apple Mac x86 bots, GTK ARMv7 Thumb2 and JSCOnly ARMv7 with ARM instruction set.
Attachments
disassembly of stress/math-pow-stable-results.js.default on ARMv7 Thumb2
(1.14 MB, text/plain)
2016-05-02 03:22 PDT
,
Csaba Osztrogonác
no flags
Details
a preprocessed JSC source
(3.70 MB, text/plain)
2016-05-02 03:23 PDT
,
Csaba Osztrogonác
no flags
Details
Patch
(2.68 KB, patch)
2016-05-15 21:16 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(2.68 KB, patch)
2016-05-15 22:02 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(3.23 KB, patch)
2016-05-15 23:38 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(4.13 KB, patch)
2016-05-16 18:02 PDT
,
Yusuke Suzuki
benjamin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2016-04-29 09:15:32 PDT
<
rdar://problem/26004839
>
Benjamin Poulain
Comment 2
2016-04-29 14:20:04 PDT
Landed
http://trac.webkit.org/changeset/200263
to get more information. Is it possible to get all the compile flags of those Linux build? Something like fast-math could cause something like this. Ossy, do you have access to the hardware? Getting those binaries would be a start. If you could also disassemble the test that would be even better.
Benjamin Poulain
Comment 3
2016-04-29 15:23:54 PDT
I cannot reproduce on my Mac with a 32bits build. Yusuke, can you reproduce by any chance? Can you have a quick look?
Yusuke Suzuki
Comment 4
2016-04-30 03:03:33 PDT
(In reply to
comment #3
)
> I cannot reproduce on my Mac with a 32bits build. > > Yusuke, can you reproduce by any chance? Can you have a quick look?
OK, I'll check this on Linux x86 hardware in GTK port :)
Yusuke Suzuki
Comment 5
2016-04-30 05:38:16 PDT
Setting up the 32bit LXC container. I'll build JSC on it.
Yusuke Suzuki
Comment 6
2016-04-30 06:40:24 PDT
I've just built and tested. And I cannot reproduce the results. Currently, I think this regression is due to the processor's capabilities. Seeing the original patch, I've found that we use sqrtDouble without seeing supportsFloatingPointSqrt(). sqrtDouble is only allowed in SSE2 environment in x86. I'll create the "attempt-to-fix" patch. ossy@, could you try it when I submit it here?
Yusuke Suzuki
Comment 7
2016-04-30 06:45:06 PDT
(In reply to
comment #6
)
> I've just built and tested. > And I cannot reproduce the results. > > Currently, I think this regression is due to the processor's capabilities. > Seeing the original patch, I've found that we use sqrtDouble without seeing > supportsFloatingPointSqrt(). sqrtDouble is only allowed in SSE2 environment > in x86. > I'll create the "attempt-to-fix" patch. ossy@, could you try it when I > submit it here?
Hmm, but, in this case, DFG is not enabled (supportsFloatingPoint() also relies on SSE2).
Yusuke Suzuki
Comment 8
2016-04-30 07:09:25 PDT
ossy@, could you dump the /proc/cpuinfo of this machine?
Csaba Osztrogonác
Comment 9
2016-05-02 02:27:54 PDT
(In reply to
comment #8
)
> ossy@, could you dump the /proc/cpuinfo of this machine?
It is an Arndale Octa board - 4 x Cortex A7 + 4x Cortex A15 (But only same cores can run at the same time)
http://www.arndaleboard.org/wiki/index.php/Main_Page
$ cat /proc/cpuinfo processor : 0 model name : ARMv7 Processor rev 3 (v7l) Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x2 CPU part : 0xc0f CPU revision : 3 [snip] (same for all processors: 0,1,2,3) Hardware : SAMSUNG EXYNOS5 (Flattened Device Tree) Revision : 0000 Serial : 0000000000000000
Yusuke Suzuki
Comment 10
2016-05-02 02:42:36 PDT
(In reply to
comment #9
)
> (In reply to
comment #8
) > > ossy@, could you dump the /proc/cpuinfo of this machine? > > It is an Arndale Octa board - 4 x Cortex A7 + 4x Cortex A15 > (But only same cores can run at the same time) > >
http://www.arndaleboard.org/wiki/index.php/Main_Page
> > $ cat /proc/cpuinfo > processor : 0 > model name : ARMv7 Processor rev 3 (v7l) > Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 > idiva idivt vfpd32 lpae > CPU implementer : 0x41 > CPU architecture: 7 > CPU variant : 0x2 > CPU part : 0xc0f > CPU revision : 3 > > [snip] (same for all processors: 0,1,2,3) > > Hardware : SAMSUNG EXYNOS5 (Flattened Device Tree) > Revision : 0000 > Serial : 0000000000000000
How about x86 32bit machine? Are sse (and others) enabled? My tested x86 LXC 32bit container (i686 Ubuntu 14.04) is 8 cores (hyperthreading), SSE, SSE2, SSE3, SSE4.1, SSE4.2, and AVX enabled. $ cat /proc/cpuinfo processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2860QM CPU @ 2.50GHz stepping : 7 microcode : 0x1a cpu MHz : 800.000 cache size : 8192 KB physical id : 0 siblings : 8 core id : 3 cpu cores : 4 apicid : 7 initial apicid : 7 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid bogomips : 4999.71 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: $ uname -a Linux my32bitbox 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 i686 i686 i686 GNU/Linux
Csaba Osztrogonác
Comment 11
2016-05-02 02:46:38 PDT
(In reply to
comment #6
)
> I've just built and tested. > And I cannot reproduce the results. > > Currently, I think this regression is due to the processor's capabilities. > Seeing the original patch, I've found that we use sqrtDouble without seeing > supportsFloatingPointSqrt(). sqrtDouble is only allowed in SSE2 environment > in x86. > I'll create the "attempt-to-fix" patch. ossy@, could you try it when I > submit it here?
I can try patches only on our ARM board, I don't have any x86 hardware. GTK guys can help, since they maintain the GTK x86 bot.
Csaba Osztrogonác
Comment 12
2016-05-02 03:07:16 PDT
(In reply to
comment #2
)
> Landed
http://trac.webkit.org/changeset/200263
to get more information.
It seems it is a rounding issue:
https://build.webkit.org/builders/JSCOnly%20Linux%20ARMv7%20Thumb2%20Release/builds/452
Running stress/math-pow-stable-results.js.always-trigger-copy-phase stress/math-pow-stable-results.js.default: Exception: Failed opaquePow with base = 1.4426950408889634 exponent = -0.5 expected (0.8325546111576977) got (0.8325546111576978) stress/math-pow-stable-results.js.default: ERROR: Unexpected exit code: 3 FAIL: stress/math-pow-stable-results.js.default stress/math-pow-stable-results.js.always-trigger-copy-phase: Exception: Failed opaquePow with base = 1.4426950408889634 exponent = -0.5 expected (0.8325546111576977) got (0.8325546111576978) Running stress/math-pow-stable-results.js.no-llint stress/math-pow-stable-results.js.always-trigger-copy-phase: ERROR: Unexpected exit code: 3 FAIL: stress/math-pow-stable-results.js.always-trigger-copy-phase Running stress/math-pow-stable-results.js.no-cjit-validate-phases stress/math-pow-stable-results.js.no-llint: Exception: Failed opaquePow with base = 2 exponent = -0.5 expected (0.7071067811865475) got (0.7071067811865476) stress/math-pow-stable-results.js.no-llint: ERROR: Unexpected exit code: 3 FAIL: stress/math-pow-stable-results.js.no-llint Running stress/math-pow-stable-results.js.dfg-eager stress/math-pow-stable-results.js.no-cjit-validate-phases: Exception: Failed opaquePow with base = 2 exponent = -0.5 expected (0.7071067811865475) got (0.7071067811865476) stress/math-pow-stable-results.js.no-cjit-validate-phases: ERROR: Unexpected exit code: 3 FAIL: stress/math-pow-stable-results.js.no-cjit-validate-phases Running stress/math-pow-stable-results.js.dfg-eager-no-cjit-validate stress/math-pow-stable-results.js.dfg-eager: Exception: Failed opaquePow with base = 2 exponent = -0.5 expected (0.7071067811865475) got (0.7071067811865476) stress/math-pow-stable-results.js.dfg-eager: ERROR: Unexpected exit code: 3 FAIL: stress/math-pow-stable-results.js.dfg-eager Running stress/math-pow-stable-results.js.dfg-maximal-flush-validate-no-cjit stress/math-pow-stable-results.js.dfg-eager-no-cjit-validate: Exception: Failed opaquePow with base = 2 exponent = -0.5 expected (0.7071067811865475) got (0.7071067811865476) stress/math-pow-stable-results.js.dfg-eager-no-cjit-validate: ERROR: Unexpected exit code: 3 FAIL: stress/math-pow-stable-results.js.dfg-eager-no-cjit-validate Running stress/regexp-exec-effect-after-exception.js.default stress/math-pow-stable-results.js.dfg-maximal-flush-validate-no-cjit: Exception: Failed opaquePow with base = 2 exponent = -0.5 expected (0.7071067811865475) got (0.7071067811865476) stress/math-pow-stable-results.js.dfg-maximal-flush-validate-no-cjit: ERROR: Unexpected exit code: 3 FAIL: stress/math-pow-stable-results.js.dfg-maximal-flush-validate-no-cjit
> Is it possible to get all the compile flags of those Linux build? Something > like fast-math could cause something like this.
> Ossy, do you have access to the hardware? Getting those binaries would be a > start. If you could also disassemble the test that would be even better.
Sure, I'll upload the disassembly soon.
Csaba Osztrogonác
Comment 13
2016-05-02 03:22:39 PDT
Created
attachment 277902
[details]
disassembly of stress/math-pow-stable-results.js.default on ARMv7 Thumb2 working directory: .tests/stress $ ../../.vm/JavaScriptCore.framework/Resources/jsc --dumpDisassembly=true --useFTLJIT\=false --useFunctionDotArguments\=true math-pow-stable-results.js
Csaba Osztrogonác
Comment 14
2016-05-02 03:23:41 PDT
Created
attachment 277903
[details]
a preprocessed JSC source to be able to check all build time flags
Csaba Osztrogonác
Comment 15
2016-05-02 06:52:55 PDT
A clean build fixed it on the ARMv7 Thumb2 JSCOnly bot:
https://build.webkit.org/builders/JSCOnly%20Linux%20ARMv7%20Thumb2%20Release/builds/453
Let's see if clean build fixes it on x86 too:
https://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/61021
Yusuke Suzuki
Comment 16
2016-05-02 09:10:54 PDT
(In reply to
comment #15
)
> A clean build fixed it on the ARMv7 Thumb2 JSCOnly bot: >
https://build.webkit.org/builders/JSCOnly%20Linux%20ARMv7%20Thumb2%20Release/
> builds/453 > > Let's see if clean build fixes it on x86 too: >
https://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/61021
Nice. OK.
Yusuke Suzuki
Comment 17
2016-05-02 10:08:14 PDT
(In reply to
comment #16
)
> (In reply to
comment #15
) > > A clean build fixed it on the ARMv7 Thumb2 JSCOnly bot: > >
https://build.webkit.org/builders/JSCOnly%20Linux%20ARMv7%20Thumb2%20Release/
> > builds/453 > > > > Let's see if clean build fixes it on x86 too: > >
https://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/61021
> > Nice. OK.
x86 32bit is still failing. It seems that this error is due to the precision. I'm now considering about x87.
Csaba Osztrogonác
Comment 18
2016-05-10 02:00:10 PDT
It would be great to know if GTK x86 bot has SSE2 support or not.
Yusuke Suzuki
Comment 19
2016-05-15 14:44:41 PDT
I've just extracted C runtime code (operationMathPow()), and executed it.
https://gist.github.com/Constellation/2ae6dc8ff7bc4d0bb7870503d654848a
Interestingly, I've got the 9.652406070129542e-160 in 32/64 environment. Even if we enable/disable sse in 32bit (80bit double is used in x87), we got the same result. I think, there is some bugs in JIT code. I'll investigate it.
Yusuke Suzuki
Comment 20
2016-05-15 16:19:13 PDT
Ah, OK. I think I've found the cause of this. This is due to x87's 80bit double precision. The result depends on fldl / fstpl. They perform 80 => 64 / 64 => 80 bit rouding.
https://gist.github.com/Constellation/2ae6dc8ff7bc4d0bb7870503d654848a
We got the following disasm from the above code. One is compiled without any options, one is compiled with -O3. No option:
https://gist.github.com/Constellation/ff034fa6929c2f3570108214df5fe30b
O3:
https://gist.github.com/Constellation/c161ff854cd3615a7226527d83e9cbe5
In particular, we should focus on the following part. 93 // If the exponent is a small positive int32 integer, we do a fast exponentiation 94 double result = 1; 95 while (yAsInt) { 96 if (yAsInt & 1) 97 result *= x; 98 x *= x; 99 yAsInt >>= 1; 100 } 101 return result; O0: 80486da: d9 e8 fld1 80486dc: dd 5d e8 fstpl -0x18(%ebp) 80486df: 83 7d e4 00 cmpl $0x0,-0x1c(%ebp) 80486e3: 74 21 je 8048706 <_Z16operationMathPowdd+0x1eb> 80486e5: 8b 45 e4 mov -0x1c(%ebp),%eax 80486e8: 83 e0 01 and $0x1,%eax 80486eb: 85 c0 test %eax,%eax 80486ed: 74 09 je 80486f8 <_Z16operationMathPowdd+0x1dd> 80486ef: dd 45 e8 fldl -0x18(%ebp) 80486f2: dc 4d d0 fmull -0x30(%ebp) 80486f5: dd 5d e8 fstpl -0x18(%ebp) 80486f8: dd 45 d0 fldl -0x30(%ebp) 80486fb: dc 4d d0 fmull -0x30(%ebp) 80486fe: dd 5d d0 fstpl -0x30(%ebp) 8048701: d1 7d e4 sarl -0x1c(%ebp) 8048704: eb d9 jmp 80486df <_Z16operationMathPowdd+0x1c4> 8048706: dd 45 e8 fldl -0x18(%ebp) 8048709: eb 16 jmp 8048721 <_Z16operationMathPowdd+0x206> O3: 80486c9: dd d8 fstp %st(0) 80486cb: d9 e8 fld1 80486cd: eb 03 jmp 80486d2 <_Z16operationMathPowdd+0x142> 80486cf: 90 nop 80486d0: d9 c9 fxch %st(1) 80486d2: a8 01 test $0x1,%al 80486d4: 74 0a je 80486e0 <_Z16operationMathPowdd+0x150> 80486d6: d8 c9 fmul %st(1),%st 80486d8: d9 c9 fxch %st(1) 80486da: eb 06 jmp 80486e2 <_Z16operationMathPowdd+0x152> 80486dc: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi 80486e0: d9 c9 fxch %st(1) 80486e2: d1 f8 sar %eax 80486e4: d8 c8 fmul %st(0),%st 80486e6: 75 e8 jne 80486d0 <_Z16operationMathPowdd+0x140> 80486e8: dd d8 fstp %st(0) 80486ea: e9 33 ff ff ff jmp 8048622 <_Z16operationMathPowdd+0x92> In O0 ver, we always perform fstpl / fldl onto the result of multiplying. This performs 80bit => 64bit rounding. But, in O3 ver, we don't do that. It means that multiplying is calculated in 80bit in x87. As a result, it causes the precision problem. If operationMathPow is compiled like O3 cases, C runtime: It performs 80bit multiplying. And we round 80bit => 64bit at the end of operationMathPow. DFG: It always uses SSE. So there is no rounding problem. (SSE registers are 64bit).
Yusuke Suzuki
Comment 21
2016-05-15 17:27:50 PDT
The default flag for mfpmath in clang is sse in OS X. That's why 32bit OS X port does not show this problem.
Yusuke Suzuki
Comment 22
2016-05-15 21:16:43 PDT
Created
attachment 278998
[details]
Patch
Yusuke Suzuki
Comment 23
2016-05-15 22:02:46 PDT
Created
attachment 279000
[details]
Patch
Yusuke Suzuki
Comment 24
2016-05-15 23:38:07 PDT
Created
attachment 279002
[details]
Patch
Yusuke Suzuki
Comment 25
2016-05-15 23:48:49 PDT
Comment on
attachment 279002
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=279002&action=review
Added comments.
> Source/JavaScriptCore/ChangeLog:18 > + occur in OS X clang 32bit environment. This is because `-mfpmath=sse` is enabled by default in OS X clang 32bit.
Currently, in this patch, I use `volatile double` approach. But there are several alternative designs. (1) Do not use this fast path in operationMathPow & DFG JIT if `CPU(X86) && (!defined(__SSE2_MATH__) || !defined(__SSE2__))`. This is more reliable than using `volatile double`. But it incurs some performance penalty in DFG JIT compared to the current case, since DFG JIT gives up emitting this fast path even if SSE2 is available. (2) Emit x87 in DFG when `CPU(X86) && (!defined(__SSE2_MATH__) || !defined(__SSE2__))`. I think it's not good.
Yusuke Suzuki
Comment 26
2016-05-16 00:04:45 PDT
Comment on
attachment 279002
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=279002&action=review
>> Source/JavaScriptCore/ChangeLog:18 >> + occur in OS X clang 32bit environment. This is because `-mfpmath=sse` is enabled by default in OS X clang 32bit. > > Currently, in this patch, I use `volatile double` approach. But there are several alternative designs. > > (1) Do not use this fast path in operationMathPow & DFG JIT if `CPU(X86) && (!defined(__SSE2_MATH__) || !defined(__SSE2__))`. > > This is more reliable than using `volatile double`. But it incurs some performance penalty in DFG JIT compared to the current case, > since DFG JIT gives up emitting this fast path even if SSE2 is available. > > (2) Emit x87 in DFG when `CPU(X86) && (!defined(__SSE2_MATH__) || !defined(__SSE2__))`. > > I think it's not good.
Reading this[1], `volatile double` seems OK. [1]:
https://gcc.gnu.org/wiki/x87note
Yusuke Suzuki
Comment 27
2016-05-16 00:07:29 PDT
Comment on
attachment 279002
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=279002&action=review
>>> Source/JavaScriptCore/ChangeLog:18 >>> + occur in OS X clang 32bit environment. This is because `-mfpmath=sse` is enabled by default in OS X clang 32bit. >> >> Currently, in this patch, I use `volatile double` approach. But there are several alternative designs. >> >> (1) Do not use this fast path in operationMathPow & DFG JIT if `CPU(X86) && (!defined(__SSE2_MATH__) || !defined(__SSE2__))`. >> >> This is more reliable than using `volatile double`. But it incurs some performance penalty in DFG JIT compared to the current case, >> since DFG JIT gives up emitting this fast path even if SSE2 is available. >> >> (2) Emit x87 in DFG when `CPU(X86) && (!defined(__SSE2_MATH__) || !defined(__SSE2__))`. >> >> I think it's not good. > > Reading this[1], `volatile double` seems OK. > > [1]:
https://gcc.gnu.org/wiki/x87note
And this
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
Alex Christensen
Comment 28
2016-05-16 10:50:42 PDT
Comment on
attachment 279002
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=279002&action=review
> Source/JavaScriptCore/runtime/MathCommon.cpp:438 > if (static_cast<double>(yAsInt) == y && yAsInt > 0 && yAsInt <= maxExponentForIntegerMathPow) {
This should work if yAsInt == 0, too. It would correctly return 1, and it would do so faster than mathPowInternal. We could probably also do special optimizations for yAsInt < 0, but there would probably be lots of edge cases and precision problems to worry about. Same with non-{-0.5, 0.5} multiples of 0.5, which probably aren't very common.
> Source/JavaScriptCore/runtime/MathCommon.cpp:448 > + typedef volatile double DoubleValue;
Will we have to do this elsewhere, too? I've solved 80-bit precision (actually 64-bit mantissa) rounding issues on Windows with _control_fp and _PC_53, but that changes the rounding mode globally for the application.
Benjamin Poulain
Comment 29
2016-05-16 17:23:20 PDT
Comment on
attachment 279002
[details]
Patch Good job finding the issue! I wish there was a way for gcc to use SSE for the entire JSC. I am sure there are plenty of similar bugs that are not covered by tests :(
Yusuke Suzuki
Comment 30
2016-05-16 18:02:55 PDT
Created
attachment 279077
[details]
Patch
Yusuke Suzuki
Comment 31
2016-05-16 18:04:16 PDT
Comment on
attachment 279002
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=279002&action=review
>> Source/JavaScriptCore/runtime/MathCommon.cpp:438 >> if (static_cast<double>(yAsInt) == y && yAsInt > 0 && yAsInt <= maxExponentForIntegerMathPow) { > > This should work if yAsInt == 0, too. It would correctly return 1, and it would do so faster than mathPowInternal. > We could probably also do special optimizations for yAsInt < 0, but there would probably be lots of edge cases and precision problems to worry about. Same with non-{-0.5, 0.5} multiples of 0.5, which probably aren't very common.
Nice catch!. DFG / FTL already handles this 0 case. So operationMathPow should be aware of that since operationMathPow's processing should be aligned to DFG / FTL's ones.
>> Source/JavaScriptCore/runtime/MathCommon.cpp:448 >> + typedef volatile double DoubleValue; > > Will we have to do this elsewhere, too? I've solved 80-bit precision (actually 64-bit mantissa) rounding issues on Windows with _control_fp and _PC_53, but that changes the rounding mode globally for the application.
I think this should be limited. The requirement is that the generated value is consistent in all the tiers (from LLInt to FTL). If the value is calculated in 80bit, it's ok if the DFG also calculate it in 80bit. So, what we should take care is that the mathematical value calculated in DFG like this. Yeah, as you said, _control_fp in Win (And _FPU_SETCW & _FPU_DOUBLE in the other environments) changes the FPU precision globally. So in this patch, I take the small changes that effect is limited. However, if changing the configuration is acceptable, changing it can be a good alternative.
Yusuke Suzuki
Comment 32
2016-05-16 18:24:37 PDT
(In reply to
comment #29
)
> Comment on
attachment 279002
[details]
> Patch > > Good job finding the issue! > > I wish there was a way for gcc to use SSE for the entire JSC. I am sure > there are plenty of similar bugs that are not covered by tests :(
Yeah. -mfpmath=sse & -march=native can enforce gcc to use SSE2 (muld etc.) if they are supported. However, in the environment that does not support SSE2, we need to take the conservative approach. And I think we still need to consider about such environments since the binary packages for i686 (e.g. Debian's i686 port) need to consider the CPU capabilities conservatively. Linux drops i386 support, but i686 support still remains. And i686 (Pentium Pro ~) does not include SSE2 (Pentium 4 ~). After reconsidering about the probelm, I think Alex's suggestion - changing FPU precision globally - seems nice in WebKit globally. Pros: - We don't need to consider about x87 80bit precision anymore. - In Windows, FPU precision is configured 64bit by default. (And in Linux, this is configued 80bit). So I don't think enforcing 64bit precision incurs some critical problems. Cons: - The change affects globally. So all the x87 processing in the process is now enforced to 64bit precision. - Even if FPU config is changed, libc's math function may return the value in 80bit precision (For example, I've checked that std::pow() still returns 80bit precision value even if we changed x87 precision to 64bit in x86). Potentially, this incurs some inconsistency problems. - We need to take care of the other libraries that may change the FPU configuration to keep this 64bit. For example, DirectX is known that it changes FPU configuration (without D3DCREATE_FPU_PRESERVE). Alex, Benjamin, what do you think of?
Benjamin Poulain
Comment 33
2016-05-16 18:41:21 PDT
Comment on
attachment 279077
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=279077&action=review
> Source/JavaScriptCore/b3/B3MathExtras.cpp:53 > - start->appendNew<Const32Value>(procedure, origin, 1000)); > + start->appendNew<Const32Value>(procedure, origin, maxExponentForIntegerMathPow));
Missing header for this.
Benjamin Poulain
Comment 34
2016-05-16 18:42:05 PDT
Changing the FPU configuration seems fragile to me. We would see bugs appearing if any library decides to change it too. Alex, what do you think?
Yusuke Suzuki
Comment 35
2016-05-16 18:45:33 PDT
(In reply to
comment #34
)
> Changing the FPU configuration seems fragile to me. We would see bugs > appearing if any library decides to change it too. > > Alex, what do you think?
(In reply to
comment #32
)
> (In reply to
comment #29
) > > Comment on
attachment 279002
[details]
> > Patch > > > > Good job finding the issue! > > > > I wish there was a way for gcc to use SSE for the entire JSC. I am sure > > there are plenty of similar bugs that are not covered by tests :( > > Yeah. -mfpmath=sse & -march=native can enforce gcc to use SSE2 (muld etc.) > if they are supported. However, in the environment that does not support > SSE2, we need to take the conservative approach. And I think we still need > to consider about such environments since the binary packages for i686 (e.g. > Debian's i686 port) need to consider the CPU capabilities conservatively. > Linux drops i386 support, but i686 support still remains. And i686 (Pentium > Pro ~) does not include SSE2 (Pentium 4 ~). > > After reconsidering about the probelm, I think Alex's suggestion - changing > FPU precision globally - seems nice in WebKit globally. > > Pros: > - We don't need to consider about x87 80bit precision anymore. > - In Windows, FPU precision is configured 64bit by default. (And in Linux, > this is configued 80bit). So I don't think enforcing 64bit precision incurs > some critical problems.
Ah, and potential concern is that there are so many Linux libraries that is not used in Windows; gtk, etc.
> > Cons: > - The change affects globally. So all the x87 processing in the process is > now enforced to 64bit precision. > - Even if FPU config is changed, libc's math function may return the value > in 80bit precision (For example, I've checked that std::pow() still returns > 80bit precision value even if we changed x87 precision to 64bit in x86). > Potentially, this incurs some inconsistency problems. > - We need to take care of the other libraries that may change the FPU > configuration to keep this 64bit. For example, DirectX is known that it > changes FPU configuration (without D3DCREATE_FPU_PRESERVE). > > Alex, Benjamin, what do you think of?
Alex Christensen
Comment 36
2016-05-16 21:06:14 PDT
Comment on
attachment 279077
[details]
Patch I think this fix is good, but there are certainly more, and if we found all of them, the code would look messy everywhere and do lots of unnecessary storing to the stack. Requiring all JSC applications to use 53-bit mantissa mode would be VERY bad, because some of them intentionally change it for more precision in fancy math libraries. I had to debug such a problem once and that was no fun.
Yusuke Suzuki
Comment 37
2016-05-16 21:25:55 PDT
(In reply to
comment #36
)
> Comment on
attachment 279077
[details]
> Patch > > I think this fix is good, but there are certainly more, and if we found all > of them, the code would look messy everywhere and do lots of unnecessary > storing to the stack. Requiring all JSC applications to use 53-bit mantissa > mode would be VERY bad, because some of them intentionally change it for > more precision in fancy math libraries. I had to debug such a problem once > and that was no fun.
Sounds reasonable. OK, I'll go with this patch and watch the build bot. Thanks :)
Yusuke Suzuki
Comment 38
2016-05-16 21:26:22 PDT
Comment on
attachment 279077
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=279077&action=review
>> Source/JavaScriptCore/b3/B3MathExtras.cpp:53 >> + start->appendNew<Const32Value>(procedure, origin, maxExponentForIntegerMathPow)); > > Missing header for this.
Oops, thanks. Fixed.
Yusuke Suzuki
Comment 39
2016-05-16 21:35:17 PDT
Committed
r200996
: <
http://trac.webkit.org/changeset/200996
>
Yusuke Suzuki
Comment 40
2016-05-16 22:00:17 PDT
(In reply to
comment #36
)
> Comment on
attachment 279077
[details]
> Patch > > I think this fix is good, but there are certainly more, and if we found all > of them, the code would look messy everywhere and do lots of unnecessary > storing to the stack. Requiring all JSC applications to use 53-bit mantissa > mode would be VERY bad, because some of them intentionally change it for > more precision in fancy math libraries. I had to debug such a problem once > and that was no fun.
Once the new code like this is found, we need to extract this DoubleValue type and use it in such code to ensure 64bit precision. I believe such code is rare; the problematic pattern is that the same floating point calculation is done in C runtime and Baseline / DFG (not FTL) JIT SSE2 code separately. FTL does not have a problem since FTL is not enabled in x86 32bit.
Yusuke Suzuki
Comment 41
2016-05-16 22:43:44 PDT
operations with base = 0.6931471805599453 exponent = 999 are fixed. But still failing due to exponent = 0.5 / -0.5 special path cases.
Yusuke Suzuki
Comment 42
2016-05-16 22:53:02 PDT
Ah, OK. math-pow-with-constants.js cases are fixed. But math-pow-stable-results.js cases remain. This is because of the similar issue in sqrt. operationMathPow uses sqrt. This c sqrt function is implementation-dependent, but, actually, the result is calculated in 80bit precision in glibc. However, DFG JIT fast path for 0.5 / -0.5 uses sqrt SSE operation. Of course, they have 64bit precision. I think the similar issue should occur in ArithSqrt and std::sqrt. Seeing the ArithXXX nodes at glance, the other nodes do not have the problems.
Yusuke Suzuki
Comment 43
2016-05-16 23:33:16 PDT
I've just opened the new bug to handle sqrt and pow.
https://bugs.webkit.org/show_bug.cgi?id=157787
Yusuke Suzuki
Comment 44
2016-05-17 06:42:38 PDT
Committed
r201014
: <
http://trac.webkit.org/changeset/201014
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug