Andreas Kling

Reported 2014-05-07 05:19:55 PDT

[X86] Emit BT instruction for single-bit tests.

Andreas Kling

Comment 1 2014-05-07 05:20:29 PDT

Created attachment 230993 [details] Patch

Michael Saboff

Comment 2 2014-05-07 09:32:01 PDT

Comment on attachment 230993 [details]
Patch

r=me.

The logic looks fine. Is there documentation from Intel that says use bt over test?

Andreas Kling

Comment 3 2014-05-07 16:10:03 PDT

(In reply to comment #2)
> (From update of attachment 230993 [details])
> r=me.
>
> The logic looks fine. Is there documentation from Intel that says use bt over test?

From Intel Technology Journal, Vol. 11, Issue 4:

"The bit test instruction bt was introduced in the i386™ processor. In some implementations, including the Intel NetBurst® micro-architecture, the instruction has a high latency. The Intel Core micro-architecture executes bt in a single cycle, when the bit base operand is a register. Therefore, the Intel C++/Fortran compiler uses the bt instruction to implement a common bit test idiom when optimizing for the Intel Core micro-architecture. The optimized code runs about 20% faster than the generic version on an Intel Core 2 Duo processor."

Andreas Kling

Comment 4 2014-05-07 16:11:41 PDT

Note that BT only clobbers the CF flag, while TEST clobbers CF, OF, SF, ZF and PF. :)

Michael Saboff

Comment 5 2014-05-07 16:14:51 PDT

(In reply to comment #3)
> (In reply to comment #2)
> > (From update of attachment 230993 [details])
> > r=me.
> >
> > The logic looks fine. Is there documentation from Intel that says use bt over test?
>
> From Intel Technology Journal, Vol. 11, Issue 4:
>
> "The bit test instruction bt was introduced in the i386™ processor. In some implementations, including the Intel NetBurst® micro-architecture, the instruction has a high latency. The Intel Core micro-architecture executes bt in a single cycle, when the bit base operand is a register. Therefore, the Intel C++/Fortran compiler uses the bt instruction to implement a common bit test idiom when optimizing for the Intel Core micro-architecture. The optimized code runs about 20% faster than the generic version on an Intel Core 2 Duo processor."

Thanks for the reference. I looked in the optimization guide and couldn't find anything.

Andreas Kling

Comment 6 2014-05-07 16:24:46 PDT

Committed r168451: <http://trac.webkit.org/changeset/168451>

Filip Pizlo

Comment 7 2014-05-07 19:36:20 PDT

(In reply to comment #5)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > (From update of attachment 230993 [details])
> > > r=me.
> > >
> > > The logic looks fine. Is there documentation from Intel that says use bt over test?
> >
> > From Intel Technology Journal, Vol. 11, Issue 4:
> >
> > "The bit test instruction bt was introduced in the i386™ processor. In some implementations, including the Intel NetBurst® micro-architecture, the instruction has a high latency. The Intel Core micro-architecture executes bt in a single cycle, when the bit base operand is a register. Therefore, the Intel C++/Fortran compiler uses the bt instruction to implement a common bit test idiom when optimizing for the Intel Core micro-architecture. The optimized code runs about 20% faster than the generic version on an Intel Core 2 Duo processor."
>
> Thanks for the reference. I looked in the optimization guide and couldn't find anything.

Did you run any benchmarks? I don't trust anything that any Intel documentation says. It has been wrong in the past.

Filip Pizlo

Comment 8 2014-05-07 19:49:51 PDT

So, neither of the big-time compilers pattern-matches the BT instruction even if you tell them to target Core. GCC appears to use AND directly and relies on it leaving some bits behind, while LLVM uses TEST. I decided to look at what LLVM does, and it appears that it only resorts to BT if the immediate is not encodable with TEST.

So, it would appear that either we've discovered something that professional compiler writers have overlooked (or, in the case of LLVM, which knows about BT and can use it, we have discovered something that professional compiler writers have misunderstood), or there is something that the Intel manual isn't revealing.

Note that those compilers are engineered to know about *every single instruction* that every possible processor might have. They have been tuned very carefully for a long time, and sometimes they do it based on cost models of those instructions. If you want to figure out which instructions to select, it's usually a good bet to just look at what they do. In this case, it appears that BT is basically a bad idea.

In general, I don't think that "the Intel manual said so" should ever be used as a justification for a patch.

Filip Pizlo

Comment 9 2014-05-07 19:52:02 PDT

Comment on attachment 230993 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=230993&action=review

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1202
> +    int singleBitIndex(unsigned mask)
> +    {
> +        switch (mask) {
> +        case 0x00000001: return 0;
> +        case 0x00000002: return 1;
> +        case 0x00000004: return 2;
> +        case 0x00000008: return 3;
> +        case 0x00000010: return 4;
> +        case 0x00000020: return 5;
> +        case 0x00000040: return 6;
> +        case 0x00000080: return 7;
> +        case 0x00000100: return 8;
> +        case 0x00000200: return 9;
> +        case 0x00000400: return 10;
> +        case 0x00000800: return 11;
> +        case 0x00001000: return 12;
> +        case 0x00002000: return 13;
> +        case 0x00004000: return 14;
> +        case 0x00008000: return 15;
> +        case 0x00010000: return 16;
> +        case 0x00020000: return 17;
> +        case 0x00040000: return 18;
> +        case 0x00080000: return 19;
> +        case 0x00100000: return 20;
> +        case 0x00200000: return 21;
> +        case 0x00400000: return 22;
> +        case 0x00800000: return 23;
> +        case 0x01000000: return 24;
> +        case 0x02000000: return 25;
> +        case 0x04000000: return 26;
> +        case 0x08000000: return 27;
> +        case 0x10000000: return 28;
> +        case 0x20000000: return 29;
> +        case 0x40000000: return 30;
> +        case 0x80000000: return 31;
> +        default: return -1;
> +        }

We have a function to count the number of set bits in an int and to compute the log2 of an int. Why not use those instead?

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1212
> +        int bitIndex = singleBitIndex(mask.m_value);
> +        if ((cond == Zero || cond == NonZero) && bitIndex != -1) {
> +            m_assembler.bt_i8r(bitIndex, reg);
> +            return Jump(m_assembler.jCC(cond == Zero ? X86Assembler::ConditionNC : X86Assembler::ConditionC));
> +        }
> +

The LLVM tuning appears to disagree with you - it will pick BT only if the immediate is both a power of two and not representable in the immediate of a TEST. I think this will only happen if you want to quickly test one of the high 32 bits of a 64 bit integer. Obviously, this won't happen here.

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1224
> +        int bitIndex = singleBitIndex(mask.m_value);
> +        if ((cond == Zero || cond == NonZero) && bitIndex != -1) {
> +            m_assembler.bt_i8m(bitIndex, address.offset, address.base);
> +            return Jump(m_assembler.jCC(cond == Zero ? X86Assembler::ConditionNC : X86Assembler::ConditionC));
> +        }
> +

Ditto.
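The review suggestion to replace the 32-case switch with bit-counting helpers can be sketched as follows. This is only an illustration: the helper names below are hypothetical and are not WebKit's actual functions, and portable bit tricks stand in for whatever population-count and log2 routines the code base provides.

```cpp
#include <cstdint>

// Hypothetical helper (not a WebKit API): true when exactly one bit is set.
// A power of two has no bits in common with the value one below it.
inline bool hasSingleBitSet(std::uint32_t mask)
{
    return mask != 0 && (mask & (mask - 1)) == 0;
}

// Hypothetical replacement for the switch in the patch: returns the index
// of the only set bit, or -1 when zero or several bits are set.
inline int singleBitIndex(std::uint32_t mask)
{
    if (!hasSingleBitSet(mask))
        return -1;

    // Integer log2: count how many right shifts remain before the value is zero.
    int index = 0;
    while (mask >>= 1)
        ++index;
    return index;
}
```

For every mask listed in the switch this returns the same index, and any other value yields -1, so the callers in the patch would not need to change.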

Filip Pizlo

Comment 10 2014-05-07 20:09:58 PDT

Another thing: it appears that BT is a bigger instruction than TEST - it requires one extra prefix byte in the opcode. Seriously, if you have a choice between a one-opcode instruction and a two-opcode instruction and there is no evidence that the two-opcode one is better then you should use the one-opcode one. The Intel manual does not constitute evidence.

Andreas Kling

Comment 11 2014-05-07 20:22:13 PDT

Hi Phil. I'm running jsc-benchmarks on a slower machine in the background. In the meantime..

BT will always end up 2 bytes shorter than the TEST it replaced since it uses an 8-bit immediate instead of a 32-bit one.

In addition to LLVM, I was looking at GCC and ICC code generation on gcc.godbolt.org, but I did make one fatal mistake. I was testing with this snippet:

    void foo (void);

    int test (int x, int n)
    {
        if (x & (1 << n))
            foo ();

        return 0;
    }

Which generates code using BT for -O2 -march=core2 on all three compilers. I got the snippet from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36473 (where GCC added BT to their codegen.)

Unfortunately I forgot to test with an immediate instead of a variable argument, and I see now that for compile-time constant 1-bit masks in the 0-31 range, all three compilers generate TEST instead. Boo me.

I'm gonna let the benchmarks finish and paste the results here.
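The difference between the two cases can be reproduced by putting a variable-index test next to a constant-mask test. The sketch below is an assumption about what such a comparison might look like; the function names and the choice of bit 5 are illustrative. The stub body for foo() exists only so the example links; when inspecting generated code, foo() would be left as a bare declaration, as in the original snippet, so the call cannot be inlined away.

```cpp
// Stand-in definition so the example is self-contained (illustrative only).
static int fooCalls = 0;
void foo() { ++fooCalls; }

// Bit index known only at run time: the case the original snippet exercised.
int testVariable(int x, int n)
{
    if (x & (1 << n))
        foo();
    return 0;
}

// Compile-time constant single-bit mask: the case that matters for the
// patch, since the macro assembler is handed an immediate mask.
int testConstant(int x)
{
    if (x & (1 << 5))
        foo();
    return 0;
}
```

Comparing the output for the two functions at -O2 -march=core2 is the check described in this comment: the thread reports BT for the variable form and TEST for the constant form.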

Filip Pizlo

Comment 12 2014-05-07 20:24:52 PDT

(In reply to comment #11)
> Hi Phil. I'm running jsc-benchmarks on a slower machine in the background. In the meantime..
>
> BT will always end up 2 bytes shorter than the TEST it replaced since it uses an 8-bit immediate instead of a 32-bit one.

There is an 8-bit immediate form of TEST.

> In addition to LLVM, I was looking at GCC and ICC code generation on gcc.godbolt.org, but I did make one fatal mistake. I was testing with this snippet:
>
> void foo (void);
>
> int test (int x, int n)
> {
>     if (x & (1 << n))
>         foo ();
>
>     return 0;
> }
>
> Which generates code using BT for -O2 -march=core2 on all three compilers. I got the snippet from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36473 (where GCC added BT to their codegen.)
>
> Unfortunately I forgot to test with an immediate instead of a variable argument, and I see now that for compile-time constant 1-bit masks in the 0-31 range, all three compilers generate TEST instead. Boo me.
>
> I'm gonna let the benchmarks finish and paste the results here.

OK. I would expect no performance difference. If there is no performance difference then we should just defer to what the big compilers do.
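The size argument in this exchange can be checked by writing out the encodings by hand. The byte sequences below follow the opcode forms F7 /0 id (TEST r/m32, imm32), F6 /0 ib (TEST r/m8, imm8) and 0F BA /4 ib (BT r/m32, imm8); the choice of ECX/CL and bit 5 is arbitrary, and nothing here claims to be what JSC actually emits.

```cpp
#include <cstddef>
#include <cstdint>

// Hand-assembled register forms for testing bit 5 (illustrative operands).
constexpr std::uint8_t testReg32Imm32[] = { 0xF7, 0xC1, 0x20, 0x00, 0x00, 0x00 }; // test ecx, 0x20
constexpr std::uint8_t testReg8Imm8[]   = { 0xF6, 0xC1, 0x20 };                   // test cl, 0x20
constexpr std::uint8_t btReg32Imm8[]    = { 0x0F, 0xBA, 0xE1, 0x05 };             // bt ecx, 5

// Length of an encoding in bytes, deduced from the array type.
template <std::size_t N>
constexpr std::size_t encodedLength(const std::uint8_t (&)[N])
{
    return N;
}

// BT saves two bytes against the 32-bit-immediate TEST...
static_assert(encodedLength(btReg32Imm8) + 2 == encodedLength(testReg32Imm32),
              "bt r32, imm8 is two bytes shorter than test r32, imm32");
// ...but is one byte longer than the 8-bit-immediate TEST.
static_assert(encodedLength(btReg32Imm8) == encodedLength(testReg8Imm8) + 1,
              "bt r32, imm8 is one byte longer than test r8, imm8");
```

Both statements in the thread hold for these forms: BT beats the imm32 TEST by two bytes, yet the imm8 TEST is shorter still. The 8-bit register form only reaches bits 0-7 directly; for higher bits an assembler would need a different byte of the operand, which is a detail this sketch does not model.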

Andreas Kling

Comment 13 2014-05-07 20:27:32 PDT

(In reply to comment #12)
> (In reply to comment #11)
> > Hi Phil. I'm running jsc-benchmarks on a slower machine in the background. In the meantime..
> >
> > BT will always end up 2 bytes shorter than the TEST it replaced since it uses an 8-bit immediate instead of a 32-bit one.
>
> There is an 8-bit immediate form of TEST.

Hah. You're right. And JSC tries real hard to generate it too.

> > In addition to LLVM, I was looking at GCC and ICC code generation on gcc.godbolt.org, but I did make one fatal mistake. I was testing with this snippet:
> >
> > void foo (void);
> >
> > int test (int x, int n)
> > {
> >     if (x & (1 << n))
> >         foo ();
> >
> >     return 0;
> > }
> >
> > Which generates code using BT for -O2 -march=core2 on all three compilers. I got the snippet from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36473 (where GCC added BT to their codegen.)
> >
> > Unfortunately I forgot to test with an immediate instead of a variable argument, and I see now that for compile-time constant 1-bit masks in the 0-31 range, all three compilers generate TEST instead. Boo me.
> >
> > I'm gonna let the benchmarks finish and paste the results here.
>
> OK. I would expect no performance difference. If there is no performance difference then we should just defer to what the big compilers do.

Definitely.

Andreas Kling

Comment 14 2014-05-07 21:06:02 PDT

Benchmark report for SunSpider, LongSpider, V8Spider, Octane, Kraken, JSRegress, and AsmBench on locals-iMac (iMac14,2).

VMs tested:
"WithoutBT" at /Volumes/Data/Source/Safari/Ref-OpenSource/WebKitBuild/Release/jsc
"WithBT" at /Volumes/Data/Source/Safari/OpenSource/WebKitBuild/Release/jsc

Collected 4 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

Each row lists the benchmark name, the WithoutBT time, and the WithBT time, followed by the report's verdict where one was given.

SunSpider:   WithoutBT   WithBT
   3d-cube   3.8618+-0.3228   ?   3.8717+-0.4712   ?
   3d-morph   4.7336+-0.1498   4.6899+-0.1356
   3d-raytrace   4.8748+-0.3113   ?   4.9585+-0.2855   ?   might be 1.0172x slower
   access-binary-trees   1.4155+-0.2128   1.3615+-0.0389   might be 1.0397x faster
   access-fannkuch   4.4414+-0.0878   ?   4.5675+-0.2150   ?   might be 1.0284x slower
   access-nbody   2.2466+-0.0433   ?   2.2950+-0.1750   ?   might be 1.0215x slower
   access-nsieve   3.0829+-0.0559   ?   3.1282+-0.1827   ?   might be 1.0147x slower
   bitops-3bit-bits-in-byte   1.2347+-0.0768   ?   1.2832+-0.2258   ?   might be 1.0393x slower
   bitops-bits-in-byte   2.2117+-0.1158   2.2071+-0.1079
   bitops-bitwise-and   1.9141+-0.0362   ?   1.9182+-0.0586   ?
   bitops-nsieve-bits   3.0856+-0.0882   ?   3.0890+-0.1131   ?
   controlflow-recursive   1.4973+-0.0714   1.4958+-0.0937
   crypto-aes   3.1792+-0.0513   ?   3.1852+-0.0432   ?
   crypto-md5   1.7664+-0.1233   1.7493+-0.1290
   crypto-sha1   1.9402+-0.0816   1.8497+-0.0111   might be 1.0490x faster
   date-format-tofte   5.8865+-0.1886   ?   7.2452+-3.8198   ?   might be 1.2308x slower
   date-format-xparb   5.0603+-0.1396   4.8799+-0.1556   might be 1.0370x faster
   math-cordic   2.4778+-0.0507   2.4227+-0.0787   might be 1.0228x faster
   math-partial-sums   4.3050+-0.0610   ?   4.4823+-0.1742   ?   might be 1.0412x slower
   math-spectral-norm   1.4714+-0.0216   1.4637+-0.0517
   regexp-dna   6.2526+-0.1737   6.1306+-0.1809   might be 1.0199x faster
   string-base64   3.5334+-0.2921   3.4504+-0.1135   might be 1.0240x faster
   string-fasta   5.6995+-0.1165   ?   5.7885+-0.0988   ?   might be 1.0156x slower
   string-tagcloud   8.3417+-0.1268   8.2930+-0.1537
   string-unpack-code   18.4899+-0.7973   ?   18.6165+-0.9547   ?
   string-validate-input   4.0388+-0.1404   4.0292+-0.1176
   <arithmetic> *   4.1170+-0.0473   ?   4.1712+-0.1430   ?   might be 1.0132x slower
   <geometric>   3.3258+-0.0302   ?   3.3432+-0.0482   ?   might be 1.0052x slower
   <harmonic>   2.8016+-0.0476   2.8001+-0.0427   might be 1.0005x faster

LongSpider:   WithoutBT   WithBT
   3d-cube   1233.4095+-11.3659   ?   1253.3698+-40.1243   ?   might be 1.0162x slower
   3d-morph   741.4250+-27.4430   727.5229+-2.3772   might be 1.0191x faster
   3d-raytrace   765.6695+-6.0680   759.1818+-6.9697
   access-binary-trees   916.7892+-12.5503   906.1600+-5.9240   might be 1.0117x faster
   access-fannkuch   340.1515+-14.4538   ?   347.3749+-11.1170   ?   might be 1.0212x slower
   access-nbody   738.1696+-17.3423   ?   760.7197+-47.0308   ?   might be 1.0305x slower
   access-nsieve   931.6458+-12.5809   ?   1016.2460+-258.9499   ?   might be 1.0908x slower
   bitops-3bit-bits-in-byte   85.7283+-4.8629   ?   85.7570+-4.4779   ?
   bitops-bits-in-byte   135.4017+-2.2564   ?   136.0309+-12.2223   ?
   bitops-nsieve-bits   611.3268+-15.0354   608.5967+-10.1367
   controlflow-recursive   384.7675+-8.6854   ?   388.6284+-13.3114   ?   might be 1.0100x slower
   crypto-aes   886.7181+-7.7085   ?   886.8901+-5.0394   ?
   crypto-md5   669.3856+-26.4656   664.0869+-6.8129
   crypto-sha1   852.5552+-13.2281   ?   864.0634+-25.9422   ?   might be 1.0135x slower
   date-format-tofte   575.9711+-7.3666   ?   585.4905+-19.7535   ?   might be 1.0165x slower
   date-format-xparb   869.0757+-14.4675   859.2290+-11.2992   might be 1.0115x faster
   math-cordic   904.5382+-8.9082   903.4843+-13.9417
   math-partial-sums   545.4088+-9.0132   ?   546.7880+-1.3511   ?
   math-spectral-norm   870.8215+-9.0842   858.9225+-8.2623   might be 1.0139x faster
   string-base64   334.2463+-0.8221   ?   334.5509+-2.8044   ?
   string-fasta   576.7892+-12.4559   557.0745+-10.5421   might be 1.0354x faster
   string-tagcloud   214.0127+-2.0530   ?   216.3233+-10.8192   ?   might be 1.0108x slower
   <arithmetic>   644.7276+-1.7497   ?   648.4769+-12.9180   ?   might be 1.0058x slower
   <geometric> *   548.6369+-2.1365   ?   550.8949+-4.9976   ?   might be 1.0041x slower
   <harmonic>   412.0620+-5.1406   ?   413.3966+-3.7046   ?   might be 1.0032x slower

V8Spider:   WithoutBT   WithBT
   crypto   40.2859+-0.7060   ?   40.6218+-1.1536   ?
   deltablue   53.4686+-2.1510   ?   55.6558+-6.8348   ?   might be 1.0409x slower
   earley-boyer   35.8690+-0.2621   ?   36.2894+-1.3433   ?   might be 1.0117x slower
   raytrace   21.0439+-1.0350   ?   22.0540+-2.7833   ?   might be 1.0480x slower
   regexp   51.2761+-0.4551   50.9445+-0.5523
   richards   58.1104+-0.7669   ?   59.4163+-2.3087   ?   might be 1.0225x slower
   splay   28.8427+-1.1775   28.8246+-1.4657
   <arithmetic>   41.2709+-0.4303   ?   41.9723+-1.0863   ?   might be 1.0170x slower
   <geometric> *   39.0966+-0.5156   ?   39.7623+-1.0809   ?   might be 1.0170x slower
   <harmonic>   36.7740+-0.6788   ?   37.4516+-1.2412   ?   might be 1.0184x slower

Octane:   WithoutBT   WithBT
   encrypt   0.23290+-0.00210   !   0.23882+-0.00202   !   definitely 1.0254x slower
   decrypt   4.47660+-0.11228   ?   4.48817+-0.13490   ?
   deltablue x2   0.28874+-0.00434   ?   0.29091+-0.00125   ?
   earley   0.46939+-0.00614   ?   0.47122+-0.00557   ?
   boyer   5.84523+-0.16830   ?   5.98271+-0.14127   ?   might be 1.0235x slower
   navier-stokes x2   6.50766+-0.22653   6.33958+-0.01118   might be 1.0265x faster
   raytrace x2   1.88956+-0.03615   ?   1.93935+-0.13718   ?   might be 1.0264x slower
   richards x2   0.15715+-0.00077   !   0.16161+-0.00215   !   definitely 1.0284x slower
   splay x2   0.40737+-0.00817   0.39903+-0.01484   might be 1.0209x faster
   regexp x2   39.24923+-0.55669   ?   39.49937+-0.79358   ?
   pdfjs x2   49.89777+-0.62815   ?   57.38340+-23.58802   ?   might be 1.1500x slower
   mandreel x2   76.35007+-1.36543   ?   77.09823+-0.78193   ?
   gbemu x2   36.48157+-1.44080   ?   36.73093+-0.65766   ?
   closure   0.48830+-0.00459   ?   0.48972+-0.00176   ?
   jquery   5.82282+-0.07320   ?   5.90402+-0.16880   ?   might be 1.0139x slower
   box2d x2   12.35646+-0.34995   12.33273+-0.21968
   zlib x2   512.80497+-15.97694   ?   516.73765+-18.50990   ?
   typescript x2   574.16754+-9.84969   ?   579.11633+-24.49226   ?
   <arithmetic>   87.94838+-1.29661   ?   89.12110+-2.48256   ?   might be 1.0133x slower
   <geometric> *   7.48049+-0.04648   ?   7.58400+-0.18156   ?   might be 1.0138x slower
   <harmonic>   0.84961+-0.00400   !   0.86056+-0.00286   !   definitely 1.0129x slower

Kraken:   WithoutBT   WithBT
   ai-astar   247.752+-7.939   ?   248.547+-2.499   ?
   audio-beat-detection   112.494+-0.187   ?   112.762+-1.634   ?
   audio-dft   144.443+-11.872   143.542+-1.914
   audio-fft   67.710+-0.130   67.406+-0.221
   audio-oscillator   141.561+-7.109   140.845+-1.927
   imaging-darkroom   146.387+-0.528   ?   148.050+-1.902   ?   might be 1.0114x slower
   imaging-desaturate   80.594+-1.448   ?   80.683+-1.630   ?
   imaging-gaussian-blur   163.158+-4.679   ?   163.566+-11.871   ?
   json-parse-financial   38.503+-1.343   37.966+-0.089   might be 1.0142x faster
   json-stringify-tinderbox   53.159+-1.567   ?   54.250+-2.986   ?   might be 1.0205x slower
   stanford-crypto-aes   45.046+-1.236   ?   46.167+-5.939   ?   might be 1.0249x slower
   stanford-crypto-ccm   44.463+-6.681   ?   47.611+-7.063   ?   might be 1.0708x slower
   stanford-crypto-pbkdf2   129.166+-3.487   ?   130.468+-3.760   ?   might be 1.0101x slower
   stanford-crypto-sha256-iterative   47.244+-0.394   ?   48.188+-2.022   ?   might be 1.0200x slower
   <arithmetic> *   104.406+-0.568   ?   105.004+-1.334   ?   might be 1.0057x slower
   <geometric>   88.640+-0.561   ?   89.480+-1.698   ?   might be 1.0095x slower
   <harmonic>   75.429+-0.917   ?   76.435+-2.080   ?   might be 1.0133x slower

JSRegress:   WithoutBT   WithBT
   adapt-to-double-divide   16.8749+-0.3478   ?   17.0203+-0.7817   ?
   aliased-arguments-getbyval   0.6537+-0.0267   0.6158+-0.0504   might be 1.0617x faster
   allocate-big-object   1.8135+-0.0744   ?   1.8187+-0.0972   ?
   arity-mismatch-inlining   0.6396+-0.0393   0.6182+-0.0588   might be 1.0346x faster
   array-access-polymorphic-structure   5.7628+-0.3277   ?   5.7760+-0.0780   ?
   array-nonarray-polymorhpic-access   23.9935+-0.7104   23.7007+-0.4905   might be 1.0124x faster
   array-prototype-every   60.9655+-1.0479   ?   62.1391+-1.7703   ?   might be 1.0192x slower
   array-prototype-forEach   60.9336+-0.2432   ?   62.1194+-2.2588   ?   might be 1.0195x slower
   array-prototype-map   75.3367+-1.8058   75.1887+-0.2945
   array-prototype-some   60.2330+-0.1637   ?   61.2446+-0.9690   ?   might be 1.0168x slower
   array-with-double-add   3.0668+-0.0147   ?   3.0843+-0.1094   ?
   array-with-double-increment   2.3682+-0.1052   ?   2.4139+-0.2878   ?   might be 1.0193x slower
   array-with-double-mul-add   3.5223+-0.2210   3.4009+-0.0387   might be 1.0357x faster
   array-with-double-sum   2.9102+-0.0923   ?   2.9844+-0.2202   ?   might be 1.0255x slower
   array-with-int32-add-sub   5.4126+-0.1816   5.3853+-0.0529
   array-with-int32-or-double-sum   2.9734+-0.0501   2.9376+-0.0703   might be 1.0122x faster
   ArrayBuffer-DataView-alloc-large-long-lived   59.5416+-1.3383   59.4380+-1.0794
   ArrayBuffer-DataView-alloc-long-lived   17.5625+-0.3735   17.5107+-0.5725
   ArrayBuffer-Int32Array-byteOffset   3.2175+-0.4383   3.1110+-0.1813   might be 1.0342x faster
   ArrayBuffer-Int8Array-alloc-large-long-lived   58.9164+-1.9242   ?   59.5763+-1.1377   ?   might be 1.0112x slower
   ArrayBuffer-Int8Array-alloc-long-lived-buffer   27.4478+-0.3950   ?   27.7172+-0.4536   ?
   ArrayBuffer-Int8Array-alloc-long-lived   16.5187+-0.3092   ?   16.8021+-0.3588   ?   might be 1.0172x slower
   ArrayBuffer-Int8Array-alloc   15.1373+-0.7507   ?   15.2064+-0.6375   ?
   asmjs_bool_bug   5.0901+-0.1368   5.0353+-0.1861   might be 1.0109x faster
   assign-custom-setter-polymorphic   2.3589+-0.0777   ?   2.3894+-0.0526   ?   might be 1.0129x slower
   assign-custom-setter   3.1022+-0.1024   ?   3.1182+-0.1303   ?
   basic-set   9.1317+-1.2772   8.7687+-0.2894   might be 1.0414x faster
   big-int-mul   2.9904+-0.1940   2.9700+-0.1380
   boolean-test   2.6247+-0.2063   2.5182+-0.1577   might be 1.0423x faster
   branch-fold   3.2577+-0.0550   ?   3.3245+-0.0925   ?   might be 1.0205x slower
   by-val-generic   7.3796+-0.1543   ?   7.4485+-0.0709   ?
   call-spread-apply   12.1317+-0.3349   ?   12.1882+-0.1723   ?
   call-spread-call   4.8375+-0.0777   4.8325+-0.0795
   captured-assignments   0.3293+-0.0250   0.3065+-0.0194   might be 1.0745x faster
   cast-int-to-double   8.2139+-0.1624   ?   8.2618+-0.2152   ?
   cell-argument   10.3142+-0.2305   10.2570+-0.0835
   cfg-simplify   2.5598+-0.2526   2.4996+-0.2191   might be 1.0241x faster
   chain-getter-access   19.5402+-0.7694   ?   21.0107+-4.2560   ?   might be 1.0753x slower
   cmpeq-obj-to-obj-other   7.3638+-0.3202   7.3226+-0.1575
   constant-test   4.1171+-0.0626   ?   4.1469+-0.1105   ?
   DataView-custom-properties   63.4835+-1.2854   63.2479+-1.3707
   delay-tear-off-arguments-strictmode   2.1216+-0.0992   2.0983+-0.0798   might be 1.0111x faster
   destructuring-arguments   5.0398+-2.2027   4.5248+-0.5397   might be 1.1138x faster
   destructuring-swap   4.1785+-0.1576   ?   4.2252+-0.1212   ?   might be 1.0112x slower
   direct-arguments-getbyval   0.5582+-0.0244   ?   0.5785+-0.0355   ?   might be 1.0364x slower
   double-get-by-val-out-of-bounds   3.3326+-0.0797   ?   3.3958+-0.2144   ?   might be 1.0190x slower
   double-pollution-getbyval   7.9317+-0.1571   ?   8.0587+-0.6736   ?   might be 1.0160x slower
   double-pollution-putbyoffset   3.3505+-0.0502   3.3494+-0.1109
   double-to-int32-typed-array-no-inline   1.6681+-0.0728   ?   1.7090+-0.0957   ?   might be 1.0245x slower
   double-to-int32-typed-array   1.3768+-0.0319   ?   1.3882+-0.0640   ?
   double-to-uint32-typed-array-no-inline   1.7084+-0.0609   ?   1.7646+-0.0533   ?   might be 1.0329x slower
   double-to-uint32-typed-array   1.4252+-0.0356   ?   1.4645+-0.0627   ?   might be 1.0276x slower
   empty-string-plus-int   5.8543+-0.2353   ?   5.8599+-0.1637   ?
   emscripten-cube2hash   24.1876+-0.6467   ?   24.5358+-0.8620   ?   might be 1.0144x slower
   external-arguments-getbyval   1.1234+-0.0364   ?   1.1275+-0.0489   ?
   external-arguments-putbyval   1.5861+-0.0376   1.5779+-0.1227
   fixed-typed-array-storage-var-index   1.0112+-0.0393   ?   1.0138+-0.0228   ?
   fixed-typed-array-storage   0.6431+-0.0435   ?   0.6665+-0.0386   ?   might be 1.0364x slower
   Float32Array-matrix-mult   3.8936+-0.0388   ?   3.9861+-0.4784   ?   might be 1.0238x slower
   Float32Array-to-Float64Array-set   47.2681+-4.1753   46.2629+-0.9537   might be 1.0217x faster
   Float64Array-alloc-long-lived   62.4034+-1.5670   62.1289+-0.7957
   Float64Array-to-Int16Array-set   54.9105+-1.3133   ?   55.0078+-0.2631   ?
   fold-double-to-int   11.4935+-0.0516   ?   11.8701+-1.0125   ?   might be 1.0328x slower
   for-of-iterate-array-entries   5.5264+-0.1621   ?   5.6089+-0.5281   ?   might be 1.0149x slower
   for-of-iterate-array-keys   2.1867+-0.1355   ?   2.3857+-0.4778   ?   might be 1.0910x slower
   for-of-iterate-array-values   1.9847+-0.0625   ?   2.0040+-0.0976   ?
   fround   22.5043+-0.3450   22.3303+-0.3655
   function-dot-apply   1.1051+-0.2472   1.0642+-0.0609   might be 1.0384x faster
   function-test   2.6726+-0.0653   2.6379+-0.0750   might be 1.0131x faster
   function-with-eval   16.8524+-0.4956   16.6487+-0.3545   might be 1.0122x faster
   get-by-id-chain-from-try-block   5.5303+-0.0536   ?   5.6227+-0.1750   ?   might be 1.0167x slower
   get-by-id-proto-or-self   11.2628+-0.6030   11.1470+-0.6664   might be 1.0104x faster
   get-by-id-self-or-proto   11.3000+-0.2177   ?   11.3342+-0.3082   ?
   get-by-val-out-of-bounds   3.1969+-0.0392   3.1866+-0.0619
   get_callee_monomorphic   2.7919+-0.0911   ?   2.7993+-0.0988   ?
   get_callee_polymorphic   2.7815+-0.1871   2.6780+-0.0679   might be 1.0387x faster
   getter   10.5140+-0.1731   ?   10.6976+-0.2849   ?   might be 1.0175x slower
   global-var-const-infer-fire-from-opt   0.6813+-0.1792   ?   0.7118+-0.0319   ?   might be 1.0447x slower
   global-var-const-infer   0.5468+-0.0490   ?   0.5789+-0.1194   ?   might be 1.0588x slower
   HashMap-put-get-iterate-keys   21.2709+-0.4996   ?   21.3666+-0.2132   ?
   HashMap-put-get-iterate   21.2148+-1.3146   ?   21.2971+-0.8254   ?
   HashMap-string-put-get-iterate   25.7957+-0.6637   25.6835+-0.4569
   imul-double-only   9.3182+-0.0438   9.2085+-0.1093   might be 1.0119x faster
   imul-int-only   8.8632+-0.2467   ?   8.9577+-0.2918   ?   might be 1.0107x slower
   imul-mixed   11.8339+-0.1459   11.7974+-0.3412
   in-four-cases   12.1102+-0.0464   ?   12.2891+-0.6639   ?   might be 1.0148x slower
   in-one-case-false   6.3717+-0.1432   ?   6.3901+-0.1092   ?
   in-one-case-true   6.3008+-0.1341   ?   6.4802+-0.2394   ?   might be 1.0285x slower
   in-two-cases   6.7437+-0.3053   6.5772+-0.1301   might be 1.0253x faster
   indexed-properties-in-objects   2.4080+-0.0527   ?   2.4578+-0.0999   ?   might be 1.0207x slower
   infer-closure-const-then-mov-no-inline   2.7401+-0.0363   2.7217+-0.0521
   infer-closure-const-then-mov   18.2510+-1.3620   17.4486+-0.4636   might be 1.0460x faster
   infer-closure-const-then-put-to-scope-no-inline   10.4136+-0.5281   10.2925+-0.1459   might be 1.0118x faster
   infer-closure-const-then-put-to-scope   21.2559+-0.1137   ?   21.9273+-1.2916   ?   might be 1.0316x slower
   infer-closure-const-then-reenter-no-inline   48.6475+-0.4516   ?   48.8307+-0.8195   ?
   infer-closure-const-then-reenter   21.8364+-0.6542   21.3676+-0.2367   might be 1.0219x faster
   infer-one-time-closure-ten-vars   16.9810+-0.2948   ?   17.6420+-0.8652   ?   might be 1.0389x slower
   infer-one-time-closure-two-vars   17.1103+-0.5771   ?   17.1935+-0.0337   ?
   infer-one-time-closure   16.8564+-0.2188   !   17.4272+-0.2913   !   definitely 1.0339x slower
   infer-one-time-deep-closure   34.2043+-1.6270   34.0645+-0.8032
   inline-arguments-access   0.9550+-0.0741   ?   0.9711+-0.0556   ?   might be 1.0169x slower
   inline-arguments-aliased-access   1.0683+-0.0491   ?   1.0773+-0.0635   ?
   inline-arguments-local-escape   11.1242+-0.1217   10.8904+-0.3473   might be 1.0215x faster
   inline-get-scoped-var   3.9630+-0.1105   ?   4.0118+-0.0897   ?   might be 1.0123x slower
   inlined-put-by-id-transition   8.7558+-0.1617   ?   8.8181+-0.4925   ?
   int-or-other-abs-then-get-by-val   5.1650+-0.1496   5.0964+-0.0715   might be 1.0134x faster
   int-or-other-abs-zero-then-get-by-val   17.4343+-0.3743   17.1487+-0.3421   might be 1.0167x faster
   int-or-other-add-then-get-by-val   6.6006+-0.3991   ?   6.6057+-0.1506   ?
   int-or-other-add   6.1335+-0.0362   !   6.2252+-0.0407   !   definitely 1.0149x slower
   int-or-other-div-then-get-by-val   4.1525+-0.1009   ?   4.5359+-1.2117   ?   might be 1.0923x slower
   int-or-other-max-then-get-by-val   4.5270+-0.0592   ?   4.6199+-0.1652   ?   might be 1.0205x slower
   int-or-other-min-then-get-by-val   4.7253+-0.0658   ?   4.8945+-0.4785   ?   might be 1.0358x slower
   int-or-other-mod-then-get-by-val   3.8647+-0.0481   ?   3.8676+-0.0776   ?
   int-or-other-mul-then-get-by-val   4.2692+-0.2850   4.1349+-0.0897   might be 1.0325x faster
   int-or-other-neg-then-get-by-val   4.8512+-0.0716   ?   4.8620+-0.0756   ?
   int-or-other-neg-zero-then-get-by-val   17.0445+-0.6193   ?   17.1204+-0.8477   ?
   int-or-other-sub-then-get-by-val   6.8100+-0.1849   6.7993+-0.2825
   int-or-other-sub   5.3719+-0.1693   ?   5.4512+-0.0418   ?   might be 1.0148x slower
   int-overflow-local   3.6978+-0.0868   ?   3.9538+-0.5597   ?   might be 1.0692x slower
   Int16Array-alloc-long-lived   45.3895+-1.8998   44.7217+-0.1982   might be 1.0149x faster
   Int16Array-bubble-sort-with-byteLength   17.9753+-1.9966   ?   19.4047+-0.7376   ?   might be 1.0795x slower
   Int16Array-bubble-sort   16.3075+-0.1436   ?   16.3254+-0.3998   ?
   Int16Array-load-int-mul   1.1965+-0.0420   ?   1.2754+-0.1717   ?   might be 1.0659x slower
   Int16Array-to-Int32Array-set   43.2715+-1.2176   42.6219+-0.8450   might be 1.0152x faster
   Int32Array-alloc-large   12.7415+-1.3867   12.6401+-0.8268
   Int32Array-alloc-long-lived   49.8883+-0.3697   ?   50.0630+-0.2130   ?
   Int32Array-alloc   2.4424+-0.0594   ?   2.4855+-0.0960   ?   might be 1.0176x slower
   Int32Array-Int8Array-view-alloc   8.4851+-0.5946   8.3391+-0.2479   might be 1.0175x faster
   int52-spill   6.5042+-0.1587   ?   6.6044+-0.2007   ?   might be 1.0154x slower
   Int8Array-alloc-long-lived   40.8506+-0.8158   40.7183+-0.6195
   Int8Array-load-with-byteLength   3.1348+-0.0431   3.1196+-0.0197
   Int8Array-load   3.1297+-0.0665   3.1229+-0.0654
   integer-divide   10.0457+-0.3361   ?   10.2057+-0.4906   ?   might be 1.0159x slower
   integer-modulo   1.3184+-0.1003   1.3168+-0.0482
   large-int-captured   5.4174+-0.2135   ?   5.4364+-0.1193   ?
   large-int-neg   14.0964+-0.1772   ?   14.6466+-1.6208   ?   might be 1.0390x slower
   large-int   13.1874+-0.3276   13.1515+-0.3709
   logical-not   4.0453+-0.5646   3.8853+-0.1481   might be 1.0412x faster
   lots-of-fields   6.6533+-0.1082   ?   6.8633+-0.6718   ?   might be 1.0316x slower
   make-indexed-storage   2.2798+-0.0716   2.0951+-0.3863   might be 1.0882x faster
   make-rope-cse   3.7122+-0.1525   3.6635+-0.1045   might be 1.0133x faster
   marsaglia-larger-ints   65.3561+-1.9722   ?   65.7378+-1.0200   ?
   marsaglia-osr-entry   28.7213+-0.9645   28.5740+-0.5672
   method-on-number   17.7003+-0.6576   17.4322+-0.2939   might be 1.0154x faster
   misc-strict-eq   37.1450+-1.3845   ?   37.9603+-0.3439   ?   might be 1.0219x slower
   negative-zero-divide   0.2499+-0.0143   ?   0.2610+-0.0328   ?   might be 1.0444x slower
   negative-zero-modulo   0.2535+-0.0205   0.2505+-0.0099   might be 1.0119x faster
   negative-zero-negate   0.2707+-0.0592   0.2560+-0.0397   might be 1.0572x faster
   nested-function-parsing   22.6801+-0.9370   21.7504+-0.7935   might be 1.0427x faster
   new-array-buffer-dead   2.6555+-0.3041   2.6259+-0.1047   might be 1.0113x faster
   new-array-buffer-push   6.2847+-0.1264   6.2371+-0.1371
   new-array-dead   19.3544+-0.3096   19.2313+-0.7646
   new-array-push   4.1797+-0.1613   ?   4.2916+-0.0853   ?   might be 1.0268x slower
   number-test   2.4422+-0.0393   ?   2.5016+-0.1472   ?   might be 1.0243x slower
   object-closure-call   4.5679+-0.3652   4.5142+-0.1235   might be 1.0119x faster
   object-test   2.5947+-0.0619   2.5651+-0.0743   might be 1.0116x faster
   poly-stricteq   45.2128+-1.6479   44.3875+-1.8210   might be 1.0186x faster
   polymorphic-array-call   1.3683+-0.0965   1.3183+-0.0297   might be 1.0379x faster
   polymorphic-get-by-id   2.5158+-0.1022   2.4898+-0.0680   might be 1.0104x faster
   polymorphic-put-by-id   41.5616+-49.3263   ?   58.5696+-58.4548   ?   might be 1.4092x slower
   polymorphic-structure   13.1450+-0.3808   ?   13.2355+-0.2921   ?
   polyvariant-monomorphic-get-by-id   4.9765+-0.1883   4.8690+-0.0827   might be 1.0221x faster
   proto-getter-access   19.4456+-0.4488   ?   19.4763+-0.2443   ?
   put-by-id   11.9398+-0.1511   ?   12.0521+-0.5963   ?
   put-by-val-large-index-blank-indexing-type   5.8743+-0.0572   ?   6.0046+-0.2782   ?   might be 1.0222x slower
   put-by-val-machine-int   1.9415+-0.0820   ?   1.9819+-0.0261   ?   might be 1.0208x slower
   rare-osr-exit-on-local   12.8328+-0.4133   12.6913+-0.3078   might be 1.0112x faster
   register-pressure-from-osr   15.8768+-0.2107   ?   15.9279+-0.2688   ?
   setter   10.6903+-0.2440   ?   10.7468+-0.3192   ?
   simple-activation-demo   23.5605+-0.5218   23.4045+-0.4602
   simple-getter-access   31.9802+-2.4350   31.5043+-0.7733   might be 1.0151x faster
   slow-array-profile-convergence   2.2819+-0.2244   ?   2.2859+-0.0866   ?
   slow-convergence   2.4819+-0.0562   ?   2.5894+-0.1911   ?   might be 1.0433x slower
   sparse-conditional   0.9025+-0.1286   ?   0.9067+-0.0637   ?
   splice-to-remove   34.7090+-1.0254   ?   35.1881+-1.5913   ?   might be 1.0138x slower
   string-char-code-at   12.7178+-1.0729   ?   13.5532+-0.5287   ?   might be 1.0657x slower
   string-concat-object   1.8017+-0.0376   ?   1.9086+-0.3089   ?   might be 1.0594x slower
   string-concat-pair-object   1.7284+-0.0443   ?   1.7340+-0.0454   ?
   string-concat-pair-simple   10.0825+-0.0947   ?   10.5920+-1.1789   ?   might be 1.0505x slower
   string-concat-simple   10.7007+-0.9321   10.3337+-0.1551   might be 1.0355x faster
   string-cons-repeat   6.6406+-0.0839   ?   6.6747+-0.1166   ?
   string-cons-tower   6.8616+-0.1788   6.8306+-0.1878
   string-equality   24.5867+-0.3643   24.4832+-1.6380
   string-get-by-val-big-char   7.1893+-0.1049   7.0750+-0.1196   might be 1.0162x faster
   string-get-by-val-out-of-bounds-insane   3.1706+-0.0713   ?   3.2250+-0.0501   ?   might be 1.0172x slower
   string-get-by-val-out-of-bounds   2.6890+-0.0451   ?   2.7777+-0.0818   ?   might be 1.0330x slower
   string-get-by-val   2.4312+-0.0622   ?   2.4323+-0.0076   ?
   string-hash   1.5512+-0.0271   ?   1.5786+-0.0337   ?   might be 1.0176x slower
   string-long-ident-equality   21.9065+-0.8889   21.6010+-0.4800   might be 1.0141x faster
   string-repeat-arith   25.8989+-1.3851   25.8905+-1.1841
   string-sub   50.5704+-1.2057   ?   50.8372+-0.4374   ?
   string-test   2.3288+-0.0712   2.2717+-0.0281   might be 1.0251x faster
   string-var-equality   36.8400+-7.3311   34.3920+-0.6728   might be 1.0712x faster
   structure-hoist-over-transitions   1.9705+-0.0168   ?   1.9705+-0.0354   ?
   switch-char-constant   2.0526+-0.0831   ?   2.0605+-0.0338   ?
   switch-char   4.5370+-0.0409   ?   4.5388+-0.0767   ?
   switch-constant   6.4991+-0.1493   ?   6.5764+-0.1101   ?   might be 1.0119x slower
   switch-string-basic-big-var   12.2833+-0.7003   12.2460+-0.2190
   switch-string-basic-big   12.8124+-0.3156   12.6300+-0.1919   might be 1.0144x faster
   switch-string-basic-var   11.9585+-0.1198   ?   12.0510+-0.2383   ?
   switch-string-basic   12.5106+-0.8789   11.8444+-0.0804   might be 1.0563x faster
   switch-string-big-length-tower-var   18.0536+-1.1105   17.3878+-0.4741   might be 1.0383x faster
   switch-string-length-tower-var   12.4534+-0.1587   ?   12.6730+-1.1704   ?   might be 1.0176x slower
   switch-string-length-tower   11.1302+-0.0723   ?   11.2511+-0.1786   ?   might be 1.0109x slower
   switch-string-short   11.0715+-0.0798   ?   11.2012+-0.2591   ?   might be 1.0117x slower
   switch   9.6192+-0.1459   ?   9.6927+-0.0503   ?
   tear-off-arguments-simple   1.5692+-0.2965   1.4616+-0.0286   might be 1.0736x faster
   tear-off-arguments   2.3118+-0.1417   2.2860+-0.0390   might be 1.0113x faster
   temporal-structure   11.5048+-0.3431   ?   11.6188+-0.2192   ?
   to-int32-boolean   11.4911+-0.1208   ?   11.6115+-0.2057   ?   might be 1.0105x slower
   undefined-test   2.4951+-0.0878   2.4440+-0.0289   might be 1.0209x faster
   unprofiled-licm   32.7851+-0.9030   ?   32.9747+-0.2451   ?
   weird-inlining-const-prop   1.3372+-0.0895   ?   1.3865+-0.1259   ?   might be 1.0369x slower
   <arithmetic>   12.8363+-0.1977   ?   12.9325+-0.2726   ?   might be 1.0075x slower
   <geometric> *   6.5808+-0.0440   ?   6.6054+-0.0271   ?   might be 1.0037x slower
   <harmonic>   3.0357+-0.0627   ?   3.0359+-0.0389   ?   might be 1.0001x slower

AsmBench:   WithoutBT   WithBT
   bigfib.cpp   904.8443+-16.4908   ?   920.8026+-19.2952   ?   might be 1.0176x slower
   cray.c   518.1676+-9.0886   ?   521.0995+-5.9082   ?
   dry.c   821.8760+-22.7555   815.6345+-14.5357
   FloatMM.c   1414.0047+-23.0696   ?   1434.0497+-18.9260   ?   might be 1.0142x slower
   gcc-loops.cpp   8355.7632+-79.4811   ?   8450.5498+-127.5787   ?   might be 1.0113x slower
   n-body.c   1454.5997+-27.5138   ?   1457.6918+-20.9632   ?
   Quicksort.c   729.0630+-11.6187   ?   730.0959+-13.2260   ?
   stepanov_container.cpp   4540.4850+-79.5394   ?   4553.9360+-34.2360   ?
   Towers.c   392.7635+-27.2796   384.2907+-5.8973   might be 1.0220x faster
   <arithmetic>   2125.7297+-14.6593   ?   2140.9056+-8.1701   ?   might be 1.0071x slower
   <geometric> *   1270.1096+-6.6314   ?   1273.8272+-4.0344   ?   might be 1.0029x slower
   <harmonic>   908.3719+-12.2412   906.8104+-3.6583   might be 1.0017x faster

All benchmarks:   WithoutBT   WithBT
   <arithmetic>   126.3109+-0.2650   ?   127.2125+-1.1030   ?   might be 1.0071x slower
   <geometric>   11.5171+-0.0530   ?   11.5785+-0.0528   ?   might be 1.0053x slower
   <harmonic>   2.7865+-0.0337   ?   2.7976+-0.0230   ?   might be 1.0040x slower

Geomean of preferred means:   WithoutBT   WithBT
   <scaled-result>   47.8749+-0.0571   !   48.2860+-0.1744   !   definitely 1.0086x slower
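One way to read the verdict column is sketched below. The rule is an assumption inferred from the rows in this report, not taken from the benchmark harness itself: the ratio is the quotient of the two means, and "definitely" seems to appear exactly when the two 95% confidence intervals do not overlap. The type and function names are hypothetical.

```cpp
#include <string>

// One measurement from the report: mean and confidence-interval half-width (ms).
struct Sample {
    double mean;
    double halfWidth;
};

// Ratio reported as "N.NNNNx slower" when greater than 1.
inline double slowdownRatio(const Sample& before, const Sample& after)
{
    return after.mean / before.mean;
}

// True when [mean - halfWidth, mean + halfWidth] of the two samples intersect.
inline bool intervalsOverlap(const Sample& a, const Sample& b)
{
    return a.mean - a.halfWidth <= b.mean + b.halfWidth
        && b.mean - b.halfWidth <= a.mean + a.halfWidth;
}

// Assumed rule: overlapping intervals give "might be", disjoint ones "definitely".
inline std::string verdict(const Sample& before, const Sample& after)
{
    const std::string direction = after.mean > before.mean ? "slower" : "faster";
    return (intervalsOverlap(before, after) ? "might be " : "definitely ") + direction;
}
```

Applied to the final scaled result (47.8749+-0.0571 against 48.2860+-0.1744), the intervals are disjoint and the ratio is a little under 1.009, which matches the report's "definitely 1.0086x slower".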

WebKit Commit Bot

Comment 15 2014-05-07 21:08:04 PDT

Re-opened since this is blocked by bug 132670

Attachments
Patch (4.81 KB, patch) 2014-05-07 05:20 PDT, Andreas Kling, msaboff: review+

Bug 132650