| Summary: | [X86] Emit BT instruction for single-bit tests. | | |
|---|---|---|---|
| Product: | WebKit | Reporter: | Andreas Kling <kling> |
| Component: | JavaScriptCore | Assignee: | Andreas Kling <kling> |
| Status: | RESOLVED WONTFIX | | |
| Severity: | Normal | CC: | commit-queue, fpizlo, kling, msaboff |
| Priority: | P2 | | |
| Version: | 528+ (Nightly build) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Bug Depends on: | 132670 | | |
| Bug Blocks: | | | |
| Attachments: | | | |

Description
Andreas Kling
2014-05-07 05:19:55 PDT
Created attachment 230993
Patch

Comment on attachment 230993
Patch
r=me.
The logic looks fine. Is there documentation from Intel that says use bt over test?
(In reply to comment #2)
> (From update of attachment 230993)
> r=me.
>
> The logic looks fine. Is there documentation from Intel that says use bt over test?

From Intel Technology Journal, Vol. 11, Issue 4:

"The bit test instruction bt was introduced in the i386™ processor. In some implementations, including the Intel NetBurst® micro-architecture, the instruction has a high latency. The Intel Core micro-architecture executes bt in a single cycle, when the bit base operand is a register. Therefore, the Intel C++/Fortran compiler uses the bt instruction to implement a common bit test idiom when optimizing for the Intel Core micro-architecture. The optimized code runs about 20% faster than the generic version on an Intel Core 2 Duo processor."

Note that BT only clobbers the CF flag, while TEST clobbers CF, OF, SF, ZF and PF. :)

(In reply to comment #3)
> (In reply to comment #2)
> > (From update of attachment 230993)
> > r=me.
> >
> > The logic looks fine. Is there documentation from Intel that says use bt over test?
>
> From Intel Technology Journal, Vol. 11, Issue 4:
>
> "The bit test instruction bt was introduced in the i386™ processor. In some implementations, including the Intel NetBurst® micro-architecture, the instruction has a high latency. The Intel Core micro-architecture executes bt in a single cycle, when the bit base operand is a register. Therefore, the Intel C++/Fortran compiler uses the bt instruction to implement a common bit test idiom when optimizing for the Intel Core micro-architecture. The optimized code runs about 20% faster than the generic version on an Intel Core 2 Duo processor."

Thanks for the reference. I looked in the optimization guide and couldn't find anything.

Committed r168451: <http://trac.webkit.org/changeset/168451>

(In reply to comment #5)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > (From update of attachment 230993)
> > > r=me.
> > >
> > > The logic looks fine. Is there documentation from Intel that says use bt over test?
> >
> > From Intel Technology Journal, Vol. 11, Issue 4:
> >
> > "The bit test instruction bt was introduced in the i386™ processor. In some implementations, including the Intel NetBurst® micro-architecture, the instruction has a high latency. The Intel Core micro-architecture executes bt in a single cycle, when the bit base operand is a register. Therefore, the Intel C++/Fortran compiler uses the bt instruction to implement a common bit test idiom when optimizing for the Intel Core micro-architecture. The optimized code runs about 20% faster than the generic version on an Intel Core 2 Duo processor."
>
> Thanks for the reference. I looked in the optimization guide and couldn't find anything.

Did you run any benchmarks? I don't trust anything that any Intel documentation says. It has been wrong in the past.

So, neither of the big-time compilers pattern match the BT instruction even if you tell them to target Core. GCC appears to use AND directly and relies on it leaving some bits behind while LLVM uses TEST. I decided to look at what LLVM does and it appears that it only resorts to BT if the immediate is not encodable with TEST.

So, it would appear that either we've discovered something that professional compiler writers have overlooked (or, in the case of LLVM, which knows about BT and can use it, we have discovered something that professional compiler writers have misunderstood), or there is something that the Intel manual isn't revealing.

Note that those compilers are engineered to know about *every single instruction* that every possible processor might have. They have been tuned very carefully for a long time and sometimes they do it based on cost models of those instructions. Often, if you want to figure out which instructions to select, it's usually a good bet to just look at what they do. In this case, it appears that BT is basically a bad idea.
In general, I don't think that "the Intel manual said so" should ever be used for a justification for a patch.

Comment on attachment 230993
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=230993&action=review

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1202
> +    int singleBitIndex(unsigned mask)
> +    {
> +        switch (mask) {
> +        case 0x00000001: return 0;
> +        case 0x00000002: return 1;
> +        case 0x00000004: return 2;
> +        case 0x00000008: return 3;
> +        case 0x00000010: return 4;
> +        case 0x00000020: return 5;
> +        case 0x00000040: return 6;
> +        case 0x00000080: return 7;
> +        case 0x00000100: return 8;
> +        case 0x00000200: return 9;
> +        case 0x00000400: return 10;
> +        case 0x00000800: return 11;
> +        case 0x00001000: return 12;
> +        case 0x00002000: return 13;
> +        case 0x00004000: return 14;
> +        case 0x00008000: return 15;
> +        case 0x00010000: return 16;
> +        case 0x00020000: return 17;
> +        case 0x00040000: return 18;
> +        case 0x00080000: return 19;
> +        case 0x00100000: return 20;
> +        case 0x00200000: return 21;
> +        case 0x00400000: return 22;
> +        case 0x00800000: return 23;
> +        case 0x01000000: return 24;
> +        case 0x02000000: return 25;
> +        case 0x04000000: return 26;
> +        case 0x08000000: return 27;
> +        case 0x10000000: return 28;
> +        case 0x20000000: return 29;
> +        case 0x40000000: return 30;
> +        case 0x80000000: return 31;
> +        default: return -1;
> +        }

We have a function to count the number of set bits in an int and to compute the log2 of an int. Why not use those instead?

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1212
> +        int bitIndex = singleBitIndex(mask.m_value);
> +        if ((cond == Zero || cond == NonZero) && bitIndex != -1) {
> +            m_assembler.bt_i8r(bitIndex, reg);
> +            return Jump(m_assembler.jCC(cond == Zero ? X86Assembler::ConditionNC : X86Assembler::ConditionC));
> +        }
> +

The LLVM tuning appears to disagree with you - it will pick BT only if the immediate is both a power of two and not representable in the immediate of a TEST. I think this will only happen if you want to quickly test one of the high 32 bits of a 64 bit integer. Obviously, this won't happen here.

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1224
> +        int bitIndex = singleBitIndex(mask.m_value);
> +        if ((cond == Zero || cond == NonZero) && bitIndex != -1) {
> +            m_assembler.bt_i8m(bitIndex, address.offset, address.base);
> +            return Jump(m_assembler.jCC(cond == Zero ? X86Assembler::ConditionNC : X86Assembler::ConditionC));
> +        }
> +

Ditto.

Another thing: it appears that BT is a bigger instruction than TEST - it requires one extra prefix byte in the opcode. Seriously, if you have a choice between a one-opcode instruction and a two-opcode instruction and there is no evidence that the two-opcode one is better then you should use the one-opcode one. The Intel manual does not constitute evidence.

Hi Phil. I'm running jsc-benchmarks on a slower machine in the background. In the meantime..
BT will always end up 2 bytes shorter than the TEST it replaced since it uses an 8-bit immediate instead of a 32-bit one.
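To make the size comparison concrete, here is a minimal sketch of the register-direct encodings involved. The helper names are hypothetical and are not the X86Assembler API; the byte layouts are the standard x86 opcode forms (no REX prefix, and the shorter EAX/AL-only forms of TEST are left out).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; these are not the
// X86Assembler API. Each returns the raw bytes of one register-direct
// form without a REX prefix, so reg must be 0-7.

// TEST r/m32, imm32 is encoded as F7 /0 id.
std::vector<uint8_t> encodeTestImm32(unsigned reg, uint32_t imm)
{
    assert(reg < 8);
    std::vector<uint8_t> bytes { 0xF7, static_cast<uint8_t>(0xC0 | reg) };
    for (int shift = 0; shift < 32; shift += 8)
        bytes.push_back(static_cast<uint8_t>(imm >> shift)); // little-endian immediate
    return bytes;
}

// TEST r/m8, imm8 is encoded as F6 /0 ib. Without a REX prefix only
// registers 0-3 name the low byte of a 32-bit register (AL, CL, DL, BL).
std::vector<uint8_t> encodeTestImm8(unsigned reg, uint8_t imm)
{
    assert(reg < 4);
    return { 0xF6, static_cast<uint8_t>(0xC0 | reg), imm };
}

// BT r/m32, imm8 is encoded as 0F BA /4 ib.
std::vector<uint8_t> encodeBtImm8(unsigned reg, uint8_t bitIndex)
{
    assert(reg < 8 && bitIndex < 32);
    return { 0x0F, 0xBA, static_cast<uint8_t>(0xE0 | reg), bitIndex };
}
```

For a test of bit 5 in ECX, these helpers produce 6 bytes for the 32-bit-immediate TEST, 4 bytes for BT, and 3 bytes for the 8-bit-immediate TEST, so the 2-byte saving only holds against the 32-bit-immediate form.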
In addition to LLVM, I was looking at GCC and ICC code generation on gcc.godbolt.org, but I did make one fatal mistake. I was testing with this snippet:
void foo (void);

int test (int x, int n)
{
  if (x & (1 << n))
    foo ();

  return 0;
}
Which generates code using BT for -O2 -march=core2 on all three compilers. I got the snippet from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36473 (where GCC added BT to their codegen.)
Unfortunately I forgot to test with an immediate instead of a variable argument, and I see now that for compile-time constant 1-bit masks in the 0-31 range, all three compilers generate TEST instead. Boo me.
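For anyone repeating the experiment, a sketch of the two cases side by side. The function names and the choice of bit 5 are arbitrary, and foo() is given a trivial body here only so the snippet links and runs; when inspecting generated code, leaving foo() as a bare declaration (as in the original snippet) keeps the compiler from inlining it.

```cpp
// Stand-in body so the example is self-contained; hypothetical counter.
static int fooCalls = 0;
void foo() { ++fooCalls; }

// Variable bit index: the form taken from the GCC bug report.
int testVariable(int x, int n)
{
    if (x & (1 << n))
        foo();
    return 0;
}

// Compile-time constant single-bit mask: the case that matters for the
// patch, since the macro assembler is handed an immediate mask.
int testConstant(int x)
{
    if (x & (1 << 5))
        foo();
    return 0;
}
```

Both functions behave identically for n == 5; only the instruction selection for the constant case is in question here.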
I'm gonna let the benchmarks finish and paste the results here.
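On the earlier review question about the 32-case switch in singleBitIndex: a minimal sketch of the same contract without the switch, assuming only standard C++. It does not use the existing bit-counting or log2 helpers the review refers to, since their names are not given in this thread; it uses the power-of-two check and a shift loop instead.

```cpp
#include <cstdint>

// Returns the index of the only set bit in mask, or -1 when mask has zero
// bits or more than one bit set (same contract as the switch in the patch).
int singleBitIndex(uint32_t mask)
{
    // mask & (mask - 1) clears the lowest set bit, so the result is zero
    // exactly when at most one bit was set.
    if (!mask || (mask & (mask - 1)))
        return -1;

    int index = 0;
    while (!(mask & 1)) {
        mask >>= 1;
        ++index;
    }
    return index;
}
```

The loop runs at most 31 times and only at code-generation time, so it is not on any hot path of the emitted code.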
(In reply to comment #11)
> Hi Phil. I'm running jsc-benchmarks on a slower machine in the background. In the meantime..
>
> BT will always end up 2 bytes shorter than the TEST it replaced since it uses an 8-bit immediate instead of a 32-bit one.

There is an 8-bit immediate form of TEST.

> In addition to LLVM, I was looking at GCC and ICC code generation on gcc.godbolt.org, but I did make one fatal mistake. I was testing with this snippet:
>
> void foo (void);
>
> int test (int x, int n)
> {
>   if (x & (1 << n))
>     foo ();
>
>   return 0;
> }
>
> Which generates code using BT for -O2 -march=core2 on all three compilers. I got the snippet from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36473 (where GCC added BT to their codegen.)
>
> Unfortunately I forgot to test with an immediate instead of a variable argument, and I see now that for compile-time constant 1-bit masks in the 0-31 range, all three compilers generate TEST instead. Boo me.
>
> I'm gonna let the benchmarks finish and paste the results here.

OK. I would expect no performance difference. If there is no performance difference then we should just defer to what the big compilers do.

(In reply to comment #12)
> (In reply to comment #11)
> > Hi Phil. I'm running jsc-benchmarks on a slower machine in the background. In the meantime..
> >
> > BT will always end up 2 bytes shorter than the TEST it replaced since it uses an 8-bit immediate instead of a 32-bit one.
>
> There is an 8-bit immediate form of TEST.

Hah. You're right. And JSC tries real hard to generate it too.

> > In addition to LLVM, I was looking at GCC and ICC code generation on gcc.godbolt.org, but I did make one fatal mistake. I was testing with this snippet:
> >
> > void foo (void);
> >
> > int test (int x, int n)
> > {
> >   if (x & (1 << n))
> >     foo ();
> >
> >   return 0;
> > }
> >
> > Which generates code using BT for -O2 -march=core2 on all three compilers. I got the snippet from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36473 (where GCC added BT to their codegen.)
> >
> > Unfortunately I forgot to test with an immediate instead of a variable argument, and I see now that for compile-time constant 1-bit masks in the 0-31 range, all three compilers generate TEST instead. Boo me.
> >
> > I'm gonna let the benchmarks finish and paste the results here.
>
> OK. I would expect no performance difference. If there is no performance difference then we should just defer to what the big compilers do.

Definitely.

Benchmark report for SunSpider, LongSpider, V8Spider, Octane, Kraken, JSRegress, and AsmBench on locals-iMac (iMac14,2).
VMs tested:
"WithoutBT" at /Volumes/Data/Source/Safari/Ref-OpenSource/WebKitBuild/Release/jsc
"WithBT" at /Volumes/Data/Source/Safari/OpenSource/WebKitBuild/Release/jsc
Collected 4 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements.
Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level
timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.
WithoutBT WithBT
SunSpider:
3d-cube 3.8618+-0.3228 ? 3.8717+-0.4712 ?
3d-morph 4.7336+-0.1498 4.6899+-0.1356
3d-raytrace 4.8748+-0.3113 ? 4.9585+-0.2855 ? might be 1.0172x slower
access-binary-trees 1.4155+-0.2128 1.3615+-0.0389 might be 1.0397x faster
access-fannkuch 4.4414+-0.0878 ? 4.5675+-0.2150 ? might be 1.0284x slower
access-nbody 2.2466+-0.0433 ? 2.2950+-0.1750 ? might be 1.0215x slower
access-nsieve 3.0829+-0.0559 ? 3.1282+-0.1827 ? might be 1.0147x slower
bitops-3bit-bits-in-byte 1.2347+-0.0768 ? 1.2832+-0.2258 ? might be 1.0393x slower
bitops-bits-in-byte 2.2117+-0.1158 2.2071+-0.1079
bitops-bitwise-and 1.9141+-0.0362 ? 1.9182+-0.0586 ?
bitops-nsieve-bits 3.0856+-0.0882 ? 3.0890+-0.1131 ?
controlflow-recursive 1.4973+-0.0714 1.4958+-0.0937
crypto-aes 3.1792+-0.0513 ? 3.1852+-0.0432 ?
crypto-md5 1.7664+-0.1233 1.7493+-0.1290
crypto-sha1 1.9402+-0.0816 1.8497+-0.0111 might be 1.0490x faster
date-format-tofte 5.8865+-0.1886 ? 7.2452+-3.8198 ? might be 1.2308x slower
date-format-xparb 5.0603+-0.1396 4.8799+-0.1556 might be 1.0370x faster
math-cordic 2.4778+-0.0507 2.4227+-0.0787 might be 1.0228x faster
math-partial-sums 4.3050+-0.0610 ? 4.4823+-0.1742 ? might be 1.0412x slower
math-spectral-norm 1.4714+-0.0216 1.4637+-0.0517
regexp-dna 6.2526+-0.1737 6.1306+-0.1809 might be 1.0199x faster
string-base64 3.5334+-0.2921 3.4504+-0.1135 might be 1.0240x faster
string-fasta 5.6995+-0.1165 ? 5.7885+-0.0988 ? might be 1.0156x slower
string-tagcloud 8.3417+-0.1268 8.2930+-0.1537
string-unpack-code 18.4899+-0.7973 ? 18.6165+-0.9547 ?
string-validate-input 4.0388+-0.1404 4.0292+-0.1176
<arithmetic> * 4.1170+-0.0473 ? 4.1712+-0.1430 ? might be 1.0132x slower
<geometric> 3.3258+-0.0302 ? 3.3432+-0.0482 ? might be 1.0052x slower
<harmonic> 2.8016+-0.0476 2.8001+-0.0427 might be 1.0005x faster
WithoutBT WithBT
LongSpider:
3d-cube 1233.4095+-11.3659 ? 1253.3698+-40.1243 ? might be 1.0162x slower
3d-morph 741.4250+-27.4430 727.5229+-2.3772 might be 1.0191x faster
3d-raytrace 765.6695+-6.0680 759.1818+-6.9697
access-binary-trees 916.7892+-12.5503 906.1600+-5.9240 might be 1.0117x faster
access-fannkuch 340.1515+-14.4538 ? 347.3749+-11.1170 ? might be 1.0212x slower
access-nbody 738.1696+-17.3423 ? 760.7197+-47.0308 ? might be 1.0305x slower
access-nsieve 931.6458+-12.5809 ? 1016.2460+-258.9499 ? might be 1.0908x slower
bitops-3bit-bits-in-byte 85.7283+-4.8629 ? 85.7570+-4.4779 ?
bitops-bits-in-byte 135.4017+-2.2564 ? 136.0309+-12.2223 ?
bitops-nsieve-bits 611.3268+-15.0354 608.5967+-10.1367
controlflow-recursive 384.7675+-8.6854 ? 388.6284+-13.3114 ? might be 1.0100x slower
crypto-aes 886.7181+-7.7085 ? 886.8901+-5.0394 ?
crypto-md5 669.3856+-26.4656 664.0869+-6.8129
crypto-sha1 852.5552+-13.2281 ? 864.0634+-25.9422 ? might be 1.0135x slower
date-format-tofte 575.9711+-7.3666 ? 585.4905+-19.7535 ? might be 1.0165x slower
date-format-xparb 869.0757+-14.4675 859.2290+-11.2992 might be 1.0115x faster
math-cordic 904.5382+-8.9082 903.4843+-13.9417
math-partial-sums 545.4088+-9.0132 ? 546.7880+-1.3511 ?
math-spectral-norm 870.8215+-9.0842 858.9225+-8.2623 might be 1.0139x faster
string-base64 334.2463+-0.8221 ? 334.5509+-2.8044 ?
string-fasta 576.7892+-12.4559 557.0745+-10.5421 might be 1.0354x faster
string-tagcloud 214.0127+-2.0530 ? 216.3233+-10.8192 ? might be 1.0108x slower
<arithmetic> 644.7276+-1.7497 ? 648.4769+-12.9180 ? might be 1.0058x slower
<geometric> * 548.6369+-2.1365 ? 550.8949+-4.9976 ? might be 1.0041x slower
<harmonic> 412.0620+-5.1406 ? 413.3966+-3.7046 ? might be 1.0032x slower
WithoutBT WithBT
V8Spider:
crypto 40.2859+-0.7060 ? 40.6218+-1.1536 ?
deltablue 53.4686+-2.1510 ? 55.6558+-6.8348 ? might be 1.0409x slower
earley-boyer 35.8690+-0.2621 ? 36.2894+-1.3433 ? might be 1.0117x slower
raytrace 21.0439+-1.0350 ? 22.0540+-2.7833 ? might be 1.0480x slower
regexp 51.2761+-0.4551 50.9445+-0.5523
richards 58.1104+-0.7669 ? 59.4163+-2.3087 ? might be 1.0225x slower
splay 28.8427+-1.1775 28.8246+-1.4657
<arithmetic> 41.2709+-0.4303 ? 41.9723+-1.0863 ? might be 1.0170x slower
<geometric> * 39.0966+-0.5156 ? 39.7623+-1.0809 ? might be 1.0170x slower
<harmonic> 36.7740+-0.6788 ? 37.4516+-1.2412 ? might be 1.0184x slower
WithoutBT WithBT
Octane:
encrypt 0.23290+-0.00210 ! 0.23882+-0.00202 ! definitely 1.0254x slower
decrypt 4.47660+-0.11228 ? 4.48817+-0.13490 ?
deltablue x2 0.28874+-0.00434 ? 0.29091+-0.00125 ?
earley 0.46939+-0.00614 ? 0.47122+-0.00557 ?
boyer 5.84523+-0.16830 ? 5.98271+-0.14127 ? might be 1.0235x slower
navier-stokes x2 6.50766+-0.22653 6.33958+-0.01118 might be 1.0265x faster
raytrace x2 1.88956+-0.03615 ? 1.93935+-0.13718 ? might be 1.0264x slower
richards x2 0.15715+-0.00077 ! 0.16161+-0.00215 ! definitely 1.0284x slower
splay x2 0.40737+-0.00817 0.39903+-0.01484 might be 1.0209x faster
regexp x2 39.24923+-0.55669 ? 39.49937+-0.79358 ?
pdfjs x2 49.89777+-0.62815 ? 57.38340+-23.58802 ? might be 1.1500x slower
mandreel x2 76.35007+-1.36543 ? 77.09823+-0.78193 ?
gbemu x2 36.48157+-1.44080 ? 36.73093+-0.65766 ?
closure 0.48830+-0.00459 ? 0.48972+-0.00176 ?
jquery 5.82282+-0.07320 ? 5.90402+-0.16880 ? might be 1.0139x slower
box2d x2 12.35646+-0.34995 12.33273+-0.21968
zlib x2 512.80497+-15.97694 ? 516.73765+-18.50990 ?
typescript x2 574.16754+-9.84969 ? 579.11633+-24.49226 ?
<arithmetic> 87.94838+-1.29661 ? 89.12110+-2.48256 ? might be 1.0133x slower
<geometric> * 7.48049+-0.04648 ? 7.58400+-0.18156 ? might be 1.0138x slower
<harmonic> 0.84961+-0.00400 ! 0.86056+-0.00286 ! definitely 1.0129x slower
WithoutBT WithBT
Kraken:
ai-astar 247.752+-7.939 ? 248.547+-2.499 ?
audio-beat-detection 112.494+-0.187 ? 112.762+-1.634 ?
audio-dft 144.443+-11.872 143.542+-1.914
audio-fft 67.710+-0.130 67.406+-0.221
audio-oscillator 141.561+-7.109 140.845+-1.927
imaging-darkroom 146.387+-0.528 ? 148.050+-1.902 ? might be 1.0114x slower
imaging-desaturate 80.594+-1.448 ? 80.683+-1.630 ?
imaging-gaussian-blur 163.158+-4.679 ? 163.566+-11.871 ?
json-parse-financial 38.503+-1.343 37.966+-0.089 might be 1.0142x faster
json-stringify-tinderbox 53.159+-1.567 ? 54.250+-2.986 ? might be 1.0205x slower
stanford-crypto-aes 45.046+-1.236 ? 46.167+-5.939 ? might be 1.0249x slower
stanford-crypto-ccm 44.463+-6.681 ? 47.611+-7.063 ? might be 1.0708x slower
stanford-crypto-pbkdf2 129.166+-3.487 ? 130.468+-3.760 ? might be 1.0101x slower
stanford-crypto-sha256-iterative 47.244+-0.394 ? 48.188+-2.022 ? might be 1.0200x slower
<arithmetic> * 104.406+-0.568 ? 105.004+-1.334 ? might be 1.0057x slower
<geometric> 88.640+-0.561 ? 89.480+-1.698 ? might be 1.0095x slower
<harmonic> 75.429+-0.917 ? 76.435+-2.080 ? might be 1.0133x slower
WithoutBT WithBT
JSRegress:
adapt-to-double-divide 16.8749+-0.3478 ? 17.0203+-0.7817 ?
aliased-arguments-getbyval 0.6537+-0.0267 0.6158+-0.0504 might be 1.0617x faster
allocate-big-object 1.8135+-0.0744 ? 1.8187+-0.0972 ?
arity-mismatch-inlining 0.6396+-0.0393 0.6182+-0.0588 might be 1.0346x faster
array-access-polymorphic-structure 5.7628+-0.3277 ? 5.7760+-0.0780 ?
array-nonarray-polymorhpic-access 23.9935+-0.7104 23.7007+-0.4905 might be 1.0124x faster
array-prototype-every 60.9655+-1.0479 ? 62.1391+-1.7703 ? might be 1.0192x slower
array-prototype-forEach 60.9336+-0.2432 ? 62.1194+-2.2588 ? might be 1.0195x slower
array-prototype-map 75.3367+-1.8058 75.1887+-0.2945
array-prototype-some 60.2330+-0.1637 ? 61.2446+-0.9690 ? might be 1.0168x slower
array-with-double-add 3.0668+-0.0147 ? 3.0843+-0.1094 ?
array-with-double-increment 2.3682+-0.1052 ? 2.4139+-0.2878 ? might be 1.0193x slower
array-with-double-mul-add 3.5223+-0.2210 3.4009+-0.0387 might be 1.0357x faster
array-with-double-sum 2.9102+-0.0923 ? 2.9844+-0.2202 ? might be 1.0255x slower
array-with-int32-add-sub 5.4126+-0.1816 5.3853+-0.0529
array-with-int32-or-double-sum 2.9734+-0.0501 2.9376+-0.0703 might be 1.0122x faster
ArrayBuffer-DataView-alloc-large-long-lived
59.5416+-1.3383 59.4380+-1.0794
ArrayBuffer-DataView-alloc-long-lived 17.5625+-0.3735 17.5107+-0.5725
ArrayBuffer-Int32Array-byteOffset 3.2175+-0.4383 3.1110+-0.1813 might be 1.0342x faster
ArrayBuffer-Int8Array-alloc-large-long-lived
58.9164+-1.9242 ? 59.5763+-1.1377 ? might be 1.0112x slower
ArrayBuffer-Int8Array-alloc-long-lived-buffer
27.4478+-0.3950 ? 27.7172+-0.4536 ?
ArrayBuffer-Int8Array-alloc-long-lived 16.5187+-0.3092 ? 16.8021+-0.3588 ? might be 1.0172x slower
ArrayBuffer-Int8Array-alloc 15.1373+-0.7507 ? 15.2064+-0.6375 ?
asmjs_bool_bug 5.0901+-0.1368 5.0353+-0.1861 might be 1.0109x faster
assign-custom-setter-polymorphic 2.3589+-0.0777 ? 2.3894+-0.0526 ? might be 1.0129x slower
assign-custom-setter 3.1022+-0.1024 ? 3.1182+-0.1303 ?
basic-set 9.1317+-1.2772 8.7687+-0.2894 might be 1.0414x faster
big-int-mul 2.9904+-0.1940 2.9700+-0.1380
boolean-test 2.6247+-0.2063 2.5182+-0.1577 might be 1.0423x faster
branch-fold 3.2577+-0.0550 ? 3.3245+-0.0925 ? might be 1.0205x slower
by-val-generic 7.3796+-0.1543 ? 7.4485+-0.0709 ?
call-spread-apply 12.1317+-0.3349 ? 12.1882+-0.1723 ?
call-spread-call 4.8375+-0.0777 4.8325+-0.0795
captured-assignments 0.3293+-0.0250 0.3065+-0.0194 might be 1.0745x faster
cast-int-to-double 8.2139+-0.1624 ? 8.2618+-0.2152 ?
cell-argument 10.3142+-0.2305 10.2570+-0.0835
cfg-simplify 2.5598+-0.2526 2.4996+-0.2191 might be 1.0241x faster
chain-getter-access 19.5402+-0.7694 ? 21.0107+-4.2560 ? might be 1.0753x slower
cmpeq-obj-to-obj-other 7.3638+-0.3202 7.3226+-0.1575
constant-test 4.1171+-0.0626 ? 4.1469+-0.1105 ?
DataView-custom-properties 63.4835+-1.2854 63.2479+-1.3707
delay-tear-off-arguments-strictmode 2.1216+-0.0992 2.0983+-0.0798 might be 1.0111x faster
destructuring-arguments 5.0398+-2.2027 4.5248+-0.5397 might be 1.1138x faster
destructuring-swap 4.1785+-0.1576 ? 4.2252+-0.1212 ? might be 1.0112x slower
direct-arguments-getbyval 0.5582+-0.0244 ? 0.5785+-0.0355 ? might be 1.0364x slower
double-get-by-val-out-of-bounds 3.3326+-0.0797 ? 3.3958+-0.2144 ? might be 1.0190x slower
double-pollution-getbyval 7.9317+-0.1571 ? 8.0587+-0.6736 ? might be 1.0160x slower
double-pollution-putbyoffset 3.3505+-0.0502 3.3494+-0.1109
double-to-int32-typed-array-no-inline 1.6681+-0.0728 ? 1.7090+-0.0957 ? might be 1.0245x slower
double-to-int32-typed-array 1.3768+-0.0319 ? 1.3882+-0.0640 ?
double-to-uint32-typed-array-no-inline 1.7084+-0.0609 ? 1.7646+-0.0533 ? might be 1.0329x slower
double-to-uint32-typed-array 1.4252+-0.0356 ? 1.4645+-0.0627 ? might be 1.0276x slower
empty-string-plus-int 5.8543+-0.2353 ? 5.8599+-0.1637 ?
emscripten-cube2hash 24.1876+-0.6467 ? 24.5358+-0.8620 ? might be 1.0144x slower
external-arguments-getbyval 1.1234+-0.0364 ? 1.1275+-0.0489 ?
external-arguments-putbyval 1.5861+-0.0376 1.5779+-0.1227
fixed-typed-array-storage-var-index 1.0112+-0.0393 ? 1.0138+-0.0228 ?
fixed-typed-array-storage 0.6431+-0.0435 ? 0.6665+-0.0386 ? might be 1.0364x slower
Float32Array-matrix-mult 3.8936+-0.0388 ? 3.9861+-0.4784 ? might be 1.0238x slower
Float32Array-to-Float64Array-set 47.2681+-4.1753 46.2629+-0.9537 might be 1.0217x faster
Float64Array-alloc-long-lived 62.4034+-1.5670 62.1289+-0.7957
Float64Array-to-Int16Array-set 54.9105+-1.3133 ? 55.0078+-0.2631 ?
fold-double-to-int 11.4935+-0.0516 ? 11.8701+-1.0125 ? might be 1.0328x slower
for-of-iterate-array-entries 5.5264+-0.1621 ? 5.6089+-0.5281 ? might be 1.0149x slower
for-of-iterate-array-keys 2.1867+-0.1355 ? 2.3857+-0.4778 ? might be 1.0910x slower
for-of-iterate-array-values 1.9847+-0.0625 ? 2.0040+-0.0976 ?
fround 22.5043+-0.3450 22.3303+-0.3655
function-dot-apply 1.1051+-0.2472 1.0642+-0.0609 might be 1.0384x faster
function-test 2.6726+-0.0653 2.6379+-0.0750 might be 1.0131x faster
function-with-eval 16.8524+-0.4956 16.6487+-0.3545 might be 1.0122x faster
get-by-id-chain-from-try-block 5.5303+-0.0536 ? 5.6227+-0.1750 ? might be 1.0167x slower
get-by-id-proto-or-self 11.2628+-0.6030 11.1470+-0.6664 might be 1.0104x faster
get-by-id-self-or-proto 11.3000+-0.2177 ? 11.3342+-0.3082 ?
get-by-val-out-of-bounds 3.1969+-0.0392 3.1866+-0.0619
get_callee_monomorphic 2.7919+-0.0911 ? 2.7993+-0.0988 ?
get_callee_polymorphic 2.7815+-0.1871 2.6780+-0.0679 might be 1.0387x faster
getter 10.5140+-0.1731 ? 10.6976+-0.2849 ? might be 1.0175x slower
global-var-const-infer-fire-from-opt 0.6813+-0.1792 ? 0.7118+-0.0319 ? might be 1.0447x slower
global-var-const-infer 0.5468+-0.0490 ? 0.5789+-0.1194 ? might be 1.0588x slower
HashMap-put-get-iterate-keys 21.2709+-0.4996 ? 21.3666+-0.2132 ?
HashMap-put-get-iterate 21.2148+-1.3146 ? 21.2971+-0.8254 ?
HashMap-string-put-get-iterate 25.7957+-0.6637 25.6835+-0.4569
imul-double-only 9.3182+-0.0438 9.2085+-0.1093 might be 1.0119x faster
imul-int-only 8.8632+-0.2467 ? 8.9577+-0.2918 ? might be 1.0107x slower
imul-mixed 11.8339+-0.1459 11.7974+-0.3412
in-four-cases 12.1102+-0.0464 ? 12.2891+-0.6639 ? might be 1.0148x slower
in-one-case-false 6.3717+-0.1432 ? 6.3901+-0.1092 ?
in-one-case-true 6.3008+-0.1341 ? 6.4802+-0.2394 ? might be 1.0285x slower
in-two-cases 6.7437+-0.3053 6.5772+-0.1301 might be 1.0253x faster
indexed-properties-in-objects 2.4080+-0.0527 ? 2.4578+-0.0999 ? might be 1.0207x slower
infer-closure-const-then-mov-no-inline 2.7401+-0.0363 2.7217+-0.0521
infer-closure-const-then-mov 18.2510+-1.3620 17.4486+-0.4636 might be 1.0460x faster
infer-closure-const-then-put-to-scope-no-inline
10.4136+-0.5281 10.2925+-0.1459 might be 1.0118x faster
infer-closure-const-then-put-to-scope 21.2559+-0.1137 ? 21.9273+-1.2916 ? might be 1.0316x slower
infer-closure-const-then-reenter-no-inline
48.6475+-0.4516 ? 48.8307+-0.8195 ?
infer-closure-const-then-reenter 21.8364+-0.6542 21.3676+-0.2367 might be 1.0219x faster
infer-one-time-closure-ten-vars 16.9810+-0.2948 ? 17.6420+-0.8652 ? might be 1.0389x slower
infer-one-time-closure-two-vars 17.1103+-0.5771 ? 17.1935+-0.0337 ?
infer-one-time-closure 16.8564+-0.2188 ! 17.4272+-0.2913 ! definitely 1.0339x slower
infer-one-time-deep-closure 34.2043+-1.6270 34.0645+-0.8032
inline-arguments-access 0.9550+-0.0741 ? 0.9711+-0.0556 ? might be 1.0169x slower
inline-arguments-aliased-access 1.0683+-0.0491 ? 1.0773+-0.0635 ?
inline-arguments-local-escape 11.1242+-0.1217 10.8904+-0.3473 might be 1.0215x faster
inline-get-scoped-var 3.9630+-0.1105 ? 4.0118+-0.0897 ? might be 1.0123x slower
inlined-put-by-id-transition 8.7558+-0.1617 ? 8.8181+-0.4925 ?
int-or-other-abs-then-get-by-val 5.1650+-0.1496 5.0964+-0.0715 might be 1.0134x faster
int-or-other-abs-zero-then-get-by-val 17.4343+-0.3743 17.1487+-0.3421 might be 1.0167x faster
int-or-other-add-then-get-by-val 6.6006+-0.3991 ? 6.6057+-0.1506 ?
int-or-other-add 6.1335+-0.0362 ! 6.2252+-0.0407 ! definitely 1.0149x slower
int-or-other-div-then-get-by-val 4.1525+-0.1009 ? 4.5359+-1.2117 ? might be 1.0923x slower
int-or-other-max-then-get-by-val 4.5270+-0.0592 ? 4.6199+-0.1652 ? might be 1.0205x slower
int-or-other-min-then-get-by-val 4.7253+-0.0658 ? 4.8945+-0.4785 ? might be 1.0358x slower
int-or-other-mod-then-get-by-val 3.8647+-0.0481 ? 3.8676+-0.0776 ?
int-or-other-mul-then-get-by-val 4.2692+-0.2850 4.1349+-0.0897 might be 1.0325x faster
int-or-other-neg-then-get-by-val 4.8512+-0.0716 ? 4.8620+-0.0756 ?
int-or-other-neg-zero-then-get-by-val 17.0445+-0.6193 ? 17.1204+-0.8477 ?
int-or-other-sub-then-get-by-val 6.8100+-0.1849 6.7993+-0.2825
int-or-other-sub 5.3719+-0.1693 ? 5.4512+-0.0418 ? might be 1.0148x slower
int-overflow-local 3.6978+-0.0868 ? 3.9538+-0.5597 ? might be 1.0692x slower
Int16Array-alloc-long-lived 45.3895+-1.8998 44.7217+-0.1982 might be 1.0149x faster
Int16Array-bubble-sort-with-byteLength 17.9753+-1.9966 ? 19.4047+-0.7376 ? might be 1.0795x slower
Int16Array-bubble-sort 16.3075+-0.1436 ? 16.3254+-0.3998 ?
Int16Array-load-int-mul 1.1965+-0.0420 ? 1.2754+-0.1717 ? might be 1.0659x slower
Int16Array-to-Int32Array-set 43.2715+-1.2176 42.6219+-0.8450 might be 1.0152x faster
Int32Array-alloc-large 12.7415+-1.3867 12.6401+-0.8268
Int32Array-alloc-long-lived 49.8883+-0.3697 ? 50.0630+-0.2130 ?
Int32Array-alloc 2.4424+-0.0594 ? 2.4855+-0.0960 ? might be 1.0176x slower
Int32Array-Int8Array-view-alloc 8.4851+-0.5946 8.3391+-0.2479 might be 1.0175x faster
int52-spill 6.5042+-0.1587 ? 6.6044+-0.2007 ? might be 1.0154x slower
Int8Array-alloc-long-lived 40.8506+-0.8158 40.7183+-0.6195
Int8Array-load-with-byteLength 3.1348+-0.0431 3.1196+-0.0197
Int8Array-load 3.1297+-0.0665 3.1229+-0.0654
integer-divide 10.0457+-0.3361 ? 10.2057+-0.4906 ? might be 1.0159x slower
integer-modulo 1.3184+-0.1003 1.3168+-0.0482
large-int-captured 5.4174+-0.2135 ? 5.4364+-0.1193 ?
large-int-neg 14.0964+-0.1772 ? 14.6466+-1.6208 ? might be 1.0390x slower
large-int 13.1874+-0.3276 13.1515+-0.3709
logical-not 4.0453+-0.5646 3.8853+-0.1481 might be 1.0412x faster
lots-of-fields 6.6533+-0.1082 ? 6.8633+-0.6718 ? might be 1.0316x slower
make-indexed-storage 2.2798+-0.0716 2.0951+-0.3863 might be 1.0882x faster
make-rope-cse 3.7122+-0.1525 3.6635+-0.1045 might be 1.0133x faster
marsaglia-larger-ints 65.3561+-1.9722 ? 65.7378+-1.0200 ?
marsaglia-osr-entry 28.7213+-0.9645 28.5740+-0.5672
method-on-number 17.7003+-0.6576 17.4322+-0.2939 might be 1.0154x faster
misc-strict-eq 37.1450+-1.3845 ? 37.9603+-0.3439 ? might be 1.0219x slower
negative-zero-divide 0.2499+-0.0143 ? 0.2610+-0.0328 ? might be 1.0444x slower
negative-zero-modulo 0.2535+-0.0205 0.2505+-0.0099 might be 1.0119x faster
negative-zero-negate 0.2707+-0.0592 0.2560+-0.0397 might be 1.0572x faster
nested-function-parsing 22.6801+-0.9370 21.7504+-0.7935 might be 1.0427x faster
new-array-buffer-dead 2.6555+-0.3041 2.6259+-0.1047 might be 1.0113x faster
new-array-buffer-push 6.2847+-0.1264 6.2371+-0.1371
new-array-dead 19.3544+-0.3096 19.2313+-0.7646
new-array-push 4.1797+-0.1613 ? 4.2916+-0.0853 ? might be 1.0268x slower
number-test 2.4422+-0.0393 ? 2.5016+-0.1472 ? might be 1.0243x slower
object-closure-call 4.5679+-0.3652 4.5142+-0.1235 might be 1.0119x faster
object-test 2.5947+-0.0619 2.5651+-0.0743 might be 1.0116x faster
poly-stricteq 45.2128+-1.6479 44.3875+-1.8210 might be 1.0186x faster
polymorphic-array-call 1.3683+-0.0965 1.3183+-0.0297 might be 1.0379x faster
polymorphic-get-by-id 2.5158+-0.1022 2.4898+-0.0680 might be 1.0104x faster
polymorphic-put-by-id 41.5616+-49.3263 ? 58.5696+-58.4548 ? might be 1.4092x slower
polymorphic-structure 13.1450+-0.3808 ? 13.2355+-0.2921 ?
polyvariant-monomorphic-get-by-id 4.9765+-0.1883 4.8690+-0.0827 might be 1.0221x faster
proto-getter-access 19.4456+-0.4488 ? 19.4763+-0.2443 ?
put-by-id 11.9398+-0.1511 ? 12.0521+-0.5963 ?
put-by-val-large-index-blank-indexing-type
5.8743+-0.0572 ? 6.0046+-0.2782 ? might be 1.0222x slower
put-by-val-machine-int 1.9415+-0.0820 ? 1.9819+-0.0261 ? might be 1.0208x slower
rare-osr-exit-on-local 12.8328+-0.4133 12.6913+-0.3078 might be 1.0112x faster
register-pressure-from-osr 15.8768+-0.2107 ? 15.9279+-0.2688 ?
setter 10.6903+-0.2440 ? 10.7468+-0.3192 ?
simple-activation-demo 23.5605+-0.5218 23.4045+-0.4602
simple-getter-access 31.9802+-2.4350 31.5043+-0.7733 might be 1.0151x faster
slow-array-profile-convergence 2.2819+-0.2244 ? 2.2859+-0.0866 ?
slow-convergence 2.4819+-0.0562 ? 2.5894+-0.1911 ? might be 1.0433x slower
sparse-conditional 0.9025+-0.1286 ? 0.9067+-0.0637 ?
splice-to-remove 34.7090+-1.0254 ? 35.1881+-1.5913 ? might be 1.0138x slower
string-char-code-at 12.7178+-1.0729 ? 13.5532+-0.5287 ? might be 1.0657x slower
string-concat-object 1.8017+-0.0376 ? 1.9086+-0.3089 ? might be 1.0594x slower
string-concat-pair-object 1.7284+-0.0443 ? 1.7340+-0.0454 ?
string-concat-pair-simple 10.0825+-0.0947 ? 10.5920+-1.1789 ? might be 1.0505x slower
string-concat-simple 10.7007+-0.9321 10.3337+-0.1551 might be 1.0355x faster
string-cons-repeat 6.6406+-0.0839 ? 6.6747+-0.1166 ?
string-cons-tower 6.8616+-0.1788 6.8306+-0.1878
string-equality 24.5867+-0.3643 24.4832+-1.6380
string-get-by-val-big-char 7.1893+-0.1049 7.0750+-0.1196 might be 1.0162x faster
string-get-by-val-out-of-bounds-insane 3.1706+-0.0713 ? 3.2250+-0.0501 ? might be 1.0172x slower
string-get-by-val-out-of-bounds 2.6890+-0.0451 ? 2.7777+-0.0818 ? might be 1.0330x slower
string-get-by-val 2.4312+-0.0622 ? 2.4323+-0.0076 ?
string-hash 1.5512+-0.0271 ? 1.5786+-0.0337 ? might be 1.0176x slower
string-long-ident-equality 21.9065+-0.8889 21.6010+-0.4800 might be 1.0141x faster
string-repeat-arith 25.8989+-1.3851 25.8905+-1.1841
string-sub 50.5704+-1.2057 ? 50.8372+-0.4374 ?
string-test 2.3288+-0.0712 2.2717+-0.0281 might be 1.0251x faster
string-var-equality 36.8400+-7.3311 34.3920+-0.6728 might be 1.0712x faster
structure-hoist-over-transitions 1.9705+-0.0168 ? 1.9705+-0.0354 ?
switch-char-constant 2.0526+-0.0831 ? 2.0605+-0.0338 ?
switch-char 4.5370+-0.0409 ? 4.5388+-0.0767 ?
switch-constant 6.4991+-0.1493 ? 6.5764+-0.1101 ? might be 1.0119x slower
switch-string-basic-big-var 12.2833+-0.7003 12.2460+-0.2190
switch-string-basic-big 12.8124+-0.3156 12.6300+-0.1919 might be 1.0144x faster
switch-string-basic-var 11.9585+-0.1198 ? 12.0510+-0.2383 ?
switch-string-basic 12.5106+-0.8789 11.8444+-0.0804 might be 1.0563x faster
switch-string-big-length-tower-var 18.0536+-1.1105 17.3878+-0.4741 might be 1.0383x faster
switch-string-length-tower-var 12.4534+-0.1587 ? 12.6730+-1.1704 ? might be 1.0176x slower
switch-string-length-tower 11.1302+-0.0723 ? 11.2511+-0.1786 ? might be 1.0109x slower
switch-string-short 11.0715+-0.0798 ? 11.2012+-0.2591 ? might be 1.0117x slower
switch 9.6192+-0.1459 ? 9.6927+-0.0503 ?
tear-off-arguments-simple 1.5692+-0.2965 1.4616+-0.0286 might be 1.0736x faster
tear-off-arguments 2.3118+-0.1417 2.2860+-0.0390 might be 1.0113x faster
temporal-structure 11.5048+-0.3431 ? 11.6188+-0.2192 ?
to-int32-boolean 11.4911+-0.1208 ? 11.6115+-0.2057 ? might be 1.0105x slower
undefined-test 2.4951+-0.0878 2.4440+-0.0289 might be 1.0209x faster
unprofiled-licm 32.7851+-0.9030 ? 32.9747+-0.2451 ?
weird-inlining-const-prop 1.3372+-0.0895 ? 1.3865+-0.1259 ? might be 1.0369x slower
<arithmetic> 12.8363+-0.1977 ? 12.9325+-0.2726 ? might be 1.0075x slower
<geometric> * 6.5808+-0.0440 ? 6.6054+-0.0271 ? might be 1.0037x slower
<harmonic> 3.0357+-0.0627 ? 3.0359+-0.0389 ? might be 1.0001x slower
WithoutBT WithBT
AsmBench:
bigfib.cpp 904.8443+-16.4908 ? 920.8026+-19.2952 ? might be 1.0176x slower
cray.c 518.1676+-9.0886 ? 521.0995+-5.9082 ?
dry.c 821.8760+-22.7555 815.6345+-14.5357
FloatMM.c 1414.0047+-23.0696 ? 1434.0497+-18.9260 ? might be 1.0142x slower
gcc-loops.cpp 8355.7632+-79.4811 ? 8450.5498+-127.5787 ? might be 1.0113x slower
n-body.c 1454.5997+-27.5138 ? 1457.6918+-20.9632 ?
Quicksort.c 729.0630+-11.6187 ? 730.0959+-13.2260 ?
stepanov_container.cpp 4540.4850+-79.5394 ? 4553.9360+-34.2360 ?
Towers.c 392.7635+-27.2796 384.2907+-5.8973 might be 1.0220x faster
<arithmetic> 2125.7297+-14.6593 ? 2140.9056+-8.1701 ? might be 1.0071x slower
<geometric> * 1270.1096+-6.6314 ? 1273.8272+-4.0344 ? might be 1.0029x slower
<harmonic> 908.3719+-12.2412 906.8104+-3.6583 might be 1.0017x faster
WithoutBT WithBT
All benchmarks:
<arithmetic> 126.3109+-0.2650 ? 127.2125+-1.1030 ? might be 1.0071x slower
<geometric> 11.5171+-0.0530 ? 11.5785+-0.0528 ? might be 1.0053x slower
<harmonic> 2.7865+-0.0337 ? 2.7976+-0.0230 ? might be 1.0040x slower
WithoutBT WithBT
Geomean of preferred means:
<scaled-result> 47.8749+-0.0571 ! 48.2860+-0.1744 ! definitely 1.0086x slower
Re-opened since this is blocked by bug 132670