Summary: | YARR: Multi-character read optimization for 8bit strings | |
---|---|---|---
Product: | WebKit | Reporter: | Michael Saboff <msaboff>
Component: | JavaScriptCore | Assignee: | Michael Saboff <msaboff>
Status: | RESOLVED FIXED | |
Severity: | Normal | Keywords: | InRadar
Priority: | P2 | |
Version: | 528+ (Nightly build) | |
Hardware: | All | |
OS: | All | |
Description
Michael Saboff
2011-12-09 11:08:00 PST
Created attachment 118636
Patch

Tested a 64 bit version for X86-64 that did 1-4 characters for 16 bit strings and 1-8 characters for 8 bit strings, but that version is slower than this 32 bit version. I suspect that the reason is that there aren't any 64 bit logic and compare instructions that take 64 bit immediate values, so a temporary register is needed. This increases the number of instructions and possibly uses more renamed registers.

Using the SunSpider harness, regexp-dna goes from 14.0 ms to 10.0 ms (+29%). Bencher shows a greater % increase (46%).

Benchmark report for SunSpider, V8, and Kraken on msaboff-pro.apple.com (MacPro5,1).

VMs tested:
"Conf#1" at /Volumes/Data/src/webkit.baseline/WebKitBuild/Release/jsc (r102454)
"Conf#2" at /Volumes/Data/src/webkit/WebKitBuild/Release/jsc (r102471)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

SunSpider: | Conf#1 | Conf#2 | Result
---|---|---|---
3d-cube | 7.3308+-0.0519 ? | 7.4966+-0.1323 ? | might be 1.0226x slower
3d-morph | 8.5611+-0.1446 | 8.3932+-0.0345 | might be 1.0200x faster
3d-raytrace | 7.7016+-0.0525 ? | 7.7624+-0.1033 ? |
access-binary-trees | 1.5953+-0.0084 ! | 1.6549+-0.0406 ! | definitely 1.0374x slower
access-fannkuch | 7.5178+-0.0205 ? | 7.5269+-0.0345 ? |
access-nbody | 3.9638+-0.0187 | 3.9385+-0.0167 |
access-nsieve | 3.2228+-0.0552 ? | 3.2369+-0.0658 ? |
bitops-3bit-bits-in-byte | 1.2403+-0.0132 ? | 1.2427+-0.0162 ? |
bitops-bits-in-byte | 4.9608+-0.0576 | 4.9044+-0.0057 | might be 1.0115x faster
bitops-bitwise-and | 3.2989+-0.0196 | 3.2854+-0.0040 |
bitops-nsieve-bits | 5.6875+-0.0640 | 5.6344+-0.0344 |
controlflow-recursive | 2.3070+-0.0266 | 2.2878+-0.0129 |
crypto-aes | 7.3081+-0.0497 ? | 7.3858+-0.0454 ? | might be 1.0106x slower
crypto-md5 | 2.4666+-0.0310 ? | 2.4703+-0.0317 ? |
crypto-sha1 | 2.1842+-0.0341 ? | 2.2145+-0.0425 ? | might be 1.0139x slower
date-format-tofte | 11.0334+-0.1857 | 10.8138+-0.1057 | might be 1.0203x faster
date-format-xparb | 10.0543+-0.1416 ! | 10.3172+-0.0833 ! | definitely 1.0262x slower
math-cordic | 7.2241+-0.0682 | 7.1663+-0.0574 |
math-partial-sums | 10.5884+-0.0742 ? | 10.6348+-0.0553 ? |
math-spectral-norm | 2.6263+-0.0304 ? | 2.6483+-0.0417 ? |
regexp-dna | 13.0668+-0.0678 ^ | 8.9721+-0.0649 ^ | definitely 1.4564x faster
string-base64 | 4.2590+-0.0269 | 4.2281+-0.0147 |
string-fasta | 7.2582+-0.0597 ? | 7.3990+-0.0923 ? | might be 1.0194x slower
string-tagcloud | 12.4427+-0.0885 ? | 12.5360+-0.0947 ? |
string-unpack-code | 20.8654+-0.2174 ? | 21.1506+-0.2558 ? | might be 1.0137x slower
string-validate-input | 5.6095+-0.0893 | 5.5625+-0.0566 |
<arithmetic> * | 6.7067+-0.0228 ^ | 6.5717+-0.0205 ^ | definitely 1.0206x faster
<geometric> | 5.3829+-0.0197 ^ | 5.3210+-0.0166 ^ | definitely 1.0116x faster
<harmonic> | 4.2026+-0.0192 | 4.1980+-0.0203 |

V8: | Conf#1 | Conf#2 | Result
---|---|---|---
crypto | 76.0522+-0.2519 ? | 76.3492+-0.4030 ? |
deltablue | 168.4075+-1.0626 ? | 169.2640+-1.6875 ? |
earley-boyer | 99.8663+-1.1576 ? | 100.0184+-1.1615 ? |
raytrace | 57.1163+-0.2990 ! | 58.3273+-0.2589 ! | definitely 1.0212x slower
regexp | 124.1217+-0.7097 ? | 124.1842+-0.8959 ? |
richards | 140.3142+-1.2846 | 139.1383+-0.6220 |
splay | 89.6470+-1.0977 ? | 91.5579+-1.2189 ? | might be 1.0213x slower
<arithmetic> | 107.9322+-0.4103 ? | 108.4056+-0.3244 ? |
<geometric> * | 101.8911+-0.3896 ? | 102.5413+-0.2725 ? |
<harmonic> | 95.9496+-0.3665 ! | 96.7913+-0.2553 ! | definitely 1.0088x slower

Kraken: | Conf#1 | Conf#2 | Result
---|---|---|---
ai-astar | 827.9136+-0.9099 ^ | 808.9674+-12.4051 ^ | definitely 1.0234x faster
audio-beat-detection | 208.8627+-1.1664 | 207.7250+-0.6196 |
audio-dft | 280.5823+-7.2311 | 277.2134+-2.9478 | might be 1.0122x faster
audio-fft | 136.4857+-0.6441 | 136.2737+-0.5297 |
audio-oscillator | 282.5474+-3.9736 ? | 285.1065+-4.4506 ? |
imaging-darkroom | 334.0548+-4.4999 ? | 334.6535+-4.5716 ? |
imaging-desaturate | 237.3633+-0.1224 ? | 237.6097+-0.1365 ? |
imaging-gaussian-blur | 626.8731+-0.7891 | 626.4669+-0.2731 |
json-parse-financial | 71.9815+-0.2422 ^ | 71.0415+-0.5815 ^ | definitely 1.0132x faster
json-stringify-tinderbox | 82.6528+-0.4891 ^ | 81.5976+-0.2120 ^ | definitely 1.0129x faster
stanford-crypto-aes | 116.3760+-0.2715 ^ | 115.6811+-0.1324 ^ | definitely 1.0060x faster
stanford-crypto-ccm | 114.4137+-0.6633 ? | 115.5305+-0.7824 ? |
stanford-crypto-pbkdf2 | 231.8913+-0.9906 | 231.8791+-0.5605 |
stanford-crypto-sha256-iterative | 95.7288+-0.1486 ! | 96.1305+-0.2364 ! | definitely 1.0042x slower
<arithmetic> * | 260.5519+-0.7184 | 258.9912+-0.9973 |
<geometric> | 199.9573+-0.5722 | 199.2592+-0.4427 |
<harmonic> | 160.3298+-0.3722 | 159.7248+-0.3229 |

All benchmarks: | Conf#1 | Conf#2 | Result
---|---|---|---
<arithmetic> | 97.3963+-0.2485 | 96.9272+-0.3097 |
<geometric> | 24.4828+-0.0773 ^ | 24.3243+-0.0577 ^ | definitely 1.0065x faster
<harmonic> | 7.4052+-0.0334 | 7.3976+-0.0351 |

Geomean of preferred means: | Conf#1 | Conf#2 | Result
---|---|---|---
<scaled-result> | 56.2572+-0.1569 ^ | 55.8836+-0.1362 ^ | definitely 1.0067x faster

Comment on attachment 118636
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=118636&action=review

> Source/JavaScriptCore/yarr/YarrJIT.cpp:728
> + {

brace should follow the :, and you've over indented the code :D

> Source/JavaScriptCore/yarr/YarrJIT.cpp:756
> }

Can't we do a 4 character compare on 64bit?

(In reply to comment #2)
> (From update of attachment 118636)
> View in context: https://bugs.webkit.org/attachment.cgi?id=118636&action=review
>
> > Source/JavaScriptCore/yarr/YarrJIT.cpp:728
> > + {
>
> brace should follow the :, and you've over indented the code :D
>
> > Source/JavaScriptCore/yarr/YarrJIT.cpp:756
> > }
>
> Can't we do a 4 character compare on 64bit?

See comments above. 64 bit code is slower than 32 bit code.

Committed r102475: <http://trac.webkit.org/changeset/102475>
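
For readers unfamiliar with the technique, the multi-character read discussed in this bug can be illustrated in plain C++. This is only a sketch of the general idea, not the code in YarrJIT.cpp (which emits machine code rather than calling helpers like these): up to four 8-bit pattern characters are packed into a single 32-bit constant, and the same number of input bytes is loaded and compared in one step instead of one compare per character. The names `packPattern` and `matchesAt` are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of YARR): pack 1..4 8-bit pattern characters
// into one 32-bit value, laid out in memory order. A JIT would compute this
// once at compile time and embed it as a 32-bit immediate.
static uint32_t packPattern(const unsigned char* pattern, size_t count)
{
    uint32_t packed = 0;
    std::memcpy(&packed, pattern, count); // caller guarantees count is 1..4
    return packed;
}

// Hypothetical helper: returns true if the 8-bit `pattern` occurs in `input`
// starting at index `pos`, comparing up to four characters per step.
static bool matchesAt(const unsigned char* input, size_t inputLength, size_t pos,
                      const unsigned char* pattern, size_t patternLength)
{
    // Reject positions where the pattern would run past the end of the input.
    if (pos > inputLength || patternLength > inputLength - pos)
        return false;

    size_t i = 0;
    while (i < patternLength) {
        size_t remaining = patternLength - i;
        size_t count = remaining < 4 ? remaining : 4;

        // Load only the bytes that exist; unused high bytes stay zero on both
        // sides of the comparison, so no masking is needed in this sketch.
        uint32_t word = 0;
        std::memcpy(&word, input + pos + i, count);

        if (word != packPattern(pattern + i, count))
            return false;
        i += count;
    }
    return true;
}
```

In this sketch a seven-character pattern needs two comparisons (four characters, then three) rather than seven. An eight-characters-at-a-time variant would need a 64-bit constant per step, which is where the report's remark about 64-bit immediates and the extra temporary register applies.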