Bug 74191 - YARR: Multi-character read optimization for 8bit strings
Summary: YARR: Multi-character read optimization for 8bit strings
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2011-12-09 11:08 PST by Michael Saboff
Modified: 2011-12-09 14:42 PST
0 users

See Also:


Attachments
Patch (8.22 KB, patch)
2011-12-09 14:14 PST, Michael Saboff
oliver: review+

Description Michael Saboff 2011-12-09 11:08:00 PST
From <rdar://problem/10225305>

Perform directed tuning of the YarrJIT and other regular expression code to improve v8-regexp by 20% over pre-8-bit string measurements. Improve the SunSpider regexp-dna benchmark test as well.
Comment 1 Michael Saboff 2011-12-09 14:14:50 PST
Created attachment 118636
Patch

Tested a 64 bit version for X86-64 that did 1-4 characters for 16 bit strings and 1-8 characters for 8 bit strings, but that version is slower than this 32 bit version. I suspect the reason is that there are no 64 bit logic and compare instructions that take 64 bit immediate values, so a temporary register is needed. This increases the number of instructions and possibly uses more renamed registers.
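
For readers unfamiliar with the technique, here is a minimal C++ sketch of the idea behind a multi-character read: several 8 bit pattern characters are packed into one 32 bit constant, and the subject is checked with a single 32 bit compare instead of one compare per character. This is not the code from the patch. The helper names packPattern and matchesPacked are hypothetical, and the 1-4 character limit for 8 bit strings is an assumption inferred from the 64 bit experiment described above. The JIT emits machine code with the packed value as an immediate; this sketch only models the comparison in portable C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: pack `count` (1..4) 8 bit pattern characters into one
// 32 bit value. memcpy keeps the host byte order, so the packed value lines up
// with a load of the same number of bytes from the subject string.
static uint32_t packPattern(const unsigned char* pattern, size_t count)
{
    assert(count >= 1 && count <= 4);
    uint32_t packed = 0;
    std::memcpy(&packed, pattern, count);
    return packed;
}

// Hypothetical helper: test `count` (1..4) subject characters against a
// pre-packed pattern using a single 32 bit equality compare.
static bool matchesPacked(const unsigned char* subject, size_t count, uint32_t packedPattern)
{
    assert(count >= 1 && count <= 4);
    uint32_t loaded = 0;
    std::memcpy(&loaded, subject, count); // unused high bytes stay zero
    return loaded == packedPattern;
}
```

With this shape, matching the four characters "GGTA" costs one load and one compare against a 32 bit constant. Widening the same idea to 64 bits on X86-64 would require moving the constant into a register first, which is the extra cost suspected above.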

Using the SunSpider harness, regexp-dna goes from 14.0 ms to 10.0 ms (29% less time).

Bencher shows a larger improvement (46%).
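
The two percentages appear to be computed on different bases, so a small sketch of the arithmetic may help when comparing them. This assumes the harness figure is the fraction of run time saved and the Bencher figure is the before/after speed ratio (consistent with the "1.4564x faster" entry for regexp-dna in the report). The function names are hypothetical.

```cpp
#include <cassert>

// Fraction of run time saved: (before - after) / before.
static double timeReduction(double beforeMs, double afterMs)
{
    assert(beforeMs > 0);
    return (beforeMs - afterMs) / beforeMs;
}

// Speed ratio: how many times faster the new build is, before / after.
static double speedupRatio(double beforeMs, double afterMs)
{
    assert(afterMs > 0);
    return beforeMs / afterMs;
}
```

For the harness numbers, timeReduction(14.0, 10.0) is about 0.29 while speedupRatio(14.0, 10.0) is 1.4. For the Bencher numbers, speedupRatio(13.0668, 8.9721) is about 1.456, so on the same basis the two tools are closer than 29% versus 46% suggests.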

Benchmark report for SunSpider, V8, and Kraken on msaboff-pro.apple.com (MacPro5,1).

VMs tested:
"Conf#1" at /Volumes/Data/src/webkit.baseline/WebKitBuild/Release/jsc (r102454)
"Conf#2" at /Volumes/Data/src/webkit/WebKitBuild/Release/jsc (r102471)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
milliseconds.
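
As a rough illustration of how a "mean +- interval" entry can be derived from a set of samples, here is a minimal C++ sketch. How the bencher script actually computes its intervals is not stated in this report. The sketch assumes a Student's t interval around the sample mean, and the names Interval and confidenceInterval95 are hypothetical. The caller supplies the critical t value; for 12 samples (11 degrees of freedom) at 95% it is approximately 2.201.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Interval {
    double mean;      // sample mean, in milliseconds
    double halfWidth; // the value printed after "+-"
};

// Assumed method: mean +- t * s / sqrt(n), where s is the sample standard
// deviation (n - 1 in the denominator) and t is the critical value supplied
// by the caller for the desired confidence level.
static Interval confidenceInterval95(const std::vector<double>& samples, double tCritical)
{
    assert(samples.size() >= 2);
    const double n = static_cast<double>(samples.size());

    double sum = 0;
    for (double sample : samples)
        sum += sample;
    const double mean = sum / n;

    double squaredDeviations = 0;
    for (double sample : samples)
        squaredDeviations += (sample - mean) * (sample - mean);
    const double stddev = std::sqrt(squaredDeviations / (n - 1));

    return { mean, tCritical * stddev / std::sqrt(n) };
}
```

Two configurations whose intervals do not overlap are the ones the report can label "definitely" faster or slower under this reading.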

                                              Conf#1                  Conf#2                                     
SunSpider:
   3d-cube                                7.3308+-0.0519    ?     7.4966+-0.1323       ? might be 1.0226x slower
   3d-morph                               8.5611+-0.1446          8.3932+-0.0345         might be 1.0200x faster
   3d-raytrace                            7.7016+-0.0525    ?     7.7624+-0.1033       ?
   access-binary-trees                    1.5953+-0.0084    !     1.6549+-0.0406       ! definitely 1.0374x slower
   access-fannkuch                        7.5178+-0.0205    ?     7.5269+-0.0345       ?
   access-nbody                           3.9638+-0.0187          3.9385+-0.0167       
   access-nsieve                          3.2228+-0.0552    ?     3.2369+-0.0658       ?
   bitops-3bit-bits-in-byte               1.2403+-0.0132    ?     1.2427+-0.0162       ?
   bitops-bits-in-byte                    4.9608+-0.0576          4.9044+-0.0057         might be 1.0115x faster
   bitops-bitwise-and                     3.2989+-0.0196          3.2854+-0.0040       
   bitops-nsieve-bits                     5.6875+-0.0640          5.6344+-0.0344       
   controlflow-recursive                  2.3070+-0.0266          2.2878+-0.0129       
   crypto-aes                             7.3081+-0.0497    ?     7.3858+-0.0454       ? might be 1.0106x slower
   crypto-md5                             2.4666+-0.0310    ?     2.4703+-0.0317       ?
   crypto-sha1                            2.1842+-0.0341    ?     2.2145+-0.0425       ? might be 1.0139x slower
   date-format-tofte                     11.0334+-0.1857         10.8138+-0.1057         might be 1.0203x faster
   date-format-xparb                     10.0543+-0.1416    !    10.3172+-0.0833       ! definitely 1.0262x slower
   math-cordic                            7.2241+-0.0682          7.1663+-0.0574       
   math-partial-sums                     10.5884+-0.0742    ?    10.6348+-0.0553       ?
   math-spectral-norm                     2.6263+-0.0304    ?     2.6483+-0.0417       ?
   regexp-dna                            13.0668+-0.0678    ^     8.9721+-0.0649       ^ definitely 1.4564x faster
   string-base64                          4.2590+-0.0269          4.2281+-0.0147       
   string-fasta                           7.2582+-0.0597    ?     7.3990+-0.0923       ? might be 1.0194x slower
   string-tagcloud                       12.4427+-0.0885    ?    12.5360+-0.0947       ?
   string-unpack-code                    20.8654+-0.2174    ?    21.1506+-0.2558       ? might be 1.0137x slower
   string-validate-input                  5.6095+-0.0893          5.5625+-0.0566       

   <arithmetic> *                         6.7067+-0.0228    ^     6.5717+-0.0205       ^ definitely 1.0206x faster
   <geometric>                            5.3829+-0.0197    ^     5.3210+-0.0166       ^ definitely 1.0116x faster
   <harmonic>                             4.2026+-0.0192          4.1980+-0.0203       

                                              Conf#1                  Conf#2                                     
V8:
   crypto                                76.0522+-0.2519    ?    76.3492+-0.4030       ?
   deltablue                            168.4075+-1.0626    ?   169.2640+-1.6875       ?
   earley-boyer                          99.8663+-1.1576    ?   100.0184+-1.1615       ?
   raytrace                              57.1163+-0.2990    !    58.3273+-0.2589       ! definitely 1.0212x slower
   regexp                               124.1217+-0.7097    ?   124.1842+-0.8959       ?
   richards                             140.3142+-1.2846        139.1383+-0.6220       
   splay                                 89.6470+-1.0977    ?    91.5579+-1.2189       ? might be 1.0213x slower

   <arithmetic>                         107.9322+-0.4103    ?   108.4056+-0.3244       ?
   <geometric> *                        101.8911+-0.3896    ?   102.5413+-0.2725       ?
   <harmonic>                            95.9496+-0.3665    !    96.7913+-0.2553       ! definitely 1.0088x slower

                                              Conf#1                  Conf#2                                     
Kraken:
   ai-astar                             827.9136+-0.9099    ^   808.9674+-12.4051      ^ definitely 1.0234x faster
   audio-beat-detection                 208.8627+-1.1664        207.7250+-0.6196       
   audio-dft                            280.5823+-7.2311        277.2134+-2.9478         might be 1.0122x faster
   audio-fft                            136.4857+-0.6441        136.2737+-0.5297       
   audio-oscillator                     282.5474+-3.9736    ?   285.1065+-4.4506       ?
   imaging-darkroom                     334.0548+-4.4999    ?   334.6535+-4.5716       ?
   imaging-desaturate                   237.3633+-0.1224    ?   237.6097+-0.1365       ?
   imaging-gaussian-blur                626.8731+-0.7891        626.4669+-0.2731       
   json-parse-financial                  71.9815+-0.2422    ^    71.0415+-0.5815       ^ definitely 1.0132x faster
   json-stringify-tinderbox              82.6528+-0.4891    ^    81.5976+-0.2120       ^ definitely 1.0129x faster
   stanford-crypto-aes                  116.3760+-0.2715    ^   115.6811+-0.1324       ^ definitely 1.0060x faster
   stanford-crypto-ccm                  114.4137+-0.6633    ?   115.5305+-0.7824       ?
   stanford-crypto-pbkdf2               231.8913+-0.9906        231.8791+-0.5605       
   stanford-crypto-sha256-iterative      95.7288+-0.1486    !    96.1305+-0.2364       ! definitely 1.0042x slower

   <arithmetic> *                       260.5519+-0.7184        258.9912+-0.9973       
   <geometric>                          199.9573+-0.5722        199.2592+-0.4427       
   <harmonic>                           160.3298+-0.3722        159.7248+-0.3229       

                                              Conf#1                  Conf#2                                     
All benchmarks:
   <arithmetic>                          97.3963+-0.2485         96.9272+-0.3097       
   <geometric>                           24.4828+-0.0773    ^    24.3243+-0.0577       ^ definitely 1.0065x faster
   <harmonic>                             7.4052+-0.0334          7.3976+-0.0351       

                                              Conf#1                  Conf#2                                     
Geomean of preferred means:
   <scaled-result>                       56.2572+-0.1569    ^    55.8836+-0.1362       ^ definitely 1.0067x faster
Comment 2 Oliver Hunt 2011-12-09 14:23:01 PST
Comment on attachment 118636
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=118636&action=review

> Source/JavaScriptCore/yarr/YarrJIT.cpp:728
> +                {

brace should follow the :, and you've over indented the code :D

> Source/JavaScriptCore/yarr/YarrJIT.cpp:756
>              }

Can't we do a 4 character compare on 64bit?
Comment 3 Michael Saboff 2011-12-09 14:37:44 PST
(In reply to comment #2)
> (From update of attachment 118636)
> View in context: https://bugs.webkit.org/attachment.cgi?id=118636&action=review
> 
> > Source/JavaScriptCore/yarr/YarrJIT.cpp:728
> > +                {
> 
> brace should follow the :, and you've over indented the code :D
> 
> > Source/JavaScriptCore/yarr/YarrJIT.cpp:756
> >              }
> 
> Can't we do a 4 character compare on 64bit?

See comments above. The 64 bit code is slower than the 32 bit code.
Comment 4 Michael Saboff 2011-12-09 14:42:52 PST
Committed r102475: <http://trac.webkit.org/changeset/102475>