Bug 162879 - Creating pcToOriginMap in FTL shouldn't insert unnecessary NOPs
Summary: Creating pcToOriginMap in FTL shouldn't insert unnecessary NOPs
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: Other
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-03 14:38 PDT by Michael Saboff
Modified: 2016-10-03 14:53 PDT

See Also:


Attachments
Patch (3.41 KB, patch)
2016-10-03 14:44 PDT, Michael Saboff
fpizlo: review+

Description Michael Saboff 2016-10-03 14:38:18 PDT
In AirGenerate.cpp near line 196, we use standard macro assembler labels to get the current PC:
    pcToOriginMap.appendItem(jit.label(), ...);

If we recently created a watchpoint label via labelForWatchpoint(), then jit.label() will pad with NOPs when creating the label for pcToOriginMap.  This padding isn't necessary.

Instead, we should use jit.labelIgnoringWatchpoints() for pcToOriginMap's labels.
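
To illustrate the difference, here is a minimal toy model of the padding behavior. It is a sketch, not JavaScriptCore's code: ToyAssembler, kJumpReplacementSize, the byte values, and bytesEmitted() are all hypothetical, and only the three label method names are taken from the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only. The class, the constants and the byte values are
// hypothetical; only the three label method names come from the bug text.
class ToyAssembler {
public:
    // Assumed number of bytes that a later jump replacement would overwrite.
    static constexpr std::size_t kJumpReplacementSize = 5;
    static constexpr std::uint8_t kNop = 0x90;

    void emit(std::uint8_t byte) { m_code.push_back(byte); }

    // Marks a spot that may later be overwritten, and remembers where the
    // overwritten region would end.
    std::size_t labelForWatchpoint()
    {
        std::size_t offset = label();
        m_tailOfLastWatchpoint = offset + kJumpReplacementSize;
        return offset;
    }

    // Pads with NOPs until the current position is past the region reserved
    // by the most recent watchpoint label, then returns the position.
    std::size_t label()
    {
        while (m_code.size() < m_tailOfLastWatchpoint)
            emit(kNop);
        return m_code.size();
    }

    // Returns the current position without emitting any padding.
    std::size_t labelIgnoringWatchpoints() const { return m_code.size(); }

    std::size_t size() const { return m_code.size(); }

private:
    std::vector<std::uint8_t> m_code;
    std::size_t m_tailOfLastWatchpoint = 0;
};

// Emits a watchpoint label, one byte of "real" code, and then a label of the
// kind pcToOriginMap would record. Returns the total number of bytes emitted.
std::size_t bytesEmitted(bool ignoreWatchpoints)
{
    ToyAssembler jit;
    jit.labelForWatchpoint();
    jit.emit(0xC3);
    if (ignoreWatchpoints)
        jit.labelIgnoringWatchpoints();
    else
        jit.label();
    return jit.size();
}
```

In this model, bytesEmitted(false) returns 5 (one real byte plus four NOPs of padding) and bytesEmitted(true) returns 1, which is the saving the proposed change is after.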
Comment 1 Michael Saboff 2016-10-03 14:44:48 PDT
Created attachment 290524 [details]
Patch
Comment 2 Michael Saboff 2016-10-03 14:47:56 PDT
Performance results for the patch.  Overall this is neutral, with a possible slight speedup on Octane (geometric mean might be 1.0061x faster).

VMs tested:
"Baseline" at /Volumes/Data/src/webkit.baseline/WebKitBuild/Release/JavaScriptCore.framework/Versions/A/Resources/jsc
"ReducedNops" at /Volumes/Data/src/webkit/WebKitBuild/Release/JavaScriptCore.framework/Versions/A/Resources/jsc

Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                                 Baseline                ReducedNops                                    
SunSpider:
   3d-cube                                    8.4927+-0.2201            8.4569+-0.3008        
   3d-morph                                   8.2876+-0.0970            8.2708+-0.1228        
   3d-raytrace                                8.4772+-0.1849     ?      8.5529+-0.1157        ?
   access-binary-trees                        3.1561+-0.1318            3.0818+-0.0988          might be 1.0241x faster
   access-fannkuch                            8.5353+-0.3750     ?      8.5602+-0.2261        ?
   access-nbody                               4.3588+-0.0829     ?      4.4079+-0.1106        ? might be 1.0113x slower
   access-nsieve                              4.6929+-0.1068     ?      4.7037+-0.1082        ?
   bitops-3bit-bits-in-byte                   1.7280+-0.0614     ?      1.7440+-0.0493        ?
   bitops-bits-in-byte                        5.3063+-0.0667     ?      5.4083+-0.0655        ? might be 1.0192x slower
   bitops-bitwise-and                         2.7582+-0.0843     ?      2.8231+-0.0581        ? might be 1.0235x slower
   bitops-nsieve-bits                         4.7888+-0.1007            4.7659+-0.0748        
   controlflow-recursive                      3.5892+-0.0747     ?      3.6187+-0.0766        ?
   crypto-aes                                 7.2152+-0.1444            7.1528+-0.1348        
   crypto-md5                                 4.4246+-0.1020            4.4095+-0.1500        
   crypto-sha1                                4.4526+-0.0679     !      4.6137+-0.0745        ! definitely 1.0362x slower
   date-format-tofte                         14.2963+-0.3398           14.2701+-0.2102        
   date-format-xparb                          7.7327+-0.0951     ?      7.7615+-0.1554        ?
   math-cordic                                4.1716+-0.0670     ?      4.1983+-0.1158        ?
   math-partial-sums                          9.0604+-0.0999            9.0447+-0.0844        
   math-spectral-norm                         3.2773+-0.0997            3.2567+-0.0873        
   regexp-dna                                10.5064+-0.1815           10.4241+-0.1226        
   string-base64                              6.8975+-0.0620     ?      6.9925+-0.0856        ? might be 1.0138x slower
   string-fasta                               9.2126+-0.1079            9.1795+-0.1092        
   string-tagcloud                           13.5916+-0.1649     ?     13.6450+-0.2125        ?
   string-unpack-code                        26.8644+-0.3241           26.8144+-0.2583        
   string-validate-input                      6.6886+-0.0874            6.6695+-0.0890        

   <arithmetic>                               7.4063+-0.0279     ?      7.4164+-0.0330        ? might be 1.0014x slower

                                                 Baseline                ReducedNops                                    
Octane:
   encrypt                                   0.27579+-0.00106    ?     0.27656+-0.00036       ?
   decrypt                                   5.10833+-0.03781          5.07763+-0.05787       
   deltablue                        x2       0.21972+-0.00619    ?     0.22088+-0.00790       ?
   earley                                    0.45393+-0.00336          0.45152+-0.00161       
   boyer                                     8.05405+-0.06061    ?     8.06141+-0.05228       ?
   navier-stokes                    x2       6.48237+-0.01023    ?     6.48366+-0.01257       ?
   raytrace                         x2       1.26735+-0.01108          1.25816+-0.00740       
   richards                         x2       0.14584+-0.00255          0.14379+-0.00139         might be 1.0142x faster
   splay                            x2       0.55314+-0.00466          0.54653+-0.00251         might be 1.0121x faster
   regexp                           x2      28.82025+-0.24989         28.71947+-0.38194       
   pdfjs                            x2      65.33183+-0.38564         65.30303+-0.17080       
   mandreel                         x2      69.00627+-0.34301         68.73104+-0.22256       
   gbemu                            x2      64.55220+-6.32640         60.84211+-1.77502         might be 1.0610x faster
   closure                                   0.79790+-0.00121    !     0.80622+-0.00523       ! definitely 1.0104x slower
   jquery                                   10.56091+-0.04150         10.53725+-0.02327       
   box2d                            x2      16.89407+-0.03999    ^    16.66270+-0.06511       ^ definitely 1.0139x faster
   zlib                             x2     543.56729+-16.39174   ?   547.80662+-13.38795      ?
   typescript                       x2    1138.51814+-9.00853    ?  1141.69075+-8.76392       ?

   <geometric>                               8.47344+-0.04978          8.42189+-0.02698         might be 1.0061x faster

                                                 Baseline                ReducedNops                                    
Kraken:
   ai-astar                                  141.622+-1.036      ?     142.413+-0.553         ?
   audio-beat-detection                       64.743+-0.508             64.150+-0.496         
   audio-dft                                 129.453+-1.837      ?     129.635+-0.468         ?
   audio-fft                                  52.363+-0.448             51.915+-0.294         
   audio-oscillator                           76.678+-0.236      ?      76.769+-0.250         ?
   imaging-darkroom                           96.780+-0.331             96.472+-0.131         
   imaging-desaturate                         91.117+-0.322      ?      91.216+-0.274         ?
   imaging-gaussian-blur                     112.888+-4.577      ?     114.224+-4.722         ? might be 1.0118x slower
   json-parse-financial                       61.056+-0.294      ?      61.089+-0.386         ?
   json-stringify-tinderbox                   39.834+-0.172      ?      39.834+-0.146         ?
   stanford-crypto-aes                        64.406+-0.813             63.362+-0.810           might be 1.0165x faster
   stanford-crypto-ccm                        59.174+-1.134             58.481+-2.039           might be 1.0119x faster
   stanford-crypto-pbkdf2                    155.361+-2.938            152.626+-3.681           might be 1.0179x faster
   stanford-crypto-sha256-iterative           49.942+-0.505             49.711+-0.496         

   <arithmetic>                               85.387+-0.313             85.135+-0.517           might be 1.0030x faster

                                                 Baseline                ReducedNops                                    
Geomean of preferred means:
   <scaled-result>                           17.4989+-0.0423           17.4542+-0.0506          might be 1.0026x faster
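
For readers unfamiliar with the annotations in the tables above, the following sketch reproduces the "might be" / "definitely" wording from a pair of results. The rule used here is an assumption inferred from the rows themselves (the ratio is the quotient of the two means, and "definitely" appears only where the two 95% confidence intervals do not overlap); it is not taken from the benchmark script. Sample and compare() are hypothetical names. Unlike the tables, the sketch always prints an annotation, and it does not model the "?" markers.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// One table cell: mean execution time and the half-width of its 95%
// confidence interval, both in milliseconds. Hypothetical type.
struct Sample {
    double mean;
    double halfWidth;
};

// Describes how the candidate compares to the baseline. Lower times are
// better. Assumed rule: overlapping intervals give "might be", disjoint
// intervals give "definitely".
std::string compare(const Sample& baseline, const Sample& candidate)
{
    bool faster = candidate.mean < baseline.mean;
    bool overlap =
        baseline.mean - baseline.halfWidth <= candidate.mean + candidate.halfWidth
        && candidate.mean - candidate.halfWidth <= baseline.mean + baseline.halfWidth;

    // Always report a ratio of at least 1 by dividing the slower by the faster.
    double ratio = faster ? baseline.mean / candidate.mean
                          : candidate.mean / baseline.mean;

    std::ostringstream out;
    out << (overlap ? "might be " : "definitely ")
        << std::fixed << std::setprecision(4) << ratio << "x "
        << (faster ? "faster" : "slower");
    return out.str();
}
```

For example, compare({4.4526, 0.0679}, {4.6137, 0.0745}) yields "definitely 1.0362x slower", matching the crypto-sha1 row, and compare({3.1561, 0.1318}, {3.0818, 0.0988}) yields "might be 1.0241x faster", matching access-binary-trees.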
Comment 3 Filip Pizlo 2016-10-03 14:49:38 PDT
Comment on attachment 290524 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=290524&action=review

> Source/JavaScriptCore/b3/testb3.cpp:13295
> +            AllowMacroScratchRegisterUsage allowScratch(jit);

Remove

> Source/JavaScriptCore/b3/testb3.cpp:13302
> +            AllowMacroScratchRegisterUsage allowScratch(jit);

Remove.
Comment 4 Michael Saboff 2016-10-03 14:53:37 PDT
Committed r206752: <http://trac.webkit.org/changeset/206752>