Bug 76361 - DFG should be able to do JS and custom getter caching
Summary: DFG should be able to do JS and custom getter caching
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on: 76430
Blocks:
  Show dependency treegraph
 
Reported: 2012-01-15 21:14 PST by Filip Pizlo
Modified: 2012-02-27 05:17 PST (History)
8 users (show)

See Also:


Attachments
work in progress (38.13 KB, patch)
2012-01-15 21:17 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
work in progress (43.40 KB, patch)
2012-01-15 21:41 PST, Filip Pizlo
webkit-ews: commit-queue-
Details | Formatted Diff | Diff
the patch (104.63 KB, patch)
2012-01-16 13:43 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch to fix 32-bit (1.93 KB, patch)
2012-01-17 00:18 PST, Filip Pizlo
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2012-01-15 21:14:37 PST
Patch forthcoming
Comment 1 Filip Pizlo 2012-01-15 21:14:49 PST
<rdar://problem/10698060>
Comment 2 Filip Pizlo 2012-01-15 21:17:02 PST
Created attachment 122589 [details]
work in progress

Getting there.  Not there yet though.
Comment 3 Filip Pizlo 2012-01-15 21:41:25 PST
Created attachment 122591 [details]
work in progress

It's starting to work, but it needs quite a bit more testing.
Comment 4 WebKit Review Bot 2012-01-15 21:43:55 PST
Attachment 122591 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:877:  The parameter name "codeOrigin" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:878:  The parameter name "codeOrigin" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:880:  The parameter name "codeOrigin" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h:881:  The parameter name "codeOrigin" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 4 in 16 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 5 Early Warning System Bot 2012-01-15 21:48:06 PST
Comment on attachment 122591 [details]
work in progress

Attachment 122591 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/11113970
Comment 6 Filip Pizlo 2012-01-16 13:43:39 PST
Created attachment 122678 [details]
the patch
Comment 7 Early Warning System Bot 2012-01-16 14:03:10 PST
Comment on attachment 122678 [details]
the patch

Attachment 122678 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/11254305
Comment 8 Geoffrey Garen 2012-01-16 14:39:31 PST
Comment on attachment 122678 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=122678&action=review

r=me

Qt build not happy.

> Source/JavaScriptCore/dfg/DFGJITCompiler.cpp:131
> +            // FIXME: this is absurdly wrong.

(1) Yikes.

(2) Why don't you file a bug about what's wrong, and remove the FIXME.

> Source/JavaScriptCore/dfg/DFGOperations.cpp:835
> +    CodeOrigin codeOrigin = stubInfo->codeOrigin;
> +    while (codeOrigin.inlineCallFrame)
> +        codeOrigin = codeOrigin.inlineCallFrame->caller;

Is this loop also absurdly wrong, as the loop above is?

> Source/JavaScriptCore/dfg/DFGRepatch.cpp:283
> +        // Need extra checks.

A "why" comment would be better here, explaining that non-flushing gets must be trivial values, because non-trivial values might call to C++ code and stomp on our non-flushed registers.
Comment 9 Filip Pizlo 2012-01-16 16:26:02 PST
This is perf-neutral on the major benchmarks.



Benchmark report for SunSpider, V8, and Kraken on bigmac (MacPro5,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/DumpRenderTree (r105082)
"OldJIT" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/DumpRenderTree (r105082)
"GetterCache" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/DumpRenderTree (r105082)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1
benchmark iteration per VM invocation for warm-up. Used the portable Date.now() method to get millisecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree                 OldJIT               GetterCache          GetterCache v. TipOfTree 
SunSpider:
   3d-cube                                7.0833+-0.8332    !     9.5833+-0.6885    ^     7.2500+-0.7722       ? might be 1.0235x slower
   3d-morph                               8.8333+-0.2473          8.3333+-0.3128    ?     8.7500+-0.2874       
   3d-raytrace                            8.5833+-0.5720    ?     9.5000+-0.5069          8.2500+-0.7722         might be 1.0404x faster
   access-binary-trees                    2.5000+-0.6354    ?     3.0833+-0.6330          2.4167+-0.4248         might be 1.0345x faster
   access-fannkuch                        7.5833+-0.4248    !    13.5833+-0.4248    ^     7.6667+-0.4138       ? might be 1.0110x slower
   access-nbody                           4.0833+-0.1834    !     7.5833+-0.3272    ^     4.2500+-0.2874       ? might be 1.0408x slower
   access-nsieve                          3.3333+-0.3128    ?     3.9167+-0.3272          3.4167+-0.3272       ? might be 1.0250x slower
   bitops-3bit-bits-in-byte               1.2500+-0.2874    !     2.5833+-0.3272    ^     1.4167+-0.3272       ? might be 1.1333x slower
   bitops-bits-in-byte                    5.5833+-0.3272    !     6.0000+-0.0000    ^     5.6667+-0.3128       ? might be 1.0149x slower
   bitops-bitwise-and                     3.3333+-0.3128    !     4.0833+-0.1834    ^     3.2500+-0.2874         might be 1.0256x faster
   bitops-nsieve-bits                     5.7500+-0.2874    ?     5.8333+-0.2473          5.7500+-0.2874       
   controlflow-recursive                  2.5833+-0.3272          2.2500+-0.2874    ?     2.5000+-0.3318         might be 1.0333x faster
   crypto-aes                             8.5000+-0.8779          7.5000+-0.7420    ?     8.1667+-1.0076         might be 1.0408x faster
   crypto-md5                             2.5000+-0.3318    !     3.0000+-0.0000          2.7500+-0.2874       ? might be 1.1000x slower
   crypto-sha1                            2.2500+-0.2874    ?     2.6667+-0.3128          2.4167+-0.3272       ? might be 1.0741x slower
   date-format-tofte                     13.0000+-1.4336    ?    13.6667+-1.3636         12.9167+-1.4704       
   date-format-xparb                     12.1667+-1.4798         11.5000+-1.3133    ?    11.9167+-1.4704         might be 1.0210x faster
   math-cordic                            7.4167+-0.3272          7.3333+-0.3128          7.3333+-0.3128         might be 1.0114x faster
   math-partial-sums                     10.6667+-0.3128         10.6667+-0.3128         10.6667+-0.3128       
   math-spectral-norm                     2.6667+-0.3128    !     4.7500+-0.2874    ^     2.6667+-0.3128       
   regexp-dna                             9.8333+-0.5956    ?     9.9167+-0.5038          9.9167+-0.5720       ?
   string-base64                          5.6667+-0.9121          5.6667+-0.9121    ?     5.7500+-0.9815       ? might be 1.0147x slower
   string-fasta                           8.0000+-0.7663    ?     8.2500+-0.7722          8.0000+-0.8567       
   string-tagcloud                       13.2500+-0.3949         13.1667+-0.4560    ?    13.5833+-0.5720       ? might be 1.0252x slower
   string-unpack-code                    23.0000+-1.3272         22.3333+-1.2513    ?    23.1667+-1.2366       ?
   string-validate-input                  7.0833+-1.0991          7.0833+-0.9171          7.0000+-0.9385         might be 1.0119x faster

   <arithmetic> *                         7.1731+-0.4415    ?     7.8397+-0.4109          7.1859+-0.4386       ? might be 1.0018x slower
   <geometric>                            5.7494+-0.3382    !     6.6279+-0.3450    ^     5.8054+-0.3348       ? might be 1.0097x slower
   <harmonic>                             4.4182+-0.2477    !     5.5129+-0.2933    ^     4.5538+-0.2810       ? might be 1.0307x slower

                                            TipOfTree                 OldJIT               GetterCache          GetterCache v. TipOfTree 
V8:
   crypto                                77.5000+-0.8779    !   200.5833+-0.5038    ^    77.5833+-0.7399       ?
   deltablue                            174.9167+-2.2416    !   280.5000+-3.4958    ^   174.4167+-2.9858       
   earley-boyer                          98.3333+-1.3364    !   122.2500+-1.0182    ^    98.2500+-1.3848       
   raytrace                              53.5833+-0.8332    !    71.3333+-0.7821    ^    53.2500+-0.9815       
   regexp                               102.0833+-0.8761    ?   102.1667+-0.5956        101.5000+-0.5069       
   richards                             141.4167+-1.8664    !   274.7500+-2.0832    ^   139.5000+-0.8350         might be 1.0137x faster
   splay                                121.1667+-45.3339   ?   131.9167+-45.9184       121.5833+-43.9626      ?

   <arithmetic>                         109.8571+-7.1039    !   169.0714+-7.3637    ^   109.4405+-7.0731         might be 1.0038x faster
   <geometric> *                        101.4766+-5.7549    !   149.0634+-7.4459    ^   101.2528+-5.7107         might be 1.0022x faster
   <harmonic>                            93.7618+-4.4638    !   131.1242+-6.4855    ^    93.6635+-4.4156         might be 1.0010x faster

                                            TipOfTree                 OldJIT               GetterCache          GetterCache v. TipOfTree 
Kraken:
   ai-astar                             820.7500+-12.3355   !  2011.6667+-26.5260   ^   820.0833+-12.2019      
   audio-beat-detection                 202.0000+-2.7761    !   555.0000+-3.5634    ^   204.2500+-1.5356       ? might be 1.0111x slower
   audio-dft                            276.6667+-1.5165    !   447.5833+-2.3690    ^   278.6667+-1.6554       ?
   audio-fft                            123.0833+-0.8761    !   426.4167+-2.5481    ^   122.2500+-0.5502       
   audio-oscillator                     279.7500+-2.5720    !   383.2500+-1.4870    ^   279.5833+-0.9939       
   imaging-darkroom                     308.5000+-0.6907    !   593.6667+-11.0526   ^   308.1667+-1.1757       
   imaging-desaturate                   230.7500+-0.2874    !   614.0000+-1.1170    ^   230.3333+-0.3128       
   imaging-gaussian-blur                507.5000+-2.6197    !  2127.9167+-3.4425    ^   506.8333+-2.6103       
   json-parse-financial                  73.2500+-5.0476         72.8333+-5.1321         72.6667+-5.4612       
   json-stringify-tinderbox              84.5833+-0.5720    ^    83.0000+-0.3831    !    84.0833+-0.6330       
   stanford-crypto-aes                  118.9167+-2.1066    !   166.5833+-1.3936    ^   119.8333+-1.5759       ?
   stanford-crypto-ccm                  122.8333+-1.2946        121.0833+-0.9563    !   123.5000+-0.9579       ?
   stanford-crypto-pbkdf2               227.2500+-2.0832    !   415.5833+-2.9611    ^   226.0833+-1.1951       
   stanford-crypto-sha256-iterative     103.3333+-3.1517    !   161.9167+-3.5784    ^   104.0833+-3.1532       ?

   <arithmetic> *                       248.5119+-1.0005    !   584.3214+-1.6714    ^   248.6012+-1.2898       ? might be 1.0004x slower
   <geometric>                          195.6625+-1.9151    !   355.5735+-2.1038    ^   195.7577+-1.9882       ? might be 1.0005x slower
   <harmonic>                           160.4673+-2.7914    !   225.7891+-4.0986    ^   160.4073+-2.8873         might be 1.0004x faster

                                            TipOfTree                 OldJIT               GetterCache          GetterCache v. TipOfTree 
All benchmarks:
   <arithmetic>                          94.3546+-1.5450    !   203.5709+-1.3371    ^    94.3262+-1.5735         might be 1.0003x faster
   <geometric>                           25.1982+-1.0981    !    34.4913+-1.3017    ^    25.3294+-1.0832       ? might be 1.0052x slower
   <harmonic>                             7.7723+-0.4299    !     9.7273+-0.5124    ^     8.0038+-0.4855       ? might be 1.0298x slower

                                            TipOfTree                 OldJIT               GetterCache          GetterCache v. TipOfTree 
Geomean of preferred means:
   <scaled-result>                       56.5109+-2.2661    !    88.0008+-2.9640    ^    56.5109+-2.2584       ? might be 1.0000x slower
Comment 10 Filip Pizlo 2012-01-16 16:29:00 PST
Performance on Dromaeo DOM Traversal nextSibling:

Old JIT ToT: 644.40runs/s ±7.18%
DFG ToT: 377.60runs/s ±0.50%
DFG ToT + this patch: 767.80runs/s ±6.80%

So a 2x speed-up in the DFG alone, and a 20% speed-up over where we would have been pre-DFG.
Comment 11 Filip Pizlo 2012-01-16 16:54:04 PST
Landed in http://trac.webkit.org/changeset/105107
Comment 12 Filip Pizlo 2012-01-16 16:55:03 PST
> Qt build not happy.

Should be fixed.  Turns out that the 32-bit build was not happy in general.

> 
> > Source/JavaScriptCore/dfg/DFGJITCompiler.cpp:131
> > +            // FIXME: this is absurdly wrong.
> 
> (1) Yikes.
> 
> (2) Why don't you file a bug about what's wrong, and remove the FIXME.

Filed a radar.

> 
> > Source/JavaScriptCore/dfg/DFGOperations.cpp:835
> > +    CodeOrigin codeOrigin = stubInfo->codeOrigin;
> > +    while (codeOrigin.inlineCallFrame)
> > +        codeOrigin = codeOrigin.inlineCallFrame->caller;
> 
> Is this loop also absurdly wrong, as the loop above is?

Yup.

> 
> > Source/JavaScriptCore/dfg/DFGRepatch.cpp:283
> > +        // Need extra checks.
> 
> A "why" comment would be better here, explaining that non-flushing gets must be trivial values, because non-trivial values might call to C++ code and stomp on our non-flushed registers.

Added a better comment.
Comment 13 Csaba Osztrogonác 2012-01-17 00:00:44 PST
Reopen, because it broke many tests on 32 bit:
http://build.webkit.org/results/Qt%20Linux%20Release/r105107%20%2842168%29/results.html
Comment 14 Filip Pizlo 2012-01-17 00:18:29 PST
Created attachment 122724 [details]
the patch to fix 32-bit

Fixed JSVALUE32_64 reversal of tag and payload when handling stub call return.

Fixed double use of GPRResult.  (Second use should always use the GPRResult2 class.)
Comment 15 Csaba Osztrogonác 2012-01-17 00:35:37 PST
Comment on attachment 122724 [details]
the patch to fix 32-bit

r=me, I tested on 32 bit Qt and it works.
Comment 16 Filip Pizlo 2012-01-17 00:37:07 PST
Comment on attachment 122724 [details]
the patch to fix 32-bit

Clearing flags because I just landed.
Comment 17 Filip Pizlo 2012-01-17 00:37:52 PST
Landed a fix for 32-bit in http://trac.webkit.org/changeset/105131
Comment 18 Csaba Osztrogonác 2012-02-27 05:17:59 PST
(In reply to comment #17)
> Landed a fix for 32-bit in http://trac.webkit.org/changeset/105131