Bug 82155

Summary: DFG OSR exit value recoveries should be computed lazily
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, eric, ggaren, rakuco, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
work in progress
none
work in progress
none
work in progress - a different approach
none
work in progress
none
work in progress
none
work in progress
none
work in progress
none
work in progress
none
almost done
webkit-ews: commit-queue-
the patch
webkit-ews: commit-queue-
the patch
barraclough: review+
start of rebasing none

Description Filip Pizlo 2012-03-25 18:22:55 PDT
Currently each OSR exit has a complete record of the value recoveries of all bytecode variables live at that exit. Each time a speculation check is emitted, we compute all of these value recoveries and save them.

This takes unnecessary space (since the value recovery for a variable at exit site K is likely to be identical to the value recovery for that variable at exit site K + 1) and unnecessary time (since we recompute those identical values repeatedly).

Instead, CodeBlock should record the sequence of variable-related changes in the DFG's representation of values. When an OSR exit is generated, the value recoveries can be lazily reconstructed from this sequence.
Comment 1 Filip Pizlo 2012-03-25 18:25:08 PDT
Created attachment 133708 [details]
work in progress
Comment 2 Filip Pizlo 2012-03-25 22:03:39 PDT
Created attachment 133728 [details]
work in progress
Comment 3 Filip Pizlo 2012-03-26 00:47:09 PDT
Created attachment 133744 [details]
work in progress - a different approach

Taking a different approach, where we record:

1) The subset of the data flow graph that we need to have to compute value recoveries.

2) A stream of events including: fills, spills, deaths, setlocals, and movhints.

Fills, spills, and deaths only need to be recorded for nodes that are interesting to value recoveries.

This allows us to reconstruct the generation infos at any point after the fact. It also gives us the value source vectors. Together that allows us to execute the equivalent computeValueRecovery.
Comment 4 Filip Pizlo 2012-03-26 16:02:11 PDT
Created attachment 133917 [details]
work in progress
Comment 5 Filip Pizlo 2012-03-27 13:31:40 PDT
Created attachment 134124 [details]
work in progress
Comment 6 Filip Pizlo 2012-03-27 14:57:57 PDT
Created attachment 134143 [details]
work in progress
Comment 7 Filip Pizlo 2012-03-27 16:48:43 PDT
Created attachment 134166 [details]
work in progress

It compiles.
Comment 8 Filip Pizlo 2012-03-27 17:42:02 PDT
Created attachment 134184 [details]
work in progress

It ran its first program.
Comment 9 Filip Pizlo 2012-03-28 02:21:43 PDT
It would appear that I did get the SunSpider speed-up that I was looking for, but it's marginal.  I may be fooling myself.



[pizlo@wartooth Internal] Tools/Scripts/run-jsc-benchmarks TipOfTree:/Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc VariableStream:/Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc --sunspider --inner 1 --outer 100 --remote oldmac,bigmac --local
Copying TipOfTree into /Volumes/Data/pizlo/Internal/BenchmarkTemp/benchdata...
Copying VariableStream into /Volumes/Data/pizlo/Internal/BenchmarkTemp/benchdata...
All VMs are in place.
Packaging benchmarking directory for remote hosts...
Sending benchmark payload to oldmac...
Running on oldmac...
5200/5200                                                     
Generating benchmark report at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_oldmac_20120328_0209_report.txt
And raw data at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_oldmac_20120328_0209.json

Benchmark report for SunSpider on oldmac (MacPro4,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc (r112326)
"VariableStream" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc (r112326)

Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc()
between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the
jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times
with 95% confidence intervals in milliseconds.

                                  TipOfTree               VariableStream                                  

3d-cube                         9.0728+-0.0142     ^      8.9108+-0.0177        ^ definitely 1.0182x faster
3d-morph                        8.9181+-0.0310     ?      8.9607+-0.1729        ?
3d-raytrace                    11.3844+-0.0367     ^     11.0494+-0.0237        ^ definitely 1.0303x faster
access-binary-trees             2.1083+-0.0047     ^      2.0977+-0.0057        ^ definitely 1.0051x faster
access-fannkuch                 9.2028+-0.0145     ?      9.2028+-0.0040        ?
access-nbody                    4.6560+-0.0118     ^      4.5568+-0.0063        ^ definitely 1.0218x faster
access-nsieve                   4.2503+-0.0103     ?      4.2575+-0.0108        ?
bitops-3bit-bits-in-byte        1.6304+-0.0063     ?      1.6315+-0.0072        ?
bitops-bits-in-byte             6.5841+-0.0149     ?      6.6019+-0.0141        ?
bitops-bitwise-and              4.0202+-0.0069            4.0202+-0.0067        
bitops-nsieve-bits              3.8827+-0.0050     !      3.8969+-0.0062        ! definitely 1.0037x slower
controlflow-recursive           2.8471+-0.0082     ?      2.8581+-0.0094        ?
crypto-aes                      9.6587+-0.0135     ^      9.5758+-0.0088        ^ definitely 1.0087x faster
crypto-md5                      3.8811+-0.0046     ^      3.8669+-0.0070        ^ definitely 1.0037x faster
crypto-sha1                     3.0996+-0.0056     !      3.1170+-0.0068        ! definitely 1.0056x slower
date-format-tofte              13.7325+-0.0341     ^     13.5543+-0.0398        ^ definitely 1.0131x faster
date-format-xparb              12.8675+-0.0363     ?     12.8921+-0.0450        ?
math-cordic                     5.0040+-0.0121     ?      5.0187+-0.0141        ?
math-partial-sums              10.7343+-0.0173           10.7094+-0.0339        
math-spectral-norm              3.2941+-0.0075     ^      3.2787+-0.0065        ^ definitely 1.0047x faster
regexp-dna                     11.9019+-0.0179     ?     11.9217+-0.0216        ?
string-base64                   5.6682+-0.0115            5.6625+-0.0183        
string-fasta                    8.6603+-0.0114     !      8.6959+-0.0165        ! definitely 1.0041x slower
string-tagcloud                15.6934+-0.0207           15.6509+-0.0222        
string-unpack-code             26.7153+-0.0359     ?     26.7949+-0.0713        ?
string-validate-input           7.7287+-0.0178     !      7.8270+-0.0264        ! definitely 1.0127x slower

<arithmetic> *                  7.9691+-0.0034     ^      7.9465+-0.0080        ^ definitely 1.0028x faster
<geometric>                     6.4588+-0.0029     ^      6.4428+-0.0045        ^ definitely 1.0025x faster
<harmonic>                      5.1958+-0.0037     ^      5.1872+-0.0040        ^ definitely 1.0017x faster

Sending benchmark payload to bigmac...
Running on bigmac...
5200/5200                                                     
Generating benchmark report at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_bigmac_20120328_0215_report.txt
And raw data at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_bigmac_20120328_0215.json

Benchmark report for SunSpider on bigmac (MacPro5,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc (r112326)
"VariableStream" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc (r112326)

Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc()
between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the
jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times
with 95% confidence intervals in milliseconds.

                                  TipOfTree               VariableStream                                  

3d-cube                         7.3357+-0.0072     ^      7.2197+-0.0081        ^ definitely 1.0161x faster
3d-morph                        7.3669+-0.0173            7.3593+-0.0180        
3d-raytrace                     9.3486+-0.0254     ^      9.1235+-0.0109        ^ definitely 1.0247x faster
access-binary-trees             1.7455+-0.0069            1.7412+-0.0064        
access-fannkuch                 7.6001+-0.0031     !      7.6238+-0.0099        ! definitely 1.0031x slower
access-nbody                    3.8549+-0.0053     ^      3.7814+-0.0027        ^ definitely 1.0194x faster
access-nsieve                   3.5398+-0.0073     ?      3.5512+-0.0075        ?
bitops-3bit-bits-in-byte        1.3473+-0.0099     ?      1.3508+-0.0049        ?
bitops-bits-in-byte             5.4733+-0.0099            5.4666+-0.0108        
bitops-bitwise-and              3.3319+-0.0071     ?      3.3414+-0.0091        ?
bitops-nsieve-bits              3.2265+-0.0085     ?      3.2360+-0.0081        ?
controlflow-recursive           2.3347+-0.0077     ?      2.3389+-0.0081        ?
crypto-aes                      7.8193+-0.0081     ^      7.7482+-0.0096        ^ definitely 1.0092x faster
crypto-md5                      3.2321+-0.0055            3.2226+-0.0062        
crypto-sha1                     2.5992+-0.0080     !      2.6214+-0.0087        ! definitely 1.0086x slower
date-format-tofte              11.0332+-0.0229     ^     10.8673+-0.0198        ^ definitely 1.0153x faster
date-format-xparb              10.3785+-0.0556           10.3312+-0.0383        
math-cordic                     4.1416+-0.0052     ?      4.1527+-0.0081        ?
math-partial-sums               8.9177+-0.0103     ^      8.8960+-0.0055        ^ definitely 1.0024x faster
math-spectral-norm              2.7571+-0.0087            2.7448+-0.0084        
regexp-dna                      9.7353+-0.0121     !      9.7758+-0.0154        ! definitely 1.0042x slower
string-base64                   4.5591+-0.0102            4.5506+-0.0145        
string-fasta                    7.1483+-0.0111     ?      7.1703+-0.0136        ?
string-tagcloud                12.8586+-0.0179     ?     12.8755+-0.0137        ?
string-unpack-code             21.5617+-0.0292     ?     21.5697+-0.0234        ?
string-validate-input           6.1975+-0.0159     !      6.2321+-0.0129        ! definitely 1.0056x slower

<arithmetic> *                  6.5171+-0.0030     ^      6.4958+-0.0025        ^ definitely 1.0033x faster
<geometric>                     5.3081+-0.0025     ^      5.2954+-0.0021        ^ definitely 1.0024x faster
<harmonic>                      4.2882+-0.0038            4.2842+-0.0029          might be 1.0009x faster

Running locally...
5200/5200                                                     
Generating benchmark report at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_wartooth_20120328_0217_report.txt
And raw data at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_wartooth_20120328_0217.json

Benchmark report for SunSpider on wartooth (MacBookPro8,2).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc (r112326)
"VariableStream" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc (r112326)

Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc()
between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the
jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times
with 95% confidence intervals in milliseconds.

                                  TipOfTree               VariableStream                                  

3d-cube                         6.7561+-0.1459     ^      6.4877+-0.0579        ^ definitely 1.0414x faster
3d-morph                        6.9644+-0.2915     ?      7.0522+-0.3207        ? might be 1.0126x slower
3d-raytrace                     8.6955+-0.0818            8.5678+-0.1092          might be 1.0149x faster
access-binary-trees             1.7009+-0.0353            1.6882+-0.0357        
access-fannkuch                 6.4264+-0.1183     ?      6.6163+-0.2240        ? might be 1.0295x slower
access-nbody                    3.2247+-0.0731            3.1766+-0.0891          might be 1.0151x faster
access-nsieve                   3.0165+-0.0859            2.9559+-0.0461          might be 1.0205x faster
bitops-3bit-bits-in-byte        1.4353+-0.0391            1.3867+-0.0205          might be 1.0350x faster
bitops-bits-in-byte             2.3393+-0.0468     ?      2.3406+-0.0446        ?
bitops-bitwise-and              3.7230+-0.1279            3.6903+-0.1132        
bitops-nsieve-bits              3.3064+-0.1035            3.2306+-0.0606          might be 1.0235x faster
controlflow-recursive           2.0716+-0.0761            2.0475+-0.0671          might be 1.0118x faster
crypto-aes                      7.6965+-0.1193            7.5897+-0.1252          might be 1.0141x faster
crypto-md5                      2.9156+-0.0714            2.8466+-0.0213          might be 1.0242x faster
crypto-sha1                     2.4435+-0.0637            2.4361+-0.0451        
date-format-tofte              10.2863+-0.0519           10.2762+-0.0532        
date-format-xparb               9.5949+-0.0712     ^      9.4645+-0.0514        ^ definitely 1.0138x faster
math-cordic                     3.7434+-0.1035     ?      3.7831+-0.1230        ? might be 1.0106x slower
math-partial-sums               6.4855+-0.1586            6.4043+-0.0624          might be 1.0127x faster
math-spectral-norm              2.5343+-0.0608            2.5067+-0.0745          might be 1.0110x faster
regexp-dna                      8.4074+-0.0721     ?      8.5258+-0.1156        ? might be 1.0141x slower
string-base64                   4.6353+-0.0910            4.5789+-0.0490          might be 1.0123x faster
string-fasta                    6.4247+-0.0936            6.4110+-0.0406        
string-tagcloud                11.2349+-0.0512     !     11.4188+-0.0542        ! definitely 1.0164x slower
string-unpack-code             19.4910+-0.0821           19.4458+-0.0712        
string-validate-input           5.9134+-0.0436     ?      6.0783+-0.1735        ? might be 1.0279x slower

<arithmetic> *                  5.8256+-0.0212            5.8079+-0.0232          might be 1.0030x faster
<geometric>                     4.7335+-0.0155            4.7062+-0.0166          might be 1.0058x faster
<harmonic>                      3.8626+-0.0135     ^      3.8303+-0.0134        ^ definitely 1.0084x faster
Comment 10 Filip Pizlo 2012-03-28 02:38:28 PDT
Created attachment 134245 [details]
almost done

It works, but still needs the 32-bit version of some of the code.
Comment 11 WebKit Review Bot 2012-03-28 02:41:21 PDT
Attachment 134245 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/CMakeLists.txt', u'S..." exit_code: 1
Source/JavaScriptCore/dfg/DFGVariableEvent.h:235:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Source/JavaScriptCore/dfg/DFGMinifiedNode.h:58:  The parameter name "nodeIndex" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGMinifiedNode.h:58:  The parameter name "node" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGMinifiedNode.h:65:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Source/JavaScriptCore/dfg/DFGVariableEventStream.h:52:  The parameter name "codeBlock" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGVariableEventStream.h:52:  The parameter name "codeOrigin" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGVariableEventStream.h:52:  The parameter name "graph" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGVariableEventStream.h:53:  The parameter name "operands" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGVariableEventStream.h:56:  The parameter name "event" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/dfg/DFGVariableEvent.cpp:31:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 10 in 38 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 12 Early Warning System Bot 2012-03-28 02:45:33 PDT
Comment on attachment 134245 [details]
almost done

Attachment 134245 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/12152070
Comment 13 Early Warning System Bot 2012-03-28 02:47:44 PDT
Comment on attachment 134245 [details]
almost done

Attachment 134245 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/12146948
Comment 14 Filip Pizlo 2012-03-28 03:36:48 PDT
Created attachment 134253 [details]
the patch

Looks like it's done.
Comment 15 Early Warning System Bot 2012-03-28 03:43:31 PDT
Comment on attachment 134253 [details]
the patch

Attachment 134253 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/12155053
Comment 16 Early Warning System Bot 2012-03-28 03:43:53 PDT
Comment on attachment 134253 [details]
the patch

Attachment 134253 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/12155054
Comment 17 Filip Pizlo 2012-03-28 03:47:36 PDT
Created attachment 134256 [details]
the patch

Fixing Qt build
Comment 18 Gavin Barraclough 2012-05-28 01:34:32 PDT
Comment on attachment 134256 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=134256&action=review

It seems that it might be cleaner at some point to refactor GenerationInfo, so that rather than being a class representing a single value & having a Vector of them, to instead have GenerationInfo be a container holding info for N variables, and make all methods be passed the virtual register to be operated on.  Then the generation info could hold a reference to m_stream as a member, which would better encapsulate the implementation.

Apologies again on the delay on reviewing this, r+, please re-test performance before landing.

> Source/JavaScriptCore/bytecode/Operands.h:199
> +    // Use const-cast because:

:-(

> Source/JavaScriptCore/dfg/DFGMinifiedNode.h:42
> +    switch (type) {

Given that this happens around the node generation loop, I wonder if it might be worth trying to make this a faster test than a switch?
Comment 19 Filip Pizlo 2012-07-02 15:23:07 PDT
Created attachment 150482 [details]
start of rebasing

It probably doesn't even compile yet.
Comment 20 Filip Pizlo 2012-07-02 18:27:01 PDT
(In reply to comment #18)
> (From update of attachment 134256 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=134256&action=review
> 
> It seems that it might be cleaner at some point to refactor GenerationInfo, so that rather than being a class representing a single value & having a Vector of them, to instead have GenerationInfo be a container holding info for N variables, and make all methods be passed the virtual register to be operated on.  Then the generation info could hold a reference to m_stream as a member, which would better encapsulate the implementation.
> 
> Apologies again on the delay on reviewing this, r+, please re-test performance before landing.

The performance still looks good!
Comment 21 Filip Pizlo 2012-07-02 18:28:03 PDT
Landed in http://trac.webkit.org/changeset/121717