RESOLVED FIXED Bug 82155
DFG OSR exit value recoveries should be computed lazily
https://bugs.webkit.org/show_bug.cgi?id=82155
Summary DFG OSR exit value recoveries should be computed lazily
Filip Pizlo
Reported 2012-03-25 18:22:55 PDT
Currently each OSR exit has a complete record of the value recoveries of all bytecode variables live at that exit. Each time a speculation check is emitted, we compute all of these value recoveries and save them. This takes unnecessary space (since the value recovery for a variable at exit site K is likely to be identical to the value recovery for that variable at exit site K + 1) and unnecessary time (since we recompute those identical values repeatedly). Instead, CodeBlock should record the sequence of variable-related changes in the DFG's representation of values. When an OSR exit is generated, the value recoveries can be lazily reconstructed from this sequence.
Attachments
work in progress (15.49 KB, patch)
2012-03-25 18:25 PDT, Filip Pizlo
no flags
work in progress (33.14 KB, patch)
2012-03-25 22:03 PDT, Filip Pizlo
no flags
work in progress - a different approach (17.37 KB, patch)
2012-03-26 00:47 PDT, Filip Pizlo
no flags
work in progress (43.84 KB, patch)
2012-03-26 16:02 PDT, Filip Pizlo
no flags
work in progress (77.84 KB, patch)
2012-03-27 13:31 PDT, Filip Pizlo
no flags
work in progress (122.19 KB, patch)
2012-03-27 14:57 PDT, Filip Pizlo
no flags
work in progress (134.44 KB, patch)
2012-03-27 16:48 PDT, Filip Pizlo
no flags
work in progress (146.51 KB, patch)
2012-03-27 17:42 PDT, Filip Pizlo
no flags
almost done (156.71 KB, patch)
2012-03-28 02:38 PDT, Filip Pizlo
webkit-ews: commit-queue-
the patch (180.14 KB, patch)
2012-03-28 03:36 PDT, Filip Pizlo
webkit-ews: commit-queue-
the patch (180.16 KB, patch)
2012-03-28 03:47 PDT, Filip Pizlo
barraclough: review+
start of rebasing (182.35 KB, patch)
2012-07-02 15:23 PDT, Filip Pizlo
no flags
Filip Pizlo
Comment 1 2012-03-25 18:25:08 PDT
Created attachment 133708 [details] work in progress
Filip Pizlo
Comment 2 2012-03-25 22:03:39 PDT
Created attachment 133728 [details] work in progress
Filip Pizlo
Comment 3 2012-03-26 00:47:09 PDT
Created attachment 133744 [details] work in progress - a different approach Taking a different approach, where we record: 1) The subset of the data flow graph that we need to have to compute value recoveries. 2) A stream of events including: fills, spills, deaths, setlocals, and movhints. Fills, spills, and deaths only need to be recorded for nodes that are interesting to value recoveries. This allows us to reconstruct the generation infos at any point after the fact. It also gives us the value source vectors. Together that allows us to execute the equivalent computeValueRecovery.
Filip Pizlo
Comment 4 2012-03-26 16:02:11 PDT
Created attachment 133917 [details] work in progress
Filip Pizlo
Comment 5 2012-03-27 13:31:40 PDT
Created attachment 134124 [details] work in progress
Filip Pizlo
Comment 6 2012-03-27 14:57:57 PDT
Created attachment 134143 [details] work in progress
Filip Pizlo
Comment 7 2012-03-27 16:48:43 PDT
Created attachment 134166 [details] work in progress It compiles.
Filip Pizlo
Comment 8 2012-03-27 17:42:02 PDT
Created attachment 134184 [details] work in progress It ran its first program.
Filip Pizlo
Comment 9 2012-03-28 02:21:43 PDT
It would appear that I did get the SunSpider speed-up that I was looking for, but it's marginal. I may be fooling myself. [pizlo@wartooth Internal] Tools/Scripts/run-jsc-benchmarks TipOfTree:/Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc VariableStream:/Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc --sunspider --inner 1 --outer 100 --remote oldmac,bigmac --local Copying TipOfTree into /Volumes/Data/pizlo/Internal/BenchmarkTemp/benchdata... Copying VariableStream into /Volumes/Data/pizlo/Internal/BenchmarkTemp/benchdata... All VMs are in place. Packaging benchmarking directory for remote hosts... Sending benchmark payload to oldmac... Running on oldmac... 5200/5200 Generating benchmark report at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_oldmac_20120328_0209_report.txt And raw data at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_oldmac_20120328_0209.json Benchmark report for SunSpider on oldmac (MacPro4,1). VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc (r112326) "VariableStream" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc (r112326) Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree VariableStream 3d-cube 9.0728+-0.0142 ^ 8.9108+-0.0177 ^ definitely 1.0182x faster 3d-morph 8.9181+-0.0310 ? 8.9607+-0.1729 ? 3d-raytrace 11.3844+-0.0367 ^ 11.0494+-0.0237 ^ definitely 1.0303x faster access-binary-trees 2.1083+-0.0047 ^ 2.0977+-0.0057 ^ definitely 1.0051x faster access-fannkuch 9.2028+-0.0145 ? 9.2028+-0.0040 ? access-nbody 4.6560+-0.0118 ^ 4.5568+-0.0063 ^ definitely 1.0218x faster access-nsieve 4.2503+-0.0103 ? 4.2575+-0.0108 ? bitops-3bit-bits-in-byte 1.6304+-0.0063 ? 1.6315+-0.0072 ? bitops-bits-in-byte 6.5841+-0.0149 ? 6.6019+-0.0141 ? bitops-bitwise-and 4.0202+-0.0069 4.0202+-0.0067 bitops-nsieve-bits 3.8827+-0.0050 ! 3.8969+-0.0062 ! definitely 1.0037x slower controlflow-recursive 2.8471+-0.0082 ? 2.8581+-0.0094 ? crypto-aes 9.6587+-0.0135 ^ 9.5758+-0.0088 ^ definitely 1.0087x faster crypto-md5 3.8811+-0.0046 ^ 3.8669+-0.0070 ^ definitely 1.0037x faster crypto-sha1 3.0996+-0.0056 ! 3.1170+-0.0068 ! definitely 1.0056x slower date-format-tofte 13.7325+-0.0341 ^ 13.5543+-0.0398 ^ definitely 1.0131x faster date-format-xparb 12.8675+-0.0363 ? 12.8921+-0.0450 ? math-cordic 5.0040+-0.0121 ? 5.0187+-0.0141 ? math-partial-sums 10.7343+-0.0173 10.7094+-0.0339 math-spectral-norm 3.2941+-0.0075 ^ 3.2787+-0.0065 ^ definitely 1.0047x faster regexp-dna 11.9019+-0.0179 ? 11.9217+-0.0216 ? string-base64 5.6682+-0.0115 5.6625+-0.0183 string-fasta 8.6603+-0.0114 ! 8.6959+-0.0165 ! definitely 1.0041x slower string-tagcloud 15.6934+-0.0207 15.6509+-0.0222 string-unpack-code 26.7153+-0.0359 ? 26.7949+-0.0713 ? string-validate-input 7.7287+-0.0178 ! 7.8270+-0.0264 ! definitely 1.0127x slower <arithmetic> * 7.9691+-0.0034 ^ 7.9465+-0.0080 ^ definitely 1.0028x faster <geometric> 6.4588+-0.0029 ^ 6.4428+-0.0045 ^ definitely 1.0025x faster <harmonic> 5.1958+-0.0037 ^ 5.1872+-0.0040 ^ definitely 1.0017x faster Sending benchmark payload to bigmac... Running on bigmac... 5200/5200 Generating benchmark report at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_bigmac_20120328_0215_report.txt And raw data at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_bigmac_20120328_0215.json Benchmark report for SunSpider on bigmac (MacPro5,1). VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc (r112326) "VariableStream" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc (r112326) Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree VariableStream 3d-cube 7.3357+-0.0072 ^ 7.2197+-0.0081 ^ definitely 1.0161x faster 3d-morph 7.3669+-0.0173 7.3593+-0.0180 3d-raytrace 9.3486+-0.0254 ^ 9.1235+-0.0109 ^ definitely 1.0247x faster access-binary-trees 1.7455+-0.0069 1.7412+-0.0064 access-fannkuch 7.6001+-0.0031 ! 7.6238+-0.0099 ! definitely 1.0031x slower access-nbody 3.8549+-0.0053 ^ 3.7814+-0.0027 ^ definitely 1.0194x faster access-nsieve 3.5398+-0.0073 ? 3.5512+-0.0075 ? bitops-3bit-bits-in-byte 1.3473+-0.0099 ? 1.3508+-0.0049 ? bitops-bits-in-byte 5.4733+-0.0099 5.4666+-0.0108 bitops-bitwise-and 3.3319+-0.0071 ? 3.3414+-0.0091 ? bitops-nsieve-bits 3.2265+-0.0085 ? 3.2360+-0.0081 ? controlflow-recursive 2.3347+-0.0077 ? 2.3389+-0.0081 ? crypto-aes 7.8193+-0.0081 ^ 7.7482+-0.0096 ^ definitely 1.0092x faster crypto-md5 3.2321+-0.0055 3.2226+-0.0062 crypto-sha1 2.5992+-0.0080 ! 2.6214+-0.0087 ! definitely 1.0086x slower date-format-tofte 11.0332+-0.0229 ^ 10.8673+-0.0198 ^ definitely 1.0153x faster date-format-xparb 10.3785+-0.0556 10.3312+-0.0383 math-cordic 4.1416+-0.0052 ? 4.1527+-0.0081 ? math-partial-sums 8.9177+-0.0103 ^ 8.8960+-0.0055 ^ definitely 1.0024x faster math-spectral-norm 2.7571+-0.0087 2.7448+-0.0084 regexp-dna 9.7353+-0.0121 ! 9.7758+-0.0154 ! definitely 1.0042x slower string-base64 4.5591+-0.0102 4.5506+-0.0145 string-fasta 7.1483+-0.0111 ? 7.1703+-0.0136 ? string-tagcloud 12.8586+-0.0179 ? 12.8755+-0.0137 ? string-unpack-code 21.5617+-0.0292 ? 21.5697+-0.0234 ? string-validate-input 6.1975+-0.0159 ! 6.2321+-0.0129 ! definitely 1.0056x slower <arithmetic> * 6.5171+-0.0030 ^ 6.4958+-0.0025 ^ definitely 1.0033x faster <geometric> 5.3081+-0.0025 ^ 5.2954+-0.0021 ^ definitely 1.0024x faster <harmonic> 4.2882+-0.0038 4.2842+-0.0029 might be 1.0009x faster Running locally... 5200/5200 Generating benchmark report at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_wartooth_20120328_0217_report.txt And raw data at /Volumes/Data/pizlo/Internal/TipOfTree_VariableStream_SunSpider_wartooth_20120328_0217.json Benchmark report for SunSpider on wartooth (MacBookPro8,2). VMs tested: "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc (r112326) "VariableStream" at /Volumes/Data/pizlo/tertiary/OpenSource/WebKitBuild/Release/jsc (r112326) Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. TipOfTree VariableStream 3d-cube 6.7561+-0.1459 ^ 6.4877+-0.0579 ^ definitely 1.0414x faster 3d-morph 6.9644+-0.2915 ? 7.0522+-0.3207 ? might be 1.0126x slower 3d-raytrace 8.6955+-0.0818 8.5678+-0.1092 might be 1.0149x faster access-binary-trees 1.7009+-0.0353 1.6882+-0.0357 access-fannkuch 6.4264+-0.1183 ? 6.6163+-0.2240 ? might be 1.0295x slower access-nbody 3.2247+-0.0731 3.1766+-0.0891 might be 1.0151x faster access-nsieve 3.0165+-0.0859 2.9559+-0.0461 might be 1.0205x faster bitops-3bit-bits-in-byte 1.4353+-0.0391 1.3867+-0.0205 might be 1.0350x faster bitops-bits-in-byte 2.3393+-0.0468 ? 2.3406+-0.0446 ? bitops-bitwise-and 3.7230+-0.1279 3.6903+-0.1132 bitops-nsieve-bits 3.3064+-0.1035 3.2306+-0.0606 might be 1.0235x faster controlflow-recursive 2.0716+-0.0761 2.0475+-0.0671 might be 1.0118x faster crypto-aes 7.6965+-0.1193 7.5897+-0.1252 might be 1.0141x faster crypto-md5 2.9156+-0.0714 2.8466+-0.0213 might be 1.0242x faster crypto-sha1 2.4435+-0.0637 2.4361+-0.0451 date-format-tofte 10.2863+-0.0519 10.2762+-0.0532 date-format-xparb 9.5949+-0.0712 ^ 9.4645+-0.0514 ^ definitely 1.0138x faster math-cordic 3.7434+-0.1035 ? 3.7831+-0.1230 ? might be 1.0106x slower math-partial-sums 6.4855+-0.1586 6.4043+-0.0624 might be 1.0127x faster math-spectral-norm 2.5343+-0.0608 2.5067+-0.0745 might be 1.0110x faster regexp-dna 8.4074+-0.0721 ? 8.5258+-0.1156 ? might be 1.0141x slower string-base64 4.6353+-0.0910 4.5789+-0.0490 might be 1.0123x faster string-fasta 6.4247+-0.0936 6.4110+-0.0406 string-tagcloud 11.2349+-0.0512 ! 11.4188+-0.0542 ! definitely 1.0164x slower string-unpack-code 19.4910+-0.0821 19.4458+-0.0712 string-validate-input 5.9134+-0.0436 ? 6.0783+-0.1735 ? might be 1.0279x slower <arithmetic> * 5.8256+-0.0212 5.8079+-0.0232 might be 1.0030x faster <geometric> 4.7335+-0.0155 4.7062+-0.0166 might be 1.0058x faster <harmonic> 3.8626+-0.0135 ^ 3.8303+-0.0134 ^ definitely 1.0084x faster
Filip Pizlo
Comment 10 2012-03-28 02:38:28 PDT
Created attachment 134245 [details] almost done It works, but still needs the 32-bit version of some of the code.
WebKit Review Bot
Comment 11 2012-03-28 02:41:21 PDT
Attachment 134245 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/CMakeLists.txt', u'S..." exit_code: 1 Source/JavaScriptCore/dfg/DFGVariableEvent.h:235: Place brace on its own line for function definitions. [whitespace/braces] [4] Source/JavaScriptCore/dfg/DFGMinifiedNode.h:58: The parameter name "nodeIndex" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/dfg/DFGMinifiedNode.h:58: The parameter name "node" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/dfg/DFGMinifiedNode.h:65: Place brace on its own line for function definitions. [whitespace/braces] [4] Source/JavaScriptCore/dfg/DFGVariableEventStream.h:52: The parameter name "codeBlock" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/dfg/DFGVariableEventStream.h:52: The parameter name "codeOrigin" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/dfg/DFGVariableEventStream.h:52: The parameter name "graph" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/dfg/DFGVariableEventStream.h:53: The parameter name "operands" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/dfg/DFGVariableEventStream.h:56: The parameter name "event" adds no information, so it should be removed. [readability/parameter_name] [5] Source/JavaScriptCore/dfg/DFGVariableEvent.cpp:31: Alphabetical sorting problem. [build/include_order] [4] Total errors found: 10 in 38 files If any of these errors are false positives, please file a bug against check-webkit-style.
Early Warning System Bot
Comment 12 2012-03-28 02:45:33 PDT
Early Warning System Bot
Comment 13 2012-03-28 02:47:44 PDT
Comment on attachment 134245 [details] almost done Attachment 134245 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/12146948
Filip Pizlo
Comment 14 2012-03-28 03:36:48 PDT
Created attachment 134253 [details] the patch Looks like it's done.
Early Warning System Bot
Comment 15 2012-03-28 03:43:31 PDT
Comment on attachment 134253 [details] the patch Attachment 134253 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/12155053
Early Warning System Bot
Comment 16 2012-03-28 03:43:53 PDT
Filip Pizlo
Comment 17 2012-03-28 03:47:36 PDT
Created attachment 134256 [details] the patch Fixing Qt build
Gavin Barraclough
Comment 18 2012-05-28 01:34:32 PDT
Comment on attachment 134256 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=134256&action=review It seems that it might be cleaner at some point to refactor GenerationInfo, so that rather than being a class representing a single value & having a Vector of them, to instead have GenerationInfo be a container holding info for N variables, and make all methods be passed the virtual register to be operated on. Then the generation info could hold a reference to m_stream as a member, which would better encapsulate the implementation. Apologies again on the delay on reviewing this, r+, please re-test performance before landing. > Source/JavaScriptCore/bytecode/Operands.h:199 > + // Use const-cast because: :-( > Source/JavaScriptCore/dfg/DFGMinifiedNode.h:42 > + switch (type) { Given that this happens around the node generation loop, I wonder if it might be worth trying to make this a faster test than a switch?
Filip Pizlo
Comment 19 2012-07-02 15:23:07 PDT
Created attachment 150482 [details] start of rebasing It probably doesn't even compile yet.
Filip Pizlo
Comment 20 2012-07-02 18:27:01 PDT
(In reply to comment #18) > (From update of attachment 134256 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=134256&action=review > > It seems that it might be cleaner at some point to refactor GenerationInfo, so that rather than being a class representing a single value & having a Vector of them, to instead have GenerationInfo be a container holding info for N variables, and make all methods be passed the virtual register to be operated on. Then the generation info could hold a reference to m_stream as a member, which would better encapsulate the implementation. > > Apologies again on the delay on reviewing this, r+, please re-test performance before landing. The performance still looks good!
Filip Pizlo
Comment 21 2012-07-02 18:28:03 PDT
Note You need to log in before you can comment on or make changes to this bug.