WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
86553
JSGlobalData ScratchBuffers Are Not Visited During Garbage Collection
https://bugs.webkit.org/show_bug.cgi?id=86553
Summary
JSGlobalData ScratchBuffers Are Not Visited During Garbage Collection
Michael Saboff
Reported
2012-05-15 17:44:33 PDT
This is a problem in the DFG and its use of JSGlobalData scratch buffers. In some cases, the array of EncodedJSValues used as arguments for string concat and new Array with initializers may never be spilled to the register file and thus are never visited during garbage collection. The fix is to visit the contents of scratch buffers when we know they are being used the operationStrCat() and operationNewArray(). Although this is easily manifested on debug builds when COLLECT_ON_EVERY_ALLOCATION is turned on, it is likely random in release builds. It is probably more likely to happen during low memory situations. <
rdar://problem/11451334
>
Attachments
Strawman Patch
(33.49 KB, patch)
2012-05-15 18:49 PDT
,
Michael Saboff
fpizlo
: review+
buildbot
: commit-queue-
Details
Formatted Diff
Diff
Updated Patch
(5.93 KB, patch)
2012-05-16 22:49 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Updated Patch with call to clearScratchBuffers()
(6.55 KB, patch)
2012-05-17 09:59 PDT
,
Michael Saboff
ggaren
: review-
Details
Formatted Diff
Diff
Updated patch wit JIT code keeping track of active scratch buffers
(36.33 KB, patch)
2012-05-18 21:57 PDT
,
Michael Saboff
barraclough
: review+
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Michael Saboff
Comment 1
2012-05-15 18:49:18 PDT
Created
attachment 142123
[details]
Strawman Patch
Build Bot
Comment 2
2012-05-15 19:12:25 PDT
Comment on
attachment 142123
[details]
Strawman Patch
Attachment 142123
[details]
did not pass win-ews (win): Output:
http://queues.webkit.org/results/12715118
Early Warning System Bot
Comment 3
2012-05-15 19:16:54 PDT
Comment on
attachment 142123
[details]
Strawman Patch
Attachment 142123
[details]
did not pass qt-wk2-ews (qt): Output:
http://queues.webkit.org/results/12718016
Early Warning System Bot
Comment 4
2012-05-15 19:22:53 PDT
Comment on
attachment 142123
[details]
Strawman Patch
Attachment 142123
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/12701679
Filip Pizlo
Comment 5
2012-05-15 21:16:06 PDT
Comment on
attachment 142123
[details]
Strawman Patch R=me, though I am curious what the performance implications are. In particular, it might be profitable to just scan all scratch buffers during every GC and have VM entry/exit (the dynamic global object thingy) clear them.
Geoffrey Garen
Comment 6
2012-05-16 13:04:26 PDT
> it might be profitable to just scan all scratch buffers during every GC and have VM entry/exit (the dynamic global object thingy) clear them.
I like this approach because it's low risk and future proof against new uses of scratch buffers. Is there any evidence that this would be for performance? For example, what's the largest scratch buffer you've ever observed?
> The arguments to operationStrCat and operationNewArray can be Garbage Collected before they are used
I think this title understates the problem in two ways: (1) OSR exit compiler can also have GC problems; (2) future uses of scratch buffer can also have GC problems.
Michael Saboff
Comment 7
2012-05-16 22:49:11 PDT
Created
attachment 142415
[details]
Updated Patch Following the suggestions from Filip and Geoff.
Michael Saboff
Comment 8
2012-05-17 07:28:23 PDT
Below are performance results for the latest patch. Some notable slowdowns are in v8Real-boyer (10%) and three of the DSP tests (8-25%). My guess is that we are using some large scratch buffers in DSP. I'll investigate their size. If they are large, instead of walking every scratch buffer that may have been active since the last VM entry/exit, we could add code to emit an active count when a scratch buffer is in use and clear the count after it is done being used. This would also eliminate the scratch buffer clearing on VM entry/exit. Benchmark report for SunSpider, V8, V8Real, Kraken, JSBench, JSRegress, and DSP on msaboff-pro (MacPro5,1). VMs tested: "BaseLine" at /Volumes/Data/src/webkit.baseline/WebKitBuild/Release/DumpRenderTree (
r117391
) "VisitScratchBuffers" at /Volumes/Data/src/webkit/WebKitBuild/Release/DumpRenderTree (
r117391
) Collected 24 samples per benchmark/VM, with 4 VM invocations per benchmark. No manual garbage collection invocations were emitted. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. BaseLine VisitScratchBuffers SunSpider: 3d-cube 7.6290+-0.0958 ! 8.1633+-0.3565 ! definitely 1.0700x slower 3d-morph 7.3367+-0.0330 ! 7.4184+-0.0416 ! definitely 1.0111x slower 3d-raytrace 9.9701+-0.2782 9.9671+-0.2734 access-binary-trees 1.8151+-0.0148 ! 1.8436+-0.0117 ! definitely 1.0157x slower access-fannkuch 7.5555+-0.0231 7.5536+-0.0217 access-nbody 3.9557+-0.0414 3.9442+-0.0457 access-nsieve 3.6780+-0.0330 3.6649+-0.0335 bitops-3bit-bits-in-byte 1.4249+-0.0068 1.4239+-0.0088 bitops-bits-in-byte 5.5219+-0.0539 5.5163+-0.0394 bitops-bitwise-and 3.4434+-0.0142 3.4394+-0.0128 bitops-nsieve-bits 3.3294+-0.0075 ? 3.3406+-0.0158 ? controlflow-recursive 2.4646+-0.0142 ? 2.4650+-0.0144 ? crypto-aes 7.8838+-0.0911 7.8696+-0.1144 crypto-md5 3.4821+-0.0455 ? 3.5028+-0.0457 ? crypto-sha1 2.8209+-0.0282 ? 2.8324+-0.0331 ? date-format-tofte 13.4208+-0.8414 13.4053+-0.8448 date-format-xparb 10.9558+-0.5022 ? 11.1908+-0.5194 ? might be 1.0215x slower math-cordic 4.3268+-0.0536 ? 4.3439+-0.0463 ? math-partial-sums 9.1656+-0.0404 9.1190+-0.0239 math-spectral-norm 2.9607+-0.0315 ? 3.0156+-0.0482 ? might be 1.0186x slower regexp-dna 9.8170+-0.0669 ? 9.8840+-0.0616 ? string-base64 4.8861+-0.0822 ? 5.0759+-0.2757 ? might be 1.0388x slower string-fasta 7.8795+-0.2593 7.8227+-0.2496 string-tagcloud 13.3586+-0.2377 13.3419+-0.2498 string-unpack-code 22.6781+-0.6134 ? 23.0348+-0.6210 ? might be 1.0157x slower string-validate-input 8.6317+-0.5664 8.5349+-0.5673 might be 1.0113x faster <arithmetic> * 6.9381+-0.0578 ? 6.9890+-0.0663 ? might be 1.0073x slower <geometric> 5.6227+-0.0282 ? 5.6581+-0.0299 ? might be 1.0063x slower <harmonic> 4.5266+-0.0171 ? 4.5514+-0.0171 ? might be 1.0055x slower BaseLine VisitScratchBuffers V8: crypto 75.7108+-0.4343 75.3083+-0.4152 deltablue 157.0207+-1.2346 ? 159.0475+-1.2714 ? might be 1.0129x slower earley-boyer 93.3553+-0.3135 93.2624+-0.3410 raytrace 55.7297+-1.0792 55.3264+-0.9292 regexp 94.1726+-0.5496 93.4649+-0.3465 richards 143.0130+-2.0701 ? 143.6083+-1.7372 ? splay 118.3060+-12.7935 ? 119.6515+-14.2327 ? might be 1.0114x slower <arithmetic> 105.3297+-1.7095 ? 105.6670+-2.0644 ? might be 1.0032x slower <geometric> * 99.3799+-1.3309 ? 99.4186+-1.6785 ? might be 1.0004x slower <harmonic> 93.5526+-0.9242 93.3542+-1.2568 might be 1.0021x faster BaseLine VisitScratchBuffers V8Real: encrypt 0.42586+-0.00073 ? 0.42648+-0.00043 ? decrypt 7.42316+-0.01392 7.42258+-0.00771 deltablue x2 0.95010+-0.00854 ? 0.95353+-0.00834 ? earley 3.15345+-0.03146 3.14664+-0.03539 boyer 15.67492+-0.05763 ! 17.33746+-0.06626 ! definitely 1.1061x slower raytrace x2 6.89074+-0.05814 6.87976+-0.05811 regexp x2 27.09554+-0.07497 ! 27.39942+-0.07797 ! definitely 1.0112x slower richards x2 0.38468+-0.00376 ? 0.38877+-0.00263 ? might be 1.0106x slower splay x2 0.94327+-0.01582 0.93704+-0.02027 <arithmetic> 7.08615+-0.01235 ! 7.24644+-0.01206 ! definitely 1.0226x slower <geometric> * 2.60012+-0.00677 ! 2.62502+-0.00638 ! definitely 1.0096x slower <harmonic> 1.10599+-0.00420 ? 1.11098+-0.00486 ? might be 1.0045x slower BaseLine VisitScratchBuffers Kraken: ai-astar 814.347+-8.251 ! 830.726+-4.139 ! definitely 1.0201x slower audio-beat-detection 203.984+-1.788 203.743+-1.706 audio-dft 291.639+-0.945 ? 291.780+-1.063 ? audio-fft 120.093+-0.219 119.820+-0.220 audio-oscillator 324.720+-1.567 322.826+-0.415 imaging-darkroom 306.721+-1.923 305.557+-1.111 imaging-desaturate 219.677+-0.267 219.539+-0.195 imaging-gaussian-blur 457.122+-0.206 ? 457.562+-0.420 ? json-parse-financial 66.606+-0.268 ^ 66.051+-0.176 ^ definitely 1.0084x faster json-stringify-tinderbox 84.355+-0.179 ! 85.183+-0.167 ! definitely 1.0098x slower stanford-crypto-aes 88.255+-0.305 ? 89.239+-0.737 ? might be 1.0111x slower stanford-crypto-ccm 94.696+-0.320 ? 95.190+-0.308 ? stanford-crypto-pbkdf2 193.612+-0.549 ? 194.085+-0.735 ? stanford-crypto-sha256-iterative 94.710+-0.246 ? 95.055+-0.240 ? <arithmetic> * 240.038+-0.653 ! 241.168+-0.352 ! definitely 1.0047x slower <geometric> 183.760+-0.226 ? 184.175+-0.210 ? might be 1.0023x slower <harmonic> 146.665+-0.141 ? 146.955+-0.168 ? might be 1.0020x slower BaseLine VisitScratchBuffers JSBench: amazon 17.8750+-0.2266 ? 17.9167+-0.2761 ? facebook 71.3750+-2.5207 71.0833+-2.5607 google 98.1667+-2.3351 ? 100.5417+-3.0627 ? might be 1.0242x slower twitter 52.1667+-0.2384 51.9167+-0.2127 yahoo 22.5000+-0.2157 22.2083+-0.1752 might be 1.0131x faster <arithmetic> * 52.4167+-0.5829 ? 52.7333+-0.6998 ? might be 1.0060x slower <geometric> 42.9592+-0.3285 ? 42.9855+-0.3366 ? might be 1.0006x slower <harmonic> 34.7417+-0.2155 34.6456+-0.2233 might be 1.0028x faster BaseLine VisitScratchBuffers JSRegress: adapt-to-double-divide 72.6218+-0.0794 ? 72.7202+-0.0940 ? aliased-arguments-getbyval 3.6727+-0.1847 ? 3.7001+-0.1750 ? arity-mismatch-inlining 1.2790+-0.0196 1.2759+-0.0161 big-int-mul 29.0281+-0.3496 28.5815+-0.2496 might be 1.0156x faster boolean-test 3.9447+-0.0186 ? 3.9471+-0.0265 ? cast-int-to-double 14.2695+-0.1376 14.1682+-0.0399 cfg-simplify 6.5976+-0.0133 6.5869+-0.0066 cmpeq-obj-to-obj-other 13.9487+-0.4826 ? 14.0535+-0.3919 ? constant-test 27.1813+-0.0656 27.1764+-0.0771 direct-arguments-getbyval 0.7041+-0.0122 0.7021+-0.0083 double-pollution-getbyval 8.7708+-0.0283 8.7586+-0.0112 double-pollution-putbyoffset 4.7199+-0.0368 ? 4.8365+-0.2883 ? might be 1.0247x slower external-arguments-getbyval 4.2635+-0.1943 ? 4.3203+-0.1929 ? might be 1.0133x slower external-arguments-putbyval 6.9140+-0.3132 ? 6.9899+-0.3268 ? might be 1.0110x slower Float32Array-matrix-mult 11.3018+-0.4727 ? 11.6333+-0.5641 ? might be 1.0293x slower fold-double-to-int 33.6938+-0.3232 ? 34.1681+-0.5579 ? might be 1.0141x slower function-test 4.6099+-0.0190 4.6069+-0.0332 inline-arguments-access 3.6253+-0.0201 3.6081+-0.0173 inline-arguments-local-escape 40.7060+-1.8914 40.3615+-1.5219 int-overflow-local 102.5900+-0.0887 ? 102.6643+-0.1530 ? Int16Array-bubble-sort 70.2287+-1.2847 ^ 68.1696+-0.6331 ^ definitely 1.0302x faster Int16Array-load-int-mul 15.8263+-0.0944 15.7996+-0.0961 Int8Array-load 4.7756+-0.0730 4.7599+-0.0352 integer-divide 14.8499+-0.0278 ? 14.9282+-0.0693 ? method-on-number 189.8829+-2.1216 187.3501+-0.8256 might be 1.0135x faster number-test 3.9219+-0.0108 ? 3.9247+-0.0151 ? object-test 4.2437+-0.0220 4.2265+-0.0168 poly-stricteq 91.5346+-0.4327 91.2106+-0.5289 rare-osr-exit-on-local 150.6409+-0.1017 ? 150.9463+-0.4745 ? simple-activation-demo 55.2423+-0.3153 55.0908+-0.1199 slow-convergence 89.9716+-0.3284 ? 90.2389+-0.3525 ? string-hash 14.3514+-0.1324 ? 14.5032+-0.2095 ? might be 1.0106x slower string-test 3.7486+-0.0161 ? 3.8054+-0.0575 ? might be 1.0151x slower tear-off-arguments 3.7719+-0.1858 3.7565+-0.1917 to-int32-boolean 29.2531+-0.3133 28.9949+-0.1031 undefined-test 4.2126+-0.0236 4.1998+-0.0140 <arithmetic> 31.6916+-0.0969 31.5768+-0.0868 might be 1.0036x faster <geometric> * 13.0125+-0.0894 ? 13.0206+-0.0912 ? might be 1.0006x slower <harmonic> 5.7698+-0.0564 ? 5.7752+-0.0515 ? might be 1.0009x slower BaseLine VisitScratchBuffers DSP: filtrr-posterize-tint 48.9578+-0.3852 ! 59.7338+-1.8883 ! definitely 1.2201x slower filtrr-tint-contrast-sat-bright 81.7491+-0.6064 ! 88.5864+-1.7853 ! definitely 1.0836x slower filtrr-tint-sat-adj-contr-mult 93.9438+-0.7122 ? 94.4437+-0.8106 ? filtrr-blur-overlay-sat-contr 231.3795+-0.7343 ? 231.3800+-0.6014 ? filtrr-sat-blur-mult-sharpen-contr 289.7408+-1.9787 ? 291.2437+-1.6147 ? filtrr-sepia-bias 33.5366+-0.4042 ! 41.7406+-0.7265 ! definitely 1.2446x slower route9-vp8 x5 1582.9230+-8.1424 ? 1585.9894+-8.1551 ? <arithmetic> 790.3566+-3.6224 ? 794.2795+-3.7229 ? might be 1.0050x slower <geometric> * 345.5647+-0.9056 ! 362.1289+-2.0187 ! definitely 1.0479x slower <harmonic> 130.8196+-0.4819 ! 149.5164+-2.0252 ! definitely 1.1429x slower BaseLine VisitScratchBuffers All benchmarks: <arithmetic> 128.0916+-0.3731 ? 128.6433+-0.3950 ? might be 1.0043x slower <geometric> 20.0715+-0.0572 ! 20.2264+-0.0621 ! definitely 1.0077x slower <harmonic> 4.5122+-0.0145 ? 4.5312+-0.0178 ? might be 1.0042x slower BaseLine VisitScratchBuffers Geomean of preferred means: <scaled-result> 37.3460+-0.0895 ! 37.7481+-0.1305 ! definitely 1.0108x slower
Geoffrey Garen
Comment 9
2012-05-17 09:14:46 PDT
> My guess is that we are using some large scratch buffers in DSP. I'll investigate their size.
I'd be interested to know the average and max scratch buffer sizes we encounter. That said, it looks like the patch you posted doesn't include any calls to clearScratchBuffers(). Your regression may be due to prolonging object lifetimes indefinitely.
Michael Saboff
Comment 10
2012-05-17 09:59:00 PDT
Created
attachment 142493
[details]
Updated Patch with call to clearScratchBuffers() (In reply to
comment #9
)
> > My guess is that we are using some large scratch buffers in DSP. I'll investigate their size. > > I'd be interested to know the average and max scratch buffer sizes we encounter. > > That said, it looks like the patch you posted doesn't include any calls to clearScratchBuffers(). Your regression may be due to prolonging object lifetimes indefinitely.
Added call to clearScratchBuffer(). Reran perf tests. Things are improved, but there are still regressions in V8Real-boyer (11%), V8Real-regexp (2%) and DSP-filtrr-posterize-tint (15%). I'll instrument to find the avg and max scratch buffer sizes. Benchmark report for SunSpider, V8, V8Real, Kraken, JSBench, JSRegress, and DSP on msaboff-pro (MacPro5,1). VMs tested: "BaseLine" at /Volumes/Data/src/webkit.baseline/WebKitBuild/Release/DumpRenderTree (
r117391
) "VisitScratchBuffers" at /Volumes/Data/src/webkit/WebKitBuild/Release/DumpRenderTree (
r117391
) Collected 24 samples per benchmark/VM, with 4 VM invocations per benchmark. No manual garbage collection invocations were emitted. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. BaseLine VisitScratchBuffers SunSpider: 3d-cube 7.6382+-0.1017 ! 8.1743+-0.3596 ! definitely 1.0702x slower 3d-morph 7.3621+-0.0534 ? 7.3885+-0.0471 ? 3d-raytrace 9.9905+-0.2774 ? 10.0949+-0.3039 ? might be 1.0104x slower access-binary-trees 1.8445+-0.0207 1.8380+-0.0194 access-fannkuch 7.5688+-0.0393 7.5483+-0.0196 access-nbody 3.9595+-0.0532 3.9594+-0.0553 access-nsieve 3.6748+-0.0372 ? 3.6748+-0.0357 ? bitops-3bit-bits-in-byte 1.4279+-0.0109 1.4236+-0.0070 bitops-bits-in-byte 5.5123+-0.0472 ? 5.5168+-0.0481 ? bitops-bitwise-and 3.4382+-0.0144 3.4333+-0.0085 bitops-nsieve-bits 3.3341+-0.0136 ? 3.3558+-0.0215 ? controlflow-recursive 2.4750+-0.0186 2.4709+-0.0192 crypto-aes 7.9060+-0.1131 7.8586+-0.0957 crypto-md5 3.4962+-0.0526 ? 3.5057+-0.0476 ? crypto-sha1 2.8158+-0.0306 ? 2.8340+-0.0284 ? date-format-tofte 13.7028+-0.7998 13.6085+-0.8305 date-format-xparb 11.7095+-0.6318 11.4952+-0.5794 might be 1.0186x faster math-cordic 4.3427+-0.0518 ? 4.3489+-0.0604 ? math-partial-sums 9.2762+-0.0821 ^ 9.1460+-0.0399 ^ definitely 1.0142x faster math-spectral-norm 2.9644+-0.0323 2.9639+-0.0290 regexp-dna 9.8113+-0.0776 ? 9.9496+-0.0802 ? might be 1.0141x slower string-base64 5.7630+-0.4708 5.2252+-0.3278 might be 1.1029x faster string-fasta 7.8887+-0.2574 7.8652+-0.2491 string-tagcloud 13.4021+-0.2315 ? 13.5341+-0.2386 ? string-unpack-code 24.6493+-1.2497 ^ 22.3556+-0.5754 ^ definitely 1.1026x faster string-validate-input 8.6108+-0.5751 ? 8.6282+-0.5595 ? <arithmetic> * 7.0986+-0.1110 7.0076+-0.0584 might be 1.0130x faster <geometric> 5.7031+-0.0543 5.6756+-0.0283 might be 1.0048x faster <harmonic> 4.5681+-0.0276 4.5581+-0.0184 might be 1.0022x faster BaseLine VisitScratchBuffers V8: crypto 75.4699+-0.3748 ? 76.1831+-0.4941 ? deltablue 156.7851+-1.0209 ? 159.0510+-1.5313 ? might be 1.0145x slower earley-boyer 93.4869+-0.3133 93.0661+-0.3407 raytrace 55.4889+-0.9941 55.3468+-0.9577 regexp 94.7245+-0.4325 94.6743+-0.4086 richards 144.5699+-3.7234 ? 145.2562+-1.6123 ? splay 121.8940+-13.8473 111.7071+-10.5208 might be 1.0912x faster <arithmetic> 106.0599+-1.8881 105.0407+-1.4594 might be 1.0097x faster <geometric> * 99.8749+-1.4358 99.0898+-1.2208 might be 1.0079x faster <harmonic> 93.8181+-0.9615 93.2795+-0.9191 might be 1.0058x faster BaseLine VisitScratchBuffers V8Real: encrypt 0.42686+-0.00061 0.42662+-0.00107 decrypt 7.44261+-0.02793 ? 7.44473+-0.01255 ? deltablue x2 0.95299+-0.00888 0.94254+-0.00671 might be 1.0111x faster earley 3.17639+-0.03090 3.15148+-0.03737 boyer 15.62556+-0.07386 ! 17.39136+-0.06645 ! definitely 1.1130x slower raytrace x2 6.93767+-0.08055 ? 6.93848+-0.06828 ? regexp x2 26.97673+-0.10808 ! 27.53988+-0.15392 ! definitely 1.0209x slower richards x2 0.38134+-0.00274 ? 0.38363+-0.00302 ? splay x2 0.93975+-0.01782 ? 0.94127+-0.01919 ? <arithmetic> 7.07488+-0.01734 ! 7.27899+-0.02245 ! definitely 1.0288x slower <geometric> * 2.59907+-0.00637 ! 2.62400+-0.00827 ! definitely 1.0096x slower <harmonic> 1.10272+-0.00378 ? 1.10416+-0.00637 ? might be 1.0013x slower BaseLine VisitScratchBuffers Kraken: ai-astar 825.529+-6.831 ? 827.945+-5.930 ? audio-beat-detection 202.458+-0.854 ? 203.063+-1.056 ? audio-dft 291.652+-0.912 ? 293.150+-0.658 ? audio-fft 119.908+-0.227 ? 120.103+-0.268 ? audio-oscillator 322.027+-0.433 ! 323.395+-0.793 ! definitely 1.0042x slower imaging-darkroom 305.505+-1.329 ? 305.749+-1.708 ? imaging-desaturate 219.772+-0.200 ? 220.013+-0.260 ? imaging-gaussian-blur 457.636+-0.344 457.345+-0.286 json-parse-financial 66.807+-0.260 ^ 66.310+-0.150 ^ definitely 1.0075x faster json-stringify-tinderbox 84.468+-0.233 ! 85.364+-0.248 ! definitely 1.0106x slower stanford-crypto-aes 88.754+-0.378 ^ 88.076+-0.261 ^ definitely 1.0077x faster stanford-crypto-ccm 94.881+-0.394 ? 95.091+-0.473 ? stanford-crypto-pbkdf2 193.877+-0.575 193.384+-0.552 stanford-crypto-sha256-iterative 95.064+-0.291 94.879+-0.218 <arithmetic> * 240.596+-0.473 ? 240.991+-0.466 ? might be 1.0016x slower <geometric> 183.907+-0.171 ? 184.056+-0.181 ? might be 1.0008x slower <harmonic> 146.859+-0.153 146.835+-0.118 might be 1.0002x faster BaseLine VisitScratchBuffers JSBench: amazon 18.0833+-0.2761 18.0417+-0.2635 facebook 72.7917+-2.5760 70.7083+-2.4794 might be 1.0295x faster google 97.3333+-2.3778 ? 99.0833+-2.2167 ? might be 1.0180x slower twitter 51.9167+-0.2127 51.7917+-0.2149 yahoo 22.4583+-0.2149 22.3333+-0.2033 <arithmetic> * 52.5167+-0.5288 52.3917+-0.6895 might be 1.0024x faster <geometric> 43.0929+-0.2808 42.9160+-0.3924 might be 1.0041x faster <harmonic> 34.8956+-0.2225 34.7447+-0.2568 might be 1.0043x faster BaseLine VisitScratchBuffers JSRegress: adapt-to-double-divide 72.7993+-0.1640 72.7789+-0.1399 aliased-arguments-getbyval 3.6778+-0.1797 ? 3.6864+-0.1811 ? arity-mismatch-inlining 1.2527+-0.0226 1.2481+-0.0207 big-int-mul 29.1196+-0.3597 ? 29.4550+-0.4643 ? might be 1.0115x slower boolean-test 3.9331+-0.0094 3.9291+-0.0117 cast-int-to-double 14.1323+-0.0103 ? 14.1396+-0.0198 ? cfg-simplify 6.6048+-0.0230 6.5899+-0.0067 cmpeq-obj-to-obj-other 13.9148+-0.4460 13.8551+-0.4528 constant-test 27.1749+-0.0926 27.1329+-0.0575 direct-arguments-getbyval 0.6942+-0.0080 ? 0.6994+-0.0094 ? double-pollution-getbyval 8.7667+-0.0126 ? 8.8327+-0.0552 ? double-pollution-putbyoffset 4.7326+-0.0408 4.6795+-0.0406 might be 1.0113x faster external-arguments-getbyval 4.3451+-0.1865 4.3442+-0.1900 external-arguments-putbyval 7.0146+-0.3043 6.9897+-0.3226 Float32Array-matrix-mult 11.3433+-0.4849 11.3403+-0.5515 fold-double-to-int 33.6169+-0.0770 ? 34.0122+-0.3748 ? might be 1.0118x slower function-test 4.5889+-0.0218 4.5652+-0.0292 inline-arguments-access 3.6047+-0.0151 ? 3.6108+-0.0218 ? inline-arguments-local-escape 39.7807+-1.5892 39.5610+-1.2593 int-overflow-local 102.6140+-0.1205 ? 102.6674+-0.1153 ? Int16Array-bubble-sort 70.4634+-1.3327 69.1422+-1.1507 might be 1.0191x faster Int16Array-load-int-mul 15.9369+-0.1157 15.8395+-0.0913 Int8Array-load 4.7754+-0.0492 4.7634+-0.0626 integer-divide 14.8176+-0.0255 ? 14.8602+-0.0360 ? method-on-number 190.9984+-2.0137 ? 192.6862+-2.4090 ? number-test 3.9347+-0.0152 3.9147+-0.0152 object-test 4.2352+-0.0257 4.2327+-0.0321 poly-stricteq 91.2725+-0.3988 91.1868+-0.3670 rare-osr-exit-on-local 151.3971+-0.2817 ^ 150.8141+-0.1539 ^ definitely 1.0039x faster simple-activation-demo 55.1085+-0.0983 55.0981+-0.1281 slow-convergence 90.1628+-0.3554 90.0917+-0.2935 string-hash 14.6232+-0.2043 14.4087+-0.1655 might be 1.0149x faster string-test 3.7646+-0.0249 3.7472+-0.0274 tear-off-arguments 3.8142+-0.1890 3.7598+-0.1781 might be 1.0145x faster to-int32-boolean 29.2344+-0.2806 29.0884+-0.0999 undefined-test 4.2088+-0.0204 ? 4.2117+-0.0232 ? <arithmetic> 31.7350+-0.0858 31.7212+-0.1114 might be 1.0004x faster <geometric> * 13.0177+-0.0826 12.9954+-0.0859 might be 1.0017x faster <harmonic> 5.7446+-0.0439 5.7415+-0.0455 might be 1.0005x faster BaseLine VisitScratchBuffers DSP: filtrr-posterize-tint 49.0858+-0.4281 ! 56.3526+-1.3970 ! definitely 1.1480x slower filtrr-tint-contrast-sat-bright 81.1976+-0.5604 ? 81.4146+-0.5446 ? filtrr-tint-sat-adj-contr-mult 93.5226+-0.6975 ? 94.5968+-0.8942 ? might be 1.0115x slower filtrr-blur-overlay-sat-contr 231.9381+-0.7855 ^ 230.1953+-0.5450 ^ definitely 1.0076x faster filtrr-sat-blur-mult-sharpen-contr 289.8628+-1.8513 289.6081+-1.8766 filtrr-sepia-bias 33.5683+-0.3988 33.5252+-0.4178 route9-vp8 x5 1599.6274+-7.4486 ? 1605.2971+-9.5441 ? <arithmetic> 797.9375+-3.3477 ? 801.1071+-4.2938 ? might be 1.0040x slower <geometric> * 347.0646+-0.8033 ! 352.1084+-1.3685 ! definitely 1.0145x slower <harmonic> 130.8101+-0.4321 ! 135.1173+-0.9201 ! definitely 1.0329x slower BaseLine VisitScratchBuffers All benchmarks: <arithmetic> 128.9976+-0.3469 ? 129.2863+-0.4662 ? might be 1.0022x slower <geometric> 20.1577+-0.0748 ? 20.1655+-0.0590 ? might be 1.0004x slower <harmonic> 4.5103+-0.0171 ? 4.5108+-0.0214 ? might be 1.0001x slower BaseLine VisitScratchBuffers Geomean of preferred means: <scaled-result> 37.5389+-0.1292 ? 37.5443+-0.1006 ? might be 1.0001x slower
Michael Saboff
Comment 11
2012-05-17 13:52:28 PDT
(In reply to
comment #9
)
> > My guess is that we are using some large scratch buffers in DSP. I'll investigate their size. > > I'd be interested to know the average and max scratch buffer sizes we encounter. > > That said, it looks like the patch you posted doesn't include any calls to clearScratchBuffers(). Your regression may be due to prolonging object lifetimes indefinitely.
Running the perf tests we encounter a max size of 120 which may be the test framework. For DSP, the only test that seems to use scratch buffers is VP8 which uses 75 sized 16 and 12 size 24 (average 17). I've spent some time trying to figure out why this change impacts DSP-filtrr-posterize-tint so much, but haven't reached any conclusions. Sizes running V8Real are 8, 16, 24, 40 and 88 and the average is 32. Boyer uses 12 instances of 16 byte buffers. Remember these sizes are bytes and that we often allocate twice the number of bytes requested.
Geoffrey Garen
Comment 12
2012-05-18 16:29:46 PDT
Comment on
attachment 142493
[details]
Updated Patch with call to clearScratchBuffers() r- since this is a regression.
Michael Saboff
Comment 13
2012-05-18 21:57:34 PDT
Created
attachment 142849
[details]
Updated patch wit JIT code keeping track of active scratch buffers Updated the patch to eliminate the large regression. The JIT'ed code now writes out the active scratch size before calling out to C++ code and then clears after the scratch buffer is done being used. Below are the updated performance results. The only tests that shows any kind of slowdown is V8Real-richards at 1.8% and JSRegress string-test at 1.2% and undefined-test at 1%. Benchmark report for SunSpider, V8, V8Real, Kraken, JSBench, JSRegress, and DSP on msaboff-pro (MacPro5,1). VMs tested: "BaseLine" at /Volumes/Data/src/webkit.baseline/WebKitBuild/Release/DumpRenderTree (
r117391
) "VisitScratchBuffers" at /Volumes/Data/src/webkit/WebKitBuild/Release/DumpRenderTree (
r117391
) Collected 24 samples per benchmark/VM, with 4 VM invocations per benchmark. No manual garbage collection invocations were emitted. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. BaseLine VisitScratchBuffers SunSpider: 3d-cube 7.6129+-0.0933 ? 7.6663+-0.0854 ? 3d-morph 7.4009+-0.0485 7.3647+-0.0634 3d-raytrace 9.9724+-0.2799 ? 10.0044+-0.2792 ? access-binary-trees 1.8306+-0.0218 1.8300+-0.0139 access-fannkuch 7.5732+-0.0296 ? 7.5741+-0.0188 ? access-nbody 3.9349+-0.0397 ? 3.9701+-0.0474 ? access-nsieve 3.6768+-0.0277 3.6712+-0.0311 bitops-3bit-bits-in-byte 1.4424+-0.0165 1.4295+-0.0111 bitops-bits-in-byte 5.5075+-0.0504 ? 5.5182+-0.0306 ? bitops-bitwise-and 3.4327+-0.0082 ? 3.4491+-0.0210 ? bitops-nsieve-bits 3.3273+-0.0090 ? 3.3510+-0.0241 ? controlflow-recursive 2.4683+-0.0155 2.4647+-0.0142 crypto-aes 7.9243+-0.1127 7.9035+-0.0971 crypto-md5 3.4875+-0.0452 ? 3.4883+-0.0473 ? crypto-sha1 2.8354+-0.0325 2.8082+-0.0302 date-format-tofte 13.1882+-0.8200 ? 13.3276+-0.8214 ? might be 1.0106x slower date-format-xparb 10.9723+-0.5200 ? 11.2165+-0.5118 ? might be 1.0223x slower math-cordic 4.3376+-0.0520 4.3276+-0.0502 math-partial-sums 9.1740+-0.0313 ? 9.2795+-0.0910 ? might be 1.0115x slower math-spectral-norm 2.9616+-0.0325 ? 2.9657+-0.0319 ? regexp-dna 9.8527+-0.0953 ? 9.8968+-0.0972 ? string-base64 4.8435+-0.0392 ? 4.9458+-0.2122 ? might be 1.0211x slower string-fasta 7.8418+-0.2518 7.8418+-0.2455 string-tagcloud 13.3686+-0.2192 13.2846+-0.2230 string-unpack-code 22.7043+-0.5997 ? 22.7916+-0.6311 ? string-validate-input 8.5788+-0.5542 ? 8.6677+-0.5762 ? might be 1.0104x slower <arithmetic> * 6.9327+-0.0485 ? 6.9630+-0.0495 ? might be 1.0044x slower <geometric> 5.6237+-0.0257 ? 5.6402+-0.0250 ? might be 1.0029x slower <harmonic> 4.5358+-0.0197 ? 4.5390+-0.0182 ? might be 1.0007x slower BaseLine VisitScratchBuffers V8: crypto 75.6202+-0.4168 ? 75.6938+-0.4005 ? deltablue 157.6811+-1.3157 ? 157.9404+-1.6757 ? earley-boyer 93.4031+-0.2822 ? 93.4446+-0.2714 ? raytrace 55.7306+-0.9583 ? 55.9167+-1.1695 ? regexp 93.8529+-0.3721 ^ 93.0471+-0.2650 ^ definitely 1.0087x faster richards 142.9939+-1.9374 ? 143.9461+-2.3855 ? splay 115.5968+-11.9257 110.2246+-12.3616 might be 1.0487x faster <arithmetic> 104.9827+-1.6401 104.3162+-1.6959 might be 1.0064x faster <geometric> * 99.0779+-1.3428 98.4330+-1.3742 might be 1.0066x faster <harmonic> 93.3028+-0.9826 92.7727+-1.0176 might be 1.0057x faster BaseLine VisitScratchBuffers V8Real: encrypt 0.42564+-0.00038 ! 0.42761+-0.00063 ! definitely 1.0046x slower decrypt 7.42067+-0.01653 ? 7.44325+-0.01692 ? deltablue x2 0.94420+-0.00600 0.94255+-0.00499 earley 3.16078+-0.03656 3.13649+-0.03207 boyer 15.72871+-0.05014 ? 15.77503+-0.04728 ? raytrace x2 6.94780+-0.05752 6.89380+-0.06171 regexp x2 26.95430+-0.04281 ? 27.00633+-0.08617 ? richards x2 0.37960+-0.00234 ! 0.38654+-0.00326 ! definitely 1.0183x slower splay x2 0.95467+-0.01488 0.93567+-0.01685 might be 1.0203x faster <arithmetic> 7.07835+-0.01316 ? 7.07944+-0.01412 ? might be 1.0002x slower <geometric> * 2.59949+-0.00684 2.59637+-0.00808 might be 1.0012x faster <harmonic> 1.10135+-0.00424 ? 1.10605+-0.00511 ? might be 1.0043x slower BaseLine VisitScratchBuffers Kraken: ai-astar 814.209+-7.905 812.194+-8.275 audio-beat-detection 204.129+-1.316 ? 204.146+-1.351 ? audio-dft 291.572+-0.967 290.128+-2.574 audio-fft 120.070+-0.146 ? 120.132+-0.196 ? audio-oscillator 322.365+-0.300 ! 325.176+-1.396 ! definitely 1.0087x slower imaging-darkroom 304.594+-1.003 304.585+-1.207 imaging-desaturate 219.444+-0.127 ? 219.604+-0.156 ? imaging-gaussian-blur 457.373+-0.472 ? 457.991+-0.488 ? json-parse-financial 66.846+-0.273 ^ 66.418+-0.143 ^ definitely 1.0064x faster json-stringify-tinderbox 84.163+-0.164 ! 84.717+-0.190 ! definitely 1.0066x slower stanford-crypto-aes 89.333+-0.994 ^ 88.113+-0.222 ^ definitely 1.0138x faster stanford-crypto-ccm 94.998+-0.292 94.973+-0.240 stanford-crypto-pbkdf2 193.628+-0.655 ? 194.207+-0.939 ? stanford-crypto-sha256-iterative 94.502+-0.175 ? 94.640+-0.163 ? <arithmetic> * 239.802+-0.608 239.787+-0.577 might be 1.0001x faster <geometric> 183.758+-0.217 183.686+-0.169 might be 1.0004x faster <harmonic> 146.853+-0.210 146.664+-0.096 might be 1.0013x faster BaseLine VisitScratchBuffers JSBench: amazon 17.8333+-0.2033 ? 18.0000+-0.2490 ? facebook 69.8750+-2.1253 ? 71.0833+-2.6028 ? might be 1.0173x slower google 98.5000+-2.2926 97.2917+-2.3411 might be 1.0124x faster twitter 52.1250+-0.2266 51.8333+-0.2690 yahoo 22.4583+-0.2149 22.2917+-0.1961 <arithmetic> * 52.1583+-0.5224 52.1000+-0.5728 might be 1.0011x faster <geometric> 42.7707+-0.2901 42.7705+-0.3350 might be 1.0000x faster <harmonic> 34.6295+-0.1954 ? 34.6683+-0.2322 ? might be 1.0011x slower BaseLine VisitScratchBuffers JSRegress: adapt-to-double-divide 72.6620+-0.0852 ? 72.7051+-0.1321 ? aliased-arguments-getbyval 3.6815+-0.1802 3.6746+-0.1908 arity-mismatch-inlining 1.3168+-0.0312 1.2890+-0.0094 might be 1.0215x faster big-int-mul 28.5101+-0.3113 28.4635+-0.1875 boolean-test 3.9266+-0.0074 ? 3.9326+-0.0151 ? cast-int-to-double 14.1526+-0.0168 ? 14.1607+-0.0213 ? cfg-simplify 6.5957+-0.0205 ? 6.6029+-0.0167 ? cmpeq-obj-to-obj-other 13.9139+-0.4605 ? 14.0145+-0.4557 ? constant-test 27.1168+-0.0414 ? 27.2107+-0.0838 ? direct-arguments-getbyval 0.7011+-0.0105 ? 0.7019+-0.0123 ? double-pollution-getbyval 8.7756+-0.0274 ? 8.7777+-0.0186 ? double-pollution-putbyoffset 4.6870+-0.0354 4.6648+-0.0408 external-arguments-getbyval 4.2952+-0.1919 4.2653+-0.1955 external-arguments-putbyval 7.0715+-0.3312 ? 7.0967+-0.3327 ? Float32Array-matrix-mult 11.3192+-0.5077 ? 11.5432+-0.5367 ? might be 1.0198x slower fold-double-to-int 34.2063+-0.5895 33.8435+-0.2564 might be 1.0107x faster function-test 4.6158+-0.0286 4.5848+-0.0172 inline-arguments-access 3.6107+-0.0225 ? 3.6116+-0.0190 ? inline-arguments-local-escape 39.4209+-1.4598 ? 40.0287+-1.4872 ? might be 1.0154x slower int-overflow-local 102.6761+-0.1919 ? 102.6830+-0.1260 ? Int16Array-bubble-sort 69.0327+-0.6773 68.6075+-0.9430 Int16Array-load-int-mul 15.8174+-0.0875 ? 15.8965+-0.1237 ? Int8Array-load 4.7640+-0.0325 4.7464+-0.0637 integer-divide 15.0545+-0.3723 14.8291+-0.0315 might be 1.0152x faster method-on-number 187.5714+-1.4717 ? 191.3869+-2.8458 ? might be 1.0203x slower number-test 3.9351+-0.0177 ? 3.9598+-0.0108 ? object-test 4.2206+-0.0152 4.1962+-0.0182 poly-stricteq 91.0462+-0.4125 ? 91.1090+-0.4208 ? rare-osr-exit-on-local 150.9123+-0.1566 150.6968+-0.0943 simple-activation-demo 55.2887+-0.3349 55.2096+-0.1918 slow-convergence 90.2112+-0.3039 90.1522+-0.3118 string-hash 14.5765+-0.2147 ? 14.7791+-0.9469 ? might be 1.0139x slower string-test 3.7378+-0.0119 ! 3.7830+-0.0133 ! definitely 1.0121x slower tear-off-arguments 3.8165+-0.1831 3.8063+-0.1918 to-int32-boolean 29.3838+-0.2678 29.0918+-0.0978 might be 1.0100x faster undefined-test 4.2109+-0.0201 ! 4.2541+-0.0155 ! definitely 1.0102x slower <arithmetic> 31.5788+-0.0842 ? 31.6766+-0.1305 ? might be 1.0031x slower <geometric> * 13.0148+-0.0891 ? 13.0203+-0.0916 ? might be 1.0004x slower <harmonic> 5.7876+-0.0518 5.7763+-0.0484 might be 1.0020x faster BaseLine VisitScratchBuffers DSP: filtrr-posterize-tint 49.0644+-0.3898 ? 49.1086+-0.4294 ? filtrr-tint-contrast-sat-bright 81.3190+-0.6132 ? 81.5211+-0.5791 ? filtrr-tint-sat-adj-contr-mult 93.7695+-0.7597 93.6671+-0.6576 filtrr-blur-overlay-sat-contr 231.7627+-0.7890 ? 232.0110+-0.8206 ? filtrr-sat-blur-mult-sharpen-contr 288.6334+-1.8254 288.5511+-1.6429 filtrr-sepia-bias 33.4302+-0.3968 ? 33.5880+-0.4207 ? route9-vp8 x5 1587.7897+-7.5654 ? 1589.0287+-6.5103 ? <arithmetic> 792.4480+-3.3264 ? 793.0537+-2.9408 ? might be 1.0008x slower <geometric> * 345.7214+-0.6517 ? 346.0915+-0.6747 ? might be 1.0011x slower <harmonic> 130.6126+-0.4003 ? 130.8947+-0.4082 ? might be 1.0022x slower BaseLine VisitScratchBuffers All benchmarks: <arithmetic> 128.1947+-0.3152 ? 128.2463+-0.3003 ? might be 1.0004x slower <geometric> 20.0661+-0.0589 ? 20.0724+-0.0659 ? might be 1.0003x slower <harmonic> 4.5082+-0.0165 ? 4.5165+-0.0190 ? might be 1.0018x slower BaseLine VisitScratchBuffers Geomean of preferred means: <scaled-result> 37.2970+-0.1094 37.2804+-0.1215 might be 1.0004x faster
Michael Saboff
Comment 14
2012-05-20 22:42:41 PDT
Committed
r117729
: <
http://trac.webkit.org/changeset/117729
>
Geoffrey Garen
Comment 15
2012-05-21 10:56:39 PDT
Comment on
attachment 142849
[details]
Updated patch wit JIT code keeping track of active scratch buffers View in context:
https://bugs.webkit.org/attachment.cgi?id=142849&action=review
> Source/JavaScriptCore/runtime/JSGlobalData.h:134 > + ScratchBuffer() > + : m_activeLength(0) > + { > + }
It's deceptive to have a constructor function that, by design, is never called. If someone later adds a field to this struct, they'll be very surprised to learn that it never gets initialized, and then they'll have the chore of searching globally for all calls to fastMalloc that are later used to produce a ScratchBuffer, and fixing them up. Typically, we clarify code like this by giving the class a "create" function that encapsulates allocation size and invokes placement new to create the object. This preserves the C++ invariant that constructors get called to initialize objects.
Michael Saboff
Comment 16
2012-05-21 13:22:05 PDT
(In reply to
comment #15
)
> (From update of
attachment 142849
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=142849&action=review
> > > Source/JavaScriptCore/runtime/JSGlobalData.h:134 > > + ScratchBuffer() > > + : m_activeLength(0) > > + { > > + } > > It's deceptive to have a constructor function that, by design, is never called. If someone later adds a field to this struct, they'll be very surprised to learn that it never gets initialized, and then they'll have the chore of searching globally for all calls to fastMalloc that are later used to produce a ScratchBuffer, and fixing them up. > > Typically, we clarify code like this by giving the class a "create" function that encapsulates allocation size and invokes placement new to create the object. This preserves the C++ invariant that constructors get called to initialize objects.
I have a change that fixes this along with another minor cleanup (
https://bugs.webkit.org/show_bug.cgi?id=87027
). I can merge them together or create a new defect. You have a preference?
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug