WebKit Bugzilla
Bug 73187 - Don't try to optimize huge code blocks
Status: RESOLVED FIXED
https://bugs.webkit.org/show_bug.cgi?id=73187
Filip Pizlo
Reported
2011-11-27 22:33:44 PST
Huge code blocks are expensive to optimize. It's probably unwise for a production VM to try to do full optimizations on code blocks that are gigantic, since in the worst case this is probably more expensive than not doing any optimizations. As well, giant code blocks are much more likely to be initializers of some kind rather than containing hot code.
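As a rough illustration of the kind of gate being proposed, here is a minimal C++ sketch. The constant, struct, and function names are hypothetical, chosen only for this example, and are not JavaScriptCore's actual API.

#include <cstddef>

// Hypothetical size limit; the real value is a tuning knob.
constexpr std::size_t maximumOptimizationCandidateInstructionCount = 10000;

struct CodeBlockInfo {
    std::size_t instructionCount; // number of bytecode instructions in the block
};

// Huge code blocks are likely one-shot initializers; optimizing them can
// cost more than it ever pays back, so skip them entirely.
bool isEligibleForOptimization(const CodeBlockInfo& block)
{
    return block.instructionCount <= maximumOptimizationCandidateInstructionCount;
}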
Attachments
the patch (5.15 KB, patch)
2011-11-27 22:37 PST, Filip Pizlo
no flags
Filip Pizlo
Comment 1
2011-11-27 22:37:01 PST
Created attachment 116698 [details]
the patch

Benchmark report for SunSpider, V8, and Kraken on bigmac.local (MacPro5,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc (r101220)
"LimitOpt" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc (r101220)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                              TipOfTree            LimitOpt
SunSpider:
   3d-cube                    7.5037+-0.0272  ^    7.3407+-0.0213  ^    definitely 1.0222x faster
   3d-morph                   8.4692+-0.1285  ?    8.5893+-0.1530  ?    might be 1.0142x slower
   3d-raytrace                7.7017+-0.0510  ?    7.7053+-0.0690  ?
   access-binary-trees        1.5935+-0.0131       1.5916+-0.0048
   access-fannkuch            7.5645+-0.0095       7.5612+-0.0147
   access-nbody               4.1802+-0.0087  ^    4.1666+-0.0041  ^    definitely 1.0033x faster
   access-nsieve              3.1486+-0.0423  ?    3.1523+-0.0479  ?
   bitops-3bit-bits-in-byte   1.2422+-0.0107       1.2381+-0.0138
   bitops-bits-in-byte        4.9074+-0.0115  ?    4.9102+-0.0131  ?
   bitops-bitwise-and         3.3056+-0.0353       3.2914+-0.0256
   bitops-nsieve-bits         5.6562+-0.0365  ?    5.6575+-0.0476  ?
   controlflow-recursive      2.2977+-0.0150  ?    2.2978+-0.0269  ?
   crypto-aes                 7.1723+-0.0557  ?    7.1838+-0.0519  ?
   crypto-md5                 2.4878+-0.0102  ^    2.4387+-0.0237  ^    definitely 1.0201x faster
   crypto-sha1                2.1636+-0.0141  ?    2.1671+-0.0214  ?
   date-format-tofte          10.8586+-0.2214      10.6046+-0.0496      might be 1.0240x faster
   date-format-xparb          10.9635+-0.1876 ^    10.2095+-0.0681 ^    definitely 1.0739x faster
   math-cordic                7.1197+-0.0179  ?    7.1384+-0.0156  ?
   math-partial-sums          10.4507+-0.0306      10.4288+-0.0191
   math-spectral-norm         2.6073+-0.0112       2.5990+-0.0083
   regexp-dna                 12.9991+-0.0618      12.8863+-0.0823
   string-base64              3.9306+-0.0149       3.9185+-0.0141
   string-fasta               7.3658+-0.0145  ^    7.3088+-0.0243  ^    definitely 1.0078x faster
   string-tagcloud            12.4449+-0.0504 ?    12.5589+-0.0641 ?
   string-unpack-code         22.3649+-0.0975      22.3208+-0.0598
   string-validate-input      5.7198+-0.0597  ^    5.5566+-0.0250  ^    definitely 1.0294x faster

   <arithmetic> *             6.7777+-0.0214  ^    6.7239+-0.0183  ^    definitely 1.0080x faster
   <geometric>                5.3966+-0.0156  ^    5.3608+-0.0174  ^    definitely 1.0067x faster
   <harmonic>                 4.1967+-0.0126       4.1762+-0.0165       might be 1.0049x faster

                              TipOfTree            LimitOpt
V8:
   crypto                     77.4626+-0.2908      77.3180+-0.2790
   deltablue                  171.2997+-2.4494     167.5339+-1.3757     might be 1.0225x faster
   earley-boyer               104.2899+-0.9817     104.0024+-0.6383
   raytrace                   62.5229+-0.4979 ^    57.5952+-0.2964 ^    definitely 1.0856x faster
   regexp                     123.4270+-0.9859 ?   123.9464+-1.6153 ?
   richards                   140.1639+-0.5766     139.1292+-0.9662
   splay                      90.4572+-0.6331 ?    90.6517+-1.3748 ?

   <arithmetic>               109.9462+-0.4949 ^   108.5967+-0.5483 ^   definitely 1.0124x faster
   <geometric> *              104.4177+-0.3725 ^   102.7886+-0.4385 ^   definitely 1.0158x faster
   <harmonic>                 99.1082+-0.3006 ^    96.9826+-0.3248 ^    definitely 1.0219x faster

                              TipOfTree            LimitOpt
Kraken:
   ai-astar                   829.0711+-0.5821 ^   809.9338+-13.6241 ^  definitely 1.0236x faster
   audio-beat-detection       205.4648+-1.0089     205.3014+-0.8063
   audio-dft                  262.9105+-1.7574 ?   263.5134+-2.3609 ?
   audio-fft                  132.8605+-0.1003     132.7982+-0.1085
   audio-oscillator           280.4854+-5.9853 ?   281.1293+-7.0148 ?
   imaging-darkroom           332.9700+-5.1986 ?   335.2299+-5.0319 ?
   imaging-desaturate         238.7119+-0.0337     238.6970+-0.1261
   imaging-gaussian-blur      620.3764+-0.2415 ?   620.4309+-0.1187 ?
   json-parse-financial       73.0484+-0.5359      72.6107+-0.1585
   json-stringify-tinderbox   86.0346+-0.3021      86.0212+-0.1926
   stanford-crypto-aes        120.8148+-1.2424     119.9662+-1.8868
   stanford-crypto-ccm        119.0706+-1.1233 ?   119.4359+-1.7183 ?
   stanford-crypto-pbkdf2     233.7058+-1.1453 ?   237.4952+-5.3935 ?   might be 1.0162x slower
   stanford-crypto-sha256-iterative  97.3761+-0.7750  96.9643+-0.1659

   <arithmetic> *             259.4929+-0.5239     258.5377+-0.9804     might be 1.0037x faster
   <geometric>                200.4537+-0.4679     200.2741+-0.8760     might be 1.0009x faster
   <harmonic>                 162.1835+-0.3705     162.0018+-0.7378     might be 1.0011x faster

                              TipOfTree            LimitOpt
All benchmarks:
   <arithmetic>               97.4201+-0.1913 ^    96.9048+-0.3238 ^    definitely 1.0053x faster
   <geometric>                24.6250+-0.0589 ^    24.4704+-0.0651 ^    definitely 1.0063x faster
   <harmonic>                 7.3988+-0.0220       7.3617+-0.0286       might be 1.0050x faster

                              TipOfTree            LimitOpt
Geomean of preferred means:
   <scaled-result>            56.8405+-0.1219 ^    56.3239+-0.1338 ^    definitely 1.0092x faster
Filip Pizlo
Comment 2
2011-11-27 22:38:10 PST
<rdar://problem/10489464>
Oliver Hunt
Comment 3
2011-11-28 08:48:36 PST
Comment on attachment 116698 [details]
the patch

It seems like we would want to be able to separate out eval vs. prog vs. function code -- it makes more sense (to me) that we'd be willing to optimize a bigger function than eval or prog code. Also, how does this affect OSR of loops? E.g., does a loop have to be relatively small to benefit? What happens on things like http://nerget.com/fluidSim and http://nerget.com/compression/ -- these have a few relatively large core loops.
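One way to picture the separation being asked about here is a per-code-type limit, sketched below in C++. The enum, function, and numbers are purely illustrative assumptions; the patch as posted deliberately uses a single setting for all code types.

#include <cstddef>

// Hypothetical per-code-type size limits, not actual JSC options.
enum class CodeType { Global, Eval, Function };

std::size_t optimizationSizeLimitFor(CodeType type)
{
    switch (type) {
    case CodeType::Global:
        return 1000;  // program code: often run-once initializers
    case CodeType::Eval:
        return 1000;  // eval code: rarely worth heavy optimization
    case CodeType::Function:
        return 10000; // functions: worth more patience before giving up
    }
    return 0; // unreachable; keeps compilers happy
}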
Filip Pizlo
Comment 4
2011-11-28 09:18:11 PST
(In reply to comment #3)
> (From update of attachment 116698 [details])
> It seems like we would want to be able to separate out eval vs. prog vs. function code -- it makes more sense (to me) that we'd be willing to optimize a bigger function than eval or prog code.

That does make sense. But for the purposes of tuning this number for now, I figured that having only one setting is better than having four. Bringing back the four settings would be easy if we found that it would be useful.

> Also, how does this affect OSR of loops? E.g., does a loop have to be relatively small to benefit? What happens on things like http://nerget.com/fluidSim and http://nerget.com/compression/ -- these have a few relatively large core loops.
I'll have a look.
Geoffrey Garen
Comment 5
2011-11-28 11:14:35 PST
To reduce the risk of missing an optimization opportunity in a huge code block, perhaps the huge code block heuristic could be a multiplier instead of an absolute number. E.g.:

size_t iterationsUntilOptimizingCompile = N;
size_t iterationsUntilOptimizingCompileInHugeCodeBlock = N * M;
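A compilable sketch of that multiplier idea follows. The constants N, M, and the size cutoff are made-up illustrative values, not actual JavaScriptCore tuning numbers.

#include <cstddef>

constexpr std::size_t executionCountBeforeOptimizing = 1000;  // N
constexpr std::size_t hugeCodeBlockPenalty = 16;              // M
constexpr std::size_t hugeCodeBlockInstructionCount = 10000;

// Under the multiplier scheme, a huge code block is not barred from the
// optimizing JIT; it just has to prove itself for M times longer.
std::size_t iterationsUntilOptimizingCompile(std::size_t instructionCount)
{
    if (instructionCount > hugeCodeBlockInstructionCount)
        return executionCountBeforeOptimizing * hugeCodeBlockPenalty;
    return executionCountBeforeOptimizing;
}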
Filip Pizlo
Comment 6
2011-11-28 12:45:24 PST
(In reply to comment #5)
> To reduce the risk of missing an optimization opportunity in a huge code block, perhaps the huge code block heuristic could be a multiplier instead of an absolute number. E.g.:
>
> size_t iterationsUntilOptimizingCompile = N;
> size_t iterationsUntilOptimizingCompileInHugeCodeBlock = N * M;

I was thinking about this as well. But:

1) We still want to have a size at which we will simply not compile at all. Even before my CFA and propagation stuff, the DFG already had O(n^2) memory usage, since it needs to track all variables at all basic blocks. It's pretty easy to write a program that will cause the DFG to run out of memory.

2) Delaying optimization means that we still eat profiling overhead.

Taking these two things together, I think that at least for now, until we have more sophisticated heuristics, we want to just have a threshold at which we disable any optimization for the code block, including removing profiling.
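A sketch of that policy, under the same hypothetical names as the earlier examples: one hard size limit past which the code block is pinned to the baseline tier and value profiling is skipped entirely. The enum and function here are illustrative assumptions, not the shape of the actual patch.

#include <cstddef>

constexpr std::size_t maximumOptimizationCandidateInstructionCount = 10000;

enum class TierPolicy {
    BaselineOnlyWithoutProfiling, // never invoke the DFG, skip value profiling
    ProfileAndMaybeOptimize       // normal tier-up path
};

TierPolicy tierPolicyFor(std::size_t instructionCount)
{
    // Past the hard limit the DFG will never be invoked, so collecting value
    // profiles for the block would be pure overhead; drop profiling as well.
    if (instructionCount > maximumOptimizationCandidateInstructionCount)
        return TierPolicy::BaselineOnlyWithoutProfiling;
    return TierPolicy::ProfileAndMaybeOptimize;
}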
Filip Pizlo
Comment 7
2011-11-28 14:44:53 PST
(In reply to comment #4)
> (In reply to comment #3)
> > (From update of attachment 116698 [details])
> > It seems like we would want to be able to separate out eval vs. prog vs. function code -- it makes more sense (to me) that we'd be willing to optimize a bigger function than eval or prog code.
>
> That does make sense. But for the purposes of tuning this number for now, I figured that having only one setting is better than having four. Bringing back the four settings would be easy if we found that it would be useful.
>
> > Also, how does this affect OSR of loops? E.g., does a loop have to be relatively small to benefit? What happens on things like http://nerget.com/fluidSim and http://nerget.com/compression/ -- these have a few relatively large core loops.
>
> I'll have a look.
No performance difference.
Filip Pizlo
Comment 8
2011-11-28 15:08:12 PST
Landed in
http://trac.webkit.org/changeset/101291
Filip Pizlo
Comment 9
2011-11-28 15:08:23 PST
Comment on attachment 116698 [details]
the patch

Clearing flags.