WebKit Bugzilla
Bug 73187 - Don't try to optimize huge code blocks
Status: RESOLVED FIXED
https://bugs.webkit.org/show_bug.cgi?id=73187
Filip Pizlo
Reported
2011-11-27 22:33:44 PST
Huge code blocks are expensive to optimize. It's probably unwise for a production VM to try to do full optimizations on code blocks that are gigantic, since in the worst case this is probably more expensive than not doing any optimizations. As well, giant code blocks are much more likely to be initializers of some kind rather than containing hot code.
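As a rough illustration of the kind of gate being proposed, here is a minimal C++ sketch. The constant, struct, and function names are hypothetical, chosen only for this example, and are not JavaScriptCore's actual API.

#include <cstddef>

// Hypothetical size limit; the real value is a tuning knob.
constexpr std::size_t maximumOptimizationCandidateInstructionCount = 10000;

struct CodeBlockInfo {
    std::size_t instructionCount; // number of bytecode instructions in the block
};

// Huge code blocks are likely one-shot initializers; optimizing them can
// cost more than it ever pays back, so skip them entirely.
bool isEligibleForOptimization(const CodeBlockInfo& block)
{
    return block.instructionCount <= maximumOptimizationCandidateInstructionCount;
}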
Attachments
the patch (5.15 KB, patch)
2011-11-27 22:37 PST, Filip Pizlo
no flags
Filip Pizlo
Comment 1
2011-11-27 22:37:01 PST
Created attachment 116698 [details]
the patch

Benchmark report for SunSpider, V8, and Kraken on bigmac.local (MacPro5,1).

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc (r101220)
"LimitOpt" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc (r101220)

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                              TipOfTree            LimitOpt
SunSpider:
   3d-cube                    7.5037+-0.0272  ^    7.3407+-0.0213  ^    definitely 1.0222x faster
   3d-morph                   8.4692+-0.1285  ?    8.5893+-0.1530  ?    might be 1.0142x slower
   3d-raytrace                7.7017+-0.0510  ?    7.7053+-0.0690  ?
   access-binary-trees        1.5935+-0.0131       1.5916+-0.0048
   access-fannkuch            7.5645+-0.0095       7.5612+-0.0147
   access-nbody               4.1802+-0.0087  ^    4.1666+-0.0041  ^    definitely 1.0033x faster
   access-nsieve              3.1486+-0.0423  ?    3.1523+-0.0479  ?
   bitops-3bit-bits-in-byte   1.2422+-0.0107       1.2381+-0.0138
   bitops-bits-in-byte        4.9074+-0.0115  ?    4.9102+-0.0131  ?
   bitops-bitwise-and         3.3056+-0.0353       3.2914+-0.0256
   bitops-nsieve-bits         5.6562+-0.0365  ?    5.6575+-0.0476  ?
   controlflow-recursive      2.2977+-0.0150  ?    2.2978+-0.0269  ?
   crypto-aes                 7.1723+-0.0557  ?    7.1838+-0.0519  ?
   crypto-md5                 2.4878+-0.0102  ^    2.4387+-0.0237  ^    definitely 1.0201x faster
   crypto-sha1                2.1636+-0.0141  ?    2.1671+-0.0214  ?
   date-format-tofte          10.8586+-0.2214      10.6046+-0.0496      might be 1.0240x faster
   date-format-xparb          10.9635+-0.1876 ^    10.2095+-0.0681 ^    definitely 1.0739x faster
   math-cordic                7.1197+-0.0179  ?    7.1384+-0.0156  ?
   math-partial-sums          10.4507+-0.0306      10.4288+-0.0191
   math-spectral-norm         2.6073+-0.0112       2.5990+-0.0083
   regexp-dna                 12.9991+-0.0618      12.8863+-0.0823
   string-base64              3.9306+-0.0149       3.9185+-0.0141
   string-fasta               7.3658+-0.0145  ^    7.3088+-0.0243  ^    definitely 1.0078x faster
   string-tagcloud            12.4449+-0.0504 ?    12.5589+-0.0641 ?
   string-unpack-code         22.3649+-0.0975      22.3208+-0.0598
   string-validate-input      5.7198+-0.0597  ^    5.5566+-0.0250  ^    definitely 1.0294x faster

   <arithmetic> *             6.7777+-0.0214  ^    6.7239+-0.0183  ^    definitely 1.0080x faster
   <geometric>                5.3966+-0.0156  ^    5.3608+-0.0174  ^    definitely 1.0067x faster
   <harmonic>                 4.1967+-0.0126       4.1762+-0.0165       might be 1.0049x faster

                              TipOfTree            LimitOpt
V8:
   crypto                     77.4626+-0.2908      77.3180+-0.2790
   deltablue                  171.2997+-2.4494     167.5339+-1.3757     might be 1.0225x faster
   earley-boyer               104.2899+-0.9817     104.0024+-0.6383
   raytrace                   62.5229+-0.4979 ^    57.5952+-0.2964 ^    definitely 1.0856x faster
   regexp                     123.4270+-0.9859 ?   123.9464+-1.6153 ?
   richards                   140.1639+-0.5766     139.1292+-0.9662
   splay                      90.4572+-0.6331 ?    90.6517+-1.3748 ?

   <arithmetic>               109.9462+-0.4949 ^   108.5967+-0.5483 ^   definitely 1.0124x faster
   <geometric> *              104.4177+-0.3725 ^   102.7886+-0.4385 ^   definitely 1.0158x faster
   <harmonic>                 99.1082+-0.3006 ^    96.9826+-0.3248 ^    definitely 1.0219x faster

                              TipOfTree            LimitOpt
Kraken:
   ai-astar                   829.0711+-0.5821 ^   809.9338+-13.6241 ^  definitely 1.0236x faster
   audio-beat-detection       205.4648+-1.0089     205.3014+-0.8063
   audio-dft                  262.9105+-1.7574 ?   263.5134+-2.3609 ?
   audio-fft                  132.8605+-0.1003     132.7982+-0.1085
   audio-oscillator           280.4854+-5.9853 ?   281.1293+-7.0148 ?
   imaging-darkroom           332.9700+-5.1986 ?   335.2299+-5.0319 ?
   imaging-desaturate         238.7119+-0.0337     238.6970+-0.1261
   imaging-gaussian-blur      620.3764+-0.2415 ?   620.4309+-0.1187 ?
   json-parse-financial       73.0484+-0.5359      72.6107+-0.1585
   json-stringify-tinderbox   86.0346+-0.3021      86.0212+-0.1926
   stanford-crypto-aes        120.8148+-1.2424     119.9662+-1.8868
   stanford-crypto-ccm        119.0706+-1.1233 ?   119.4359+-1.7183 ?
   stanford-crypto-pbkdf2     233.7058+-1.1453 ?   237.4952+-5.3935 ?   might be 1.0162x slower
   stanford-crypto-sha256-iterative  97.3761+-0.7750  96.9643+-0.1659

   <arithmetic> *             259.4929+-0.5239     258.5377+-0.9804     might be 1.0037x faster
   <geometric>                200.4537+-0.4679     200.2741+-0.8760     might be 1.0009x faster
   <harmonic>                 162.1835+-0.3705     162.0018+-0.7378     might be 1.0011x faster

                              TipOfTree            LimitOpt
All benchmarks:
   <arithmetic>               97.4201+-0.1913 ^    96.9048+-0.3238 ^    definitely 1.0053x faster
   <geometric>                24.6250+-0.0589 ^    24.4704+-0.0651 ^    definitely 1.0063x faster
   <harmonic>                 7.3988+-0.0220       7.3617+-0.0286       might be 1.0050x faster

                              TipOfTree            LimitOpt
Geomean of preferred means:
   <scaled-result>            56.8405+-0.1219 ^    56.3239+-0.1338 ^    definitely 1.0092x faster
Filip Pizlo
Comment 2
2011-11-27 22:38:10 PST
<rdar://problem/10489464>
Oliver Hunt
Comment 3
2011-11-28 08:48:36 PST
Comment on attachment 116698 [details]
the patch

It seems like we would want to be able to separate out eval vs. prog vs. function code -- it makes more sense (to me) that we'd be willing to optimize a bigger function than eval or prog code. Also, how does this affect OSR of loops? E.g., does a loop have to be relatively small to benefit? What happens on things like http://nerget.com/fluidSim and http://nerget.com/compression/ -- these have a few relatively large core loops.
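One way to picture the separation being asked about here is a per-code-type limit, sketched below in C++. The enum, function, and numbers are purely illustrative assumptions; the patch as posted deliberately uses a single setting for all code types.

#include <cstddef>

// Hypothetical per-code-type size limits, not actual JSC options.
enum class CodeType { Global, Eval, Function };

std::size_t optimizationSizeLimitFor(CodeType type)
{
    switch (type) {
    case CodeType::Global:
        return 1000;  // program code: often run-once initializers
    case CodeType::Eval:
        return 1000;  // eval code: rarely worth heavy optimization
    case CodeType::Function:
        return 10000; // functions: worth more patience before giving up
    }
    return 0; // unreachable; keeps compilers happy
}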
Filip Pizlo
Comment 4
2011-11-28 09:18:11 PST
(In reply to comment #3)
> (From update of attachment 116698 [details])
> It seems like we would want to be able to separate out eval vs. prog vs. function code -- it makes more sense (to me) that we'd be willing to optimize a bigger function than eval or prog code.

That does make sense. But for the purposes of tuning this number for now, I figured that having only one setting is better than having four. Bringing back the four settings would be easy if we found that it would be useful.

> Also, how does this affect OSR of loops? E.g., does a loop have to be relatively small to benefit? What happens on things like http://nerget.com/fluidSim and http://nerget.com/compression/ -- these have a few relatively large core loops.
I'll have a look.
Geoffrey Garen
Comment 5
2011-11-28 11:14:35 PST
To reduce the risk of missing an optimization opportunity in a huge code block, perhaps the huge code block heuristic could be a multiplier instead of an absolute number. E.g.:

size_t iterationsUntilOptimizingCompile = N;
size_t iterationsUntilOptimizingCompileInHugeCodeBlock = N * M;
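A compilable sketch of that multiplier idea follows. The constants N, M, and the size cutoff are made-up illustrative values, not actual JavaScriptCore tuning numbers.

#include <cstddef>

constexpr std::size_t executionCountBeforeOptimizing = 1000;  // N
constexpr std::size_t hugeCodeBlockPenalty = 16;              // M
constexpr std::size_t hugeCodeBlockInstructionCount = 10000;

// Under the multiplier scheme, a huge code block is not barred from the
// optimizing JIT; it just has to prove itself for M times longer.
std::size_t iterationsUntilOptimizingCompile(std::size_t instructionCount)
{
    if (instructionCount > hugeCodeBlockInstructionCount)
        return executionCountBeforeOptimizing * hugeCodeBlockPenalty;
    return executionCountBeforeOptimizing;
}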
Filip Pizlo
Comment 6
2011-11-28 12:45:24 PST
(In reply to comment #5)
> To reduce the risk of missing an optimization opportunity in a huge code block, perhaps the huge code block heuristic could be a multiplier instead of an absolute number. E.g.:
>
> size_t iterationsUntilOptimizingCompile = N;
> size_t iterationsUntilOptimizingCompileInHugeCodeBlock = N * M;

I was thinking about this as well. But:

1) We still want to have a size at which we will simply not compile at all. Even before my CFA and propagation stuff, the DFG already had O(n^2) memory usage, since it needs to track all variables at all basic blocks. It's pretty easy to write a program that will cause the DFG to run out of memory.

2) Delaying optimization means that we still eat profiling overhead.

Taking these two things together, I think that at least for now, until we have more sophisticated heuristics, we want to just have a threshold at which we disable any optimization for the code block, including removing profiling.
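A sketch of that policy, under the same hypothetical names as the earlier examples: one hard size limit past which the code block is pinned to the baseline tier and value profiling is skipped entirely. The enum and function here are illustrative assumptions, not the shape of the actual patch.

#include <cstddef>

constexpr std::size_t maximumOptimizationCandidateInstructionCount = 10000;

enum class TierPolicy {
    BaselineOnlyWithoutProfiling, // never invoke the DFG, skip value profiling
    ProfileAndMaybeOptimize       // normal tier-up path
};

TierPolicy tierPolicyFor(std::size_t instructionCount)
{
    // Past the hard limit the DFG will never be invoked, so collecting value
    // profiles for the block would be pure overhead; drop profiling as well.
    if (instructionCount > maximumOptimizationCandidateInstructionCount)
        return TierPolicy::BaselineOnlyWithoutProfiling;
    return TierPolicy::ProfileAndMaybeOptimize;
}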
Filip Pizlo
Comment 7
2011-11-28 14:44:53 PST
(In reply to comment #4)
> (In reply to comment #3)
> > (From update of attachment 116698 [details])
> > It seems like we would want to be able to separate out eval vs. prog vs. function code -- it makes more sense (to me) that we'd be willing to optimize a bigger function than eval or prog code.
>
> That does make sense. But for the purposes of tuning this number for now, I figured that having only one setting is better than having four. Bringing back the four settings would be easy if we found that it would be useful.
>
> > Also, how does this affect OSR of loops? E.g., does a loop have to be relatively small to benefit? What happens on things like http://nerget.com/fluidSim and http://nerget.com/compression/ -- these have a few relatively large core loops.
>
> I'll have a look.
No performance difference.
Filip Pizlo
Comment 8
2011-11-28 15:08:12 PST
Landed in
http://trac.webkit.org/changeset/101291
Filip Pizlo
Comment 9
2011-11-28 15:08:23 PST
Comment on attachment 116698 [details]
the patch

Clearing flags.