Summary: | JSC JIT does not inline GC allocation fast paths | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Filip Pizlo <fpizlo> | ||||||||||
Component: | JavaScriptCore | Assignee: | Nobody <webkit-unassigned> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Normal | CC: | barraclough, oliver, webkit.review.bot | ||||||||||
Priority: | P2 | ||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||
Hardware: | All | ||||||||||||
OS: | All | ||||||||||||
Attachments: |
|
Description
Filip Pizlo
2011-07-14 20:10:13 PDT
Created attachment 100924 [details]
the patch
This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider.
I've lived on it for a day or so, and it seems to work. It passes all tests. But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that! Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work.
[pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results --v8 v8-v4-results/sunspider-results-2011-07-14-20.03.02.js /Volumes/Data/pizlo/quinary/OpenSource/PerformanceTests/SunSpider/v8-v4-results/sunspider-results-2011-07-14-20.00.09.js
TEST COMPARISON FROM TO DETAILS
=============================================================================
** TOTAL **: 1.013x as fast 1166.2ms +/- 0.3% 1150.7ms +/- 0.3% significant
=============================================================================
v8: 1.013x as fast 1166.2ms +/- 0.3% 1150.7ms +/- 0.3% significant
crypto: 1.016x as fast 194.6ms +/- 0.6% 191.6ms +/- 0.4% significant
deltablue: 1.014x as fast 240.7ms +/- 0.8% 237.4ms +/- 0.5% significant
earley-boyer: 1.028x as fast 129.7ms +/- 0.3% 126.2ms +/- 0.4% significant
raytrace: 1.050x as fast 73.8ms +/- 0.8% 70.3ms +/- 1.4% significant
regexp: ?? 105.9ms +/- 0.6% 106.2ms +/- 0.4% not conclusive: might be *1.003x as slow*
richards: - 223.8ms +/- 0.6% 223.8ms +/- 0.9%
splay: 1.013x as fast 197.7ms +/- 0.5% 195.2ms +/- 0.9% significant
[pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results sunspider-1.0-results/sunspider-results-2011-07-14-20.04.12.js /Volumes/Data/pizlo/quinary/OpenSource/PerformanceTests/SunSpider/sunspider-1.0-results/sunspider-results-2011-07-14-20.04.56.js
TEST COMPARISON FROM TO DETAILS
=============================================================================
** TOTAL **: - 174.4ms +/- 0.3% 173.9ms +/- 0.4%
=============================================================================
3d: - 25.2ms +/- 1.0% 25.1ms +/- 0.9%
cube: ?? 8.9ms +/- 1.0% 9.0ms +/- 0.8% not conclusive: might be *1.009x as slow*
morph: - 7.3ms +/- 3.3% 7.1ms +/- 1.4%
raytrace: ?? 9.0ms +/- 0.0% 9.0ms +/- 0.9% not conclusive: might be *1.004x as slow*
access: - 22.1ms +/- 0.4% 22.1ms +/- 0.5%
binary-trees: ?? 2.0ms +/- 2.0% 2.0ms +/- 2.8% not conclusive: might be *1.010x as slow*
fannkuch: - 11.1ms +/- 0.8% 11.1ms +/- 0.6%
nbody: - 6.0ms +/- 0.0% 6.0ms +/- 0.0%
nsieve: ?? 3.0ms +/- 0.0% 3.0ms +/- 1.3% not conclusive: might be *1.007x as slow*
bitops: ?? 15.4ms +/- 1.2% 15.5ms +/- 1.0% not conclusive: might be *1.006x as slow*
3bit-bits-in-byte: - 2.0ms +/- 0.0% 2.0ms +/- 0.0%
bits-in-byte: ?? 5.4ms +/- 2.6% 5.5ms +/- 2.6% not conclusive: might be *1.019x as slow*
bitwise-and: - 3.1ms +/- 2.5% 3.0ms +/- 1.9%
nsieve-bits: ?? 5.0ms +/- 1.1% 5.0ms +/- 0.0% not conclusive: might be *1.008x as slow*
controlflow: ?? 1.1ms +/- 6.4% 1.1ms +/- 7.8% not conclusive: might be *1.038x as slow*
recursive: ?? 1.1ms +/- 6.4% 1.1ms +/- 7.8% not conclusive: might be *1.038x as slow*
crypto: - 11.0ms +/- 0.5% 10.9ms +/- 1.1%
aes: 1.017x as fast 7.0ms +/- 0.6% 6.9ms +/- 1.5% significant
md5: ?? 2.0ms +/- 2.0% 2.0ms +/- 2.8% not conclusive: might be *1.010x as slow*
sha1: - 2.0ms +/- 0.0% 2.0ms +/- 0.0%
date: 1.020x as fast 21.9ms +/- 0.6% 21.5ms +/- 0.9% significant
format-tofte: 1.036x as fast 13.8ms +/- 0.9% 13.4ms +/- 1.0% significant
format-xparb: ?? 8.0ms +/- 0.7% 8.1ms +/- 1.1% not conclusive: might be *1.007x as slow*
math: - 16.0ms +/- 0.0% 16.0ms +/- 0.0%
cordic: - 6.0ms +/- 0.0% 6.0ms +/- 0.0%
partial-sums: - 7.0ms +/- 0.0% 7.0ms +/- 0.0%
spectral-norm: - 3.0ms +/- 0.0% 3.0ms +/- 0.0%
regexp: ?? 10.0ms +/- 0.0% 10.0ms +/- 0.4% not conclusive: might be *1.002x as slow*
dna: ?? 10.0ms +/- 0.0% 10.0ms +/- 0.4% not conclusive: might be *1.002x as slow*
string: - 51.7ms +/- 0.4% 51.6ms +/- 0.9%
base64: 1.062x as fast 5.5ms +/- 2.6% 5.2ms +/- 2.0% significant
fasta: 1.036x as fast 7.0ms +/- 0.6% 6.7ms +/- 1.9% significant
tagcloud: ?? 13.0ms +/- 0.3% 13.0ms +/- 0.4% not conclusive: might be *1.002x as slow*
unpack-code: *1.021x as slow* 20.2ms +/- 0.5% 20.6ms +/- 1.2% significant
validate-input: ?? 6.1ms +/- 1.4% 6.1ms +/- 1.5% not conclusive: might be *1.003x as slow*
[pizlo@minime PerformanceTests]
Attachment 100924 [details] did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1
Source/JavaScriptCore/jit/JITOpcodes.cpp:39: Alphabetical sorting problem. [build/include_order] [4]
Total errors found: 1 in 9 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 100925 [details]
the patch (fix style, fix 32-bit)
This updated patch fixes style and also fixes 32-bit builds, and adds a note to the ChangeLog that this patch only inlines allocation on 64-bit (32-bit proceeds same as before).
Attachment 100925 [details] did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1
Source/JavaScriptCore/ChangeLog:19: Line contains tab character. [whitespace/tab] [5]
Total errors found: 1 in 9 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 100926 [details]
the patch (fix style)
Replaced a tab character that emacs accidentally inserted.
(In reply to comment #1) > Created an attachment (id=100924) [details] > the patch > > This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider. > > I've lived on it for a day or so, and it seems to work. It passes all tests. But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that! Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work. I'm kind of surprised it doesn't help a microbenchmark of op_new_object, or do you mean only inlining op_new_object doesn't seem to help any of the benchmarks we track? (which is less surprising) (In reply to comment #6) > (In reply to comment #1) > > Created an attachment (id=100924) [details] [details] > > the patch > > > > This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider. > > > > I've lived on it for a day or so, and it seems to work. It passes all tests. But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that! Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work. > > I'm kind of surprised it doesn't help a microbenchmark of op_new_object, or do you mean only inlining op_new_object doesn't seem to help any of the benchmarks we track? (which is less surprising) To clarify: I only meant that it didn't help V8 or SunSpider by more than a fraction of a percent. I haven't tried a micro benchmark; I "declared victory" once I saw that it gave a slight win on V8. It definitely helps micro benchmarks (see below). mytest2.js: for (var i = 0; i < 10000000; ++i) { var x = new Object(); } With the patch: [pizlo@minime OpenSource] time WebKitBuild/Release/jsc mytest2.js real 0m0.277s user 0m0.270s sys 0m0.006s Without the patch: [pizlo@minime OpenSource] time ../../secondary/OpenSource/WebKitBuild/Release/jsc mytest2.js real 0m0.474s user 0m0.264s sys 0m0.008s Comment on attachment 100926 [details] the patch (fix style) View in context: https://bugs.webkit.org/attachment.cgi?id=100926&action=review r- but basically just because of the code duplication. Would be good to get 32bit going as well. > Source/JavaScriptCore/jit/JITOpcodes.cpp:352 > + // remove the object from the free list > + loadPtr(Address(regT0), regT1); > + storePtr(regT1, &sizeClass->firstFreeCell); > + > + // initialize the object's vtable > + storePtr(ImmPtr(m_globalData->jsFinalObjectVPtr), Address(regT0)); > + > + // initialize the object's structure > + storePtr(ImmPtr(m_codeBlock->globalObject()->emptyObjectStructure()), Address(regT0, JSCell::structureOffset())); > + > + // initialize the inheritor ID > + storePtr(ImmPtr(0), Address(regT0, JSObject::offsetOfInheritorID())); > + > + // initialize the object's property storage pointer > + addPtr(Imm32(sizeof(JSObject)), regT0, regT1); I think all of this could be hoisted into a separate helper function, rather than duplicating the code in op_create_this, well i guess with the exception of the initial structure (In reply to comment #8) > (From update of attachment 100926 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=100926&action=review > > r- but basically just because of the code duplication. Would be good to get 32bit going as well. > That's a fine point. I'll hoist that, and implement a 32-bit version, and then resubmit. Created attachment 101054 [details]
the patch
This reduces code duplication and adds 32-bit support.
Comment on attachment 101054 [details]
the patch
Sorry, i thought i had reviewed this :(
Comment on attachment 101054 [details] the patch Clearing flags on attachment: 101054 Committed r91199: <http://trac.webkit.org/changeset/91199> All reviewed patches have been landed. Closing bug. |