WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
64582
JSC JIT does not inline GC allocation fast paths
https://bugs.webkit.org/show_bug.cgi?id=64582
Summary
JSC JIT does not inline GC allocation fast paths
Filip Pizlo
Reported
2011-07-14 20:10:13 PDT
The JSC GC now has easy-to-inline allocation fast paths. But the JSC JIT still emits out-of-line stub calls for allocation. This introduces two forms of overhead. First, every allocation pays the price of call overhead, even if the allocation just involves a load, a branch, and a handful of stores. Second, an inlined JIT-generated fast path can directly locate and access the appropriate SizeClass, which should simplify the fast path even more than would be possible with ahead-of-time compilation with a C++ compiler (since the C++ compiler does not statically know the address of the SizeClass and must instead find it by first loading the global data and then adding a constant). The JSC JIT should inline the GC allocation fast path to alleviate these overheads.
Attachments
the patch
(13.25 KB, patch)
2011-07-14 20:47 PDT
,
Filip Pizlo
no flags
Details
Formatted Diff
Diff
the patch (fix style, fix 32-bit)
(13.70 KB, patch)
2011-07-14 20:59 PDT
,
Filip Pizlo
no flags
Details
Formatted Diff
Diff
the patch (fix style)
(13.66 KB, patch)
2011-07-14 21:06 PDT
,
Filip Pizlo
oliver
: review-
oliver
: commit-queue-
Details
Formatted Diff
Diff
the patch
(15.49 KB, patch)
2011-07-15 15:18 PDT
,
Filip Pizlo
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Filip Pizlo
Comment 1
2011-07-14 20:47:41 PDT
Created
attachment 100924
[details]
the patch This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider. I've lived on it for a day or so, and it seems to work. It passes all tests. But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that! Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work. [pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results --v8 v8-v4-results/sunspider-results-2011-07-14-20.03.02.js /Volumes/Data/pizlo/quinary/OpenSource/PerformanceTests/SunSpider/v8-v4-results/sunspider-results-2011-07-14-20.00.09.js TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.013x as fast 1166.2ms +/- 0.3% 1150.7ms +/- 0.3% significant ============================================================================= v8: 1.013x as fast 1166.2ms +/- 0.3% 1150.7ms +/- 0.3% significant crypto: 1.016x as fast 194.6ms +/- 0.6% 191.6ms +/- 0.4% significant deltablue: 1.014x as fast 240.7ms +/- 0.8% 237.4ms +/- 0.5% significant earley-boyer: 1.028x as fast 129.7ms +/- 0.3% 126.2ms +/- 0.4% significant raytrace: 1.050x as fast 73.8ms +/- 0.8% 70.3ms +/- 1.4% significant regexp: ?? 105.9ms +/- 0.6% 106.2ms +/- 0.4% not conclusive: might be *1.003x as slow* richards: - 223.8ms +/- 0.6% 223.8ms +/- 0.9% splay: 1.013x as fast 197.7ms +/- 0.5% 195.2ms +/- 0.9% significant [pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results sunspider-1.0-results/sunspider-results-2011-07-14-20.04.12.js /Volumes/Data/pizlo/quinary/OpenSource/PerformanceTests/SunSpider/sunspider-1.0-results/sunspider-results-2011-07-14-20.04.56.js TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: - 174.4ms +/- 0.3% 173.9ms +/- 0.4% ============================================================================= 3d: - 25.2ms +/- 1.0% 25.1ms +/- 0.9% cube: ?? 8.9ms +/- 1.0% 9.0ms +/- 0.8% not conclusive: might be *1.009x as slow* morph: - 7.3ms +/- 3.3% 7.1ms +/- 1.4% raytrace: ?? 9.0ms +/- 0.0% 9.0ms +/- 0.9% not conclusive: might be *1.004x as slow* access: - 22.1ms +/- 0.4% 22.1ms +/- 0.5% binary-trees: ?? 2.0ms +/- 2.0% 2.0ms +/- 2.8% not conclusive: might be *1.010x as slow* fannkuch: - 11.1ms +/- 0.8% 11.1ms +/- 0.6% nbody: - 6.0ms +/- 0.0% 6.0ms +/- 0.0% nsieve: ?? 3.0ms +/- 0.0% 3.0ms +/- 1.3% not conclusive: might be *1.007x as slow* bitops: ?? 15.4ms +/- 1.2% 15.5ms +/- 1.0% not conclusive: might be *1.006x as slow* 3bit-bits-in-byte: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% bits-in-byte: ?? 5.4ms +/- 2.6% 5.5ms +/- 2.6% not conclusive: might be *1.019x as slow* bitwise-and: - 3.1ms +/- 2.5% 3.0ms +/- 1.9% nsieve-bits: ?? 5.0ms +/- 1.1% 5.0ms +/- 0.0% not conclusive: might be *1.008x as slow* controlflow: ?? 1.1ms +/- 6.4% 1.1ms +/- 7.8% not conclusive: might be *1.038x as slow* recursive: ?? 1.1ms +/- 6.4% 1.1ms +/- 7.8% not conclusive: might be *1.038x as slow* crypto: - 11.0ms +/- 0.5% 10.9ms +/- 1.1% aes: 1.017x as fast 7.0ms +/- 0.6% 6.9ms +/- 1.5% significant md5: ?? 2.0ms +/- 2.0% 2.0ms +/- 2.8% not conclusive: might be *1.010x as slow* sha1: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% date: 1.020x as fast 21.9ms +/- 0.6% 21.5ms +/- 0.9% significant format-tofte: 1.036x as fast 13.8ms +/- 0.9% 13.4ms +/- 1.0% significant format-xparb: ?? 8.0ms +/- 0.7% 8.1ms +/- 1.1% not conclusive: might be *1.007x as slow* math: - 16.0ms +/- 0.0% 16.0ms +/- 0.0% cordic: - 6.0ms +/- 0.0% 6.0ms +/- 0.0% partial-sums: - 7.0ms +/- 0.0% 7.0ms +/- 0.0% spectral-norm: - 3.0ms +/- 0.0% 3.0ms +/- 0.0% regexp: ?? 10.0ms +/- 0.0% 10.0ms +/- 0.4% not conclusive: might be *1.002x as slow* dna: ?? 10.0ms +/- 0.0% 10.0ms +/- 0.4% not conclusive: might be *1.002x as slow* string: - 51.7ms +/- 0.4% 51.6ms +/- 0.9% base64: 1.062x as fast 5.5ms +/- 2.6% 5.2ms +/- 2.0% significant fasta: 1.036x as fast 7.0ms +/- 0.6% 6.7ms +/- 1.9% significant tagcloud: ?? 13.0ms +/- 0.3% 13.0ms +/- 0.4% not conclusive: might be *1.002x as slow* unpack-code: *1.021x as slow* 20.2ms +/- 0.5% 20.6ms +/- 1.2% significant validate-input: ?? 6.1ms +/- 1.4% 6.1ms +/- 1.5% not conclusive: might be *1.003x as slow* [pizlo@minime PerformanceTests]
WebKit Review Bot
Comment 2
2011-07-14 20:50:12 PDT
Attachment 100924
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/jit/JITOpcodes.cpp:39: Alphabetical sorting problem. [build/include_order] [4] Total errors found: 1 in 9 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 3
2011-07-14 20:59:39 PDT
Created
attachment 100925
[details]
the patch (fix style, fix 32-bit) This updated patch fixes style and also fixes 32-bit builds, and adds a note to the ChangeLog that this patch only inlines allocation on 64-bit (32-bit proceeds same as before).
WebKit Review Bot
Comment 4
2011-07-14 21:03:33 PDT
Attachment 100925
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/ChangeLog:19: Line contains tab character. [whitespace/tab] [5] Total errors found: 1 in 9 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 5
2011-07-14 21:06:57 PDT
Created
attachment 100926
[details]
the patch (fix style) Replaced a tab character that emacs accidentally inserted.
Oliver Hunt
Comment 6
2011-07-14 21:50:51 PDT
(In reply to
comment #1
)
> Created an attachment (id=100924) [details] > the patch > > This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider. > > I've lived on it for a day or so, and it seems to work. It passes all tests. But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that! Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work.
I'm kind of surprised it doesn't help a microbenchmark of op_new_object, or do you mean only inlining op_new_object doesn't seem to help any of the benchmarks we track? (which is less surprising)
Filip Pizlo
Comment 7
2011-07-14 21:55:36 PDT
(In reply to
comment #6
)
> (In reply to
comment #1
) > > Created an attachment (id=100924) [details] [details] > > the patch > > > > This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider. > > > > I've lived on it for a day or so, and it seems to work. It passes all tests. But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that! Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work. > > I'm kind of surprised it doesn't help a microbenchmark of op_new_object, or do you mean only inlining op_new_object doesn't seem to help any of the benchmarks we track? (which is less surprising)
To clarify: I only meant that it didn't help V8 or SunSpider by more than a fraction of a percent. I haven't tried a micro benchmark; I "declared victory" once I saw that it gave a slight win on V8. It definitely helps micro benchmarks (see below). mytest2.js: for (var i = 0; i < 10000000; ++i) { var x = new Object(); } With the patch: [pizlo@minime OpenSource] time WebKitBuild/Release/jsc mytest2.js real 0m0.277s user 0m0.270s sys 0m0.006s Without the patch: [pizlo@minime OpenSource] time ../../secondary/OpenSource/WebKitBuild/Release/jsc mytest2.js real 0m0.474s user 0m0.264s sys 0m0.008s
Oliver Hunt
Comment 8
2011-07-14 21:58:36 PDT
Comment on
attachment 100926
[details]
the patch (fix style) View in context:
https://bugs.webkit.org/attachment.cgi?id=100926&action=review
r- but basically just because of the code duplication. Would be good to get 32bit going as well.
> Source/JavaScriptCore/jit/JITOpcodes.cpp:352 > + // remove the object from the free list > + loadPtr(Address(regT0), regT1); > + storePtr(regT1, &sizeClass->firstFreeCell); > + > + // initialize the object's vtable > + storePtr(ImmPtr(m_globalData->jsFinalObjectVPtr), Address(regT0)); > + > + // initialize the object's structure > + storePtr(ImmPtr(m_codeBlock->globalObject()->emptyObjectStructure()), Address(regT0, JSCell::structureOffset())); > + > + // initialize the inheritor ID > + storePtr(ImmPtr(0), Address(regT0, JSObject::offsetOfInheritorID())); > + > + // initialize the object's property storage pointer > + addPtr(Imm32(sizeof(JSObject)), regT0, regT1);
I think all of this could be hoisted into a separate helper function, rather than duplicating the code in op_create_this, well i guess with the exception of the initial structure
Filip Pizlo
Comment 9
2011-07-14 21:59:34 PDT
(In reply to
comment #8
)
> (From update of
attachment 100926
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=100926&action=review
> > r- but basically just because of the code duplication. Would be good to get 32bit going as well. >
That's a fine point. I'll hoist that, and implement a 32-bit version, and then resubmit.
Filip Pizlo
Comment 10
2011-07-15 15:18:53 PDT
Created
attachment 101054
[details]
the patch This reduces code duplication and adds 32-bit support.
Oliver Hunt
Comment 11
2011-07-18 11:00:02 PDT
Comment on
attachment 101054
[details]
the patch Sorry, i thought i had reviewed this :(
WebKit Review Bot
Comment 12
2011-07-18 11:56:01 PDT
Comment on
attachment 101054
[details]
the patch Clearing flags on attachment: 101054 Committed
r91199
: <
http://trac.webkit.org/changeset/91199
>
WebKit Review Bot
Comment 13
2011-07-18 11:56:05 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug