RESOLVED FIXED 64582
JSC JIT does not inline GC allocation fast paths
https://bugs.webkit.org/show_bug.cgi?id=64582
Summary JSC JIT does not inline GC allocation fast paths
Filip Pizlo
Reported 2011-07-14 20:10:13 PDT
The JSC GC now has easy-to-inline allocation fast paths. But the JSC JIT still emits out-of-line stub calls for allocation. This introduces two forms of overhead. First, every allocation pays the price of call overhead, even if the allocation just involves a load, a branch, and a handful of stores. Second, an inlined JIT-generated fast path can directly locate and access the appropriate SizeClass, which should simplify the fast path even more than would be possible with ahead-of-time compilation with a C++ compiler (since the C++ compiler does not statically know the address of the SizeClass and must instead find it by first loading the global data and then adding a constant). The JSC JIT should inline the GC allocation fast path to alleviate these overheads.
Attachments
the patch (13.25 KB, patch)
2011-07-14 20:47 PDT, Filip Pizlo
no flags
the patch (fix style, fix 32-bit) (13.70 KB, patch)
2011-07-14 20:59 PDT, Filip Pizlo
no flags
the patch (fix style) (13.66 KB, patch)
2011-07-14 21:06 PDT, Filip Pizlo
oliver: review-
oliver: commit-queue-
the patch (15.49 KB, patch)
2011-07-15 15:18 PDT, Filip Pizlo
no flags
Filip Pizlo
Comment 1 2011-07-14 20:47:41 PDT
Created attachment 100924 [details] the patch This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider. I've lived on it for a day or so, and it seems to work. It passes all tests. But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that! Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work. [pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results --v8 v8-v4-results/sunspider-results-2011-07-14-20.03.02.js /Volumes/Data/pizlo/quinary/OpenSource/PerformanceTests/SunSpider/v8-v4-results/sunspider-results-2011-07-14-20.00.09.js TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.013x as fast 1166.2ms +/- 0.3% 1150.7ms +/- 0.3% significant ============================================================================= v8: 1.013x as fast 1166.2ms +/- 0.3% 1150.7ms +/- 0.3% significant crypto: 1.016x as fast 194.6ms +/- 0.6% 191.6ms +/- 0.4% significant deltablue: 1.014x as fast 240.7ms +/- 0.8% 237.4ms +/- 0.5% significant earley-boyer: 1.028x as fast 129.7ms +/- 0.3% 126.2ms +/- 0.4% significant raytrace: 1.050x as fast 73.8ms +/- 0.8% 70.3ms +/- 1.4% significant regexp: ?? 105.9ms +/- 0.6% 106.2ms +/- 0.4% not conclusive: might be *1.003x as slow* richards: - 223.8ms +/- 0.6% 223.8ms +/- 0.9% splay: 1.013x as fast 197.7ms +/- 0.5% 195.2ms +/- 0.9% significant [pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results sunspider-1.0-results/sunspider-results-2011-07-14-20.04.12.js /Volumes/Data/pizlo/quinary/OpenSource/PerformanceTests/SunSpider/sunspider-1.0-results/sunspider-results-2011-07-14-20.04.56.js TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: - 174.4ms +/- 0.3% 173.9ms +/- 0.4% ============================================================================= 3d: - 25.2ms +/- 1.0% 25.1ms +/- 0.9% cube: ?? 8.9ms +/- 1.0% 9.0ms +/- 0.8% not conclusive: might be *1.009x as slow* morph: - 7.3ms +/- 3.3% 7.1ms +/- 1.4% raytrace: ?? 9.0ms +/- 0.0% 9.0ms +/- 0.9% not conclusive: might be *1.004x as slow* access: - 22.1ms +/- 0.4% 22.1ms +/- 0.5% binary-trees: ?? 2.0ms +/- 2.0% 2.0ms +/- 2.8% not conclusive: might be *1.010x as slow* fannkuch: - 11.1ms +/- 0.8% 11.1ms +/- 0.6% nbody: - 6.0ms +/- 0.0% 6.0ms +/- 0.0% nsieve: ?? 3.0ms +/- 0.0% 3.0ms +/- 1.3% not conclusive: might be *1.007x as slow* bitops: ?? 15.4ms +/- 1.2% 15.5ms +/- 1.0% not conclusive: might be *1.006x as slow* 3bit-bits-in-byte: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% bits-in-byte: ?? 5.4ms +/- 2.6% 5.5ms +/- 2.6% not conclusive: might be *1.019x as slow* bitwise-and: - 3.1ms +/- 2.5% 3.0ms +/- 1.9% nsieve-bits: ?? 5.0ms +/- 1.1% 5.0ms +/- 0.0% not conclusive: might be *1.008x as slow* controlflow: ?? 1.1ms +/- 6.4% 1.1ms +/- 7.8% not conclusive: might be *1.038x as slow* recursive: ?? 1.1ms +/- 6.4% 1.1ms +/- 7.8% not conclusive: might be *1.038x as slow* crypto: - 11.0ms +/- 0.5% 10.9ms +/- 1.1% aes: 1.017x as fast 7.0ms +/- 0.6% 6.9ms +/- 1.5% significant md5: ?? 2.0ms +/- 2.0% 2.0ms +/- 2.8% not conclusive: might be *1.010x as slow* sha1: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% date: 1.020x as fast 21.9ms +/- 0.6% 21.5ms +/- 0.9% significant format-tofte: 1.036x as fast 13.8ms +/- 0.9% 13.4ms +/- 1.0% significant format-xparb: ?? 8.0ms +/- 0.7% 8.1ms +/- 1.1% not conclusive: might be *1.007x as slow* math: - 16.0ms +/- 0.0% 16.0ms +/- 0.0% cordic: - 6.0ms +/- 0.0% 6.0ms +/- 0.0% partial-sums: - 7.0ms +/- 0.0% 7.0ms +/- 0.0% spectral-norm: - 3.0ms +/- 0.0% 3.0ms +/- 0.0% regexp: ?? 10.0ms +/- 0.0% 10.0ms +/- 0.4% not conclusive: might be *1.002x as slow* dna: ?? 10.0ms +/- 0.0% 10.0ms +/- 0.4% not conclusive: might be *1.002x as slow* string: - 51.7ms +/- 0.4% 51.6ms +/- 0.9% base64: 1.062x as fast 5.5ms +/- 2.6% 5.2ms +/- 2.0% significant fasta: 1.036x as fast 7.0ms +/- 0.6% 6.7ms +/- 1.9% significant tagcloud: ?? 13.0ms +/- 0.3% 13.0ms +/- 0.4% not conclusive: might be *1.002x as slow* unpack-code: *1.021x as slow* 20.2ms +/- 0.5% 20.6ms +/- 1.2% significant validate-input: ?? 6.1ms +/- 1.4% 6.1ms +/- 1.5% not conclusive: might be *1.003x as slow* [pizlo@minime PerformanceTests]
WebKit Review Bot
Comment 2 2011-07-14 20:50:12 PDT
Attachment 100924 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/jit/JITOpcodes.cpp:39: Alphabetical sorting problem. [build/include_order] [4] Total errors found: 1 in 9 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 3 2011-07-14 20:59:39 PDT
Created attachment 100925 [details] the patch (fix style, fix 32-bit) This updated patch fixes style and also fixes 32-bit builds, and adds a note to the ChangeLog that this patch only inlines allocation on 64-bit (32-bit proceeds same as before).
WebKit Review Bot
Comment 4 2011-07-14 21:03:33 PDT
Attachment 100925 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/ChangeLog:19: Line contains tab character. [whitespace/tab] [5] Total errors found: 1 in 9 files If any of these errors are false positives, please file a bug against check-webkit-style.
Filip Pizlo
Comment 5 2011-07-14 21:06:57 PDT
Created attachment 100926 [details] the patch (fix style) Replaced a tab character that emacs accidentally inserted.
Oliver Hunt
Comment 6 2011-07-14 21:50:51 PDT
(In reply to comment #1) > Created an attachment (id=100924) [details] > the patch > > This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider. > > I've lived on it for a day or so, and it seems to work. It passes all tests. But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that! Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work. I'm kind of surprised it doesn't help a microbenchmark of op_new_object, or do you mean only inlining op_new_object doesn't seem to help any of the benchmarks we track? (which is less surprising)
Filip Pizlo
Comment 7 2011-07-14 21:55:36 PDT
(In reply to comment #6) > (In reply to comment #1) > > Created an attachment (id=100924) [details] [details] > > the patch > > > > This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider. > > > > I've lived on it for a day or so, and it seems to work. It passes all tests. But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that! Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work. > > I'm kind of surprised it doesn't help a microbenchmark of op_new_object, or do you mean only inlining op_new_object doesn't seem to help any of the benchmarks we track? (which is less surprising) To clarify: I only meant that it didn't help V8 or SunSpider by more than a fraction of a percent. I haven't tried a micro benchmark; I "declared victory" once I saw that it gave a slight win on V8. It definitely helps micro benchmarks (see below). mytest2.js: for (var i = 0; i < 10000000; ++i) { var x = new Object(); } With the patch: [pizlo@minime OpenSource] time WebKitBuild/Release/jsc mytest2.js real 0m0.277s user 0m0.270s sys 0m0.006s Without the patch: [pizlo@minime OpenSource] time ../../secondary/OpenSource/WebKitBuild/Release/jsc mytest2.js real 0m0.474s user 0m0.264s sys 0m0.008s
Oliver Hunt
Comment 8 2011-07-14 21:58:36 PDT
Comment on attachment 100926 [details] the patch (fix style) View in context: https://bugs.webkit.org/attachment.cgi?id=100926&action=review r- but basically just because of the code duplication. Would be good to get 32bit going as well. > Source/JavaScriptCore/jit/JITOpcodes.cpp:352 > + // remove the object from the free list > + loadPtr(Address(regT0), regT1); > + storePtr(regT1, &sizeClass->firstFreeCell); > + > + // initialize the object's vtable > + storePtr(ImmPtr(m_globalData->jsFinalObjectVPtr), Address(regT0)); > + > + // initialize the object's structure > + storePtr(ImmPtr(m_codeBlock->globalObject()->emptyObjectStructure()), Address(regT0, JSCell::structureOffset())); > + > + // initialize the inheritor ID > + storePtr(ImmPtr(0), Address(regT0, JSObject::offsetOfInheritorID())); > + > + // initialize the object's property storage pointer > + addPtr(Imm32(sizeof(JSObject)), regT0, regT1); I think all of this could be hoisted into a separate helper function, rather than duplicating the code in op_create_this, well i guess with the exception of the initial structure
Filip Pizlo
Comment 9 2011-07-14 21:59:34 PDT
(In reply to comment #8) > (From update of attachment 100926 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=100926&action=review > > r- but basically just because of the code duplication. Would be good to get 32bit going as well. > That's a fine point. I'll hoist that, and implement a 32-bit version, and then resubmit.
Filip Pizlo
Comment 10 2011-07-15 15:18:53 PDT
Created attachment 101054 [details] the patch This reduces code duplication and adds 32-bit support.
Oliver Hunt
Comment 11 2011-07-18 11:00:02 PDT
Comment on attachment 101054 [details] the patch Sorry, i thought i had reviewed this :(
WebKit Review Bot
Comment 12 2011-07-18 11:56:01 PDT
Comment on attachment 101054 [details] the patch Clearing flags on attachment: 101054 Committed r91199: <http://trac.webkit.org/changeset/91199>
WebKit Review Bot
Comment 13 2011-07-18 11:56:05 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.