Bug 64582

Summary: JSC JIT does not inline GC allocation fast paths
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, oliver, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
the patch
none
the patch (fix style, fix 32-bit)
none
the patch (fix style)
oliver: review-, oliver: commit-queue-
the patch none

Description Filip Pizlo 2011-07-14 20:10:13 PDT
The JSC GC now has easy-to-inline allocation fast paths.  But the JSC JIT still emits out-of-line stub calls for allocation.  This introduces two forms of overhead.  First, every allocation pays the price of call overhead, even if the allocation just involves a load, a branch, and a handful of stores.  Second, an inlined JIT-generated fast path can directly locate and access the appropriate SizeClass, which should simplify the fast path even more than would be possible with ahead-of-time compilation with a C++ compiler (since the C++ compiler does not statically know the address of the SizeClass and must instead find it by first loading the global data and then adding a constant).  The JSC JIT should inline the GC allocation fast path to alleviate these overheads.
Comment 1 Filip Pizlo 2011-07-14 20:47:41 PDT
Created attachment 100924 [details]
the patch

This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider.

I've lived on it for a day or so, and it seems to work.  It passes all tests.  But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that!  Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work.


[pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results --v8 v8-v4-results/sunspider-results-2011-07-14-20.03.02.js /Volumes/Data/pizlo/quinary/OpenSource/PerformanceTests/SunSpider/v8-v4-results/sunspider-results-2011-07-14-20.00.09.js


TEST              COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:      1.013x as fast    1166.2ms +/- 0.3%   1150.7ms +/- 0.3%     significant

=============================================================================

  v8:             1.013x as fast    1166.2ms +/- 0.3%   1150.7ms +/- 0.3%     significant
    crypto:       1.016x as fast     194.6ms +/- 0.6%    191.6ms +/- 0.4%     significant
    deltablue:    1.014x as fast     240.7ms +/- 0.8%    237.4ms +/- 0.5%     significant
    earley-boyer: 1.028x as fast     129.7ms +/- 0.3%    126.2ms +/- 0.4%     significant
    raytrace:     1.050x as fast      73.8ms +/- 0.8%     70.3ms +/- 1.4%     significant
    regexp:       ??                 105.9ms +/- 0.6%    106.2ms +/- 0.4%     not conclusive: might be *1.003x as slow*
    richards:     -                  223.8ms +/- 0.6%    223.8ms +/- 0.9% 
    splay:        1.013x as fast     197.7ms +/- 0.5%    195.2ms +/- 0.9%     significant



[pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results sunspider-1.0-results/sunspider-results-2011-07-14-20.04.12.js /Volumes/Data/pizlo/quinary/OpenSource/PerformanceTests/SunSpider/sunspider-1.0-results/sunspider-results-2011-07-14-20.04.56.js


TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           -                 174.4ms +/- 0.3%   173.9ms +/- 0.4% 

=============================================================================

  3d:                  -                  25.2ms +/- 1.0%    25.1ms +/- 0.9% 
    cube:              ??                  8.9ms +/- 1.0%     9.0ms +/- 0.8%     not conclusive: might be *1.009x as slow*
    morph:             -                   7.3ms +/- 3.3%     7.1ms +/- 1.4% 
    raytrace:          ??                  9.0ms +/- 0.0%     9.0ms +/- 0.9%     not conclusive: might be *1.004x as slow*

  access:              -                  22.1ms +/- 0.4%    22.1ms +/- 0.5% 
    binary-trees:      ??                  2.0ms +/- 2.0%     2.0ms +/- 2.8%     not conclusive: might be *1.010x as slow*
    fannkuch:          -                  11.1ms +/- 0.8%    11.1ms +/- 0.6% 
    nbody:             -                   6.0ms +/- 0.0%     6.0ms +/- 0.0% 
    nsieve:            ??                  3.0ms +/- 0.0%     3.0ms +/- 1.3%     not conclusive: might be *1.007x as slow*

  bitops:              ??                 15.4ms +/- 1.2%    15.5ms +/- 1.0%     not conclusive: might be *1.006x as slow*
    3bit-bits-in-byte: -                   2.0ms +/- 0.0%     2.0ms +/- 0.0% 
    bits-in-byte:      ??                  5.4ms +/- 2.6%     5.5ms +/- 2.6%     not conclusive: might be *1.019x as slow*
    bitwise-and:       -                   3.1ms +/- 2.5%     3.0ms +/- 1.9% 
    nsieve-bits:       ??                  5.0ms +/- 1.1%     5.0ms +/- 0.0%     not conclusive: might be *1.008x as slow*

  controlflow:         ??                  1.1ms +/- 6.4%     1.1ms +/- 7.8%     not conclusive: might be *1.038x as slow*
    recursive:         ??                  1.1ms +/- 6.4%     1.1ms +/- 7.8%     not conclusive: might be *1.038x as slow*

  crypto:              -                  11.0ms +/- 0.5%    10.9ms +/- 1.1% 
    aes:               1.017x as fast      7.0ms +/- 0.6%     6.9ms +/- 1.5%     significant
    md5:               ??                  2.0ms +/- 2.0%     2.0ms +/- 2.8%     not conclusive: might be *1.010x as slow*
    sha1:              -                   2.0ms +/- 0.0%     2.0ms +/- 0.0% 

  date:                1.020x as fast     21.9ms +/- 0.6%    21.5ms +/- 0.9%     significant
    format-tofte:      1.036x as fast     13.8ms +/- 0.9%    13.4ms +/- 1.0%     significant
    format-xparb:      ??                  8.0ms +/- 0.7%     8.1ms +/- 1.1%     not conclusive: might be *1.007x as slow*

  math:                -                  16.0ms +/- 0.0%    16.0ms +/- 0.0% 
    cordic:            -                   6.0ms +/- 0.0%     6.0ms +/- 0.0% 
    partial-sums:      -                   7.0ms +/- 0.0%     7.0ms +/- 0.0% 
    spectral-norm:     -                   3.0ms +/- 0.0%     3.0ms +/- 0.0% 

  regexp:              ??                 10.0ms +/- 0.0%    10.0ms +/- 0.4%     not conclusive: might be *1.002x as slow*
    dna:               ??                 10.0ms +/- 0.0%    10.0ms +/- 0.4%     not conclusive: might be *1.002x as slow*

  string:              -                  51.7ms +/- 0.4%    51.6ms +/- 0.9% 
    base64:            1.062x as fast      5.5ms +/- 2.6%     5.2ms +/- 2.0%     significant
    fasta:             1.036x as fast      7.0ms +/- 0.6%     6.7ms +/- 1.9%     significant
    tagcloud:          ??                 13.0ms +/- 0.3%    13.0ms +/- 0.4%     not conclusive: might be *1.002x as slow*
    unpack-code:       *1.021x as slow*   20.2ms +/- 0.5%    20.6ms +/- 1.2%     significant
    validate-input:    ??                  6.1ms +/- 1.4%     6.1ms +/- 1.5%     not conclusive: might be *1.003x as slow*
[pizlo@minime PerformanceTests]
Comment 2 WebKit Review Bot 2011-07-14 20:50:12 PDT
Attachment 100924 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/jit/JITOpcodes.cpp:39:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Filip Pizlo 2011-07-14 20:59:39 PDT
Created attachment 100925 [details]
the patch (fix style, fix 32-bit)

This updated patch fixes style and also fixes 32-bit builds, and adds a note to the ChangeLog that this patch only inlines allocation on 64-bit (32-bit proceeds same as before).
Comment 4 WebKit Review Bot 2011-07-14 21:03:33 PDT
Attachment 100925 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/ChangeLog:19:  Line contains tab character.  [whitespace/tab] [5]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 5 Filip Pizlo 2011-07-14 21:06:57 PDT
Created attachment 100926 [details]
the patch (fix style)

Replaced a tab character that emacs accidentally inserted.
Comment 6 Oliver Hunt 2011-07-14 21:50:51 PDT
(In reply to comment #1)
> Created an attachment (id=100924) [details]
> the patch
> 
> This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider.
> 
> I've lived on it for a day or so, and it seems to work.  It passes all tests.  But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that!  Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work.

I'm kind of surprised it doesn't help a microbenchmark of op_new_object, or do you mean only inlining op_new_object doesn't seem to help any of the benchmarks we track? (which is less surprising)
Comment 7 Filip Pizlo 2011-07-14 21:55:36 PDT
(In reply to comment #6)
> (In reply to comment #1)
> > Created an attachment (id=100924) [details] [details]
> > the patch
> > 
> > This patch works gives a 1.3% speed-up on V8, and is neutral on SunSpider.
> > 
> > I've lived on it for a day or so, and it seems to work.  It passes all tests.  But I'm not sure if I'm handling global objects (for initializing the structure in op_new_object) correctly, so I'd appreciate some feedback on that!  Separately I tested the speed-up of op_new_object in isolation, and it wasn't much (statistically significant but fraction of a percent); it definitely appears that op_create_this is doing most of the work.
> 
> I'm kind of surprised it doesn't help a microbenchmark of op_new_object, or do you mean only inlining op_new_object doesn't seem to help any of the benchmarks we track? (which is less surprising)

To clarify: I only meant that it didn't help V8 or SunSpider by more than a fraction of a percent.  I haven't tried a micro benchmark; I "declared victory" once I saw that it gave a slight win on V8.

It definitely helps micro benchmarks (see below).

mytest2.js:
for (var i = 0; i < 10000000; ++i) {
    var x = new Object();
}

With the patch:
[pizlo@minime OpenSource] time WebKitBuild/Release/jsc mytest2.js 

real	0m0.277s
user	0m0.270s
sys	0m0.006s

Without the patch:
[pizlo@minime OpenSource] time ../../secondary/OpenSource/WebKitBuild/Release/jsc mytest2.js 

real	0m0.474s
user	0m0.264s
sys	0m0.008s
Comment 8 Oliver Hunt 2011-07-14 21:58:36 PDT
Comment on attachment 100926 [details]
the patch (fix style)

View in context: https://bugs.webkit.org/attachment.cgi?id=100926&action=review

r- but basically just because of the code duplication.  Would be good to get 32bit going as well.

> Source/JavaScriptCore/jit/JITOpcodes.cpp:352
> +    // remove the object from the free list
> +    loadPtr(Address(regT0), regT1);
> +    storePtr(regT1, &sizeClass->firstFreeCell);
> +    
> +    // initialize the object's vtable
> +    storePtr(ImmPtr(m_globalData->jsFinalObjectVPtr), Address(regT0));
> +    
> +    // initialize the object's structure
> +    storePtr(ImmPtr(m_codeBlock->globalObject()->emptyObjectStructure()), Address(regT0, JSCell::structureOffset()));
> +    
> +    // initialize the inheritor ID
> +    storePtr(ImmPtr(0), Address(regT0, JSObject::offsetOfInheritorID()));
> +    
> +    // initialize the object's property storage pointer
> +    addPtr(Imm32(sizeof(JSObject)), regT0, regT1);

I think all of this could be hoisted into a separate helper function, rather than duplicating the code in op_create_this, well i guess with the exception of the initial structure
Comment 9 Filip Pizlo 2011-07-14 21:59:34 PDT
(In reply to comment #8)
> (From update of attachment 100926 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=100926&action=review
> 
> r- but basically just because of the code duplication.  Would be good to get 32bit going as well.
> 

That's a fine point.  I'll hoist that, and implement a 32-bit version, and then resubmit.
Comment 10 Filip Pizlo 2011-07-15 15:18:53 PDT
Created attachment 101054 [details]
the patch

This reduces code duplication and adds 32-bit support.
Comment 11 Oliver Hunt 2011-07-18 11:00:02 PDT
Comment on attachment 101054 [details]
the patch

Sorry, i thought i had reviewed this :(
Comment 12 WebKit Review Bot 2011-07-18 11:56:01 PDT
Comment on attachment 101054 [details]
the patch

Clearing flags on attachment: 101054

Committed r91199: <http://trac.webkit.org/changeset/91199>
Comment 13 WebKit Review Bot 2011-07-18 11:56:05 PDT
All reviewed patches have been landed.  Closing bug.