Bug 64493

Summary: GC allocation fast path has too many operations
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: darin, gns, oliver, webkit.review.bot, xan.lopez
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
the patch
none
the patch (fix style)
none
the patch (more style fixes)
webkit-ews: commit-queue-
the patch (fix build)
oliver: review-, oliver: commit-queue-
the patch (fix review)
none
the patch (fix review)
none
the patch (fix windows) none

Description Filip Pizlo 2011-07-13 17:23:28 PDT
The JSC GC allocates objects by looping starting at the current block, and the current atom in the block, and proceeds until it finds a free atom, which it then destroys.  This allocation path is both expensive and difficult to inline.  The JSC GC should have an allocation fast path that (1) is fast and simple, (2) covers the great majority of cases, and (3) is complemented by a reasonably fast slow path.
Comment 1 Filip Pizlo 2011-07-13 17:35:29 PDT
Created attachment 100735 [details]
the patch

This patch results in the following performance wins:


[pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results --v8  v8-v4-results/sunspider-results-2011-07-13-16.43.10.js v8-v4-results/sunspider-results-2011-07-13-16.38.09.js


TEST              COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:      1.019x as fast    1205.1ms +/- 0.4%   1182.6ms +/- 0.2%     significant

=============================================================================

  v8:             1.019x as fast    1205.1ms +/- 0.4%   1182.6ms +/- 0.2%     significant
    crypto:       -                  196.3ms +/- 0.5%    195.6ms +/- 0.7% 
    deltablue:    1.026x as fast     252.7ms +/- 1.6%    246.3ms +/- 1.0%     significant
    earley-boyer: 1.028x as fast     136.6ms +/- 0.4%    132.9ms +/- 0.4%     significant
    raytrace:     1.013x as fast      75.4ms +/- 0.8%     74.4ms +/- 0.5%     significant
    regexp:       1.061x as fast     114.4ms +/- 1.0%    107.8ms +/- 0.4%     significant
    richards:     -                  227.8ms +/- 1.1%    226.2ms +/- 0.7% 
    splay:        1.013x as fast     201.9ms +/- 1.0%    199.4ms +/- 0.8%     significant


[pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results sunspider-1.0-results/sunspider-results-2011-07-13-16.42.42.js sunspider-1.0-results/sunspider-results-2011-07-13-16.39.19.js


TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.006x as fast    177.1ms +/- 0.2%   176.0ms +/- 0.2%     significant

=============================================================================

  3d:                  -                  25.3ms +/- 0.8%    25.1ms +/- 0.4% 
    cube:              -                   9.0ms +/- 0.4%     9.0ms +/- 0.8% 
    morph:             -                   7.2ms +/- 2.6%     7.1ms +/- 1.1% 
    raytrace:          -                   9.0ms +/- 0.4%     9.0ms +/- 0.0% 

  access:              -                  22.4ms +/- 0.7%    22.3ms +/- 0.6% 
    binary-trees:      -                   2.0ms +/- 2.8%     2.0ms +/- 2.0% 
    fannkuch:          ??                 11.1ms +/- 0.7%    11.2ms +/- 1.0%     not conclusive: might be *1.009x as slow*
    nbody:             -                   6.0ms +/- 0.0%     6.0ms +/- 0.0% 
    nsieve:            1.071x as fast      3.3ms +/- 4.0%     3.1ms +/- 2.8%     significant

  bitops:              ??                 15.5ms +/- 1.0%    15.6ms +/- 1.1%     not conclusive: might be *1.001x as slow*
    3bit-bits-in-byte: -                   2.0ms +/- 0.0%     2.0ms +/- 0.0% 
    bits-in-byte:      ??                  5.5ms +/- 2.6%     5.5ms +/- 2.8%     not conclusive: might be *1.004x as slow*
    bitwise-and:       -                   3.0ms +/- 1.3%     3.0ms +/- 1.3% 
    nsieve-bits:       -                   5.0ms +/- 0.0%     5.0ms +/- 0.0% 

  controlflow:         ??                  1.1ms +/- 8.7%     1.2ms +/- 9.1%     not conclusive: might be *1.018x as slow*
    recursive:         ??                  1.1ms +/- 8.7%     1.2ms +/- 9.1%     not conclusive: might be *1.018x as slow*

  crypto:              -                  11.1ms +/- 0.8%    11.1ms +/- 0.9% 
    aes:               -                   7.0ms +/- 0.6%     7.0ms +/- 0.6% 
    md5:               ??                  2.1ms +/- 3.7%     2.1ms +/- 4.1%     not conclusive: might be *1.010x as slow*
    sha1:              -                   2.0ms +/- 0.0%     2.0ms +/- 0.0% 

  date:                1.033x as fast     22.6ms +/- 0.9%    21.9ms +/- 0.7%     significant
    format-tofte:      1.019x as fast     14.0ms +/- 0.7%    13.8ms +/- 0.9%     significant
    format-xparb:      1.057x as fast      8.6ms +/- 1.7%     8.1ms +/- 1.1%     significant

  math:                ??                 16.0ms +/- 0.3%    16.0ms +/- 0.3%     not conclusive: might be *1.003x as slow*
    cordic:            ??                  6.0ms +/- 0.7%     6.0ms +/- 0.0%     not conclusive: might be *1.003x as slow*
    partial-sums:      ??                  7.0ms +/- 0.0%     7.0ms +/- 0.6%     not conclusive: might be *1.003x as slow*
    spectral-norm:     -                   3.0ms +/- 0.0%     3.0ms +/- 0.0% 

  regexp:              -                  10.0ms +/- 0.4%    10.0ms +/- 0.0% 
    dna:               -                  10.0ms +/- 0.4%    10.0ms +/- 0.0% 

  string:              -                  53.0ms +/- 0.4%    53.0ms +/- 0.5% 
    base64:            1.020x as fast      6.0ms +/- 0.0%     5.9ms +/- 1.6%     significant
    fasta:             -                   7.0ms +/- 0.6%     7.0ms +/- 0.0% 
    tagcloud:          -                  13.1ms +/- 0.5%    13.0ms +/- 0.4% 
    unpack-code:       ??                 20.6ms +/- 0.8%    20.8ms +/- 0.7%     not conclusive: might be *1.010x as slow*
    validate-input:    -                   6.4ms +/- 2.3%     6.3ms +/- 2.4% 
[pizlo@minime PerformanceTests]
Comment 2 WebKit Review Bot 2011-07-13 17:37:36 PDT
Attachment 100735 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/heap/Heap.h:28:  Alphabetical sorting problem.  [build/include_order] [4]
Source/JavaScriptCore/heap/NewSpace.h:131:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Source/JavaScriptCore/heap/MarkedBlock.h:97:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Total errors found: 3 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Filip Pizlo 2011-07-13 17:43:24 PDT
Created attachment 100736 [details]
the patch (fix style)
Comment 4 WebKit Review Bot 2011-07-13 17:46:35 PDT
Attachment 100736 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/heap/Heap.h:29:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 5 Filip Pizlo 2011-07-13 17:48:14 PDT
Created attachment 100739 [details]
the patch (more style fixes)
Comment 6 Early Warning System Bot 2011-07-13 18:12:47 PDT
Comment on attachment 100739 [details]
the patch (more style fixes)

Attachment 100739 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/9021763
Comment 7 Filip Pizlo 2011-07-13 18:14:29 PDT
Created attachment 100747 [details]
the patch (fix build)

Reordering #include's to satisfy the style resulted in build failures.
Comment 8 Oliver Hunt 2011-07-13 18:24:41 PDT
Comment on attachment 100739 [details]
the patch (more style fixes)

View in context: https://bugs.webkit.org/attachment.cgi?id=100739&action=review

r- due to the comments above, but also because the exports changes will no doubt break the $%#& windows build.

> Source/JavaScriptCore/heap/Heap.h:264
> +        clearAllocCache();

These functions are used for things other than GC -- clearAllocCache has the effect of clearing mark bits so this seems like it could cause badness, what am i missing?

> Source/JavaScriptCore/heap/Heap.h:279
> +        clearAllocCache();

ditto

> Source/JavaScriptCore/heap/MarkedBlock.h:101
> +        // These should be called immediately after a block is created.
> +        // Blessing for fast path creates a linked list, while blessing for
> +        // slow path creates dummy cells.
> +        FreeCell* blessNewBlockForFastPath();
> +        void blessNewBlockForSlowPath();

I think this is most easily guarantee by making the MarkBlock constructor handle that itself, add an enum along the lines of
enum MarkBlockAllocationPath { AllocationPathSlow, AllocationPathFast };
make the MarkBlock constructor take a MarkBlockAllocationPath parameter in its constructor, and decide which version of bless should be used in the constructor.

> Source/JavaScriptCore/heap/MarkedBlock.h:105
> +        void clearAllocCache(FreeCell* firstFree);

I don't like this name, i find it confusing -- it says it is clearing a cache, when in reality it's also clearing all the mark bits as we would for a GC pass

> Source/JavaScriptCore/heap/NewSpace.h:159
> +        MarkedBlock::FreeCell* firstFree = sizeClass.firstFree;
> +        if (!firstFree) {
> +            // There are two possibilities for why we got here:
> +            // 1) We've exhausted the allocation cache for curBlock, in which case
> +            //    curBlock == nextBlock, and we know that there is no reason to
> +            //    repeat a lazy sweep of nextBlock because we won't find anything.
> +            // 2) Allocation caches have been cleared, in which case nextBlock may
> +            //    have (and most likely does have) free cells, so we almost certainly
> +            //    should do a lazySweep for nextBlock. This also implies that
> +            //    curBlock == 0.
> +            
> +            if (sizeClass.curBlock) {
> +                ASSERT(sizeClass.curBlock == sizeClass.nextBlock);
> +                m_waterMark += sizeClass.nextBlock->capacity();
> +                sizeClass.nextBlock = sizeClass.nextBlock->next();
> +                sizeClass.curBlock = 0;
> +            }
> +            
> +            for (MarkedBlock*& block = sizeClass.nextBlock ; block; block = block->next()) {
> +                firstFree = block->lazySweep();
> +                if (firstFree) {
> +                    sizeClass.firstFree = firstFree;
> +                    sizeClass.curBlock = block;
> +                    break;
> +                }
> +                
> +                m_waterMark += block->capacity();
> +            }
> +            
> +            if (!firstFree)
> +                return 0;
>          }
> -
> -        return 0;
> +        
> +        ASSERT(firstFree);
> +        
> +        sizeClass.firstFree = firstFree->next;
> +        return static_cast<void*>(firstFree);

This would be nicer as an early return:
if (MarkedBlock::FreeCell* firstFree = sizeClass.firstFree) {
     sizeClass.firstFree = firstFree->next;
     return firstFree;
}
....

I think that makes the fast path much clearer and more obvious
Comment 9 Oliver Hunt 2011-07-13 18:43:36 PDT
Comment on attachment 100747 [details]
the patch (fix build)

r- for the reasons listed in the prior comment
Comment 10 Darin Adler 2011-07-13 18:48:26 PDT
Comment on attachment 100747 [details]
the patch (fix build)

View in context: https://bugs.webkit.org/attachment.cgi?id=100747&action=review

> Source/JavaScriptCore/heap/Heap.h:129
> +        void clearAllocCache();

We normally prefer not to abbreviate, so this would be named allocation cache rather than alloc cache.

> Source/JavaScriptCore/heap/Heap.h:297
> +        MarkedBlock::FreeCell* firstFree = sizeClass.firstFree;
> +        if (!firstFree)
> +            return allocateSlowCase(sizeClass);

I wonder if this branch is properly predicted. We might want to see if using UNLIKELY() speeds this up still-more.

> Source/JavaScriptCore/heap/Heap.h:300
> +        return static_cast<void*>(firstFree);

This cast should not be needed. Any pointer to a non-const non-volatile type should convert to void* without a cast.

> Source/JavaScriptCore/heap/MarkedBlock.cpp:93
> +            result = freeCell;

Is the “loop through all cells, but return the last freed one” policy here intentional? I think this is something that deserves a comment, because it’s not obvious to me at least.

> Source/JavaScriptCore/heap/MarkedBlock.cpp:107
> +        result = freeCell;

I have a similar question here.

> Source/JavaScriptCore/heap/NewSpace.h:55
> +            MarkedBlock::FreeCell* firstFree;

We normally use nouns for data members, so we would call this firstFreeCell rather than firstFree.

> Source/JavaScriptCore/heap/NewSpace.h:56
> +            MarkedBlock* curBlock;

We normally don’t abbreviate words like current to cur.

> Source/JavaScriptCore/heap/NewSpace.h:159
> +        return static_cast<void*>(firstFree);

This cast should not be needed.
Comment 11 Filip Pizlo 2011-07-13 19:18:38 PDT
(In reply to comment #8)
> (From update of attachment 100739 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=100739&action=review
> 
> r- due to the comments above, but also because the exports changes will no doubt break the $%#& windows build.
> 
> > Source/JavaScriptCore/heap/Heap.h:264
> > +        clearAllocCache();
> 
> These functions are used for things other than GC -- clearAllocCache has the effect of clearing mark bits so this seems like it could cause badness, what am i missing?

It's almost always safe to call this function, in the sense that it just undoes free-lists in SizeClasses that had them.  It brings the heap back into a canonical form, where each MarkedBlock consists of a bunch of fully initialized JSCells, with mark bits indicating whether these JSCells are actually live.

In particular, this method only clears mark bits for those objects that are on the free list.  An object is placed on the free list only if it is not marked, and as it's placed, it also has its mark bit set to ensure that the allocator fast path doesn't have to do it.  That means that every object still on a free list has the mark bit set, and should have its mark bit cleared if the free list is destroyed.

Another implication of placing objects on the free list is that they no longer have a valid JSCell in them, so anyone iterating over JSCells will get massively confused.  Hence the need to call clearAllocCache() before doing most anything that iterates over all cells or all blocks - this brings the block back into a "canonical" state where (1) mark bits are only set for things that are actually allocated and (2) all cells contains a valid JSCell.

This brings up a possible idea for a better name for this method: canonicalizeBlocks()?

> 
> > Source/JavaScriptCore/heap/Heap.h:279
> > +        clearAllocCache();
> 
> ditto
> 
> > Source/JavaScriptCore/heap/MarkedBlock.h:101
> > +        // These should be called immediately after a block is created.
> > +        // Blessing for fast path creates a linked list, while blessing for
> > +        // slow path creates dummy cells.
> > +        FreeCell* blessNewBlockForFastPath();
> > +        void blessNewBlockForSlowPath();
> 
> I think this is most easily guarantee by making the MarkBlock constructor handle that itself, add an enum along the lines of
> enum MarkBlockAllocationPath { AllocationPathSlow, AllocationPathFast };
> make the MarkBlock constructor take a MarkBlockAllocationPath parameter in its constructor, and decide which version of bless should be used in the constructor.

The reason why I didn't do it this way is that blessNewBlockForFastPath() returns a FreeCell*, which is not stored anywhere in MarkedBlock.  It's only stored in SizeClass, because it's only relevant for the block that is currently used for allocation.

I'm happy to change the code to do construction in one go, and have MarkedBlock::create() also return a FreeCell* (via a reference parameter) and then thread that through the relevant methods in Heap and NewSpace.

> > Source/JavaScriptCore/heap/NewSpace.h:159
> > +        MarkedBlock::FreeCell* firstFree = sizeClass.firstFree;
> > +        if (!firstFree) {
> > +            // There are two possibilities for why we got here:
> > +            // 1) We've exhausted the allocation cache for curBlock, in which case
> > +            //    curBlock == nextBlock, and we know that there is no reason to
> > +            //    repeat a lazy sweep of nextBlock because we won't find anything.
> > +            // 2) Allocation caches have been cleared, in which case nextBlock may
> > +            //    have (and most likely does have) free cells, so we almost certainly
> > +            //    should do a lazySweep for nextBlock. This also implies that
> > +            //    curBlock == 0.
> > +            
> > +            if (sizeClass.curBlock) {
> > +                ASSERT(sizeClass.curBlock == sizeClass.nextBlock);
> > +                m_waterMark += sizeClass.nextBlock->capacity();
> > +                sizeClass.nextBlock = sizeClass.nextBlock->next();
> > +                sizeClass.curBlock = 0;
> > +            }
> > +            
> > +            for (MarkedBlock*& block = sizeClass.nextBlock ; block; block = block->next()) {
> > +                firstFree = block->lazySweep();
> > +                if (firstFree) {
> > +                    sizeClass.firstFree = firstFree;
> > +                    sizeClass.curBlock = block;
> > +                    break;
> > +                }
> > +                
> > +                m_waterMark += block->capacity();
> > +            }
> > +            
> > +            if (!firstFree)
> > +                return 0;
> >          }
> > -
> > -        return 0;
> > +        
> > +        ASSERT(firstFree);
> > +        
> > +        sizeClass.firstFree = firstFree->next;
> > +        return static_cast<void*>(firstFree);
> 
> This would be nicer as an early return:
> if (MarkedBlock::FreeCell* firstFree = sizeClass.firstFree) {
>      sizeClass.firstFree = firstFree->next;
>      return firstFree;
> }
> ....
> 
> I think that makes the fast path much clearer and more obvious

I made it a late return because this covers both the I-already-have-a-free-cell case and the I-need-to-scavenge-for-a-free-cell case.  This way the NewSpace::allocate() method has two clear phases: phase one ensures that it's possible to allocate in the given SizeClass, and the second phase (the "fast path") performs the allocation.

And anyway, the real fast path is in Heap::allocate().  NewSpace::allocate() is typically called with SizeClass.firstFree == 0.  The only exceptions are if NewSpace::allocate() returns 0, and the Heap::allocate() decides to deal with it by allocating a new block, and then calls NewSpace::allocate() again.
Comment 12 Gyuyoung Kim 2011-07-13 19:21:03 PDT
Comment on attachment 100739 [details]
the patch (more style fixes)

Attachment 100739 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/9023818
Comment 13 Filip Pizlo 2011-07-13 19:21:30 PDT
(In reply to comment #10)

Darin - thanks for catching those issues!  To answer your point about the free list construction policy:

> > Source/JavaScriptCore/heap/MarkedBlock.cpp:93
> > +            result = freeCell;
> 
> Is the “loop through all cells, but return the last freed one” policy here intentional? I think this is something that deserves a comment, because it’s not obvious to me at least.

It's not intentional; as far as I can tell it makes no difference if the free list is constructed in a way that makes it travel forward through the block, or backward; and I just chose the latter.  I can add a comment that explains this, or I'd be happy to construct a list that is in-order if that is more intuitive.

-Filip
Comment 14 Gustavo Noronha (kov) 2011-07-13 21:23:31 PDT
Comment on attachment 100739 [details]
the patch (more style fixes)

Attachment 100739 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/9023842
Comment 15 Filip Pizlo 2011-07-14 15:01:09 PDT
Created attachment 100868 [details]
the patch (fix review)
Comment 16 Darin Adler 2011-07-14 15:04:37 PDT
(In reply to comment #13)
> (In reply to comment #10)
> > > Source/JavaScriptCore/heap/MarkedBlock.cpp:93
> > > +            result = freeCell;
> > 
> > Is the “loop through all cells, but return the last freed one” policy here intentional? I think this is something that deserves a comment, because it’s not obvious to me at least.
> 
> It's not intentional; as far as I can tell it makes no difference if the free list is constructed in a way that makes it travel forward through the block, or backward; and I just chose the latter.  I can add a comment that explains this, or I'd be happy to construct a list that is in-order if that is more intuitive.

But regardless of direction, wouldn’t it be faster to return when you find a free cell instead of finishing the loop?
Comment 17 Filip Pizlo 2011-07-14 15:08:51 PDT
Created attachment 100870 [details]
the patch (fix review)

Fixed a part of Darin's review (unnecessary static_cast's) that I missed in the last patch.
Comment 18 Filip Pizlo 2011-07-14 15:15:19 PDT
(In reply to comment #16)
> (In reply to comment #13)
> > (In reply to comment #10)
> > > > Source/JavaScriptCore/heap/MarkedBlock.cpp:93
> > > > +            result = freeCell;
> > > 
> > > Is the “loop through all cells, but return the last freed one” policy here intentional? I think this is something that deserves a comment, because it’s not obvious to me at least.
> > 
> > It's not intentional; as far as I can tell it makes no difference if the free list is constructed in a way that makes it travel forward through the block, or backward; and I just chose the latter.  I can add a comment that explains this, or I'd be happy to construct a list that is in-order if that is more intuitive.
> 
> But regardless of direction, wouldn’t it be faster to return when you find a free cell instead of finishing the loop?

I believe returning early would be slower, since that would mean that every allocation would take slow path.  The point of the lazySweep() method is to build an as-long-as-reasonably-possible linked list of free cells that the fast path can chomp on.
Comment 19 Filip Pizlo 2011-07-14 15:51:04 PDT
Created attachment 100880 [details]
the patch (fix windows)
Comment 20 Darin Adler 2011-07-14 16:12:05 PDT
Comment on attachment 100880 [details]
the patch (fix windows)

View in context: https://bugs.webkit.org/attachment.cgi?id=100880&action=review

> Source/JavaScriptCore/heap/Heap.h:296
> +        if (UNLIKELY(!firstFreeCell))

Did this have a performance effect?

> Source/JavaScriptCore/heap/MarkedBlock.cpp:135
> +        new (static_cast<void*>(current)) JSCell(*m_heap->globalData(), dummyMarkableCellStructure, JSCell::CreatingEarlyCell);

Is this static_cast needed?
Comment 21 Darin Adler 2011-07-14 17:33:22 PDT
(In reply to comment #18)
> I believe returning early would be slower

Yes, I totally overlooked the fact that these functions both do two things. Oops.
Comment 22 WebKit Review Bot 2011-07-14 18:40:38 PDT
Comment on attachment 100880 [details]
the patch (fix windows)

Clearing flags on attachment: 100880

Committed r91039: <http://trac.webkit.org/changeset/91039>
Comment 23 WebKit Review Bot 2011-07-14 18:40:44 PDT
All reviewed patches have been landed.  Closing bug.