The JSC GC allocates objects by looping starting at the current block, and the current atom in the block, and proceeds until it finds a free atom, which it then destroys. This allocation path is both expensive and difficult to inline. The JSC GC should have an allocation fast path that (1) is fast and simple, (2) covers the great majority of cases, and (3) is complemented by a reasonably fast slow path.
Created attachment 100735 [details] the patch This patch results in the following performance wins: [pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results --v8 v8-v4-results/sunspider-results-2011-07-13-16.43.10.js v8-v4-results/sunspider-results-2011-07-13-16.38.09.js TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.019x as fast 1205.1ms +/- 0.4% 1182.6ms +/- 0.2% significant ============================================================================= v8: 1.019x as fast 1205.1ms +/- 0.4% 1182.6ms +/- 0.2% significant crypto: - 196.3ms +/- 0.5% 195.6ms +/- 0.7% deltablue: 1.026x as fast 252.7ms +/- 1.6% 246.3ms +/- 1.0% significant earley-boyer: 1.028x as fast 136.6ms +/- 0.4% 132.9ms +/- 0.4% significant raytrace: 1.013x as fast 75.4ms +/- 0.8% 74.4ms +/- 0.5% significant regexp: 1.061x as fast 114.4ms +/- 1.0% 107.8ms +/- 0.4% significant richards: - 227.8ms +/- 1.1% 226.2ms +/- 0.7% splay: 1.013x as fast 201.9ms +/- 1.0% 199.4ms +/- 0.8% significant [pizlo@minime PerformanceTests] ../Tools/Scripts/sunspider-compare-results sunspider-1.0-results/sunspider-results-2011-07-13-16.42.42.js sunspider-1.0-results/sunspider-results-2011-07-13-16.39.19.js TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.006x as fast 177.1ms +/- 0.2% 176.0ms +/- 0.2% significant ============================================================================= 3d: - 25.3ms +/- 0.8% 25.1ms +/- 0.4% cube: - 9.0ms +/- 0.4% 9.0ms +/- 0.8% morph: - 7.2ms +/- 2.6% 7.1ms +/- 1.1% raytrace: - 9.0ms +/- 0.4% 9.0ms +/- 0.0% access: - 22.4ms +/- 0.7% 22.3ms +/- 0.6% binary-trees: - 2.0ms +/- 2.8% 2.0ms +/- 2.0% fannkuch: ?? 11.1ms +/- 0.7% 11.2ms +/- 1.0% not conclusive: might be *1.009x as slow* nbody: - 6.0ms +/- 0.0% 6.0ms +/- 0.0% nsieve: 1.071x as fast 3.3ms +/- 4.0% 3.1ms +/- 2.8% significant bitops: ?? 15.5ms +/- 1.0% 15.6ms +/- 1.1% not conclusive: might be *1.001x as slow* 3bit-bits-in-byte: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% bits-in-byte: ?? 5.5ms +/- 2.6% 5.5ms +/- 2.8% not conclusive: might be *1.004x as slow* bitwise-and: - 3.0ms +/- 1.3% 3.0ms +/- 1.3% nsieve-bits: - 5.0ms +/- 0.0% 5.0ms +/- 0.0% controlflow: ?? 1.1ms +/- 8.7% 1.2ms +/- 9.1% not conclusive: might be *1.018x as slow* recursive: ?? 1.1ms +/- 8.7% 1.2ms +/- 9.1% not conclusive: might be *1.018x as slow* crypto: - 11.1ms +/- 0.8% 11.1ms +/- 0.9% aes: - 7.0ms +/- 0.6% 7.0ms +/- 0.6% md5: ?? 2.1ms +/- 3.7% 2.1ms +/- 4.1% not conclusive: might be *1.010x as slow* sha1: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% date: 1.033x as fast 22.6ms +/- 0.9% 21.9ms +/- 0.7% significant format-tofte: 1.019x as fast 14.0ms +/- 0.7% 13.8ms +/- 0.9% significant format-xparb: 1.057x as fast 8.6ms +/- 1.7% 8.1ms +/- 1.1% significant math: ?? 16.0ms +/- 0.3% 16.0ms +/- 0.3% not conclusive: might be *1.003x as slow* cordic: ?? 6.0ms +/- 0.7% 6.0ms +/- 0.0% not conclusive: might be *1.003x as slow* partial-sums: ?? 7.0ms +/- 0.0% 7.0ms +/- 0.6% not conclusive: might be *1.003x as slow* spectral-norm: - 3.0ms +/- 0.0% 3.0ms +/- 0.0% regexp: - 10.0ms +/- 0.4% 10.0ms +/- 0.0% dna: - 10.0ms +/- 0.4% 10.0ms +/- 0.0% string: - 53.0ms +/- 0.4% 53.0ms +/- 0.5% base64: 1.020x as fast 6.0ms +/- 0.0% 5.9ms +/- 1.6% significant fasta: - 7.0ms +/- 0.6% 7.0ms +/- 0.0% tagcloud: - 13.1ms +/- 0.5% 13.0ms +/- 0.4% unpack-code: ?? 20.6ms +/- 0.8% 20.8ms +/- 0.7% not conclusive: might be *1.010x as slow* validate-input: - 6.4ms +/- 2.3% 6.3ms +/- 2.4% [pizlo@minime PerformanceTests]
Attachment 100735 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/heap/Heap.h:28: Alphabetical sorting problem. [build/include_order] [4] Source/JavaScriptCore/heap/NewSpace.h:131: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/JavaScriptCore/heap/MarkedBlock.h:97: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Total errors found: 3 in 9 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 100736 [details] the patch (fix style)
Attachment 100736 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/heap/Heap.h:29: Alphabetical sorting problem. [build/include_order] [4] Total errors found: 1 in 9 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 100739 [details] the patch (more style fixes)
Comment on attachment 100739 [details] the patch (more style fixes) Attachment 100739 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/9021763
Created attachment 100747 [details] the patch (fix build) Reordering #include's to satisfy the style resulted in build failures.
Comment on attachment 100739 [details] the patch (more style fixes) View in context: https://bugs.webkit.org/attachment.cgi?id=100739&action=review r- due to the comments above, but also because the exports changes will no doubt break the $%#& windows build. > Source/JavaScriptCore/heap/Heap.h:264 > + clearAllocCache(); These functions are used for things other than GC -- clearAllocCache has the effect of clearing mark bits so this seems like it could cause badness, what am i missing? > Source/JavaScriptCore/heap/Heap.h:279 > + clearAllocCache(); ditto > Source/JavaScriptCore/heap/MarkedBlock.h:101 > + // These should be called immediately after a block is created. > + // Blessing for fast path creates a linked list, while blessing for > + // slow path creates dummy cells. > + FreeCell* blessNewBlockForFastPath(); > + void blessNewBlockForSlowPath(); I think this is most easily guarantee by making the MarkBlock constructor handle that itself, add an enum along the lines of enum MarkBlockAllocationPath { AllocationPathSlow, AllocationPathFast }; make the MarkBlock constructor take a MarkBlockAllocationPath parameter in its constructor, and decide which version of bless should be used in the constructor. > Source/JavaScriptCore/heap/MarkedBlock.h:105 > + void clearAllocCache(FreeCell* firstFree); I don't like this name, i find it confusing -- it says it is clearing a cache, when in reality it's also clearing all the mark bits as we would for a GC pass > Source/JavaScriptCore/heap/NewSpace.h:159 > + MarkedBlock::FreeCell* firstFree = sizeClass.firstFree; > + if (!firstFree) { > + // There are two possibilities for why we got here: > + // 1) We've exhausted the allocation cache for curBlock, in which case > + // curBlock == nextBlock, and we know that there is no reason to > + // repeat a lazy sweep of nextBlock because we won't find anything. > + // 2) Allocation caches have been cleared, in which case nextBlock may > + // have (and most likely does have) free cells, so we almost certainly > + // should do a lazySweep for nextBlock. This also implies that > + // curBlock == 0. > + > + if (sizeClass.curBlock) { > + ASSERT(sizeClass.curBlock == sizeClass.nextBlock); > + m_waterMark += sizeClass.nextBlock->capacity(); > + sizeClass.nextBlock = sizeClass.nextBlock->next(); > + sizeClass.curBlock = 0; > + } > + > + for (MarkedBlock*& block = sizeClass.nextBlock ; block; block = block->next()) { > + firstFree = block->lazySweep(); > + if (firstFree) { > + sizeClass.firstFree = firstFree; > + sizeClass.curBlock = block; > + break; > + } > + > + m_waterMark += block->capacity(); > + } > + > + if (!firstFree) > + return 0; > } > - > - return 0; > + > + ASSERT(firstFree); > + > + sizeClass.firstFree = firstFree->next; > + return static_cast<void*>(firstFree); This would be nicer as an early return: if (MarkedBlock::FreeCell* firstFree = sizeClass.firstFree) { sizeClass.firstFree = firstFree->next; return firstFree; } .... I think that makes the fast path much clearer and more obvious
Comment on attachment 100747 [details] the patch (fix build) r- for the reasons listed in the prior comment
Comment on attachment 100747 [details] the patch (fix build) View in context: https://bugs.webkit.org/attachment.cgi?id=100747&action=review > Source/JavaScriptCore/heap/Heap.h:129 > + void clearAllocCache(); We normally prefer not to abbreviate, so this would be named allocation cache rather than alloc cache. > Source/JavaScriptCore/heap/Heap.h:297 > + MarkedBlock::FreeCell* firstFree = sizeClass.firstFree; > + if (!firstFree) > + return allocateSlowCase(sizeClass); I wonder if this branch is properly predicted. We might want to see if using UNLIKELY() speeds this up still-more. > Source/JavaScriptCore/heap/Heap.h:300 > + return static_cast<void*>(firstFree); This cast should not be needed. Any pointer to a non-const non-volatile type should convert to void* without a cast. > Source/JavaScriptCore/heap/MarkedBlock.cpp:93 > + result = freeCell; Is the “loop through all cells, but return the last freed one” policy here intentional? I think this is something that deserves a comment, because it’s not obvious to me at least. > Source/JavaScriptCore/heap/MarkedBlock.cpp:107 > + result = freeCell; I have a similar question here. > Source/JavaScriptCore/heap/NewSpace.h:55 > + MarkedBlock::FreeCell* firstFree; We normally use nouns for data members, so we would call this firstFreeCell rather than firstFree. > Source/JavaScriptCore/heap/NewSpace.h:56 > + MarkedBlock* curBlock; We normally don’t abbreviate words like current to cur. > Source/JavaScriptCore/heap/NewSpace.h:159 > + return static_cast<void*>(firstFree); This cast should not be needed.
(In reply to comment #8) > (From update of attachment 100739 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=100739&action=review > > r- due to the comments above, but also because the exports changes will no doubt break the $%#& windows build. > > > Source/JavaScriptCore/heap/Heap.h:264 > > + clearAllocCache(); > > These functions are used for things other than GC -- clearAllocCache has the effect of clearing mark bits so this seems like it could cause badness, what am i missing? It's almost always safe to call this function, in the sense that it just undoes free-lists in SizeClasses that had them. It brings the heap back into a canonical form, where each MarkedBlock consists of a bunch of fully initialized JSCells, with mark bits indicating whether these JSCells are actually live. In particular, this method only clears mark bits for those objects that are on the free list. An object is placed on the free list only if it is not marked, and as it's placed, it also has its mark bit set to ensure that the allocator fast path doesn't have to do it. That means that every object still on a free list has the mark bit set, and should have its mark bit cleared if the free list is destroyed. Another implication of placing objects on the free list is that they no longer have a valid JSCell in them, so anyone iterating over JSCells will get massively confused. Hence the need to call clearAllocCache() before doing most anything that iterates over all cells or all blocks - this brings the block back into a "canonical" state where (1) mark bits are only set for things that are actually allocated and (2) all cells contains a valid JSCell. This brings up a possible idea for a better name for this method: canonicalizeBlocks()? > > > Source/JavaScriptCore/heap/Heap.h:279 > > + clearAllocCache(); > > ditto > > > Source/JavaScriptCore/heap/MarkedBlock.h:101 > > + // These should be called immediately after a block is created. > > + // Blessing for fast path creates a linked list, while blessing for > > + // slow path creates dummy cells. > > + FreeCell* blessNewBlockForFastPath(); > > + void blessNewBlockForSlowPath(); > > I think this is most easily guarantee by making the MarkBlock constructor handle that itself, add an enum along the lines of > enum MarkBlockAllocationPath { AllocationPathSlow, AllocationPathFast }; > make the MarkBlock constructor take a MarkBlockAllocationPath parameter in its constructor, and decide which version of bless should be used in the constructor. The reason why I didn't do it this way is that blessNewBlockForFastPath() returns a FreeCell*, which is not stored anywhere in MarkedBlock. It's only stored in SizeClass, because it's only relevant for the block that is currently used for allocation. I'm happy to change the code to do construction in one go, and have MarkedBlock::create() also return a FreeCell* (via a reference parameter) and then thread that through the relevant methods in Heap and NewSpace. > > Source/JavaScriptCore/heap/NewSpace.h:159 > > + MarkedBlock::FreeCell* firstFree = sizeClass.firstFree; > > + if (!firstFree) { > > + // There are two possibilities for why we got here: > > + // 1) We've exhausted the allocation cache for curBlock, in which case > > + // curBlock == nextBlock, and we know that there is no reason to > > + // repeat a lazy sweep of nextBlock because we won't find anything. > > + // 2) Allocation caches have been cleared, in which case nextBlock may > > + // have (and most likely does have) free cells, so we almost certainly > > + // should do a lazySweep for nextBlock. This also implies that > > + // curBlock == 0. > > + > > + if (sizeClass.curBlock) { > > + ASSERT(sizeClass.curBlock == sizeClass.nextBlock); > > + m_waterMark += sizeClass.nextBlock->capacity(); > > + sizeClass.nextBlock = sizeClass.nextBlock->next(); > > + sizeClass.curBlock = 0; > > + } > > + > > + for (MarkedBlock*& block = sizeClass.nextBlock ; block; block = block->next()) { > > + firstFree = block->lazySweep(); > > + if (firstFree) { > > + sizeClass.firstFree = firstFree; > > + sizeClass.curBlock = block; > > + break; > > + } > > + > > + m_waterMark += block->capacity(); > > + } > > + > > + if (!firstFree) > > + return 0; > > } > > - > > - return 0; > > + > > + ASSERT(firstFree); > > + > > + sizeClass.firstFree = firstFree->next; > > + return static_cast<void*>(firstFree); > > This would be nicer as an early return: > if (MarkedBlock::FreeCell* firstFree = sizeClass.firstFree) { > sizeClass.firstFree = firstFree->next; > return firstFree; > } > .... > > I think that makes the fast path much clearer and more obvious I made it a late return because this covers both the I-already-have-a-free-cell case and the I-need-to-scavenge-for-a-free-cell case. This way the NewSpace::allocate() method has two clear phases: phase one ensures that it's possible to allocate in the given SizeClass, and the second phase (the "fast path") performs the allocation. And anyway, the real fast path is in Heap::allocate(). NewSpace::allocate() is typically called with SizeClass.firstFree == 0. The only exceptions are if NewSpace::allocate() returns 0, and the Heap::allocate() decides to deal with it by allocating a new block, and then calls NewSpace::allocate() again.
Comment on attachment 100739 [details] the patch (more style fixes) Attachment 100739 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/9023818
(In reply to comment #10) Darin - thanks for catching those issues! To answer your point about the free list construction policy: > > Source/JavaScriptCore/heap/MarkedBlock.cpp:93 > > + result = freeCell; > > Is the “loop through all cells, but return the last freed one” policy here intentional? I think this is something that deserves a comment, because it’s not obvious to me at least. It's not intentional; as far as I can tell it makes no difference if the free list is constructed in a way that makes it travel forward through the block, or backward; and I just chose the latter. I can add a comment that explains this, or I'd be happy to construct a list that is in-order if that is more intuitive. -Filip
Comment on attachment 100739 [details] the patch (more style fixes) Attachment 100739 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/9023842
Created attachment 100868 [details] the patch (fix review)
(In reply to comment #13) > (In reply to comment #10) > > > Source/JavaScriptCore/heap/MarkedBlock.cpp:93 > > > + result = freeCell; > > > > Is the “loop through all cells, but return the last freed one” policy here intentional? I think this is something that deserves a comment, because it’s not obvious to me at least. > > It's not intentional; as far as I can tell it makes no difference if the free list is constructed in a way that makes it travel forward through the block, or backward; and I just chose the latter. I can add a comment that explains this, or I'd be happy to construct a list that is in-order if that is more intuitive. But regardless of direction, wouldn’t it be faster to return when you find a free cell instead of finishing the loop?
Created attachment 100870 [details] the patch (fix review) Fixed a part of Darin's review (unnecessary static_cast's) that I missed in the last patch.
(In reply to comment #16) > (In reply to comment #13) > > (In reply to comment #10) > > > > Source/JavaScriptCore/heap/MarkedBlock.cpp:93 > > > > + result = freeCell; > > > > > > Is the “loop through all cells, but return the last freed one” policy here intentional? I think this is something that deserves a comment, because it’s not obvious to me at least. > > > > It's not intentional; as far as I can tell it makes no difference if the free list is constructed in a way that makes it travel forward through the block, or backward; and I just chose the latter. I can add a comment that explains this, or I'd be happy to construct a list that is in-order if that is more intuitive. > > But regardless of direction, wouldn’t it be faster to return when you find a free cell instead of finishing the loop? I believe returning early would be slower, since that would mean that every allocation would take slow path. The point of the lazySweep() method is to build an as-long-as-reasonably-possible linked list of free cells that the fast path can chomp on.
Created attachment 100880 [details] the patch (fix windows)
Comment on attachment 100880 [details] the patch (fix windows) View in context: https://bugs.webkit.org/attachment.cgi?id=100880&action=review > Source/JavaScriptCore/heap/Heap.h:296 > + if (UNLIKELY(!firstFreeCell)) Did this have a performance effect? > Source/JavaScriptCore/heap/MarkedBlock.cpp:135 > + new (static_cast<void*>(current)) JSCell(*m_heap->globalData(), dummyMarkableCellStructure, JSCell::CreatingEarlyCell); Is this static_cast needed?
(In reply to comment #18) > I believe returning early would be slower Yes, I totally overlooked the fact that these functions both do two things. Oops.
Comment on attachment 100880 [details] the patch (fix windows) Clearing flags on attachment: 100880 Committed r91039: <http://trac.webkit.org/changeset/91039>
All reviewed patches have been landed. Closing bug.