This is an investigation to see if using a Bitmap is faster. We expect it to be.
Created attachment 401630 [details] work in progress.
Created attachment 401678 [details] test patch 1: does NOT use adjusted bitmap start.
Created attachment 401679 [details] test patch 2: does use adjusted bitmap start.
Created attachment 401826 [details] test patch 3: post-tuned
Created attachment 402163 [details] proposed patch.
Created attachment 402164 [details] proposed patch.
Comment on attachment 402164 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review > Source/JavaScriptCore/heap/MarkedBlockInlines.h:341 > + freeAtoms ^= cellLocations; You could probably make this faster if it was one loop that did the whole thing. If I’m understanding right we will have a loop here for the ^= and loops above for other parts of the freeAtoms computation.
(In reply to Filip Pizlo from comment #7) > Comment on attachment 402164 [details] > proposed patch. > > View in context: > https://bugs.webkit.org/attachment.cgi?id=402164&action=review > > > Source/JavaScriptCore/heap/MarkedBlockInlines.h:341 > > + freeAtoms ^= cellLocations; > > You could probably make this faster if it was one loop that did the whole > thing. If I’m understanding right we will have a loop here for the ^= and > loops above for other parts of the freeAtoms computation. Thanks. I'll look into doing this in a follow up patch as it might get a messy. I'll land the current patch first which has already been tested.
Landed in r263195: <http://trac.webkit.org/r263195>.
<rdar://problem/64472738>
Comment on attachment 402164 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review This patch looks perfectly reasonable to me, but I am not familiar enough with this code to feel comfortable r+-ing it alone. > Source/JavaScriptCore/ChangeLog:98 > + there is n the location of m_cellSize. It is now moved up next to m_remaining, typo: "n" ? > Source/JavaScriptCore/heap/FreeList.h:108 > + // if there atoms still available for allocation. See comment blob below typo: "there atoms" => "there are atoms" > Source/JavaScriptCore/heap/FreeList.h:119 > + static ptrdiff_t offsetOfBitmapRows() { return OBJECT_OFFSETOF(FreeList, m_bitmap) - sizeof(AtomsBitmap::Word); } Maybe rename to offsetOfBitmapRowsMinusOne, or offsetOneBeforeBitmapRows, or something like this? Just in case anyone tries to use it without looking at where it is defined. > Source/JavaScriptCore/heap/FreeListInlines.h:100 > + while (rowBitmap) { It probably does not matter, but I can think of a way to make this loop a tad more efficient: ``` unsigned atomIndexInRow = 0; while (rowBitmap) { atomIndexInRow += ctz(rowBitmap); auto* cell = bitwise_cast<HeapCell*>(¤tMarkedBlockRowAddress[atomIndexInRow]); rowBitmap >>= (++atomIndexInRow); func(cell); } ``` > Source/JavaScriptCore/jit/AssemblyHelpers.cpp:561 > +#if CPU(ARM64) I am a bit wary that this code will never be tested since BITMAP_FREELIST is currently only set on x86_64. I don't have a better solution though. It is also a bit weird to have both "#if CPU(ARM64)" and "if (isARM64())", but just "if (isx86_64())". What is the criterion for picking one of these ways of checking the CPU?
Comment on attachment 402164 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review Thanks. >> Source/JavaScriptCore/ChangeLog:98 >> + there is n the location of m_cellSize. It is now moved up next to m_remaining, > > typo: "n" ? I think I meant to say "in". >> Source/JavaScriptCore/heap/FreeList.h:108 >> + // if there atoms still available for allocation. See comment blob below > > typo: "there atoms" => "there are atoms" Will fix. >> Source/JavaScriptCore/heap/FreeList.h:119 >> + static ptrdiff_t offsetOfBitmapRows() { return OBJECT_OFFSETOF(FreeList, m_bitmap) - sizeof(AtomsBitmap::Word); } > > Maybe rename to offsetOfBitmapRowsMinusOne, or offsetOneBeforeBitmapRows, or something like this? Just in case anyone tries to use it without looking at where it is defined. offsetOfBitmapRowsMinusOne sounds good. I'll try to apply this in my follow up patch to address Fil's idea for an optimization. >> Source/JavaScriptCore/heap/FreeListInlines.h:100 >> + while (rowBitmap) { > > It probably does not matter, but I can think of a way to make this loop a tad more efficient: > ``` > unsigned atomIndexInRow = 0; > while (rowBitmap) { > atomIndexInRow += ctz(rowBitmap); > auto* cell = bitwise_cast<HeapCell*>(¤tMarkedBlockRowAddress[atomIndexInRow]); > rowBitmap >>= (++atomIndexInRow); > func(cell); > } > ``` I don't see how this can work. Let's go over an example: say we have bits at index 2, 3, and 5. rowBitmap is ...0000101100b. 1st iteration: atomIndexInRow is 0, and becomes 2. We compute the cell to be ¤tMarkedBlockRowAddress[2]. All is well. atomIndexInRow becomes 3. rowBitmap becomes ...0000101b. 2nd iteration: atomIndexInRow is 3, and stays 3. We compute the cell to be ¤tMarkedBlockRowAddress[3]. All is well. atomIndexInRow becomes 4. rowBitmap becomes ...000b. 3rd iteration should have computed cell at ¤tMarkedBlockRowAddress[5], but actually thinks there is no cells left because rowBitmap has become 0. The bug lies in the right shift on rowBitmap. I think it needs to be shifted by the result of the ctz + 1, not atomIndexInRow + 1. And yeah, I'm not sure this is worth it. The code will definitely be trickier and harder to understand, and therefore error prone. >> Source/JavaScriptCore/jit/AssemblyHelpers.cpp:561 >> +#if CPU(ARM64) > > I am a bit wary that this code will never be tested since BITMAP_FREELIST is currently only set on x86_64. I don't have a better solution though. > It is also a bit weird to have both "#if CPU(ARM64)" and "if (isARM64())", but just "if (isx86_64())". What is the criterion for picking one of these ways of checking the CPU? It's unfortunate but the #if CPU(ARM64) is needed because getCachedMemoryTempRegisterIDAndInvalidate() is only available for ARM64. We'll be testing this code when we try to turn this on for ARM64 later after we try to do more to improve perf.
Comment on attachment 402164 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review LGTM too > Source/JavaScriptCore/ChangeLog:87 > + The middle path checks m_currentRowIndex to see if we have more rows to allocate > + from. For each m_currentRowIndex, we check its corresponding AtomsBitmap::Word > + in m_bitmap. If the word is non-zero, we copy it to m_currentRowBitmap and > + jump to the fast path to do the allocation. The middle path will update > + m_currentRowIndex to point to the current row we're allocating from. I think it'd be faster if m_currentRowIndex were instead a bitmap itself telling you the next row to allocate out of was. Might matter for blocks with larger cell sizes. > Source/JavaScriptCore/heap/FreeList.cpp:137 > + return m_currentRowBitmap & (one << atomIndexInRow); using the constant 1 is more readable imo > Source/JavaScriptCore/heap/FreeListInlines.h:51 > + rowBitmap &= ~(one << atomIndexInRow); nit: I think it's easier to read with the constant 1 than the name "one" > Source/JavaScriptCore/heap/MarkedBlockInlines.h:322 > + auto handleDeadCell = [&] (size_t i) { > + HeapCell* cell = reinterpret_cast_ptr<HeapCell*>(atomAt(i)); > + > + if (destructionMode != BlockHasNoDestructors) > + destroy(cell); > + > + if (sweepMode == SweepToFreeList) { > + if (scribbleMode == Scribble) > + scribble(cell, cellSize); > + ++count; > + } > + }; It'd be a lot easier to read if this just were inline where you used it. Anyways, you check destructionMode twice
Comment on attachment 402164 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review > Source/JavaScriptCore/ChangeLog:12 > + 2 loads - m_currentRowBitmap, m_currentMarkedBlockRowAddress on arm64 we can should turn this into a single loadpair
Comment on attachment 402164 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review >> Source/JavaScriptCore/ChangeLog:12 >> + 2 loads - m_currentRowBitmap, m_currentMarkedBlockRowAddress > > on arm64 we can should turn this into a single loadpair I can experiment with this to see if it's worth it. It impacts the layout of the fields. I think our cache line should be large enough that it doesn't matter, but I'll need to test it to be sure. >> Source/JavaScriptCore/ChangeLog:87 >> + m_currentRowIndex to point to the current row we're allocating from. > > I think it'd be faster if m_currentRowIndex were instead a bitmap itself telling you the next row to allocate out of was. Might matter for blocks with larger cell sizes. This is an intriguing idea. I'll give it a try. >> Source/JavaScriptCore/heap/FreeList.cpp:137 >> + return m_currentRowBitmap & (one << atomIndexInRow); > > using the constant 1 is more readable imo It is necessary to use "one" here because the alternative would be to use static_cast<AtomsBitmap::Word>(1). It is incorrect to just use literal 1 because literal 1 is of type int, and this shift will never produce a 64-bit value if we use that (which has been a source of bugs). We don't use 1ull either because technically, AtomsBitmap::Word can be 32-bits eventually if 32-bit ports adopts BITMAP_FREELIST. Using "one" is the portable and always correct solution here. >> Source/JavaScriptCore/heap/MarkedBlockInlines.h:322 >> + }; > > It'd be a lot easier to read if this just were inline where you used it. > > Anyways, you check destructionMode twice This is a carry over from the linked list implementation that calls handleDeadCell in 2 places. For the bitmap implementation, we only need to call it in one place. I'll move it to the call site. As for the redundant destructionMode check, you found a bug! handleDeadCell should be called unconditionally. The bug here is that: 1. we won't scribble dead cells if the cell does not have a destructor. 2. we won't count the dead cell. This results in FreeList::originalSize() always returning 0, which leads to a 0 being wrongly reported to Heap::didAllocate().
Re-opened since this is blocked by bug 215312