Bug 213071 - Replace JSC::FreeList linked list with a Bitmap.
Summary: Replace JSC::FreeList linked list with a Bitmap.
Status: REOPENED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Mark Lam
URL:
Keywords: InRadar
Depends on: 213255 215312
Blocks:
  Show dependency treegraph
 
Reported: 2020-06-11 01:44 PDT by Mark Lam
Modified: 2020-08-09 02:07 PDT (History)
12 users (show)

See Also:


Attachments
work in progress. (33.30 KB, patch)
2020-06-11 02:20 PDT, Mark Lam
no flags Details | Formatted Diff | Diff
test patch 1: does NOT use adjusted bitmap start. (33.65 KB, patch)
2020-06-11 13:41 PDT, Mark Lam
no flags Details | Formatted Diff | Diff
test patch 2: does use adjusted bitmap start. (33.65 KB, patch)
2020-06-11 13:42 PDT, Mark Lam
no flags Details | Formatted Diff | Diff
test patch 3: post-tuned (39.06 KB, patch)
2020-06-12 23:05 PDT, Mark Lam
no flags Details | Formatted Diff | Diff
proposed patch. (45.22 KB, patch)
2020-06-17 15:58 PDT, Mark Lam
no flags Details | Formatted Diff | Diff
proposed patch. (45.17 KB, patch)
2020-06-17 16:12 PDT, Mark Lam
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Lam 2020-06-11 01:44:58 PDT
This is an investigation to see if using a Bitmap is faster.  We expect it to be.
Comment 1 Mark Lam 2020-06-11 02:20:26 PDT
Created attachment 401630 [details]
work in progress.
Comment 2 Mark Lam 2020-06-11 13:41:52 PDT
Created attachment 401678 [details]
test patch 1: does NOT use adjusted bitmap start.
Comment 3 Mark Lam 2020-06-11 13:42:45 PDT
Created attachment 401679 [details]
test patch 2: does use adjusted bitmap start.
Comment 4 Mark Lam 2020-06-12 23:05:24 PDT
Created attachment 401826 [details]
test patch 3: post-tuned
Comment 5 Mark Lam 2020-06-17 15:58:42 PDT
Created attachment 402163 [details]
proposed patch.
Comment 6 Mark Lam 2020-06-17 16:12:07 PDT
Created attachment 402164 [details]
proposed patch.
Comment 7 Filip Pizlo 2020-06-17 18:44:34 PDT
Comment on attachment 402164 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review

> Source/JavaScriptCore/heap/MarkedBlockInlines.h:341
> +        freeAtoms ^= cellLocations;

You could probably make this faster if it was one loop that did the whole thing. If I’m understanding right we will have a loop here for the ^= and loops above for other parts of the freeAtoms computation.
Comment 8 Mark Lam 2020-06-17 18:53:43 PDT
(In reply to Filip Pizlo from comment #7)
> Comment on attachment 402164 [details]
> proposed patch.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=402164&action=review
> 
> > Source/JavaScriptCore/heap/MarkedBlockInlines.h:341
> > +        freeAtoms ^= cellLocations;
> 
> You could probably make this faster if it was one loop that did the whole
> thing. If I’m understanding right we will have a loop here for the ^= and
> loops above for other parts of the freeAtoms computation.

Thanks.  I'll look into doing this in a follow up patch as it might get a messy.  I'll land the current patch first which has already been tested.
Comment 9 Mark Lam 2020-06-17 18:58:08 PDT
Landed in r263195: <http://trac.webkit.org/r263195>.
Comment 10 Radar WebKit Bug Importer 2020-06-17 18:59:16 PDT
<rdar://problem/64472738>
Comment 11 Robin Morisset 2020-06-17 19:15:52 PDT
Comment on attachment 402164 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review

This patch looks perfectly reasonable to me, but I am not familiar enough with this code to feel comfortable r+-ing it alone.

> Source/JavaScriptCore/ChangeLog:98
> +           there is n the location of m_cellSize.  It is now moved up next to m_remaining,

typo: "n" ?

> Source/JavaScriptCore/heap/FreeList.h:108
> +        // if there atoms still available for allocation. See comment blob below

typo: "there atoms" => "there are atoms"

> Source/JavaScriptCore/heap/FreeList.h:119
> +    static ptrdiff_t offsetOfBitmapRows() { return OBJECT_OFFSETOF(FreeList, m_bitmap) - sizeof(AtomsBitmap::Word); }

Maybe rename to offsetOfBitmapRowsMinusOne, or offsetOneBeforeBitmapRows, or something like this? Just in case anyone tries to use it without looking at where it is defined.

> Source/JavaScriptCore/heap/FreeListInlines.h:100
> +            while (rowBitmap) {

It probably does not matter, but I can think of a way to make this loop a tad more efficient:
```
unsigned atomIndexInRow = 0;
while (rowBitmap) {
  atomIndexInRow += ctz(rowBitmap);
  auto* cell = bitwise_cast<HeapCell*>(&currentMarkedBlockRowAddress[atomIndexInRow]);
  rowBitmap >>= (++atomIndexInRow);
  func(cell);
}
```

> Source/JavaScriptCore/jit/AssemblyHelpers.cpp:561
> +#if CPU(ARM64)

I am a bit wary that this code will never be tested since BITMAP_FREELIST is currently only set on x86_64. I don't have a better solution though.
It is also a bit weird to have both "#if CPU(ARM64)" and "if (isARM64())", but just "if (isx86_64())". What is the criterion for picking one of these ways of checking the CPU?
Comment 12 Mark Lam 2020-06-17 20:09:00 PDT
Comment on attachment 402164 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review

Thanks.

>> Source/JavaScriptCore/ChangeLog:98
>> +           there is n the location of m_cellSize.  It is now moved up next to m_remaining,
> 
> typo: "n" ?

I think I meant to say "in".

>> Source/JavaScriptCore/heap/FreeList.h:108
>> +        // if there atoms still available for allocation. See comment blob below
> 
> typo: "there atoms" => "there are atoms"

Will fix.

>> Source/JavaScriptCore/heap/FreeList.h:119
>> +    static ptrdiff_t offsetOfBitmapRows() { return OBJECT_OFFSETOF(FreeList, m_bitmap) - sizeof(AtomsBitmap::Word); }
> 
> Maybe rename to offsetOfBitmapRowsMinusOne, or offsetOneBeforeBitmapRows, or something like this? Just in case anyone tries to use it without looking at where it is defined.

offsetOfBitmapRowsMinusOne sounds good.  I'll try to apply this in my follow up patch to address Fil's idea for an optimization.

>> Source/JavaScriptCore/heap/FreeListInlines.h:100
>> +            while (rowBitmap) {
> 
> It probably does not matter, but I can think of a way to make this loop a tad more efficient:
> ```
> unsigned atomIndexInRow = 0;
> while (rowBitmap) {
>   atomIndexInRow += ctz(rowBitmap);
>   auto* cell = bitwise_cast<HeapCell*>(&currentMarkedBlockRowAddress[atomIndexInRow]);
>   rowBitmap >>= (++atomIndexInRow);
>   func(cell);
> }
> ```

I don't see how this can work.  Let's go over an example: say we have bits at index 2, 3, and 5.

rowBitmap is ...0000101100b.
1st iteration: atomIndexInRow is 0, and becomes 2.  We compute the cell to be &currentMarkedBlockRowAddress[2].  All is well.
    atomIndexInRow becomes 3.
    rowBitmap becomes ...0000101b.
2nd iteration: atomIndexInRow is 3, and stays 3.  We compute the cell to be &currentMarkedBlockRowAddress[3].  All is well.
    atomIndexInRow becomes 4.
    rowBitmap becomes ...000b.
3rd iteration should have computed cell at &currentMarkedBlockRowAddress[5], but actually thinks there is no cells left because rowBitmap has become 0.

The bug lies in the right shift on rowBitmap.  I think it needs to be shifted by the result of the ctz + 1, not atomIndexInRow + 1.

And yeah, I'm not sure this is worth it.  The code will definitely be trickier and harder to understand, and therefore error prone.

>> Source/JavaScriptCore/jit/AssemblyHelpers.cpp:561
>> +#if CPU(ARM64)
> 
> I am a bit wary that this code will never be tested since BITMAP_FREELIST is currently only set on x86_64. I don't have a better solution though.
> It is also a bit weird to have both "#if CPU(ARM64)" and "if (isARM64())", but just "if (isx86_64())". What is the criterion for picking one of these ways of checking the CPU?

It's unfortunate but the #if CPU(ARM64) is needed because getCachedMemoryTempRegisterIDAndInvalidate() is only available for ARM64.  We'll be testing this code when we try to turn this on for ARM64 later after we try to do more to improve perf.
Comment 13 Saam Barati 2020-06-17 22:30:14 PDT
Comment on attachment 402164 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review

LGTM too

> Source/JavaScriptCore/ChangeLog:87
> +           The middle path checks m_currentRowIndex to see if we have more rows to allocate
> +           from.  For each m_currentRowIndex, we check its corresponding AtomsBitmap::Word
> +           in m_bitmap.  If the word is non-zero, we copy it to m_currentRowBitmap and
> +           jump to the fast path to do the allocation.  The middle path will update
> +           m_currentRowIndex to point to the current row we're allocating from.

I think it'd be faster if m_currentRowIndex were instead a bitmap itself telling you the next row to allocate out of was. Might matter for blocks with larger cell sizes.

> Source/JavaScriptCore/heap/FreeList.cpp:137
> +        return m_currentRowBitmap & (one << atomIndexInRow);

using the constant 1 is more readable imo

> Source/JavaScriptCore/heap/FreeListInlines.h:51
> +            rowBitmap &= ~(one << atomIndexInRow);

nit: I think it's easier to read with the constant 1 than the name "one"

> Source/JavaScriptCore/heap/MarkedBlockInlines.h:322
> +    auto handleDeadCell = [&] (size_t i) {
> +        HeapCell* cell = reinterpret_cast_ptr<HeapCell*>(atomAt(i));
> +
> +        if (destructionMode != BlockHasNoDestructors)
> +            destroy(cell);
> +
> +        if (sweepMode == SweepToFreeList) {
> +            if (scribbleMode == Scribble)
> +                scribble(cell, cellSize);
> +            ++count;
> +        }
> +    };

It'd be a lot easier to read if this just were inline where you used it.

Anyways, you check destructionMode twice
Comment 14 Saam Barati 2020-06-17 22:31:34 PDT
Comment on attachment 402164 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review

> Source/JavaScriptCore/ChangeLog:12
> +            2 loads - m_currentRowBitmap, m_currentMarkedBlockRowAddress

on arm64 we can should turn this into a single loadpair
Comment 15 Mark Lam 2020-06-17 23:50:33 PDT
Comment on attachment 402164 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=402164&action=review

>> Source/JavaScriptCore/ChangeLog:12
>> +            2 loads - m_currentRowBitmap, m_currentMarkedBlockRowAddress
> 
> on arm64 we can should turn this into a single loadpair

I can experiment with this to see if it's worth it.  It impacts the layout of the fields.  I think our cache line should be large enough that it doesn't matter, but I'll need to test it to be sure.

>> Source/JavaScriptCore/ChangeLog:87
>> +           m_currentRowIndex to point to the current row we're allocating from.
> 
> I think it'd be faster if m_currentRowIndex were instead a bitmap itself telling you the next row to allocate out of was. Might matter for blocks with larger cell sizes.

This is an intriguing idea.  I'll give it a try.

>> Source/JavaScriptCore/heap/FreeList.cpp:137
>> +        return m_currentRowBitmap & (one << atomIndexInRow);
> 
> using the constant 1 is more readable imo

It is necessary to use "one" here because the alternative would be to use static_cast<AtomsBitmap::Word>(1).  It is incorrect to just use literal 1 because literal 1 is of type int, and this shift will never produce a 64-bit value if we use that (which has been a source of bugs).  We don't use 1ull either because technically, AtomsBitmap::Word can be 32-bits eventually if 32-bit ports adopts BITMAP_FREELIST.  Using "one" is the portable and always correct solution here.

>> Source/JavaScriptCore/heap/MarkedBlockInlines.h:322
>> +    };
> 
> It'd be a lot easier to read if this just were inline where you used it.
> 
> Anyways, you check destructionMode twice

This is a carry over from the linked list implementation that calls handleDeadCell in 2 places.  For the bitmap implementation, we only need to call it in one place.  I'll move it to the call site.

As for the redundant destructionMode check, you found a bug!  handleDeadCell should be called unconditionally.  The bug here is that:
1. we won't scribble dead cells if the cell does not have a destructor.
2. we won't count the dead cell.  This results in FreeList::originalSize() always returning 0, which leads to a 0 being wrongly reported to Heap::didAllocate().
Comment 16 WebKit Commit Bot 2020-08-09 02:07:00 PDT
Re-opened since this is blocked by bug 215312