Bug 75181 - Implement a new allocator for backing stores
Summary: Implement a new allocator for backing stores
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Mark Hahnenberg
URL:
Keywords:
Depends on:
Blocks: 76665
Reported: 2011-12-23 17:07 PST by Mark Hahnenberg
Modified: 2012-01-19 14:38 PST
CC List: 6 users

See Also:


Attachments
Patch (87.80 KB, patch)
2012-01-03 14:13 PST, Mark Hahnenberg
no flags
Bencher results (7.44 KB, text/plain)
2012-01-03 17:11 PST, Mark Hahnenberg
no flags
Splay benchmark profiling results (349.71 KB, image/png)
2012-01-04 15:49 PST, Mark Hahnenberg
no flags
Splay CPU utilization results (292.57 KB, image/png)
2012-01-04 16:06 PST, Mark Hahnenberg
no flags
Splay CPU utilization results without bump allocator (366.04 KB, image/png)
2012-01-04 16:22 PST, Mark Hahnenberg
no flags
Patch (97.49 KB, patch)
2012-01-06 15:15 PST, Mark Hahnenberg
no flags
Patch (113.81 KB, patch)
2012-01-12 17:22 PST, Mark Hahnenberg
fpizlo: review+

Description Mark Hahnenberg 2011-12-23 17:07:53 PST
We want to move away from using fastMalloc for the backing stores of some of our objects (e.g. JSArray, JSObject, JSString, etc.).  These backing stores have a nice property: they only have a single owner (i.e. a single pointer to them at any one time).  One way we can take advantage of this property is to implement a simple bump allocator/copying collector that runs alongside our normal mark/sweep collector and only needs to update the single owner pointer, rather than having to redirect an arbitrary number of pointers from from-space to to-space.

This plan gives us a number of benefits. We can beat fastMalloc in both performance and memory usage, we can track how much memory we're using far more accurately than the rough estimate we currently get through reportExtraMemoryCost, and we can allocate objects of arbitrary size (as opposed to being limited to size classes, as we have been historically). This is also another step toward moving away from lazy destruction, which will improve our memory footprint.

We can start by creating said allocator and moving the ArrayStorage for JSArray to use it rather than fastMalloc.

The design of the collector is as follows:
Allocation:
-The collector allocates 64KB chunks from the OS to use for object allocation.
-Each chunk contains an offset, a flag indicating if the block has been pinned, and a payload, along with next and prev pointers so that they can be put in DoublyLinkedLists.
-Any allocation greater than 64KB gets its own separate oversize block, which is managed separately from the rest.
-If the allocator receives a request for more than the remaining amount in the current block, it grabs a fresh block.
-Grabbing a fresh block means grabbing one off of the global free list (which is now shared between the mark/sweep allocator and the bump allocator) if there is one. If there isn't, we do one of two things: allocate a new block from the OS if we're not ready for a GC yet, or run a GC and then try again. If we still don't have enough space after the GC, we allocate a new block from the OS.
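
A rough sketch of the allocation path above, assuming illustrative names (BumpBlockSketch, BumpSpaceSketch, getFreshBlock are stand-ins, not the classes in the patch):

#include <cstddef>

static const size_t blockSize = 64 * 1024; // 64KB chunks requested from the OS

struct BumpBlockSketch {
    BumpBlockSketch* next;   // DoublyLinkedList membership
    BumpBlockSketch* prev;
    size_t offset;           // current bump offset into the payload
    bool pinned;             // set during conservative scanning
    char payload[1];         // allocation payload follows the header
};

class BumpSpaceSketch {
public:
    void* allocate(size_t bytes)
    {
        if (bytes > payloadCapacity())
            return allocateOversize(bytes);   // oversize blocks are managed separately
        if (m_currentBlock->offset + bytes > payloadCapacity())
            m_currentBlock = getFreshBlock(); // free list, then new OS block or GC-and-retry
        void* result = m_currentBlock->payload + m_currentBlock->offset;
        m_currentBlock->offset += bytes;
        return result;
    }

private:
    static size_t payloadCapacity() { return blockSize - offsetof(BumpBlockSketch, payload); }
    void* allocateOversize(size_t bytes);     // gets its own block, tracked independently
    BumpBlockSketch* getFreshBlock();         // shared free list -> new OS block -> GC, then retry
    BumpBlockSketch* m_currentBlock;
};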

Garbage collection:
-At the start of garbage collection, during conservative stack scanning, if we encounter what appears to be a pointer to a bump-allocated block of memory, we pin that block so that it will not be copied for this round of collection.
-We also pin any oversize blocks that we encounter, which effectively doubles as a "mark bit" for that block. Any oversize blocks that aren't pinned at the end of copying are given back to the OS.
-Marking threads are now also responsible for copying bump-allocated objects to newSpace.
-Each marking thread has a private 64KB block into which it copies bump-allocated objects that it encounters.
-When that block fills up, the marking thread gives it back to the allocator and requests a new one.
-When all marking has concluded, each thread gives back its copy block, even if it isn't full.
-At the conclusion of copying (which is done by the end of the marking phase), we un-pin any pinned blocks and give any blocks left in from-space to the global free list.
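
And a similarly hedged sketch of the collection side: pinning during conservative scanning, and per-thread copying into a private copy block. The helper declarations are placeholders, and this assumes blocks are 64KB-aligned allocations:

#include <cstddef>
#include <cstdint>
#include <cstring>

bool isBumpSpaceBlock(BumpBlockSketch*);     // e.g. Bloom-filter check plus exact set lookup
void giveBackToAllocator(BumpBlockSketch*);  // copy blocks are returned even if only partly full
BumpBlockSketch* requestFreshCopyBlock();

void pinIfPointsIntoBumpSpace(void* candidate)
{
    // Mask the candidate pointer down to its 64KB block boundary.
    BumpBlockSketch* block = reinterpret_cast<BumpBlockSketch*>(
        reinterpret_cast<uintptr_t>(candidate) & ~static_cast<uintptr_t>(blockSize - 1));
    if (isBumpSpaceBlock(block))
        block->pinned = true;                // the block survives in place for this collection
}

void* copyToNewSpace(BumpBlockSketch*& copyBlock, void* oldStorage, size_t bytes)
{
    if (copyBlock->offset + bytes > blockSize - offsetof(BumpBlockSketch, payload)) {
        giveBackToAllocator(copyBlock);      // hand back the full block
        copyBlock = requestFreshCopyBlock(); // and get a fresh private one
    }
    void* newStorage = copyBlock->payload + copyBlock->offset;
    copyBlock->offset += bytes;
    std::memcpy(newStorage, oldStorage, bytes);
    return newStorage;                       // the caller updates the single owner pointer
}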
Comment 1 Mark Hahnenberg 2012-01-03 14:13:36 PST
Created attachment 120993 [details]
Patch
Comment 2 Sam Weinig 2012-01-03 15:23:36 PST
Comment on attachment 120993 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=120993&action=review

> Source/JavaScriptCore/heap/BumpBlock.h:2
> +#ifndef BumpBlock_h
> +#define BumpBlock_h

Needs license.

> Source/JavaScriptCore/heap/BumpSpace.h:2
> +#ifndef BumpSpace_h
> +#define BumpSpace_h

Needs license.

> Source/JavaScriptCore/heap/BumpSpaceInlineMethods.h:2
> +#ifndef BumpSpaceInlineMethods_h
> +#define BumpSpaceInlineMethods_h

Needs license.
Comment 3 Filip Pizlo 2012-01-03 16:20:25 PST
Comment on attachment 120993 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=120993&action=review

R=me.  Can you post the overall performance results?  For something like this it would be good to do bencher and also in-browser tests.  Also, clarify the thread safe comments.  For the other comments if there is something obvious that can be done now, then do it; otherwise post a Radar or bugzilla bug regarding future direction (i.e. tightening up the fast path code for the allocator).

> Source/JavaScriptCore/heap/BumpSpaceInlineMethods.h:63
> +// Needs to be threadsafe.

It would be good to clarify these comments.  Does this mean that it needs to be called under a lock?  Or something else?

> Source/JavaScriptCore/heap/BumpSpaceInlineMethods.h:244
> +inline void* BumpSpace::allocate(size_t bytes)
> +{
> +    ASSERT(!m_heap->globalData()->isInitializingObject());
> +
> +    if (isOversize(bytes))
> +        return allocateOversize(bytes);
> +    
> +    if (!fitsInCurrentBlock(bytes)) {
> +        addNewBlock();
> +        m_toSpaceFilter.add(reinterpret_cast<Bits>(m_currentBlock));
> +        m_toSpaceSet.add(m_currentBlock);
> +    }
> +    
> +    m_totalMemoryUtilized += bytes;
> +    return allocateFromBlock(m_currentBlock, bytes);
> +}

Is this your fast path?  If so, it would be good if it could be turned into something like:

if (isOversize(bytes) || !fitsInCurrentBlock(bytes)) return callSomeOutOfLineSlowPath(bytes);
return allocateFromBlock(m_currentBlock, bytes);

Also, it would be good if:

- m_totalMemoryUtilized was lazily updated when a block is done being allocated in, since your eager update of this value currently means an extra load and store.

- The data needed to allocate in the current block was kept in BumpSpace, or in something like a BumpAllocator class.  I.e. if you want to allocate stuff (so you're either a GC thread or you're the application) then you have an instance of BumpAllocator that is easily reachable (say by offsetting into JSGlobalData, if you're the application), and then all of the fields (i.e. m_offset) needed to allocate are inside of BumpAllocator.  Right now you're taking an extra load to get to the BumpBlock::m_offset.  The BumpBlock::m_offset field can then be updated after you switch from one block to another.

It's OK if this initial patch doesn't do these things, but it would be great to add a FIXME somewhere documenting this, or else create a Radar that indicates that this is what should be done.

The point is that it will (a) simplify the inline asm code that the JIT will use for allocation and (b) speed things up a fair bit as we start relying on this allocator more.
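
A rough illustration of the shape being suggested (hypothetical names; BumpAllocator and allocateSlowCase here follow the suggestion above, not the code in the patch):

#include <cstddef>

class BumpAllocator {
public:
    void* allocate(size_t bytes)
    {
        if (isOversize(bytes) || m_offset + bytes > m_capacity)
            return allocateSlowCase(bytes); // out of line: oversize, block switch, maybe GC
        void* result = m_payload + m_offset;
        m_offset += bytes;                  // no other bookkeeping on the fast path
        return result;
    }

private:
    static bool isOversize(size_t bytes) { return bytes > 64 * 1024; }
    void* allocateSlowCase(size_t bytes);   // writes m_offset back to the old block, updates the
                                            // total-memory counter lazily, then installs a fresh block
    char* m_payload;
    size_t m_offset;
    size_t m_capacity;
};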

> Source/JavaScriptCore/heap/MarkStack.cpp:407
> +                        doneCopying();

Does this mean that we call doneCopying twice?  This is for the master SlotVisitor, which you call doneCopying() for in markRoots().
Comment 4 Gavin Barraclough 2012-01-03 16:33:46 PST
Hey Mark, per our conversation please hold off & let me land a couple of quick fixes before committing this - I'll let you know when I'm done.

Thank you!
Comment 5 Mark Hahnenberg 2012-01-03 16:41:04 PST
(In reply to comment #3)
> (From update of attachment 120993 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=120993&action=review
> 
> R=me.  Can you post the overall performance results?  For something like this it would be good to do bencher and also in-browser tests.  Also, clarify the thread safe comments.  For the other comments if there is something obvious that can be done now, then do it; otherwise post a Radar or bugzilla bug regarding future direction (i.e. tightening up the fast path code for the allocator).
> 
> > Source/JavaScriptCore/heap/BumpSpaceInlineMethods.h:63
> > +// Needs to be threadsafe.
> 
> It would be good to clarify these comments.  Does this mean that it needs to be called under a lock?  Or something else?

This was more of a note to myself that this function could be called either directly or indirectly by the copying threads, and that therefore any lookup/update of data shared between the threads needs to be locked. I can remove them.

> > Source/JavaScriptCore/heap/BumpSpaceInlineMethods.h:244
> > +inline void* BumpSpace::allocate(size_t bytes)
> > +{
> > +    ASSERT(!m_heap->globalData()->isInitializingObject());
> > +
> > +    if (isOversize(bytes))
> > +        return allocateOversize(bytes);
> > +    
> > +    if (!fitsInCurrentBlock(bytes)) {
> > +        addNewBlock();
> > +        m_toSpaceFilter.add(reinterpret_cast<Bits>(m_currentBlock));
> > +        m_toSpaceSet.add(m_currentBlock);
> > +    }
> > +    
> > +    m_totalMemoryUtilized += bytes;
> > +    return allocateFromBlock(m_currentBlock, bytes);
> > +}
> 
> Is this your fast path?  If so, it would be good if it could be turned into something like:
> 
> if (isOversize(bytes) || !fitsInCurrentBlock(bytes)) return callSomeOutOfLineSlowPath(bytes);
> return allocateFromBlock(m_currentBlock, bytes);

Will do.

> Also, it would be good if:
> 
> - m_totalMemoryUtilized was lazily updated when a block is done being allocated in, since your eager update of this value currently means an extra load and store.

Will do.

> - The data needed to allocate in the current block was kept in BumpSpace, or in something like a BumpAllocator class.  I.e. if you want to allocate stuff (so you're either a GC thread or you're the application) then you have an instance of BumpAllocator that is easily reachable (say by offsetting into JSGlobalData, if you're the application), and then all of the fields (i.e. m_offset) needed to allocate are inside of BumpAllocator.  Right now you're taking an extra load to get to the BumpBlock::m_offset.  The BumpBlock::m_offset field can then be updated after you switch from one block to another.

Should be an easy fix.

> It's OK if this initial patch doesn't do these things, but it would be great to add a FIXME somewhere documenting this, or else create a Radar that indicates that this is what should be done.
> 
> The point is that it will (a) simplify the inline asm code that the JIT will use for allocation and (b) speed things up a fair bit as we start relying on this allocator more.

> > Source/JavaScriptCore/heap/MarkStack.cpp:407
> > +                        doneCopying();
> 
> Does this mean that we call doneCopying twice?  This is for the master SlotVisitor, which you call doneCopying() for in markRoots().

Yes, I'll fix this.
Comment 6 Mark Hahnenberg 2012-01-03 17:11:23 PST
Created attachment 121022 [details]
Bencher results

Here are the benchmark results. I was seeing slightly better numbers than this prior to the holidays (a little faster across all three averages). Not sure what's changed between then and now.
Comment 7 Filip Pizlo 2012-01-03 17:14:32 PST
(In reply to comment #6)
> Created an attachment (id=121022) [details]
> Bencher results
> 
> Here's the benchmark results. I was seeing slightly better than this prior to the holidays (a little faster across all three averages). Not sure what's changed between then and now.

You should definitely look at what's going on in access-fannkuch.  It might be an opportunity for goodness.
Comment 8 Geoffrey Garen 2012-01-04 13:26:15 PST
access-nsieve, too.
Comment 9 Mark Hahnenberg 2012-01-04 13:48:40 PST
Here are the results when running SunSpider in-browser. access-fannkuch is no longer an issue, but access-nsieve still is, along with a few others (morph and raytrace) that bencher claimed were faster...???

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.009x as fast    182.6ms +/- 0.6%   180.9ms +/- 0.6%     significant

=============================================================================

  3d:                  ??                 24.1ms +/- 1.7%    24.2ms +/- 3.6%     not conclusive: might be *1.004x as slow*
    cube:              -                   6.9ms +/- 3.3%     6.7ms +/- 8.8% 
    morph:             *1.131x as slow*    8.4ms +/- 4.4%     9.5ms +/- 5.3%     significant
    raytrace:          1.100x as fast      8.8ms +/- 3.4%     8.0ms +/- 0.0%     significant

  access:              ??                 17.4ms +/- 3.5%    17.5ms +/- 2.9%     not conclusive: might be *1.006x as slow*
    binary-trees:      -                   1.9ms +/- 11.9%     1.8ms +/- 16.7% 
    fannkuch:          -                   8.1ms +/- 2.8%     7.8ms +/- 3.9% 
    nbody:             -                   4.1ms +/- 5.5%     4.0ms +/- 0.0% 
    nsieve:            *1.182x as slow*    3.3ms +/- 10.5%     3.9ms +/- 5.8%     significant

  bitops:              ??                 15.2ms +/- 4.3%    15.6ms +/- 2.4%     not conclusive: might be *1.026x as slow*
    3bit-bits-in-byte: -                   1.0ms +/- 0.0%     1.0ms +/- 0.0% 
    bits-in-byte:      -                   5.2ms +/- 5.8%     5.1ms +/- 4.4% 
    bitwise-and:       ??                  3.2ms +/- 9.4%     3.5ms +/- 10.8%     not conclusive: might be *1.094x as slow*
    nsieve-bits:       ??                  5.8ms +/- 5.2%     6.0ms +/- 0.0%     not conclusive: might be *1.034x as slow*

  controlflow:         -                   2.3ms +/- 15.0%     2.3ms +/- 15.0% 
    recursive:         -                   2.3ms +/- 15.0%     2.3ms +/- 15.0% 

  crypto:              -                  12.8ms +/- 3.5%    12.6ms +/- 4.8% 
    aes:               1.066x as fast      8.1ms +/- 2.8%     7.6ms +/- 4.9%     significant
    md5:               ??                  2.4ms +/- 15.4%     2.6ms +/- 14.2%     not conclusive: might be *1.083x as slow*
    sha1:              ??                  2.3ms +/- 15.0%     2.4ms +/- 15.4%     not conclusive: might be *1.043x as slow*

  date:                1.057x as fast     24.2ms +/- 1.9%    22.9ms +/- 1.0%     significant
    format-tofte:      1.083x as fast     13.0ms +/- 0.0%    12.0ms +/- 0.0%     significant
    format-xparb:      -                  11.2ms +/- 4.0%    10.9ms +/- 2.1% 

  math:                -                  20.6ms +/- 2.4%    20.3ms +/- 2.4% 
    cordic:            -                   6.9ms +/- 3.3%     6.8ms +/- 4.4% 
    partial-sums:      1.009x as fast     11.0ms +/- 0.0%    10.9ms +/- 2.1%     significant
    spectral-norm:     -                   2.7ms +/- 12.8%     2.6ms +/- 14.2% 

  regexp:              -                   8.9ms +/- 2.5%     8.8ms +/- 3.4% 
    dna:               -                   8.9ms +/- 2.5%     8.8ms +/- 3.4% 

  string:              -                  57.1ms +/- 1.2%    56.7ms +/- 1.0% 
    base64:            -                   5.2ms +/- 5.8%     5.2ms +/- 5.8% 
    fasta:             -                   8.1ms +/- 2.8%     8.0ms +/- 0.0% 
    tagcloud:          -                  12.9ms +/- 1.8%    12.7ms +/- 2.7% 
    unpack-code:       -                  22.0ms +/- 3.1%    21.7ms +/- 1.6% 
    validate-input:    ??                  8.9ms +/- 2.5%     9.1ms +/- 2.5%     not conclusive: might be *1.022x as slow*
Comment 10 Geoffrey Garen 2012-01-04 15:21:15 PST
Testing in the v8 harness, this patch is a 20% regression on v8-splay (<1% speedup overall). Mark is profiling to analyze what accounts for the 20% regression. Our hypothesis is all that copying.
Comment 11 Mark Hahnenberg 2012-01-04 15:49:05 PST
Created attachment 121181 [details]
Splay benchmark profiling results

It appears that our hypothesis was correct, and it is due to all the copying.
Comment 12 Mark Hahnenberg 2012-01-04 16:06:26 PST
Created attachment 121183 [details]
Splay CPU utilization results

Just thought I'd post this screenshot to show our CPU utilization during parallel GC. Not sure if this is normal or not...
Comment 13 Filip Pizlo 2012-01-04 16:08:21 PST
(In reply to comment #12)
> Created an attachment (id=121183) [details]
> Splay CPU utilization results
> 
> Just thought I'd post this screenshot to show our CPU utilization during parallel GC. Not sure if this is normal or not...

Looks about right to me.  Is there any difference between with your patch and without?
Comment 14 Mark Hahnenberg 2012-01-04 16:22:14 PST
Created attachment 121185 [details]
Splay CPU utilization results without bump allocator

Here's what it looks like without my patch, on ToT as of yesterday, I believe. The two screenshots are at the same resolution in terms of milliseconds/pixel. It's interesting to note that with bump allocation, GC pauses are less numerous but far longer, which could be a very bad thing if we're targeting shorter pause times. The increase in GC pause times with bump allocation as the size of the heap increases is rather alarming...

Also, the CPU utilization during GC with bump allocation seems a little choppy to me. Could it be that I'm locking something in too coarse-grained a fashion?
Comment 15 Filip Pizlo 2012-01-04 16:25:39 PST
(In reply to comment #14)
> Created an attachment (id=121185) [details]
> Splay CPU utilization results without bump allocator
> 
> Here's what it looks like without my patch in ToT as of yesterday I believe. The two screenshots are at the same resolution in terms of milliseconds/pixel. Interesting to note that with bump allocation, GC pauses are less numerous but far longer, which if we're targeting shorter pause times this could be a very bad thing. It's rather alarming the increase in GC pause times with bump allocation as the size of the heap increases...
> 
> Also, the CPU utilization during GC with bump allocation seems a little choppy to me. It could be that I'm locking something in a too coarse-grained fashion?

It definitely could be a locking issue.

It could also be that we only have a relatively small number of objects total, and so sometimes one GC thread ends up doing a bunch of copying, which takes a while, while sitting on a worklist full of stuff to do, and other threads are sitting on empty worklists.

Is this happening with the patch you've posted up here, or a modified one?  If the latter, could I get the modified one to run a test myself?
Comment 16 Geoffrey Garen 2012-01-05 10:20:44 PST
A few comments about the patch:

(1) appendValues() and copy() should not be separate steps -- that doubles the number of loads you need to do. Visiting a bump allocated backing store should look like this:

- allocate new backing store
- for each value in old backing store:
    - if cell and not marked, append to slot visitor
    - store to new storage
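
A sketch of that single-pass shape, with illustrative stand-in types rather than the real SlotVisitor/JSValue API:

#include <cstddef>

struct JSValueSketch {
    bool isCell() const;
    void* asCell() const;
};

struct SlotVisitorSketch {
    void* allocateInCopySpace(size_t bytes); // bump-allocate in this thread's copy block
    void appendCell(void* cell);             // push the cell onto this thread's mark stack
};

bool isMarked(void* cell);

void visitAndCopyBackingStore(SlotVisitorSketch& visitor, JSValueSketch* oldStorage,
                              size_t length, JSValueSketch*& ownerPointer)
{
    JSValueSketch* newStorage =
        static_cast<JSValueSketch*>(visitor.allocateInCopySpace(length * sizeof(JSValueSketch)));
    for (size_t i = 0; i < length; ++i) {
        JSValueSketch value = oldStorage[i];  // single load per slot
        if (value.isCell() && !isMarked(value.asCell()))
            visitor.appendCell(value.asCell()); // queue for marking
        newStorage[i] = value;                  // and copy in the same pass
    }
    ownerPointer = newStorage;                  // the backing store's single owner
}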

(2) What, if anything, prevents two (or more) threads from trying to visit / copy the same backing store? I believe the old parallel collector allowed two (or more) threads to mark the same object, since marking is side-effect free. But copying may introduce problems.

(3) Would be good to have back of the envelope numbers on how expensive it is to thread-safely count block utilization during the copy phase. We could use those counters both for quick free and for avoiding copy in highly utilized blocks, each of these being a significant optimization.
Comment 17 Filip Pizlo 2012-01-05 11:26:23 PST
> (2) What, if anything, prevents two (or more) threads from trying to visit / copy the same backing store? I believe the old parallel collector allowed two (or more) threads to mark the same object, since marking is side-effect free. But copying may introduce problems.

That can't ever happen.  Each object gets marked once, by one thread.

> 
> (3) Would be good to have back of the envelope numbers on how expensive it is to thread-safely count block utilization during the copy phase. We could use those counters both for quick free and for avoiding copy in highly utilized blocks, each of these being a significant optimization.

Copying collectors don't normally require such optimizations to be fast.  Hence, I don't think that we should do this so early in the game.

The copying collector, as I understand it, will only know what the utilization of a block was after it has already evacuated it.  Knowing the utilization prior to or during the copy would require an extra phase and extra bookkeeping.
Comment 18 Geoffrey Garen 2012-01-05 12:18:28 PST
> > (3) Would be good to have back of the envelope numbers on how expensive it is to thread-safely count block utilization during the copy phase. We could use those counters both for quick free and for avoiding copy in highly utilized blocks, each of these being a significant optimization.
> 
> Copying collectors don't normally require such optimizations to be fast.  Hence, I don't think that we should do this so early in the game.

Agreed.
Comment 19 Gavin Barraclough 2012-01-05 17:17:12 PST
(In reply to comment #4)
> Hey Mark, per our conversation please hold off & let me land a couple of quick fixes before committing this - I'll let you know when I'm done.
> 
> Thank you!

Mark feel free to go ahead and land this whenever you are ready, thank you for holding off.

The overall win here looks great, but please file bugs for any subtests that regress, so we can track what we need to investigate.

cheers,
G.
Comment 20 Mark Hahnenberg 2012-01-06 15:15:21 PST
Created attachment 121504 [details]
Patch
Comment 21 WebKit Review Bot 2012-01-06 15:16:53 PST
Attachment 121504 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/heap/Heap.h:42:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Source/JavaScriptCore/heap/BumpSpaceInlineMethods.h:206:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Source/JavaScriptCore/heap/BumpSpaceInlineMethods.h:274:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Source/JavaScriptCore/heap/BumpSpaceInlineMethods.h:289:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Source/JavaScriptCore/heap/BumpSpaceInlineMethods.h:296:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 5 in 24 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 22 Geoffrey Garen 2012-01-06 15:36:32 PST
Comment on attachment 121504 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=121504&action=review

> Source/JavaScriptCore/heap/MarkStack.cpp:530
> +        memcpy(newPtr, oldPtr, jsValuesOffset);

I award you zero points, and may god have mercy on your soul!
Comment 23 Geoffrey Garen 2012-01-06 15:39:17 PST
> > +        memcpy(newPtr, oldPtr, jsValuesOffset);

Oh, wait, this is right -- it's the copy of the non-JSValue portion of the storage.

Still, Mark says his patch is crashing in testing, so the r- stands.
Comment 24 Geoffrey Garen 2012-01-06 15:45:23 PST
Comment on attachment 121504 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=121504&action=review

> Source/JavaScriptCore/heap/BumpBlock.h:47
> +    uintptr_t m_pinned;

Can this be bool?

It's nice to name booleans with "is": m_isPinned.

> Source/JavaScriptCore/heap/BumpSpace.h:116
> +    static const size_t KB = 1024;

Would be nice to share a definition for KB with MarkedBlock -- you can just move the one from MarkedBlock.h into StdLibExtras.h.

> Source/JavaScriptCore/wtf/Platform.h:1160
> +#ifndef ENABLE_JSC_BUMP_ALLOCATOR

Is this #ifdef meaningful? I don't think JSArray would work anymore if you turned it off.
Comment 25 Filip Pizlo 2012-01-06 15:49:27 PST
(In reply to comment #24)
> (From update of attachment 121504 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=121504&action=review
> 
> > Source/JavaScriptCore/heap/BumpBlock.h:47
> > +    uintptr_t m_pinned;
> 
> Can this be bool?

I think that was a hack to align the class, so you can do ((BumpBlock*)foo) + 1.
Comment 26 Mark Hahnenberg 2012-01-06 15:50:31 PST
(In reply to comment #24)
> (From update of attachment 121504 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=121504&action=review
> 
> > Source/JavaScriptCore/heap/BumpBlock.h:47
> > +    uintptr_t m_pinned;
> 
> Can this be bool?

bool has a minimum alignment of 1 byte, and I was trying to keep the beginning of the payload aligned on an 8-byte boundary.

> > Source/JavaScriptCore/wtf/Platform.h:1160
> > +#ifndef ENABLE_JSC_BUMP_ALLOCATOR
> 
> Is this #ifdef meaningful? I don't think JSArray would work anymore if you turned it off.

I think it would still work, although I haven't tried it in a while. It defaults to using fastMalloc, although now that I think about it, I removed all of the corresponding fastFrees...I was basically trying to avoid having to worry about platforms other than OS X.
Comment 27 Gavin Barraclough 2012-01-06 16:22:06 PST
Comment on attachment 121504 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=121504&action=review

> Source/JavaScriptCore/runtime/JSArray.cpp:553
> +        storage = m_storage = reinterpret_cast_ptr<ArrayStorage*>(static_cast<char*>(newStorage) + m_indexBias * sizeof(JSValue));

The change on this line is incorrect - m_indexBias is 0 in this block, and sizeof(JSValue) is not the right value for the size of an entry.  Just revert to:
    storage = m_storage = reinterpret_cast_ptr<ArrayStorage*>(baseStorage);

>>> Source/JavaScriptCore/wtf/Platform.h:1160
>>> +#ifndef ENABLE_JSC_BUMP_ALLOCATOR
>> 
>> Is this #ifdef meaningful? I don't think JSArray would work anymore if you turned it off.
> 
> I think it would still work, although I haven't tried it in a while. It defaults to using fastMalloc, although now that I think about it, I removed all of the corresponding fastFrees...I was basically trying to avoid having to worry about platforms other than OS X.

It looks like you've deleted the code in JSArray::visitChildren that would mark the array contents if you disable this switch.
If you're going to land this with an #ifdef, I think you'd best run all tests with ENABLE_JSC_BUMP_ALLOCATOR disabled before landing.
But I'd prefer to see the ifdef removed.
Comment 28 Gavin Barraclough 2012-01-07 22:02:45 PST
There's one other change that I think it would be really good to make to this patch.  Allocation functions that return null on failure to allocate are a common cause of security bugs (when callers subsequently fail to check for null).  For this reason fastMalloc currently CRASHes if it fails to allocate.

For a method that might return null, we tend to use a naming scheme of prefixing 'try' – e.g. fastMalloc vs tryFastMalloc, and StringImpl's & JSArray's create vs tryCreate methods.  I'd suggest the names of the allocation functions on the bump allocator follow this pattern.

Using a function name that indicates to callers that they should be null-checking the result is a good first step, but it would be even better if we could use a type that generates an error when the result isn't checked. tryFastMalloc does something that almost achieves this, but not quite.  It uses a special 'TryMallocReturnValue' type that ASSERTs that getValue has been called, but it doesn't check that the bool returned from getValue is ever tested, so it doesn't really achieve anything.  It would be great if we could make tryAllocate return the allocated memory as an out parameter, and return something like an instance of this:


class CheckedBoolean {
public:
    CheckedBoolean(bool value)
        : m_value(value)
#ifndef ASSERT_DISABLED
        , m_checked(false)
#endif
    {
    }

    ~CheckedBoolean()
    {
        ASSERT(m_checked);
    }

    operator bool()
    {
#ifndef ASSERT_DISABLED
        m_checked = true;
#endif
        return m_value;
    }

private:
    bool m_value;
#ifndef ASSERT_DISABLED
    bool m_checked;
#endif
};


This way, any code that failed to check the result of a call to tryAllocate would result in an ASSERT firing.
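
For illustration, a hypothetical caller of such a tryAllocate (the signature below is the suggestion above, not code from the patch):

CheckedBoolean tryAllocate(size_t bytes, void** out); // proposed shape: memory via out parameter

bool growStorage(size_t bytes, void** storage)
{
    if (!tryAllocate(bytes, storage))   // converting to bool marks the result as checked
        return false;                   // the caller must handle allocation failure
    return true;
}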
Comment 29 Mark Hahnenberg 2012-01-12 17:22:57 PST
Created attachment 122346 [details]
Patch
Comment 30 WebKit Review Bot 2012-01-12 17:26:46 PST
Attachment 122346 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/CMakeLists.txt', u'S..." exit_code: 1

Source/JavaScriptCore/heap/Heap.h:43:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Total errors found: 1 in 35 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 31 Filip Pizlo 2012-01-18 16:53:14 PST
Comment on attachment 122346 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=122346&action=review

R=me, but fix the issues.

> Source/JavaScriptCore/heap/ConservativeRoots.cpp:130
> +    genericAddSpan(begin, end, hook);

You should probably chain the dfgCodeBlocks mark hook below with the PinStorageMarkHook.  Or better yet, kill the PinStorageMarkHook, and put the pin-storage logic directly into genericAddSpan, since we need to do this all the time anyway.

Otherwise you might get awesome fails where an array storage pointer was cached across a heap allocation.

> Source/JavaScriptCore/wtf/Platform.h:-39
> -

You should probably revert Platform.h
Comment 32 Mark Hahnenberg 2012-01-19 13:51:03 PST
Committed r105442: <http://trac.webkit.org/changeset/105442>