Bug 94712 - High memory usage spike in AssemblerBuffer
Summary: High memory usage spike in AssemblerBuffer
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Jacky Jiang
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-22 08:09 PDT by Yong Li
Modified: 2012-10-18 14:23 PDT (History)
11 users (show)

See Also:


Attachments
the patch (only build & work on ARMv7 with classic JIT) (23.83 KB, patch)
2012-08-23 14:00 PDT, Yong Li
no flags Details | Formatted Diff | Diff
Patch used to see build errors (23.84 KB, patch)
2012-08-23 14:19 PDT, Yong Li
buildbot: commit-queue-
Details | Formatted Diff | Diff
Try again (14.38 KB, patch)
2012-08-24 08:53 PDT, Yong Li
no flags Details | Formatted Diff | Diff
Patch for ARMv7 (14.65 KB, patch)
2012-08-24 11:35 PDT, Yong Li
no flags Details | Formatted Diff | Diff
Patch for review (14.65 KB, patch)
2012-08-24 11:57 PDT, Yong Li
barraclough: review-
Details | Formatted Diff | Diff
How about this one? (16.43 KB, patch)
2012-08-30 14:34 PDT, Yong Li
no flags Details | Formatted Diff | Diff
Updated according to Ben's comments. also created AssemblerBufferSegmented (16.91 KB, patch)
2012-08-30 14:53 PDT, Yong Li
barraclough: review-
Details | Formatted Diff | Diff
Patch (5.99 KB, patch)
2012-10-12 15:43 PDT, Jacky Jiang
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yong Li 2012-08-22 08:09:29 PDT
The internal storage in AssemblerBuffer can be very huge. When it grows (creates a larger buffer and memcpy), it could cause high memory spike (and also hurts performance).

If it cannot use SegmentedVector, how about using OSAllocator/mmap?
Comment 1 Yong Li 2012-08-22 14:26:34 PDT
seems using SegmentedVector is doable.
Comment 2 Yong Li 2012-08-23 14:00:25 PDT
Created attachment 160235 [details]
the patch (only build & work on ARMv7 with classic JIT)
Comment 3 WebKit Review Bot 2012-08-23 14:02:26 PDT
Attachment 160235 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1
Source/WTF/wtf/SegmentedVector.h:162:  More than one command on the same line  [whitespace/newline] [4]
Total errors found: 1 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Yong Li 2012-08-23 14:05:02 PDT
Comment on attachment 160235 [details]
the patch (only build & work on ARMv7 with classic JIT)

I see InlineCapacity is too small. will try a larger one
Comment 5 Yong Li 2012-08-23 14:19:09 PDT
Created attachment 160242 [details]
Patch used to see build errors

I cannot test it on every CPU arch. Help wanted and will be appreciated!
Comment 6 Benjamin Poulain 2012-08-23 14:22:04 PDT
Comment on attachment 160242 [details]
Patch used to see build errors

View in context: https://bugs.webkit.org/attachment.cgi?id=160235&action=review

Please add WTF tests for the SegmentedVector additions.

> Source/WTF/wtf/SegmentedVector.h:-44
> -        ~SegmentedVectorIterator() { }

I think that was here so that some compiler do not generate a real function for the empty destructor.

> Source/WTF/wtf/SegmentedVector.h:72
> +            m_index += offset;
> +            m_segment += m_index / SegmentSize;
> +            m_index %= SegmentSize;

Use a temporary variable instead of abusing m_index.

> Source/WTF/wtf/SegmentedVector.h:79
> +            // Points to the "end" symbol
> +            m_segment = 0;
> +            m_index = SegmentSize;

Missing period on the comment.
Instead of duplicating the code, please consider making it an utility function.

> Source/WTF/wtf/SegmentedVector.h:112
> +        size_t indexInSegment() const { return m_index; }
> +
> +        void proceedToNextSegment()
> +        {
> +            if (++m_segment >= m_vector.m_segments.size()) {
> +                // Points to the "end" symbol
> +                m_segment = 0;
> +                m_index = SegmentSize;
> +            } else
> +                m_index = 0;
> +        }

This is weird. Isn't this the wrong abstraction?
Why not give the "copy" capability to SegmentedVector?
Comment 7 Yong Li 2012-08-23 14:39:34 PDT
(In reply to comment #6)

Not ready for review yet. But thanks for reviewing.

> (From update of attachment 160242 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=160235&action=review
> 
> Please add WTF tests for the SegmentedVector additions.
> 
> > Source/WTF/wtf/SegmentedVector.h:-44
> > -        ~SegmentedVectorIterator() { }
> 
> I think that was here so that some compiler do not generate a real function for the empty destructor.

I can recover it easily. But isn't that a compiler bug? as generating a default destructor is so fundamental in C++. If we do think this could happen, shouldn't there be a coding rule saying we have to explicitly provide a destructor for every class in WebKit? Actually there are tons of classes in WebKit that don't have dtor specified.

> 
> > Source/WTF/wtf/SegmentedVector.h:72
> > +            m_index += offset;
> > +            m_segment += m_index / SegmentSize;
> > +            m_index %= SegmentSize;
> 
> Use a temporary variable instead of abusing m_index.

OK.

> 
> > Source/WTF/wtf/SegmentedVector.h:79
> > +            // Points to the "end" symbol
> > +            m_segment = 0;
> > +            m_index = SegmentSize;
> 
> Missing period on the comment.
> Instead of duplicating the code, please consider making it an utility function.

Good call!

> 
> > Source/WTF/wtf/SegmentedVector.h:112
> > +        size_t indexInSegment() const { return m_index; }
> > +
> > +        void proceedToNextSegment()
> > +        {
> > +            if (++m_segment >= m_vector.m_segments.size()) {
> > +                // Points to the "end" symbol
> > +                m_segment = 0;
> > +                m_index = SegmentSize;
> > +            } else
> > +                m_index = 0;
> > +        }
> 
> This is weird. Isn't this the wrong abstraction?
Why?

> Why not give the "copy" capability to SegmentedVector?

Like SegmentedVector::copyTo(T* dest, size_t pos, size_t len)? yeah, probably better.
Comment 8 Build Bot 2012-08-23 14:48:51 PDT
Comment on attachment 160242 [details]
Patch used to see build errors

Attachment 160242 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13567831
Comment 9 Benjamin Poulain 2012-08-23 14:53:58 PDT
> Not ready for review yet. But thanks for reviewing.

Yep, I noticed there was no r? only after making a few comments :)

> > > Source/WTF/wtf/SegmentedVector.h:-44
> > > -        ~SegmentedVectorIterator() { }
> > 
> > I think that was here so that some compiler do not generate a real function for the empty destructor.
> 
> I can recover it easily. But isn't that a compiler bug? as generating a default destructor is so fundamental in C++. If we do think this could happen, shouldn't there be a coding rule saying we have to explicitly provide a destructor for every class in WebKit? Actually there are tons of classes in WebKit that don't have dtor specified.

Yeah, I may be wrong. It is just a vague memory from profiling.

> > > Source/WTF/wtf/SegmentedVector.h:112
> > > +        size_t indexInSegment() const { return m_index; }
> > > +
> > > +        void proceedToNextSegment()
> > > +        {
> > > +            if (++m_segment >= m_vector.m_segments.size()) {
> > > +                // Points to the "end" symbol
> > > +                m_segment = 0;
> > > +                m_index = SegmentSize;
> > > +            } else
> > > +                m_index = 0;
> > > +        }
> > 
> > This is weird. Isn't this the wrong abstraction?
> Why?

This is exposing what SegmentedVector abstract, defeating the purpose of having an adt.

> > Why not give the "copy" capability to SegmentedVector?
> 
> Like SegmentedVector::copyTo(T* dest, size_t pos, size_t len)? yeah, probably better.

Yep, that sounds good.
Comment 10 Gyuyoung Kim 2012-08-23 15:02:12 PDT
Comment on attachment 160242 [details]
Patch used to see build errors

Attachment 160242 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/13567842
Comment 11 Build Bot 2012-08-23 15:05:57 PDT
Comment on attachment 160242 [details]
Patch used to see build errors

Attachment 160242 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/13568822
Comment 12 Early Warning System Bot 2012-08-23 15:14:49 PDT
Comment on attachment 160242 [details]
Patch used to see build errors

Attachment 160242 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/13571784
Comment 13 Early Warning System Bot 2012-08-23 15:24:02 PDT
Comment on attachment 160242 [details]
Patch used to see build errors

Attachment 160242 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/13564855
Comment 14 Yong Li 2012-08-24 07:16:37 PDT
Performance: no impact on sunpider, octane results.

Ping JSC experts: any test good for testing JIT compiling performance?
Comment 15 Yong Li 2012-08-24 08:53:03 PDT
Created attachment 160428 [details]
Try again
Comment 16 Yong Li 2012-08-24 11:35:26 PDT
Created attachment 160462 [details]
Patch for ARMv7

It needs huge code changes to other assemblers to use segmented assembler buffer. Can we just do this for ARMv7 first?
Comment 17 Yong Li 2012-08-24 11:57:11 PDT
Created attachment 160468 [details]
Patch for review

It seems inlineCapacity=2048 gives better result than 1024 on sunspider. But 4096 doesn't change. Of course the larger the better generally, but stack is also limited.
Comment 18 Yong Li 2012-08-24 15:12:25 PDT
Comment on attachment 160468 [details]
Patch for review

View in context: https://bugs.webkit.org/attachment.cgi?id=160468&action=review

> Source/WTF/wtf/SegmentedVector.h:235
> +        void getElements(T* destBuffer, size_t start = 0, size_t count = std::numeric_limits<size_t>::max()) const
> +        {
> +            ASSERT(start < m_size);

will change it to ASSERT(start <= m_size); so it won't assert when getElements(buffer, 0, 0) is called on an empty SegmentedVector
Comment 19 Benjamin Poulain 2012-08-24 15:25:51 PDT
(In reply to comment #16)
> It needs huge code changes to other assemblers to use segmented assembler buffer. Can we just do this for ARMv7 first?

I am not talking for the JSC gurus, but personally, I would really prefer not doing it for ARMv7 separately.

It is much easier to work (or even read the code) when x86 and ARM are very similar than when they are different.
Comment 20 Benjamin Poulain 2012-08-24 16:06:22 PDT
Comment on attachment 160468 [details]
Patch for review

View in context: https://bugs.webkit.org/attachment.cgi?id=160468&action=review

This generally looks like it could fly, SegmentedVector can likely be efficient enough for the few copy operations.

I am _really_ not a fan of having this only on ARM_THUMB2. Please ask about that to the JSC experts.

No review from me, especially without hard numbers, but here are a few comments:

> Source/WTF/wtf/SegmentedVector.h:173
> +            m_size += length;

ASSERT for overflow would be nice.

> Source/WTF/wtf/SegmentedVector.h:177
> +                if (segmentIndex == m_segments.size())
> +                    m_segments.append(new Segment);

I would do this upfront before while (length) to avoid executing the branch for every iteration when it is not needed.

>> Source/WTF/wtf/SegmentedVector.h:235
>> +            ASSERT(start < m_size);
> 
> will change it to ASSERT(start <= m_size); so it won't assert when getElements(buffer, 0, 0) is called on an empty SegmentedVector

> will change it to ASSERT(start <= m_size); so it won't assert when getElements(buffer, 0, 0) is called on an empty SegmentedVector

Maybe:
ASSERT(start < m_size || (!start && !m_size));
Because "start < m_size" seems better if m_size > 0.

> Source/WTF/wtf/SegmentedVector.h:243
> +            size_t segment = start / SegmentSize;

segment -> segmentIndex

Similarily to subscriptFor().,I am tempted to have a 
size_t segmentIndexFor(absoluteIndex) {
    return absoluteIndex / SegmentSize;
}

> Source/WTF/wtf/SegmentedVector.h:275
> +        {

Add ASSERT(index < m_size); ?
Comment 21 Gavin Barraclough 2012-08-24 19:12:45 PDT
Comment on attachment 160468 [details]
Patch for review

Looks like a really great idea in principle, Benjamin has some good review comments, and is right – it doesn't make any sense to tie this to ARMv7, we want a common infrastructure across platforms.
Comment 22 Yong Li 2012-08-27 07:11:59 PDT
(In reply to comment #19)
> I am not talking for the JSC gurus, but personally, I would really prefer not doing it for ARMv7 separately.
> 
> It is much easier to work (or even read the code) when x86 and ARM are very similar than when they are different.

(In reply to comment #21)
> (From update of attachment 160468 [details])
> Looks like a really great idea in principle, Benjamin has some good review comments, and is right – it doesn't make any sense to tie this to ARMv7, we want a common infrastructure across platforms.

I also think it should be used by all assemblers. but I don't have dev environments for all of them, neither have enough time to do so. So my proposal is make ARMv7 the first step, and it is up to every assembler's maintainer to make them use new ArrayBuffer. Then finally we can get rid of the old one. Would that be OK?
Comment 23 Benjamin Poulain 2012-08-27 13:29:58 PDT
> I also think it should be used by all assemblers. but I don't have dev environments for all of them, neither have enough time to do so. So my proposal is make ARMv7 the first step, and it is up to every assembler's maintainer to make them use new ArrayBuffer. Then finally we can get rid of the old one. Would that be OK?

JSC does not really work like that.

That would be a mess if every architecture started doing their own thing.

You do not need to be able to run all the dev environment. The EWS bots can help you to some extent in verifying everything.
Comment 24 Yong Li 2012-08-27 14:07:44 PDT
(In reply to comment #23)
> > I also think it should be used by all assemblers. but I don't have dev environments for all of them, neither have enough time to do so. So my proposal is make ARMv7 the first step, and it is up to every assembler's maintainer to make them use new ArrayBuffer. Then finally we can get rid of the old one. Would that be OK?
> 
> JSC does not really work like that.
> 
> That would be a mess if every architecture started doing their own thing.
> 
> You do not need to be able to run all the dev environment. The EWS bots can help you to some extent in verifying everything.

I tried making X86Assmebler work with the new AssemblerBuffer. But there are too many dependencies on the flat consecutive buffer. I guess probably more than half of the source code will be changed. And to be honest, I'm not sure using segmented buffer is good for all assemblers. I noticed X86Assmebler makes a lot of calls to putIntegralUnchecked, and this was supposed to give performance boost on date/time tests. 

Speaking of assemblers, there is already another AssemblerBuffer for ARM (traditional): AssemblerBufferWithConstantPool. AssemblerBuffer used by different assemblers are not always same. So to be neat, I'm considering to introduce AssemblerBufferSegmented (a brother of AssemblerBuffer and AssemblerBufferWithConstantPool), and it is only used for ARMv7Assembler and MacroAssemblerARMv7. Objection?
Comment 25 Gavin Barraclough 2012-08-27 16:52:13 PDT
> I tried making X86Assmebler work with the new AssemblerBuffer. But there are too many dependencies on the flat consecutive buffer. I guess probably more than half of the source code will be changed.

I'd suggest there is probably a really quick fix you could try here.  In the X86Assembler, do two quick find & replaces to remove the following:
    "m_buffer.ensureSpace(maxInstructionSize);" -> ""
    "Unchecked" -> ""
This will add a few extra bounds checks, but the benefit from your patch should more than balance this out.

> Speaking of assemblers, there is already another AssemblerBuffer for ARM (traditional): AssemblerBufferWithConstantPool.

No, ARM doesn't use a different AssemblerBuffer – if you look you'll see that AssemblerBufferWithConstantPool is actually a subclass of AssemblerBuffer – all assemblers use the same implementation of AssemblerBuffer, we just happen to use inheritance as a part of how we structure the code to share the constant pool used by ARM traditional & SH4.
Comment 26 Yong Li 2012-08-28 07:43:46 PDT
(In reply to comment #25)
> > I tried making X86Assmebler work with the new AssemblerBuffer. But there are too many dependencies on the flat consecutive buffer. I guess probably more than half of the source code will be changed.
> 
> I'd suggest there is probably a really quick fix you could try here.  In the X86Assembler, do two quick find & replaces to remove the following:
>     "m_buffer.ensureSpace(maxInstructionSize);" -> ""
>     "Unchecked" -> ""
> This will add a few extra bounds checks, but the benefit from your patch should more than balance this out.
> 

That's what I've tried. But there are many places it directly works on raw addresses in the buffer... I'm afraid solving all of this ends up with performance penalty... Also X86 system usually has a lot of memory (vm space) to consume... I would prefer OSAllocator/mmap to segmented vector on x86.

> > Speaking of assemblers, there is already another AssemblerBuffer for ARM (traditional): AssemblerBufferWithConstantPool.
> 
> No, ARM doesn't use a different AssemblerBuffer – if you look you'll see that AssemblerBufferWithConstantPool is actually a subclass of AssemblerBuffer – all assemblers use the same implementation of AssemblerBuffer, we just happen to use inheritance as a part of how we structure the code to share the constant pool used by ARM traditional & SH4.

We have different assembler/MacroAssembler anyway. And there is #ifdef even in LinkBuffer.cpp. So a specific assembler using a different buffer internally doesn't look so bad to me. But probably it would be better to change buffer() function to getData() so the buffer is fully hidden in assembler implementation.
Comment 27 Yong Li 2012-08-30 14:34:25 PDT
Created attachment 161559 [details]
How about this one?
Comment 28 Yong Li 2012-08-30 14:53:56 PDT
Created attachment 161564 [details]
Updated according to Ben's comments. also created AssemblerBufferSegmented
Comment 29 WebKit Review Bot 2012-08-30 14:57:03 PDT
Attachment 161564 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1
Source/JavaScriptCore/assembler/AssemblerBufferSegmented.h:37:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Total errors found: 1 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 30 Benjamin Poulain 2012-08-30 20:48:38 PDT
Comment on attachment 161564 [details]
Updated according to Ben's comments. also created AssemblerBufferSegmented

View in context: https://bugs.webkit.org/attachment.cgi?id=161564&action=review

I am too tired to review this now. I think this looks very promising but it needs a lot more polishing.

Here are some quick comments.

> Source/JavaScriptCore/ChangeLog:13
> +        It is seen AssemberBuffer can use 20MB+ memory on Google's Octane
> +        page. When the buffer grows, it allocates a new memory block and makes a copy,
> +        which results in a huge memory spike. This patch makes AssemblerBuffer use
> +        a SegmentedBuffer instead, and this can reduce memory spikes and also avoid
> +        copying existing data when buffer grows.

This is a performance patch, you should have a performance analysis paragraph in your changelog.
1) You need to determine if it causes changes in the common benchmarks.
2) You need to gather numbers on how much memory improvement the patch makes. For example measuring the memory before-after patch with Sunspider and Octane.

Ideally, you should also have an analysis for the SegmentSize and InlineCapacity. It can be as simple as: "Running all the benchmarks, the average is XX segments and max at YY segments".

I haven't looked at the x86 assembler, and it might be obvious why this patch does not generalize to other assemblers.
Unless it is obviously wrong, I think you should do the implementation and measure the performance on x86_64. So far, I have only read very weak arguments and no numbers.

Remember this patch is touching important code, you want to make sure this is a net improvement.

> Source/JavaScriptCore/assembler/AssemblerBufferSegmented.h:38
> +        static const int inlineCapacity = 2048;

This does not do what you think it does. You are defining the Segment size. In:
    SegmentedVector<char, inlineCapacity> m_storage;
The first argument is the SegmentSize, the second is the inline capacity.

The inline capacity of SegmentedVector is for the vector listing the segments. 2048 would be huge.

For the segment size, chances are you want to match the page size of your allocator to avoid wasting memory. 
For fastMalloc, I would expose kPageSize and make sure sizeof(Segment) == kPageSize.

In this case, 2048 for the segment size seems not optimal for any allocator because the vector is gonna need the size for "size_t m_size;" in addition to the vector.
Comment 31 Yong Li 2012-08-31 07:36:34 PDT
(In reply to comment #30)
> (From update of attachment 161564 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=161564&action=review
> 
> I am too tired to review this now. I think this looks very promising but it needs a lot more polishing.
> 
> Here are some quick comments.
> 
> > Source/JavaScriptCore/ChangeLog:13
> > +        It is seen AssemberBuffer can use 20MB+ memory on Google's Octane
> > +        page. When the buffer grows, it allocates a new memory block and makes a copy,
> > +        which results in a huge memory spike. This patch makes AssemblerBuffer use
> > +        a SegmentedBuffer instead, and this can reduce memory spikes and also avoid
> > +        copying existing data when buffer grows.
> 
> This is a performance patch, you should have a performance analysis paragraph in your changelog.
> 1) You need to determine if it causes changes in the common benchmarks.
> 2) You need to gather numbers on how much memory improvement the patch makes. For example measuring the memory before-after patch with Sunspider and Octane.
> 
> Ideally, you should also have an analysis for the SegmentSize and InlineCapacity. It can be as simple as: "Running all the benchmarks, the average is XX segments and max at YY segments".
> 
> I haven't looked at the x86 assembler, and it might be obvious why this patch does not generalize to other assemblers.
> Unless it is obviously wrong, I think you should do the implementation and measure the performance on x86_64. So far, I have only read very weak arguments and no numbers.
> 
> Remember this patch is touching important code, you want to make sure this is a net improvement.
> 
> > Source/JavaScriptCore/assembler/AssemblerBufferSegmented.h:38
> > +        static const int inlineCapacity = 2048;
> 
> This does not do what you think it does. You are defining the Segment size. In:
>     SegmentedVector<char, inlineCapacity> m_storage;
> The first argument is the SegmentSize, the second is the inline capacity.
> 
> The inline capacity of SegmentedVector is for the vector listing the segments. 2048 would be huge.
> 
> For the segment size, chances are you want to match the page size of your allocator to avoid wasting memory. 
> For fastMalloc, I would expose kPageSize and make sure sizeof(Segment) == kPageSize.
> 
> In this case, 2048 for the segment size seems not optimal for any allocator because the vector is gonna need the size for "size_t m_size;" in addition to the vector.

The purpose of this patch is not to boost performance, but to reduce memory spike. I've noticed the buffer size can hit ~24MB. When it grows, it can run out-of-memory on a mobile device. If I was working on any x86 port, I would try using PageReservation to reserve a huge chunk of virtual memory (as RegisterFile does), and probably I would write another vector class that uses OSAllocator rather than malloc, so that it would keep the buffer consecutive, but avoid creating new buffers when it grows. But I don't think I should be responsible to implement this or test on x86 port as I'm not working on any x86 port.

That said, this is definitely a change that can affect performance. During my test, it gives 1-2% boost on sunspider when the segment size is 2048. 4096 doesn't give better result. I cannot see any impact on Octane. I can do more testing for sure.

I think we should always use a power of 2 as segment size because operations like (index / segmentSize) and (index % segmentSize) are usually very expensive, but if segmentSize is power of 2, compiler can optimize them with bit operations.

Speaking of best block size for memory allocation, even pageSize is probably not the best size for malloc, depending on the implementation, the malloc may need extra space for block header. Using OSAllocator to allocate pages should be the best solution. But that would need too much change. We could think about something like PageSegmentedVector and PageVector...
Comment 32 Gavin Barraclough 2012-08-31 11:51:24 PDT
Comment on attachment 161564 [details]
Updated according to Ben's comments. also created AssemblerBufferSegmented

We work very hard to keep the JIT cross platform.  This change is great, but we are concerned about peak memory usage on x86 too, there is absolutely no justification to fork the AssemblerBuffer here, and to do so will detrimental to the long term maintainability of the project.  If I have time I'll try to look at what the problem is you're having on x86, and see if I am able to give you any further advice to to try to take this patch to completion as efficiently as possible.
Comment 33 Yong Li 2012-08-31 12:45:41 PDT
(In reply to comment #32)
> (From update of attachment 161564 [details])
> We work very hard to keep the JIT cross platform.  This change is great, but we are concerned about peak memory usage on x86 too, there is absolutely no justification to fork the AssemblerBuffer here, and to do so will detrimental to the long term maintainability of the project.  If I have time I'll try to look at what the problem is you're having on x86, and see if I am able to give you any further advice to to try to take this patch to completion as efficiently as possible.

I understand JIT should be cross-platform as long as possible. But we still have platform-specific code when needed, like ENABLE(BRANCH_COMPACTION) in LinkBuffer.cpp, which is actually only used by CPU(ARM_THUMB2). The most important thing is unlike LinkBuffer, AssemberBuffer is even below the platform-specific layer, and far way from the cross-platform part in JIT.

1-Cross-platform code
2-MacroAssember<CPU>
3-<CPU>Assmbler
4-AssemblerBuffer

Which AssemberBuffer to use in a specific assembler can be (if not now) made totally transparent to higher layer. Also I am not saying we cannot share the same AssemblerBuffer between assemblers in longer term. But why cannot we make other assemblers follow up in subsequent tasks?

That said, I'm also considering another solution, using OSAllocator, which will keep the buffer consecutive as it is, so work with all assemblers. However I'm not sure every platform is OK to reserve a huge virtual memory space, let's say 32MB. Any idea please?
Comment 34 Yong Li 2012-10-02 13:51:05 PDT
New plan: 

1. Make it have an inline buffer (2048 bytes for example)
2. When inline buffer is not enough, Reserve 32MB virtual memory as the new buffer, and copy the data from inline buffer to the new buffer. Commit physical memory only when necessary.
3. When reserved virtual memory is still not enough, allocate another one with size doubled. (or fail?)

Objection?
Comment 35 Jacky Jiang 2012-10-12 15:43:13 PDT
Created attachment 168499 [details]
Patch
Comment 36 Jacky Jiang 2012-10-12 15:44:58 PDT
The patch is based on the new solution on Comment #34 suggested by Yong Li
Comment 37 Jacky Jiang 2012-10-12 15:57:10 PDT
Comment on attachment 161564 [details]
Updated according to Ben's comments. also created AssemblerBufferSegmented

Deprecate the old patch
Comment 38 Yong Li 2012-10-15 13:41:00 PDT
Comment on attachment 168499 [details]
Patch

Basically LGTM. Any objection?
Comment 39 Benjamin Poulain 2012-10-15 14:07:39 PDT
(In reply to comment #38)
> (From update of attachment 168499 [details])
> Basically LGTM. Any objection?

One of the JSC guys should review this.
Personally I won't have time to test the implementation before Wednesday.
Comment 40 Yong Li 2012-10-15 14:14:52 PDT
(In reply to comment #39)
> (In reply to comment #38)
> > (From update of attachment 168499 [details] [details])
> > Basically LGTM. Any objection?
> 
> One of the JSC guys should review this.
> Personally I won't have time to test the implementation before Wednesday.

Sure. It is not an urgent thing anyway. I would like to wait and see test results on other platforms.
Comment 41 Filip Pizlo 2012-10-15 14:19:46 PDT
Comment on attachment 168499 [details]
Patch

Which part of Octane causes the spikes?

Are you testing with LLInt on or off?
Comment 42 Yong Li 2012-10-16 07:47:21 PDT
(In reply to comment #41)
> (From update of attachment 168499 [details])
> Which part of Octane causes the spikes?
> 
> Are you testing with LLInt on or off?

Actually this issue was found with classic JIT. Jacky found that, when using DFG JIT, the maximum memory usage of AssemblerBuffer is less than 2MB through the whole Octane test. Given that every ports will use DFG JIT sooner or later, or is already using DFG JIT, it is only performance concern now. The number 32MB looks way too big... I'll clear the review request until we get more perf numbers.
Comment 43 Yong Li 2012-10-18 14:23:58 PDT
original issue is obsolete. no noticeable performance gain.