Bug 33178 - Segmented SharedBuffer: SharedBuffer doesn't have to be a flat memory block
Summary: Segmented SharedBuffer: SharedBuffer doesn't have to be a flat memory block
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-01-04 12:49 PST by Yong Li
Modified: 2010-01-05 10:11 PST (History)
4 users (show)

See Also:


Attachments
(1) Make SharedBuffer use segments internally (12.04 KB, patch)
2010-01-04 12:56 PST, Yong Li
darin: review-
Details | Formatted Diff | Diff
(1) make SharedBuffer use segments internally (12.05 KB, patch)
2010-01-04 14:34 PST, Yong Li
no flags Details | Formatted Diff | Diff
(2) make PNG image decoder read from segmented SharedBuffer (7.72 KB, patch)
2010-01-05 07:31 PST, Yong Li
no flags Details | Formatted Diff | Diff
Fix a bug in the first commit (introduced when fixing coding style) (1.49 KB, patch)
2010-01-05 07:41 PST, Yong Li
darin: review+
Details | Formatted Diff | Diff
fix build warnings (1.37 KB, patch)
2010-01-05 08:41 PST, Yong Li
aroben: review+
Details | Formatted Diff | Diff
fix build warning by change append(char*, int) to append(char*, unsigned) (1.83 KB, patch)
2010-01-05 09:00 PST, Yong Li
no flags Details | Formatted Diff | Diff
fix build warnings (2.65 KB, patch)
2010-01-05 09:32 PST, Yong Li
aroben: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yong Li 2010-01-04 12:49:21 PST
Currently, SharedBuffer uses Vector<char> as the internal buffer, and stores all resource data in the flat vector. When more resource data comes and the buffer size is not big enough, SharedBuffer will allocate a new buffer, and copy all bytes to the new buffer.

The purpose of this bug report is to allow SharedBuffer to use a group of memory segments to store resource data, and also enable image/text decoders to decode from segmented source data. This will reduce peak memory usage and also avoid requests for huge memory blocks. For desktop browsers, it is probably not that helpful, but from my testing, it seems not hurting the performance. I tested segmented SharedBuffer with modified text decoders, jpeg decoder and png decoder against iBench using Safari win32, and the result is segmented SharedBuffer is roughly 1% faster.

Patches are coming.
Comment 1 Yong Li 2010-01-04 12:56:36 PST
Created attachment 45821 [details]
(1) Make SharedBuffer use segments internally
Comment 2 WebKit Review Bot 2010-01-04 13:04:48 PST
Attachment 45821 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/platform/cf/SharedBufferCF.cpp:79:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 1
Comment 3 Darin Adler 2010-01-04 13:14:28 PST
Comment on attachment 45821 [details]
(1) Make SharedBuffer use segments internally

Looks OK. But with no clients using the getSomeData function, it seems like it's scaffolding for some future work and functions like getSomeData are not tested. And this change will make things less efficient until the major clients adopt.

Do you know what adding this temporary extra work does to page loading time? I don't want to get stuck with that.

This patch also changes behavior when running out of memory. The new code will abort when running out of memory, whereas the old code would drop data. I think that probably should be changed in a separate patch rather than as a side effect of this one.

> +#define SEGMENT_SIZE        (0x1000)
> +#define SEGMENT_POS_MASK    (0x0FFF)
> +#define GET_SEGMENT(position) ((position) / SEGMENT_SIZE)
> +#define GET_OFF_IN_SEGMENT(globalPosition) ((globalPosition) & SEGMENT_POS_MASK)

We use C++ constants and inline functions rather than macros for this sort of thing.

> +    unsigned posInSegment = GET_OFF_IN_SEGMENT(m_size - m_buffer.size());

We try to avoid abbreviations. The existing code uses "len" but should use "length". And here we would use "position" or "offset" rather than "pos" or "off".

> +    unsigned toCopy = len < segmentFreeSpace ? len : segmentFreeSpace;

It's better to use a noun or adjective for this sort of variable rather than the preposition "to". For example, "bytesToCopy" is probably a good name.

> +    RefPtr<SharedBuffer> clone(adoptRef(new SharedBuffer()));

No need for the parentheses here after SharedBuffer.

> +    clone->m_size = m_size;
> +    clone->m_buffer.append(m_buffer.data(), m_buffer.size());
> +    clone->m_segments.reserveCapacity(m_segments.size());
> +    for (Vector<char*>::const_iterator i = m_segments.begin(); i != m_segments.end(); ++i) {
> +        char* segment = allocateSegment();
> +        memcpy(segment, *i, SEGMENT_SIZE);
> +        clone->m_segments.append(segment);
> +    }
> +    return clone;

Why is it a good idea to clone all the segments into separate new segments rather than making a single large segment?

> +        char* dest = m_buffer.data() + bufferSize;

Again, avoid abbreviation if possible.

> +        unsigned left = m_size - bufferSize;

The word "left" is a strange name for this. Maybe bytesLeft would be OK.

> +        for (Vector<char*>::const_iterator i = m_segments.begin(); i != m_segments.end(); ++i) {

We normally use array-style index iteration for vectors. The iterators are just there for things like STL algorithms.

> +unsigned SharedBuffer::getSomeData(const char** someData, unsigned pos) const
> +{
> +    if (hasPlatformData() || m_purgeableBuffer) {
> +        *someData = data();
> +        return size();
> +    }

The above code seems clearly wrong when pos is non-zero.

> +    // Calling this function will force internal segmented buffers
> +    // to be merged into a flat buffer. Use getSomeData() whenever possible
> +    // for better performance.
>      const char* data() const;
>      unsigned size() const;

This comment comes before a paragraph of two functions. Please add a blank line.

> +    // Calling this function will force internal segmented buffers
> +    // to be merged into a flat buffer. Use getSomeData() whenever possible
> +    // for better performance.
> +    const Vector<char> &buffer() const;

You should change this to "const Vector<char>& buffer()" -- the existing code was formatted incorrectly.

> +    // Return the flat data size after "pos". "*data" will be the address
> +    // Return 0 when no more data left.

I suggest using "const char*&" for data instead of "const char**".

I don't think the term "flat data size" is clear.

> +    unsigned getSegment(unsigned position) const;
> +    unsigned getOffsetInSegment(unsigned globalPosition) const;

WebKit code normally does not use "get" in the names of functions that have return values. The above functions should be named segmentIndex() and offsetInSegment().

review- because of multiple issues above.
Comment 4 Yong Li 2010-01-04 13:17:25 PST
(In reply to comment #2)
> Attachment 45821 [details] did not pass style-queue:
> 
> Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
> WebCore/platform/cf/SharedBufferCF.cpp:79:  Tests for true/false,
> null/non-null, and zero/non-zero should all be done without equality
> comparisons.  [readability/comparison_to_zero] [5]
> Total errors found: 1

-    ASSERT(m_buffer.size() == 0);
+    ASSERT(m_size == 0);

should be ok?
Comment 5 Darin Adler 2010-01-04 13:20:19 PST
(In reply to comment #4)
> -    ASSERT(m_buffer.size() == 0);
> +    ASSERT(m_size == 0);
> 
> should be ok?

No, the old code had wrong style. Should be ASSERT(!m_size).
Comment 6 Yong Li 2010-01-04 13:29:14 PST
(In reply to comment #3)
> (From update of attachment 45821 [details])
> Looks OK. But with no clients using the getSomeData function, it seems like
> it's scaffolding for some future work and functions like getSomeData are not
> tested. And this change will make things less efficient until the major clients
> adopt.

I have 3 more patches that make all script decoders, jpeg decoder and png decoder support decoding from segmented SharedBuffer. Will post them soon.

> 
> Do you know what adding this temporary extra work does to page loading time? I
> don't want to get stuck with that.

Probably it's because it doesn't need extra memory copying. But I'm not sure. I did the test just because I want to make sure it doesn't slow down desktop browsers.

> 
> This patch also changes behavior when running out of memory. The new code will
> abort when running out of memory, whereas the old code would drop data. I think
> that probably should be changed in a separate patch rather than as a side
> effect of this one.

The current patch calls fastMalloc and I remember fastMalloc calls abort() upon OOM already. In the future, the allocation of segments can be done in other ways for better performance. We could directly allocate a page of virtual memory although I'm not sure if it helps.


Thanks for all suggestions and corrections! I'll modify the code and update the patch soon
Comment 7 Darin Adler 2010-01-04 13:33:26 PST
(In reply to comment #6)
> (In reply to comment #3)
> > (From update of attachment 45821 [details] [details])
> > Looks OK. But with no clients using the getSomeData function, it seems like
> > it's scaffolding for some future work and functions like getSomeData are not
> > tested. And this change will make things less efficient until the major clients
> > adopt.
> 
> I have 3 more patches that make all script decoders, jpeg decoder and png
> decoder support decoding from segmented SharedBuffer. Will post them soon.

I think that means that after your work image decoding on Mac OS X, iPhone, and some other platforms will be less efficient than it is today, because the image decoder in question is not a script decoder, JPEG decoder, or PNG decoder.

> > Do you know what adding this temporary extra work does to page loading time? I
> > don't want to get stuck with that.
> 
> Probably it's because it doesn't need extra memory copying. But I'm not sure. I
> did the test just because I want to make sure it doesn't slow down desktop
> browsers.

Sorry, I don't understand. This patch will allocate separate segments and then merge them later, which will make things slower for clients that use it the old way. That seems like it will make things slower and will require allocating two copies of everything at one point.

You say "did the test". What test did you do?

> > This patch also changes behavior when running out of memory. The new code will
> > abort when running out of memory, whereas the old code would drop data. I think
> > that probably should be changed in a separate patch rather than as a side
> > effect of this one.
> 
> The current patch calls fastMalloc and I remember fastMalloc calls abort() upon
> OOM already. In the future, the allocation of segments can be done in other
> ways for better performance. We could directly allocate a page of virtual
> memory although I'm not sure if it helps.

Yes, that's just the problem. The old code calls Vector::append, which does not call abort. There is tryFastMalloc, which does not call abort. You could use that and keep the semantics the same. Or you could change the semantics if you think there is a good reason.
Comment 8 Yong Li 2010-01-04 14:27:05 PST
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #3)
> > > (From update of attachment 45821 [details] [details] [details])
> > > Looks OK. But with no clients using the getSomeData function, it seems like
> > > it's scaffolding for some future work and functions like getSomeData are not
> > > tested. And this change will make things less efficient until the major clients
> > > adopt.
> > 
> > I have 3 more patches that make all script decoders, jpeg decoder and png
> > decoder support decoding from segmented SharedBuffer. Will post them soon.
> 
> I think that means that after your work image decoding on Mac OS X, iPhone, and
> some other platforms will be less efficient than it is today, because the image
> decoder in question is not a script decoder, JPEG decoder, or PNG decoder.
> 
Still not sure why it will less efficient. Even just merging into a flat consecutive buffer at last can still save memory copy times. Also it shouldn't be difficult to make image decoding work on segments, because the resource from network comes in fragments, and a browser should always be able to decode incomplete image source and be able to continue unfinished decoding job when more data comes.

> > > Do you know what adding this temporary extra work does to page loading time? I
> > > don't want to get stuck with that.
> > 
> > Probably it's because it doesn't need extra memory copying. But I'm not sure. I
> > did the test just because I want to make sure it doesn't slow down desktop
> > browsers.
> 
> Sorry, I don't understand. This patch will allocate separate segments and then
> merge them later, which will make things slower for clients that use it the old
> way. That seems like it will make things slower and will require allocating two
> copies of everything at one point.
> 
> You say "did the test". What test did you do?
> 
Even with the old solution, when the buffer grows, it has to allocate 2 copies of everything, and copies all bytes from one to the other.

The test I did is with other patches that make many decoders support segmented decoding. And more details: 
1. run HTML loading test in iBench once and ignore the result.
2. run HTML loading test in iBench for 3 times continously, and record the results.
I know the resources are actually loaded from disk cache. But in this way, the results are very stable. When I "reset Safari" every time before running iBench, I got very unstable and useless results.

> > > This patch also changes behavior when running out of memory. The new code will
> > > abort when running out of memory, whereas the old code would drop data. I think
> > > that probably should be changed in a separate patch rather than as a side
> > > effect of this one.
> > 
> > The current patch calls fastMalloc and I remember fastMalloc calls abort() upon
> > OOM already. In the future, the allocation of segments can be done in other
> > ways for better performance. We could directly allocate a page of virtual
> > memory although I'm not sure if it helps.
> 
> Yes, that's just the problem. The old code calls Vector::append, which does not
> call abort. There is tryFastMalloc, which does not call abort. You could use
> that and keep the semantics the same. Or you could change the semantics if you
> think there is a good reason.

But Vector doesn't call tryFastMalloc, instead, it calls fastMalloc in allocateBuffer(). Have I missed anything?
Comment 9 Yong Li 2010-01-04 14:34:27 PST
Created attachment 45830 [details]
(1) make SharedBuffer use segments internally

Fixed style problems and bugs that Darin Adler pointed out.
Comment 10 WebKit Review Bot 2010-01-04 14:43:38 PST
Attachment 45830 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/platform/cf/SharedBufferCF.cpp:79:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 1
Comment 11 Yong Li 2010-01-04 15:02:48 PST
(In reply to comment #10)
> Attachment 45830 [details] did not pass style-queue:
> 
> Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
> WebCore/platform/cf/SharedBufferCF.cpp:79:  Tests for true/false,
> null/non-null, and zero/non-zero should all be done without equality
> comparisons.  [readability/comparison_to_zero] [5]
> Total errors found: 1

oops. forgot this one
Comment 12 Darin Adler 2010-01-04 15:52:01 PST
(In reply to comment #8)
> Still not sure why it will less efficient. Even just merging into a flat
> consecutive buffer at last can still save memory copy times.

OK.

> Also it shouldn't
> be difficult to make image decoding work on segments, because the resource from
> network comes in fragments, and a browser should always be able to decode
> incomplete image source and be able to continue unfinished decoding job when
> more data comes.

Yes, I agree. We just have to do it! And in the mean time I don't want things to be worse than before. You've convinced me.

> But Vector doesn't call tryFastMalloc, instead, it calls fastMalloc in
> allocateBuffer(). Have I missed anything?

You're right!

I am puzzled, now, by all the places in Vector's code where it says:

    if (!begin())
        return;

I thought those were out-of-memory checks.
Comment 13 Darin Adler 2010-01-04 15:54:05 PST
Comment on attachment 45830 [details]
(1) make SharedBuffer use segments internally

> +static unsigned segmentIndex(unsigned position)
> +{
> +    return position / segmentSize;
> +}
> +
> +static unsigned offsetInSegment(unsigned position)
> +{
> +    return position & segmentPositionMask;
> +}

Should probably say "inline" in both of these.

> +unsigned SharedBuffer::getSomeData(const char*& someData, unsigned pos) const

Would prefer "position" or "offset" to "pos".

r=me
Comment 14 Yong Li 2010-01-05 06:51:37 PST
(In reply to comment #13)
> (From update of attachment 45830 [details])
> > +static unsigned segmentIndex(unsigned position)
> > +{
> > +    return position / segmentSize;
> > +}
> > +
> > +static unsigned offsetInSegment(unsigned position)
> > +{
> > +    return position & segmentPositionMask;
> > +}
> 
> Should probably say "inline" in both of these.
> 
> > +unsigned SharedBuffer::getSomeData(const char*& someData, unsigned pos) const
> 
> Would prefer "position" or "offset" to "pos".
> 
> r=me

Thanks! I'll fix these before committing it
Comment 15 Yong Li 2010-01-05 07:26:34 PST
Comment on attachment 45830 [details]
(1) make SharedBuffer use segments internally

committed @ r52795

clear flags
Comment 16 Yong Li 2010-01-05 07:31:45 PST
Created attachment 45889 [details]
(2) make PNG image decoder read from segmented SharedBuffer
Comment 17 WebKit Review Bot 2010-01-05 07:34:57 PST
style-queue ran check-webkit-style on attachment 45889 [details] without any errors.
Comment 18 WebKit Review Bot 2010-01-05 07:40:28 PST
Attachment 45889 [details] did not build on chromium:
Build output: http://webkit-commit-queue.appspot.com/results/164099
Comment 19 Yong Li 2010-01-05 07:41:05 PST
Created attachment 45890 [details]
Fix a bug in the first commit (introduced when fixing coding style)

also include another coding style fix
Comment 20 Darin Adler 2010-01-05 07:42:53 PST
I don’t think you should do all the adoption patches in a single bug. It’s confusing.
Comment 21 Yong Li 2010-01-05 07:58:15 PST
(In reply to comment #20)
> I don’t think you should do all the adoption patches in a single bug. It’s
> confusing.

OK. I'll raise new bugs
Comment 22 Darin Adler 2010-01-05 08:36:39 PST
> unsigned bytesToCopy = static_cast<unsigned>(length) < segmentFreeSpace ? static_cast<unsigned>(length) : segmentFreeSpace;

A more elegant way to write that is:

    unsigned bytesToCopy = min<unsigned>(length, segmentFreeSpace);

That gets rid of the casts too. In general, the min function is helpful for these "how much more data to get out of a buffer" sites.
Comment 23 Yong Li 2010-01-05 08:41:25 PST
Created attachment 45894 [details]
fix build warnings
Comment 24 Adam Roben (:aroben) 2010-01-05 08:42:49 PST
Comment on attachment 45894 [details]
fix build warnings

r=me
Comment 25 Darin Adler 2010-01-05 08:43:58 PST
It seems clear to me that we should change the type of the argument to SharedBuffer::append to unsigned. That will affect more files than this, but we don't want all these typecasts.

I am very surprised you need that cast to int, though.
Comment 26 Yong Li 2010-01-05 08:46:31 PST
(In reply to comment #25)
> It seems clear to me that we should change the type of the argument to
> SharedBuffer::append to unsigned. That will affect more files than this, but we
> don't want all these typecasts.
> 
> I am very surprised you need that cast to int, though.

I was worried about having more warning by changing the type of the argument. but just realized that int->unsigned won't give warning. will do that way
Comment 27 Darin Adler 2010-01-05 08:50:10 PST
(In reply to comment #26)
> I was worried about having more warning by changing the type of the argument.
> but just realized that int->unsigned won't give warning. will do that way

Yes, I understand that you are worried, but you should not be! We can’t program based on fear ;-)
Comment 28 Yong Li 2010-01-05 09:00:47 PST
Created attachment 45895 [details]
fix build warning by change append(char*, int) to append(char*, unsigned)
Comment 29 WebKit Review Bot 2010-01-05 09:04:03 PST
style-queue ran check-webkit-style on attachment 45895 [details] without any errors.
Comment 30 Yong Li 2010-01-05 09:32:05 PST
Created attachment 45900 [details]
fix build warnings

fix warnings and also replace (A < B ? A : B) with std::min
Comment 31 Adam Roben (:aroben) 2010-01-05 09:35:14 PST
Comment on attachment 45900 [details]
fix build warnings

> -    unsigned bytesToCopy = static_cast<unsigned>(length) < segmentFreeSpace ? static_cast<unsigned>(length) : segmentFreeSpace;
> +    unsigned bytesToCopy = std::min(length, segmentFreeSpace);

Our style is to put "using namespace std;" at the top of the file and then omit the "std::" qualifier.

Other than that, this looks good.

r=me if you fix that before committing.
Comment 32 Yong Li 2010-01-05 09:42:59 PST
(In reply to comment #31)
> (From update of attachment 45900 [details])
> > -    unsigned bytesToCopy = static_cast<unsigned>(length) < segmentFreeSpace ? static_cast<unsigned>(length) : segmentFreeSpace;
> > +    unsigned bytesToCopy = std::min(length, segmentFreeSpace);
> 
> Our style is to put "using namespace std;" at the top of the file and then omit
> the "std::" qualifier.
> 
> Other than that, this looks good.
> 
> r=me if you fix that before committing.

thanks. doing it now
Comment 33 Yong Li 2010-01-05 10:10:46 PST
closed
Comment 34 Yong Li 2010-01-05 10:11:15 PST
Comment on attachment 45900 [details]
fix build warnings

clear flags