33178 – Segmented SharedBuffer: SharedBuffer doesn't have to be a flat memory block

RESOLVED FIXED 33178

Segmented SharedBuffer: SharedBuffer doesn't have to be a flat memory block

https://bugs.webkit.org/show_bug.cgi?id=33178

Summary Segmented SharedBuffer: SharedBuffer doesn't have to be a flat memory block

Yong Li

Reported 2010-01-04 12:49:21 PST

Currently, SharedBuffer uses Vector<char> as the internal buffer, and stores all resource data in the flat vector. When more resource data comes and the buffer size is not big enough, SharedBuffer will allocate a new buffer, and copy all bytes to the new buffer. The purpose of this bug report is to allow SharedBuffer to use a group of memory segments to store resource data, and also enable image/text decoders to decode from segmented source data. This will reduce peak memory usage and also avoid requests for huge memory blocks. For desktop browsers, it is probably not that helpful, but from my testing, it seems not hurting the performance. I tested segmented SharedBuffer with modified text decoders, jpeg decoder and png decoder against iBench using Safari win32, and the result is segmented SharedBuffer is roughly 1% faster. Patches are coming.

Attachments
(1) Make SharedBuffer use segments internally (12.04 KB, patch) 2010-01-04 12:56 PST, Yong Li	darin: review-	Details Formatted Diff Diff
(1) make SharedBuffer use segments internally (12.05 KB, patch) 2010-01-04 14:34 PST, Yong Li	no flags	Details Formatted Diff Diff
(2) make PNG image decoder read from segmented SharedBuffer (7.72 KB, patch) 2010-01-05 07:31 PST, Yong Li	no flags	Details Formatted Diff Diff
Fix a bug in the first commit (introduced when fixing coding style) (1.49 KB, patch) 2010-01-05 07:41 PST, Yong Li	darin: review+	Details Formatted Diff Diff
fix build warnings (1.37 KB, patch) 2010-01-05 08:41 PST, Yong Li	aroben: review+	Details Formatted Diff Diff
*fix build warning by change append(char, int) to append(char, unsigned)* (1.83 KB, patch) 2010-01-05 09:00 PST, Yong Li	no flags	Details Formatted Diff Diff
fix build warnings (2.65 KB, patch) 2010-01-05 09:32 PST, Yong Li	aroben: review+	Details Formatted Diff Diff
Show Obsolete (7) View All Add attachment proposed patch, testcase, etc.

Yong Li

Comment 1 2010-01-04 12:56:36 PST

Created attachment 45821 [details] (1) Make SharedBuffer use segments internally

WebKit Review Bot

Comment 2 2010-01-04 13:04:48 PST

Attachment 45821 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/platform/cf/SharedBufferCF.cpp:79: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 1

Darin Adler

Comment 3 2010-01-04 13:14:28 PST

Comment on attachment 45821 [details] (1) Make SharedBuffer use segments internally Looks OK. But with no clients using the getSomeData function, it seems like it's scaffolding for some future work and functions like getSomeData are not tested. And this change will make things less efficient until the major clients adopt. Do you know what adding this temporary extra work does to page loading time? I don't want to get stuck with that. This patch also changes behavior when running out of memory. The new code will abort when running out of memory, whereas the old code would drop data. I think that probably should be changed in a separate patch rather than as a side effect of this one. > +#define SEGMENT_SIZE (0x1000) > +#define SEGMENT_POS_MASK (0x0FFF) > +#define GET_SEGMENT(position) ((position) / SEGMENT_SIZE) > +#define GET_OFF_IN_SEGMENT(globalPosition) ((globalPosition) & SEGMENT_POS_MASK) We use C++ constants and inline functions rather than macros for this sort of thing. > + unsigned posInSegment = GET_OFF_IN_SEGMENT(m_size - m_buffer.size()); We try to avoid abbreviations. The existing code uses "len" but should use "length". And here we would use "position" or "offset" rather than "pos" or "off". > + unsigned toCopy = len < segmentFreeSpace ? len : segmentFreeSpace; It's better to use a noun or adjective for this sort of variable rather than the preposition "to". For example, "bytesToCopy" is probably a good name. > + RefPtr<SharedBuffer> clone(adoptRef(new SharedBuffer())); No need for the parentheses here after SharedBuffer. > + clone->m_size = m_size; > + clone->m_buffer.append(m_buffer.data(), m_buffer.size()); > + clone->m_segments.reserveCapacity(m_segments.size()); > + for (Vector<char*>::const_iterator i = m_segments.begin(); i != m_segments.end(); ++i) { > + char* segment = allocateSegment(); > + memcpy(segment, *i, SEGMENT_SIZE); > + clone->m_segments.append(segment); > + } > + return clone; Why is it a good idea to clone all the segments into separate new segments rather than making a single large segment? > + char* dest = m_buffer.data() + bufferSize; Again, avoid abbreviation if possible. > + unsigned left = m_size - bufferSize; The word "left" is a strange name for this. Maybe bytesLeft would be OK. > + for (Vector<char*>::const_iterator i = m_segments.begin(); i != m_segments.end(); ++i) { We normally use array-style index iteration for vectors. The iterators are just there for things like STL algorithms. > +unsigned SharedBuffer::getSomeData(const char** someData, unsigned pos) const > +{ > + if (hasPlatformData() || m_purgeableBuffer) { > + *someData = data(); > + return size(); > + } The above code seems clearly wrong when pos is non-zero. > + // Calling this function will force internal segmented buffers > + // to be merged into a flat buffer. Use getSomeData() whenever possible > + // for better performance. > const char* data() const; > unsigned size() const; This comment comes before a paragraph of two functions. Please add a blank line. > + // Calling this function will force internal segmented buffers > + // to be merged into a flat buffer. Use getSomeData() whenever possible > + // for better performance. > + const Vector<char> &buffer() const; You should change this to "const Vector<char>& buffer()" -- the existing code was formatted incorrectly. > + // Return the flat data size after "pos". "*data" will be the address > + // Return 0 when no more data left. I suggest using "const char*&" for data instead of "const char**". I don't think the term "flat data size" is clear. > + unsigned getSegment(unsigned position) const; > + unsigned getOffsetInSegment(unsigned globalPosition) const; WebKit code normally does not use "get" in the names of functions that have return values. The above functions should be named segmentIndex() and offsetInSegment(). review- because of multiple issues above.

Yong Li

Comment 4 2010-01-04 13:17:25 PST

(In reply to comment #2) > Attachment 45821 [details] did not pass style-queue: > > Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 > WebCore/platform/cf/SharedBufferCF.cpp:79: Tests for true/false, > null/non-null, and zero/non-zero should all be done without equality > comparisons. [readability/comparison_to_zero] [5] > Total errors found: 1 - ASSERT(m_buffer.size() == 0); + ASSERT(m_size == 0); should be ok?

Darin Adler

Comment 5 2010-01-04 13:20:19 PST

(In reply to comment #4) > - ASSERT(m_buffer.size() == 0); > + ASSERT(m_size == 0); > > should be ok? No, the old code had wrong style. Should be ASSERT(!m_size).

Yong Li

Comment 6 2010-01-04 13:29:14 PST

(In reply to comment #3) > (From update of attachment 45821 [details]) > Looks OK. But with no clients using the getSomeData function, it seems like > it's scaffolding for some future work and functions like getSomeData are not > tested. And this change will make things less efficient until the major clients > adopt. I have 3 more patches that make all script decoders, jpeg decoder and png decoder support decoding from segmented SharedBuffer. Will post them soon. > > Do you know what adding this temporary extra work does to page loading time? I > don't want to get stuck with that. Probably it's because it doesn't need extra memory copying. But I'm not sure. I did the test just because I want to make sure it doesn't slow down desktop browsers. > > This patch also changes behavior when running out of memory. The new code will > abort when running out of memory, whereas the old code would drop data. I think > that probably should be changed in a separate patch rather than as a side > effect of this one. The current patch calls fastMalloc and I remember fastMalloc calls abort() upon OOM already. In the future, the allocation of segments can be done in other ways for better performance. We could directly allocate a page of virtual memory although I'm not sure if it helps. Thanks for all suggestions and corrections! I'll modify the code and update the patch soon

Darin Adler

Comment 7 2010-01-04 13:33:26 PST

(In reply to comment #6) > (In reply to comment #3) > > (From update of attachment 45821 [details] [details]) > > Looks OK. But with no clients using the getSomeData function, it seems like > > it's scaffolding for some future work and functions like getSomeData are not > > tested. And this change will make things less efficient until the major clients > > adopt. > > I have 3 more patches that make all script decoders, jpeg decoder and png > decoder support decoding from segmented SharedBuffer. Will post them soon. I think that means that after your work image decoding on Mac OS X, iPhone, and some other platforms will be less efficient than it is today, because the image decoder in question is not a script decoder, JPEG decoder, or PNG decoder. > > Do you know what adding this temporary extra work does to page loading time? I > > don't want to get stuck with that. > > Probably it's because it doesn't need extra memory copying. But I'm not sure. I > did the test just because I want to make sure it doesn't slow down desktop > browsers. Sorry, I don't understand. This patch will allocate separate segments and then merge them later, which will make things slower for clients that use it the old way. That seems like it will make things slower and will require allocating two copies of everything at one point. You say "did the test". What test did you do? > > This patch also changes behavior when running out of memory. The new code will > > abort when running out of memory, whereas the old code would drop data. I think > > that probably should be changed in a separate patch rather than as a side > > effect of this one. > > The current patch calls fastMalloc and I remember fastMalloc calls abort() upon > OOM already. In the future, the allocation of segments can be done in other > ways for better performance. We could directly allocate a page of virtual > memory although I'm not sure if it helps. Yes, that's just the problem. The old code calls Vector::append, which does not call abort. There is tryFastMalloc, which does not call abort. You could use that and keep the semantics the same. Or you could change the semantics if you think there is a good reason.

Yong Li

Comment 8 2010-01-04 14:27:05 PST

(In reply to comment #7) > (In reply to comment #6) > > (In reply to comment #3) > > > (From update of attachment 45821 [details] [details] [details]) > > > Looks OK. But with no clients using the getSomeData function, it seems like > > > it's scaffolding for some future work and functions like getSomeData are not > > > tested. And this change will make things less efficient until the major clients > > > adopt. > > > > I have 3 more patches that make all script decoders, jpeg decoder and png > > decoder support decoding from segmented SharedBuffer. Will post them soon. > > I think that means that after your work image decoding on Mac OS X, iPhone, and > some other platforms will be less efficient than it is today, because the image > decoder in question is not a script decoder, JPEG decoder, or PNG decoder. > Still not sure why it will less efficient. Even just merging into a flat consecutive buffer at last can still save memory copy times. Also it shouldn't be difficult to make image decoding work on segments, because the resource from network comes in fragments, and a browser should always be able to decode incomplete image source and be able to continue unfinished decoding job when more data comes. > > > Do you know what adding this temporary extra work does to page loading time? I > > > don't want to get stuck with that. > > > > Probably it's because it doesn't need extra memory copying. But I'm not sure. I > > did the test just because I want to make sure it doesn't slow down desktop > > browsers. > > Sorry, I don't understand. This patch will allocate separate segments and then > merge them later, which will make things slower for clients that use it the old > way. That seems like it will make things slower and will require allocating two > copies of everything at one point. > > You say "did the test". What test did you do? > Even with the old solution, when the buffer grows, it has to allocate 2 copies of everything, and copies all bytes from one to the other. The test I did is with other patches that make many decoders support segmented decoding. And more details: 1. run HTML loading test in iBench once and ignore the result. 2. run HTML loading test in iBench for 3 times continously, and record the results. I know the resources are actually loaded from disk cache. But in this way, the results are very stable. When I "reset Safari" every time before running iBench, I got very unstable and useless results. > > > This patch also changes behavior when running out of memory. The new code will > > > abort when running out of memory, whereas the old code would drop data. I think > > > that probably should be changed in a separate patch rather than as a side > > > effect of this one. > > > > The current patch calls fastMalloc and I remember fastMalloc calls abort() upon > > OOM already. In the future, the allocation of segments can be done in other > > ways for better performance. We could directly allocate a page of virtual > > memory although I'm not sure if it helps. > > Yes, that's just the problem. The old code calls Vector::append, which does not > call abort. There is tryFastMalloc, which does not call abort. You could use > that and keep the semantics the same. Or you could change the semantics if you > think there is a good reason. But Vector doesn't call tryFastMalloc, instead, it calls fastMalloc in allocateBuffer(). Have I missed anything?

Yong Li

Comment 9 2010-01-04 14:34:27 PST

Created attachment 45830 [details] (1) make SharedBuffer use segments internally Fixed style problems and bugs that Darin Adler pointed out.

WebKit Review Bot

Comment 10 2010-01-04 14:43:38 PST

Attachment 45830 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/platform/cf/SharedBufferCF.cpp:79: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 1

Yong Li

Comment 11 2010-01-04 15:02:48 PST

(In reply to comment #10) > Attachment 45830 [details] did not pass style-queue: > > Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 > WebCore/platform/cf/SharedBufferCF.cpp:79: Tests for true/false, > null/non-null, and zero/non-zero should all be done without equality > comparisons. [readability/comparison_to_zero] [5] > Total errors found: 1 oops. forgot this one

Darin Adler

Comment 12 2010-01-04 15:52:01 PST

(In reply to comment #8) > Still not sure why it will less efficient. Even just merging into a flat > consecutive buffer at last can still save memory copy times. OK. > Also it shouldn't > be difficult to make image decoding work on segments, because the resource from > network comes in fragments, and a browser should always be able to decode > incomplete image source and be able to continue unfinished decoding job when > more data comes. Yes, I agree. We just have to do it! And in the mean time I don't want things to be worse than before. You've convinced me. > But Vector doesn't call tryFastMalloc, instead, it calls fastMalloc in > allocateBuffer(). Have I missed anything? You're right! I am puzzled, now, by all the places in Vector's code where it says: if (!begin()) return; I thought those were out-of-memory checks.

Darin Adler

Comment 13 2010-01-04 15:54:05 PST

Comment on attachment 45830 [details] (1) make SharedBuffer use segments internally > +static unsigned segmentIndex(unsigned position) > +{ > + return position / segmentSize; > +} > + > +static unsigned offsetInSegment(unsigned position) > +{ > + return position & segmentPositionMask; > +} Should probably say "inline" in both of these. > +unsigned SharedBuffer::getSomeData(const char*& someData, unsigned pos) const Would prefer "position" or "offset" to "pos". r=me

Yong Li

Comment 14 2010-01-05 06:51:37 PST

(In reply to comment #13) > (From update of attachment 45830 [details]) > > +static unsigned segmentIndex(unsigned position) > > +{ > > + return position / segmentSize; > > +} > > + > > +static unsigned offsetInSegment(unsigned position) > > +{ > > + return position & segmentPositionMask; > > +} > > Should probably say "inline" in both of these. > > > +unsigned SharedBuffer::getSomeData(const char*& someData, unsigned pos) const > > Would prefer "position" or "offset" to "pos". > > r=me Thanks! I'll fix these before committing it

Yong Li

Comment 15 2010-01-05 07:26:34 PST

Comment on attachment 45830 [details] (1) make SharedBuffer use segments internally committed @ r52795 clear flags

Yong Li

Comment 16 2010-01-05 07:31:45 PST

Created attachment 45889 [details] (2) make PNG image decoder read from segmented SharedBuffer

WebKit Review Bot

Comment 17 2010-01-05 07:34:57 PST

style-queue ran check-webkit-style on attachment 45889 [details] without any errors.

WebKit Review Bot

Comment 18 2010-01-05 07:40:28 PST

Attachment 45889 [details] did not build on chromium: Build output: http://webkit-commit-queue.appspot.com/results/164099

Yong Li

Comment 19 2010-01-05 07:41:05 PST

Created attachment 45890 [details] Fix a bug in the first commit (introduced when fixing coding style) also include another coding style fix

Darin Adler

Comment 20 2010-01-05 07:42:53 PST

I don’t think you should do all the adoption patches in a single bug. It’s confusing.

Yong Li

Comment 21 2010-01-05 07:58:15 PST

(In reply to comment #20) > I don’t think you should do all the adoption patches in a single bug. It’s > confusing. OK. I'll raise new bugs

Darin Adler

Comment 22 2010-01-05 08:36:39 PST

> unsigned bytesToCopy = static_cast<unsigned>(length) < segmentFreeSpace ? static_cast<unsigned>(length) : segmentFreeSpace; A more elegant way to write that is: unsigned bytesToCopy = min<unsigned>(length, segmentFreeSpace); That gets rid of the casts too. In general, the min function is helpful for these "how much more data to get out of a buffer" sites.

Yong Li

Comment 23 2010-01-05 08:41:25 PST

Created attachment 45894 [details] fix build warnings

Adam Roben (:aroben)

Comment 24 2010-01-05 08:42:49 PST

Comment on attachment 45894 [details] fix build warnings r=me

Darin Adler

Comment 25 2010-01-05 08:43:58 PST

It seems clear to me that we should change the type of the argument to SharedBuffer::append to unsigned. That will affect more files than this, but we don't want all these typecasts. I am very surprised you need that cast to int, though.

Yong Li

Comment 26 2010-01-05 08:46:31 PST

(In reply to comment #25) > It seems clear to me that we should change the type of the argument to > SharedBuffer::append to unsigned. That will affect more files than this, but we > don't want all these typecasts. > > I am very surprised you need that cast to int, though. I was worried about having more warning by changing the type of the argument. but just realized that int->unsigned won't give warning. will do that way

Darin Adler

Comment 27 2010-01-05 08:50:10 PST

(In reply to comment #26) > I was worried about having more warning by changing the type of the argument. > but just realized that int->unsigned won't give warning. will do that way Yes, I understand that you are worried, but you should not be! We can’t program based on fear ;-)

Yong Li

Comment 28 2010-01-05 09:00:47 PST

Created attachment 45895 [details] fix build warning by change append(char*, int) to append(char*, unsigned)

WebKit Review Bot

Comment 29 2010-01-05 09:04:03 PST

style-queue ran check-webkit-style on attachment 45895 [details] without any errors.

Yong Li

Comment 30 2010-01-05 09:32:05 PST

Created attachment 45900 [details] fix build warnings fix warnings and also replace (A < B ? A : B) with std::min

Adam Roben (:aroben)

Comment 31 2010-01-05 09:35:14 PST

Comment on attachment 45900 [details] fix build warnings > - unsigned bytesToCopy = static_cast<unsigned>(length) < segmentFreeSpace ? static_cast<unsigned>(length) : segmentFreeSpace; > + unsigned bytesToCopy = std::min(length, segmentFreeSpace); Our style is to put "using namespace std;" at the top of the file and then omit the "std::" qualifier. Other than that, this looks good. r=me if you fix that before committing.

Yong Li

Comment 32 2010-01-05 09:42:59 PST

(In reply to comment #31) > (From update of attachment 45900 [details]) > > - unsigned bytesToCopy = static_cast<unsigned>(length) < segmentFreeSpace ? static_cast<unsigned>(length) : segmentFreeSpace; > > + unsigned bytesToCopy = std::min(length, segmentFreeSpace); > > Our style is to put "using namespace std;" at the top of the file and then omit > the "std::" qualifier. > > Other than that, this looks good. > > r=me if you fix that before committing. thanks. doing it now

Yong Li

Comment 33 2010-01-05 10:10:46 PST

closed

Yong Li

Comment 34 2010-01-05 10:11:15 PST

Comment on attachment 45900 [details] fix build warnings clear flags

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component WebCore Misc.

Assignee

Nobody

Reported

2010-01-04 12:49 PST

Modified

2021-11-16 01:22 PST History

CC List

4 users Show

URL

Keywords

Depends on

Blocks