25779 – Allow Strings to be created with one malloc node with no copying

RESOLVED FIXED 25779

Allow Strings to be created with one malloc node with no copying

https://bugs.webkit.org/show_bug.cgi?id=25779

Summary Allow Strings to be created with one malloc node with no copying

Dave Moore

Reported 2009-05-13 20:33:53 PDT

Currently when creating StringImpls we have to choose between these two apis: static PassRefPtr<StringImpl> create(const UChar*, unsigned length); static PassRefPtr<StringImpl> adopt(StringBuffer&); The first mallocs an object big enough to hold both the StringImpl and a buffer to contain the characters that are passed to it. These characters are copied. The second doesn't copy the characters but you have to create a second object, the StringBuffer, which hands its character buffer to the StringImpl. I would like to add a new method, static String createUninitialized(unsigned length, UChar*& data); on String that returns a pointer to the inline buffer so that the caller can fill the characters in, yielding a String with one malloc node without any copying.

Attachments
Fix (9.43 KB, patch) 2009-05-13 20:43 PDT, Dave Moore	darin: review-	Details Formatted Diff Diff
Fix (5.19 KB, patch) 2009-05-14 13:20 PDT, Dave Moore	eric: review-	Details Formatted Diff Diff
Fix (5.02 KB, patch) 2009-05-15 09:12 PDT, Dave Moore	darin: review+	Details Formatted Diff Diff
Additional changes (22.50 KB, patch) 2009-05-24 21:04 PDT, Dave Moore	mjs: review-	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Dave Moore

Comment 1 2009-05-13 20:43:28 PDT

Created attachment 30312 [details] Fix

Darin Adler

Comment 2 2009-05-14 02:49:47 PDT

Comment on attachment 30312 [details] Fix > + * platform\text\PlatformString.h: > + * platform\text\String.cpp: > + * platform\text\StringImpl.cpp: > + * platform\text\StringImpl.h > +: The paths here are using the Windows-style \ patch separator. > unsigned end = m_length - 1; > - > + > // skip white space from start Your patch here strips all trailing whitespace from this file. Although we have no formal policy about this, and I believe the git tool does encourage you to do this, we normally discourage including changes like that in the same check-in with substantive changes, since it confuses things in both that check-in and the file history. We also don't have agreement on whether to do something like this project-wide, and that's something I'd like to see before taking patches that just remove trailing whitespace. There are more lines in your patch from the whitespace change than from the substantive one. > + if (!length) { > + data = NULL; We use 0 in this project, not NULL. See the WebKit style guide. > +PassRefPtr<StringImpl> StringImpl::create(const UChar* characters, unsigned length) > +{ > + if (!characters || !length) > + return empty(); > + > + UChar *data; We put the * next to the type -- UChar*, not UChar * -- see the WebKit style guide. > - UChar* data = reinterpret_cast<UChar*>(buffer + sizeof(StringImpl)); > + UChar *data; Same here. I'm slightly concerned that this patch adds a branch or two and a new level of function call to a common code path. I assume it won't slow anything down measurably though. The substance of the patch looks good. Please fix these style issues.

Dave Moore

Comment 3 2009-05-14 13:20:44 PDT

Created attachment 30351 [details] Fix

Darin Adler

Comment 4 2009-05-14 14:00:36 PDT

Comment on attachment 30351 [details] Fix > + * platform/text/PlatformString.h: > + * platform/text/String.cpp: > + * platform/text/StringImpl.cpp: > + * platform/text/StringImpl.h > +: The lists of affected functions are missing. The colon on this last line is on a separate line. > + // Returns an uninitialized string. The characters needs to be written > + // into the buffer returned in data before using > + static String createUninitialized(unsigned length, UChar*& data); The comment seems to stop mid-sentence. Or maybe it's just a missing period. > +String String::createUninitialized(unsigned length, UChar*& data) > +{ > + return StringImpl::createUninitialized(length, data); > +} This should be inlined. No reason to pay function call overhead here. r=me as is, but maybe fix those things above and post a new patch?

Eric Seidel (no email)

Comment 5 2009-05-14 22:43:23 PDT

Comment on attachment 30351 [details] Fix The fix is fine. Marking this r- to remove it from the commit queue. I think Dave Moore has commit bit? Otherwise I'm happy to commit this once he posts a patch with fixed comment (and darin's nits addressed).

Dave Moore

Comment 6 2009-05-15 09:12:32 PDT

Created attachment 30389 [details] Fix

Darin Adler

Comment 7 2009-05-15 10:21:37 PDT

Comment on attachment 30389 [details] Fix > + * platform/text/StringImpl.cpp: > + (WebCore::StringImpl::createUninitialized(), > + create(const UChar* characters, unsigned length), > + create(const char* characters, unsigned length)) This is not the standard ChangeLog format for the function names. In the future, using the prepare-ChangeLog script can help you get the standard format. r=me

Mark Rowe (bdash)

Comment 8 2009-05-16 20:45:11 PDT

Landed in r43808.

Darin Adler

Comment 9 2009-05-16 21:13:31 PDT

There are 8 call sites for StringImpl::adopt and 50 call sites for String::adopt. Some those should use this new function for better efficiency. Should we file a new bug about that. Dave, would you be willing to work on that?

Dave Moore

Comment 10 2009-05-18 08:08:18 PDT

I'd be happy to work on that.

Darin Adler

Comment 11 2009-05-18 09:41:26 PDT

I have a few other thoughts about Vector::adopt call sites: 1) In cases where the string is built up incrementally and often ends up quite large, a Vector and adopt is probably still a good idea. 2) In cases where the string is small, we may want to copy the string and avoid keeping around two memory blocks, even if we are in the Vector adopt call. 3) In some cases where we build up the string incrementally and it usually stays small, we could consider using Vector<UChar, 32> (or some suitable size) and making an adopt function that understands how to create a string two different ways depending on whether the inline storage was used or not. A useful statistic would be how many single-block strings vs. double-block strings are allocated during a page load.

Dave Moore

Comment 12 2009-05-24 21:04:07 PDT

Created attachment 30639 [details] Additional changes This patch changes many of the uses of adopt() to createUninitialized(). Most are just straight-forward wins. A couple are places where we walk through a list of nodes and extract the strings from them. These used to build up the string incrementally, appending to a buffer. Now I walk through the list twice, the first to discover the total length and the second to write the string into the inline buffer. I think it's worth the second navigation to keep everything in one malloc node. In places where the logic was somewhat complicated as to the creation of the to be adopted buffer I left it as it was. I haven't yet done the automatic inlining of small adopted buffers. I'm not sure what the right size is, at which it would be worth the extra copy.

Maciej Stachowiak

Comment 13 2009-05-24 22:41:56 PDT

Comment on attachment 30639 [details] Additional changes This patch should really go in a separate bugzilla bug, as originally suggested by Darin. It is awkward to have a new patch for review on a resolved bug. r- for that but I'd be glad to give substantive review when it is attached to a new bug report. I hope you don't mind the r- for process reasons, it just gets messy to have bug reports reused for a different purpose, especially when already resolved. In general I think this work is great, just trying to keep the bug system clean. A few quick comments: 1) Please listen to the script and <set EMAIL_ADDRESS environment variable> :-) 2) For determining the tradeoff point for inline buffers - you could make a microbenchmark to check or make a rough estimate. The tradeoff is basically this: (a) FastMalloc allocate a StringImpl + allocate a buffer of N UChars. (b) One-shot allocate a StringImpl with extra space for N UChars, and copy N UChars. When N is large, the copy will be more expensive than the extra allocation of the StringImpl body. It should be possible to estimate the tradeoff point with microbenchmarks. Also, since space for N UChars will be reserved on the stack, there is a practical limit to N, to avoid risk of blowing the stack.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS OS X 10.5

Product WebKit

Component Platform

Assignee

Nobody

Reported

2009-05-13 20:33 PDT

Modified

2009-05-24 22:41 PDT History

CC List

1 user Show

URL

Keywords

Depends on

Blocks