WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
25779
Allow Strings to be created with one malloc node with no copying
https://bugs.webkit.org/show_bug.cgi?id=25779
Summary
Allow Strings to be created with one malloc node with no copying
Dave Moore
Reported
2009-05-13 20:33:53 PDT
Currently when creating StringImpls we have to choose between these two apis: static PassRefPtr<StringImpl> create(const UChar*, unsigned length); static PassRefPtr<StringImpl> adopt(StringBuffer&); The first mallocs an object big enough to hold both the StringImpl and a buffer to contain the characters that are passed to it. These characters are copied. The second doesn't copy the characters but you have to create a second object, the StringBuffer, which hands its character buffer to the StringImpl. I would like to add a new method, static String createUninitialized(unsigned length, UChar*& data); on String that returns a pointer to the inline buffer so that the caller can fill the characters in, yielding a String with one malloc node without any copying.
Attachments
Fix
(9.43 KB, patch)
2009-05-13 20:43 PDT
,
Dave Moore
darin
: review-
Details
Formatted Diff
Diff
Fix
(5.19 KB, patch)
2009-05-14 13:20 PDT
,
Dave Moore
eric
: review-
Details
Formatted Diff
Diff
Fix
(5.02 KB, patch)
2009-05-15 09:12 PDT
,
Dave Moore
darin
: review+
Details
Formatted Diff
Diff
Additional changes
(22.50 KB, patch)
2009-05-24 21:04 PDT
,
Dave Moore
mjs
: review-
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Dave Moore
Comment 1
2009-05-13 20:43:28 PDT
Created
attachment 30312
[details]
Fix
Darin Adler
Comment 2
2009-05-14 02:49:47 PDT
Comment on
attachment 30312
[details]
Fix
> + * platform\text\PlatformString.h: > + * platform\text\String.cpp: > + * platform\text\StringImpl.cpp: > + * platform\text\StringImpl.h > +:
The paths here are using the Windows-style \ patch separator.
> unsigned end = m_length - 1; > - > + > // skip white space from start
Your patch here strips all trailing whitespace from this file. Although we have no formal policy about this, and I believe the git tool does encourage you to do this, we normally discourage including changes like that in the same check-in with substantive changes, since it confuses things in both that check-in and the file history. We also don't have agreement on whether to do something like this project-wide, and that's something I'd like to see before taking patches that just remove trailing whitespace. There are more lines in your patch from the whitespace change than from the substantive one.
> + if (!length) { > + data = NULL;
We use 0 in this project, not NULL. See the WebKit style guide.
> +PassRefPtr<StringImpl> StringImpl::create(const UChar* characters, unsigned length) > +{ > + if (!characters || !length) > + return empty(); > + > + UChar *data;
We put the * next to the type -- UChar*, not UChar * -- see the WebKit style guide.
> - UChar* data = reinterpret_cast<UChar*>(buffer + sizeof(StringImpl)); > + UChar *data;
Same here. I'm slightly concerned that this patch adds a branch or two and a new level of function call to a common code path. I assume it won't slow anything down measurably though. The substance of the patch looks good. Please fix these style issues.
Dave Moore
Comment 3
2009-05-14 13:20:44 PDT
Created
attachment 30351
[details]
Fix
Darin Adler
Comment 4
2009-05-14 14:00:36 PDT
Comment on
attachment 30351
[details]
Fix
> + * platform/text/PlatformString.h: > + * platform/text/String.cpp: > + * platform/text/StringImpl.cpp: > + * platform/text/StringImpl.h > +:
The lists of affected functions are missing. The colon on this last line is on a separate line.
> + // Returns an uninitialized string. The characters needs to be written > + // into the buffer returned in data before using > + static String createUninitialized(unsigned length, UChar*& data);
The comment seems to stop mid-sentence. Or maybe it's just a missing period.
> +String String::createUninitialized(unsigned length, UChar*& data) > +{ > + return StringImpl::createUninitialized(length, data); > +}
This should be inlined. No reason to pay function call overhead here. r=me as is, but maybe fix those things above and post a new patch?
Eric Seidel (no email)
Comment 5
2009-05-14 22:43:23 PDT
Comment on
attachment 30351
[details]
Fix The fix is fine. Marking this r- to remove it from the commit queue. I think Dave Moore has commit bit? Otherwise I'm happy to commit this once he posts a patch with fixed comment (and darin's nits addressed).
Dave Moore
Comment 6
2009-05-15 09:12:32 PDT
Created
attachment 30389
[details]
Fix
Darin Adler
Comment 7
2009-05-15 10:21:37 PDT
Comment on
attachment 30389
[details]
Fix
> + * platform/text/StringImpl.cpp: > + (WebCore::StringImpl::createUninitialized(), > + create(const UChar* characters, unsigned length), > + create(const char* characters, unsigned length))
This is not the standard ChangeLog format for the function names. In the future, using the prepare-ChangeLog script can help you get the standard format. r=me
Mark Rowe (bdash)
Comment 8
2009-05-16 20:45:11 PDT
Landed in
r43808
.
Darin Adler
Comment 9
2009-05-16 21:13:31 PDT
There are 8 call sites for StringImpl::adopt and 50 call sites for String::adopt. Some those should use this new function for better efficiency. Should we file a new bug about that. Dave, would you be willing to work on that?
Dave Moore
Comment 10
2009-05-18 08:08:18 PDT
I'd be happy to work on that.
Darin Adler
Comment 11
2009-05-18 09:41:26 PDT
I have a few other thoughts about Vector::adopt call sites: 1) In cases where the string is built up incrementally and often ends up quite large, a Vector and adopt is probably still a good idea. 2) In cases where the string is small, we may want to copy the string and avoid keeping around two memory blocks, even if we are in the Vector adopt call. 3) In some cases where we build up the string incrementally and it usually stays small, we could consider using Vector<UChar, 32> (or some suitable size) and making an adopt function that understands how to create a string two different ways depending on whether the inline storage was used or not. A useful statistic would be how many single-block strings vs. double-block strings are allocated during a page load.
Dave Moore
Comment 12
2009-05-24 21:04:07 PDT
Created
attachment 30639
[details]
Additional changes This patch changes many of the uses of adopt() to createUninitialized(). Most are just straight-forward wins. A couple are places where we walk through a list of nodes and extract the strings from them. These used to build up the string incrementally, appending to a buffer. Now I walk through the list twice, the first to discover the total length and the second to write the string into the inline buffer. I think it's worth the second navigation to keep everything in one malloc node. In places where the logic was somewhat complicated as to the creation of the to be adopted buffer I left it as it was. I haven't yet done the automatic inlining of small adopted buffers. I'm not sure what the right size is, at which it would be worth the extra copy.
Maciej Stachowiak
Comment 13
2009-05-24 22:41:56 PDT
Comment on
attachment 30639
[details]
Additional changes This patch should really go in a separate bugzilla bug, as originally suggested by Darin. It is awkward to have a new patch for review on a resolved bug. r- for that but I'd be glad to give substantive review when it is attached to a new bug report. I hope you don't mind the r- for process reasons, it just gets messy to have bug reports reused for a different purpose, especially when already resolved. In general I think this work is great, just trying to keep the bug system clean. A few quick comments: 1) Please listen to the script and <set EMAIL_ADDRESS environment variable> :-) 2) For determining the tradeoff point for inline buffers - you could make a microbenchmark to check or make a rough estimate. The tradeoff is basically this: (a) FastMalloc allocate a StringImpl + allocate a buffer of N UChars. (b) One-shot allocate a StringImpl with extra space for N UChars, and copy N UChars. When N is large, the copy will be more expensive than the extra allocation of the StringImpl body. It should be possible to estimate the tradeoff point with microbenchmarks. Also, since space for N UChars will be reserved on the stack, there is a practical limit to N, to avoid risk of blowing the stack.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug