WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
103295
TextIterator unnecessarily converts 8 bit strings to 16 bits
https://bugs.webkit.org/show_bug.cgi?id=103295
Summary
TextIterator unnecessarily converts 8 bit strings to 16 bits
Michael Saboff
Reported
2012-11-26 14:16:49 PST
TextIterator uses String::character() in emitText(). Also, the only access to text in a TextIterator is UChar* characters(). Given that most strings are 8 bit, TextIterator and functions that use TextIterator like plainText() should be changed to use Strings instead of UChar pointers.
Attachments
Patch
(14.02 KB, patch)
2012-11-26 16:05 PST
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Updated Patch
(14.27 KB, patch)
2012-11-27 11:35 PST
,
Michael Saboff
bfulgham
: review-
Details
Formatted Diff
Diff
ANother updated patch
(14.27 KB, patch)
2012-11-27 14:21 PST
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Michael Saboff
Comment 1
2012-11-26 16:05:55 PST
Created
attachment 176103
[details]
Patch This change is a progression on the performance test attached to
https://bugs.webkit.org/show_bug.cgi?id=81192
referenced in the code. Comparing the geometric mean of 3 runs before and after the patch: Baseline With Patch Delta innerText: 3676.2ms 3032.87ms +17% outerText: 3625.7ms 3603.27ms +0.6% It is also great improvement on the number of strings/characters that need to be up converted to 16 bits in our page load memory test. Before, 21289 (10.73%) strings up converted totaling 432,175 characters using 864,350 bytes After, 2201 ( 1.08%) strings up converted totaling 82,394 characters using 164,788 bytes This is a 90% reduction of strings and an 81% reduction in characters / bytes.
Michael Saboff
Comment 2
2012-11-26 16:10:07 PST
After posting, I noticed code that should have been removed from TextIterator.h: #if 0 UChar* plainTextToMallocAllocatedBuffer(const Range*, unsigned& bufferLength, bool isDisplayString, TextIteratorBehavior = TextIteratorDefaultBehavior); #endif It has been removed from my working code. I can repost if needed.
Darin Adler
Comment 3
2012-11-26 16:56:24 PST
Comment on
attachment 176103
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=176103&action=review
> Source/WebCore/editing/TextIterator.cpp:468 > + ASSERT(static_cast<int>(index) < length()); > + if (!(static_cast<int>(index) < length()))
I think the typecast needs to be done in the other direction, casting length() to an unsigned, otherwise large indexes will get past this check because they get converted to negative numbers.
> Source/WebCore/editing/TextIterator.h:66 > +#if 0 > UChar* plainTextToMallocAllocatedBuffer(const Range*, unsigned& bufferLength, bool isDisplayString, TextIteratorBehavior = TextIteratorDefaultBehavior); > +#endif
Why just #if'd out?
> Source/WebCore/editing/TextIterator.h:103 > + int startOffset() const { return m_positionStartOffset; } > + const String string() const { return m_text; } > + const UChar* characters() const { return m_textCharacters ? m_textCharacters : m_text.characters() + startOffset(); } > + UChar characterAt(unsigned index) const; > + void appendTextToStringBuilder(StringBuilder&) const;
This additional API makes TextIterator a lot more confusing to use than it was before. The old API was designed to make it hard to use incorrectly; there was no copying of text ever. I’m particularly confused about how to use the string() function. Do all these new functions need to be public? The string() function should not return "const String". It should just return "String".
> Source/WebKit/mac/WebView/WebFrame.mm:500 > - // This will give a system malloc'd buffer that can be turned directly into an NSString > - unsigned length; > - UChar* buf = plainTextToMallocAllocatedBuffer(core(range), length, true); > - > - if (!buf) > - return [NSString string]; > - > - // Transfer buffer ownership to NSString > - return [[[NSString alloc] initWithCharactersNoCopy:buf length:length freeWhenDone:YES] autorelease]; > + return plainText(core(range), TextIteratorDefaultBehavior, true);
Why is it OK to remove the plainTextToMallocAllocatedBuffer optimization? That was a critical change back when Antti originally did it. Did you research why it was needed and determine it’s not needed any more?
Michael Saboff
Comment 4
2012-11-27 11:34:29 PST
(In reply to
comment #3
)
> (From update of
attachment 176103
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=176103&action=review
> > > Source/WebCore/editing/TextIterator.cpp:468 > > + ASSERT(static_cast<int>(index) < length()); > > + if (!(static_cast<int>(index) < length())) > > I think the typecast needs to be done in the other direction, casting length() to an unsigned, otherwise large indexes will get past this check because they get converted to negative numbers. > > > Source/WebCore/editing/TextIterator.h:66 > > +#if 0 > > UChar* plainTextToMallocAllocatedBuffer(const Range*, unsigned& bufferLength, bool isDisplayString, TextIteratorBehavior = TextIteratorDefaultBehavior); > > +#endif > > Why just #if'd out? > > > Source/WebCore/editing/TextIterator.h:103 > > + int startOffset() const { return m_positionStartOffset; } > > + const String string() const { return m_text; } > > + const UChar* characters() const { return m_textCharacters ? m_textCharacters : m_text.characters() + startOffset(); } > > + UChar characterAt(unsigned index) const; > > + void appendTextToStringBuilder(StringBuilder&) const; > > This additional API makes TextIterator a lot more confusing to use than it was before. The old API was designed to make it hard to use incorrectly; there was no copying of text ever. I’m particularly confused about how to use the string() function. Do all these new functions need to be public? > > The string() function should not return "const String". It should just return "String". > > > Source/WebKit/mac/WebView/WebFrame.mm:500 > > - // This will give a system malloc'd buffer that can be turned directly into an NSString > > - unsigned length; > > - UChar* buf = plainTextToMallocAllocatedBuffer(core(range), length, true); > > - > > - if (!buf) > > - return [NSString string]; > > - > > - // Transfer buffer ownership to NSString > > - return [[[NSString alloc] initWithCharactersNoCopy:buf length:length freeWhenDone:YES] autorelease]; > > + return plainText(core(range), TextIteratorDefaultBehavior, true); > > Why is it OK to remove the plainTextToMallocAllocatedBuffer optimization? That was a critical change back when Antti originally did it. Did you research why it was needed and determine it’s not needed any more?
Michael Saboff
Comment 5
2012-11-27 11:35:12 PST
Created
attachment 176308
[details]
Updated Patch (In reply to
comment #3
)
> (From update of
attachment 176103
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=176103&action=review
> > > Source/WebCore/editing/TextIterator.cpp:468 > > + ASSERT(static_cast<int>(index) < length()); > > + if (!(static_cast<int>(index) < length())) > > I think the typecast needs to be done in the other direction, casting length() to an unsigned, otherwise large indexes will get past this check because they get converted to negative numbers.
Done.
> > Source/WebCore/editing/TextIterator.h:66 > > +#if 0 > > UChar* plainTextToMallocAllocatedBuffer(const Range*, unsigned& bufferLength, bool isDisplayString, TextIteratorBehavior = TextIteratorDefaultBehavior); > > +#endif > > Why just #if'd out?
Oversight in original patch - fixed.
> > Source/WebCore/editing/TextIterator.h:103 > > + int startOffset() const { return m_positionStartOffset; } > > + const String string() const { return m_text; } > > + const UChar* characters() const { return m_textCharacters ? m_textCharacters : m_text.characters() + startOffset(); } > > + UChar characterAt(unsigned index) const; > > + void appendTextToStringBuilder(StringBuilder&) const; > > This additional API makes TextIterator a lot more confusing to use than it was before. The old API was designed to make it hard to use incorrectly; there was no copying of text ever. I’m particularly confused about how to use the string() function. Do all these new functions need to be public?
I made startOffset() and string() private. The way to access text data is as before via characters(), if one character is wanted use characterAt() or if appending to a StringBuilder using appendTextToStringBuilder(). The two new functions are optimized for 8 bit strings.
> The string() function should not return "const String". It should just return "String".
Done.
> > Source/WebKit/mac/WebView/WebFrame.mm:500 > > - // This will give a system malloc'd buffer that can be turned directly into an NSString > > - unsigned length; > > - UChar* buf = plainTextToMallocAllocatedBuffer(core(range), length, true); > > - > > - if (!buf) > > - return [NSString string]; > > - > > - // Transfer buffer ownership to NSString > > - return [[[NSString alloc] initWithCharactersNoCopy:buf length:length freeWhenDone:YES] autorelease]; > > + return plainText(core(range), TextIteratorDefaultBehavior, true); > > Why is it OK to remove the plainTextToMallocAllocatedBuffer optimization? That was a critical change back when Antti originally did it. Did you research why it was needed and determine it’s not needed any more?
The original change was done in
http://trac.webkit.org/changeset/24832
to fix fragmentation caused by plainText() on large full documents. The change did two things to fix the problem, use system malloc and allocate large buffers. The patch I propose goes back to using fastMalloc, but it keeps the large buffer behavior exactly for the original reason. I tested it with the test referenced in the code and posted those results with the original patch. That is a performance test and not a memory test. The test site "
http://dscoder.com/test.txt
" referenced in Antti's change is no longer available. Instead I tried some memory tests using the HTML 5 spec file in PerformanceTests/Parser/resources/html5.html (a 5.8MB file). Loading this file invokes plainText() once to create a 2.6MB String. I loaded this file and measured memory before and after this patch: Baseline With Patch Fast Malloc Allocations 835719 720998 Fast Malloc Dirty 133.1MB 127.5MB System Malloc Allocations 35318 35320 System Malloc Dirty 25.2MB 22.8MB Total Dirty 171.0MB 162.7MB
Brent Fulgham
Comment 6
2012-11-27 12:26:52 PST
Comment on
attachment 176308
[details]
Updated Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=176308&action=review
This looks like a huge improvement! However, I think it will be even better if you don't copy the entire String each time you want to access a single character :-) r- for this reason (or please explain what I'm misunderstanding about your code change).
> Source/WebCore/editing/TextIterator.cpp:472 > + return string()[startOffset() + index];
Currently this makes a copy of the string, then returns one character from it!
> Source/WebCore/editing/TextIterator.cpp:480 > + builder.append(string(), startOffset(), length());
Again, I think this is copying for no real reason...
> Source/WebCore/editing/TextIterator.h:110 > + String string() const { return m_text; }
Isn't this making a copy? I think your original "const String" should probably have just been "const String&"
Michael Saboff
Comment 7
2012-11-27 14:21:03 PST
Created
attachment 176338
[details]
ANother updated patch (In reply to
comment #6
)
> (From update of
attachment 176308
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=176308&action=review
> > This looks like a huge improvement! However, I think it will be even better if you don't copy the entire String each time you want to access a single character :-) > > r- for this reason (or please explain what I'm misunderstanding about your code change). > > > Source/WebCore/editing/TextIterator.cpp:472 > > + return string()[startOffset() + index]; > > Currently this makes a copy of the string, then returns one character from it! > > > Source/WebCore/editing/TextIterator.cpp:480 > > + builder.append(string(), startOffset(), length()); > > Again, I think this is copying for no real reason... > > > Source/WebCore/editing/TextIterator.h:110 > > + String string() const { return m_text; } > > Isn't this making a copy? I think your original "const String" should probably have just been "const String&"
Made string() be const String& as suggested.
Brent Fulgham
Comment 8
2012-11-27 21:20:09 PST
Comment on
attachment 176338
[details]
ANother updated patch Looks great! Thanks. r=me.
WebKit Review Bot
Comment 9
2012-11-27 21:28:23 PST
Comment on
attachment 176338
[details]
ANother updated patch Clearing flags on attachment: 176338 Committed
r135972
: <
http://trac.webkit.org/changeset/135972
>
WebKit Review Bot
Comment 10
2012-11-27 21:28:27 PST
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug