WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
97740
HTMLConstructionSite::insertTextNode isn't optimized for 8 bit strings
https://bugs.webkit.org/show_bug.cgi?id=97740
Summary
HTMLConstructionSite::insertTextNode isn't optimized for 8 bit strings
Michael Saboff
Reported
2012-09-26 16:44:40 PDT
HTMLConstructionSite::insertTextNode() always calls characters() on it's argument string, thus up converting an 8 bit argument.
Attachments
Patch
(8.34 KB, patch)
2012-09-26 18:40 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Patch updated from review comments
(8.79 KB, patch)
2012-09-27 11:15 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Patch with extra space removed
(8.79 KB, patch)
2012-09-27 11:28 PDT
,
Michael Saboff
darin
: review+
darin
: commit-queue-
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Michael Saboff
Comment 1
2012-09-26 18:40:06 PDT
Created
attachment 165912
[details]
Patch
Benjamin Poulain
Comment 2
2012-09-26 20:51:02 PDT
Comment on
attachment 165912
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=165912&action=review
> Source/WTF/wtf/text/WTFString.cpp:186 > + LChar* data;
This could go after the if() for clarity.
> Source/WTF/wtf/text/WTFString.cpp:187 > + if (lengthToAppend > numeric_limits<unsigned>::max() - length())
In String, length() will test if m_impl is null. At this point, you know you have m_impl. This call to length() and the following should use the StringImpl directly.
> Source/WTF/wtf/text/WTFString.cpp:190 > + memcpy(data, characters8(), length() * sizeof(LChar));
characters8() also checks if m_impl is null. I am not sure what is better here, memcpy or StringImpl::copyChars()?
> Source/WebCore/dom/CharacterData.cpp:88 > - m_data.append(data, end); > + if (string.is8Bit()) > + m_data.append(string.characters8() + offset, end); > + else > + m_data.append(string.characters16() + offset, end);
Can "string" ever be null?
Michael Saboff
Comment 3
2012-09-27 11:15:58 PDT
Created
attachment 166033
[details]
Patch updated from review comments (In reply to
comment #2
)
> (From update of
attachment 165912
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=165912&action=review
> > > Source/WTF/wtf/text/WTFString.cpp:186 > > + LChar* data; > > This could go after the if() for clarity.
Moved this and the similar instances of UChar* data.
> > Source/WTF/wtf/text/WTFString.cpp:187 > > + if (lengthToAppend > numeric_limits<unsigned>::max() - length()) > > In String, length() will test if m_impl is null. At this point, you know you have m_impl. This call to length() and the following should use the StringImpl directly.
Factored out length to a local.
> > Source/WTF/wtf/text/WTFString.cpp:190 > > + memcpy(data, characters8(), length() * sizeof(LChar)); > > characters8() also checks if m_impl is null.
Changed occurrences of characters8() and characters16() to be via m_impl.
> I am not sure what is better here, memcpy or StringImpl::copyChars()?
Since compChars() has some optimization for word at once and reverts to memcpy after one check, I used it in all cases.
> > Source/WebCore/dom/CharacterData.cpp:88 > > - m_data.append(data, end); > > + if (string.is8Bit()) > > + m_data.append(string.characters8() + offset, end); > > + else > > + m_data.append(string.characters16() + offset, end); > > Can "string" ever be null?
I don't think string can be null in the current path to this method, but I added a check anyway.
WebKit Review Bot
Comment 4
2012-09-27 11:22:39 PDT
Attachment 166033
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WTF/ChangeLog', u'Source/WTF/wtf/te..." exit_code: 1 Source/WTF/wtf/text/WTFString.cpp:230: Extra space before ) [whitespace/parens] [2] Total errors found: 1 in 8 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 5
2012-09-27 11:28:26 PDT
Created
attachment 166035
[details]
Patch with extra space removed Comments from previous patch still apply.
Darin Adler
Comment 6
2012-10-02 09:36:21 PDT
Comment on
attachment 166035
[details]
Patch with extra space removed View in context:
https://bugs.webkit.org/attachment.cgi?id=166035&action=review
> Source/WebCore/dom/CharacterData.cpp:66 > -unsigned CharacterData::parserAppendData(const UChar* data, unsigned dataLength, unsigned lengthLimit) > +unsigned CharacterData::parserAppendData(const String& string, unsigned dataLength, unsigned lengthLimit, unsigned offset)
None of the callers need the dataLength argument. They all just pass the string length minus the offset. I also think the name dataLength is confusing given that the first line in this function evaluates m_data.length(), which is not the same thing as dataLength. It was defensible before when the argument passed in was named “data”, but now we’re passing a string that already has its intrinsic length so it’s not good to pass a separate length. I suggest removing the dataLength argument and using the string length. I also think the offset would be more logical as the first parameter, right after the string. A little strange to have it be the last one separated from the string by the length limit, even though having it last allows it to be optional. On reflection, the arguments to this are similar the arguments to the String::substring function: An offset followed by a count. The difference here is that lengthLimit is a limit on the total size of the CharacterData object rather than a limit on the number of new characters to be added from the string. I also think this function needs to assert that lengthLimit >= oldLength, because otherwise it will malfunction. Or be re-coded to handle that case if the assert fires.
> Source/WebCore/dom/CharacterData.cpp:82 > + if (!end || !string.length())
This string.length() check we are adding will never be true. That’s because dataLength == (string.length() - offset), and end <= dataLength <= string.length(). So if string.length() is zero, end will also be zero. Also, end is not a good name for the variable now that we have an offset. With the offset involved, end is the length of characters, not the offset of the end in the string, so it no longer makes sense to call it “end”.
> Source/WebCore/dom/CharacterData.h:48 > - unsigned parserAppendData(const UChar*, unsigned dataLength, unsigned lengthLimit); > + unsigned parserAppendData(const String&, unsigned, unsigned , unsigned offset = 0);
Please don’t remove the argument names here. Callers need the names to understand what the otherwise undocumented arguments are. Also got a stray space here before a comma.
Michael Saboff
Comment 7
2012-10-02 11:35:38 PDT
(In reply to
comment #6
)
> (From update of
attachment 166035
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=166035&action=review
> > > Source/WebCore/dom/CharacterData.cpp:66 > > -unsigned CharacterData::parserAppendData(const UChar* data, unsigned dataLength, unsigned lengthLimit) > > +unsigned CharacterData::parserAppendData(const String& string, unsigned dataLength, unsigned lengthLimit, unsigned offset) > > None of the callers need the dataLength argument. They all just pass the string length minus the offset. > > I also think the name dataLength is confusing given that the first line in this function evaluates m_data.length(), which is not the same thing as dataLength. It was defensible before when the argument passed in was named “data”, but now we’re passing a string that already has its intrinsic length so it’s not good to pass a separate length. > > I suggest removing the dataLength argument and using the string length.
Done.
> I also think the offset would be more logical as the first parameter, right after the string. A little strange to have it be the last one separated from the string by the length limit, even though having it last allows it to be optional. > > On reflection, the arguments to this are similar the arguments to the String::substring function: An offset followed by a count. The difference here is that lengthLimit is a limit on the total size of the CharacterData object rather than a limit on the number of new characters to be added from the string.
The args are now string, offset, lengthLimit.
> I also think this function needs to assert that lengthLimit >= oldLength, because otherwise it will malfunction. Or be re-coded to handle that case if the assert fires.
Added the ASSERT.
> > Source/WebCore/dom/CharacterData.cpp:82 > > + if (!end || !string.length()) > > This string.length() check we are adding will never be true. That’s because dataLength == (string.length() - offset), and end <= dataLength <= string.length(). So if string.length() is zero, end will also be zero.
Removed the string.length() check.
> Also, end is not a good name for the variable now that we have an offset. With the offset involved, end is the length of characters, not the offset of the end in the string, so it no longer makes sense to call it “end”.
Renamed end to characterLengthLimit.
> > Source/WebCore/dom/CharacterData.h:48 > > - unsigned parserAppendData(const UChar*, unsigned dataLength, unsigned lengthLimit); > > + unsigned parserAppendData(const String&, unsigned, unsigned , unsigned offset = 0); > > Please don’t remove the argument names here. Callers need the names to understand what the otherwise undocumented arguments are.
Added the argument names back.
> Also got a stray space here before a comma.
Deleted.
Michael Saboff
Comment 8
2012-10-02 11:40:53 PDT
Committed
r130190
: <
http://trac.webkit.org/changeset/130190
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug