97740 – HTMLConstructionSite::insertTextNode isn't optimized for 8 bit strings

RESOLVED FIXED 97740

HTMLConstructionSite::insertTextNode isn't optimized for 8 bit strings

https://bugs.webkit.org/show_bug.cgi?id=97740

Summary HTMLConstructionSite::insertTextNode isn't optimized for 8 bit strings

Michael Saboff

Reported 2012-09-26 16:44:40 PDT

HTMLConstructionSite::insertTextNode() always calls characters() on it's argument string, thus up converting an 8 bit argument.

Attachments
Patch (8.34 KB, patch) 2012-09-26 18:40 PDT, Michael Saboff	no flags	Details Formatted Diff Diff
Patch updated from review comments (8.79 KB, patch) 2012-09-27 11:15 PDT, Michael Saboff	no flags	Details Formatted Diff Diff
Patch with extra space removed (8.79 KB, patch) 2012-09-27 11:28 PDT, Michael Saboff	darin: review+ darin: commit-queue-	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Michael Saboff

Comment 1 2012-09-26 18:40:06 PDT

Created attachment 165912 [details] Patch

Benjamin Poulain

Comment 2 2012-09-26 20:51:02 PDT

Comment on attachment 165912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=165912&action=review > Source/WTF/wtf/text/WTFString.cpp:186 > + LChar* data; This could go after the if() for clarity. > Source/WTF/wtf/text/WTFString.cpp:187 > + if (lengthToAppend > numeric_limits<unsigned>::max() - length()) In String, length() will test if m_impl is null. At this point, you know you have m_impl. This call to length() and the following should use the StringImpl directly. > Source/WTF/wtf/text/WTFString.cpp:190 > + memcpy(data, characters8(), length() * sizeof(LChar)); characters8() also checks if m_impl is null. I am not sure what is better here, memcpy or StringImpl::copyChars()? > Source/WebCore/dom/CharacterData.cpp:88 > - m_data.append(data, end); > + if (string.is8Bit()) > + m_data.append(string.characters8() + offset, end); > + else > + m_data.append(string.characters16() + offset, end); Can "string" ever be null?

Michael Saboff

Comment 3 2012-09-27 11:15:58 PDT

Created attachment 166033 [details] Patch updated from review comments (In reply to comment #2) > (From update of attachment 165912 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=165912&action=review > > > Source/WTF/wtf/text/WTFString.cpp:186 > > + LChar* data; > > This could go after the if() for clarity. Moved this and the similar instances of UChar* data. > > Source/WTF/wtf/text/WTFString.cpp:187 > > + if (lengthToAppend > numeric_limits<unsigned>::max() - length()) > > In String, length() will test if m_impl is null. At this point, you know you have m_impl. This call to length() and the following should use the StringImpl directly. Factored out length to a local. > > Source/WTF/wtf/text/WTFString.cpp:190 > > + memcpy(data, characters8(), length() * sizeof(LChar)); > > characters8() also checks if m_impl is null. Changed occurrences of characters8() and characters16() to be via m_impl. > I am not sure what is better here, memcpy or StringImpl::copyChars()? Since compChars() has some optimization for word at once and reverts to memcpy after one check, I used it in all cases. > > Source/WebCore/dom/CharacterData.cpp:88 > > - m_data.append(data, end); > > + if (string.is8Bit()) > > + m_data.append(string.characters8() + offset, end); > > + else > > + m_data.append(string.characters16() + offset, end); > > Can "string" ever be null? I don't think string can be null in the current path to this method, but I added a check anyway.

WebKit Review Bot

Comment 4 2012-09-27 11:22:39 PDT

Attachment 166033 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WTF/ChangeLog', u'Source/WTF/wtf/te..." exit_code: 1 Source/WTF/wtf/text/WTFString.cpp:230: Extra space before ) [whitespace/parens] [2] Total errors found: 1 in 8 files If any of these errors are false positives, please file a bug against check-webkit-style.

Michael Saboff

Comment 5 2012-09-27 11:28:26 PDT

Created attachment 166035 [details] Patch with extra space removed Comments from previous patch still apply.

Darin Adler

Comment 6 2012-10-02 09:36:21 PDT

Comment on attachment 166035 [details] Patch with extra space removed View in context: https://bugs.webkit.org/attachment.cgi?id=166035&action=review > Source/WebCore/dom/CharacterData.cpp:66 > -unsigned CharacterData::parserAppendData(const UChar* data, unsigned dataLength, unsigned lengthLimit) > +unsigned CharacterData::parserAppendData(const String& string, unsigned dataLength, unsigned lengthLimit, unsigned offset) None of the callers need the dataLength argument. They all just pass the string length minus the offset. I also think the name dataLength is confusing given that the first line in this function evaluates m_data.length(), which is not the same thing as dataLength. It was defensible before when the argument passed in was named “data”, but now we’re passing a string that already has its intrinsic length so it’s not good to pass a separate length. I suggest removing the dataLength argument and using the string length. I also think the offset would be more logical as the first parameter, right after the string. A little strange to have it be the last one separated from the string by the length limit, even though having it last allows it to be optional. On reflection, the arguments to this are similar the arguments to the String::substring function: An offset followed by a count. The difference here is that lengthLimit is a limit on the total size of the CharacterData object rather than a limit on the number of new characters to be added from the string. I also think this function needs to assert that lengthLimit >= oldLength, because otherwise it will malfunction. Or be re-coded to handle that case if the assert fires. > Source/WebCore/dom/CharacterData.cpp:82 > + if (!end || !string.length()) This string.length() check we are adding will never be true. That’s because dataLength == (string.length() - offset), and end <= dataLength <= string.length(). So if string.length() is zero, end will also be zero. Also, end is not a good name for the variable now that we have an offset. With the offset involved, end is the length of characters, not the offset of the end in the string, so it no longer makes sense to call it “end”. > Source/WebCore/dom/CharacterData.h:48 > - unsigned parserAppendData(const UChar*, unsigned dataLength, unsigned lengthLimit); > + unsigned parserAppendData(const String&, unsigned, unsigned , unsigned offset = 0); Please don’t remove the argument names here. Callers need the names to understand what the otherwise undocumented arguments are. Also got a stray space here before a comma.

Michael Saboff

Comment 7 2012-10-02 11:35:38 PDT

(In reply to comment #6) > (From update of attachment 166035 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=166035&action=review > > > Source/WebCore/dom/CharacterData.cpp:66 > > -unsigned CharacterData::parserAppendData(const UChar* data, unsigned dataLength, unsigned lengthLimit) > > +unsigned CharacterData::parserAppendData(const String& string, unsigned dataLength, unsigned lengthLimit, unsigned offset) > > None of the callers need the dataLength argument. They all just pass the string length minus the offset. > > I also think the name dataLength is confusing given that the first line in this function evaluates m_data.length(), which is not the same thing as dataLength. It was defensible before when the argument passed in was named “data”, but now we’re passing a string that already has its intrinsic length so it’s not good to pass a separate length. > > I suggest removing the dataLength argument and using the string length. Done. > I also think the offset would be more logical as the first parameter, right after the string. A little strange to have it be the last one separated from the string by the length limit, even though having it last allows it to be optional. > > On reflection, the arguments to this are similar the arguments to the String::substring function: An offset followed by a count. The difference here is that lengthLimit is a limit on the total size of the CharacterData object rather than a limit on the number of new characters to be added from the string. The args are now string, offset, lengthLimit. > I also think this function needs to assert that lengthLimit >= oldLength, because otherwise it will malfunction. Or be re-coded to handle that case if the assert fires. Added the ASSERT. > > Source/WebCore/dom/CharacterData.cpp:82 > > + if (!end || !string.length()) > > This string.length() check we are adding will never be true. That’s because dataLength == (string.length() - offset), and end <= dataLength <= string.length(). So if string.length() is zero, end will also be zero. Removed the string.length() check. > Also, end is not a good name for the variable now that we have an offset. With the offset involved, end is the length of characters, not the offset of the end in the string, so it no longer makes sense to call it “end”. Renamed end to characterLengthLimit. > > Source/WebCore/dom/CharacterData.h:48 > > - unsigned parserAppendData(const UChar*, unsigned dataLength, unsigned lengthLimit); > > + unsigned parserAppendData(const String&, unsigned, unsigned , unsigned offset = 0); > > Please don’t remove the argument names here. Callers need the names to understand what the otherwise undocumented arguments are. Added the argument names back. > Also got a stray space here before a comma. Deleted.

Michael Saboff

Comment 8 2012-10-02 11:40:53 PDT

Committed r130190: <http://trac.webkit.org/changeset/130190>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component DOM

Assignee

Michael Saboff

Reported

2012-09-26 16:44 PDT

Modified

2012-10-02 11:40 PDT History

CC List

2 users Show

URL

Keywords

Depends on

Blocks