Bug 97740 - HTMLConstructionSite::insertTextNode isn't optimized for 8 bit strings
Summary: HTMLConstructionSite::insertTextNode isn't optimized for 8 bit strings
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-26 16:44 PDT by Michael Saboff
Modified: 2012-10-02 11:40 PDT (History)
2 users (show)

See Also:


Attachments
Patch (8.34 KB, patch)
2012-09-26 18:40 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Patch updated from review comments (8.79 KB, patch)
2012-09-27 11:15 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Patch with extra space removed (8.79 KB, patch)
2012-09-27 11:28 PDT, Michael Saboff
darin: review+
darin: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2012-09-26 16:44:40 PDT
HTMLConstructionSite::insertTextNode() always calls characters() on it's argument string, thus up converting an 8 bit argument.
Comment 1 Michael Saboff 2012-09-26 18:40:06 PDT
Created attachment 165912 [details]
Patch
Comment 2 Benjamin Poulain 2012-09-26 20:51:02 PDT
Comment on attachment 165912 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=165912&action=review

> Source/WTF/wtf/text/WTFString.cpp:186
> +        LChar* data;

This could go after the if() for clarity.

> Source/WTF/wtf/text/WTFString.cpp:187
> +        if (lengthToAppend > numeric_limits<unsigned>::max() - length())

In String, length() will test if m_impl is null. At this point, you know you have m_impl. This call to length() and the following should use the StringImpl directly.

> Source/WTF/wtf/text/WTFString.cpp:190
> +        memcpy(data, characters8(), length() * sizeof(LChar));

characters8() also checks if m_impl is null.

I am not sure what is better here, memcpy or StringImpl::copyChars()?

> Source/WebCore/dom/CharacterData.cpp:88
> -    m_data.append(data, end);
> +    if (string.is8Bit())
> +        m_data.append(string.characters8() + offset, end);
> +    else
> +        m_data.append(string.characters16() + offset, end);

Can "string" ever be null?
Comment 3 Michael Saboff 2012-09-27 11:15:58 PDT
Created attachment 166033 [details]
Patch updated from review comments

(In reply to comment #2)
> (From update of attachment 165912 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=165912&action=review
> 
> > Source/WTF/wtf/text/WTFString.cpp:186
> > +        LChar* data;
> 
> This could go after the if() for clarity.

Moved this and the similar instances of UChar* data.
 
> > Source/WTF/wtf/text/WTFString.cpp:187
> > +        if (lengthToAppend > numeric_limits<unsigned>::max() - length())
> 
> In String, length() will test if m_impl is null. At this point, you know you have m_impl. This call to length() and the following should use the StringImpl directly.
 
Factored out length to a local.

> > Source/WTF/wtf/text/WTFString.cpp:190
> > +        memcpy(data, characters8(), length() * sizeof(LChar));
> 
> characters8() also checks if m_impl is null.

Changed occurrences of characters8() and characters16() to be via m_impl.

> I am not sure what is better here, memcpy or StringImpl::copyChars()?

Since compChars() has some optimization for word at once and reverts to memcpy after one check, I used it in all cases.

> > Source/WebCore/dom/CharacterData.cpp:88
> > -    m_data.append(data, end);
> > +    if (string.is8Bit())
> > +        m_data.append(string.characters8() + offset, end);
> > +    else
> > +        m_data.append(string.characters16() + offset, end);
> 
> Can "string" ever be null?

I don't think string can be null in the current path to this method, but I added a check anyway.
Comment 4 WebKit Review Bot 2012-09-27 11:22:39 PDT
Attachment 166033 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WTF/ChangeLog', u'Source/WTF/wtf/te..." exit_code: 1
Source/WTF/wtf/text/WTFString.cpp:230:  Extra space before )  [whitespace/parens] [2]
Total errors found: 1 in 8 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 5 Michael Saboff 2012-09-27 11:28:26 PDT
Created attachment 166035 [details]
Patch with extra space removed

Comments from previous patch still apply.
Comment 6 Darin Adler 2012-10-02 09:36:21 PDT
Comment on attachment 166035 [details]
Patch with extra space removed

View in context: https://bugs.webkit.org/attachment.cgi?id=166035&action=review

> Source/WebCore/dom/CharacterData.cpp:66
> -unsigned CharacterData::parserAppendData(const UChar* data, unsigned dataLength, unsigned lengthLimit)
> +unsigned CharacterData::parserAppendData(const String& string, unsigned dataLength, unsigned lengthLimit, unsigned offset)

None of the callers need the dataLength argument. They all just pass the string length minus the offset.

I also think the name dataLength is confusing given that the first line in this function evaluates m_data.length(), which is not the same thing as dataLength. It was defensible before when the argument passed in was named “data”, but now we’re passing a string that already has its intrinsic length so it’s not good to pass a separate length.

I suggest removing the dataLength argument and using the string length.

I also think the offset would be more logical as the first parameter, right after the string. A little strange to have it be the last one separated from the string by the length limit, even though having it last allows it to be optional.

On reflection, the arguments to this are similar the arguments to the String::substring function: An offset followed by a count. The difference here is that lengthLimit is a limit on the total size of the CharacterData object rather than a limit on the number of new characters to be added from the string.

I also think this function needs to assert that lengthLimit >= oldLength, because otherwise it will malfunction. Or be re-coded to handle that case if the assert fires.

> Source/WebCore/dom/CharacterData.cpp:82
> +    if (!end || !string.length())

This string.length() check we are adding will never be true. That’s because dataLength == (string.length() - offset), and end <= dataLength <= string.length(). So if string.length() is zero, end will also be zero. Also, end is not a good name for the variable now that we have an offset. With the offset involved, end is the length of characters, not the offset of the end in the string, so it no longer makes sense to call it “end”.

> Source/WebCore/dom/CharacterData.h:48
> -    unsigned parserAppendData(const UChar*, unsigned dataLength, unsigned lengthLimit);
> +    unsigned parserAppendData(const String&, unsigned, unsigned , unsigned offset = 0);

Please don’t remove the argument names here. Callers need the names to understand what the otherwise undocumented arguments are.

Also got a stray space here before a comma.
Comment 7 Michael Saboff 2012-10-02 11:35:38 PDT
(In reply to comment #6)
> (From update of attachment 166035 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=166035&action=review
> 
> > Source/WebCore/dom/CharacterData.cpp:66
> > -unsigned CharacterData::parserAppendData(const UChar* data, unsigned dataLength, unsigned lengthLimit)
> > +unsigned CharacterData::parserAppendData(const String& string, unsigned dataLength, unsigned lengthLimit, unsigned offset)
> 
> None of the callers need the dataLength argument. They all just pass the string length minus the offset.
> 
> I also think the name dataLength is confusing given that the first line in this function evaluates m_data.length(), which is not the same thing as dataLength. It was defensible before when the argument passed in was named “data”, but now we’re passing a string that already has its intrinsic length so it’s not good to pass a separate length.
> 
> I suggest removing the dataLength argument and using the string length.

Done.

> I also think the offset would be more logical as the first parameter, right after the string. A little strange to have it be the last one separated from the string by the length limit, even though having it last allows it to be optional.
> 
> On reflection, the arguments to this are similar the arguments to the String::substring function: An offset followed by a count. The difference here is that lengthLimit is a limit on the total size of the CharacterData object rather than a limit on the number of new characters to be added from the string.

The args are now string, offset, lengthLimit.

> I also think this function needs to assert that lengthLimit >= oldLength, because otherwise it will malfunction. Or be re-coded to handle that case if the assert fires.

Added the ASSERT.

> > Source/WebCore/dom/CharacterData.cpp:82
> > +    if (!end || !string.length())
> 
> This string.length() check we are adding will never be true. That’s because dataLength == (string.length() - offset), and end <= dataLength <= string.length(). So if string.length() is zero, end will also be zero.

Removed the string.length() check.

> Also, end is not a good name for the variable now that we have an offset. With the offset involved, end is the length of characters, not the offset of the end in the string, so it no longer makes sense to call it “end”.

Renamed end to characterLengthLimit.

> > Source/WebCore/dom/CharacterData.h:48
> > -    unsigned parserAppendData(const UChar*, unsigned dataLength, unsigned lengthLimit);
> > +    unsigned parserAppendData(const String&, unsigned, unsigned , unsigned offset = 0);
> 
> Please don’t remove the argument names here. Callers need the names to understand what the otherwise undocumented arguments are.

Added the argument names back.

> Also got a stray space here before a comma.

Deleted.
Comment 8 Michael Saboff 2012-10-02 11:40:53 PDT
Committed r130190: <http://trac.webkit.org/changeset/130190>