Bug 162035 - Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser
Summary: Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-15 14:57 PDT by Alex Christensen
Modified: 2016-09-16 13:37 PDT (History)
0 users

See Also:


Attachments
Patch (45.93 KB, patch)
2016-09-15 14:59 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (44.00 KB, patch)
2016-09-15 16:18 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (43.46 KB, patch)
2016-09-15 17:32 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (43.46 KB, patch)
2016-09-15 22:16 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (43.72 KB, patch)
2016-09-16 13:34 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (43.49 KB, patch)
2016-09-16 13:36 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2016-09-15 14:57:29 PDT
Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser
Comment 1 Alex Christensen 2016-09-15 14:59:50 PDT
Created attachment 289004 [details]
Patch
Comment 2 Alex Christensen 2016-09-15 15:05:47 PDT
This is a ~20% speedup in url parsing.
Comment 3 Sam Weinig 2016-09-15 15:48:34 PDT
Comment on attachment 289004 [details]
Patch

Should we have a StringBuilder<CharacterType> class?
Comment 4 Alex Christensen 2016-09-15 15:50:15 PDT
(In reply to comment #0)
> Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser
I don't think so.  It wouldn't help in this case, where most of the time we are creating 8-bit Strings, but we need to be able to change our mind at the end and make a 16-bit string if there are unicode code points in the fragment.
Comment 5 Chris Dumez 2016-09-15 16:13:17 PDT
Comment on attachment 289004 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=289004&action=review

> Source/WebCore/ChangeLog:11
> +

Perf data?

> Source/WebCore/platform/URLParser.cpp:449
>          builder.append(codePoint);

This looks risky given that codePoint is a Uchar32?

> Source/WebCore/platform/URLParser.cpp:653
> +            m_asciiBuffer.uncheckedAppend(base.m_string[i]);

Using [] operator on String, you have a branch for each iteration that checks if the buffer is 8bit. You only need to do the check once. Then you could access either characters8() or characters16() on the string and iterate over that (maybe even use vector::append(base.m_string.characters8(), asciiLength);)

> Source/WebCore/platform/URLParser.cpp:657
> +        for (size_t i = 0; i < asciiLength; ++i) {

There could be a fast path if fragment.is8bit().

> Source/WebCore/platform/URLParser.cpp:669
> +            ASSERT_WITH_SECURITY_IMPLICATION(isASCII(base.m_string[i]));

Same comments as above.

> Source/WebCore/platform/URLParser.cpp:1159
> +                m_asciiBuffer.uncheckedAppend('/');

I am pretty sure we can write these on 1 line.

> Source/WebCore/platform/URLParser.cpp:1177
> +                m_asciiBuffer.uncheckedAppend('/');

could be on 1 line.

> Source/WebCore/platform/URLParser.cpp:1197
> +                    m_asciiBuffer.uncheckedAppend('/');

Could be on 1 line.

> Source/WebCore/platform/URLParser.cpp:1508
> +            m_asciiBuffer.resize(m_url.m_passwordEnd);

I believe it would be nicer to use shrink() or shrinkCapacity() here.

> Source/WebCore/platform/URLParser.cpp:1603
> +    case 1:

This seems super long. We can probably use something like '0' + digit.

> Source/WebCore/platform/URLParser.cpp:1640
> +    bool appendZero = false;

I think this should just be a loop that keeps dividing by 10 and keeps appending '0' + number %10.

> Source/WebCore/platform/URLParser.cpp:1659
> +static void appendNumber(Vector<LChar>& destination, uint16_t number)

I think we probably don't need this overload if we use the loop I suggest?

> Source/WebCore/platform/URLParser.cpp:2175
> +    for (size_t i = 0; i < asciiDomain.value().length(); ++i)

We don't need a loop here, we can use asciiDomainCharacters and asciiDomain.size() and do 1 call to append().
Comment 6 Alex Christensen 2016-09-15 16:18:41 PDT
Created attachment 289012 [details]
Patch
Comment 7 Chris Dumez 2016-09-15 16:50:41 PDT
Comment on attachment 289012 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=289012&action=review

I think this looks good otherwise.

> Source/WebCore/platform/URLParser.cpp:408
> +static bool isWindowsDriveLetter(const Vector<LChar>& buffer, size_t index)

I think it'd be nice to rename builder to buffer everywhere, not just here since you switch to a vector.

> Source/WebCore/platform/URLParser.cpp:654
> +            m_asciiBuffer.append(base.m_string.characters8(), asciiLength);

In this if case, we know the fragment has no unicode character. Therefore, you can append the whole m_string, not just up to asciiLength.

> Source/WebCore/platform/URLParser.cpp:657
> +                UChar c = base.m_string.characters16()[i];

I would cache base.m_string.characters16() before the loop.

> Source/WebCore/platform/URLParser.cpp:664
> +        for (size_t i = 0; i < asciiLength; ++i) {

This only needs to be done in the else case.

> Source/WebCore/platform/URLParser.cpp:672
> +    } else {

The whole code in this else case is pretty much duplicated from the first part of your if case. I think we should move this to a separate utility function to avoid duplication.

> Source/WebCore/platform/URLParser.cpp:1582
> +    destination.reserveCapacity(destination.size() + strlen("255"));

I would just use + 3 here.

> Source/WebCore/platform/URLParser.cpp:1601
> +    destination.reserveCapacity(destination.size() + strlen("65535"));

I would just use + 5 here.

> Source/WebCore/platform/URLParser.cpp:1609
> +    if (tenThousands) {

Honestly, unless there is a measurable impact on performance. I think it would look a lot nicer and the code would be much shorter if we used something similar to what StringBuilder is using:
    LChar buf[sizeof(UnsignedIntegerType) * 3 + 1];
    LChar* end = buf + WTF_ARRAY_LENGTH(buf);
    LChar* p = end;

    do {
        *--p = (number % 10) + '0';
        number /= 10;
    } while (number);

   destination.append(p, end - p);

Then we don't need 2 separate functions either, just a templated one.

> Source/WebCore/platform/URLParser.cpp:1999
> +        LChar colon = m_asciiBuffer.takeLast();

I would prefer:
ASSERT(m_asciiBuffer.last() == ':');
m_asciiBuffer.shrink(m_asciiBuffer.size() - 1);
Comment 8 Alex Christensen 2016-09-15 17:32:01 PDT
Created attachment 289023 [details]
Patch
Comment 9 Alex Christensen 2016-09-15 22:16:36 PDT
Created attachment 289033 [details]
Patch
Comment 10 Chris Dumez 2016-09-16 13:03:55 PDT
Comment on attachment 289033 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=289033&action=review

r=me but please check why the win EWS is sad.

> Source/WebCore/platform/URLParser.cpp:641
> +static void copyASCIIStringUntil(Vector<LChar>& destination, const String& string, size_t length8Bit, size_t length16Bit)

nit: I would call the parameters: lengthIf8Bit / lengthIf16Bit.

> Source/WebCore/platform/URLParser.cpp:665
> +        if (!base.m_string.is8Bit()) {

nit: Feel free to ignore this comment but I personally feel this is likely not worth the complexity. In the 16bit case, I would just copy the whole fragment to m_unicodeFragmentBuffer. This also probably means we would be able to reuse copyASCIIStringUntil() and do this on one line.
Comment 11 Alex Christensen 2016-09-16 13:34:05 PDT
Created attachment 289104 [details]
Patch
Comment 12 Alex Christensen 2016-09-16 13:35:30 PDT
(In reply to comment #10)
> > Source/WebCore/platform/URLParser.cpp:665
> > +        if (!base.m_string.is8Bit()) {
> 
> nit: Feel free to ignore this comment but I personally feel this is likely
> not worth the complexity. In the 16bit case, I would just copy the whole
> fragment to m_unicodeFragmentBuffer. This also probably means we would be
> able to reuse copyASCIIStringUntil() and do this on one line.
If we feel like optimizing the performance of non-ascii fragments, we could make m_unicodeFragmentBuffer a Vector<UChar> instead of a Vector<UChar32>.  That should be done in a different patch, though.
Comment 13 Alex Christensen 2016-09-16 13:36:39 PDT
Created attachment 289105 [details]
Patch
Comment 14 Alex Christensen 2016-09-16 13:37:36 PDT
http://trac.webkit.org/changeset/206044