162035 – Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser

RESOLVED FIXED 162035

Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser

https://bugs.webkit.org/show_bug.cgi?id=162035

Summary Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser

Alex Christensen

Reported 2016-09-15 14:57:29 PDT

Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser

Attachments
Patch (45.93 KB, patch) 2016-09-15 14:59 PDT, Alex Christensen	no flags	Details Formatted Diff Diff
Patch (44.00 KB, patch) 2016-09-15 16:18 PDT, Alex Christensen	no flags	Details Formatted Diff Diff
Patch (43.46 KB, patch) 2016-09-15 17:32 PDT, Alex Christensen	no flags	Details Formatted Diff Diff
Patch (43.46 KB, patch) 2016-09-15 22:16 PDT, Alex Christensen	no flags	Details Formatted Diff Diff
Patch (43.72 KB, patch) 2016-09-16 13:34 PDT, Alex Christensen	no flags	Details Formatted Diff Diff
Patch (43.49 KB, patch) 2016-09-16 13:36 PDT, Alex Christensen	no flags	Details Formatted Diff Diff
Show Obsolete (5) View All Add attachment proposed patch, testcase, etc.

Alex Christensen

Comment 1 2016-09-15 14:59:50 PDT

Created attachment 289004 [details] Patch

Alex Christensen

Comment 2 2016-09-15 15:05:47 PDT

This is a ~20% speedup in url parsing.

Sam Weinig

Comment 3 2016-09-15 15:48:34 PDT

Comment on attachment 289004 [details] Patch Should we have a StringBuilder<CharacterType> class?

Alex Christensen

Comment 4 2016-09-15 15:50:15 PDT

(In reply to comment #0) > Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser I don't think so. It wouldn't help in this case, where most of the time we are creating 8-bit Strings, but we need to be able to change our mind at the end and make a 16-bit string if there are unicode code points in the fragment.

Chris Dumez

Comment 5 2016-09-15 16:13:17 PDT

Comment on attachment 289004 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289004&action=review > Source/WebCore/ChangeLog:11 > + Perf data? > Source/WebCore/platform/URLParser.cpp:449 > builder.append(codePoint); This looks risky given that codePoint is a Uchar32? > Source/WebCore/platform/URLParser.cpp:653 > + m_asciiBuffer.uncheckedAppend(base.m_string[i]); Using [] operator on String, you have a branch for each iteration that checks if the buffer is 8bit. You only need to do the check once. Then you could access either characters8() or characters16() on the string and iterate over that (maybe even use vector::append(base.m_string.characters8(), asciiLength);) > Source/WebCore/platform/URLParser.cpp:657 > + for (size_t i = 0; i < asciiLength; ++i) { There could be a fast path if fragment.is8bit(). > Source/WebCore/platform/URLParser.cpp:669 > + ASSERT_WITH_SECURITY_IMPLICATION(isASCII(base.m_string[i])); Same comments as above. > Source/WebCore/platform/URLParser.cpp:1159 > + m_asciiBuffer.uncheckedAppend('/'); I am pretty sure we can write these on 1 line. > Source/WebCore/platform/URLParser.cpp:1177 > + m_asciiBuffer.uncheckedAppend('/'); could be on 1 line. > Source/WebCore/platform/URLParser.cpp:1197 > + m_asciiBuffer.uncheckedAppend('/'); Could be on 1 line. > Source/WebCore/platform/URLParser.cpp:1508 > + m_asciiBuffer.resize(m_url.m_passwordEnd); I believe it would be nicer to use shrink() or shrinkCapacity() here. > Source/WebCore/platform/URLParser.cpp:1603 > + case 1: This seems super long. We can probably use something like '0' + digit. > Source/WebCore/platform/URLParser.cpp:1640 > + bool appendZero = false; I think this should just be a loop that keeps dividing by 10 and keeps appending '0' + number %10. > Source/WebCore/platform/URLParser.cpp:1659 > +static void appendNumber(Vector<LChar>& destination, uint16_t number) I think we probably don't need this overload if we use the loop I suggest? > Source/WebCore/platform/URLParser.cpp:2175 > + for (size_t i = 0; i < asciiDomain.value().length(); ++i) We don't need a loop here, we can use asciiDomainCharacters and asciiDomain.size() and do 1 call to append().

Alex Christensen

Comment 6 2016-09-15 16:18:41 PDT

Created attachment 289012 [details] Patch

Chris Dumez

Comment 7 2016-09-15 16:50:41 PDT

Comment on attachment 289012 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289012&action=review I think this looks good otherwise. > Source/WebCore/platform/URLParser.cpp:408 > +static bool isWindowsDriveLetter(const Vector<LChar>& buffer, size_t index) I think it'd be nice to rename builder to buffer everywhere, not just here since you switch to a vector. > Source/WebCore/platform/URLParser.cpp:654 > + m_asciiBuffer.append(base.m_string.characters8(), asciiLength); In this if case, we know the fragment has no unicode character. Therefore, you can append the whole m_string, not just up to asciiLength. > Source/WebCore/platform/URLParser.cpp:657 > + UChar c = base.m_string.characters16()[i]; I would cache base.m_string.characters16() before the loop. > Source/WebCore/platform/URLParser.cpp:664 > + for (size_t i = 0; i < asciiLength; ++i) { This only needs to be done in the else case. > Source/WebCore/platform/URLParser.cpp:672 > + } else { The whole code in this else case is pretty much duplicated from the first part of your if case. I think we should move this to a separate utility function to avoid duplication. > Source/WebCore/platform/URLParser.cpp:1582 > + destination.reserveCapacity(destination.size() + strlen("255")); I would just use + 3 here. > Source/WebCore/platform/URLParser.cpp:1601 > + destination.reserveCapacity(destination.size() + strlen("65535")); I would just use + 5 here. > Source/WebCore/platform/URLParser.cpp:1609 > + if (tenThousands) { Honestly, unless there is a measurable impact on performance. I think it would look a lot nicer and the code would be much shorter if we used something similar to what StringBuilder is using: LChar buf[sizeof(UnsignedIntegerType) * 3 + 1]; LChar* end = buf + WTF_ARRAY_LENGTH(buf); LChar* p = end; do { *--p = (number % 10) + '0'; number /= 10; } while (number); destination.append(p, end - p); Then we don't need 2 separate functions either, just a templated one. > Source/WebCore/platform/URLParser.cpp:1999 > + LChar colon = m_asciiBuffer.takeLast(); I would prefer: ASSERT(m_asciiBuffer.last() == ':'); m_asciiBuffer.shrink(m_asciiBuffer.size() - 1);

Alex Christensen

Comment 8 2016-09-15 17:32:01 PDT

Created attachment 289023 [details] Patch

Alex Christensen

Comment 9 2016-09-15 22:16:36 PDT

Created attachment 289033 [details] Patch

Chris Dumez

Comment 10 2016-09-16 13:03:55 PDT

Comment on attachment 289033 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289033&action=review r=me but please check why the win EWS is sad. > Source/WebCore/platform/URLParser.cpp:641 > +static void copyASCIIStringUntil(Vector<LChar>& destination, const String& string, size_t length8Bit, size_t length16Bit) nit: I would call the parameters: lengthIf8Bit / lengthIf16Bit. > Source/WebCore/platform/URLParser.cpp:665 > + if (!base.m_string.is8Bit()) { nit: Feel free to ignore this comment but I personally feel this is likely not worth the complexity. In the 16bit case, I would just copy the whole fragment to m_unicodeFragmentBuffer. This also probably means we would be able to reuse copyASCIIStringUntil() and do this on one line.

Alex Christensen

Comment 11 2016-09-16 13:34:05 PDT

Created attachment 289104 [details] Patch

Alex Christensen

Comment 12 2016-09-16 13:35:30 PDT

(In reply to comment #10) > > Source/WebCore/platform/URLParser.cpp:665 > > + if (!base.m_string.is8Bit()) { > > nit: Feel free to ignore this comment but I personally feel this is likely > not worth the complexity. In the 16bit case, I would just copy the whole > fragment to m_unicodeFragmentBuffer. This also probably means we would be > able to reuse copyASCIIStringUntil() and do this on one line. If we feel like optimizing the performance of non-ascii fragments, we could make m_unicodeFragmentBuffer a Vector<UChar> instead of a Vector<UChar32>. That should be done in a different patch, though.

Alex Christensen

Comment 13 2016-09-16 13:36:39 PDT

Created attachment 289105 [details] Patch

Alex Christensen

Comment 14 2016-09-16 13:37:36 PDT

http://trac.webkit.org/changeset/206044

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Alex Christensen

Reported

2016-09-15 14:57 PDT

Modified

2016-09-16 13:37 PDT History

CC List

0 users

URL

Keywords

Depends on

Blocks