Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser
Created attachment 289004 [details] Patch
This is a ~20% speedup in url parsing.
Comment on attachment 289004 [details] Patch Should we have a StringBuilder<CharacterType> class?
(In reply to comment #0) > Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser I don't think so. It wouldn't help in this case, where most of the time we are creating 8-bit Strings, but we need to be able to change our mind at the end and make a 16-bit string if there are unicode code points in the fragment.
Comment on attachment 289004 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289004&action=review > Source/WebCore/ChangeLog:11 > + Perf data? > Source/WebCore/platform/URLParser.cpp:449 > builder.append(codePoint); This looks risky given that codePoint is a Uchar32? > Source/WebCore/platform/URLParser.cpp:653 > + m_asciiBuffer.uncheckedAppend(base.m_string[i]); Using [] operator on String, you have a branch for each iteration that checks if the buffer is 8bit. You only need to do the check once. Then you could access either characters8() or characters16() on the string and iterate over that (maybe even use vector::append(base.m_string.characters8(), asciiLength);) > Source/WebCore/platform/URLParser.cpp:657 > + for (size_t i = 0; i < asciiLength; ++i) { There could be a fast path if fragment.is8bit(). > Source/WebCore/platform/URLParser.cpp:669 > + ASSERT_WITH_SECURITY_IMPLICATION(isASCII(base.m_string[i])); Same comments as above. > Source/WebCore/platform/URLParser.cpp:1159 > + m_asciiBuffer.uncheckedAppend('/'); I am pretty sure we can write these on 1 line. > Source/WebCore/platform/URLParser.cpp:1177 > + m_asciiBuffer.uncheckedAppend('/'); could be on 1 line. > Source/WebCore/platform/URLParser.cpp:1197 > + m_asciiBuffer.uncheckedAppend('/'); Could be on 1 line. > Source/WebCore/platform/URLParser.cpp:1508 > + m_asciiBuffer.resize(m_url.m_passwordEnd); I believe it would be nicer to use shrink() or shrinkCapacity() here. > Source/WebCore/platform/URLParser.cpp:1603 > + case 1: This seems super long. We can probably use something like '0' + digit. > Source/WebCore/platform/URLParser.cpp:1640 > + bool appendZero = false; I think this should just be a loop that keeps dividing by 10 and keeps appending '0' + number %10. > Source/WebCore/platform/URLParser.cpp:1659 > +static void appendNumber(Vector<LChar>& destination, uint16_t number) I think we probably don't need this overload if we use the loop I suggest? > Source/WebCore/platform/URLParser.cpp:2175 > + for (size_t i = 0; i < asciiDomain.value().length(); ++i) We don't need a loop here, we can use asciiDomainCharacters and asciiDomain.size() and do 1 call to append().
Created attachment 289012 [details] Patch
Comment on attachment 289012 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289012&action=review I think this looks good otherwise. > Source/WebCore/platform/URLParser.cpp:408 > +static bool isWindowsDriveLetter(const Vector<LChar>& buffer, size_t index) I think it'd be nice to rename builder to buffer everywhere, not just here since you switch to a vector. > Source/WebCore/platform/URLParser.cpp:654 > + m_asciiBuffer.append(base.m_string.characters8(), asciiLength); In this if case, we know the fragment has no unicode character. Therefore, you can append the whole m_string, not just up to asciiLength. > Source/WebCore/platform/URLParser.cpp:657 > + UChar c = base.m_string.characters16()[i]; I would cache base.m_string.characters16() before the loop. > Source/WebCore/platform/URLParser.cpp:664 > + for (size_t i = 0; i < asciiLength; ++i) { This only needs to be done in the else case. > Source/WebCore/platform/URLParser.cpp:672 > + } else { The whole code in this else case is pretty much duplicated from the first part of your if case. I think we should move this to a separate utility function to avoid duplication. > Source/WebCore/platform/URLParser.cpp:1582 > + destination.reserveCapacity(destination.size() + strlen("255")); I would just use + 3 here. > Source/WebCore/platform/URLParser.cpp:1601 > + destination.reserveCapacity(destination.size() + strlen("65535")); I would just use + 5 here. > Source/WebCore/platform/URLParser.cpp:1609 > + if (tenThousands) { Honestly, unless there is a measurable impact on performance. I think it would look a lot nicer and the code would be much shorter if we used something similar to what StringBuilder is using: LChar buf[sizeof(UnsignedIntegerType) * 3 + 1]; LChar* end = buf + WTF_ARRAY_LENGTH(buf); LChar* p = end; do { *--p = (number % 10) + '0'; number /= 10; } while (number); destination.append(p, end - p); Then we don't need 2 separate functions either, just a templated one. > Source/WebCore/platform/URLParser.cpp:1999 > + LChar colon = m_asciiBuffer.takeLast(); I would prefer: ASSERT(m_asciiBuffer.last() == ':'); m_asciiBuffer.shrink(m_asciiBuffer.size() - 1);
Created attachment 289023 [details] Patch
Created attachment 289033 [details] Patch
Comment on attachment 289033 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289033&action=review r=me but please check why the win EWS is sad. > Source/WebCore/platform/URLParser.cpp:641 > +static void copyASCIIStringUntil(Vector<LChar>& destination, const String& string, size_t length8Bit, size_t length16Bit) nit: I would call the parameters: lengthIf8Bit / lengthIf16Bit. > Source/WebCore/platform/URLParser.cpp:665 > + if (!base.m_string.is8Bit()) { nit: Feel free to ignore this comment but I personally feel this is likely not worth the complexity. In the 16bit case, I would just copy the whole fragment to m_unicodeFragmentBuffer. This also probably means we would be able to reuse copyASCIIStringUntil() and do this on one line.
Created attachment 289104 [details] Patch
(In reply to comment #10) > > Source/WebCore/platform/URLParser.cpp:665 > > + if (!base.m_string.is8Bit()) { > > nit: Feel free to ignore this comment but I personally feel this is likely > not worth the complexity. In the 16bit case, I would just copy the whole > fragment to m_unicodeFragmentBuffer. This also probably means we would be > able to reuse copyASCIIStringUntil() and do this on one line. If we feel like optimizing the performance of non-ascii fragments, we could make m_unicodeFragmentBuffer a Vector<UChar> instead of a Vector<UChar32>. That should be done in a different patch, though.
Created attachment 289105 [details] Patch
http://trac.webkit.org/changeset/206044