WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
162035
Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser
https://bugs.webkit.org/show_bug.cgi?id=162035
Summary
Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser
Alex Christensen
Reported
2016-09-15 14:57:29 PDT
Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser
Attachments
Patch
(45.93 KB, patch)
2016-09-15 14:59 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(44.00 KB, patch)
2016-09-15 16:18 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(43.46 KB, patch)
2016-09-15 17:32 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(43.46 KB, patch)
2016-09-15 22:16 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(43.72 KB, patch)
2016-09-16 13:34 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(43.49 KB, patch)
2016-09-16 13:36 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Alex Christensen
Comment 1
2016-09-15 14:59:50 PDT
Created
attachment 289004
[details]
Patch
Alex Christensen
Comment 2
2016-09-15 15:05:47 PDT
This is a ~20% speedup in url parsing.
Sam Weinig
Comment 3
2016-09-15 15:48:34 PDT
Comment on
attachment 289004
[details]
Patch Should we have a StringBuilder<CharacterType> class?
Alex Christensen
Comment 4
2016-09-15 15:50:15 PDT
(In reply to
comment #0
)
> Use Vector<LChar> instead of StringBuilder for the ASCII parts of URLParser
I don't think so. It wouldn't help in this case, where most of the time we are creating 8-bit Strings, but we need to be able to change our mind at the end and make a 16-bit string if there are unicode code points in the fragment.
Chris Dumez
Comment 5
2016-09-15 16:13:17 PDT
Comment on
attachment 289004
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=289004&action=review
> Source/WebCore/ChangeLog:11 > +
Perf data?
> Source/WebCore/platform/URLParser.cpp:449 > builder.append(codePoint);
This looks risky given that codePoint is a Uchar32?
> Source/WebCore/platform/URLParser.cpp:653 > + m_asciiBuffer.uncheckedAppend(base.m_string[i]);
Using [] operator on String, you have a branch for each iteration that checks if the buffer is 8bit. You only need to do the check once. Then you could access either characters8() or characters16() on the string and iterate over that (maybe even use vector::append(base.m_string.characters8(), asciiLength);)
> Source/WebCore/platform/URLParser.cpp:657 > + for (size_t i = 0; i < asciiLength; ++i) {
There could be a fast path if fragment.is8bit().
> Source/WebCore/platform/URLParser.cpp:669 > + ASSERT_WITH_SECURITY_IMPLICATION(isASCII(base.m_string[i]));
Same comments as above.
> Source/WebCore/platform/URLParser.cpp:1159 > + m_asciiBuffer.uncheckedAppend('/');
I am pretty sure we can write these on 1 line.
> Source/WebCore/platform/URLParser.cpp:1177 > + m_asciiBuffer.uncheckedAppend('/');
could be on 1 line.
> Source/WebCore/platform/URLParser.cpp:1197 > + m_asciiBuffer.uncheckedAppend('/');
Could be on 1 line.
> Source/WebCore/platform/URLParser.cpp:1508 > + m_asciiBuffer.resize(m_url.m_passwordEnd);
I believe it would be nicer to use shrink() or shrinkCapacity() here.
> Source/WebCore/platform/URLParser.cpp:1603 > + case 1:
This seems super long. We can probably use something like '0' + digit.
> Source/WebCore/platform/URLParser.cpp:1640 > + bool appendZero = false;
I think this should just be a loop that keeps dividing by 10 and keeps appending '0' + number %10.
> Source/WebCore/platform/URLParser.cpp:1659 > +static void appendNumber(Vector<LChar>& destination, uint16_t number)
I think we probably don't need this overload if we use the loop I suggest?
> Source/WebCore/platform/URLParser.cpp:2175 > + for (size_t i = 0; i < asciiDomain.value().length(); ++i)
We don't need a loop here, we can use asciiDomainCharacters and asciiDomain.size() and do 1 call to append().
Alex Christensen
Comment 6
2016-09-15 16:18:41 PDT
Created
attachment 289012
[details]
Patch
Chris Dumez
Comment 7
2016-09-15 16:50:41 PDT
Comment on
attachment 289012
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=289012&action=review
I think this looks good otherwise.
> Source/WebCore/platform/URLParser.cpp:408 > +static bool isWindowsDriveLetter(const Vector<LChar>& buffer, size_t index)
I think it'd be nice to rename builder to buffer everywhere, not just here since you switch to a vector.
> Source/WebCore/platform/URLParser.cpp:654 > + m_asciiBuffer.append(base.m_string.characters8(), asciiLength);
In this if case, we know the fragment has no unicode character. Therefore, you can append the whole m_string, not just up to asciiLength.
> Source/WebCore/platform/URLParser.cpp:657 > + UChar c = base.m_string.characters16()[i];
I would cache base.m_string.characters16() before the loop.
> Source/WebCore/platform/URLParser.cpp:664 > + for (size_t i = 0; i < asciiLength; ++i) {
This only needs to be done in the else case.
> Source/WebCore/platform/URLParser.cpp:672 > + } else {
The whole code in this else case is pretty much duplicated from the first part of your if case. I think we should move this to a separate utility function to avoid duplication.
> Source/WebCore/platform/URLParser.cpp:1582 > + destination.reserveCapacity(destination.size() + strlen("255"));
I would just use + 3 here.
> Source/WebCore/platform/URLParser.cpp:1601 > + destination.reserveCapacity(destination.size() + strlen("65535"));
I would just use + 5 here.
> Source/WebCore/platform/URLParser.cpp:1609 > + if (tenThousands) {
Honestly, unless there is a measurable impact on performance. I think it would look a lot nicer and the code would be much shorter if we used something similar to what StringBuilder is using: LChar buf[sizeof(UnsignedIntegerType) * 3 + 1]; LChar* end = buf + WTF_ARRAY_LENGTH(buf); LChar* p = end; do { *--p = (number % 10) + '0'; number /= 10; } while (number); destination.append(p, end - p); Then we don't need 2 separate functions either, just a templated one.
> Source/WebCore/platform/URLParser.cpp:1999 > + LChar colon = m_asciiBuffer.takeLast();
I would prefer: ASSERT(m_asciiBuffer.last() == ':'); m_asciiBuffer.shrink(m_asciiBuffer.size() - 1);
Alex Christensen
Comment 8
2016-09-15 17:32:01 PDT
Created
attachment 289023
[details]
Patch
Alex Christensen
Comment 9
2016-09-15 22:16:36 PDT
Created
attachment 289033
[details]
Patch
Chris Dumez
Comment 10
2016-09-16 13:03:55 PDT
Comment on
attachment 289033
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=289033&action=review
r=me but please check why the win EWS is sad.
> Source/WebCore/platform/URLParser.cpp:641 > +static void copyASCIIStringUntil(Vector<LChar>& destination, const String& string, size_t length8Bit, size_t length16Bit)
nit: I would call the parameters: lengthIf8Bit / lengthIf16Bit.
> Source/WebCore/platform/URLParser.cpp:665 > + if (!base.m_string.is8Bit()) {
nit: Feel free to ignore this comment but I personally feel this is likely not worth the complexity. In the 16bit case, I would just copy the whole fragment to m_unicodeFragmentBuffer. This also probably means we would be able to reuse copyASCIIStringUntil() and do this on one line.
Alex Christensen
Comment 11
2016-09-16 13:34:05 PDT
Created
attachment 289104
[details]
Patch
Alex Christensen
Comment 12
2016-09-16 13:35:30 PDT
(In reply to
comment #10
)
> > Source/WebCore/platform/URLParser.cpp:665 > > + if (!base.m_string.is8Bit()) { > > nit: Feel free to ignore this comment but I personally feel this is likely > not worth the complexity. In the 16bit case, I would just copy the whole > fragment to m_unicodeFragmentBuffer. This also probably means we would be > able to reuse copyASCIIStringUntil() and do this on one line.
If we feel like optimizing the performance of non-ascii fragments, we could make m_unicodeFragmentBuffer a Vector<UChar> instead of a Vector<UChar32>. That should be done in a different patch, though.
Alex Christensen
Comment 13
2016-09-16 13:36:39 PDT
Created
attachment 289105
[details]
Patch
Alex Christensen
Comment 14
2016-09-16 13:37:36 PDT
http://trac.webkit.org/changeset/206044
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug