URLParser: Add fast path for utf8 encoding queries
Created attachment 288824 [details] Patch
Comment on attachment 288824 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288824&action=review > Source/WebCore/ChangeLog:13 > + If the text encoding is utf8 (which is quite common), then we can encode the query Nit: utf8 => UTF-8 > Source/WebCore/platform/URLParser.cpp:121 > + // FIXME: Check error. I know you have a FIXME here. For your consideration, I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));" now after the call to U8_APPEND(). We can revisit this code to add additional error handling. > Source/WebCore/platform/URLParser.cpp:497 > + bool encodingIsUTF8 = encoding == UTF8Encoding(); We tend to prefer putting the verb at the beginning of the variable name. Maybe a better name would be isUTF8Encoded? isUTF8Encoding?
(In reply to comment #2) > I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));" Added such assertions with offset <= sizeof(buffer) instead of offset < sizeof(buffer). http://trac.webkit.org/changeset/205918
Comment on attachment 288824 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288824&action=review A few thoughts on this utf8PercentEncodeQuery function. > Source/WebCore/platform/URLParser.cpp:115 > +static void utf8PercentEncodeQuery(UChar32 codePoint, StringBuilder& builder) This appends a single UTF-8 sequence that is part of a query, possibly percent-encoded as needed. The function name "percent encode query" is not so great for a function that does that, since it makes it sound like it always percent encodes and it makes it sound like it encodes an entire query, not a single character. The goal is to be fast; doing this with a function call and a loop for each code point is not going to be very fast. For great speed it is likely we need something where it can speed through a lot of plain ASCII characters and append them all at once. > Source/WebCore/platform/URLParser.cpp:120 > + U8_APPEND(buffer, offset, U8_MAX_LENGTH, codePoint, error); U8_APPEND is not a good choice here. There is no way we will have a problem with the buffer being too small when converting a single code point into a buffer where the size of the buffer is U8_MAX_LENGTH, so extra checking for that is not valuable. If we want to check if the code point is invalid, I think we can do that separately. It would be much better to check that and then to use U8_APPEND_UNSAFE. First, I think a fast path for ASCII is needed: if (isASCII(codePoint)) { builder.append(static_cast<LChar>(codePoint)); return; } Second, I think test cases for invalid code points are needed. I can’t tell you what kind of code to write for those errors without knowing what behavior we need for them. if (!U_IS_UNICODE_CHAR(codePoint)) { // Do whatever makes sense for this case. // But also, not sure if U_IS_UNICODE_CHAR is the check we want above; need to find out what we consider valid vs. invalid. return; } Only then would we use U8_APPEND_UNSAFE. >> Source/WebCore/platform/URLParser.cpp:121 >> + // FIXME: Check error. > > I know you have a FIXME here. For your consideration, I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));" now after the call to U8_APPEND(). We can revisit this code to add additional error handling. We do not need that assertion. That invariant is guaranteed in a really reliable way by the use of the constant U8_MAX_LENGTH. > Source/WebCore/platform/URLParser.cpp:128 > + for (int32_t i = 0; i < offset; ++i) { > + auto byte = buffer[i]; > + if (shouldPercentEncodeQueryByte(byte)) > + percentEncode(byte, builder); > + else > + builder.append(byte); > + } A slightly more complicated way to do this that would be more efficient is to make a little object that does this work when it is assigned to with an array-like interface and pass that directly to U8_APPEND_UNSAFE. Can avoid the buffer and the loop and basically generate an unrolled loop instead.
Thanks for the insights, Darin! I'll use them in my next patch of optimization and testing.
This is being used in https://bugs.webkit.org/show_bug.cgi?id=162105