RESOLVED FIXED 161968
URLParser: Add fast path for utf8 encoding queries
https://bugs.webkit.org/show_bug.cgi?id=161968
Summary URLParser: Add fast path for utf8 encoding queries
Alex Christensen
Reported 2016-09-14 09:14:06 PDT
URLParser: Add fast path for utf8 encoding queries
Attachments
Patch (3.35 KB, patch)
2016-09-14 09:16 PDT, Alex Christensen
dbates: review+
Alex Christensen
Comment 1 2016-09-14 09:16:30 PDT
Daniel Bates
Comment 2 2016-09-14 10:12:02 PDT
Comment on attachment 288824 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288824&action=review > Source/WebCore/ChangeLog:13 > + If the text encoding is utf8 (which is quite common), then we can encode the query Nit: utf8 => UTF-8 > Source/WebCore/platform/URLParser.cpp:121 > + // FIXME: Check error. I know you have a FIXME here. For your consideration, I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));" now after the call to U8_APPEND(). We can revisit this code to add additional error handling. > Source/WebCore/platform/URLParser.cpp:497 > + bool encodingIsUTF8 = encoding == UTF8Encoding(); We tend to prefer putting the verb at the beginning of the variable name. Maybe a better name would be isUTF8Encoded? isUTF8Encoding?
Alex Christensen
Comment 3 2016-09-14 11:20:13 PDT
(In reply to comment #2) > I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));" Added such assertions with offset <= sizeof(buffer) instead of offset < sizeof(buffer). http://trac.webkit.org/changeset/205918
Darin Adler
Comment 4 2016-09-16 14:48:03 PDT
Comment on attachment 288824 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288824&action=review A few thoughts on this utf8PercentEncodeQuery function. > Source/WebCore/platform/URLParser.cpp:115 > +static void utf8PercentEncodeQuery(UChar32 codePoint, StringBuilder& builder) This appends a single UTF-8 sequence that is part of a query, possibly percent-encoded as needed. The function name "percent encode query" is not so great for a function that does that, since it makes it sound like it always percent encodes and it makes it sound like it encodes an entire query, not a single character. The goal is to be fast; doing this with a function call and a loop for each code point is not going to be very fast. For great speed it is likely we need something where it can speed through a lot of plain ASCII characters and append them all at once. > Source/WebCore/platform/URLParser.cpp:120 > + U8_APPEND(buffer, offset, U8_MAX_LENGTH, codePoint, error); U8_APPEND is not a good choice here. There is no way we will have a problem with the buffer being too small when converting a single code point into a buffer where the size of the buffer is U8_MAX_LENGTH, so extra checking for that is not valuable. If we want to check if the code point is invalid, I think we can do that separately. It would be much better to check that and then to use U8_APPEND_UNSAFE. First, I think a fast path for ASCII is needed: if (isASCII(codePoint)) { builder.append(static_cast<LChar>(codePoint)); return; } Second, I think test cases for invalid code points are needed. I can’t tell you what kind of code to write for those errors without knowing what behavior we need for them. if (!U_IS_UNICODE_CHAR(codePoint)) { // Do whatever makes sense for this case. // But also, not sure if U_IS_UNICODE_CHAR is the check we want above; need to find out what we consider valid vs. invalid. return; } Only then would we use U8_APPEND_UNSAFE. >> Source/WebCore/platform/URLParser.cpp:121 >> + // FIXME: Check error. > > I know you have a FIXME here. For your consideration, I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));" now after the call to U8_APPEND(). We can revisit this code to add additional error handling. We do not need that assertion. That invariant is guaranteed in a really reliable way by the use of the constant U8_MAX_LENGTH. > Source/WebCore/platform/URLParser.cpp:128 > + for (int32_t i = 0; i < offset; ++i) { > + auto byte = buffer[i]; > + if (shouldPercentEncodeQueryByte(byte)) > + percentEncode(byte, builder); > + else > + builder.append(byte); > + } A slightly more complicated way to do this that would be more efficient is to make a little object that does this work when it is assigned to with an array-like interface and pass that directly to U8_APPEND_UNSAFE. Can avoid the buffer and the loop and basically generate an unrolled loop instead.
Alex Christensen
Comment 5 2016-09-16 14:51:27 PDT
Thanks for the insights, Darin! I'll use them in my next patch of optimization and testing.
Alex Christensen
Comment 6 2016-09-16 17:00:18 PDT
Note You need to log in before you can comment on or make changes to this bug.