161968 – URLParser: Add fast path for utf8 encoding queries

RESOLVED FIXED 161968

URLParser: Add fast path for utf8 encoding queries

https://bugs.webkit.org/show_bug.cgi?id=161968

Summary URLParser: Add fast path for utf8 encoding queries

Alex Christensen

Reported 2016-09-14 09:14:06 PDT

URLParser: Add fast path for utf8 encoding queries

Attachments
Patch (3.35 KB, patch) 2016-09-14 09:16 PDT, Alex Christensen	dbates: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Alex Christensen

Comment 1 2016-09-14 09:16:30 PDT

Created attachment 288824 [details] Patch

Daniel Bates

Comment 2 2016-09-14 10:12:02 PDT

Comment on attachment 288824 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288824&action=review > Source/WebCore/ChangeLog:13 > + If the text encoding is utf8 (which is quite common), then we can encode the query Nit: utf8 => UTF-8 > Source/WebCore/platform/URLParser.cpp:121 > + // FIXME: Check error. I know you have a FIXME here. For your consideration, I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));" now after the call to U8_APPEND(). We can revisit this code to add additional error handling. > Source/WebCore/platform/URLParser.cpp:497 > + bool encodingIsUTF8 = encoding == UTF8Encoding(); We tend to prefer putting the verb at the beginning of the variable name. Maybe a better name would be isUTF8Encoded? isUTF8Encoding?

Alex Christensen

Comment 3 2016-09-14 11:20:13 PDT

(In reply to comment #2) > I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));" Added such assertions with offset <= sizeof(buffer) instead of offset < sizeof(buffer). http://trac.webkit.org/changeset/205918

Darin Adler

Comment 4 2016-09-16 14:48:03 PDT

Comment on attachment 288824 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288824&action=review A few thoughts on this utf8PercentEncodeQuery function. > Source/WebCore/platform/URLParser.cpp:115 > +static void utf8PercentEncodeQuery(UChar32 codePoint, StringBuilder& builder) This appends a single UTF-8 sequence that is part of a query, possibly percent-encoded as needed. The function name "percent encode query" is not so great for a function that does that, since it makes it sound like it always percent encodes and it makes it sound like it encodes an entire query, not a single character. The goal is to be fast; doing this with a function call and a loop for each code point is not going to be very fast. For great speed it is likely we need something where it can speed through a lot of plain ASCII characters and append them all at once. > Source/WebCore/platform/URLParser.cpp:120 > + U8_APPEND(buffer, offset, U8_MAX_LENGTH, codePoint, error); U8_APPEND is not a good choice here. There is no way we will have a problem with the buffer being too small when converting a single code point into a buffer where the size of the buffer is U8_MAX_LENGTH, so extra checking for that is not valuable. If we want to check if the code point is invalid, I think we can do that separately. It would be much better to check that and then to use U8_APPEND_UNSAFE. First, I think a fast path for ASCII is needed: if (isASCII(codePoint)) { builder.append(static_cast<LChar>(codePoint)); return; } Second, I think test cases for invalid code points are needed. I can’t tell you what kind of code to write for those errors without knowing what behavior we need for them. if (!U_IS_UNICODE_CHAR(codePoint)) { // Do whatever makes sense for this case. // But also, not sure if U_IS_UNICODE_CHAR is the check we want above; need to find out what we consider valid vs. invalid. return; } Only then would we use U8_APPEND_UNSAFE. >> Source/WebCore/platform/URLParser.cpp:121 >> + // FIXME: Check error. > > I know you have a FIXME here. For your consideration, I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));" now after the call to U8_APPEND(). We can revisit this code to add additional error handling. We do not need that assertion. That invariant is guaranteed in a really reliable way by the use of the constant U8_MAX_LENGTH. > Source/WebCore/platform/URLParser.cpp:128 > + for (int32_t i = 0; i < offset; ++i) { > + auto byte = buffer[i]; > + if (shouldPercentEncodeQueryByte(byte)) > + percentEncode(byte, builder); > + else > + builder.append(byte); > + } A slightly more complicated way to do this that would be more efficient is to make a little object that does this work when it is assigned to with an array-like interface and pass that directly to U8_APPEND_UNSAFE. Can avoid the buffer and the loop and basically generate an unrolled loop instead.

Alex Christensen

Comment 5 2016-09-16 14:51:27 PDT

Thanks for the insights, Darin! I'll use them in my next patch of optimization and testing.

Alex Christensen

Comment 6 2016-09-16 17:00:18 PDT

This is being used in https://bugs.webkit.org/show_bug.cgi?id=162105

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Alex Christensen

Reported

2016-09-14 09:14 PDT

Modified

2016-09-16 17:00 PDT History

CC List

1 user Show

URL

Keywords

Depends on

Blocks