Bug 161968 - URLParser: Add fast path for utf8 encoding queries
Summary: URLParser: Add fast path for utf8 encoding queries
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-14 09:14 PDT by Alex Christensen
Modified: 2016-09-16 17:00 PDT (History)
1 user (show)

See Also:


Attachments
Patch (3.35 KB, patch)
2016-09-14 09:16 PDT, Alex Christensen
dbates: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2016-09-14 09:14:06 PDT
URLParser: Add fast path for utf8 encoding queries
Comment 1 Alex Christensen 2016-09-14 09:16:30 PDT
Created attachment 288824 [details]
Patch
Comment 2 Daniel Bates 2016-09-14 10:12:02 PDT
Comment on attachment 288824 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288824&action=review

> Source/WebCore/ChangeLog:13
> +        If the text encoding is utf8 (which is quite common), then we can encode the query

Nit: utf8 => UTF-8

> Source/WebCore/platform/URLParser.cpp:121
> +    // FIXME: Check error.

I know you have a FIXME here. For your consideration, I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));" now after the call to U8_APPEND(). We can revisit this code to add additional error handling.

> Source/WebCore/platform/URLParser.cpp:497
> +    bool encodingIsUTF8 = encoding == UTF8Encoding();

We tend to prefer putting the verb at the beginning of the variable name. Maybe a better name would be isUTF8Encoded? isUTF8Encoding?
Comment 3 Alex Christensen 2016-09-14 11:20:13 PDT
(In reply to comment #2)
> I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));"
Added such assertions with offset <= sizeof(buffer) instead of offset < sizeof(buffer).

http://trac.webkit.org/changeset/205918
Comment 4 Darin Adler 2016-09-16 14:48:03 PDT
Comment on attachment 288824 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288824&action=review

A few thoughts on this utf8PercentEncodeQuery function.

> Source/WebCore/platform/URLParser.cpp:115
> +static void utf8PercentEncodeQuery(UChar32 codePoint, StringBuilder& builder)

This appends a single UTF-8 sequence that is part of a query, possibly percent-encoded as needed. The function name "percent encode query" is not so great for a function that does that, since it makes it sound like it always percent encodes and it makes it sound like it encodes an entire query, not a single character.

The goal is to be fast; doing this with a function call and a loop for each code point is not going to be very fast. For great speed it is likely we need something where it can speed through a lot of plain ASCII characters and append them all at once.

> Source/WebCore/platform/URLParser.cpp:120
> +    U8_APPEND(buffer, offset, U8_MAX_LENGTH, codePoint, error);

U8_APPEND is not a good choice here. There is no way we will have a problem with the buffer being too small when converting a single code point into a buffer where the size of the buffer is U8_MAX_LENGTH, so extra checking for that is not valuable. If we want to check if the code point is invalid, I think we can do that separately. It would be much better to check that and then to use U8_APPEND_UNSAFE.

First, I think a fast path for ASCII is needed:

    if (isASCII(codePoint)) {
        builder.append(static_cast<LChar>(codePoint));
        return;
    }

Second, I think test cases for invalid code points are needed. I can’t tell you what kind of code to write for those errors without knowing what behavior we need for them.

    if (!U_IS_UNICODE_CHAR(codePoint)) {
        // Do whatever makes sense for this case.
        // But also, not sure if U_IS_UNICODE_CHAR is the check we want above; need to find out what we consider valid vs. invalid.
        return;
    }

Only then would we use U8_APPEND_UNSAFE.

>> Source/WebCore/platform/URLParser.cpp:121
>> +    // FIXME: Check error.
> 
> I know you have a FIXME here. For your consideration, I suggest adding "ASSERT_WITH_SECURITY_IMPLICATION(offset < sizeof(buffer));" now after the call to U8_APPEND(). We can revisit this code to add additional error handling.

We do not need that assertion. That invariant is guaranteed in a really reliable way by the use of the constant U8_MAX_LENGTH.

> Source/WebCore/platform/URLParser.cpp:128
> +    for (int32_t i = 0; i < offset; ++i) {
> +        auto byte = buffer[i];
> +        if (shouldPercentEncodeQueryByte(byte))
> +            percentEncode(byte, builder);
> +        else
> +            builder.append(byte);
> +    }

A slightly more complicated way to do this that would be more efficient is to make a little object that does this work when it is assigned to with an array-like interface and pass that directly to U8_APPEND_UNSAFE. Can avoid the buffer and the loop and basically generate an unrolled loop instead.
Comment 5 Alex Christensen 2016-09-16 14:51:27 PDT
Thanks for the insights, Darin!  I'll use them in my next patch of optimization and testing.
Comment 6 Alex Christensen 2016-09-16 17:00:18 PDT
This is being used in https://bugs.webkit.org/show_bug.cgi?id=162105