<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>161968</bug_id>
          
          <creation_ts>2016-09-14 09:14:06 -0700</creation_ts>
          <short_desc>URLParser: Add fast path for utf8 encoding queries</short_desc>
          <delta_ts>2016-09-16 17:00:18 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>New Bugs</component>
          <version>WebKit Nightly Build</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Alex Christensen">achristensen</reporter>
          <assigned_to name="Alex Christensen">achristensen</assigned_to>
          <cc>dbates</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>1229575</commentid>
    <comment_count>0</comment_count>
    <who name="Alex Christensen">achristensen</who>
    <bug_when>2016-09-14 09:14:06 -0700</bug_when>
    <thetext>URLParser: Add fast path for utf8 encoding queries</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1229576</commentid>
    <comment_count>1</comment_count>
      <attachid>288824</attachid>
    <who name="Alex Christensen">achristensen</who>
    <bug_when>2016-09-14 09:16:30 -0700</bug_when>
    <thetext>Created attachment 288824
Patch</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1229615</commentid>
    <comment_count>2</comment_count>
      <attachid>288824</attachid>
    <who name="Daniel Bates">dbates</who>
    <bug_when>2016-09-14 10:12:02 -0700</bug_when>
    <thetext>Comment on attachment 288824
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288824&amp;action=review

&gt; Source/WebCore/ChangeLog:13
&gt; +        If the text encoding is utf8 (which is quite common), then we can encode the query

Nit: utf8 =&gt; UTF-8

&gt; Source/WebCore/platform/URLParser.cpp:121
&gt; +    // FIXME: Check error.

I know you have a FIXME here. For your consideration, I suggest adding &quot;ASSERT_WITH_SECURITY_IMPLICATION(offset &lt; sizeof(buffer));&quot; now after the call to U8_APPEND(). We can revisit this code to add additional error handling.

&gt; Source/WebCore/platform/URLParser.cpp:497
&gt; +    bool encodingIsUTF8 = encoding == UTF8Encoding();

We tend to prefer putting the verb at the beginning of the variable name. Maybe a better name would be isUTF8Encoded? isUTF8Encoding?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1229645</commentid>
    <comment_count>3</comment_count>
    <who name="Alex Christensen">achristensen</who>
    <bug_when>2016-09-14 11:20:13 -0700</bug_when>
    <thetext>(In reply to comment #2)
&gt; I suggest adding &quot;ASSERT_WITH_SECURITY_IMPLICATION(offset &lt; sizeof(buffer));&quot;
Added such assertions with offset &lt;= sizeof(buffer) instead of offset &lt; sizeof(buffer).

http://trac.webkit.org/changeset/205918</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1230624</commentid>
    <comment_count>4</comment_count>
      <attachid>288824</attachid>
    <who name="Darin Adler">darin</who>
    <bug_when>2016-09-16 14:48:03 -0700</bug_when>
    <thetext>Comment on attachment 288824
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288824&amp;action=review

A few thoughts on this utf8PercentEncodeQuery function.

&gt; Source/WebCore/platform/URLParser.cpp:115
&gt; +static void utf8PercentEncodeQuery(UChar32 codePoint, StringBuilder&amp; builder)

This appends a single UTF-8 sequence that is part of a query, possibly percent-encoded as needed. The function name &quot;percent encode query&quot; is not so great for a function that does that, since it makes it sound like it always percent encodes and it makes it sound like it encodes an entire query, not a single character.

The goal is to be fast; doing this with a function call and a loop for each code point is not going to be very fast. For great speed it is likely we need something where it can speed through a lot of plain ASCII characters and append them all at once.

&gt; Source/WebCore/platform/URLParser.cpp:120
&gt; +    U8_APPEND(buffer, offset, U8_MAX_LENGTH, codePoint, error);

U8_APPEND is not a good choice here. There is no way we will have a problem with the buffer being too small when converting a single code point into a buffer where the size of the buffer is U8_MAX_LENGTH, so extra checking for that is not valuable. If we want to check if the code point is invalid, I think we can do that separately. It would be much better to check that and then to use U8_APPEND_UNSAFE.

First, I think a fast path for ASCII is needed:

    if (isASCII(codePoint)) {
        builder.append(static_cast&lt;LChar&gt;(codePoint));
        return;
    }

Second, I think test cases for invalid code points are needed. I can’t tell you what kind of code to write for those errors without knowing what behavior we need for them.

    if (!U_IS_UNICODE_CHAR(codePoint)) {
        // Do whatever makes sense for this case.
        // But also, not sure if U_IS_UNICODE_CHAR is the check we want above; need to find out what we consider valid vs. invalid.
        return;
    }

Only then would we use U8_APPEND_UNSAFE.

&gt;&gt; Source/WebCore/platform/URLParser.cpp:121
&gt;&gt; +    // FIXME: Check error.
&gt; 
&gt; I know you have a FIXME here. For your consideration, I suggest adding &quot;ASSERT_WITH_SECURITY_IMPLICATION(offset &lt; sizeof(buffer));&quot; now after the call to U8_APPEND(). We can revisit this code to add additional error handling.

We do not need that assertion. That invariant is guaranteed in a really reliable way by the use of the constant U8_MAX_LENGTH.

&gt; Source/WebCore/platform/URLParser.cpp:128
&gt; +    for (int32_t i = 0; i &lt; offset; ++i) {
&gt; +        auto byte = buffer[i];
&gt; +        if (shouldPercentEncodeQueryByte(byte))
&gt; +            percentEncode(byte, builder);
&gt; +        else
&gt; +            builder.append(byte);
&gt; +    }

A slightly more complicated way to do this that would be more efficient is to make a little object that does this work when it is assigned to with an array-like interface and pass that directly to U8_APPEND_UNSAFE. Can avoid the buffer and the loop and basically generate an unrolled loop instead.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1230625</commentid>
    <comment_count>5</comment_count>
    <who name="Alex Christensen">achristensen</who>
    <bug_when>2016-09-16 14:51:27 -0700</bug_when>
    <thetext>Thanks for the insights, Darin!  I&apos;ll use them in my next patch of optimization and testing.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1230708</commentid>
    <comment_count>6</comment_count>
    <who name="Alex Christensen">achristensen</who>
    <bug_when>2016-09-16 17:00:18 -0700</bug_when>
    <thetext>This is being used in https://bugs.webkit.org/show_bug.cgi?id=162105</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="1"
              isprivate="0"
          >
            <attachid>288824</attachid>
            <date>2016-09-14 09:16:30 -0700</date>
            <delta_ts>2016-09-14 10:12:02 -0700</delta_ts>
            <desc>Patch</desc>
            <filename>bug-161968-20160914091356.patch</filename>
            <type>text/plain</type>
            <size>3430</size>
            <attacher name="Alex Christensen">achristensen</attacher>
            
              <data encoding="base64">SW5kZXg6IFNvdXJjZS9XZWJDb3JlL0NoYW5nZUxvZwo9PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBTb3VyY2UvV2Vi
Q29yZS9DaGFuZ2VMb2cJKHJldmlzaW9uIDIwNTkwNSkKKysrIFNvdXJjZS9XZWJDb3JlL0NoYW5n
ZUxvZwkod29ya2luZyBjb3B5KQpAQCAtMSwzICsxLDE5IEBACisyMDE2LTA5LTE0ICBBbGV4IENo
cmlzdGVuc2VuICA8YWNocmlzdGVuc2VuQHdlYmtpdC5vcmc+CisKKyAgICAgICAgVVJMUGFyc2Vy
OiBBZGQgZmFzdCBwYXRoIGZvciB1dGY4IGVuY29kaW5nIHF1ZXJpZXMKKyAgICAgICAgaHR0cHM6
Ly9idWdzLndlYmtpdC5vcmcvc2hvd19idWcuY2dpP2lkPTE2MTk2OAorCisgICAgICAgIFJldmll
d2VkIGJ5IE5PQk9EWSAoT09QUyEpLgorCisgICAgICAgIE5vIGNoYW5nZSBpbiBiZWhhdmlvci4g
IENvdmVyZWQgYnkgZXhpc3RpbmcgdGVzdHMuCisKKyAgICAgICAgKiBwbGF0Zm9ybS9VUkxQYXJz
ZXIuY3BwOgorICAgICAgICAoV2ViQ29yZTo6dXRmOFBlcmNlbnRFbmNvZGVRdWVyeSk6CisgICAg
ICAgIChXZWJDb3JlOjpVUkxQYXJzZXI6OnBhcnNlKToKKyAgICAgICAgSWYgdGhlIHRleHQgZW5j
b2RpbmcgaXMgdXRmOCAod2hpY2ggaXMgcXVpdGUgY29tbW9uKSwgdGhlbiB3ZSBjYW4gZW5jb2Rl
IHRoZSBxdWVyeQorICAgICAgICBhcyB3ZSBpdGVyYXRlIGl0cyBjb2RlIHBvaW50cy4gVGhpcyBy
ZWR1Y2VzIG1lbW9yeSBhbGxvY2F0aW9uIGFuZCBzaWduaWZpY2FudGx5IHNwZWVkcworICAgICAg
ICB1cCBteSBVUkwgcGFyc2luZyBiZW5jaG1hcmsuCisKIDIwMTYtMDktMTMgIERhdmUgSHlhdHQg
IDxoeWF0dEBhcHBsZS5jb20+CiAKICAgICAgICAgW0NTUyBQYXJzZXJdIEVuYWJsZSB0aGUgbmV3
IHNpemVzIHBhcnNlciBieSBkZWZhdWx0CkluZGV4OiBTb3VyY2UvV2ViQ29yZS9wbGF0Zm9ybS9V
UkxQYXJzZXIuY3BwCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIFNvdXJjZS9XZWJDb3JlL3BsYXRmb3JtL1VSTFBh
cnNlci5jcHAJKHJldmlzaW9uIDIwNTg5MykKKysrIFNvdXJjZS9XZWJDb3JlL3BsYXRmb3JtL1VS
TFBhcnNlci5jcHAJKHdvcmtpbmcgY29weSkKQEAgLTExMiw2ICsxMTIsMjIgQEAgc3RhdGljIGJv
b2wgc2hvdWxkUGVyY2VudEVuY29kZVF1ZXJ5Qnl0ZQogICAgIHJldHVybiBieXRlID09IDB4M0U7
CiB9CiAKK3N0YXRpYyB2b2lkIHV0ZjhQZXJjZW50RW5jb2RlUXVlcnkoVUNoYXIzMiBjb2RlUG9p
bnQsIFN0cmluZ0J1aWxkZXImIGJ1aWxkZXIpCit7CisgICAgdWludDhfdCBidWZmZXJbVThfTUFY
X0xFTkdUSF07CisgICAgaW50MzJfdCBvZmZzZXQgPSAwOworICAgIFVCb29sIGVycm9yID0gZmFs
c2U7CisgICAgVThfQVBQRU5EKGJ1ZmZlciwgb2Zmc2V0LCBVOF9NQVhfTEVOR1RILCBjb2RlUG9p
bnQsIGVycm9yKTsKKyAgICAvLyBGSVhNRTogQ2hlY2sgZXJyb3IuCisgICAgZm9yIChpbnQzMl90
IGkgPSAwOyBpIDwgb2Zmc2V0OyArK2kpIHsKKyAgICAgICAgYXV0byBieXRlID0gYnVmZmVyW2ld
OworICAgICAgICBpZiAoc2hvdWxkUGVyY2VudEVuY29kZVF1ZXJ5Qnl0ZShieXRlKSkKKyAgICAg
ICAgICAgIHBlcmNlbnRFbmNvZGUoYnl0ZSwgYnVpbGRlcik7CisgICAgICAgIGVsc2UKKyAgICAg
ICAgICAgIGJ1aWxkZXIuYXBwZW5kKGJ5dGUpOworICAgIH0KK30KKyAgICAKIHN0YXRpYyB2b2lk
IGVuY29kZVF1ZXJ5KGNvbnN0IFN0cmluZ0J1aWxkZXImIHNvdXJjZSwgU3RyaW5nQnVpbGRlciYg
ZGVzdGluYXRpb24sIGNvbnN0IFRleHRFbmNvZGluZyYgZW5jb2RpbmcpCiB7CiAgICAgLy8gRklY
TUU6IEl0IGlzIHVuY2xlYXIgaW4gdGhlIHNwZWMgd2hhdCB0byBkbyB3aGVuIGVuY29kaW5nIGZh
aWxzLiBUaGUgYmVoYXZpb3Igc2hvdWxkIGJlIHNwZWNpZmllZCBhbmQgdGVzdGVkLgpAQCAtNDc4
LDcgKzQ5NCw3IEBAIFVSTCBVUkxQYXJzZXI6OnBhcnNlKGNvbnN0IFN0cmluZyYgaW5wdXQKICAg
ICBtX2J1ZmZlci5jbGVhcigpOwogICAgIG1fYnVmZmVyLnJlc2VydmVDYXBhY2l0eShpbnB1dC5s
ZW5ndGgoKSk7CiAgICAgCi0gICAgLy8gRklYTUU6IFdlIHNob3VsZG4ndCBuZWVkIHRvIGFsbG9j
YXRlIGFub3RoZXIgYnVmZmVyIGZvciB0aGlzLgorICAgIGJvb2wgZW5jb2RpbmdJc1VURjggPSBl
bmNvZGluZyA9PSBVVEY4RW5jb2RpbmcoKTsKICAgICBTdHJpbmdCdWlsZGVyIHF1ZXJ5QnVmZmVy
OwogCiAgICAgdW5zaWduZWQgZW5kSW5kZXggPSBpbnB1dC5sZW5ndGgoKTsKQEAgLTk0MCwxMiAr
OTU2LDE2IEBAIFVSTCBVUkxQYXJzZXI6OnBhcnNlKGNvbnN0IFN0cmluZyYgaW5wdXQKICAgICAg
ICAgY2FzZSBTdGF0ZTo6UXVlcnk6CiAgICAgICAgICAgICBMT0dfU1RBVEUoIlF1ZXJ5Iik7CiAg
ICAgICAgICAgICBpZiAoKmMgPT0gJyMnKSB7Ci0gICAgICAgICAgICAgICAgZW5jb2RlUXVlcnko
cXVlcnlCdWZmZXIsIG1fYnVmZmVyLCBlbmNvZGluZyk7CisgICAgICAgICAgICAgICAgaWYgKCFl
bmNvZGluZ0lzVVRGOCkKKyAgICAgICAgICAgICAgICAgICAgZW5jb2RlUXVlcnkocXVlcnlCdWZm
ZXIsIG1fYnVmZmVyLCBlbmNvZGluZyk7CiAgICAgICAgICAgICAgICAgbV91cmwubV9xdWVyeUVu
ZCA9IG1fYnVmZmVyLmxlbmd0aCgpOwogICAgICAgICAgICAgICAgIHN0YXRlID0gU3RhdGU6OkZy
YWdtZW50OwogICAgICAgICAgICAgICAgIGJyZWFrOwogICAgICAgICAgICAgfQotICAgICAgICAg
ICAgcXVlcnlCdWZmZXIuYXBwZW5kKCpjKTsKKyAgICAgICAgICAgIGlmIChlbmNvZGluZ0lzVVRG
OCkKKyAgICAgICAgICAgICAgICB1dGY4UGVyY2VudEVuY29kZVF1ZXJ5KCpjLCBtX2J1ZmZlcik7
CisgICAgICAgICAgICBlbHNlCisgICAgICAgICAgICAgICAgcXVlcnlCdWZmZXIuYXBwZW5kKCpj
KTsKICAgICAgICAgICAgICsrYzsKICAgICAgICAgICAgIGJyZWFrOwogICAgICAgICBjYXNlIFN0
YXRlOjpGcmFnbWVudDoKQEAgLTEwOTcsNyArMTExNyw4IEBAIFVSTCBVUkxQYXJzZXI6OnBhcnNl
KGNvbnN0IFN0cmluZyYgaW5wdXQKICAgICAgICAgYnJlYWs7CiAgICAgY2FzZSBTdGF0ZTo6UXVl
cnk6CiAgICAgICAgIExPR19GSU5BTF9TVEFURSgiUXVlcnkiKTsKLSAgICAgICAgZW5jb2RlUXVl
cnkocXVlcnlCdWZmZXIsIG1fYnVmZmVyLCBlbmNvZGluZyk7CisgICAgICAgIGlmICghZW5jb2Rp
bmdJc1VURjgpCisgICAgICAgICAgICBlbmNvZGVRdWVyeShxdWVyeUJ1ZmZlciwgbV9idWZmZXIs
IGVuY29kaW5nKTsKICAgICAgICAgbV91cmwubV9xdWVyeUVuZCA9IG1fYnVmZmVyLmxlbmd0aCgp
OwogICAgICAgICBtX3VybC5tX2ZyYWdtZW50RW5kID0gbV91cmwubV9xdWVyeUVuZDsKICAgICAg
ICAgYnJlYWs7Cg==
</data>
<flag name="review"
          id="312103"
          type_id="1"
          status="+"
          setter="dbates"
    />
          </attachment>
      

    </bug>

</bugzilla>