Bug 161390 - Implement IPv6 parsing in URLParser
Summary: Implement IPv6 parsing in URLParser
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-30 10:14 PDT by Alex Christensen
Modified: 2016-09-02 09:08 PDT (History)
5 users (show)

See Also:


Attachments
Patch (10.44 KB, patch)
2016-08-30 10:38 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (10.67 KB, patch)
2016-08-30 17:12 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (10.85 KB, patch)
2016-08-31 15:43 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (12.09 KB, patch)
2016-08-31 16:27 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (12.45 KB, patch)
2016-08-31 16:44 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2016-08-30 10:14:05 PDT
Implement IPv6 parsing in URLParser
Comment 1 Alex Christensen 2016-08-30 10:38:44 PDT
Created attachment 287407 [details]
Patch
Comment 2 Alex Christensen 2016-08-30 10:39:24 PDT
Comment on attachment 287407 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=287407&action=review

> Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp:104
> +    // checkURL("http://[0:f::f:f:0:0]", {"http", "", "", "[0:f::f:f:0:0]", 0, "/", "", "", "http://[0:f::f:f:0:0]/"});
> +    // checkURL("http://[0:f:0:0:f::]", {"http", "", "", "[0:f:0:0:f::]", 0, "/", "", "", "http://[0:f:0:0:f::]/"});
> +    // checkURL("http://[::f:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"});

These tests don't pass yet, but they should if the parser were 100% correct..

> Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp:191
> +    /*
> +    checkURLDifferences("http://[0:0:f:0:0:f:0:0]",

ditto
Comment 3 Alex Christensen 2016-08-30 17:12:14 PDT
Created attachment 287460 [details]
Patch
Comment 4 Alex Christensen 2016-08-30 22:18:49 PDT
I fixed the tests.  Not every branch is tested, but I plan to go over all this code with code coverage analysis before turning it on and make sure all code paths are tested.
Comment 5 Brady Eidson 2016-08-31 15:31:08 PDT
Comment on attachment 287460 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=287460&action=review

> Source/WebCore/platform/URLParser.cpp:591
> +static char nibbleToHex(uint8_t nibble)

This method is weird. It doesn't specify which bits of the byte are the nibble in question.

In person you said it's always the lower bits.

This is confusing from a code clarity standpoint.

Also, we already have lowerNibbleToASCIIHexDigit.

In person you said the problem is that lowerNibbleToASCIIHexDigit returns uppercase characters whereas you need lowercase.

May I suggest u_tolower(lowerNibbleToASCIIHexDigit(...)) or convertASCIIAlphaToLower(lowerNibbleToASCIIHexDigit(...))?

Or if performance is critical, naming this lowerNibbleToLowercaseASCIIHexDigit?

> Source/WebCore/platform/URLParser.cpp:601
> +    uint8_t c0 = piece >> 12;
> +    uint8_t c1 = piece >> 8 & 0xF;
> +    uint8_t c2 = piece >> 4 & 0xF;
> +    uint8_t c3 = piece & 0xF;

These characters are poorly named nibbles.
Comment 6 Alex Christensen 2016-08-31 15:43:48 PDT
Created attachment 287553 [details]
Patch
Comment 7 Darin Adler 2016-08-31 16:09:18 PDT
Comment on attachment 287553 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=287553&action=review

> Source/WebCore/platform/URLParser.cpp:594
> +static char lowerNibbleToLowercaseASCIIHexDigit(uint8_t nibble)
> +{
> +    return nibble + (nibble < 10 ? '0' : 'a' - 10);
> +}

Please put this in ASCIICType.h, not here.

> Source/WebCore/platform/URLParser.cpp:598
> +    uint8_t nibbles[4] = { static_cast<uint8_t>(piece >> 12), static_cast<uint8_t>(piece >> 8 & 0xF), static_cast<uint8_t>(piece >> 4 & 0xF), static_cast<uint8_t>(piece & 0xF) };

Are you sure these static_cast are needed? What happens if you omit them?

It seems overkill to use an array and too specific to specify the type uint8_t. I would write this:

    if (auto nibble0 = piece >> 12)
        ...
    auto nibble1 = piece >> 8 & 0xF;
    if (printed || nibble1) {

etc.

> Source/WebCore/platform/URLParser.cpp:616
> +    Optional<size_t> compressPointer = findLongestZeroSequence(address);

I suggest auto here.

> Source/WebCore/platform/URLParser.cpp:819
> +        size_t swaps = piecePointer - compressPointer.value();
> +        piecePointer = 7;
> +        while (swaps)
> +            std::swap(address[piecePointer--], address[compressPointer.value() + swaps-- - 1]);

This seems so non-obvious to me; what does it even do? Can we do this with std::reverse instead?

> Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp:104
> +    checkURL("http://[0:f::f:f:0:0]", {"http", "", "", "[0:f::f:f:0:0]", 0, "/", "", "", "http://[0:f::f:f:0:0]/"});
> +    checkURL("http://[0:f:0:0:f::]", {"http", "", "", "[0:f:0:0:f::]", 0, "/", "", "", "http://[0:f:0:0:f::]/"});
> +    checkURL("http://[::f:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"});

Don’t we need more cases? Using only 0 and f seems to not cover enough of the parsing.
Comment 8 Alex Christensen 2016-08-31 16:18:00 PDT
(In reply to comment #7)
> > Source/WebCore/platform/URLParser.cpp:598
> > +    uint8_t nibbles[4] = { static_cast<uint8_t>(piece >> 12), static_cast<uint8_t>(piece >> 8 & 0xF), static_cast<uint8_t>(piece >> 4 & 0xF), static_cast<uint8_t>(piece & 0xF) };
> 
> Are you sure these static_cast are needed? What happens if you omit them?
They are needed.  It doesn't compile successfully without them.
> 
> It seems overkill to use an array and too specific to specify the type
> uint8_t. I would write this:
> 
>     if (auto nibble0 = piece >> 12)
>         ...
>     auto nibble1 = piece >> 8 & 0xF;
>     if (printed || nibble1) {
> 
> etc.
This would make the static_cast not necessary.
> > Source/WebCore/platform/URLParser.cpp:819
> > +        size_t swaps = piecePointer - compressPointer.value();
> > +        piecePointer = 7;
> > +        while (swaps)
> > +            std::swap(address[piecePointer--], address[compressPointer.value() + swaps-- - 1]);
> 
> This seems so non-obvious to me; what does it even do? Can we do this with
> std::reverse instead?
Maybe.  This is what the spec says.  I was quite aggressive with my use of postfix decrement, though.
> 
> > Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp:104
> > +    checkURL("http://[0:f::f:f:0:0]", {"http", "", "", "[0:f::f:f:0:0]", 0, "/", "", "", "http://[0:f::f:f:0:0]/"});
> > +    checkURL("http://[0:f:0:0:f::]", {"http", "", "", "[0:f:0:0:f::]", 0, "/", "", "", "http://[0:f:0:0:f::]/"});
> > +    checkURL("http://[::f:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"});
> 
> Don’t we need more cases? Using only 0 and f seems to not cover enough of
> the parsing.
Yes we need more cases.  I am going to do much more thorough testing with code coverage in a second pass.
Comment 9 Alex Christensen 2016-08-31 16:27:14 PDT
Created attachment 287559 [details]
Patch
Comment 10 Alex Christensen 2016-08-31 16:44:44 PDT
Created attachment 287567 [details]
Patch
Comment 11 Alex Christensen 2016-08-31 17:01:22 PDT
(In reply to comment #8)
> > Don’t we need more cases? Using only 0 and f seems to not cover enough of
> > the parsing.
> Yes we need more cases.  I am going to do much more thorough testing with
> code coverage in a second pass.
To be clear, the tests with lots of f's are testing the location of the ::
Entire features like the ipv4 parser inside of the ipv6 parser currently have 0% test coverage, but I'm trying to get the parser to a semi-usable state then polish it from there.
Comment 12 Alex Christensen 2016-08-31 17:13:11 PDT
> > This seems so non-obvious to me; what does it even do? Can we do this with
> > std::reverse instead?
> Maybe.  This is what the spec says.  I was quite aggressive with my use of
> postfix decrement, though.
std::reverse isn't what this does.  std::reverse switches the order of the elements.  This just shifts the elements over without changing their order.  We could use a reverse version of std::swap_ranges, but such a thing does not exist.
Comment 13 Alex Christensen 2016-08-31 17:25:59 PDT
http://trac.webkit.org/changeset/205273
Comment 14 Darin Adler 2016-09-02 09:08:40 PDT
Comment on attachment 287567 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=287567&action=review

> Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp:56
> +    return s1.utf8() == s2.utf8();

This is no different than s1 == s2; the conversion to UTF-8 has no effect on equality. Not sure why this exists.