Implement IPv6 parsing in URLParser
Created attachment 287407 [details] Patch
Comment on attachment 287407 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=287407&action=review > Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp:104 > + // checkURL("http://[0:f::f:f:0:0]", {"http", "", "", "[0:f::f:f:0:0]", 0, "/", "", "", "http://[0:f::f:f:0:0]/"}); > + // checkURL("http://[0:f:0:0:f::]", {"http", "", "", "[0:f:0:0:f::]", 0, "/", "", "", "http://[0:f:0:0:f::]/"}); > + // checkURL("http://[::f:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"}); These tests don't pass yet, but they should if the parser were 100% correct.. > Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp:191 > + /* > + checkURLDifferences("http://[0:0:f:0:0:f:0:0]", ditto
Created attachment 287460 [details] Patch
I fixed the tests. Not every branch is tested, but I plan to go over all this code with code coverage analysis before turning it on and make sure all code paths are tested.
Comment on attachment 287460 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=287460&action=review > Source/WebCore/platform/URLParser.cpp:591 > +static char nibbleToHex(uint8_t nibble) This method is weird. It doesn't specify which bits of the byte are the nibble in question. In person you said it's always the lower bits. This is confusing from a code clarity standpoint. Also, we already have lowerNibbleToASCIIHexDigit. In person you said the problem is that lowerNibbleToASCIIHexDigit returns uppercase characters whereas you need lowercase. May I suggest u_tolower(lowerNibbleToASCIIHexDigit(...)) or convertASCIIAlphaToLower(lowerNibbleToASCIIHexDigit(...))? Or if performance is critical, naming this lowerNibbleToLowercaseASCIIHexDigit? > Source/WebCore/platform/URLParser.cpp:601 > + uint8_t c0 = piece >> 12; > + uint8_t c1 = piece >> 8 & 0xF; > + uint8_t c2 = piece >> 4 & 0xF; > + uint8_t c3 = piece & 0xF; These characters are poorly named nibbles.
Created attachment 287553 [details] Patch
Comment on attachment 287553 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=287553&action=review > Source/WebCore/platform/URLParser.cpp:594 > +static char lowerNibbleToLowercaseASCIIHexDigit(uint8_t nibble) > +{ > + return nibble + (nibble < 10 ? '0' : 'a' - 10); > +} Please put this in ASCIICType.h, not here. > Source/WebCore/platform/URLParser.cpp:598 > + uint8_t nibbles[4] = { static_cast<uint8_t>(piece >> 12), static_cast<uint8_t>(piece >> 8 & 0xF), static_cast<uint8_t>(piece >> 4 & 0xF), static_cast<uint8_t>(piece & 0xF) }; Are you sure these static_cast are needed? What happens if you omit them? It seems overkill to use an array and too specific to specify the type uint8_t. I would write this: if (auto nibble0 = piece >> 12) ... auto nibble1 = piece >> 8 & 0xF; if (printed || nibble1) { etc. > Source/WebCore/platform/URLParser.cpp:616 > + Optional<size_t> compressPointer = findLongestZeroSequence(address); I suggest auto here. > Source/WebCore/platform/URLParser.cpp:819 > + size_t swaps = piecePointer - compressPointer.value(); > + piecePointer = 7; > + while (swaps) > + std::swap(address[piecePointer--], address[compressPointer.value() + swaps-- - 1]); This seems so non-obvious to me; what does it even do? Can we do this with std::reverse instead? > Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp:104 > + checkURL("http://[0:f::f:f:0:0]", {"http", "", "", "[0:f::f:f:0:0]", 0, "/", "", "", "http://[0:f::f:f:0:0]/"}); > + checkURL("http://[0:f:0:0:f::]", {"http", "", "", "[0:f:0:0:f::]", 0, "/", "", "", "http://[0:f:0:0:f::]/"}); > + checkURL("http://[::f:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"}); Don’t we need more cases? Using only 0 and f seems to not cover enough of the parsing.
(In reply to comment #7) > > Source/WebCore/platform/URLParser.cpp:598 > > + uint8_t nibbles[4] = { static_cast<uint8_t>(piece >> 12), static_cast<uint8_t>(piece >> 8 & 0xF), static_cast<uint8_t>(piece >> 4 & 0xF), static_cast<uint8_t>(piece & 0xF) }; > > Are you sure these static_cast are needed? What happens if you omit them? They are needed. It doesn't compile successfully without them. > > It seems overkill to use an array and too specific to specify the type > uint8_t. I would write this: > > if (auto nibble0 = piece >> 12) > ... > auto nibble1 = piece >> 8 & 0xF; > if (printed || nibble1) { > > etc. This would make the static_cast not necessary. > > Source/WebCore/platform/URLParser.cpp:819 > > + size_t swaps = piecePointer - compressPointer.value(); > > + piecePointer = 7; > > + while (swaps) > > + std::swap(address[piecePointer--], address[compressPointer.value() + swaps-- - 1]); > > This seems so non-obvious to me; what does it even do? Can we do this with > std::reverse instead? Maybe. This is what the spec says. I was quite aggressive with my use of postfix decrement, though. > > > Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp:104 > > + checkURL("http://[0:f::f:f:0:0]", {"http", "", "", "[0:f::f:f:0:0]", 0, "/", "", "", "http://[0:f::f:f:0:0]/"}); > > + checkURL("http://[0:f:0:0:f::]", {"http", "", "", "[0:f:0:0:f::]", 0, "/", "", "", "http://[0:f:0:0:f::]/"}); > > + checkURL("http://[::f:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"}); > > Don’t we need more cases? Using only 0 and f seems to not cover enough of > the parsing. Yes we need more cases. I am going to do much more thorough testing with code coverage in a second pass.
Created attachment 287559 [details] Patch
Created attachment 287567 [details] Patch
(In reply to comment #8) > > Don’t we need more cases? Using only 0 and f seems to not cover enough of > > the parsing. > Yes we need more cases. I am going to do much more thorough testing with > code coverage in a second pass. To be clear, the tests with lots of f's are testing the location of the :: Entire features like the ipv4 parser inside of the ipv6 parser currently have 0% test coverage, but I'm trying to get the parser to a semi-usable state then polish it from there.
> > This seems so non-obvious to me; what does it even do? Can we do this with > > std::reverse instead? > Maybe. This is what the spec says. I was quite aggressive with my use of > postfix decrement, though. std::reverse isn't what this does. std::reverse switches the order of the elements. This just shifts the elements over without changing their order. We could use a reverse version of std::swap_ranges, but such a thing does not exist.
http://trac.webkit.org/changeset/205273
Comment on attachment 287567 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=287567&action=review > Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp:56 > + return s1.utf8() == s2.utf8(); This is no different than s1 == s2; the conversion to UTF-8 has no effect on equality. Not sure why this exists.