Check for "xn--" in any subdomain when parsing URL hosts
Created attachment 431178 [details] Patch
This patch modifies the imported WPT tests. Please ensure that any changes on the tests (not coming from a WPT import) are exported to WPT. Please see https://trac.webkit.org/wiki/WPTExportProcess
Comment on attachment 431178 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431178&action=review > Source/WTF/ChangeLog:3 > + Check for "xn--" in any subdomain when parsing URL hosts This changes many WPT results from pass to fail, please explain why that’s for the best.
Created attachment 431224 [details] Patch
Created attachment 431228 [details] Patch
Created attachment 431239 [details] Patch
Comment on attachment 431239 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431239&action=review > Source/WTF/wtf/URLParser.cpp:2662 > + state = State::NotAtSubdomainBeginOrInXNDashDash; Does not seem to do the right thing if c is "." here, since we skip a character before starting to look for ".". Test case that would fail would be "x.xn--". > Source/WTF/wtf/URLParser.cpp:2668 > + state = State::NotAtSubdomainBeginOrInXNDashDash; Ditto. Test case that would fail would be "xn.xn--". > Source/WTF/wtf/URLParser.cpp:2673 > + state = State::NotAtSubdomainBeginOrInXNDashDash; Ditto. Test case that would fail would be "xn-.xn--".
Good catch! Adding those tests.
Created attachment 431356 [details] Patch
Comment on attachment 431356 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431356&action=review > Source/WTF/wtf/URLParser.cpp:2655 > + if (c == 'x' || c == 'X') > + state = State::AfterX; > + else if (c == '.') > + state = State::AtSubdomainBegin; > + else > + state = State::NotAtSubdomainBeginOrInXNDashDash; > + break; More concise way to write this: case State::AtSubdomainBegin: if (c == 'x' || c == 'X') state = State::AfterX; continue; } break; case State::... ... } if (c == '.') state = State::AtSubdomainBegin; else state = State::NotAtSubdomainBeginOrInXNDashDash; Code would be less repetitive.
I like it.
Created attachment 431360 [details] Patch
Comment on attachment 431360 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431360&action=review > Source/WTF/wtf/URLParser.cpp:2546 > + WTFLogAlways("ERRORS %x", processingDetails.errors); Will remove before landing.
Comment on attachment 431360 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431360&action=review > Source/WTF/wtf/URL.cpp:-406 > - // For host names bigger than this, we won't do IDN encoding, which is almost certainly OK. I feel like it’s a problem to lose this comment. Could write a better one, but just removing it seems wrong. Without the comment it’s not obvious why it’s OK to leave the hostname as-is when it’s super long, and the check just below is mysterious. > Source/WTF/wtf/URLHelpers.cpp:-43 > -// For host names bigger than this, we won't do IDN encoding, which is almost certainly OK. Again, losing this comment. > Source/WTF/wtf/URLHelpers.cpp:575 > + if (hostName.length() > URLParser::hostnameBufferLength) Mysterious code without a comment. Why is it safe to just do nothing for long strings.
Comment on attachment 431360 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431360&action=review >> Source/WTF/wtf/URL.cpp:-406 >> - // For host names bigger than this, we won't do IDN encoding, which is almost certainly OK. > > I feel like it’s a problem to lose this comment. Could write a better one, but just removing it seems wrong. > > Without the comment it’s not obvious why it’s OK to leave the hostname as-is when it’s super long, and the check just below is mysterious. I'll keep the comment here too. >> Source/WTF/wtf/URLHelpers.cpp:575 >> + if (hostName.length() > URLParser::hostnameBufferLength) > > Mysterious code without a comment. Why is it safe to just do nothing for long strings. I'll put the comment here. > Source/WTF/wtf/URLParser.h:52 > + // Needs to be big enough to hold an IDN-encoded name. > + // For host names bigger than this, we won't do IDN encoding, which is almost certainly OK. > + constexpr static size_t hostnameBufferLength = 2048; ... instead of moving it along with the definition of hostnameBufferLength to here.
r278879
<rdar://problem/79347990>