Bug 226912 - Check for "xn--" in any subdomain when parsing URL hosts
Summary: Check for "xn--" in any subdomain when parsing URL hosts
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-06-10 21:17 PDT by Alex Christensen
Modified: 2021-06-15 10:00 PDT (History)
9 users (show)

See Also:


Attachments
Patch (31.07 KB, patch)
2021-06-10 21:20 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (39.98 KB, patch)
2021-06-11 13:01 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (42.05 KB, patch)
2021-06-11 13:39 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (46.21 KB, patch)
2021-06-11 16:21 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (49.56 KB, patch)
2021-06-14 12:59 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (49.12 KB, patch)
2021-06-14 13:51 PDT, Alex Christensen
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2021-06-10 21:17:07 PDT
Check for "xn--" in any subdomain when parsing URL hosts
Comment 1 Alex Christensen 2021-06-10 21:20:06 PDT
Created attachment 431178 [details]
Patch
Comment 2 EWS Watchlist 2021-06-10 21:21:07 PDT
This patch modifies the imported WPT tests. Please ensure that any changes on the tests (not coming from a WPT import) are exported to WPT. Please see https://trac.webkit.org/wiki/WPTExportProcess
Comment 3 Alexey Proskuryakov 2021-06-11 07:11:04 PDT
Comment on attachment 431178 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431178&action=review

> Source/WTF/ChangeLog:3
> +        Check for "xn--" in any subdomain when parsing URL hosts

This changes many WPT results from pass to fail, please explain why that’s for the best.
Comment 4 Alex Christensen 2021-06-11 13:01:08 PDT
Created attachment 431224 [details]
Patch
Comment 5 Alex Christensen 2021-06-11 13:39:06 PDT
Created attachment 431228 [details]
Patch
Comment 6 Alex Christensen 2021-06-11 16:21:10 PDT
Created attachment 431239 [details]
Patch
Comment 7 Darin Adler 2021-06-12 15:23:09 PDT
Comment on attachment 431239 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431239&action=review

> Source/WTF/wtf/URLParser.cpp:2662
> +                state = State::NotAtSubdomainBeginOrInXNDashDash;

Does not seem to do the right thing if c is "." here, since we skip a character before starting to look for ".". Test case that would fail would be "x.xn--".

> Source/WTF/wtf/URLParser.cpp:2668
> +                state = State::NotAtSubdomainBeginOrInXNDashDash;

Ditto. Test case that would fail would be "xn.xn--".

> Source/WTF/wtf/URLParser.cpp:2673
> +            state = State::NotAtSubdomainBeginOrInXNDashDash;

Ditto. Test case that would fail would be "xn-.xn--".
Comment 8 Alex Christensen 2021-06-14 12:58:16 PDT
Good catch!  Adding those tests.
Comment 9 Alex Christensen 2021-06-14 12:59:22 PDT
Created attachment 431356 [details]
Patch
Comment 10 Darin Adler 2021-06-14 13:05:34 PDT
Comment on attachment 431356 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431356&action=review

> Source/WTF/wtf/URLParser.cpp:2655
> +            if (c == 'x' || c == 'X')
> +                state = State::AfterX;
> +            else if (c == '.')
> +                state = State::AtSubdomainBegin;
> +            else
> +                state = State::NotAtSubdomainBeginOrInXNDashDash;
> +            break;

More concise way to write this:

    case State::AtSubdomainBegin:
        if (c == 'x' || c == 'X')
            state = State::AfterX;
            continue;
        }
        break;
    case State::...
        ...
    }

    if (c == '.')
        state = State::AtSubdomainBegin;
    else
        state = State::NotAtSubdomainBeginOrInXNDashDash;

Code would be less repetitive.
Comment 11 Alex Christensen 2021-06-14 13:50:54 PDT
I like it.
Comment 12 Alex Christensen 2021-06-14 13:51:21 PDT
Created attachment 431360 [details]
Patch
Comment 13 Alex Christensen 2021-06-15 08:28:41 PDT
Comment on attachment 431360 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431360&action=review

> Source/WTF/wtf/URLParser.cpp:2546
> +    WTFLogAlways("ERRORS %x", processingDetails.errors);

Will remove before landing.
Comment 14 Darin Adler 2021-06-15 08:49:26 PDT
Comment on attachment 431360 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431360&action=review

> Source/WTF/wtf/URL.cpp:-406
> -    // For host names bigger than this, we won't do IDN encoding, which is almost certainly OK.

I feel like it’s a problem to lose this comment. Could write a better one, but just removing it seems wrong.

Without the comment it’s not obvious why it’s OK to leave the hostname as-is when it’s super long, and the check just below is mysterious.

> Source/WTF/wtf/URLHelpers.cpp:-43
> -// For host names bigger than this, we won't do IDN encoding, which is almost certainly OK.

Again, losing this comment.

> Source/WTF/wtf/URLHelpers.cpp:575
> +    if (hostName.length() > URLParser::hostnameBufferLength)

Mysterious code without a comment. Why is it safe to just do nothing for long strings.
Comment 15 Alex Christensen 2021-06-15 09:51:25 PDT
Comment on attachment 431360 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431360&action=review

>> Source/WTF/wtf/URL.cpp:-406
>> -    // For host names bigger than this, we won't do IDN encoding, which is almost certainly OK.
> 
> I feel like it’s a problem to lose this comment. Could write a better one, but just removing it seems wrong.
> 
> Without the comment it’s not obvious why it’s OK to leave the hostname as-is when it’s super long, and the check just below is mysterious.

I'll keep the comment here too.

>> Source/WTF/wtf/URLHelpers.cpp:575
>> +    if (hostName.length() > URLParser::hostnameBufferLength)
> 
> Mysterious code without a comment. Why is it safe to just do nothing for long strings.

I'll put the comment here.

> Source/WTF/wtf/URLParser.h:52
> +    // Needs to be big enough to hold an IDN-encoded name.
> +    // For host names bigger than this, we won't do IDN encoding, which is almost certainly OK.
> +    constexpr static size_t hostnameBufferLength = 2048;

... instead of moving it along with the definition of hostnameBufferLength to here.
Comment 16 Alex Christensen 2021-06-15 09:59:30 PDT
r278879
Comment 17 Radar WebKit Bug Importer 2021-06-15 10:00:34 PDT
<rdar://problem/79347990>