Bug 165885

Summary: REGRESSION (r208902) Null pointer dereference in wkIsPublicSuffix
Product: WebKit Reporter: Alex Christensen <achristensen>
Component: New BugsAssignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, darin
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=144194
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch none

Description Alex Christensen 2016-12-14 18:12:20 PST
REGRESSION (r208902) Null pointer dereference in wkIsPublicSuffix
Comment 1 Alex Christensen 2016-12-14 18:21:28 PST
Created attachment 297154 [details]
Patch
Comment 2 Alex Christensen 2016-12-14 18:24:58 PST
rdar://problem/29476917
Comment 3 Alex Christensen 2016-12-14 19:56:12 PST
Created attachment 297161 [details]
Patch
Comment 4 Alex Christensen 2016-12-14 20:24:08 PST
Created attachment 297162 [details]
Patch
Comment 5 Darin Adler 2016-12-14 21:44:12 PST
Comment on attachment 297162 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=297162&action=review

> Source/WebCore/ChangeLog:9
> +        wkIsPublicSuffix crashes if you give it a nil NSString*.

This patch seems to do more than just check for nil; changes the topPrivatelyControlledDomain algorithm.

> Source/WebCore/platform/mac/PublicSuffixMac.mm:49
>      if (domain.isNull() || domain.isEmpty())
>          return String();

Never need to check both isNull and isEmpty. Null strings will all compare as empty.

    if (domain.isEmpty())
        return String();

But also, we should see if we can just leave this check out entirely and test that null and empty strings both work.

> Source/WebCore/platform/mac/PublicSuffixMac.mm:52
> +    if ([decodeHostName(domain) _web_looksLikeIPAddress])
>          return domain;

Do we really need to decode the host name before calling _web_looksLikeIPAddress? That does not seem necessary. The old code did it because the single call to decodeHostName was shared by the entire function, but now I suspect we can get the correct result by just calling it without decoding.

> Source/WebCore/platform/mac/PublicSuffixMac.mm:67
> +    Vector<size_t, 8> substringStarts;
> +    substringStarts.append(0);
> +    for (size_t i = 0; i < domain.length(); ++i) {
> +        if (domain[i] == '.')
> +            substringStarts.append(i + 1);
> +    }
> +    
>      // Match the longest possible suffix.
> -    bool hasTopLevelDomain = false;
> -    NSCharacterSet *dot = [[NSCharacterSet characterSetWithCharactersInString:@"."] retain];
> -    NSRange nextDot = NSMakeRange(0, [host length]);
> -    for (NSRange previousDot = [host rangeOfCharacterFromSet:dot]; previousDot.location != NSNotFound; nextDot = previousDot, previousDot = [host rangeOfCharacterFromSet:dot options:0 range:NSMakeRange(previousDot.location + previousDot.length, [host length] - (previousDot.location + previousDot.length))]) {
> -        NSString *substring = [host substringFromIndex:previousDot.location + previousDot.length];
> -        if (wkIsPublicSuffix(substring)) {
> -            hasTopLevelDomain = true;
> -            break;
> -        }
> +    for (size_t i = 0; i < substringStarts.size(); ++i) {
> +        NSString *host = decodeHostName(domain.substring(substringStarts[i]));
> +        if (host && wkIsPublicSuffix(host))
> +            return i ? domain.substring(substringStarts[i - 1]) : String();
>      }
> -
> -    [dot release];
> -    if (!hasTopLevelDomain)
> -        return String();
> -
> -    if (!nextDot.location)
> -        return domain;
> -
> -    return encodeHostName([host substringFromIndex:nextDot.location + nextDot.length]);
> +    return String();

If we feel the need to rewrite this to use String, we don’t need to allocate a vector, even on the stack. We can just translate the old algorithm more directly. Here’s my cut at it, taking my lead from your function in changing how it works:

    if (isPublicSuffix(domain))
        return String(); // FIXME: Do we really need this special case? If we really do need this, we should add a test case proving that we do.
    size_t separatorPosition;
    for (unsigned labelStart = 0; (separatorPosition = domain.find('.', labelStart)) != notFound; labelStart = separatorPosition + 1) {
        if (isPublicSuffix(domain.substring(separatorPosition + 1)))
            return domain.substring(labelStart);
    }
    return String();

Note that indices into a string are unsigned, not size_t, except for the return value from find, which is peculiar because it uses size_t.

What’s the rationale for calling decodeHostName over and over again? Could that perhaps give incorrect results if there is some special treatment at the very start of a host name? Or maybe it might be bad for performance?
Comment 6 Alex Christensen 2016-12-15 00:09:45 PST
We do need to call decodeHostName multiple times for correct behavior in the case that caused me to look into this, a url like "r4---asdf.example.com" where the first part causes ICU to fail but the rest is fine.  We could build up a Vector<String> of each section between the dots, but that would require possibly more calls to uidna_nameToUnicode.  I'm not sure whether each call to uidna_nameToUnicode slows it down because it is setting things up, or if uidna_nameToUnicode is slow internally.  I don't imagine there is much of a performance difference between the two possible implementations of this, both of which require many calls to uidna_nameToUnicode, so I'm going to commit the version you suggested.
Comment 7 Alex Christensen 2016-12-15 00:12:52 PST
Created attachment 297178 [details]
Patch
Comment 8 Alex Christensen 2016-12-15 00:20:39 PST
http://trac.webkit.org/changeset/209857