Summary: | REGRESSION (r208902) Null pointer dereference in wkIsPublicSuffix | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Alex Christensen <achristensen> | ||||||||||
Component: | New Bugs | Assignee: | Alex Christensen <achristensen> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Normal | CC: | ap, darin | ||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||
Version: | WebKit Nightly Build | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
See Also: | https://bugs.webkit.org/show_bug.cgi?id=144194 | ||||||||||||
Attachments: |
|
Description
Alex Christensen
2016-12-14 18:12:20 PST
Created attachment 297154 [details]
Patch
Created attachment 297161 [details]
Patch
Created attachment 297162 [details]
Patch
Comment on attachment 297162 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=297162&action=review > Source/WebCore/ChangeLog:9 > + wkIsPublicSuffix crashes if you give it a nil NSString*. This patch seems to do more than just check for nil; changes the topPrivatelyControlledDomain algorithm. > Source/WebCore/platform/mac/PublicSuffixMac.mm:49 > if (domain.isNull() || domain.isEmpty()) > return String(); Never need to check both isNull and isEmpty. Null strings will all compare as empty. if (domain.isEmpty()) return String(); But also, we should see if we can just leave this check out entirely and test that null and empty strings both work. > Source/WebCore/platform/mac/PublicSuffixMac.mm:52 > + if ([decodeHostName(domain) _web_looksLikeIPAddress]) > return domain; Do we really need to decode the host name before calling _web_looksLikeIPAddress? That does not seem necessary. The old code did it because the single call to decodeHostName was shared by the entire function, but now I suspect we can get the correct result by just calling it without decoding. > Source/WebCore/platform/mac/PublicSuffixMac.mm:67 > + Vector<size_t, 8> substringStarts; > + substringStarts.append(0); > + for (size_t i = 0; i < domain.length(); ++i) { > + if (domain[i] == '.') > + substringStarts.append(i + 1); > + } > + > // Match the longest possible suffix. > - bool hasTopLevelDomain = false; > - NSCharacterSet *dot = [[NSCharacterSet characterSetWithCharactersInString:@"."] retain]; > - NSRange nextDot = NSMakeRange(0, [host length]); > - for (NSRange previousDot = [host rangeOfCharacterFromSet:dot]; previousDot.location != NSNotFound; nextDot = previousDot, previousDot = [host rangeOfCharacterFromSet:dot options:0 range:NSMakeRange(previousDot.location + previousDot.length, [host length] - (previousDot.location + previousDot.length))]) { > - NSString *substring = [host substringFromIndex:previousDot.location + previousDot.length]; > - if (wkIsPublicSuffix(substring)) { > - hasTopLevelDomain = true; > - break; > - } > + for (size_t i = 0; i < substringStarts.size(); ++i) { > + NSString *host = decodeHostName(domain.substring(substringStarts[i])); > + if (host && wkIsPublicSuffix(host)) > + return i ? domain.substring(substringStarts[i - 1]) : String(); > } > - > - [dot release]; > - if (!hasTopLevelDomain) > - return String(); > - > - if (!nextDot.location) > - return domain; > - > - return encodeHostName([host substringFromIndex:nextDot.location + nextDot.length]); > + return String(); If we feel the need to rewrite this to use String, we don’t need to allocate a vector, even on the stack. We can just translate the old algorithm more directly. Here’s my cut at it, taking my lead from your function in changing how it works: if (isPublicSuffix(domain)) return String(); // FIXME: Do we really need this special case? If we really do need this, we should add a test case proving that we do. size_t separatorPosition; for (unsigned labelStart = 0; (separatorPosition = domain.find('.', labelStart)) != notFound; labelStart = separatorPosition + 1) { if (isPublicSuffix(domain.substring(separatorPosition + 1))) return domain.substring(labelStart); } return String(); Note that indices into a string are unsigned, not size_t, except for the return value from find, which is peculiar because it uses size_t. What’s the rationale for calling decodeHostName over and over again? Could that perhaps give incorrect results if there is some special treatment at the very start of a host name? Or maybe it might be bad for performance? We do need to call decodeHostName multiple times for correct behavior in the case that caused me to look into this, a url like "r4---asdf.example.com" where the first part causes ICU to fail but the rest is fine. We could build up a Vector<String> of each section between the dots, but that would require possibly more calls to uidna_nameToUnicode. I'm not sure whether each call to uidna_nameToUnicode slows it down because it is setting things up, or if uidna_nameToUnicode is slow internally. I don't imagine there is much of a performance difference between the two possible implementations of this, both of which require many calls to uidna_nameToUnicode, so I'm going to commit the version you suggested. Created attachment 297178 [details]
Patch
|