RESOLVED FIXED 165885
REGRESSION (r208902) Null pointer dereference in wkIsPublicSuffix
https://bugs.webkit.org/show_bug.cgi?id=165885
Summary REGRESSION (r208902) Null pointer dereference in wkIsPublicSuffix
Alex Christensen
Reported 2016-12-14 18:12:20 PST
REGRESSION (r208902) Null pointer dereference in wkIsPublicSuffix
Attachments
Patch (3.32 KB, patch)
2016-12-14 18:21 PST, Alex Christensen
no flags
Patch (4.09 KB, patch)
2016-12-14 19:56 PST, Alex Christensen
no flags
Patch (7.47 KB, patch)
2016-12-14 20:24 PST, Alex Christensen
no flags
Patch (8.17 KB, patch)
2016-12-15 00:12 PST, Alex Christensen
no flags
Alex Christensen
Comment 1 2016-12-14 18:21:28 PST
Alex Christensen
Comment 2 2016-12-14 18:24:58 PST
Alex Christensen
Comment 3 2016-12-14 19:56:12 PST
Alex Christensen
Comment 4 2016-12-14 20:24:08 PST
Darin Adler
Comment 5 2016-12-14 21:44:12 PST
Comment on attachment 297162 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=297162&action=review > Source/WebCore/ChangeLog:9 > + wkIsPublicSuffix crashes if you give it a nil NSString*. This patch seems to do more than just check for nil; changes the topPrivatelyControlledDomain algorithm. > Source/WebCore/platform/mac/PublicSuffixMac.mm:49 > if (domain.isNull() || domain.isEmpty()) > return String(); Never need to check both isNull and isEmpty. Null strings will all compare as empty. if (domain.isEmpty()) return String(); But also, we should see if we can just leave this check out entirely and test that null and empty strings both work. > Source/WebCore/platform/mac/PublicSuffixMac.mm:52 > + if ([decodeHostName(domain) _web_looksLikeIPAddress]) > return domain; Do we really need to decode the host name before calling _web_looksLikeIPAddress? That does not seem necessary. The old code did it because the single call to decodeHostName was shared by the entire function, but now I suspect we can get the correct result by just calling it without decoding. > Source/WebCore/platform/mac/PublicSuffixMac.mm:67 > + Vector<size_t, 8> substringStarts; > + substringStarts.append(0); > + for (size_t i = 0; i < domain.length(); ++i) { > + if (domain[i] == '.') > + substringStarts.append(i + 1); > + } > + > // Match the longest possible suffix. > - bool hasTopLevelDomain = false; > - NSCharacterSet *dot = [[NSCharacterSet characterSetWithCharactersInString:@"."] retain]; > - NSRange nextDot = NSMakeRange(0, [host length]); > - for (NSRange previousDot = [host rangeOfCharacterFromSet:dot]; previousDot.location != NSNotFound; nextDot = previousDot, previousDot = [host rangeOfCharacterFromSet:dot options:0 range:NSMakeRange(previousDot.location + previousDot.length, [host length] - (previousDot.location + previousDot.length))]) { > - NSString *substring = [host substringFromIndex:previousDot.location + previousDot.length]; > - if (wkIsPublicSuffix(substring)) { > - hasTopLevelDomain = true; > - break; > - } > + for (size_t i = 0; i < substringStarts.size(); ++i) { > + NSString *host = decodeHostName(domain.substring(substringStarts[i])); > + if (host && wkIsPublicSuffix(host)) > + return i ? domain.substring(substringStarts[i - 1]) : String(); > } > - > - [dot release]; > - if (!hasTopLevelDomain) > - return String(); > - > - if (!nextDot.location) > - return domain; > - > - return encodeHostName([host substringFromIndex:nextDot.location + nextDot.length]); > + return String(); If we feel the need to rewrite this to use String, we don’t need to allocate a vector, even on the stack. We can just translate the old algorithm more directly. Here’s my cut at it, taking my lead from your function in changing how it works: if (isPublicSuffix(domain)) return String(); // FIXME: Do we really need this special case? If we really do need this, we should add a test case proving that we do. size_t separatorPosition; for (unsigned labelStart = 0; (separatorPosition = domain.find('.', labelStart)) != notFound; labelStart = separatorPosition + 1) { if (isPublicSuffix(domain.substring(separatorPosition + 1))) return domain.substring(labelStart); } return String(); Note that indices into a string are unsigned, not size_t, except for the return value from find, which is peculiar because it uses size_t. What’s the rationale for calling decodeHostName over and over again? Could that perhaps give incorrect results if there is some special treatment at the very start of a host name? Or maybe it might be bad for performance?
Alex Christensen
Comment 6 2016-12-15 00:09:45 PST
We do need to call decodeHostName multiple times for correct behavior in the case that caused me to look into this, a url like "r4---asdf.example.com" where the first part causes ICU to fail but the rest is fine. We could build up a Vector<String> of each section between the dots, but that would require possibly more calls to uidna_nameToUnicode. I'm not sure whether each call to uidna_nameToUnicode slows it down because it is setting things up, or if uidna_nameToUnicode is slow internally. I don't imagine there is much of a performance difference between the two possible implementations of this, both of which require many calls to uidna_nameToUnicode, so I'm going to commit the version you suggested.
Alex Christensen
Comment 7 2016-12-15 00:12:52 PST
Alex Christensen
Comment 8 2016-12-15 00:20:39 PST
Note You need to log in before you can comment on or make changes to this bug.