RESOLVED FIXED 61572
KURL::protocolIs(const char* protocol) asserts in Debug builds with valid protocols
https://bugs.webkit.org/show_bug.cgi?id=61572
Summary KURL::protocolIs(const char* protocol) asserts in Debug builds with valid pro...
David Kilzer (:ddkilzer)
Reported 2011-05-26 15:51:43 PDT
* SUMMARY KURL::protocolIs(const char* protocol) throws an assertion in Debug builds when a (custom) protocol is compared that contains anything other than letters: ASSERTION FAILED: isASCIILower(lowercaseLetter) Source/WebCore/platform/KURL.cpp(63) : bool WebCore::isLetterMatchIgnoringCase(UChar, char) 1 _ZN7WebCoreL25isLetterMatchIgnoringCaseEtc 2 WebCore::KURL::protocolIs(char const*) const Note that Section 3.1 of RFC 3986 <http://tools.ietf.org/html/rfc3986#section-3.1> states that a scheme must start with a letter, and be composed of letters, digits, "+", "-" and "." characters. Thus this code that assumes that all characters in the scheme are letters is incorrect: // Do the comparison without making a new string object. for (int i = 0; i < m_schemeEnd; ++i) { if (!protocol[i] || !isLetterMatchIgnoringCase(m_string[i], protocol[i])) return false; }
Attachments
Patch (3.01 KB, patch)
2011-06-20 17:22 PDT, Andy Estes
darin: review+
Darin Adler
Comment 1 2011-05-26 16:41:02 PDT
This is working as designed. The protocolIs function was designed to be used only with protocols that contain only letters. We can add the feature of checking for other protocols, and that will make it slower. We’re not assuming that all valid protocols contain entirely letters, just that all the ones we are using with this function are entirely letters.
David Kilzer (:ddkilzer)
Comment 2 2011-05-26 17:30:20 PDT
Darin Adler
Comment 3 2011-06-17 17:28:09 PDT
The issue is not just an assertion. The code actually works wrong for custom protocols with non-letters. When I originally wrote this we wanted it to be super-fast for protocols like "http" and "file" and it seemed the right tradeoff to only work with letters. If we want a slightly slower one that can work for any protocol, it's easy to change this into that. Or we could even have both, although I don't see an obvious way to automatically choose the faster one at compile time. The current function could be named fastAllLetterProtocolIs or something annoying like that.
Andy Estes
Comment 4 2011-06-17 17:32:30 PDT
*** Bug 62917 has been marked as a duplicate of this bug. ***
Darin Adler
Comment 5 2011-06-17 17:35:46 PDT
(In reply to comment #3) > The current function could be named fastAllLetterProtocolIs or something annoying like that. Not sure we need a super-fast version, though. Please also see my comments in the patch in bug 62917.
Andy Estes
Comment 6 2011-06-20 17:07:04 PDT
(In reply to comment #3) > The issue is not just an assertion. The code actually works wrong for custom protocols with non-letters. > > When I originally wrote this we wanted it to be super-fast for protocols like "http" and "file" and it seemed the right tradeoff to only work with letters. If we want a slightly slower one that can work for any protocol, it's easy to change this into that. > > Or we could even have both, although I don't see an obvious way to automatically choose the faster one at compile time. > > The current function could be named fastAllLetterProtocolIs or something annoying like that. Since the valid non-letter scheme characters ('0'-'9', '+', '-', '.') all fall in the range [32, 63], ORing with 0x20 is a no-op. I think we can make a special version of isLetterMatchIgnoringCase() that works only for protocol characters. The check will be identical to isLetterMatchIgnoringCase() but will have a more permissive assertion.
Darin Adler
Comment 7 2011-06-20 17:07:55 PDT
Good idea!
Andy Estes
Comment 8 2011-06-20 17:22:29 PDT
Darin Adler
Comment 9 2011-06-20 18:05:09 PDT
Comment on attachment 97894 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=97894&action=review Is the isLetterMatchIgnoringCase function used anywhere else? If not, then we probably should remove it. > Source/WebCore/platform/KURL.cpp:249 > + ASSERT(isSchemeChar(character)); > + ASSERT(isASCIILower(schemeCharacter) || (!isASCIIUpper(schemeCharacter) && isSchemeChar(schemeCharacter))); > + return (character | 0x20) == schemeCharacter; I think we want another assertion: ASSERT(schemeCharacter & 0x20);
Andy Estes
Comment 10 2011-06-20 18:50:55 PDT
(In reply to comment #9) > (From update of attachment 97894 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=97894&action=review > > Is the isLetterMatchIgnoringCase function used anywhere else? If not, then we probably should remove it. It's used in numerous other places in KURL. > > > Source/WebCore/platform/KURL.cpp:249 > > + ASSERT(isSchemeChar(character)); > > + ASSERT(isASCIILower(schemeCharacter) || (!isASCIIUpper(schemeCharacter) && isSchemeChar(schemeCharacter))); > > + return (character | 0x20) == schemeCharacter; > > I think we want another assertion: > > ASSERT(schemeCharacter & 0x20); Agreed. Thanks for the review!
Andy Estes
Comment 11 2011-06-20 23:05:29 PDT
Andy Estes
Comment 12 2011-06-20 23:10:48 PDT
Comment on attachment 97894 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=97894&action=review > Source/WebCore/platform/KURL.cpp:1788 > + if (!isSchemeCharacterMatchIgnoringCase(url[i], protocol[i])) I landed this without this change. Unlike the KURL member function, many strings are passed into the protocolIs() free function that are not normalized URLs and trigger assertions in isSchemeCharacterMatchIgnoringCase(). Of course the free function will trigger an assertion if the second argument isn't a letter-only protocol just like the member function did before this patch. I'm not sure how to handle this. It looks like some, but not all, call sites can be converted to call the KURL member function instead.
Darin Adler
Comment 13 2011-06-21 09:08:58 PDT
I suggest making a slightly less optimized version for the case where we need it. I don’t know why I was so obsessed with using the | 0x20 trick.
Note You need to log in before you can comment on or make changes to this bug.