61572 – KURL::protocolIs(const char* protocol) asserts in Debug builds with valid protocols

RESOLVED FIXED 61572

KURL::protocolIs(const char* protocol) asserts in Debug builds with valid protocols

https://bugs.webkit.org/show_bug.cgi?id=61572

Summary KURL::protocolIs(const char* protocol) asserts in Debug builds with valid pro...

David Kilzer (:ddkilzer)

Reported 2011-05-26 15:51:43 PDT

* SUMMARY KURL::protocolIs(const char* protocol) throws an assertion in Debug builds when a (custom) protocol is compared that contains anything other than letters: ASSERTION FAILED: isASCIILower(lowercaseLetter) Source/WebCore/platform/KURL.cpp(63) : bool WebCore::isLetterMatchIgnoringCase(UChar, char) 1 _ZN7WebCoreL25isLetterMatchIgnoringCaseEtc 2 WebCore::KURL::protocolIs(char const*) const Note that Section 3.1 of RFC 3986 <http://tools.ietf.org/html/rfc3986#section-3.1> states that a scheme must start with a letter, and be composed of letters, digits, "+", "-" and "." characters. Thus this code that assumes that all characters in the scheme are letters is incorrect: // Do the comparison without making a new string object. for (int i = 0; i < m_schemeEnd; ++i) { if (!protocol[i] || !isLetterMatchIgnoringCase(m_string[i], protocol[i])) return false; }

Attachments
Patch (3.01 KB, patch) 2011-06-20 17:22 PDT, Andy Estes	darin: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Darin Adler

Comment 1 2011-05-26 16:41:02 PDT

This is working as designed. The protocolIs function was designed to be used only with protocols that contain only letters. We can add the feature of checking for other protocols, and that will make it slower. We’re not assuming that all valid protocols contain entirely letters, just that all the ones we are using with this function are entirely letters.

David Kilzer (:ddkilzer)

Comment 2 2011-05-26 17:30:20 PDT

<rdar://problem/9510987>

Darin Adler

Comment 3 2011-06-17 17:28:09 PDT

The issue is not just an assertion. The code actually works wrong for custom protocols with non-letters. When I originally wrote this we wanted it to be super-fast for protocols like "http" and "file" and it seemed the right tradeoff to only work with letters. If we want a slightly slower one that can work for any protocol, it's easy to change this into that. Or we could even have both, although I don't see an obvious way to automatically choose the faster one at compile time. The current function could be named fastAllLetterProtocolIs or something annoying like that.

Andy Estes

Comment 4 2011-06-17 17:32:30 PDT

*** Bug 62917 has been marked as a duplicate of this bug. ***

Darin Adler

Comment 5 2011-06-17 17:35:46 PDT

(In reply to comment #3) > The current function could be named fastAllLetterProtocolIs or something annoying like that. Not sure we need a super-fast version, though. Please also see my comments in the patch in bug 62917.

Andy Estes

Comment 6 2011-06-20 17:07:04 PDT

(In reply to comment #3) > The issue is not just an assertion. The code actually works wrong for custom protocols with non-letters. > > When I originally wrote this we wanted it to be super-fast for protocols like "http" and "file" and it seemed the right tradeoff to only work with letters. If we want a slightly slower one that can work for any protocol, it's easy to change this into that. > > Or we could even have both, although I don't see an obvious way to automatically choose the faster one at compile time. > > The current function could be named fastAllLetterProtocolIs or something annoying like that. Since the valid non-letter scheme characters ('0'-'9', '+', '-', '.') all fall in the range [32, 63], ORing with 0x20 is a no-op. I think we can make a special version of isLetterMatchIgnoringCase() that works only for protocol characters. The check will be identical to isLetterMatchIgnoringCase() but will have a more permissive assertion.

Darin Adler

Comment 7 2011-06-20 17:07:55 PDT

Good idea!

Andy Estes

Comment 8 2011-06-20 17:22:29 PDT

Created attachment 97894 [details] Patch

Darin Adler

Comment 9 2011-06-20 18:05:09 PDT

Comment on attachment 97894 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=97894&action=review Is the isLetterMatchIgnoringCase function used anywhere else? If not, then we probably should remove it. > Source/WebCore/platform/KURL.cpp:249 > + ASSERT(isSchemeChar(character)); > + ASSERT(isASCIILower(schemeCharacter) || (!isASCIIUpper(schemeCharacter) && isSchemeChar(schemeCharacter))); > + return (character | 0x20) == schemeCharacter; I think we want another assertion: ASSERT(schemeCharacter & 0x20);

Andy Estes

Comment 10 2011-06-20 18:50:55 PDT

(In reply to comment #9) > (From update of attachment 97894 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=97894&action=review > > Is the isLetterMatchIgnoringCase function used anywhere else? If not, then we probably should remove it. It's used in numerous other places in KURL. > > > Source/WebCore/platform/KURL.cpp:249 > > + ASSERT(isSchemeChar(character)); > > + ASSERT(isASCIILower(schemeCharacter) || (!isASCIIUpper(schemeCharacter) && isSchemeChar(schemeCharacter))); > > + return (character | 0x20) == schemeCharacter; > > I think we want another assertion: > > ASSERT(schemeCharacter & 0x20); Agreed. Thanks for the review!

Andy Estes

Comment 11 2011-06-20 23:05:29 PDT

Committed r89336: <http://trac.webkit.org/changeset/89336>

Andy Estes

Comment 12 2011-06-20 23:10:48 PDT

Comment on attachment 97894 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=97894&action=review > Source/WebCore/platform/KURL.cpp:1788 > + if (!isSchemeCharacterMatchIgnoringCase(url[i], protocol[i])) I landed this without this change. Unlike the KURL member function, many strings are passed into the protocolIs() free function that are not normalized URLs and trigger assertions in isSchemeCharacterMatchIgnoringCase(). Of course the free function will trigger an assertion if the second argument isn't a letter-only protocol just like the member function did before this patch. I'm not sure how to handle this. It looks like some, but not all, call sites can be converted to call the KURL member function instead.

Darin Adler

Comment 13 2011-06-21 09:08:58 PDT

I suggest making a slightly less optimized version for the case where we need it. I don’t know why I was so obsessed with using the | 0x20 trick.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component WebCore Misc.

Assignee

Andy Estes

Reported

2011-05-26 15:51 PDT

Modified

2011-06-21 09:08 PDT History

CC List

3 users Show

URL

Keywords InRadar

Duplicates (1)

62917 View as bug list

Depends on

Blocks