Bug 61572 - KURL::protocolIs(const char* protocol) asserts in Debug builds with valid protocols
Summary: KURL::protocolIs(const char* protocol) asserts in Debug builds with valid pro...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Andy Estes
URL:
Keywords: InRadar
: 62917 (view as bug list)
Depends on:
Blocks:
 
Reported: 2011-05-26 15:51 PDT by David Kilzer (:ddkilzer)
Modified: 2011-06-21 09:08 PDT (History)
3 users (show)

See Also:


Attachments
Patch (3.01 KB, patch)
2011-06-20 17:22 PDT, Andy Estes
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Kilzer (:ddkilzer) 2011-05-26 15:51:43 PDT
* SUMMARY
KURL::protocolIs(const char* protocol) throws an assertion in Debug builds when a (custom) protocol is compared that contains anything other than letters:

ASSERTION FAILED: isASCIILower(lowercaseLetter)
Source/WebCore/platform/KURL.cpp(63) : bool WebCore::isLetterMatchIgnoringCase(UChar, char)
1   _ZN7WebCoreL25isLetterMatchIgnoringCaseEtc
2   WebCore::KURL::protocolIs(char const*) const

Note that Section 3.1 of RFC 3986 <http://tools.ietf.org/html/rfc3986#section-3.1> states that a scheme must start with a letter, and be composed of letters, digits, "+", "-" and "." characters.  Thus this code that assumes that all characters in the scheme are letters is incorrect:

    // Do the comparison without making a new string object.
    for (int i = 0; i < m_schemeEnd; ++i) {
        if (!protocol[i] || !isLetterMatchIgnoringCase(m_string[i], protocol[i]))
            return false;
    }
Comment 1 Darin Adler 2011-05-26 16:41:02 PDT
This is working as designed.

The protocolIs function was designed to be used only with protocols that contain only letters. We can add the feature of checking for other protocols, and that will make it slower. We’re not assuming that all valid protocols contain entirely letters, just that all the ones we are using with this function are entirely letters.
Comment 2 David Kilzer (:ddkilzer) 2011-05-26 17:30:20 PDT
<rdar://problem/9510987>
Comment 3 Darin Adler 2011-06-17 17:28:09 PDT
The issue is not just an assertion. The code actually works wrong for custom protocols with non-letters.

When I originally wrote this we wanted it to be super-fast for protocols like "http" and "file" and it seemed the right tradeoff to only work with letters. If we want a slightly slower one that can work for any protocol, it's easy to change this into that.

Or we could even have both, although I don't see an obvious way to automatically choose the faster one at compile time.

The current function could be named fastAllLetterProtocolIs or something annoying like that.
Comment 4 Andy Estes 2011-06-17 17:32:30 PDT
*** Bug 62917 has been marked as a duplicate of this bug. ***
Comment 5 Darin Adler 2011-06-17 17:35:46 PDT
(In reply to comment #3)
> The current function could be named fastAllLetterProtocolIs or something annoying like that.

Not sure we need a super-fast version, though.

Please also see my comments in the patch in bug 62917.
Comment 6 Andy Estes 2011-06-20 17:07:04 PDT
(In reply to comment #3)
> The issue is not just an assertion. The code actually works wrong for custom protocols with non-letters.
> 
> When I originally wrote this we wanted it to be super-fast for protocols like "http" and "file" and it seemed the right tradeoff to only work with letters. If we want a slightly slower one that can work for any protocol, it's easy to change this into that.
> 
> Or we could even have both, although I don't see an obvious way to automatically choose the faster one at compile time.
> 
> The current function could be named fastAllLetterProtocolIs or something annoying like that.

Since the valid non-letter scheme characters ('0'-'9', '+', '-', '.') all fall in the range [32, 63], ORing with 0x20 is a no-op. I think we can make a special version of isLetterMatchIgnoringCase() that works only for protocol characters. The check will be identical to isLetterMatchIgnoringCase() but will have a more permissive assertion.
Comment 7 Darin Adler 2011-06-20 17:07:55 PDT
Good idea!
Comment 8 Andy Estes 2011-06-20 17:22:29 PDT
Created attachment 97894 [details]
Patch
Comment 9 Darin Adler 2011-06-20 18:05:09 PDT
Comment on attachment 97894 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=97894&action=review

Is the isLetterMatchIgnoringCase function used anywhere else? If not, then we probably should remove it.

> Source/WebCore/platform/KURL.cpp:249
> +    ASSERT(isSchemeChar(character));
> +    ASSERT(isASCIILower(schemeCharacter) || (!isASCIIUpper(schemeCharacter) && isSchemeChar(schemeCharacter)));
> +    return (character | 0x20) == schemeCharacter;

I think we want another assertion:

    ASSERT(schemeCharacter & 0x20);
Comment 10 Andy Estes 2011-06-20 18:50:55 PDT
(In reply to comment #9)
> (From update of attachment 97894 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=97894&action=review
> 
> Is the isLetterMatchIgnoringCase function used anywhere else? If not, then we probably should remove it.

It's used in numerous other places in KURL.

> 
> > Source/WebCore/platform/KURL.cpp:249
> > +    ASSERT(isSchemeChar(character));
> > +    ASSERT(isASCIILower(schemeCharacter) || (!isASCIIUpper(schemeCharacter) && isSchemeChar(schemeCharacter)));
> > +    return (character | 0x20) == schemeCharacter;
> 
> I think we want another assertion:
> 
>     ASSERT(schemeCharacter & 0x20);

Agreed. Thanks for the review!
Comment 11 Andy Estes 2011-06-20 23:05:29 PDT
Committed r89336: <http://trac.webkit.org/changeset/89336>
Comment 12 Andy Estes 2011-06-20 23:10:48 PDT
Comment on attachment 97894 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=97894&action=review

> Source/WebCore/platform/KURL.cpp:1788
> +        if (!isSchemeCharacterMatchIgnoringCase(url[i], protocol[i]))

I landed this without this change. Unlike the KURL member function, many strings are passed into the protocolIs() free function that are not normalized URLs and trigger assertions in isSchemeCharacterMatchIgnoringCase(). Of course the free function will trigger an assertion if the second argument isn't a letter-only protocol just like the member function did before this patch.

I'm not sure how to handle this. It looks like some, but not all, call sites can be converted to call the KURL member function instead.
Comment 13 Darin Adler 2011-06-21 09:08:58 PDT
I suggest making a slightly less optimized version for the case where we need it. I don’t know why I was so obsessed with using the | 0x20 trick.