Bug 155602

Summary: Refactor WebSockets handshake to use StringView instead of String for header validation
Product: WebKit Reporter: John Wilander <wilander>
Component: WebCore Misc.Assignee: John Wilander <wilander>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, aestes, ap, bburg, bfulgham, buildbot, commit-queue, darin, ossy, peavo, rniwa, wilander
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 82714    
Bug Blocks: 156255, 157057, 157157    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Archive of layout-test-results from ews103 for mac-yosemite
none
Archive of layout-test-results from ews106 for mac-yosemite-wk2
none
Archive of layout-test-results from ews121 for ios-simulator-wk2
none
Archive of layout-test-results from ews115 for mac-yosemite
none
Patch
none
Patch
none
Patch none

Description John Wilander 2016-03-17 15:26:29 PDT
WebSocketsHandshake.cpp uses String for header validation. A switch to StringView would cut down on memory allocation. This is a follow-up on Darin Adler's review comments in  https://bugs.webkit.org/show_bug.cgi?id=82714#c36.
Comment 1 John Wilander 2016-04-04 21:44:02 PDT
Created attachment 275634 [details]
Patch
Comment 2 Brent Fulgham 2016-04-05 10:28:19 PDT
Comment on attachment 275634 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=275634&action=review

A very nice improvement! The patch has some issues with cross-platform code that will need to be fixed before we can land this. r- to fix the EFL/GTK/Windows builds.

> Source/WebCore/platform/network/ResourceResponseBase.h:87
> +    void addHTTPHeaderField(HTTPHeaderName, const String& value);

If you make a change like this, you need to make sure to check GTK, EFL, and Windows code for uses of this API.

For example, the Windows build is failing because it expected a string argument here:

HRESULT WebMutableURLRequest::addValue(_In_ BSTR value, _In_ BSTR field)
{
    m_request.addHTTPHeaderField(WTF::AtomicString(value, SysStringLen(value)), String(field, SysStringLen(field)));
    return S_OK;
}

EFL/GTK probably have similar issues.

You should grep through the source repository to find uses of they method.
Comment 3 Brent Fulgham 2016-04-05 10:30:39 PDT
Comment on attachment 275634 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=275634&action=review

>> Source/WebCore/platform/network/ResourceResponseBase.h:87
>> +    void addHTTPHeaderField(HTTPHeaderName, const String& value);
> 
> If you make a change like this, you need to make sure to check GTK, EFL, and Windows code for uses of this API.
> 
> For example, the Windows build is failing because it expected a string argument here:
> 
> HRESULT WebMutableURLRequest::addValue(_In_ BSTR value, _In_ BSTR field)
> {
>     m_request.addHTTPHeaderField(WTF::AtomicString(value, SysStringLen(value)), String(field, SysStringLen(field)));
>     return S_OK;
> }
> 
> EFL/GTK probably have similar issues.
> 
> You should grep through the source repository to find uses of they method.

You might need to add WEBCORE_EXPORT to the signature for:

bool findHTTPHeaderName(StringView, HTTPHeaderName&);
Comment 4 John Wilander 2016-04-05 15:14:57 PDT
Created attachment 275701 [details]
Patch
Comment 5 John Wilander 2016-04-05 15:26:08 PDT
(In reply to comment #3)
> Comment on attachment 275634 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=275634&action=review
> 
> >> Source/WebCore/platform/network/ResourceResponseBase.h:87
> >> +    void addHTTPHeaderField(HTTPHeaderName, const String& value);
> > 
> > If you make a change like this, you need to make sure to check GTK, EFL, and Windows code for uses of this API.
> > 
> > For example, the Windows build is failing because it expected a string argument here:
> > 
> > HRESULT WebMutableURLRequest::addValue(_In_ BSTR value, _In_ BSTR field)
> > {
> >     m_request.addHTTPHeaderField(WTF::AtomicString(value, SysStringLen(value)), String(field, SysStringLen(field)));
> >     return S_OK;
> > }
> > 
> > EFL/GTK probably have similar issues.
> > 
> > You should grep through the source repository to find uses of they method.

Thanks! I went through it and decided to instead keep the addHTTPHeaderField() function which accepts a String reference. It is not obvious how to handle failures in finding a matching HTTPHeaderName value and this patch is intended to refactor WebSockets code. Filed https://bugs.webkit.org/show_bug.cgi?id=156255 to track the work.

> 
> You might need to add WEBCORE_EXPORT to the signature for:
> 
> bool findHTTPHeaderName(StringView, HTTPHeaderName&);

I don't think so. Did not touch that code.
Comment 6 John Wilander 2016-04-05 15:31:17 PDT
Oh, I see now what you meant with adding WEBCORE_EXPORT to findHTTPHeaderName(). That is probably needed to change the Win/GTK/EFL code.
Comment 7 John Wilander 2016-04-05 17:52:20 PDT
Created attachment 275730 [details]
Patch
Comment 8 Brent Fulgham 2016-04-05 18:02:33 PDT
Comment on attachment 275730 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=275730&action=review

I think this is pretty much ready to land. I'm concerned about the 'delete' you are using on the template declaration, but otherwise I think this looks good. Since you make a one-word change in a WK2 file, I guess I'll have to leave the review incomplete. :-\

> Source/WebCore/platform/network/ResourceRequestBase.cpp:478
> +void ResourceRequestBase::addHTTPHeaderField(const String& headerNameStr, const String& value)

It's a shame this code has to be duplicated, but I don't see an easy way to combine them such that you get 'updateResourceRequest' and so forth to happen.

You could short-circuit if headerNameStr is not a valid HTTPHeaderName, but then you would skip 'updateResourceRequest', so the behavior would not be the same.

> Source/WebCore/platform/network/ResourceResponseBase.h:93
> +    template<size_t length> void addHTTPHeaderField(HTTPHeaderName, const String&) = delete;

Do you really want to delete this template, given the comment that suggests you use the HTTPHeaderName version of the template?
Comment 9 John Wilander 2016-04-05 18:20:01 PDT
Created attachment 275736 [details]
Patch
Comment 10 John Wilander 2016-04-05 18:25:37 PDT
(In reply to comment #8)
> Comment on attachment 275730 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=275730&action=review
> 
> I think this is pretty much ready to land. I'm concerned about the 'delete'
> you are using on the template declaration, but otherwise I think this looks
> good. Since you make a one-word change in a WK2 file, I guess I'll have to
> leave the review incomplete. :-\
> 
> > Source/WebCore/platform/network/ResourceRequestBase.cpp:478
> > +void ResourceRequestBase::addHTTPHeaderField(const String& headerNameStr, const String& value)
> 
> It's a shame this code has to be duplicated, but I don't see an easy way to
> combine them such that you get 'updateResourceRequest' and so forth to
> happen.
> 
> You could short-circuit if headerNameStr is not a valid HTTPHeaderName, but
> then you would skip 'updateResourceRequest', so the behavior would not be
> the same.

That was my conclusion too. I took a few stabs at it but it either got weird or duplicated.

> > Source/WebCore/platform/network/ResourceResponseBase.h:93
> > +    template<size_t length> void addHTTPHeaderField(HTTPHeaderName, const String&) = delete;
> 
> Do you really want to delete this template, given the comment that suggests
> you use the HTTPHeaderName version of the template?

Oh, didn't see that comment before uploading a new patch with fixes for the GTK build. I'll address it.
Comment 11 John Wilander 2016-04-05 18:34:41 PDT
Comment on attachment 275736 [details]
Patch

Obsoleting the patch because of build error.
Comment 12 John Wilander 2016-04-06 11:44:50 PDT
Created attachment 275792 [details]
Patch
Comment 13 John Wilander 2016-04-06 13:43:48 PDT
Whoever reviews this, please take a look at this call in WebKit2/UIProcess/InspectorServer/HTTPRequest.cpp:

m_headerFields.add(name.toStringWithoutCopying(), value);

I think it's safe to send the string that way since previously we would just send the String variable.
Comment 14 Andy Estes 2016-04-06 14:49:03 PDT
Comment on attachment 275792 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=275792&action=review

I didn't have time for a full review, but I did have a few comments.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:399
> +    const String httpVersionStaticPreamble = "HTTP/";

You should use ASCIILiteral here to avoid a copy.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:411
> +    if (httpStatusLine.length() < httpVersionStaticPreamble.length() + 3)
>          return false;
>  
>      // Check that the version number is 1.1 or above
>      bool versionNumberIsOneDotOneOrAbove = false;
> -    if (httpVersionString.characterAt(posOfHttpVersionNumberString + 1) == '.') {
> -        UChar majorVersion = httpVersionString.characterAt(posOfHttpVersionNumberString);
> -        UChar minorVersion = httpVersionString.characterAt(posOfHttpVersionNumberString + 2);
> +    if (httpStatusLine[httpVersionStaticPreamble.length() + 1] == '.') {
> +        UChar majorVersion = httpStatusLine[httpVersionStaticPreamble.length()];
> +        UChar minorVersion = httpStatusLine[httpVersionStaticPreamble.length() + 2];

I'd store the result of httpVersionStaticPreamble.length() in a local variable instead of calling it repeatedly.

Also, I think this logic is technically wrong. https://tools.ietf.org/html/rfc2616#section-3.1 says the following:

   Note that the major and minor numbers MUST be treated as separate
   integers and that each MAY be incremented higher than a single digit.
   Thus, HTTP/2.4 is a lower version than HTTP/2.13, which in turn is
   lower than HTTP/12.3. Leading zeros MUST be ignored by recipients and
   MUST NOT be sent.

Not that we'd likely encounter multi-digit HTTP versions in practice, but the RFC does allow for it.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:475
> +    StringView httpStatusLine(String(header, space1 - header));

Why change this variable name? httpVersionString seemed more reflective of what this value should represent (the HTTP-Version production) than httpStatusLine does.

And unless I'm missing something, this doesn't seem to actually address Darin's suggestion in https://bugs.webkit.org/show_bug.cgi?id=82714#c36. You are still making a copy by virtue of the call to String(). Wrapping that copy in a StringView doesn't help.

> Source/WebCore/platform/network/ResourceResponseBase.cpp:354
> -
> +    

Unnecessary whitespace.
Comment 15 Andy Estes 2016-04-06 14:59:02 PDT
(In reply to comment #14)
> Comment on attachment 275792 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=275792&action=review
> 
> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:475
> > +    StringView httpStatusLine(String(header, space1 - header));
> 
> Why change this variable name? httpVersionString seemed more reflective of
> what this value should represent (the HTTP-Version production) than
> httpStatusLine does.
> 
> And unless I'm missing something, this doesn't seem to actually address
> Darin's suggestion in https://bugs.webkit.org/show_bug.cgi?id=82714#c36. You
> are still making a copy by virtue of the call to String(). Wrapping that
> copy in a StringView doesn't help.

Even worse, this actually introduces a use-after-free. StringView is a non-owning reference to a string, so it won't keep the temporary you pass it alive. When that temporary goes out of scope at the end of the statement, the StringView is left with a dangling pointer.
Comment 16 Andy Estes 2016-04-06 17:08:15 PDT
Comment on attachment 275792 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=275792&action=review

Ok, I had time to finish my review. r- due to the use-after-free issues, but there's also some additional cleanup that needs to be done here.

> Source/WebCore/platform/network/HTTPParsers.cpp:683
> -    nameStr = String::fromUTF8(name.data(), name.size());
> +    nameStr = StringView(String::fromUTF8(name.data(), name.size()));

This also introduces a use-after-free.

> Source/WebCore/platform/network/ResourceRequestBase.cpp:468
> +void ResourceRequestBase::addHTTPHeaderField(HTTPHeaderName headerName, const String& value)

I'd just call the first parameter "name". We already know it's the name of a header from looking at the type and the function name.

> Source/WebCore/platform/network/ResourceRequestBase.h:90
> -        void addHTTPHeaderField(const String& name, const String& value);
> +        void addHTTPHeaderField(HTTPHeaderName, const String& value);
> +        void addHTTPHeaderField(const String& headerNameStr, const String& value);

I don't see why the signature that takes a const String& had to change. This just obscures useful blame information. Given the context, calling the first parameter "name" already seemed clear enough.

> Source/WebCore/platform/network/ResourceRequestBase.h:96
> -        template<size_t length> void addHTTPHeaderField(const char (&)[length], const String&) = delete;
> +        template<size_t length> void addHTTPHeaderField(HTTPHeaderName, const String&) = delete;
> +        template<size_t length> void addHTTPHeaderField(const String&, const String&) = delete;

I don't understand this change. The point here is to prevent string literals from being used instead of HTTPHeaderNames. But it looks like you're trying to prevent HTTPHeaderNames and non-literal strings. Why?

> Source/WebCore/platform/network/ResourceResponseBase.cpp:337
> +void ResourceResponseBase::addHTTPHeaderField(HTTPHeaderName headerName, const String& value)

I'd just call the first parameter "name".

> Source/WebCore/platform/network/ResourceResponseBase.cpp:344
> +void ResourceResponseBase::addHTTPHeaderField(const String& headerNameStr, const String& value)

Ditto.

> Source/WebCore/platform/network/ResourceResponseBase.cpp:352
> +    if (findHTTPHeaderName(headerNameStr, headerName))
> +        addHTTPHeaderField(headerName, value);
> +    else {
> +        lazyInit(AllFields);
> +        m_httpHeaderFields.add(headerNameStr, value);
> +    }

The call to findHTTPHeaderName() is unnecessary, since the string version of HTTPHeaderMap::add() already does this for you. I'd just leave this function unchanged.

> Source/WebCore/platform/network/ResourceResponseBase.h:93
> -    template<size_t length> void addHTTPHeaderField(const char (&)[length], const String&) = delete;
> +    template<size_t length> void addHTTPHeaderField(HTTPHeaderName, const String&);

Again, I don't understand this change. Why are you removing the deleted declaration of addHTTPHeaderField() that takes a string literal? Why are you declaring a function template that you never implement?

> Source/WebKit2/UIProcess/InspectorServer/HTTPRequest.cpp:95
> -        m_headerFields.add(name, value);
> +        m_headerFields.add(name.toStringWithoutCopying(), value);

As I mentioned above in my comment regarding parseHTTPHeadeR(), "name" will be left with a dangling pointer after parseHTTPHeader() returns. If that weren't the case, then this would be ok, as HTTPHeaderMap will ultimately copy the string into its HashMap.
Comment 17 Andy Estes 2016-04-06 17:26:24 PDT
The two UAF issues found here might've been caught at compile time if StringView deleted its constructors that take String&& and StringImpl&&. I'll file a bug about this.
Comment 18 Andy Estes 2016-04-07 11:59:03 PDT
(In reply to comment #17)
> The two UAF issues found here might've been caught at compile time if
> StringView deleted its constructors that take String&& and StringImpl&&.
> I'll file a bug about this.

Strike that. There's at least one valid use case for passing a String(View) rvalue to a StringView constructor, which is passing an rvalue argument to a StringView function parameter. That's safe. I think we can just rely on the CHECK_STRINGVIEW_LIFETIME machinery to catch mistakes like this.
Comment 19 John Wilander 2016-04-11 20:06:48 PDT
Created attachment 276203 [details]
Patch
Comment 20 John Wilander 2016-04-11 20:13:07 PDT
Thank you, Andy! Your review was very helpful here. I had obviously misunderstood how StringView works. The deleted functions were accidental leftovers from when I worked through this – thanks for catching that.

The new patch addresses all your comments except for 

> The call to findHTTPHeaderName() is unnecessary, since the string version of
> HTTPHeaderMap::add() already does this for you. I'd just leave this function
> unchanged.

… regarding this code:

void ResourceResponseBase::addHTTPHeaderField(const String& name, const String& value)
{
    HTTPHeaderName headerName;
    if (findHTTPHeaderName(name, headerName))
        addHTTPHeaderField(headerName, value);
    else {
        lazyInit(AllFields);
        m_httpHeaderFields.add(name, value);
    }
}

The call to findHTTPHeaderName() was there in the exiting version. The reason I changed it was to not have two duplicated functions. updateHeaderParsedState(name) is only called if we find an HTTP header.

Please review.
Comment 21 Build Bot 2016-04-11 21:01:25 PDT
Comment on attachment 276203 [details]
Patch

Attachment 276203 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/1139548

New failing tests:
http/tests/websocket/tests/hybi/handshake-ok-with-http-version-beyond-1_1.html
Comment 22 Build Bot 2016-04-11 21:01:27 PDT
Created attachment 276209 [details]
Archive of layout-test-results from ews103 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 23 Build Bot 2016-04-11 21:06:13 PDT
Comment on attachment 276203 [details]
Patch

Attachment 276203 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/1139575

New failing tests:
http/tests/websocket/tests/hybi/handshake-ok-with-http-version-beyond-1_1.html
Comment 24 Build Bot 2016-04-11 21:06:17 PDT
Created attachment 276211 [details]
Archive of layout-test-results from ews106 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 25 Build Bot 2016-04-11 21:09:41 PDT
Comment on attachment 276203 [details]
Patch

Attachment 276203 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/1139557

New failing tests:
http/tests/websocket/tests/hybi/handshake-ok-with-http-version-beyond-1_1.html
Comment 26 Build Bot 2016-04-11 21:09:45 PDT
Created attachment 276212 [details]
Archive of layout-test-results from ews121 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews121  Port: ios-simulator-wk2  Platform: Mac OS X 10.10.5
Comment 27 Build Bot 2016-04-12 01:27:28 PDT
Comment on attachment 276203 [details]
Patch

Attachment 276203 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/1140553

New failing tests:
http/tests/websocket/tests/hybi/handshake-ok-with-http-version-beyond-1_1.html
Comment 28 Build Bot 2016-04-12 01:27:31 PDT
Created attachment 276224 [details]
Archive of layout-test-results from ews115 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 29 John Wilander 2016-04-12 10:49:40 PDT
Created attachment 276245 [details]
Patch
Comment 30 John Wilander 2016-04-12 11:21:30 PDT
The changed test got a different ordering of console messages on the build bots than on my machine. Fixed a strict ordering in the new patch.
Comment 31 Darin Adler 2016-04-12 17:19:27 PDT
Comment on attachment 276245 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=276245&action=review

I like the direction of this patch.

This does more than use StringView; it changes our algorithm to correctly process multi-digit HTTP version numeric components. Is that something that’s important to support?

review- because the version of parseHTTPHeader here returns a StringView pointing to deallocated memory.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:399
> +static inline bool isDigit(const UChar character)
> +{
> +    return (character >= '1' || character <= '9');
> +}

It’s strange to name it this way and not include '0'. There’s an existing function for this, isASCIIDigit, in ASCIICType.h but it does include '0'. We should use that.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:403
> +static inline bool headerHasValidHTTPVersion(const StringView& httpStatusLine)

We can use the type StringView rather than const StringView& for an argument like this. It’s actually more efficient in release builds!

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:406
> +    if (!httpStatusLine.startsWith(httpVersionStaticPreamble))

Normally we’d want to just call startsWith and pass in a const char*, rather than allocating and destroying a string every time.

If startsWith doesn’t have the correct overload in StringView, we should add it.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:410
> +    size_t preambleLength = httpVersionStaticPreamble.length();

This can be unsigned instead of size_t. StringView lengths are unsigned.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:414
> +    size_t dotPosition = httpStatusLine.find('.', preambleLength);

I suggest using "auto" here instead of size_t. We need size_t here only because of the notFound constant. I think auto is the best way to express the fact that we want the correct type for the return of find.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:420
> +    int majorVersion = majorVersionView.toInt(isValid);

I think we want toIntStrict; we don’t want to tolerate trailing non-numeric junk characters, which toInt tolerates.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:424
> +    size_t endOfMinorVersion;

An index within a StringView is an unsigned, not a size_t, so I think e can use size_t here.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:425
> +    for (endOfMinorVersion = 1; endOfMinorVersion < (httpStatusLine.length() - dotPosition); endOfMinorVersion++) {

I think we should compute "length - dotPosition" and put it in a local; I think it’s not trivial computation.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:426
> +        if (!isDigit(httpStatusLine[dotPosition + endOfMinorVersion]))

Seeing this makes it clear that we need to use isASCIIDigit and not the isDigit function above; we don’t want to reject '0'!

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:429
> +    int minorVersion = (httpStatusLine.substring(dotPosition + 1, dotPosition + endOfMinorVersion)).toInt(isValid);

I think we want toIntStrict; we don’t want to tolerate trailing non-numeric junk characters, which toInt tolerates.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:433
> +    return ((majorVersion >= 1 && minorVersion >= 1) || majorVersion >= 2);

I suggest we remove the outer parentheses here.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:492
> +    StringView httpStatusLine((LChar*)header, space1 - header);

We normally prefer to use C++ casts, not C-style, so this should be reinterpret_cast<LChar*>(header).

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:498
> +    StringView statusCodeString((LChar*)(space1 + 1), space2 - space1 - 1);

Ditto.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:508
> +    statusCode = statusCodeString.toInt(ok);

I think we may want toIntStrict here; do we want to tolerate trailing non-numeric junk characters? Do we have tests to cover that?

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:535
> +            m_failureReason = "Unknown header name " + name.toStringWithoutCopying() + " in HTTP response";

I think this should use makeString instead of the + operator; then we will save allocating name as a string.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:545
> +            m_failureReason = name.toStringWithoutCopying() + " header value should only contain ASCII characters";

I think this should use makeString instead of the + operator; then we will save allocating name as a string.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:562
> +                    m_failureReason = "The Sec-WebSocket-Accept header must not appear more than once in an HTTP response";

Probably should use ASCIILiteral. Not critical.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:568
> +                    m_failureReason = "The Sec-WebSocket-Protocol header must not appear more than once in an HTTP response";

Probably should use ASCIILiteral. Not critical.

> Source/WebCore/platform/network/HTTPParsers.cpp:663
> +    if ((String::fromUTF8(namePtr, nameSize)).isNull()) {

We don’t need to create a string from UTF-8 just to check if it’s valid UTF-8. There are lower level functions in WTF that we can use to check that.

> Source/WebCore/platform/network/HTTPParsers.cpp:667
> +    nameStr = StringView((LChar*)namePtr, nameSize);

This isn’t safe. We are creating a StringView pointing into the local name vector, but then we will return and the name vector will be destroyed and the StringView will be pointing to deallocated memory.

> Source/WebCore/platform/network/ResourceResponseBase.cpp:354
> -
> +    

Shouldn’t add this leading whitespace.

> Source/WebKit2/UIProcess/InspectorServer/HTTPRequest.cpp:95
> +        m_headerFields.add(name.toStringWithoutCopying(), value);

I think this needs to use toString(). It’s not safe to use toStringWithoutCopying() here.
Comment 32 John Wilander 2016-04-13 17:53:24 PDT
(In reply to comment #31)
> Comment on attachment 276245 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=276245&action=review
> 
> I like the direction of this patch.
> 
> This does more than use StringView; it changes our algorithm to correctly
> process multi-digit HTTP version numeric components. Is that something
> that’s important to support?

Andy commented earlier that if we restrict WebSockets to version >=1.1 we should allow for multi-digit version numbers and correctly interpret major/minor version.

> review- because the version of parseHTTPHeader here returns a StringView
> pointing to deallocated memory.
> 
> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:399
> > +static inline bool isDigit(const UChar character)
> > +{
> > +    return (character >= '1' || character <= '9');
> > +}
> 
> It’s strange to name it this way and not include '0'. There’s an existing
> function for this, isASCIIDigit, in ASCIICType.h but it does include '0'. We
> should use that.

Thanks for catching this one! Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:403
> > +static inline bool headerHasValidHTTPVersion(const StringView& httpStatusLine)
> 
> We can use the type StringView rather than const StringView& for an argument
> like this. It’s actually more efficient in release builds!

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:406
> > +    if (!httpStatusLine.startsWith(httpVersionStaticPreamble))
> 
> Normally we’d want to just call startsWith and pass in a const char*, rather
> than allocating and destroying a string every time.
> 
> If startsWith doesn’t have the correct overload in StringView, we should add
> it.

Fixed with
static NeverDestroyed<String> httpVersionStaticPreamble(ASCIILiteral("HTTP/"));
… after a discussion with David Kilzer.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:410
> > +    size_t preambleLength = httpVersionStaticPreamble.length();
> 
> This can be unsigned instead of size_t. StringView lengths are unsigned.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:414
> > +    size_t dotPosition = httpStatusLine.find('.', preambleLength);
> 
> I suggest using "auto" here instead of size_t. We need size_t here only
> because of the notFound constant. I think auto is the best way to express
> the fact that we want the correct type for the return of find.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:420
> > +    int majorVersion = majorVersionView.toInt(isValid);
> 
> I think we want toIntStrict; we don’t want to tolerate trailing non-numeric
> junk characters, which toInt tolerates.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:424
> > +    size_t endOfMinorVersion;
> 
> An index within a StringView is an unsigned, not a size_t, so I think e can
> use size_t here.

Fixed. (Interpreted as a request to use unsigned, not size_t.)

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:425
> > +    for (endOfMinorVersion = 1; endOfMinorVersion < (httpStatusLine.length() - dotPosition); endOfMinorVersion++) {
> 
> I think we should compute "length - dotPosition" and put it in a local; I
> think it’s not trivial computation.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:426
> > +        if (!isDigit(httpStatusLine[dotPosition + endOfMinorVersion]))
> 
> Seeing this makes it clear that we need to use isASCIIDigit and not the
> isDigit function above; we don’t want to reject '0'!

As mentioned above, fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:429
> > +    int minorVersion = (httpStatusLine.substring(dotPosition + 1, dotPosition + endOfMinorVersion)).toInt(isValid);
> 
> I think we want toIntStrict; we don’t want to tolerate trailing non-numeric
> junk characters, which toInt tolerates.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:433
> > +    return ((majorVersion >= 1 && minorVersion >= 1) || majorVersion >= 2);
> 
> I suggest we remove the outer parentheses here.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:492
> > +    StringView httpStatusLine((LChar*)header, space1 - header);
> 
> We normally prefer to use C++ casts, not C-style, so this should be
> reinterpret_cast<LChar*>(header).

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:498
> > +    StringView statusCodeString((LChar*)(space1 + 1), space2 - space1 - 1);
> 
> Ditto.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:508
> > +    statusCode = statusCodeString.toInt(ok);
> 
> I think we may want toIntStrict here; do we want to tolerate trailing
> non-numeric junk characters? Do we have tests to cover that?

Fixed. Added another test to cover trailing junk.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:535
> > +            m_failureReason = "Unknown header name " + name.toStringWithoutCopying() + " in HTTP response";
> 
> I think this should use makeString instead of the + operator; then we will
> save allocating name as a string.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:545
> > +            m_failureReason = name.toStringWithoutCopying() + " header value should only contain ASCII characters";
> 
> I think this should use makeString instead of the + operator; then we will
> save allocating name as a string.

Fixed. I went ahead and fixed it in other places too in the files I changed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:562
> > +                    m_failureReason = "The Sec-WebSocket-Accept header must not appear more than once in an HTTP response";
> 
> Probably should use ASCIILiteral. Not critical.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:568
> > +                    m_failureReason = "The Sec-WebSocket-Protocol header must not appear more than once in an HTTP response";
> 
> Probably should use ASCIILiteral. Not critical.

Fixed. I went ahead and fixed it in other places too in the files I changed.

> > Source/WebCore/platform/network/HTTPParsers.cpp:663
> > +    if ((String::fromUTF8(namePtr, nameSize)).isNull()) {
> 
> We don’t need to create a string from UTF-8 just to check if it’s valid
> UTF-8. There are lower level functions in WTF that we can use to check that.

I searched in WTF and the rest of WebKit but couldn't find a stand-alone UTF8 checker. There is one returning an AtomicString but I'm not sure that would be useful.

Am I missing something? If not, should I file a bug?

> > Source/WebCore/platform/network/HTTPParsers.cpp:667
> > +    nameStr = StringView((LChar*)namePtr, nameSize);
> 
> This isn’t safe. We are creating a StringView pointing into the local name
> vector, but then we will return and the name vector will be destroyed and
> the StringView will be pointing to deallocated memory.

Fixed by having namePtr point into the incoming char buffer instead.

> > Source/WebCore/platform/network/ResourceResponseBase.cpp:354
> > -
> > +    
> 
> Shouldn’t add this leading whitespace.

Fixed.

> > Source/WebKit2/UIProcess/InspectorServer/HTTPRequest.cpp:95
> > +        m_headerFields.add(name.toStringWithoutCopying(), value);
> 
> I think this needs to use toString(). It’s not safe to use
> toStringWithoutCopying() here.

As per Andy's earlier comment:
> HTTPHeaderMap will ultimately copy the string into its HashMap.
… I believe this is in fact safe.

Thanks for the review, Darin!

New patch coming up.
Comment 33 John Wilander 2016-04-13 17:57:21 PDT
Created attachment 276368 [details]
Patch
Comment 34 Darin Adler 2016-04-13 19:42:09 PDT
Comment on attachment 276368 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=276368&action=review

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:400
> +static inline bool isDigit(const UChar character)
> +{
> +    return (character >= '0' || character <= '9');
> +}

Please don’t add this function. Please use isASCIIDigit instead.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:407
> +    static NeverDestroyed<String> httpVersionStaticPreamble(ASCIILiteral("HTTP/"));
> +    const StringView httpVersionStaticPreambleView(httpVersionStaticPreamble);

I don’t agree that this is what we want to do. What we really want to do is add a StringView::startsWith overload for const char* or ASCIILiteral that works for ASCII literals. But it’s OK to not do that in this patch.

Even if we do want to use a StringView, we don’t need to create a String; slow and allocates memory. We can just do this:

    const char* httpVersionStaticPreambleLiteral = "HTTP/";
    StringView httpVersionStaticPreamble(reinterpret_cast<const LChar*>(httpVersionStaticPreambleLiteral), strlen(httpVersionStaticPreambleLiteral));

Ugly but much more efficient than allocating a StringImpl on the heap and reference counting it.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:491
> -        m_failureReason = "No response code found: " + trimInputSample(header, lineLength - 2);
> +        m_failureReason = makeString("No response code found: ", trimInputSample(header, lineLength - 2));

This change is OK, but not needed. Generates exactly the same code.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:538
> +            m_failureReason = makeString("Unknown header name ", name.toStringWithoutCopying(), " in HTTP response");

Should just say name here, not name.toStringWithoutCopying(). It's unnecessarily inefficient to create a String when makeString can just handle the StringView.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:548
> +            m_failureReason = makeString(name.toStringWithoutCopying(), " header value should only contain ASCII characters");

Ditto.

> Source/WebCore/platform/network/HTTPParsers.cpp:669
> +    if ((String::fromUTF8(namePtr, nameSize)).isNull()) {

One function that checks UTF-8 validity without allocating any memory is calculateStringHashAndLengthFromUTF8MaskingTop8Bits from the UTF8.h header. It will return 0 for invalid sequences. We could also add a new function that only does the validity check. But I think the entire concept here is flawed, because we need to convert the UTF-8 to UTF-16 if we want to create a String with the characters in it. An 8-bit String or StringView is Latin-1, not UTF-8.

> Source/WebCore/platform/network/HTTPParsers.cpp:673
> +    nameStr = StringView(reinterpret_cast<const LChar*>(namePtr), nameSize);

This will incorrectly interpret UTF-8 sequences as if they are ISO Latin-1 sequences and give us the wrong result. Do we have test cases that cover non-ASCII UTF-8 header names? If so, we should see the problem.

> Source/WebCore/platform/network/HTTPParsers.cpp:683
> -                failureReason = "Unexpected LF in value at " + trimInputSample(value.data(), value.size());
> +                failureReason = makeString("Unexpected LF in value at ", trimInputSample(value.data(), value.size()));

This change is OK, but not needed. Generates exactly the same code.

> Source/WebCore/platform/network/HTTPParsers.cpp:696
> -        failureReason = "CR doesn't follow LF after value at " + trimInputSample(p, end - p);
> +        failureReason = makeString("CR doesn't follow LF after value at ", trimInputSample(p, end - p));

This change is OK, but not needed. Generates exactly the same code.

> Source/WebKit2/UIProcess/InspectorServer/HTTPRequest.cpp:95
> +        m_headerFields.add(name.toStringWithoutCopying(), value);

Sorry, Andy was incorrect. This is not safe. When the String is copied into the UncommonHeadersHashMap, it will simply bump the reference count on the StringImpl. The StringImpl points to memory that will be deallocated later.
Comment 35 John Wilander 2016-04-14 15:06:17 PDT
(In reply to comment #34)
> Comment on attachment 276368 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=276368&action=review
> 
> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:400
> > +static inline bool isDigit(const UChar character)
> > +{
> > +    return (character >= '0' || character <= '9');
> > +}
> 
> Please don’t add this function. Please use isASCIIDigit instead.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:407
> > +    static NeverDestroyed<String> httpVersionStaticPreamble(ASCIILiteral("HTTP/"));
> > +    const StringView httpVersionStaticPreambleView(httpVersionStaticPreamble);
> 
> I don’t agree that this is what we want to do. What we really want to do is
> add a StringView::startsWith overload for const char* or ASCIILiteral that
> works for ASCII literals. But it’s OK to not do that in this patch.
> 
> Even if we do want to use a StringView, we don’t need to create a String;
> slow and allocates memory. We can just do this:
> 
>     const char* httpVersionStaticPreambleLiteral = "HTTP/";
>     StringView httpVersionStaticPreamble(reinterpret_cast<const
> LChar*>(httpVersionStaticPreambleLiteral),
> strlen(httpVersionStaticPreambleLiteral));
> 
> Ugly but much more efficient than allocating a StringImpl on the heap and
> reference counting it.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:491
> > -        m_failureReason = "No response code found: " + trimInputSample(header, lineLength - 2);
> > +        m_failureReason = makeString("No response code found: ", trimInputSample(header, lineLength - 2));
> 
> This change is OK, but not needed. Generates exactly the same code.
> 
> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:538
> > +            m_failureReason = makeString("Unknown header name ", name.toStringWithoutCopying(), " in HTTP response");
> 
> Should just say name here, not name.toStringWithoutCopying(). It's
> unnecessarily inefficient to create a String when makeString can just handle
> the StringView.

Fixed.

> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:548
> > +            m_failureReason = makeString(name.toStringWithoutCopying(), " header value should only contain ASCII characters");
> 
> Ditto.

Fixed.

> > Source/WebCore/platform/network/HTTPParsers.cpp:669
> > +    if ((String::fromUTF8(namePtr, nameSize)).isNull()) {
> 
> One function that checks UTF-8 validity without allocating any memory is
> calculateStringHashAndLengthFromUTF8MaskingTop8Bits from the UTF8.h header.
> It will return 0 for invalid sequences. We could also add a new function
> that only does the validity check. But I think the entire concept here is
> flawed, because we need to convert the UTF-8 to UTF-16 if we want to create
> a String with the characters in it. An 8-bit String or StringView is
> Latin-1, not UTF-8.
> 
> > Source/WebCore/platform/network/HTTPParsers.cpp:673
> > +    nameStr = StringView(reinterpret_cast<const LChar*>(namePtr), nameSize);
> 
> This will incorrectly interpret UTF-8 sequences as if they are ISO Latin-1
> sequences and give us the wrong result. Do we have test cases that cover
> non-ASCII UTF-8 header names? If so, we should see the problem.

Non-ASCII characters are not allowed in header names as far as I know. This part of the code is now changed to restrict HTTP header names according to RFC 7230. I'll ask Alexey to have a look to make sure the previous check for invalid UTF8 sequences didn't cover some specific case that now will break.

> > Source/WebCore/platform/network/HTTPParsers.cpp:683
> > -                failureReason = "Unexpected LF in value at " + trimInputSample(value.data(), value.size());
> > +                failureReason = makeString("Unexpected LF in value at ", trimInputSample(value.data(), value.size()));
> 
> This change is OK, but not needed. Generates exactly the same code.
> 
> > Source/WebCore/platform/network/HTTPParsers.cpp:696
> > -        failureReason = "CR doesn't follow LF after value at " + trimInputSample(p, end - p);
> > +        failureReason = makeString("CR doesn't follow LF after value at ", trimInputSample(p, end - p));
> 
> This change is OK, but not needed. Generates exactly the same code.
> 
> > Source/WebKit2/UIProcess/InspectorServer/HTTPRequest.cpp:95
> > +        m_headerFields.add(name.toStringWithoutCopying(), value);
> 
> Sorry, Andy was incorrect. This is not safe. When the String is copied into
> the UncommonHeadersHashMap, it will simply bump the reference count on the
> StringImpl. The StringImpl points to memory that will be deallocated later.

Fixed.

Thanks for all the comments and recommendations!

New patch coming up.
Comment 36 John Wilander 2016-04-14 15:33:51 PDT
Created attachment 276442 [details]
Patch
Comment 37 Alexey Proskuryakov 2016-04-14 17:30:59 PDT
I don't know why the old code parsed header names as UTF-8.

I haven't seen non-ASCII header field names. Non-ASCII header field values do occur in practice and are important (at least for HTTP, maybe not for WebSockets), even though the spec requires ASCII.

Note that a lot of code uses isValidHTTPToken() for header name validation (see Fetch.cpp, XMLHttpRequest.cpp). This should probably be updated to use isValidHeaderNameCharacter().
Comment 38 Darin Adler 2016-04-15 08:47:10 PDT
Comment on attachment 276442 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=276442&action=review

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:308
> -        m_failureReason = "Unexpected response code: " + String::number(statusCode);
> +        m_failureReason = makeString("Unexpected response code: ", String::number(statusCode));

Longer term someone who wants to work on our string code needs to come up with an efficient idiom for this. I fixed this for StringBuilder a while back by adding an appendNumber function, and we should do something similar for makeString so the conversion to a number can be done without allocating and releasing a temporary StringImpl on the heap. Not critical, but something that’s on my mind when I see this.

Also, this change has no effect; doesn’t have any effect on the code generated. An advantage of the makeString style is that it’s possibly one step closer to the efficient implementation we have been thinking about.

> Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:398
> +static inline bool headerHasValidHTTPVersion(StringView httpStatusLine)

Worth considering the fact that this function now allows any number of leading zeroes. So it will accept HTTP/01.01. I am not certain that this actually matches the rules in the RFC. I won’t review- the patch because of it, but I am slightly concerned about it and would like to see tests covering this edge case.

> Source/WebCore/platform/network/HTTPParsers.cpp:623
> +static inline bool isValidHeaderNameCharacter(const char* character)

I think this function should take a char, not a const char*.
Comment 39 Darin Adler 2016-04-15 08:50:32 PDT
(In reply to comment #37)
> I haven't seen non-ASCII header field names. Non-ASCII header field values
> do occur in practice and are important (at least for HTTP, maybe not for
> WebSockets), even though the spec requires ASCII.

Given that the specification calls for ASCII, I think it’s a bit strange that we only accept non-ASCII header field values if they are valid UTF-8 and that we convert them to and from UTF-16. A more traditional way to deal with this would be to accept and preserve bytes in the 0x80-0xFF range without trying to interpret them at all. Also, it’s in everyone’s interest to get the specification to match what’s done in practice, too, since it helps ensure interoperability. It seems like something we should pursue in the standards world.

> Note that a lot of code uses isValidHTTPToken() for header name validation
> (see Fetch.cpp, XMLHttpRequest.cpp). This should probably be updated to use
> isValidHeaderNameCharacter().

Excellent point! And we should add test cases.
Comment 40 WebKit Commit Bot 2016-04-15 09:38:50 PDT
Comment on attachment 276442 [details]
Patch

Clearing flags on attachment: 276442

Committed r199590: <http://trac.webkit.org/changeset/199590>
Comment 41 WebKit Commit Bot 2016-04-15 09:38:57 PDT
All reviewed patches have been landed.  Closing bug.
Comment 42 John Wilander 2016-04-15 10:19:42 PDT
(In reply to comment #38)
> Comment on attachment 276442 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=276442&action=review
> 
> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:308
> > -        m_failureReason = "Unexpected response code: " + String::number(statusCode);
> > +        m_failureReason = makeString("Unexpected response code: ", String::number(statusCode));
> 
> Longer term someone who wants to work on our string code needs to come up
> with an efficient idiom for this. I fixed this for StringBuilder a while
> back by adding an appendNumber function, and we should do something similar
> for makeString so the conversion to a number can be done without allocating
> and releasing a temporary StringImpl on the heap. Not critical, but
> something that’s on my mind when I see this.
> 
> Also, this change has no effect; doesn’t have any effect on the code
> generated. An advantage of the makeString style is that it’s possibly one
> step closer to the efficient implementation we have been thinking about.
> 
> > Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:398
> > +static inline bool headerHasValidHTTPVersion(StringView httpStatusLine)
> 
> Worth considering the fact that this function now allows any number of
> leading zeroes. So it will accept HTTP/01.01.

Another one of Andy's earlier comments:

> Also, I think this logic is technically wrong. https://tools.ietf.org/html/rfc2616#section-3.1 says the following:
> 
>    Note that the major and minor numbers MUST be treated as separate
>    integers and that each MAY be incremented higher than a single digit.
>    Thus, HTTP/2.4 is a lower version than HTTP/2.13, which in turn is
>    lower than HTTP/12.3. Leading zeros MUST be ignored by recipients and
>    MUST NOT be sent.

Agents should not fail on leading zeroes and I believe we now "ignore" them. Another way would be to treat them as whitespace.

> I am not certain that this
> actually matches the rules in the RFC. I won’t review- the patch because of
> it, but I am slightly concerned about it and would like to see tests
> covering this edge case.

We do in fact have tests. I added them when I changed this code.

Expected to be accepted:
HTTP/1.11
HTTP/11.0
HTTP/001.01

Expected to be blocked:
HTTP/01.00
HTTP/000.99
HTTP/0.00
HTTP/-11.9
HTTP/0x1.0x00
HTTP/.1
HTTP/
HTTP/1.1

> > Source/WebCore/platform/network/HTTPParsers.cpp:623
> > +static inline bool isValidHeaderNameCharacter(const char* character)
> 
> I think this function should take a char, not a const char*.
Comment 43 Csaba Osztrogonác 2016-04-15 13:14:30 PDT
(In reply to comment #40)
> Comment on attachment 276442 [details]
> Patch
> 
> Clearing flags on attachment: 276442
> 
> Committed r199590: <http://trac.webkit.org/changeset/199590>

It broke the WinCairo build, cc-ing port maintainers.
Comment 44 Darin Adler 2016-04-15 14:45:57 PDT
Alex fixed it in <http://trac.webkit.org/changeset/199609>