Bug 175597 - XMLHttpRequest should not sniff content encoding
Summary: XMLHttpRequest should not sniff content encoding
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Daniel Bates
URL:
Keywords: InRadar
Depends on: 175579
Blocks: 179155
  Show dependency treegraph
 
Reported: 2017-08-15 13:50 PDT by JF Bastien
Modified: 2018-10-25 13:05 PDT (History)
14 users (show)

See Also:


Attachments
Patch and layout tests (54.74 KB, patch)
2017-10-27 19:25 PDT, Daniel Bates
no flags Details | Formatted Diff | Diff
Patch and layout tests (57.07 KB, patch)
2017-10-27 20:03 PDT, Daniel Bates
no flags Details | Formatted Diff | Diff
Patch (57.70 KB, patch)
2017-10-27 20:04 PDT, Daniel Bates
no flags Details | Formatted Diff | Diff
Patch (61.63 KB, patch)
2017-10-27 22:15 PDT, Daniel Bates
no flags Details | Formatted Diff | Diff
Patch (61.70 KB, patch)
2017-10-27 22:38 PDT, Daniel Bates
no flags Details | Formatted Diff | Diff
Patch (61.61 KB, patch)
2017-10-27 22:42 PDT, Daniel Bates
no flags Details | Formatted Diff | Diff
Patch (61.64 KB, patch)
2017-10-27 22:54 PDT, Daniel Bates
no flags Details | Formatted Diff | Diff
Patch (63.81 KB, patch)
2017-10-30 12:04 PDT, Daniel Bates
achristensen: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description JF Bastien 2017-08-15 13:50:37 PDT
XHRs currently don't do content sniffing, whereas Chrome and Firefox seem to. This is a problem when Content-Encoding isn't set yet the content is gzip. This occurs on webpages such as https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html
A smaller repro is to take the test from #175579 and change the URL to https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz and try it out on each browser. Chrome and Firefox don't see any gzip header, but definitely unzip the content. A misleading thing is that both curl and nscurl get the Content-Endoding header from AWS!
Comment 1 JF Bastien 2017-08-15 14:02:05 PDT
FWIW I tried the "obvious" fix in XMLHttpRequest.cpp:
    options.sniffContent = SniffContent;
That doesn't seem to do the trick, or I did it wrong.

Further, we'd need to check that what other browsers do is actually what the XHR spec imports indirectly from the MIME Sniffing spec.
Comment 2 JF Bastien 2017-08-15 14:05:18 PDT
<rdar://problem/33822249>
Comment 3 Alexey Proskuryakov 2017-08-15 15:34:57 PDT
I think that the first question to answer is why this resource is served with different response headers to Safari than to curl. You can try adding request headers from Safari to curl until it matches.

Checking with telnet, a very minimal request gets the header too.

Without actually looking at it, my guess is that CFNetwork decodes the response for us, and removes the header before WebKit gets to see it.
Comment 4 Daniel Bates 2017-08-15 15:48:49 PDT
WebKit is receiving an HTTP response that is comparable with the response seen using curl. Setting a breakpoint in DocumentThreadableLoader::didReceiveResponse(), <https://trac.webkit.org/browser/trunk/Source/WebCore/loader/DocumentThreadableLoader.cpp?rev=220750#L299> (where we process the XHR response) I see:

(lldb) po response.m_nsResponse.m_ptr
<NSHTTPURLResponse: 0x6180000222c0> { URL: http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz } { status code: 200, headers {
    "Accept-Ranges" = bytes;
    "Content-Encoding" = gzip;
    "Content-Length" = 90198148;
    "Content-Type" = "application/octet-stream";
    Date = "Tue, 15 Aug 2017 22:44:19 GMT";
    Etag = "\"abd8ecef2479f13cc29e248e294b5330-11\"";
    "Last-Modified" = "Thu, 08 Jun 2017 16:29:01 GMT";
    Server = AmazonS3;
    "x-amz-id-2" = "DoBSu33Sovmsp3CEZKWhkhnQZXGWyq40Yb2h0UjCvUdtVIecMM0iFyCQIlUyA3pZUtdyCIlVzxs=";
    "x-amz-request-id" = 0A63140CDBC51938;
} }
Comment 5 Daniel Bates 2017-08-17 14:26:40 PDT
(In reply to Daniel Bates from comment #4)
> WebKit is receiving an HTTP response that is comparable with the response
> seen using curl. Setting a breakpoint in
> DocumentThreadableLoader::didReceiveResponse(),
> <https://trac.webkit.org/browser/trunk/Source/WebCore/loader/
> DocumentThreadableLoader.cpp?rev=220750#L299> (where we process the XHR
> response) I see:
> 
> (lldb) po response.m_nsResponse.m_ptr
> <NSHTTPURLResponse: 0x6180000222c0> { URL:
> http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz } {
> status code: 200, headers {
>     "Accept-Ranges" = bytes;
>     "Content-Encoding" = gzip;
>     "Content-Length" = 90198148;
>     "Content-Type" = "application/octet-stream";
>     Date = "Tue, 15 Aug 2017 22:44:19 GMT";
>     Etag = "\"abd8ecef2479f13cc29e248e294b5330-11\"";
>     "Last-Modified" = "Thu, 08 Jun 2017 16:29:01 GMT";
>     Server = AmazonS3;
>     "x-amz-id-2" =
> "DoBSu33Sovmsp3CEZKWhkhnQZXGWyq40Yb2h0UjCvUdtVIecMM0iFyCQIlUyA3pZUtdyCIlVzxs=
> ";
>     "x-amz-request-id" = 0A63140CDBC51938;
> } }

This response depends on the networking layer to compensate. If the intention of the web developer was to serve pre-compressed data for the browser to auto decompress then the server is misconfigured by definition of the HTTP header Content-Type in RFC 7231 and Chrome and Firefox are compensating for this misconfiguration. The Content-Type header is defined as:

The "Content-Type" header field indicates the media type of the associated representation...The indicated media type defines .... how that data is intended to be processed by a recipient ... after any content codings indicated by Content-Encoding are decoded.

The key phrase to look for is "after any content codings indicated by Content-Encoding are decoded". CFNetwork treats a response with "Content-Type: application/octet-stream" and "Content-Encoding: gzip" as a response with "Content-Type: application/gzip" and "Content-Encoding: identity".

To achieve auto decompressing of the pre-compressed data, the web server should have returned a response with a Content-Type that is handled by WebKit, say text/plain, such that response.m_nsResponse.m_ptr would look like:

(lldb) po response.m_nsResponse.m_ptr
<NSHTTPURLResponse: 0x6180000222c0> { URL: http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz } { status code: 200, headers {
    "Accept-Ranges" = bytes;
    "Content-Encoding" = gzip;
    "Content-Length" = 90198148;
    "Content-Type" = "text/plain";
    Date = "Tue, 15 Aug 2017 22:44:19 GMT";
    Etag = "\"abd8ecef2479f13cc29e248e294b5330-11\"";
    "Last-Modified" = "Thu, 08 Jun 2017 16:29:01 GMT";
    Server = AmazonS3;
    "x-amz-id-2" = "DoBSu33Sovmsp3CEZKWhkhnQZXGWyq40Yb2h0UjCvUdtVIecMM0iFyCQIlUyA3pZUtdyCIlVzxs=";
    "x-amz-request-id" = 0A63140CDBC51938;
} }
Comment 6 Daniel Bates 2017-08-17 14:32:02 PDT
(In reply to JF Bastien from comment #1)
> FWIW I tried the "obvious" fix in XMLHttpRequest.cpp:
>     options.sniffContent = SniffContent;
> That doesn't seem to do the trick, or I did it wrong.

: ( I would have expected that enabling sniffing would have made CFNetwork override the Content-Type in the response with text/plain and hence auto-decompress it.
Comment 7 JF Bastien 2017-08-17 14:34:23 PDT
(In reply to Daniel Bates from comment #5)
> (In reply to Daniel Bates from comment #4)
[... snip ...]

So you're saying we're doing the right thing, and the server is just misconfigured?
Comment 8 Daniel Bates 2017-08-17 14:40:34 PDT
(In reply to Daniel Bates from comment #6)
> (In reply to JF Bastien from comment #1)
> > FWIW I tried the "obvious" fix in XMLHttpRequest.cpp:
> >     options.sniffContent = SniffContent;
> > That doesn't seem to do the trick, or I did it wrong.
> 
> : ( I would have expected that enabling sniffing would have made CFNetwork
> override the Content-Type in the response with text/plain and hence
> auto-decompress it.

Disregard this remark.
Comment 9 Daniel Bates 2017-08-17 14:44:33 PDT
(In reply to JF Bastien from comment #7)
> (In reply to Daniel Bates from comment #5)
> > (In reply to Daniel Bates from comment #4)
> [... snip ...]
> 
> So you're saying we're doing the right thing, and the server is just
> misconfigured?

Yes, we are doing the "right thing". We should check how Microsoft Edge and IE handle a response with "Content-Encoding: gzip" and "Content-Type: application/octet-stream". Do they treat it as "Content-Encoding: gzip" and "Content-Type: text/plain"? If WebKit is the only browser engine that does not auto decompress <http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz> then we may want to consider adding a quirk either in WebKit or CFNetwork.
Comment 10 Daniel Bates 2017-08-17 15:47:17 PDT
After reading over the definition of Content-Type and Content-Encoding in RFC 7231 again, I no longer think this bug represents a server misconfiguration. We should look to understand why CF Network explicitly chose to treat "Content-Type: application/octet-stream" and "Content-Encoding: gzip" as "Content-Type: application/gzip" and "Content-Encoding: identity", respectively.
Comment 11 Daniel Bates 2017-10-27 18:25:02 PDT
(In reply to JF Bastien from comment #2)
> <rdar://problem/33822249>

This tracks the CFNetwork support infrastructure needed to fix this bug.
Comment 12 Daniel Bates 2017-10-27 18:25:35 PDT
The following radar tracks the WebKit work associated with this bug:
<rdar://problem/34912624>
Comment 13 Daniel Bates 2017-10-27 19:25:13 PDT
Created attachment 325232 [details]
Patch and layout tests
Comment 14 Daniel Bates 2017-10-27 20:03:47 PDT
Created attachment 325233 [details]
Patch and layout tests
Comment 15 Daniel Bates 2017-10-27 20:04:44 PDT
Created attachment 325234 [details]
Patch
Comment 16 JF Bastien 2017-10-27 21:23:17 PDT
Comment on attachment 325234 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325234&action=review

A small suggestion and bugs look sad:

/home/ews/ltilve-gtk-wk2-ews/WebKit/Source/WebKit/NetworkProcess/NetworkDataTask.cpp:59:206: error: no matching function for call to 'WebKit::NetworkDataTaskSoup::create(WebKit::NetworkSession&, WebKit::NetworkDataTaskClient&, const WebCore::ResourceRequest&, const WebCore::StoredCredentialsPolicy&, const WebCore::ContentEncodingSniffingPolicy&, const bool&)'

Otherwise r=me

> Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211
> +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302

This doesn't also affect iOS?

> Source/WebCore/loader/ResourceLoader.cpp:219
> +    m_handle = ResourceHandle::create(frameLoader()->networkingContext(), m_request, this, m_defersLoading, m_options.sniffContent == SniffContent, m_options.sniffContentEncoding == ContentEncodingSniffingPolicy::Sniff);

I tend to like enum classes rather than booleans for this type of thing: passing in multiple booleans is error prone and harder to read, whereas having an enum class as parameter can't be messed up. So I'd just pass the ContentEncodingSnifingPolicy strait into create(). Is that not practical?

> Source/WebCore/loader/appcache/ApplicationCacheGroup.cpp:480
> +    bool shouldContentEncodingSniff = true;

Even here, it's way more fool-proof to have an enum class.
Comment 17 Daniel Bates 2017-10-27 22:15:19 PDT
Created attachment 325243 [details]
Patch
Comment 18 Build Bot 2017-10-27 22:17:31 PDT Comment hidden (obsolete)
Comment 19 Daniel Bates 2017-10-27 22:36:01 PDT
(In reply to JF Bastien from comment #16)
> > Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211
> > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302
> 
> This doesn't also affect iOS?
> 

No, according to <rdar://problem/33822249> this SPI does not affect iOS because this bug represents a macOS quirk. iOS does not sniff content encoding based on the URL.

> > Source/WebCore/loader/ResourceLoader.cpp:219
> > +    m_handle = ResourceHandle::create(frameLoader()->networkingContext(), m_request, this, m_defersLoading, m_options.sniffContent == SniffContent, m_options.sniffContentEncoding == ContentEncodingSniffingPolicy::Sniff);
> 
> I tend to like enum classes rather than booleans for this type of thing:
> passing in multiple booleans is error prone and harder to read, whereas
> having an enum class as parameter can't be messed up. So I'd just pass the
> ContentEncodingSnifingPolicy strait into create(). Is that not practical?
> 

We should clean this up as passing consecutive booleans is error prone. I hope you do not mind that defer this to a follow up bug so as to keep this patch focused on the bug fix (*) due to its time sensitive nature. Filed bug #178979.

(*) I thought to limit style changes to either lines I needed to touch or lines immediately above or below lines I needed to touch. Although I could make the change you mentioned now we may also want to consider whether passing a struct of options (similar to NetworkLoadParameters) to ResourceHandle::create()/ResourceHandle::ResourceHandle() could simplify this code and callers. 

> > Source/WebCore/loader/appcache/ApplicationCacheGroup.cpp:480
> > +    bool shouldContentEncodingSniff = true;
> 
> Even here, it's way more fool-proof to have an enum class.

See above remark.
Comment 20 Daniel Bates 2017-10-27 22:38:34 PDT
Created attachment 325246 [details]
Patch
Comment 21 Build Bot 2017-10-27 22:40:40 PDT Comment hidden (obsolete)
Comment 22 Daniel Bates 2017-10-27 22:42:45 PDT
Created attachment 325247 [details]
Patch
Comment 23 Build Bot 2017-10-27 22:45:37 PDT Comment hidden (obsolete)
Comment 24 Daniel Bates 2017-10-27 22:54:43 PDT
Created attachment 325249 [details]
Patch
Comment 25 Build Bot 2017-10-27 22:57:51 PDT Comment hidden (obsolete)
Comment 26 JF Bastien 2017-10-28 07:32:16 PDT
(In reply to Daniel Bates from comment #19)
> (In reply to JF Bastien from comment #16)
> > > Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211
> > > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302
> > 
> > This doesn't also affect iOS?
> > 
> 
> No, according to <rdar://problem/33822249> this SPI does not affect iOS
> because this bug represents a macOS quirk. iOS does not sniff content
> encoding based on the URL.

Ok good!

> > > Source/WebCore/loader/ResourceLoader.cpp:219
> > > +    m_handle = ResourceHandle::create(frameLoader()->networkingContext(), m_request, this, m_defersLoading, m_options.sniffContent == SniffContent, m_options.sniffContentEncoding == ContentEncodingSniffingPolicy::Sniff);
> > 
> > I tend to like enum classes rather than booleans for this type of thing:
> > passing in multiple booleans is error prone and harder to read, whereas
> > having an enum class as parameter can't be messed up. So I'd just pass the
> > ContentEncodingSnifingPolicy strait into create(). Is that not practical?
> > 
> 
> We should clean this up as passing consecutive booleans is error prone. I
> hope you do not mind that defer this to a follow up bug so as to keep this
> patch focused on the bug fix (*) due to its time sensitive nature. Filed bug
> #178979.

No problem at all. 

> (*) I thought to limit style changes to either lines I needed to touch or
> lines immediately above or below lines I needed to touch. Although I could
> make the change you mentioned now we may also want to consider whether
> passing a struct of options (similar to NetworkLoadParameters) to
> ResourceHandle::create()/ResourceHandle::ResourceHandle() could simplify
> this code and callers. 
> 
> > > Source/WebCore/loader/appcache/ApplicationCacheGroup.cpp:480
> > > +    bool shouldContentEncodingSniff = true;
> > 
> > Even here, it's way more fool-proof to have an enum class.
> 
> See above remark.
Comment 27 Brady Eidson 2017-10-30 11:03:15 PDT
I think you meant for your Changelogs reference radar 34912624, not 33822249
Comment 28 Alex Christensen 2017-10-30 11:12:23 PDT
Comment on attachment 325249 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325249&action=review

> Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm:108
> +    auto mutableRequest = adoptNS([nsRequest mutableCopy]);

Before, the common case would make 0 copies of the NSURLRequest.  Now the common case makes 1.  Let's not increase that.
Comment 29 Daniel Bates 2017-10-30 11:40:56 PDT
(In reply to Brady Eidson from comment #27)
> I think you meant for your Changelogs reference radar 34912624, not 33822249

Will fix.
Comment 30 Daniel Bates 2017-10-30 11:41:21 PDT
(In reply to Alex Christensen from comment #28)
> Comment on attachment 325249 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=325249&action=review
> 
> > Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm:108
> > +    auto mutableRequest = adoptNS([nsRequest mutableCopy]);
> 
> Before, the common case would make 0 copies of the NSURLRequest.  Now the
> common case makes 1.  Let's not increase that.

Will fix.
Comment 31 Daniel Bates 2017-10-30 12:04:23 PDT
Created attachment 325365 [details]
Patch

Updated patch based on feedback from Brady Eidson and Alex Christensen.
Comment 32 Build Bot 2017-10-30 12:06:49 PDT
Attachment 325365 [details] did not pass style-queue:


ERROR: Source/WebCore/platform/network/BlobResourceHandle.cpp:137:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Total errors found: 1 in 35 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 33 Daniel Bates 2017-10-30 14:17:46 PDT
Comment on attachment 325365 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325365&action=review

> Source/WebCore/ChangeLog:13
> +        Fixes an issue where the body of an HTTP response associated with an XHR request to a .gz file
> +        would be automatically gzipped decompressed if the HTTP response omitted a Content-Encoding HTTP
> +        header. Specifically, such a response would be treated analogous to a response with headers
> +        "Content-Type: application/gzip" and "Content-Encoding: identity". This behavior does not conform
> +        the HTTP 1.1 spec. and breaks Epic Zen Garden, <https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html>.

Will change this paragraph and the same paragraph in Source/WebKit/ChangeLog to read:

Fixes an issue where the body of an HTTP response with headers "Content-Type: application/octet-stream" and "Content-Encoding: gzip" associated with an XHR request to a .gz file would not be automatically gzipped decompressed. Specifically, such a response would be treated analogous to a response with headers "Content-Type: application/gzip" and "Content-Encoding: identity". This behavior does not conform to the behavior of the Content-Encoding header as defined in the HTTP 1.1 and later specs. Moreover this behavior breaks the Epic Zen Garden demo: <https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html>.
Comment 34 Alex Christensen 2017-11-01 10:58:33 PDT
Comment on attachment 325365 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325365&action=review

> Source/WebCore/platform/network/PingHandle.h:55
> +        bool defersLoading = false;

It would be nice if these were two-state enums.
enum class DefersLoading { No, Yes };

> LayoutTests/platform/mac/TestExpectations:35
> +http/tests/xmlhttprequest/gzip-content-type-no-content-encoding.html [ Skip ]

I think this could be marked as pass/fail instead of skipping.
Comment 35 Daniel Bates 2017-11-01 15:04:07 PDT
(In reply to Alex Christensen from comment #34)
> Comment on attachment 325365 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=325365&action=review
> 
> > Source/WebCore/platform/network/PingHandle.h:55
> > +        bool defersLoading = false;
> 
> It would be nice if these were two-state enums.
> enum class DefersLoading { No, Yes };
> 

I plan to clean this up in bug #178979 as mentioned in comment #18.

> > LayoutTests/platform/mac/TestExpectations:35
> > +http/tests/xmlhttprequest/gzip-content-type-no-content-encoding.html [ Skip ]
> 
> I think this could be marked as pass/fail instead of skipping.

OK.
Comment 36 Daniel Bates 2017-11-01 15:23:53 PDT
Committed r224299: <https://trac.webkit.org/changeset/224299>
Comment 37 Daniel Bates 2017-11-01 21:04:11 PDT
Fixed change log description in <https://trac.webkit.org/changeset/224315>.
Comment 38 Alexey Proskuryakov 2018-10-25 12:10:26 PDT
Comment on attachment 325365 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325365&action=review

> Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211
> +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302

This check is not correct. The deployment target for 10.13 stays at 101300, so this check is equivalent to "__MAC_OS_X_VERSION_MIN_REQUIRED >= 101400".

> Source/WebCore/platform/network/cf/ResourceHandleCFNet.cpp:165
> +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302

Ditto.

> Source/WebCore/platform/network/mac/ResourceHandleMac.mm:129
> +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED < 101302

Ditto.
Comment 39 Alexey Proskuryakov 2018-10-25 13:05:58 PDT
> No, according to <rdar://problem/33822249> this SPI does not affect iOS because this bug represents a macOS quirk. iOS does not sniff content encoding based on the URL.

If so, why was the test not enabled on iOS?

Enabled the test for Mojave+ and for iOS in r237425 and r237426. Let's see if iOS actually passes.