XHRs currently don't do content sniffing, whereas Chrome and Firefox seem to. This is a problem when Content-Encoding isn't set yet the content is gzip. This occurs on webpages such as https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html A smaller repro is to take the test from #175579 and change the URL to https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz and try it out on each browser. Chrome and Firefox don't see any gzip header, but definitely unzip the content. A misleading thing is that both curl and nscurl get the Content-Endoding header from AWS!
FWIW I tried the "obvious" fix in XMLHttpRequest.cpp: options.sniffContent = SniffContent; That doesn't seem to do the trick, or I did it wrong. Further, we'd need to check that what other browsers do is actually what the XHR spec imports indirectly from the MIME Sniffing spec.
<rdar://problem/33822249>
I think that the first question to answer is why this resource is served with different response headers to Safari than to curl. You can try adding request headers from Safari to curl until it matches. Checking with telnet, a very minimal request gets the header too. Without actually looking at it, my guess is that CFNetwork decodes the response for us, and removes the header before WebKit gets to see it.
WebKit is receiving an HTTP response that is comparable with the response seen using curl. Setting a breakpoint in DocumentThreadableLoader::didReceiveResponse(), <https://trac.webkit.org/browser/trunk/Source/WebCore/loader/DocumentThreadableLoader.cpp?rev=220750#L299> (where we process the XHR response) I see: (lldb) po response.m_nsResponse.m_ptr <NSHTTPURLResponse: 0x6180000222c0> { URL: http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz } { status code: 200, headers { "Accept-Ranges" = bytes; "Content-Encoding" = gzip; "Content-Length" = 90198148; "Content-Type" = "application/octet-stream"; Date = "Tue, 15 Aug 2017 22:44:19 GMT"; Etag = "\"abd8ecef2479f13cc29e248e294b5330-11\""; "Last-Modified" = "Thu, 08 Jun 2017 16:29:01 GMT"; Server = AmazonS3; "x-amz-id-2" = "DoBSu33Sovmsp3CEZKWhkhnQZXGWyq40Yb2h0UjCvUdtVIecMM0iFyCQIlUyA3pZUtdyCIlVzxs="; "x-amz-request-id" = 0A63140CDBC51938; } }
(In reply to Daniel Bates from comment #4) > WebKit is receiving an HTTP response that is comparable with the response > seen using curl. Setting a breakpoint in > DocumentThreadableLoader::didReceiveResponse(), > <https://trac.webkit.org/browser/trunk/Source/WebCore/loader/ > DocumentThreadableLoader.cpp?rev=220750#L299> (where we process the XHR > response) I see: > > (lldb) po response.m_nsResponse.m_ptr > <NSHTTPURLResponse: 0x6180000222c0> { URL: > http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz } { > status code: 200, headers { > "Accept-Ranges" = bytes; > "Content-Encoding" = gzip; > "Content-Length" = 90198148; > "Content-Type" = "application/octet-stream"; > Date = "Tue, 15 Aug 2017 22:44:19 GMT"; > Etag = "\"abd8ecef2479f13cc29e248e294b5330-11\""; > "Last-Modified" = "Thu, 08 Jun 2017 16:29:01 GMT"; > Server = AmazonS3; > "x-amz-id-2" = > "DoBSu33Sovmsp3CEZKWhkhnQZXGWyq40Yb2h0UjCvUdtVIecMM0iFyCQIlUyA3pZUtdyCIlVzxs= > "; > "x-amz-request-id" = 0A63140CDBC51938; > } } This response depends on the networking layer to compensate. If the intention of the web developer was to serve pre-compressed data for the browser to auto decompress then the server is misconfigured by definition of the HTTP header Content-Type in RFC 7231 and Chrome and Firefox are compensating for this misconfiguration. The Content-Type header is defined as: The "Content-Type" header field indicates the media type of the associated representation...The indicated media type defines .... how that data is intended to be processed by a recipient ... after any content codings indicated by Content-Encoding are decoded. The key phrase to look for is "after any content codings indicated by Content-Encoding are decoded". CFNetwork treats a response with "Content-Type: application/octet-stream" and "Content-Encoding: gzip" as a response with "Content-Type: application/gzip" and "Content-Encoding: identity". To achieve auto decompressing of the pre-compressed data, the web server should have returned a response with a Content-Type that is handled by WebKit, say text/plain, such that response.m_nsResponse.m_ptr would look like: (lldb) po response.m_nsResponse.m_ptr <NSHTTPURLResponse: 0x6180000222c0> { URL: http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz } { status code: 200, headers { "Accept-Ranges" = bytes; "Content-Encoding" = gzip; "Content-Length" = 90198148; "Content-Type" = "text/plain"; Date = "Tue, 15 Aug 2017 22:44:19 GMT"; Etag = "\"abd8ecef2479f13cc29e248e294b5330-11\""; "Last-Modified" = "Thu, 08 Jun 2017 16:29:01 GMT"; Server = AmazonS3; "x-amz-id-2" = "DoBSu33Sovmsp3CEZKWhkhnQZXGWyq40Yb2h0UjCvUdtVIecMM0iFyCQIlUyA3pZUtdyCIlVzxs="; "x-amz-request-id" = 0A63140CDBC51938; } }
(In reply to JF Bastien from comment #1) > FWIW I tried the "obvious" fix in XMLHttpRequest.cpp: > options.sniffContent = SniffContent; > That doesn't seem to do the trick, or I did it wrong. : ( I would have expected that enabling sniffing would have made CFNetwork override the Content-Type in the response with text/plain and hence auto-decompress it.
(In reply to Daniel Bates from comment #5) > (In reply to Daniel Bates from comment #4) [... snip ...] So you're saying we're doing the right thing, and the server is just misconfigured?
(In reply to Daniel Bates from comment #6) > (In reply to JF Bastien from comment #1) > > FWIW I tried the "obvious" fix in XMLHttpRequest.cpp: > > options.sniffContent = SniffContent; > > That doesn't seem to do the trick, or I did it wrong. > > : ( I would have expected that enabling sniffing would have made CFNetwork > override the Content-Type in the response with text/plain and hence > auto-decompress it. Disregard this remark.
(In reply to JF Bastien from comment #7) > (In reply to Daniel Bates from comment #5) > > (In reply to Daniel Bates from comment #4) > [... snip ...] > > So you're saying we're doing the right thing, and the server is just > misconfigured? Yes, we are doing the "right thing". We should check how Microsoft Edge and IE handle a response with "Content-Encoding: gzip" and "Content-Type: application/octet-stream". Do they treat it as "Content-Encoding: gzip" and "Content-Type: text/plain"? If WebKit is the only browser engine that does not auto decompress <http://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.data.gz> then we may want to consider adding a quirk either in WebKit or CFNetwork.
After reading over the definition of Content-Type and Content-Encoding in RFC 7231 again, I no longer think this bug represents a server misconfiguration. We should look to understand why CF Network explicitly chose to treat "Content-Type: application/octet-stream" and "Content-Encoding: gzip" as "Content-Type: application/gzip" and "Content-Encoding: identity", respectively.
(In reply to JF Bastien from comment #2) > <rdar://problem/33822249> This tracks the CFNetwork support infrastructure needed to fix this bug.
The following radar tracks the WebKit work associated with this bug: <rdar://problem/34912624>
Created attachment 325232 [details] Patch and layout tests
Created attachment 325233 [details] Patch and layout tests
Created attachment 325234 [details] Patch
Comment on attachment 325234 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325234&action=review A small suggestion and bugs look sad: /home/ews/ltilve-gtk-wk2-ews/WebKit/Source/WebKit/NetworkProcess/NetworkDataTask.cpp:59:206: error: no matching function for call to 'WebKit::NetworkDataTaskSoup::create(WebKit::NetworkSession&, WebKit::NetworkDataTaskClient&, const WebCore::ResourceRequest&, const WebCore::StoredCredentialsPolicy&, const WebCore::ContentEncodingSniffingPolicy&, const bool&)' Otherwise r=me > Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211 > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302 This doesn't also affect iOS? > Source/WebCore/loader/ResourceLoader.cpp:219 > + m_handle = ResourceHandle::create(frameLoader()->networkingContext(), m_request, this, m_defersLoading, m_options.sniffContent == SniffContent, m_options.sniffContentEncoding == ContentEncodingSniffingPolicy::Sniff); I tend to like enum classes rather than booleans for this type of thing: passing in multiple booleans is error prone and harder to read, whereas having an enum class as parameter can't be messed up. So I'd just pass the ContentEncodingSnifingPolicy strait into create(). Is that not practical? > Source/WebCore/loader/appcache/ApplicationCacheGroup.cpp:480 > + bool shouldContentEncodingSniff = true; Even here, it's way more fool-proof to have an enum class.
Created attachment 325243 [details] Patch
Attachment 325243 [details] did not pass style-queue: ERROR: Source/WebCore/platform/network/BlobResourceHandle.cpp:137: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 35 files If any of these errors are false positives, please file a bug against check-webkit-style.
(In reply to JF Bastien from comment #16) > > Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211 > > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302 > > This doesn't also affect iOS? > No, according to <rdar://problem/33822249> this SPI does not affect iOS because this bug represents a macOS quirk. iOS does not sniff content encoding based on the URL. > > Source/WebCore/loader/ResourceLoader.cpp:219 > > + m_handle = ResourceHandle::create(frameLoader()->networkingContext(), m_request, this, m_defersLoading, m_options.sniffContent == SniffContent, m_options.sniffContentEncoding == ContentEncodingSniffingPolicy::Sniff); > > I tend to like enum classes rather than booleans for this type of thing: > passing in multiple booleans is error prone and harder to read, whereas > having an enum class as parameter can't be messed up. So I'd just pass the > ContentEncodingSnifingPolicy strait into create(). Is that not practical? > We should clean this up as passing consecutive booleans is error prone. I hope you do not mind that defer this to a follow up bug so as to keep this patch focused on the bug fix (*) due to its time sensitive nature. Filed bug #178979. (*) I thought to limit style changes to either lines I needed to touch or lines immediately above or below lines I needed to touch. Although I could make the change you mentioned now we may also want to consider whether passing a struct of options (similar to NetworkLoadParameters) to ResourceHandle::create()/ResourceHandle::ResourceHandle() could simplify this code and callers. > > Source/WebCore/loader/appcache/ApplicationCacheGroup.cpp:480 > > + bool shouldContentEncodingSniff = true; > > Even here, it's way more fool-proof to have an enum class. See above remark.
Created attachment 325246 [details] Patch
Attachment 325246 [details] did not pass style-queue: ERROR: Source/WebCore/platform/network/BlobResourceHandle.cpp:137: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 35 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 325247 [details] Patch
Attachment 325247 [details] did not pass style-queue: ERROR: Source/WebCore/platform/network/BlobResourceHandle.cpp:137: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 35 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 325249 [details] Patch
Attachment 325249 [details] did not pass style-queue: ERROR: Source/WebCore/platform/network/BlobResourceHandle.cpp:137: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 35 files If any of these errors are false positives, please file a bug against check-webkit-style.
(In reply to Daniel Bates from comment #19) > (In reply to JF Bastien from comment #16) > > > Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211 > > > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302 > > > > This doesn't also affect iOS? > > > > No, according to <rdar://problem/33822249> this SPI does not affect iOS > because this bug represents a macOS quirk. iOS does not sniff content > encoding based on the URL. Ok good! > > > Source/WebCore/loader/ResourceLoader.cpp:219 > > > + m_handle = ResourceHandle::create(frameLoader()->networkingContext(), m_request, this, m_defersLoading, m_options.sniffContent == SniffContent, m_options.sniffContentEncoding == ContentEncodingSniffingPolicy::Sniff); > > > > I tend to like enum classes rather than booleans for this type of thing: > > passing in multiple booleans is error prone and harder to read, whereas > > having an enum class as parameter can't be messed up. So I'd just pass the > > ContentEncodingSnifingPolicy strait into create(). Is that not practical? > > > > We should clean this up as passing consecutive booleans is error prone. I > hope you do not mind that defer this to a follow up bug so as to keep this > patch focused on the bug fix (*) due to its time sensitive nature. Filed bug > #178979. No problem at all. > (*) I thought to limit style changes to either lines I needed to touch or > lines immediately above or below lines I needed to touch. Although I could > make the change you mentioned now we may also want to consider whether > passing a struct of options (similar to NetworkLoadParameters) to > ResourceHandle::create()/ResourceHandle::ResourceHandle() could simplify > this code and callers. > > > > Source/WebCore/loader/appcache/ApplicationCacheGroup.cpp:480 > > > + bool shouldContentEncodingSniff = true; > > > > Even here, it's way more fool-proof to have an enum class. > > See above remark.
I think you meant for your Changelogs reference radar 34912624, not 33822249
Comment on attachment 325249 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325249&action=review > Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm:108 > + auto mutableRequest = adoptNS([nsRequest mutableCopy]); Before, the common case would make 0 copies of the NSURLRequest. Now the common case makes 1. Let's not increase that.
(In reply to Brady Eidson from comment #27) > I think you meant for your Changelogs reference radar 34912624, not 33822249 Will fix.
(In reply to Alex Christensen from comment #28) > Comment on attachment 325249 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=325249&action=review > > > Source/WebKit/NetworkProcess/cocoa/NetworkDataTaskCocoa.mm:108 > > + auto mutableRequest = adoptNS([nsRequest mutableCopy]); > > Before, the common case would make 0 copies of the NSURLRequest. Now the > common case makes 1. Let's not increase that. Will fix.
Created attachment 325365 [details] Patch Updated patch based on feedback from Brady Eidson and Alex Christensen.
Attachment 325365 [details] did not pass style-queue: ERROR: Source/WebCore/platform/network/BlobResourceHandle.cpp:137: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 35 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 325365 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325365&action=review > Source/WebCore/ChangeLog:13 > + Fixes an issue where the body of an HTTP response associated with an XHR request to a .gz file > + would be automatically gzipped decompressed if the HTTP response omitted a Content-Encoding HTTP > + header. Specifically, such a response would be treated analogous to a response with headers > + "Content-Type: application/gzip" and "Content-Encoding: identity". This behavior does not conform > + the HTTP 1.1 spec. and breaks Epic Zen Garden, <https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html>. Will change this paragraph and the same paragraph in Source/WebKit/ChangeLog to read: Fixes an issue where the body of an HTTP response with headers "Content-Type: application/octet-stream" and "Content-Encoding: gzip" associated with an XHR request to a .gz file would not be automatically gzipped decompressed. Specifically, such a response would be treated analogous to a response with headers "Content-Type: application/gzip" and "Content-Encoding: identity". This behavior does not conform to the behavior of the Content-Encoding header as defined in the HTTP 1.1 and later specs. Moreover this behavior breaks the Epic Zen Garden demo: <https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html>.
Comment on attachment 325365 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325365&action=review > Source/WebCore/platform/network/PingHandle.h:55 > + bool defersLoading = false; It would be nice if these were two-state enums. enum class DefersLoading { No, Yes }; > LayoutTests/platform/mac/TestExpectations:35 > +http/tests/xmlhttprequest/gzip-content-type-no-content-encoding.html [ Skip ] I think this could be marked as pass/fail instead of skipping.
(In reply to Alex Christensen from comment #34) > Comment on attachment 325365 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=325365&action=review > > > Source/WebCore/platform/network/PingHandle.h:55 > > + bool defersLoading = false; > > It would be nice if these were two-state enums. > enum class DefersLoading { No, Yes }; > I plan to clean this up in bug #178979 as mentioned in comment #18. > > LayoutTests/platform/mac/TestExpectations:35 > > +http/tests/xmlhttprequest/gzip-content-type-no-content-encoding.html [ Skip ] > > I think this could be marked as pass/fail instead of skipping. OK.
Committed r224299: <https://trac.webkit.org/changeset/224299>
Fixed change log description in <https://trac.webkit.org/changeset/224315>.
Comment on attachment 325365 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325365&action=review > Source/WebCore/PAL/pal/spi/cf/CFNetworkSPI.h:211 > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302 This check is not correct. The deployment target for 10.13 stays at 101300, so this check is equivalent to "__MAC_OS_X_VERSION_MIN_REQUIRED >= 101400". > Source/WebCore/platform/network/cf/ResourceHandleCFNet.cpp:165 > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101302 Ditto. > Source/WebCore/platform/network/mac/ResourceHandleMac.mm:129 > +#if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED < 101302 Ditto.
> No, according to <rdar://problem/33822249> this SPI does not affect iOS because this bug represents a macOS quirk. iOS does not sniff content encoding based on the URL. If so, why was the test not enabled on iOS? Enabled the test for Mojave+ and for iOS in r237425 and r237426. Let's see if iOS actually passes.
I am the original author of https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html , and have now been made aware of this bug in Safari on Zen Garden the first time. This bug reads fixed, but essentially the same bug is still occurring in the wild in bug https://bugs.webkit.org/show_bug.cgi?id=247421 . Did the patch in this bug ever land?
Yes, the patch did land.