WebKit Bugzilla
RESOLVED FIXED
247421
Content downloaded with fetch() API when Content-Encoding: gzip is set is not decompressed
https://bugs.webkit.org/show_bug.cgi?id=247421
jujjyl
Reported
2022-11-03 06:33:56 PDT
Safari does not decompress gzip-compressed content downloaded with fetch(), even though the web server sets Content-Encoding: gzip on the content. On other browsers (Firefox, Chrome), this works properly.

STR:

1. Download the attached fetch_test.tgz and unzip it.
2. Navigate in a console to the uncompressed directory, and run "python3 emrun.py --no_browser --port 8000 ." This step will launch an ad hoc web server listening at http://localhost:8000/. It is based on the regular ad hoc python3 web server, except that the server has been augmented to specify "Content-Encoding: gzip" on the served test.data.gz file.
3. Navigate to http://localhost:8000/test.html
4. Open the page web console, and observe the logs printed there, and the final test result that appears on the page body.

Observed:

On Safari, the browser will print (among other things)

    [Warning] Downloaded an ArrayBuffer of size 25098 bytes (test.html, line 10)
    [Warning] Fetch Response headers: (test.html, line 11)
    [Log] content-encoding: gzip (test.html, line 13)
    [Log] content-length: 25098 (test.html, line 13)

and the page body will report

    Test ERROR: Received gzip-compressed data, but browser should have decompressed it

Expected:

On Chrome and Firefox, the browser will print

    Downloaded an ArrayBuffer of size 93274 bytes
    Fetch Response headers:
    content-encoding: gzip
    content-length: 25098

and the page body will report

    Test passed OK
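For readers without the attachment, the server behavior described in step 2 can be approximated as below. This is only a sketch and not the actual emrun.py: the names `PAYLOAD`, `build_response`, `PrecompressedHandler` and `make_server` are hypothetical, and the payload is a stand-in chosen to match the 93274-byte uncompressed size quoted above.

```python
import gzip
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the contents of test.data.gz: 93274 bytes before
# compression, matching the uncompressed size quoted in the report.
PAYLOAD = b"A" * 93274
COMPRESSED = gzip.compress(PAYLOAD)


def build_response(path):
    """Return (status, headers, body) for a GET request on `path`."""
    if path == "/test.data.gz":
        headers = {
            # The header combination discussed in this bug.
            "Content-Type": "application/octet-stream",
            "Content-Encoding": "gzip",
            # Content-Length counts the bytes on the wire (compressed).
            "Content-Length": str(len(COMPRESSED)),
        }
        return 200, headers, COMPRESSED
    return 404, {"Content-Length": "0"}, b""


class PrecompressedHandler(BaseHTTPRequestHandler):
    """Serves a pre-compressed body and labels it with Content-Encoding."""

    def do_GET(self):
        status, headers, body = build_response(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Create (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), PrecompressedHandler)
```

Calling `make_server().serve_forever()` starts it. A browser that honors Content-Encoding should hand the page the decompressed 93274 bytes, while Content-Length still reports the compressed size on the wire.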
Attachments
Test case to reproduce the issue (47.82 KB, application/x-gzip), 2022-11-03 06:34 PDT, jujjyl
Test case to poke the behavior when using Brotli compression (46.96 KB, application/x-gzip), 2022-11-04 01:37 PDT, jujjyl
jujjyl
Comment 1
2022-11-03 06:34:49 PDT
Created
attachment 463386
[details]
Test case to reproduce the issue
Radar WebKit Bug Importer
Comment 2
2022-11-03 15:57:48 PDT
<rdar://problem/101935292>
Ryan Reno
Comment 3
2022-11-03 19:33:50 PDT
Thanks for providing a test case; that is very helpful! I can't reproduce this issue when using a basic HTTP server with python -m http.server. That is, the test case passes. Looking at the response headers in the Web Inspector Network tab, I see the main difference between the emscripten server and the stock server is that the emscripten server returns a Content-Type: application/octet-stream header for the gzipped file. I'll investigate further to see if I can figure out why we aren't decompressing that MIME type.
Ryan Reno
Comment 4
2022-11-03 19:36:48 PDT
I'll also note that the emscripten server provided does respond with COOP, COEP, and CORP headers while the basic HTTP server does not. Those headers may interact in such a way to cause us to not decompress the data.
jujjyl
Comment 5
2022-11-04 01:23:01 PDT
I notice that if I change emrun.py to override the MIME type of the served test.data.gz from application/octet-stream to either "application/gzip" or something else random, like "application/vnd.oasis.opendocument.spreadsheet", then the test passes.

Commenting out the lines

    # self.send_header("Last-Modified", self.date_time_string(fs.st_mtime))
    # self.send_header('Cache-Control', 'no-cache, must-revalidate')
    # self.send_header('Connection', 'close')
    # self.send_header('Expires', '-1')
    # self.send_header('Access-Control-Allow-Origin', '*')
    # self.send_header('Cross-Origin-Opener-Policy', 'same-origin')
    # self.send_header('Cross-Origin-Embedder-Policy', 'require-corp')
    # self.send_header('Cross-Origin-Resource-Policy', 'cross-origin')

in emrun.py to remove potentially affecting headers, like COOP/COEP, did not affect the scenario.

Oddly, if I rename the test file from 'test.data.gz' to e.g. 'test.data.jgz' (and also edit test.html to download the renamed file instead), then the test also passes. But when looking at the web console in the passed test, it prints

    [Warning] Fetch Response headers: (test.html, line 10)
    [Log] access-control-allow-origin: * (test.html, line 12)
    [Log] cache-control: no-cache, must-revalidate (test.html, line 12)
    [Log] connection: close (test.html, line 12)
    [Log] content-encoding: gzip (test.html, line 12)
    [Log] content-length: 25098 (test.html, line 12)
    [Log] content-type: application/octet-stream (test.html, line 12)

so here the file did still have content-type application/octet-stream.
jujjyl
Comment 6
2022-11-04 01:36:41 PDT
Of note is that while the test case uses an "exotic" emrun.py ad hoc web server, the issue is met in the wild by a lot of Unity game developers. The best practices documentation at Unity at
https://docs.unity3d.com/Manual/webgl-server-configuration-code-samples.html
currently recommends that Unity game asset data files, which have the filename structure file.data.gz, should be served with the headers

    Content-Type: application/octet-stream
    Content-Encoding: gzip

so game developers who follow this best practices guidance (which does read as reasonable) will be affected if they want to run their gzipped Unity game builds on Safari.

Of interest is also how Safari behaves if content is compressed with Brotli instead of gzip. Attaching a variant of the same test that is set up to utilize Brotli compression instead.
jujjyl
Comment 7
2022-11-04 01:37:20 PDT
Created
attachment 463396
[details]
Test case to poke the behavior when using Brotli compression
jujjyl
Comment 8
2022-11-04 01:49:00 PDT
> I can't reproduce this issue when using a basic HTTP server with python -m http.server. That is, the test case passes.
Oops, sorry, I think the test case "passes" with python -m http.server because I coded the check a bit silly in test.html. (I tried to observe the Content-Encoding intent, but that does not make sense for the scope of this bug report.) If you change the line

    if (u8[0] == 0x1f && u8[1] == 0x8b && response.headers.get('Content-Encoding') == 'gzip') document.body.innerHTML = 'Test ERROR: Received gzip-compressed data, but browser should have decompressed it';

to

    if (u8.length != 93274) document.body.innerHTML = 'Test ERROR: Received compressed data, but browser should have decompressed it';

then the test should fail also with regular python -m http.server.
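The difference between the two checks can be sketched outside the browser. This is an illustrative Python model of the logic, not the test.html code itself; the function names are hypothetical, and the payload is a stand-in with the 93274-byte size from the report.

```python
import gzip

# Uncompressed size of the test payload, as quoted in the report.
EXPECTED_SIZE = 93274


def original_check_reports_error(body, content_encoding):
    """Model of the first check: gzip magic bytes AND a gzip header."""
    return body[:2] == b"\x1f\x8b" and content_encoding == "gzip"


def revised_check_reports_error(body):
    """Model of the revised check: any body of the wrong size is an error."""
    return len(body) != EXPECTED_SIZE


# Hypothetical payload standing in for the real test data.
still_compressed = gzip.compress(b"A" * EXPECTED_SIZE)

# A server that sends no Content-Encoding header: the original check stays
# silent even though the page received compressed bytes.
missed = not original_check_reports_error(still_compressed, None)

# The revised check flags the same response.
caught = revised_check_reports_error(still_compressed)
```

Because the original check requires the Content-Encoding header to be present, a response served without that header slips through, which would explain why the test appeared to pass against the stock server.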
Ryan Reno
Comment 9
2022-11-10 23:01:23 PST
Thanks for the updated test case and detailed analysis. I can't reproduce this on iOS 16 so this seems to be macOS-specific, at least as of iOS 16. Will continue to investigate.
Ryan Reno
Comment 10
2022-11-10 23:13:09 PST
Possibly related to
https://bugs.webkit.org/show_bug.cgi?id=175597
jujjyl
Comment 11
2022-11-11 01:31:07 PST
Oh my god no :( that bug was reported in 2017.. five years ago. I am the original author of the project
https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html
from bug
https://bugs.webkit.org/show_bug.cgi?id=175597
. (The project looks to have since been taken down after I left Mozilla.) The experience from hosting that project is the very reason why all Unity content on the web today is being produced with .data.gz files with Content-Type: application/octet-stream and Content-Encoding: gzip. I was never aware of
bug 175597
, and thought all this time that this scheme was the "best practices" pre-compressed gzip encoding scheme that would work everywhere.

Specifically, the "success" of that project hosting was the reason that Unity rejected the notion of using ad hoc .jsgz, .datagz and .wasmgz suffixes for compressed files, and instead in early 2020 chose to use .js.gz, .data.gz and .wasm.gz (and a few others, like .symbols.gz). The intent was to enable systematic rulesets on web servers that define precompressed assets via wildcards, e.g. set *.gz to get Content-Encoding: gzip, and set Content-Type: by regex-matching the first extension before the .gz part (where supported). So there are now potentially thousands of Unity web game build sites out there that follow this scheme, although Unity does default to brotli compression, which does not seem to have this issue.

Unity also offers a "Decompression Fallback" build option that enables a software JS-based gzip decompressor, which was always intended "for developers that do not have access to configuring web server hosting with the best practices Content-Type/Content-Encoding settings". It may be that the existence of this fallback option is why nobody has ever reported to us that there is an issue with gzipped Unity content on Safari; they just thought they would need to tick that checkbox for "things to work". We have been vocally trying to get developers not to use this option, since it has a major impact on Unity game startup times. But for some reason (likely this bug), the use of that fallback option has persisted.

It does look like the issue also occurs with XHR and is not limited to Fetch.
> I can't reproduce this on iOS 16 so this seems to be macOS-specific, at least as of iOS 16.
I concur, our QA reported back that they were not able to reproduce the issue on iOS either, but only on macOS Safari. Additionally the bug occurs on old macOS Catalina 10.15.7 (19H2) test system with Safari Version 14.0 (15610.1.28.1.9, 15610, released on Sep 16, 2020), so it does not seem like any fix from 175597 would have regressed (or if it did, it has regressed before Safari 14). All updated Unity content will now explicitly test whether the conditions of this issue are met, and guide developers to workarounds. Can't wait to see this issue being resolved in Safari, it will simplify hosting configuration headscratchers for a lot of developers!
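The wildcard scheme described in this comment (any *.gz file gets Content-Encoding: gzip, and Content-Type is taken from the extension before .gz) might look roughly like the following on the server side. This is a sketch, not Unity's or any web server's actual configuration: `headers_for` and `TYPE_BY_EXTENSION` are hypothetical names, and the table entries other than .data are assumptions.

```python
import posixpath

# Assumed extension-to-type table; only the .data mapping comes from the
# thread, the others are common choices and may differ per deployment.
TYPE_BY_EXTENSION = {
    ".data": "application/octet-stream",
    ".js": "application/javascript",
    ".wasm": "application/wasm",
}


def headers_for(filename):
    """Derive response headers from a filename such as 'game.data.gz'."""
    headers = {}
    stem, ext = posixpath.splitext(filename)
    if ext == ".gz":
        # Pre-compressed asset: declare the encoding, then look at the
        # extension that precedes .gz to pick the content type.
        headers["Content-Encoding"] = "gzip"
        _, ext = posixpath.splitext(stem)
    headers["Content-Type"] = TYPE_BY_EXTENSION.get(
        ext, "application/octet-stream"
    )
    return headers
```

With this rule, `headers_for("game.data.gz")` yields exactly the Content-Type: application/octet-stream plus Content-Encoding: gzip pair that triggers the behavior reported here.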
Ryan Reno
Comment 12
2022-11-11 08:06:26 PST
Forcing the web server to return Content-Type: text/plain does decompress the data on macOS. So this looks very much like that XHR bug. The solution there was to adopt CFNetwork SPI to opt-in to auto-decompress data even if the HTTP headers aren't what the framework expects in order to trigger auto decompression. I'm not yet sure if that's because of a CFNetwork change or a change in WebKit.
> Additionally the bug occurs on old macOS Catalina 10.15.7 (19H2) test system > with Safari Version 14.0 (15610.1.28.1.9, 15610, released on Sep 16, 2020), > so it does not seem like any fix from 175597 would have regressed (or if it > did, it has regressed before Safari 14).
It seems like whatever the change may have been it did not happen very recently.
Ryan Reno
Comment 13
2022-11-11 08:48:40 PST
I think we're creating our NetworkLoadParameters with a flipped boolean for requesting to opt in to the CFNetwork behavior we want. In other words, we're not asking CFNetwork to do decompression when requesting gzipped resources. I was able to fix the issue by setting the appropriate CFNetwork parameter in NetworkDataTaskCocoa::applySniffingPoliciesAndBindRequestToInferfaceIfNeeded. A better solution is probably to set the NetworkLoadParameters field appropriately based on the request we're making.
Ryan Reno
Comment 14
2022-11-11 12:34:40 PST
Pull request:
https://github.com/WebKit/WebKit/pull/6408
Ryan Reno
Comment 15
2022-11-15 14:33:16 PST
Submitted web-platform-tests pull request:
https://github.com/web-platform-tests/wpt/pull/36978
EWS
Comment 16
2022-11-16 15:07:58 PST
Committed 256755@main (ec8ff55e6568): <https://commits.webkit.org/256755@main>

Reviewed commits have been landed. Closing PR #6408 and removing active labels.