Bug 247421 - Content downloaded with fetch() API when Content-Encoding: gzip is set is not decompressed
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: Safari 16
Hardware: Mac (Apple Silicon) macOS 13
Importance: P2 Major
Assignee: Ryan Reno
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2022-11-03 06:33 PDT by jujjyl
Modified: 2022-11-16 15:08 PST
CC List: 12 users

See Also:


Attachments
Test case to reproduce the issue (47.82 KB, application/x-gzip)
2022-11-03 06:34 PDT, jujjyl
Test case to poke the behavior when using Brotli compression (46.96 KB, application/x-gzip)
2022-11-04 01:37 PDT, jujjyl

Description jujjyl 2022-11-03 06:33:56 PDT
Safari does not decompress gzip-compressed content downloaded with fetch(), even though the web server sets Content-Encoding: gzip on the response.

On other browsers (Firefox, Chrome), this works properly.

STR:

1. Download the attached fetch_test.tgz and extract it.
2. In a terminal, navigate to the extracted directory and run "python3 emrun.py --no_browser --port 8000 ."

This step launches an ad hoc web server listening at http://localhost:8000/. It is based on the regular ad hoc python3 web server, except that it has been augmented to specify "Content-Encoding: gzip" on the served test.data.gz file.

3. Navigate to http://localhost:8000/test.html

4. Open the page's web console, and observe the logs printed there and the final test result that appears in the page body.

Observed:

On Safari, the browser will print (among other things)

  [Warning] Downloaded an ArrayBuffer of size 25098 bytes (test.html, line 10)
  [Warning] Fetch Response headers: (test.html, line 11)
  [Log] content-encoding: gzip (test.html, line 13)
  [Log] content-length: 25098 (test.html, line 13)

and the page body will report

  Test ERROR: Received gzip-compressed data, but browser should have decompressed it

Expected:

On Chrome and Firefox, the browser will print

  Downloaded an ArrayBuffer of size 93274 bytes
  Fetch Response headers:
  content-encoding: gzip
  content-length: 25098

and the page body will report

  Test passed OK
Comment 1 jujjyl 2022-11-03 06:34:49 PDT
Created attachment 463386
Test case to reproduce the issue
Comment 2 Radar WebKit Bug Importer 2022-11-03 15:57:48 PDT
<rdar://problem/101935292>
Comment 3 Ryan Reno 2022-11-03 19:33:50 PDT
Thanks for providing a test case, that is very helpful!

I can't reproduce this issue when using a basic HTTP server with python -m http.server. That is, the test case passes.

Looking at the response headers in the Web Inspector Network tab, I see the main difference between the emscripten server and the stock server is that the emscripten server returns a Content-Type: application/octet-stream header for the gzipped file.


I'll investigate further to see if I can figure out why we aren't decompressing that MIME type.
Comment 4 Ryan Reno 2022-11-03 19:36:48 PDT
I'll also note that the emscripten server provided does respond with COOP, COEP, and CORP headers while the basic HTTP server does not. Those headers may interact in a way that causes us to not decompress the data.
Comment 5 jujjyl 2022-11-04 01:23:01 PDT
I notice if I change emrun.py to override the MIME type of served test.data.gz from application/octet-stream to either "application/gzip" or something else random, like "application/vnd.oasis.opendocument.spreadsheet", then the test passes.

Commenting out the lines

#    self.send_header("Last-Modified", self.date_time_string(fs.st_mtime))
#    self.send_header('Cache-Control', 'no-cache, must-revalidate')
#    self.send_header('Connection', 'close')
#    self.send_header('Expires', '-1')
#    self.send_header('Access-Control-Allow-Origin', '*')
#    self.send_header('Cross-Origin-Opener-Policy', 'same-origin')
#    self.send_header('Cross-Origin-Embedder-Policy', 'require-corp')
#    self.send_header('Cross-Origin-Resource-Policy', 'cross-origin')

in emrun.py, to remove headers that might play a role (such as COOP/COEP), did not change the outcome.

Oddly, if I also rename the test file from 'test.data.gz' to e.g. 'test.data.jgz' (and edit test.html to download the renamed file instead), then the test also passes. But when looking at the web console in the passed test, it prints

[Warning] Fetch Response headers: (test.html, line 10)
[Log] access-control-allow-origin: * (test.html, line 12)
[Log] cache-control: no-cache, must-revalidate (test.html, line 12)
[Log] connection: close (test.html, line 12)
[Log] content-encoding: gzip (test.html, line 12)
[Log] content-length: 25098 (test.html, line 12)
[Log] content-type: application/octet-stream (test.html, line 12)

so here the file did still have content-type application/octet-stream in it.
Comment 6 jujjyl 2022-11-04 01:36:41 PDT
Of note is that while the test case uses an "exotic" emrun.py ad hoc web server, the issue is encountered in the wild by a lot of Unity game developers. Unity's best practices documentation at https://docs.unity3d.com/Manual/webgl-server-configuration-code-samples.html currently recommends that Unity game asset data files, which have a filename structure

file.data.gz

should be served with headers

Content-Type: application/octet-stream
Content-Encoding: gzip

so game developers who follow this best practices guidance (which does read as reasonable) will be affected if they want to run their gzipped Unity game builds on Safari.
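
For reference, a minimal sketch of a server that produces this header combination, built on Python's stock http.server. This is not the emrun.py code from the attachment; the class name PrecompressedHandler and the helper make_server are made up for illustration, and the choice to map only *.data.gz to application/octet-stream is an assumption.

```python
import functools
import http.server


class PrecompressedHandler(http.server.SimpleHTTPRequestHandler):
    """Serve *.gz files as pre-compressed payloads (illustrative sketch only)."""

    def guess_type(self, path):
        # Report the type of the payload, not of the gzip container.
        if str(path).endswith(".data.gz"):
            return "application/octet-stream"
        return super().guess_type(path)

    def end_headers(self):
        # Announce that the body is gzip-encoded so the client can decode it.
        # Caveat: this simple version also tags error responses for *.gz paths.
        request_path = getattr(self, "path", "").split("?", 1)[0]
        if request_path.endswith(".gz"):
            self.send_header("Content-Encoding", "gzip")
        super().end_headers()


def make_server(directory=".", host="localhost", port=8000):
    """Build (but do not start) a server rooted at `directory`."""
    handler = functools.partial(PrecompressedHandler, directory=directory)
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Calling make_server().serve_forever() from the directory that holds test.data.gz should serve it with both headers set, which is the combination this bug is about.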

Of interest is also how Safari behaves if content is compressed with Brotli instead of gzip. Attaching a variant of the same test that is set up to utilize Brotli compression instead.
Comment 7 jujjyl 2022-11-04 01:37:20 PDT
Created attachment 463396
Test case to poke the behavior when using Brotli compression
Comment 8 jujjyl 2022-11-04 01:49:00 PDT
> I can't reproduce this issue when using a basic HTTP server with python -m http.server. That is, the test case passes.

Oops, sorry, I think the test case "passes" with python -m http.server because I coded the check a bit sillily in test.html. (I tried to take the Content-Encoding intent into account, but that does not make sense for the scope of this bug report.)

If you change the line

    if (u8[0] == 0x1f && u8[1] == 0x8b && response.headers.get('Content-Encoding') == 'gzip') document.body.innerHTML = 'Test ERROR: Received gzip-compressed data, but browser should have decompressed it';


to

    if (u8.length != 93274) document.body.innerHTML = 'Test ERROR: Received compressed data, but browser should have decompressed it';

then the test should also fail with the regular python -m http.server.
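
To make the difference between the two conditions concrete, here is a rough Python mirror of both checks (the real ones are the JavaScript lines quoted above). The function names are invented for this sketch, and a plain dict stands in for the fetch Headers object, so unlike in the browser the header lookup is case-sensitive.

```python
GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip stream
EXPECTED_SIZE = 93274      # decompressed size reported by Chrome and Firefox


def original_check_fails(body: bytes, headers: dict) -> bool:
    """First version: only reports an error when Content-Encoding is present."""
    return body[:2] == GZIP_MAGIC and headers.get("Content-Encoding") == "gzip"


def revised_check_fails(body: bytes) -> bool:
    """Revised version: reports an error whenever the size is not the decompressed size."""
    return len(body) != EXPECTED_SIZE
```

With a server that sends no Content-Encoding header, original_check_fails returns False even for a still-compressed body, which would explain the "pass" seen with the stock server; revised_check_fails does not depend on the header at all.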
Comment 9 Ryan Reno 2022-11-10 23:01:23 PST
Thanks for the updated test case and detailed analysis.

I can't reproduce this on iOS 16 so this seems to be macOS-specific, at least as of iOS 16.

Will continue to investigate.
Comment 10 Ryan Reno 2022-11-10 23:13:09 PST
Possibly related to https://bugs.webkit.org/show_bug.cgi?id=175597
Comment 11 jujjyl 2022-11-11 01:31:07 PST
Oh my god no :( that bug was reported in 2017.. five years ago.

I am the original author of the project https://s3.amazonaws.com/mozilla-games/ZenGarden/EpicZenGarden.html from bug https://bugs.webkit.org/show_bug.cgi?id=175597 (which looks like it has since been taken down, after I left Mozilla).

The experience from that project hosting is the very reason why all Unity content on the web today is being produced with .data.gz files with Content-Type: application/octet-stream and Content-Encoding: gzip.

I was never aware of bug 175597, and thought all this time that scheme would have been the "best practices" pre-compressed gzip encoding scheme that would work everywhere.

Specifically, the "success" of that project hosting was the reason that Unity rejected the notion of using ad hoc .jsgz, .datagz and .wasmgz suffixes for compressed files, and instead in early 2020 chose to use .js.gz, .data.gz and .wasm.gz (and a few others, like .symbols.gz). This gives the opportunity to use systematic rulesets on web servers to define precompressed assets via wildcards, e.g. set *.gz to get Content-Encoding: gzip, and set Content-Type: by regex-matching the first extension before the .gz part (where supported).
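
A minimal sketch of such a ruleset, written as a Python function rather than in any particular web server's configuration syntax. The extension-to-MIME table and the name headers_for are assumptions made for illustration and are not taken from Unity's documentation.

```python
import re

# Assumed mapping from the first extension to a MIME type (illustrative only).
CONTENT_TYPES = {
    "js": "application/javascript",
    "wasm": "application/wasm",
    "data": "application/octet-stream",
}
DEFAULT_TYPE = "application/octet-stream"  # assumed fallback for unknown extensions

# Captures the extension that sits directly before the trailing ".gz".
FIRST_EXTENSION = re.compile(r"\.(?P<ext>[^./]+)\.gz$")


def headers_for(filename: str) -> dict:
    """Return the extra response headers for a precompressed asset, if any."""
    if not filename.endswith(".gz"):
        return {}
    headers = {"Content-Encoding": "gzip"}  # the "*.gz" wildcard rule
    match = FIRST_EXTENSION.search(filename)
    if match:
        headers["Content-Type"] = CONTENT_TYPES.get(match.group("ext"), DEFAULT_TYPE)
    return headers
```

For example, headers_for("Build/game.wasm.gz") yields Content-Encoding: gzip together with Content-Type: application/wasm, while a file without a .gz suffix gets no extra headers.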

So there are now potentially thousands of Unity web game build sites out there that follow this scheme, although Unity does default to Brotli compression, which does not seem to have this issue.

Unity also offers a "Decompression Fallback" build option that enables a software JS-based gzip decompressor, which was always intended "for developers that do not have access to configuring web server hosting with the best practices Content-Type/Content-Encoding settings". It may be that the existence of this fallback option is why nobody has ever reported to us that there is an issue with gzipped Unity content on Safari; developers may simply have thought they needed to tick that checkbox for "things to work".
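
The idea behind such a fallback, sketched in Python for consistency with emrun.py rather than in the JavaScript a web build would actually use. This is not Unity's implementation; ensure_decompressed is a made-up name, and the magic-byte test is a heuristic that would misfire on a decompressed payload that itself happens to start with 0x1f 0x8b.

```python
import gzip
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def ensure_decompressed(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress in software if the client handed back the raw gzip stream."""
    # The header promised gzip, yet the body still looks like gzip: the client
    # did not decode it, so decode it here.
    still_compressed = content_encoding == "gzip" and body[:2] == GZIP_MAGIC
    return gzip.decompress(body) if still_compressed else body
```

Doing this work in software on every load is the startup cost mentioned in this comment, which is why it is only meant as a fallback.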

We have been vocally trying to get developers to not use this option, since it has a major impact on Unity game startup times. But for some reason (likely this bug), the use of that fallback option has persisted.

It does look like the issue also occurs with XHR and is not limited to Fetch.

> I can't reproduce this on iOS 16 so this seems to be macOS-specific, at least as of iOS 16.

I concur, our QA reported back that they were not able to reproduce the issue on iOS either, but only on macOS Safari.

Additionally the bug occurs on old macOS Catalina 10.15.7 (19H2) test system with Safari Version 14.0 (15610.1.28.1.9, 15610, released on Sep 16, 2020), so it does not seem like any fix from 175597 would have regressed (or if it did, it has regressed before Safari 14).

All updated Unity content will now explicitly test whether the conditions of this issue are met, and guide developers to workarounds. Can't wait to see this issue being resolved in Safari, it will simplify hosting configuration headscratchers for a lot of developers!
Comment 12 Ryan Reno 2022-11-11 08:06:26 PST
Forcing the web server to return Content-Type: text/plain does decompress the data on macOS.

So this looks very much like that XHR bug. The solution there was to adopt CFNetwork SPI to opt in to auto-decompressing data even if the HTTP headers aren't what the framework expects in order to trigger auto-decompression.

I'm not yet sure if that's because of a CFNetwork change or a change in WebKit.

> Additionally the bug occurs on old macOS Catalina 10.15.7 (19H2) test system
> with Safari Version 14.0 (15610.1.28.1.9, 15610, released on Sep 16, 2020),
> so it does not seem like any fix from 175597 would have regressed (or if it
> did, it has regressed before Safari 14).

It seems like whatever the change may have been, it did not happen very recently.
Comment 13 Ryan Reno 2022-11-11 08:48:40 PST
I think we're creating our NetworkLoadParameters with a flipped boolean for requesting to opt-in to the CFNetwork behavior we want. In other words, we're not asking CFNetwork to do decompression when requesting gzipped resources.

I was able to fix the issue by setting the appropriate CFNetwork parameter in NetworkDataTaskCocoa::applySniffingPoliciesAndBindRequestToInferfaceIfNeeded

A better solution is probably to set the NetworkLoadParameters field appropriately based on the request we're making.
Comment 14 Ryan Reno 2022-11-11 12:34:40 PST
Pull request: https://github.com/WebKit/WebKit/pull/6408
Comment 15 Ryan Reno 2022-11-15 14:33:16 PST
Submitted web-platform-tests pull request: https://github.com/web-platform-tests/wpt/pull/36978
Comment 16 EWS 2022-11-16 15:07:58 PST
Committed 256755@main (ec8ff55e6568): <https://commits.webkit.org/256755@main>

Reviewed commits have been landed. Closing PR #6408 and removing active labels.