Bug 234518

Summary: WPT imported tests store non-utf8 files in wrong encoding
Product: WebKit Reporter: Patrick Griffis <pgriffis>
Component: Tools / Tests Assignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: ap, gsnedders, simon.fraser, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=234159

Description Patrick Griffis 2021-12-20 11:35:47 PST
Some files served during tests are deliberately stored in non-UTF-8 encodings, and the HTTP server always sends an incorrectly decoded version of those files.

A reproducer of this is `LayoutTests/imported/w3c/web-platform-tests/content-security-policy/script-src/hash-always-converted-to-utf-8/iso-8859-1.html`

You can run either `run-webkit-httpd` or `run-webkit-tests` on that file to reproduce. The file is in `iso-8859-1` encoding, and the `charset` is declared in `iso-8859-1.html.sub.headers`.
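To check what is actually on disk before involving the server, a quick byte-level inspection is enough. The sketch below is a hypothetical helper (the name `classify_bytes` and the category strings are made up for illustration); it is a heuristic, not a charset detector. A genuine ISO-8859-1 file containing bytes >= 0x80 will usually fail strict UTF-8 decoding, while a file damaged by a lossy UTF-8 round trip will contain the byte sequence `EF BF BD`.

```python
import sys
from pathlib import Path

# U+FFFD REPLACEMENT CHARACTER encoded as UTF-8: b"\xef\xbf\xbd".
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")


def classify_bytes(data: bytes) -> str:
    """Roughly classify raw file contents (heuristic, hypothetical helper)."""
    if REPLACEMENT_UTF8 in data:
        # A lossy UTF-8 decode/encode round trip leaves exactly this sequence.
        return "utf8-replacement"
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Expected for a genuine ISO-8859-1 file with bytes >= 0x80.
        return "non-utf8"
    return "utf8"


if __name__ == "__main__":
    # Usage: python3 check_encoding.py FILE...   (script name is hypothetical)
    for arg in sys.argv[1:]:
        print(f"{arg}: {classify_bytes(Path(arg).read_bytes())}")
```

Run against the reproducer path above, an intact ISO-8859-1 file would be expected to report `non-utf8`; `utf8-replacement` would point to corruption before the server is involved.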

At the HTTP layer, before the browser decodes anything, it already receives `?`s in place of the bytes that are invalid UTF-8, so the decoding must have happened somewhere before that point.
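One mechanism consistent with this symptom (an assumption; the report does not say where the decoding happens) is some step reading the file as UTF-8 with invalid bytes replaced and then writing it back as UTF-8. Each invalid byte becomes U+FFFD, which some tools display as `?`. The sketch below shows that the round trip is lossy: once `0xE9` has been replaced, no later decoding can recover the original character.

```python
def lossy_utf8_round_trip(data: bytes) -> bytes:
    """Decode bytes as UTF-8, replacing invalid sequences, then re-encode."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


# "café" as it would be stored in an ISO-8859-1 file: é is the single byte 0xE9.
original = "café".encode("iso-8859-1")
corrupted = lossy_utf8_round_trip(original)

print(original)   # b'caf\xe9'
print(corrupted)  # b'caf\xef\xbf\xbd'

# Reading the damaged bytes with the declared charset does not restore "café";
# it yields "caf" followed by the three characters U+00EF U+00BF U+00BD.
as_latin1 = corrupted.decode("iso-8859-1")
```

Because the declared `charset` is ISO-8859-1, a browser decoding the damaged file would see three unrelated Latin-1 characters rather than `é`.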
Comment 1 Radar WebKit Bug Importer 2021-12-27 11:36:18 PST
<rdar://problem/86942960>
Comment 2 Sam Sneddon [:gsnedders] 2022-01-05 08:01:33 PST
The imported file itself is wrong (containing UTF-8 encoded U+FFFD); my guess is the importer is breaking the file (and assuming the input is always UTF-8?).
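If the importer does assume UTF-8, one way to avoid breaking files is to never decode them when they are copied unchanged, and to use a byte-preserving decode when some textual rewrite is needed. The sketch below is hypothetical (`import_test_file`, `rewrite_preserving_bytes` and the `rewrite` callback are illustrative names, not WebKit's importer API). It relies on Python's `surrogateescape` error handler, which maps each undecodable byte to a lone surrogate and restores the exact byte on encoding.

```python
import shutil
from pathlib import Path


def rewrite_preserving_bytes(data: bytes, rewrite) -> bytes:
    """Apply a str -> str rewrite without losing non-UTF-8 bytes."""
    # Invalid bytes such as 0xE9 become U+DCE9 and survive the round trip.
    text = data.decode("utf-8", errors="surrogateescape")
    return rewrite(text).encode("utf-8", errors="surrogateescape")


def import_test_file(src, dst, rewrite=None):
    """Copy src to dst; only decode when a rewrite is actually requested."""
    dst = Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    if rewrite is None:
        # Verbatim byte copy: the file's encoding is irrelevant.
        shutil.copyfile(src, dst)
    else:
        dst.write_bytes(rewrite_preserving_bytes(Path(src).read_bytes(), rewrite))
```

ASCII-only substitutions (such as rewriting paths) are safe with this approach for both UTF-8 and ISO-8859-1 inputs. A rewrite that inserts non-ASCII text into a non-UTF-8 file would still produce mixed encodings, so such files may need to be skipped or handled with their declared charset.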