Bug 234518 - WPT imported tests store non-utf8 files in wrong encoding
Summary: WPT imported tests store non-utf8 files in wrong encoding
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-12-20 11:35 PST by Patrick Griffis
Modified: 2022-01-20 20:28 PST
CC List: 4 users

See Also:


Description Patrick Griffis 2021-12-20 11:35:47 PST
Some files served during tests are intentionally stored in non-UTF-8 encodings, but the HTTP server always sends an incorrectly decoded version of those files.

One test that reproduces this is `LayoutTests/imported/w3c/web-platform-tests/content-security-policy/script-src/hash-always-converted-to-utf-8/iso-8859-1.html`.

Running either `run-webkit-httpd` or `run-webkit-tests` on that file reproduces the problem. The file is encoded in ISO-8859-1, and its `charset` is set to `iso-8859-1` in `iso-8859-1.html.sub.headers`.

At the HTTP layer, before the browser decodes anything, it already receives `?` characters in place of the characters that are not valid UTF-8, so the decoding must have happened somewhere before that point.
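One plausible way such replacement characters arise can be shown with a minimal Python sketch. It assumes, for illustration only, that the file contains the character "é"; the actual characters in `iso-8859-1.html` may differ. A single ISO-8859-1 byte that is not valid UTF-8 becomes U+FFFD when decoded leniently as UTF-8, and re-encoding that as UTF-8 produces the bytes EF BF BD, after which the original byte cannot be recovered by any later stage.

```python
# Assumption: the test file contains "é" (U+00E9), which ISO-8859-1
# encodes as the single byte 0xE9.
original = "é".encode("iso-8859-1")
assert original == b"\xe9"

# A lone 0xE9 byte is not valid UTF-8, so decoding with errors="replace"
# substitutes U+FFFD REPLACEMENT CHARACTER for it.
decoded = original.decode("utf-8", errors="replace")
print(decoded == "\ufffd")  # True

# Writing the text back out as UTF-8 yields EF BF BD; the 0xE9 byte is gone.
rewritten = decoded.encode("utf-8")
print(rewritten)  # b'\xef\xbf\xbd'
```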
Comment 1 Radar WebKit Bug Importer 2021-12-27 11:36:18 PST
<rdar://problem/86942960>
Comment 2 Sam Sneddon [:gsnedders] 2022-01-05 08:01:33 PST
The imported file itself is wrong (it contains UTF-8-encoded U+FFFD); my guess is that the importer is breaking the file (by assuming the input is always UTF-8?).
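If the importer does assume UTF-8, the difference between a text-mode copy and a byte-preserving copy can be sketched as follows. Both functions are hypothetical illustrations, not WebKit's importer code: `copy_as_text` decodes as UTF-8 with replacement and so reproduces the U+FFFD corruption, while `copy_as_bytes` uses `shutil.copyfile`, which copies the raw bytes and leaves the file's original encoding intact.

```python
import shutil
import tempfile
from pathlib import Path


def copy_as_text(src: Path, dst: Path) -> None:
    """Hypothetical lossy copy that assumes every input file is UTF-8."""
    # Invalid UTF-8 sequences are silently replaced with U+FFFD here.
    text = src.read_text(encoding="utf-8", errors="replace")
    dst.write_text(text, encoding="utf-8")


def copy_as_bytes(src: Path, dst: Path) -> None:
    """Byte-for-byte copy that preserves whatever encoding the file uses."""
    shutil.copyfile(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        base = Path(d)
        src = base / "iso-8859-1.html"
        # Assumed sample content: "é" stored as the ISO-8859-1 byte 0xE9.
        src.write_bytes("<p>é</p>".encode("iso-8859-1"))
        copy_as_text(src, base / "lossy.html")
        copy_as_bytes(src, base / "exact.html")
        print((base / "lossy.html").read_bytes())  # b'<p>\xef\xbf\xbd</p>'
        print((base / "exact.html").read_bytes() == src.read_bytes())  # True
```

A byte-level copy also keeps the content consistent with the `charset` declared in the accompanying `.headers` file, since nothing is re-encoded along the way.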