Bug 235308 - The encoding argument to PAL::decodeURLEscapeSequencesAsData is unnecessary
Summary: The encoding argument to PAL::decodeURLEscapeSequencesAsData is unnecessary
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc.
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Anne van Kesteren
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2022-01-17 17:50 PST by Andreu Botella
Modified: 2024-01-02 08:53 PST
CC List: 2 users

See Also:


Description Andreu Botella 2022-01-17 17:50:26 PST
While investigating bug 235307, I noticed that `PAL::decodeURLEscapeSequencesAsData` seems to be used only in `WebCore::DataURLDecoder`, where the `TextEncoding` object passed to it corresponds to the charset parsed from the data URL's MIME type. That encoding is used to encode the parts of the input string that aren't percent escapes. However, the spec's algorithm for processing data URLs (https://fetch.spec.whatwg.org/#data-urls) parses the MIME type but does not try to extract the charset, let alone use it to decode the body.
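For comparison, the URL Standard's percent-decode, which the Fetch data: URL processor applies to the body, works on bytes and never consults a charset: each `%` followed by two ASCII hex digits becomes one byte, and every other byte (including a malformed `%`) is copied through unchanged. Below is a minimal sketch of that step, assuming the input is already a byte string (for a serialized URL, its ASCII characters). `percentDecodeBytes` is a hypothetical name for illustration, not WebKit's `PAL` API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: value of an ASCII hex digit, or nullopt if c is not one.
static std::optional<uint8_t> hexValue(char c) {
    if (c >= '0' && c <= '9') return static_cast<uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<uint8_t>(c - 'A' + 10);
    return std::nullopt;
}

// Sketch of byte-level percent-decoding: "%XX" (two hex digits) becomes the
// byte 0xXX; every other byte, including a '%' not followed by two hex
// digits, is copied as-is. No TextEncoding is involved at any point.
std::vector<uint8_t> percentDecodeBytes(std::string_view input) {
    std::vector<uint8_t> out;
    out.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '%' && i + 2 < input.size()) {
            auto hi = hexValue(input[i + 1]);
            auto lo = hexValue(input[i + 2]);
            if (hi && lo) {
                out.push_back(static_cast<uint8_t>((*hi << 4) | *lo));
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(static_cast<uint8_t>(input[i]));
    }
    return out;
}
```

For example, `percentDecodeBytes("%E2%82%AC")` yields the three UTF-8 bytes of U+20AC, and `"100%"` comes back unchanged.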

The reason appears to be that the data URL processor takes a URL object as input and serializes it in step 2 (in WebKit, this happens in `DecodeTask::process()`). For data URLs, parsing and then serializing always yields an ASCII string: non-ASCII characters are percent-encoded as UTF-8 (or, for characters that happen to be parsed as part of the query, in the encoding the URL was parsed with). Therefore, as long as the `string` parameter to `decodeURLEscapeSequencesAsData` is a serialized URL, it contains no code points that would encode differently depending on the passed encoding*, so the encoding is effectively irrelevant.
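The argument can be checked with a toy model: a decoder that percent-decodes escapes and passes every other code point through a pluggable encoder, standing in for the `TextEncoding` argument. Everything here (`decodeWithCharset`, `encodeUTF8`, `encodeLatin1`) is hypothetical and simplified, not WebKit code. For an ASCII-only input such as a serialized URL, the UTF-8 and Latin-1 encoders produce identical bytes; they diverge only on a literal non-ASCII character, which a serialized URL never contains.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a charset: maps one code point to its bytes.
using Encoder = std::function<std::vector<uint8_t>(char32_t)>;

// UTF-8 encoder (no surrogate validation, for brevity).
std::vector<uint8_t> encodeUTF8(char32_t c) {
    if (c < 0x80) return {uint8_t(c)};
    if (c < 0x800) return {uint8_t(0xC0 | (c >> 6)), uint8_t(0x80 | (c & 0x3F))};
    if (c < 0x10000)
        return {uint8_t(0xE0 | (c >> 12)), uint8_t(0x80 | ((c >> 6) & 0x3F)), uint8_t(0x80 | (c & 0x3F))};
    return {uint8_t(0xF0 | (c >> 18)), uint8_t(0x80 | ((c >> 12) & 0x3F)),
            uint8_t(0x80 | ((c >> 6) & 0x3F)), uint8_t(0x80 | (c & 0x3F))};
}

// Latin-1 encoder; unmappable code points become '?' (a simplification).
std::vector<uint8_t> encodeLatin1(char32_t c) {
    if (c < 0x100) return {uint8_t(c)};
    return {uint8_t('?')};
}

// Value of an ASCII hex digit, or -1.
int hexDigit(char32_t c) {
    if (c >= U'0' && c <= U'9') return int(c - U'0');
    if (c >= U'a' && c <= U'f') return int(c - U'a') + 10;
    if (c >= U'A' && c <= U'F') return int(c - U'A') + 10;
    return -1;
}

// Toy model of charset-aware decoding: escapes become raw bytes, and all
// other code points are run through the supplied encoder.
std::vector<uint8_t> decodeWithCharset(const std::u32string& input, const Encoder& encode) {
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == U'%' && i + 2 < input.size()) {
            int hi = hexDigit(input[i + 1]);
            int lo = hexDigit(input[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(uint8_t((hi << 4) | lo));
                i += 2;
                continue;
            }
        }
        std::vector<uint8_t> bytes = encode(input[i]);
        out.insert(out.end(), bytes.begin(), bytes.end());
    }
    return out;
}
```

With `U"a%20%E2%82%AC"` both encoders give the same five bytes, whereas a literal `U"\u00E9"` gives `C3 A9` under UTF-8 but `E9` under Latin-1.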

Removing this argument would also make the `charset` field of `WebCore::DataURLDecoder::Result` unnecessary.

* C0 controls are also percent-encoded during serialization, so ISO-2022-JP behaves the same as every other encoding.
Comment 1 Radar WebKit Bug Importer 2022-01-24 17:51:15 PST
<rdar://problem/88000173>
Comment 2 Anne van Kesteren 2023-12-28 23:59:52 PST
Pull request: https://github.com/WebKit/WebKit/pull/22264
Comment 3 EWS 2024-01-02 08:53:54 PST
Committed 272569@main (7e4ae6913e3e): <https://commits.webkit.org/272569@main>

Reviewed commits have been landed. Closing PR #22264 and removing active labels.