306742 – URL query percent encoding is incorrect


https://bugs.webkit.org/show_bug.cgi?id=306742


Nikita Skovoroda

Reported 2026-02-01 19:12:14 PST

See https://encoding.spec.whatwg.org/index-iso-8859-2.txt -- there is no `U+00A2` here, and `128+34` maps to `0x02D8`, `128+13` maps to `0x008D`.

Which is what TextDecoder does:

```
> new TextDecoder('iso-8859-2').decode(Uint8Array.of(0xa2)).codePointAt(0).toString(16)
'2d8'
> new TextDecoder('iso-8859-2').decode(Uint8Array.of(0x8d)).codePointAt(0).toString(16)
'8d'
```

But with https://url.spec.whatwg.org/#string-percent-encode-after-encoding + https://encoding.spec.whatwg.org/#encode-or-fail the result is wrong with the `iso-8859-2` encoding:

```
var a = document.createElement('a');
a.href = 'https://example.com/?' + String.fromCodePoint(0xa2)
console.log(a.search.slice(1))
var b = document.createElement('a');
b.href = 'https://example.com/?' + String.fromCodePoint(0x8d)
console.log(b.search.slice(1))
```

WebKit prints `%8D` for both! Per spec (and in Chrome, Firefox, Servo) it is `%26%23162%3B` for `U+A2` and `%8D` for `U+8D`.

---

This is not limited to just `iso-8859-2`.

Here is the list of encodings where trivial encoding checks fail:

```
✖ FAIL percent-encode after encoding matches browser > iso-8859-2
✖ FAIL percent-encode after encoding matches browser > iso-8859-4
✖ FAIL percent-encode after encoding matches browser > iso-8859-5
✖ FAIL percent-encode after encoding matches browser > iso-8859-13
✖ FAIL percent-encode after encoding matches browser > iso-8859-15
✖ FAIL percent-encode after encoding matches browser > koi8-r
✖ FAIL percent-encode after encoding matches browser > macintosh
✖ FAIL percent-encode after encoding matches browser > windows-1250
✖ FAIL percent-encode after encoding matches browser > windows-1251
✖ FAIL percent-encode after encoding matches browser > windows-1254
✖ FAIL percent-encode after encoding matches browser > windows-1256
✖ FAIL percent-encode after encoding matches browser > windows-1258
✖ FAIL percent-encode after encoding matches browser > x-mac-cyrillic
✖ FAIL percent-encode after encoding matches browser > gbk
✖ FAIL percent-encode after encoding matches browser > gb18030
```

It passes on Chrome, Firefox and Servo.


Nikita Skovoroda

Comment 1 2026-02-01 19:14:44 PST

Quick isolated test, just run it in `about:blank`:

```
const iframe = document.createElement('iframe')
document.body.append(iframe)
const encoding = 'iso-8859-2'
const codepoint = 0x5a7a
const html = `
<!DOCTYPE html>
<script>
var a = document.createElement('a');
a.href = 'https://example.com/?' + String.fromCodePoint(0xa2)
console.log(a.search.slice(1))
var b = document.createElement('a');
b.href = 'https://example.com/?' + String.fromCodePoint(0x8d)
console.log(b.search.slice(1))
</script>`
iframe.src = `data:text/html;charset=${encoding},${encodeURI(html)}`
```

Alexey Proskuryakov

Comment 2 2026-02-02 14:43:19 PST

Thank you for the report! There is certainly a discrepancy in results across WebKit and other engines here. I don't understand what's going on without tracing through all the specs though. In particular, how do the other engines arrive at '%26%23162%3B', which is percent encoding for '&#162;'? HTML entities should be meaningless in URLs. And I expected UTF-8 anyway, regardless of document encoding.

Nikita Skovoroda

Comment 3 2026-02-02 18:02:42 PST

See the links to the spec in the issue. Here they are again:

1. https://url.spec.whatwg.org/#string-percent-encode-after-encoding
2. https://encoding.spec.whatwg.org/#encode-or-fail

They mention how `%26%23`-escaping gets there.
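For readers who do not want to trace both specs, here is a rough sketch of how the `%26%23` form arises for a single-byte legacy encoding and the query percent-encode set. It is not any engine's actual code. The names `PARTIAL_ISO_8859_2`, `QUERY_SET`, `percentEncodeByte` and `percentEncodeAfterEncoding` are made up for illustration, and the index is a hand-written excerpt of iso-8859-2 rather than the full table.

```javascript
// Hypothetical excerpt of the iso-8859-2 index as code point -> byte.
// Only the entries discussed in this bug are listed; the real index has 128.
const PARTIAL_ISO_8859_2 = new Map([
  [0x008d, 0x8d],
  [0x02d8, 0xa2],
  [0x0141, 0xa3],
])

// ASCII members of the query percent-encode set besides C0 controls and U+007F:
// space, ", #, <, >
const QUERY_SET = new Set([0x20, 0x22, 0x23, 0x3c, 0x3e])

const percentEncodeByte = (byte) =>
  '%' + byte.toString(16).toUpperCase().padStart(2, '0')

// Sketch of "percent-encode after encoding" for a single-byte encoding.
function percentEncodeAfterEncoding(input, index) {
  let output = ''
  for (const ch of input) {
    const cp = ch.codePointAt(0)
    if (cp < 0x80) {
      // ASCII encodes to itself; escape only members of the percent-encode set.
      const mustEscape = cp <= 0x1f || cp === 0x7f || QUERY_SET.has(cp)
      output += mustEscape ? percentEncodeByte(cp) : ch
    } else if (index.has(cp)) {
      // Encodable: emit the encoded byte, percent-encoded.
      output += percentEncodeByte(index.get(cp))
    } else {
      // Unencodable: percent-encoded "&#", the decimal code point, then ";".
      output += `%26%23${cp}%3B`
    }
  }
  return output
}

console.log(percentEncodeAfterEncoding('\u00a2', PARTIAL_ISO_8859_2)) // %26%23162%3B
console.log(percentEncodeAfterEncoding('\u008d', PARTIAL_ISO_8859_2)) // %8D
console.log(percentEncodeAfterEncoding('\u02d8', PARTIAL_ISO_8859_2)) // %A2
```

The third branch is the one that matters here: `U+00A2` has no entry in the index, so the expected output is the escaped numeric reference, not a byte.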

Nikita Skovoroda

Comment 4 2026-02-02 18:06:54 PST

Moreover, WebKit encodes other bytes correctly and has the same logic. It just misbehaves on certain inputs.

Try e.g.

```
(() => {
const iframe = document.createElement('iframe')
document.body.append(iframe)
const encoding = 'iso-8859-2'
const html = `
<!DOCTYPE html>
<script>
var a = document.createElement('a');
a.href = 'https://example.com/?' + String.fromCodePoint(0xa2)
console.log(a.search.slice(1))
var b = document.createElement('a');
b.href = 'https://example.com/?' + String.fromCodePoint(0xa3)
console.log(b.search.slice(1))
</script>`
iframe.src = `data:text/html;charset=${encoding},${encodeURI(html)}`
})();
```

WebKit prints:

```
%8D
%26%23163%3B
```

The behavior on 0xa3 is correct, only 0xa2 is broken for some reason.

Karl Dubost

Comment 5 2026-02-02 19:32:32 PST

Nikita, does it affect one of your sites? library?

Nikita Skovoroda

Comment 6 2026-02-03 01:59:39 PST

Karl, no. It was found in cross-tests. The only thing it affects for me is this: https://github.com/ExodusOSS/bytes/blob/6d0030bfe/tests/whatwg.browser.test.js#L78-L83

I do have other implementations to compare against though, so this is not important for my case.

Nikita Skovoroda

Comment 7 2026-02-03 02:02:47 PST

Re: title change: I doubt that this affects only query percent-encoding. The methods are common to all non-UTF-8 encoding, query or not. So it's likely `non-utf8 encoding`, not `URL query percent encoding`.

Alexey Proskuryakov

Comment 8 2026-02-03 10:43:51 PST

I meant to highlight that it's not affecting TextDecoder API, thus it's likely not in the underlying decoder code, and seems to be more about URLs. As already mentioned, I'm not sure about where the problem actually is.

Alex Christensen

Comment 9 2026-02-03 13:42:43 PST

This is a part of the encoding, not decoding, hence why TextDecoder is unaffected. This seems to be specified in https://url.spec.whatwg.org/#string-percent-encode-after-encoding at the very end.

Firefox implements it in a function named nsStandardURL::nsSegmentEncoder::EncodeSegmentCount.
Chromium implements it in a function named appendURLEscapedChar.
Our analogous function urlEscapedEntityCallback calls TextCodec::getUnencodableReplacement, which implements it.

I'm not completely sure why it isn't being called in this case. I don't see the (reason == UCNV_UNASSIGNED) condition being met, which seems strange to me.
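To make the encode/decode distinction concrete, one way to see which code points an encoder must treat as unencodable is to invert the decoder's table. The sketch below is only an illustration and says nothing about how WebKit or ICU build their tables. `buildReverseIndex` is a hypothetical helper, and it assumes a runtime whose `TextDecoder` supports the `iso-8859-2` label (browsers, or Node.js built with full ICU).

```javascript
// Hypothetical helper: derive a code point -> byte index for a single-byte
// encoding by decoding every byte in the upper half of the range.
function buildReverseIndex(label) {
  const decoder = new TextDecoder(label)
  const index = new Map()
  for (let byte = 0x80; byte <= 0xff; byte++) {
    const cp = decoder.decode(Uint8Array.of(byte)).codePointAt(0)
    // U+FFFD means the byte has no mapping in this encoding; skip it.
    if (cp !== 0xfffd) index.set(cp, byte)
  }
  return index
}

const index = buildReverseIndex('iso-8859-2')
console.log(index.has(0x00a2)) // false: U+00A2 cannot be encoded
console.log(index.get(0x02d8).toString(16)) // a2
console.log(index.get(0x008d).toString(16)) // 8d
```

Any code point missing from such an index is one where the encoder should report an error, so that the URL code can substitute the escaped numeric reference instead of emitting a byte.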

Alex Christensen

Comment 10 2026-02-03 14:20:17 PST

Seems related to our use of ucnv_setFallback

Radar WebKit Bug Importer

Comment 11 2026-02-03 14:36:26 PST

<rdar://problem/169566553>

Alex Christensen

Comment 12 2026-02-03 14:38:51 PST

Pull request: https://github.com/WebKit/WebKit/pull/57812

EWS

Comment 13 2026-02-04 04:47:15 PST

Committed 306768@main (47164fceaef4): <https://commits.webkit.org/306768@main> Reviewed commits have been landed. Closing PR #57812 and removing active labels.


Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Platform

Assignee

Alex Christensen

Reported

2026-02-01 19:12 PST

Modified

2026-02-04 04:47 PST


Keywords InRadar
