Bug 306742

Summary: URL query percent encoding is incorrect
Product: WebKit Reporter: Nikita Skovoroda <chalkerx>
Component: PlatformAssignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, annevk, ap, darin, karlcow, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=162068

Nikita Skovoroda
Reported 2026-02-01 19:12:14 PST
See https://encoding.spec.whatwg.org/index-iso-8859-2.txt -- there is no `U+00A2` here, and `128+34` maps to `0x02D8`, 128+13 maps to `0x008D` Which is what TextDecoder does: ``` > new TextDecoder('iso-8859-2').decode(Uint8Array.of(0xa2)).codePointAt(0).toString(16) '2d8' > new TextDecoder('iso-8859-2').decode(Uint8Array.of(0x8d)).codePointAt(0).toString(16) '8d' ``` But in https://url.spec.whatwg.org/#string-percent-encode-after-encoding + https://encoding.spec.whatwg.org/#encode-or-fail though, the result is wrong with `iso-8859-2` encoing: ``` var a = document.createElement('a'); a.href = 'https://example.com/?' + String.fromCodePoint(0xa2) console.log(a.search.slice(1)) var b = document.createElement('a'); b.href = 'https://example.com/?' + String.fromCodePoint(0x8d) console.log(b.search.slice(1)) ``` WebKit prints `%8D` for both! Per spec (and in Chrome, Firefox, Servo) it is `%26%23162%3B` for `U+A2` and `%8D` for `U+8D`. --- This is not limited to just `iso-8859-2` Here is the list of encodings where trivial encoding checks fails: ``` ✖ FAIL percent-encode after encoding matches browser > iso-8859-2 ✖ FAIL percent-encode after encoding matches browser > iso-8859-4 ✖ FAIL percent-encode after encoding matches browser > iso-8859-5 ✖ FAIL percent-encode after encoding matches browser > iso-8859-13 ✖ FAIL percent-encode after encoding matches browser > iso-8859-15 ✖ FAIL percent-encode after encoding matches browser > koi8-r ✖ FAIL percent-encode after encoding matches browser > macintosh ✖ FAIL percent-encode after encoding matches browser > windows-1250 ✖ FAIL percent-encode after encoding matches browser > windows-1251 ✖ FAIL percent-encode after encoding matches browser > windows-1254 ✖ FAIL percent-encode after encoding matches browser > windows-1256 ✖ FAIL percent-encode after encoding matches browser > windows-1258 ✖ FAIL percent-encode after encoding matches browser > x-mac-cyrillic ✖ FAIL percent-encode after encoding matches browser > gbk ✖ FAIL percent-encode after encoding matches browser > gb18030 ``` It passes on Chrome, Firefox and Servo.
Attachments
Nikita Skovoroda
Comment 1 2026-02-01 19:14:44 PST
Quick isolated test, just run it in `about:blank`: ``` const iframe = document.createElement('iframe') document.body.append(iframe) const encoding = 'iso-8859-2' const codepoint = 0x5a7a const html = ` <!DOCTYPE html> <script> var a = document.createElement('a'); a.href = 'https://example.com/?' + String.fromCodePoint(0xa2) console.log(a.search.slice(1)) var b = document.createElement('a'); b.href = 'https://example.com/?' + String.fromCodePoint(0x8d) console.log(b.search.slice(1)) </script>` iframe.src = `data:text/html;charset=${encoding},${encodeURI(html)}` ```
Alexey Proskuryakov
Comment 2 2026-02-02 14:43:19 PST
Thank you for the report! There is certainly a discrepancy in results across WebKit and other engines here. I don't understand what's going on without tracing through all the specs though. In particular, how do the other engines arrive at '%26%23162%3B', which is percent encoding for '&#162;'? HTML entities should be meaningless in URLs. And I expected UTF-8 anyway, regardless of document encoding.
Nikita Skovoroda
Comment 3 2026-02-02 18:02:42 PST
See the links to the spec in the issue Here they are again: 1. https://url.spec.whatwg.org/#string-percent-encode-after-encoding 2. https://encoding.spec.whatwg.org/#encode-or-fail They mention how `%26%23`-escaping gets there.
Nikita Skovoroda
Comment 4 2026-02-02 18:06:54 PST
Moreover, WebKit encodes other bytes correctly and has the same logic. Just it misbehaves on certain input Try e.g. ``` (() => { const iframe = document.createElement('iframe') document.body.append(iframe) const encoding = 'iso-8859-2' const html = ` <!DOCTYPE html> <script> var a = document.createElement('a'); a.href = 'https://example.com/?' + String.fromCodePoint(0xa2) console.log(a.search.slice(1)) var b = document.createElement('a'); b.href = 'https://example.com/?' + String.fromCodePoint(0xa3) console.log(b.search.slice(1)) </script>` iframe.src = `data:text/html;charset=${encoding},${encodeURI(html)}` })(); ``` WebKit prints: ``` %8D %26%23163%3B ``` The behavior on 0xa3 is correct, just 0xa2 is broken somewhy.
Karl Dubost
Comment 5 2026-02-02 19:32:32 PST
Nikita, does it affect one of your sites? library?
Nikita Skovoroda
Comment 6 2026-02-03 01:59:39 PST
Karl, no. It was found in cross-tests. The only thing it affects for me is this: https://github.com/ExodusOSS/bytes/blob/6d0030bfe/tests/whatwg.browser.test.js#L78-L83 I do have another impls to compare to though, so this is not important for my case.
Nikita Skovoroda
Comment 7 2026-02-03 02:02:47 PST
Re: title change: I doubt that this affects only query percent-encoding The methods are common for all non-utf8 encoding, query or not. So it's likely `non-utf8 encoding`, not `URL query percent encoding`
Alexey Proskuryakov
Comment 8 2026-02-03 10:43:51 PST
I meant to highlight that it's not affecting TextDecoder API, thus it's likely not in the underlying decoder code, and seems to be more about URLs. As already mentioned, I'm not sure about where the problem actually is.
Alex Christensen
Comment 9 2026-02-03 13:42:43 PST
This is a part of the encoding, not decoding, hence why TextDecoder is unaffected. This seems to be specified in https://url.spec.whatwg.org/#string-percent-encode-after-encoding at the very end. Firefox implements it in a function named nsStandardURL::nsSegmentEncoder::EncodeSegmentCount Chromium implements it in a function named appendURLEscapedChar Our analogous function urlEscapedEntityCallback calls TextCodec::getUnencodableReplacement which implements it. I'm not completely sure why it isn't being called in this case. I don't see the (reason == UCNV_UNASSIGNED) condition being met, which seems strange to me.
Alex Christensen
Comment 10 2026-02-03 14:20:17 PST
Seems related to our use of ucnv_setFallback
Radar WebKit Bug Importer
Comment 11 2026-02-03 14:36:26 PST
Alex Christensen
Comment 12 2026-02-03 14:38:51 PST
EWS
Comment 13 2026-02-04 04:47:15 PST
Committed 306768@main (47164fceaef4): <https://commits.webkit.org/306768@main> Reviewed commits have been landed. Closing PR #57812 and removing active labels.
Note You need to log in before you can comment on or make changes to this bug.