WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
306742
URL query percent encoding is incorrect
https://bugs.webkit.org/show_bug.cgi?id=306742
Summary
URL query percent encoding is incorrect
Nikita Skovoroda
Reported
2026-02-01 19:12:14 PST
See
https://encoding.spec.whatwg.org/index-iso-8859-2.txt
-- there is no `U+00A2` here, and `128+34` maps to `0x02D8`, 128+13 maps to `0x008D` Which is what TextDecoder does: ```
> new TextDecoder('iso-8859-2').decode(Uint8Array.of(0xa2)).codePointAt(0).toString(16)
'2d8'
> new TextDecoder('iso-8859-2').decode(Uint8Array.of(0x8d)).codePointAt(0).toString(16)
'8d' ``` But in
https://url.spec.whatwg.org/#string-percent-encode-after-encoding
+
https://encoding.spec.whatwg.org/#encode-or-fail
though, the result is wrong with `iso-8859-2` encoing: ``` var a = document.createElement('a'); a.href = '
https://example.com/
?' + String.fromCodePoint(0xa2) console.log(a.search.slice(1)) var b = document.createElement('a'); b.href = '
https://example.com/
?' + String.fromCodePoint(0x8d) console.log(b.search.slice(1)) ``` WebKit prints `%8D` for both! Per spec (and in Chrome, Firefox, Servo) it is `%26%23162%3B` for `U+A2` and `%8D` for `U+8D`. --- This is not limited to just `iso-8859-2` Here is the list of encodings where trivial encoding checks fails: ``` ✖ FAIL percent-encode after encoding matches browser > iso-8859-2 ✖ FAIL percent-encode after encoding matches browser > iso-8859-4 ✖ FAIL percent-encode after encoding matches browser > iso-8859-5 ✖ FAIL percent-encode after encoding matches browser > iso-8859-13 ✖ FAIL percent-encode after encoding matches browser > iso-8859-15 ✖ FAIL percent-encode after encoding matches browser > koi8-r ✖ FAIL percent-encode after encoding matches browser > macintosh ✖ FAIL percent-encode after encoding matches browser > windows-1250 ✖ FAIL percent-encode after encoding matches browser > windows-1251 ✖ FAIL percent-encode after encoding matches browser > windows-1254 ✖ FAIL percent-encode after encoding matches browser > windows-1256 ✖ FAIL percent-encode after encoding matches browser > windows-1258 ✖ FAIL percent-encode after encoding matches browser > x-mac-cyrillic ✖ FAIL percent-encode after encoding matches browser > gbk ✖ FAIL percent-encode after encoding matches browser > gb18030 ``` It passes on Chrome, Firefox and Servo.
Attachments
Add attachment
proposed patch, testcase, etc.
Nikita Skovoroda
Comment 1
2026-02-01 19:14:44 PST
Quick isolated test, just run it in `about:blank`: ``` const iframe = document.createElement('iframe') document.body.append(iframe) const encoding = 'iso-8859-2' const codepoint = 0x5a7a const html = ` <!DOCTYPE html> <script> var a = document.createElement('a'); a.href = '
https://example.com/
?' + String.fromCodePoint(0xa2) console.log(a.search.slice(1)) var b = document.createElement('a'); b.href = '
https://example.com/
?' + String.fromCodePoint(0x8d) console.log(b.search.slice(1)) </script>` iframe.src = `data:text/html;charset=${encoding},${encodeURI(html)}` ```
Alexey Proskuryakov
Comment 2
2026-02-02 14:43:19 PST
Thank you for the report! There is certainly a discrepancy in results across WebKit and other engines here. I don't understand what's going on without tracing through all the specs though. In particular, how do the other engines arrive at '%26%23162%3B', which is percent encoding for '¢'? HTML entities should be meaningless in URLs. And I expected UTF-8 anyway, regardless of document encoding.
Nikita Skovoroda
Comment 3
2026-02-02 18:02:42 PST
See the links to the spec in the issue Here they are again: 1.
https://url.spec.whatwg.org/#string-percent-encode-after-encoding
2.
https://encoding.spec.whatwg.org/#encode-or-fail
They mention how `%26%23`-escaping gets there.
Nikita Skovoroda
Comment 4
2026-02-02 18:06:54 PST
Moreover, WebKit encodes other bytes correctly and has the same logic. Just it misbehaves on certain input Try e.g. ``` (() => { const iframe = document.createElement('iframe') document.body.append(iframe) const encoding = 'iso-8859-2' const html = ` <!DOCTYPE html> <script> var a = document.createElement('a'); a.href = '
https://example.com/
?' + String.fromCodePoint(0xa2) console.log(a.search.slice(1)) var b = document.createElement('a'); b.href = '
https://example.com/
?' + String.fromCodePoint(0xa3) console.log(b.search.slice(1)) </script>` iframe.src = `data:text/html;charset=${encoding},${encodeURI(html)}` })(); ``` WebKit prints: ``` %8D %26%23163%3B ``` The behavior on 0xa3 is correct, just 0xa2 is broken somewhy.
Karl Dubost
Comment 5
2026-02-02 19:32:32 PST
Nikita, does it affect one of your sites? library?
Nikita Skovoroda
Comment 6
2026-02-03 01:59:39 PST
Karl, no. It was found in cross-tests. The only thing it affects for me is this:
https://github.com/ExodusOSS/bytes/blob/6d0030bfe/tests/whatwg.browser.test.js#L78-L83
I do have another impls to compare to though, so this is not important for my case.
Nikita Skovoroda
Comment 7
2026-02-03 02:02:47 PST
Re: title change: I doubt that this affects only query percent-encoding The methods are common for all non-utf8 encoding, query or not. So it's likely `non-utf8 encoding`, not `URL query percent encoding`
Alexey Proskuryakov
Comment 8
2026-02-03 10:43:51 PST
I meant to highlight that it's not affecting TextDecoder API, thus it's likely not in the underlying decoder code, and seems to be more about URLs. As already mentioned, I'm not sure about where the problem actually is.
Alex Christensen
Comment 9
2026-02-03 13:42:43 PST
This is a part of the encoding, not decoding, hence why TextDecoder is unaffected. This seems to be specified in
https://url.spec.whatwg.org/#string-percent-encode-after-encoding
at the very end. Firefox implements it in a function named nsStandardURL::nsSegmentEncoder::EncodeSegmentCount Chromium implements it in a function named appendURLEscapedChar Our analogous function urlEscapedEntityCallback calls TextCodec::getUnencodableReplacement which implements it. I'm not completely sure why it isn't being called in this case. I don't see the (reason == UCNV_UNASSIGNED) condition being met, which seems strange to me.
Alex Christensen
Comment 10
2026-02-03 14:20:17 PST
Seems related to our use of ucnv_setFallback
Radar WebKit Bug Importer
Comment 11
2026-02-03 14:36:26 PST
<
rdar://problem/169566553
>
Alex Christensen
Comment 12
2026-02-03 14:38:51 PST
Pull request:
https://github.com/WebKit/WebKit/pull/57812
EWS
Comment 13
2026-02-04 04:47:15 PST
Committed
306768@main
(47164fceaef4): <
https://commits.webkit.org/306768@main
> Reviewed commits have been landed. Closing PR #57812 and removing active labels.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug