WebKit Bugzilla
RESOLVED FIXED
181181
The percent encoding in anchorElement.search depends on the encoding of the page
https://bugs.webkit.org/show_bug.cgi?id=181181
Pierre-Yves Gérardy
Reported
2017-12-28 07:02:44 PST
On a page loaded with iso-8859-1 encoding, run this code:

    var a = document.createElement("a")
    a.href = "?" + String.fromCodePoint(246)
    console.log(a.search)

You get back "?%F6", not "?%C3%B6". According to the URL spec, all percent-encoded bytes in URLs should represent valid UTF-8 byte sequences. `location.search` and `new URL().search` are not affected, and neither are the `.pathname` and `.hash` getters (they all return percent-encoded UTF-8 bytes). Repro here (you must set the encoding manually using the "view/text encoding" menu):
http://bl.ocks.org/pygy/raw/b4f638659162c321d40694a38c16a6e7/8e718d92c41228d5681cc989627f80e5f8573a20/
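The two results differ only in which bytes stand for U+00F6 before percent-encoding. A minimal sketch that runs outside a browser; the helper names `percentEncodeBytes`, `encodeUtf8` and `encodeLatin1` are made up for illustration, and this is not the URL parser's actual algorithm (browsers also map the `iso-8859-1` label to windows-1252, which agrees with Latin-1 for this character):

```javascript
// Percent-encode every byte. Illustration only: a real URL parser
// leaves most ASCII characters unescaped.
function percentEncodeBytes(bytes) {
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

// UTF-8 bytes of a string.
function encodeUtf8(str) {
  return new TextEncoder().encode(str);
}

// ISO-8859-1 bytes: each code point up to U+00FF maps to the byte
// with the same value; anything above is not representable.
function encodeLatin1(str) {
  return Uint8Array.from(str, (ch) => {
    const cp = ch.codePointAt(0);
    if (cp > 0xff) throw new RangeError("not representable in ISO-8859-1");
    return cp;
  });
}

const s = String.fromCodePoint(246); // "ö"
console.log("?" + percentEncodeBytes(encodeUtf8(s)));   // ?%C3%B6
console.log("?" + percentEncodeBytes(encodeLatin1(s))); // ?%F6
```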
Alexey Proskuryakov
Comment 1
2017-12-31 14:23:31 PST
This behavior is of course intentional, and used to be necessary for web compatibility. Maybe it's not needed any more.
Pierre-Yves Gérardy
Comment 2
2018-01-02 04:34:29 PST
That would explain why Chrome and Firefox behave in the same way... In Firefox, `location.search` also depends on the page encoding... Was it also the case in earlier WebKit versions? If it is still needed for Web compat, then I suppose that the URL spec must be updated accordingly... I can open an issue on the WHATWG tracker if needed.
Alexey Proskuryakov
Comment 3
2018-01-02 09:19:03 PST
If three browsers do this, then updating the spec would seem like the logical next step indeed. I do not know if anything changed with regard to location.search in WebKit.
Anne van Kesteren
Comment 4
2018-01-02 10:14:37 PST
https://url.spec.whatwg.org/#query-state takes the encoding into account, no? Note that new URL() and some other code paths in the browser will always force UTF-8, but <a> and location will use the encoding of the document.
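The forced-UTF-8 path can be checked from any page, or from Node.js, because `new URL()` never consults a document encoding. A small sketch; `http://example.com/` is only a placeholder base URL:

```javascript
// new URL() always percent-encodes the query as UTF-8,
// whatever the encoding of the page that runs this code.
const url = new URL("?" + String.fromCodePoint(246), "http://example.com/");

console.log(url.search); // "?%C3%B6"

// The result round-trips through decodeURIComponent because it is valid UTF-8.
console.log(decodeURIComponent(url.search.slice(1)) === "\u00f6"); // true
```

The `<a>` and `location` cases cannot be reproduced this way, since they depend on the encoding of a loaded document.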
Alex Christensen
Comment 5
2018-01-02 11:45:57 PST
Yep, this is intentional, all browsers behave this way, and it is in the URL specification.
Pierre-Yves Gérardy
Comment 6
2018-01-03 01:38:23 PST
Firefox is the only browser that treats location and <a> identically. Chrome and Safari both have location and new URL() work the same.

Also, the URL specification is not consistent, because it also states that

"""
A percent-encoded byte is U+0025 (%), followed by two ASCII hex digits. Sequences of percent-encoded bytes, after conversion to bytes, should not cause UTF-8 decode without BOM or fail to return failure.
"""

Yet it proceeds to describe an algorithm that produces non-UTF-8 sequences.
https://url.spec.whatwg.org/#percent-encoded-bytes

decodeURI and friends rely on this and choke on non-UTF-8 sequences of percent-encoded bytes. For Latin-1, unescape() works, but that's about it.

This is in the context of a SPA router that supports routes as pathname, search or hash (only one at a time :-). Since I can't even rely on location and <a> behaving consistently (for feature detection) I'll probably disable non-ASCII routes if document.characterSet.toUpperCase() is not "UTF-8".
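A rough sketch of the decoding failure and of the guard described above. The names `tryDecode` and `allowsNonAsciiRoutes` are hypothetical and not taken from any actual router; the character set is passed in as a parameter so the snippet also runs outside a browser:

```javascript
// decodeURIComponent requires the percent-encoded bytes to form valid UTF-8
// and throws a URIError otherwise; this wrapper returns null instead.
function tryDecode(component) {
  try {
    return decodeURIComponent(component);
  } catch (e) {
    return null; // e.g. "%F6" produced on a Latin-1 page
  }
}

// Hypothetical router guard: allow non-ASCII routes only on UTF-8 documents.
// In a browser, call it with document.characterSet.
function allowsNonAsciiRoutes(characterSet) {
  return characterSet.toUpperCase() === "UTF-8";
}

console.log(tryDecode("%C3%B6"));                // "ö"
console.log(tryDecode("%F6"));                   // null
console.log(allowsNonAsciiRoutes("ISO-8859-1")); // false
console.log(allowsNonAsciiRoutes("utf-8"));      // true
```

Taking the character set as an argument rather than reading `document.characterSet` inside the function keeps the check testable without a DOM.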
Anne van Kesteren
Comment 7
2018-01-03 01:50:34 PST
Having different requirements for web developers and user agents is fairly common in standards. Web developers are also supposed to exclusively use UTF-8, for instance. Not sure what you mean with regard to interoperability issues. It might be worth filing an issue against https://github.com/whatwg/url/issues/new with more detail so we can add the necessary tests and make browsers fully consistent where they're currently not (or if you want to work on web-platform-tests for that yourself that'd be great too).