RESOLVED FIXED 181181
The percent encoding in anchorElement.search depends on the encoding of the page
https://bugs.webkit.org/show_bug.cgi?id=181181
Pierre-Yves Gérardy
Reported 2017-12-28 07:02:44 PST
On a page loaded with iso-8859-1 encoding, run this code:

    var a = document.createElement("a")
    a.href = "?" + String.fromCodePoint(246)
    console.log(a.search)

You get back "?%F6", not "?%C3%B6".

According to the URL spec, all percent-encoded bytes in URLs should represent valid UTF-8 code points. `location.search` and `new URL().search` are not affected, and neither are the `.pathname` and `.hash` getters (they all return percent-encoded UTF-8 bytes).

Repro here (you must set the encoding manually using the "view/text encoding" menu): http://bl.ocks.org/pygy/raw/b4f638659162c321d40694a38c16a6e7/8e718d92c41228d5681cc989627f80e5f8573a20/
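The two results in this report differ only in which bytes get percent-encoded for U+00F6. A minimal sketch that reproduces both strings outside the DOM follows. The helper names are hypothetical, not browser APIs, and unlike a real URL parser the sketch percent-encodes every byte, including ASCII.

```javascript
// Hypothetical helper: percent-encode every byte as %XX (uppercase hex).
function percentEncodeBytes(bytes) {
  return Array.from(bytes, (b) =>
    "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

// UTF-8 bytes, which is what the reporter expected to see.
function encodeAsUtf8(str) {
  return percentEncodeBytes(new TextEncoder().encode(str));
}

// One byte per code point, as in ISO-8859-1.
// Only code points up to U+00FF can be represented this way.
function encodeAsLatin1(str) {
  const bytes = Array.from(str, (ch) => {
    const cp = ch.codePointAt(0);
    if (cp > 0xff) throw new RangeError("not representable in ISO-8859-1");
    return cp;
  });
  return percentEncodeBytes(bytes);
}

const ch = String.fromCodePoint(246); // "ö"
console.log("?" + encodeAsUtf8(ch));   // ?%C3%B6
console.log("?" + encodeAsLatin1(ch)); // ?%F6
```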
Alexey Proskuryakov
Comment 1 2017-12-31 14:23:31 PST
This behavior is of course intentional, and used to be necessary for web compatibility. Maybe it's not needed any more.
Pierre-Yves Gérardy
Comment 2 2018-01-02 04:34:29 PST
That would explain why Chrome and Firefox behave in the same way... In Firefox, `location.search` also depends on the page encoding... Was it also the case in earlier WebKit versions? If it is still needed for Web compat, then I suppose that the URL spec must be updated accordingly... I can open an issue on the WhatWG tracker if needed.
Alexey Proskuryakov
Comment 3 2018-01-02 09:19:03 PST
If three browsers do this, then updating the spec would seem like the logical next step indeed. I do not know if anything changed with regards to location.search in WebKit.
Anne van Kesteren
Comment 4 2018-01-02 10:14:37 PST
https://url.spec.whatwg.org/#query-state takes the encoding into account, no? Note that new URL() and some other code paths in the browser will always force UTF-8, but <a> and location will use the encoding of the document.
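The `new URL()` code path mentioned in this comment can be checked in isolation, since it does not depend on any document encoding. A small sketch, with `http://example.com/` as a placeholder base URL:

```javascript
// new URL() always percent-encodes the query as UTF-8.
// The base URL is an arbitrary placeholder chosen for this example.
const url = new URL("?" + String.fromCodePoint(246), "http://example.com/");

console.log(url.search); // ?%C3%B6

// The result round-trips through decodeURIComponent because it is valid UTF-8.
console.log(decodeURIComponent(url.search.slice(1)) === "\u00f6"); // true
```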
Alex Christensen
Comment 5 2018-01-02 11:45:57 PST
Yep, this is intentional, all browsers behave this way, and it is in the URL specification.
Pierre-Yves Gérardy
Comment 6 2018-01-03 01:38:23 PST
Firefox is the only browser that treats location and <a> identically. Chrome and Safari both have location and new URL() work the same.

Also, the URL specification is not consistent, because it also states that

"""
A percent-encoded byte is U+0025 (%), followed by two ASCII hex digits. Sequences of percent-encoded bytes, after conversion to bytes, should not cause UTF-8 decode without BOM or fail to return failure.
"""

Yet it proceeds to describe an algorithm that produces non-UTF-8 sequences. https://url.spec.whatwg.org/#percent-encoded-bytes

decodeURI and friends rely on this and choke on non-UTF-8 sequences of percent-encoded bytes. For Latin-1, unescape() works, but that's about it.

This is in the context of a SPA router that supports routes as pathname, search or hash (only one at a time :-). Since I can't even rely on location and <a> behaving consistently (for feature detection), I'll probably disable non-ASCII routes if document.characterSet.toUpperCase() is not "UTF-8".
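A rough sketch of the decoding failure and of the guard described in this comment. Both function names are hypothetical. In a browser the guard would be called with `document.characterSet`; here it takes the value as a parameter so that it runs without a DOM.

```javascript
// Hypothetical helper: decode a URL component, or return null when the
// percent-encoded bytes are not valid UTF-8 (decodeURIComponent throws URIError).
function tryDecodeUtf8(component) {
  try {
    return decodeURIComponent(component);
  } catch (e) {
    if (e instanceof URIError) return null;
    throw e;
  }
}

// Hypothetical guard for the workaround above: only allow non-ASCII routes
// when the document encoding is UTF-8.
function allowsNonAsciiRoutes(characterSet) {
  return characterSet.toUpperCase() === "UTF-8";
}

console.log(tryDecodeUtf8("%C3%B6")); // ö
console.log(tryDecodeUtf8("%F6"));    // null
console.log(unescape("%F6"));         // ö (legacy function, Latin-1 range only)

console.log(allowsNonAsciiRoutes("UTF-8"));        // true
console.log(allowsNonAsciiRoutes("windows-1252")); // false
```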
Anne van Kesteren
Comment 7 2018-01-03 01:50:34 PST
Having different requirements for web developers and user agents is fairly common in standards. Web developers are also supposed to exclusively use UTF-8, for instance. Not sure what you mean with regards to interoperability issues. It might be worth filing an issue against https://github.com/whatwg/url/issues/new with more detail so we can add the necessary tests and make browsers fully consistent where they're currently not (or if you want to work on web-platform-tests for that yourself that'd be great too).