181181 – The percent encoding in anchorElement.search depends on the encoding of the page

RESOLVED FIXED 181181

https://bugs.webkit.org/show_bug.cgi?id=181181

Summary The percent encoding in anchorElement.search depends on the encoding of the page

Pierre-Yves Gérardy

Reported 2017-12-28 07:02:44 PST

On a page loaded with iso-8859-1 encoding, run this code:

    var a = document.createElement("a")
    a.href = "?" + String.fromCodePoint(246)
    console.log(a.search)

You get back "?%F6", not "?%C3%B6". According to the URL spec, all percent-encoded bytes in URLs should decode as valid UTF-8.

`location.search` and `new URL().search` are not affected, and neither are the `.pathname` and `.hash` getters (they all return percent-encoded UTF-8 bytes).

Repro here (you must set the encoding manually using the "view/text encoding" menu):
http://bl.ocks.org/pygy/raw/b4f638659162c321d40694a38c16a6e7/8e718d92c41228d5681cc989627f80e5f8573a20/
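To make the two results concrete outside a browser, here is a minimal sketch of how U+00F6 ends up as either "%F6" or "%C3%B6" depending on which encoding supplies the bytes. The helper name `percentEncodeNonAscii` is made up for illustration, it only handles the two encodings discussed in this report, and it is not the URL spec's algorithm.

```javascript
// Hypothetical helper: percent-encode every non-ASCII character of `text`
// using the bytes that the given encoding produces for it.
// Only "utf-8" and "iso-8859-1" are handled in this sketch.
function percentEncodeNonAscii(text, encoding) {
  let out = "";
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (codePoint < 0x80) {
      out += ch; // ASCII is left untouched here
      continue;
    }
    let bytes;
    if (encoding === "utf-8") {
      bytes = Array.from(new TextEncoder().encode(ch));
    } else if (encoding === "iso-8859-1") {
      if (codePoint > 0xff) {
        throw new RangeError("character is not representable in ISO-8859-1");
      }
      bytes = [codePoint]; // Latin-1 maps U+0080..U+00FF to a single byte
    } else {
      throw new Error("unsupported encoding in this sketch: " + encoding);
    }
    for (const b of bytes) {
      out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

const query = "?" + String.fromCodePoint(246);
console.log(percentEncodeNonAscii(query, "iso-8859-1")); // "?%F6"
console.log(percentEncodeNonAscii(query, "utf-8"));      // "?%C3%B6"
```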

Alexey Proskuryakov

Comment 1 2017-12-31 14:23:31 PST

This behavior is of course intentional, and used to be necessary for web compatibility. Maybe it's not needed any more.

Pierre-Yves Gérardy

Comment 2 2018-01-02 04:34:29 PST

That would explain why Chrome and Firefox behave in the same way... In Firefox, `location.search` also depends on the page encoding... Was it also the case in earlier WebKit versions? If it is still needed for Web compat, then I suppose that the URL spec must be updated accordingly... I can open an issue on the WHATWG tracker if needed.

Alexey Proskuryakov

Comment 3 2018-01-02 09:19:03 PST

If three browsers do this, then updating the spec would seem like the logical next step indeed. I do not know if anything changed with regard to location.search in WebKit.

Anne van Kesteren

Comment 4 2018-01-02 10:14:37 PST

https://url.spec.whatwg.org/#query-state takes the encoding into account, no? Note that new URL() and some other code paths in the browser will always force UTF-8, but <a> and location will use the encoding of the document.
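For comparison, a short sketch of the code path that always forces UTF-8: the `URL` constructor, which has no document encoding to consult. The base URL `http://example.com/` and the function name `searchViaUrlConstructor` are placeholders chosen for this example.

```javascript
// Parse a relative query string with the URL constructor, which encodes the
// query as UTF-8 regardless of any page encoding.
function searchViaUrlConstructor(relative) {
  const base = "http://example.com/"; // placeholder base URL (assumption)
  return new URL(relative, base).search;
}

const search = searchViaUrlConstructor("?" + String.fromCodePoint(246));
console.log(search); // "?%C3%B6"
```

The same input assigned to `a.href` on an iso-8859-1 page is what yields "?%F6" in the original report.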

Alex Christensen

Comment 5 2018-01-02 11:45:57 PST

Yep, this is intentional, all browsers behave this way, and it is in the URL specification.

Pierre-Yves Gérardy

Comment 6 2018-01-03 01:38:23 PST

Firefox is the only browser that treats location and <a> identically. Chrome and Safari both have location and new URL() work the same.

Also, the URL specification is not consistent, because it also states that

"""
A percent-encoded byte is U+0025 (%), followed by two ASCII hex digits. Sequences of percent-encoded bytes, after conversion to bytes, should not cause UTF-8 decode without BOM or fail to return failure.
"""

Yet it proceeds to describe an algorithm that produces non-UTF-8 sequences.

https://url.spec.whatwg.org/#percent-encoded-bytes

decodeURI and friends rely on this and choke on non-UTF-8 sequences of percent-encoded bytes. For Latin-1, unescape() works, but that's about it.

This is in the context of a SPA router that supports routes as pathname, search or hash (only one at a time :-). Since I can't even rely on location and <a> behaving consistently (for feature detection) I'll probably disable non-ASCII routes if document.characterSet.toUpperCase() is not "UTF-8".
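A rough sketch of the two points above: `decodeURIComponent` throws a `URIError` on "%F6", and a router could guard non-ASCII routes on the document's character set. Both function names are hypothetical, and the Latin-1 fallback is only one possible way to handle the failure, not something the router in question is known to do.

```javascript
// Try UTF-8 decoding first; if the percent-encoded bytes are not valid UTF-8,
// fall back to treating each %XX as a single Latin-1 character.
function decodePercentEncoded(component) {
  try {
    return { text: decodeURIComponent(component), encoding: "utf-8" };
  } catch (e) {
    if (!(e instanceof URIError)) throw e;
    const text = component.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) =>
      String.fromCharCode(parseInt(hex, 16))
    );
    return { text, encoding: "latin-1" };
  }
}

// Guard described in the comment: only allow non-ASCII routes on UTF-8 pages.
// In a browser, pass document.characterSet as the argument.
function allowNonAsciiRoutes(characterSet) {
  return characterSet.toUpperCase() === "UTF-8";
}

console.log(decodePercentEncoded("%C3%B6")); // decoded as UTF-8
console.log(decodePercentEncoded("%F6"));    // falls back to Latin-1
console.log(allowNonAsciiRoutes("windows-1252")); // false
```

Note that the fallback is ambiguous by nature: a Latin-1 byte sequence that happens to be valid UTF-8 is decoded as UTF-8, which is one reason the guard may be the safer option.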

Anne van Kesteren

Comment 7 2018-01-03 01:50:34 PST

Having different requirements for web developers and user agents is fairly common in standards. Web developers are also supposed to exclusively use UTF-8, for instance. Not sure what you mean with regard to interoperability issues. It might be worth filing an issue at https://github.com/whatwg/url/issues/new with more detail so we can add the necessary tests and make browsers fully consistent where they're currently not (or if you want to work on web-platform-tests for that yourself, that'd be great too).

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Mac

OS macOS 10.12

Product WebKit

Component WebCore JavaScript

Assignee Nobody

Reported 2017-12-28 07:02 PST

Modified 2018-01-03 01:50 PST