Bug 13167
Summary: Unescape %-escaped hostnames and convert them to punycode before DNS lookup
Product: WebKit
Component: Page Loading
Status: RESOLVED CONFIGURATION CHANGED
Severity: Normal
Priority: P2
Version: 523.x (Safari 3)
Hardware: All
OS: All
URL: http://sailor%e6%9c%88.com/
Reporter: Jungshik Shin <jshin>
Assignee: Nobody <webkit-unassigned>
CC: achristensen, adele, ahmad.saleem792, annevk, ap, mrowe, webkit-bugs
Keywords: InRadar
Jungshik Shin
ftp://ftp.rfc-editor.org/in-notes/rfc3986.txt
RFC 3986, section 3.2.2, specifies that %-encoded hostnames need to be supported.
Needless to say, IDN support needs to be added first (is it now supported in the trunk?). If it is not yet supported, has that been filed as a bug here? My quick (not very thorough) search turned up nothing.
The corresponding Gecko bug is at
https://bugzilla.mozilla.org/show_bug.cgi?id=309671
Mark Rowe (bdash)
I believe the support for IDN is presently at the WebKit level. A good test URL is http://www.xn--sailor-183m.com/ -- Safari loads it correctly and handles http://www.sailor月.com/ in the URL bar correctly too. It doesn't handle http://www.sailor%e6%9c%88.com/ though, which is what you mention in this bug report. The behaviour that I observe is that http://www.sailor%e6%9c%88.com/ is converted into http://www.sailor月.com/ in the Safari address bar, but the page load fails because www.sailor%e6%9c%88.com is used in the DNS lookup.
Mark Rowe (bdash)
Sigh. The mangled URL is intended to be the kanji character equivalent of the %-escaped triplet.
Jungshik Shin
Thanks for the info. Indeed, WebKit trunk supports IDN. Can you tell me when it was fixed?
I've just tried http://www.청와대.kr and it worked fine. (Before submitting a comment with non-ASCII characters, make sure that View | Encoding is set to UTF-8. If you had done that, you wouldn't have had the problem you mentioned in comment #2.)
Mark Rowe (bdash)
As far as I am aware, Safari 2.0 supports IDN correctly too. Unless I am mistaken it is not a recent addition to WebKit.
As far as UTF-8 goes, your comment shows up with garbled characters too as Bugzilla doesn't specify any character set in its HTTP headers or document header. I should look at fixing this on the server side so that all pages are served as UTF-8 and forms are submitted as the same.
Jungshik Shin
(In reply to comment #4)
> As far as I am aware, Safari 2.0 supports IDN correctly too. Unless I am
> mistaken it is not a recent addition to WebKit.
Thanks a lot for the info. Indeed, Safari 2.0.4 on my Mac supports it well. I should have tried it before asking.
> As far as UTF-8 goes, your comment shows up with garbled characters too as
> Bugzilla doesn't specify any character set in its HTTP headers or document
> header.
Of course, I'm well aware of that. :-) I thought it was obvious that you should set View | Encoding to UTF-8 when reading my comment. :-) In your case, characters not covered by the encoding in effect when you submitted your comment (most likely ISO-8859-1 or Windows-1252) were converted to NCRs (numeric character references) and stored that way in the Bugzilla DB, so simply changing the encoding on the browser side does not give back the original. In my case, UTF-8 byte sequences are stored in the DB and 'emitted' to the browser, so just changing the encoding works.
> I should look at fixing this on the server side so that all pages are
> served as UTF-8 and forms are submitted as the same.
It took bugzilla.mozilla.org 5+ years to fix that problem! WebKit Bugzilla has only 13k bugs, and I guess most of them are straight ASCII, so it should be easier. See http://bugzilla.mozilla.org/show_bug.cgi?id=126266 (and the bugs that were duped to it or spun off from it) for the long and winding road they took.
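The difference between the two storage cases described above (NCRs versus raw UTF-8 bytes) can be illustrated with a small Python sketch. This only models the mechanism; it is not Bugzilla or browser code, and the function names `submit_as_latin1` and `submit_as_utf8` are made up for the illustration.

```python
# Illustration only: models what ends up stored when a form is submitted
# under two different page encodings. Function names are hypothetical.

def submit_as_latin1(text: str) -> bytes:
    # Characters outside ISO-8859-1 are replaced by numeric character
    # references (NCRs), which is what gets stored.
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")

def submit_as_utf8(text: str) -> bytes:
    # The raw UTF-8 byte sequence is stored as-is.
    return text.encode("utf-8")

original = "月"  # U+6708, the character from the test URL

stored_ncr = submit_as_latin1(original)
stored_utf8 = submit_as_utf8(original)

# Switching the viewer to UTF-8 recovers the original only in the UTF-8 case.
print(stored_ncr.decode("utf-8"))   # the literal text "&#26376;", not the character
print(stored_utf8.decode("utf-8"))  # the original character
```

The NCR form is plain ASCII, so no choice of viewing encoding turns it back into the original character unless the page is rendered as HTML that interprets the reference.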
Mark Rowe (bdash)
<rdar://problem/5166146>
Rosyna
radr://4379131 I believe is also this exact bug.
Eric Seidel (no email)
My guess is that this bug lies in:
static DeprecatedString encodeHostname(const DeprecatedString &s)
which uses uidna_IDNToASCII (I believe to handle unicode # escapes).
If that's true, then uidna_IDNToASCII probably doesn't handle % escapes and we'd just have to fix them up first before passing it through.
This is all just a guess however.
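A minimal sketch of the ordering guessed at above, in Python rather than WebKit's C++: percent-unescape the hostname first, then apply IDNA ToASCII. `host_for_dns` is a hypothetical helper, not WebKit code, and Python's built-in `idna` codec (which implements IDNA 2003) stands in for ICU's `uidna_IDNToASCII`, so edge cases may differ from what WebKit actually does.

```python
from urllib.parse import unquote

def host_for_dns(raw_host: str) -> str:
    """Hypothetical helper: return the ASCII hostname to hand to the resolver."""
    # Step 1: turn %XX escapes into characters, treating the bytes as UTF-8.
    # errors="strict" makes malformed UTF-8 raise instead of being replaced.
    unescaped = unquote(raw_host, encoding="utf-8", errors="strict")
    # Step 2: IDNA ToASCII, applied label by label; non-ASCII labels become
    # punycode "xn--" labels, ASCII labels pass through unchanged.
    return unescaped.encode("idna").decode("ascii")

print(host_for_dns("www.sailor%e6%9c%88.com"))  # www.xn--sailor-183m.com
print(host_for_dns("www.sailor月.com"))          # www.xn--sailor-183m.com
```

Both spellings of the host from this report map to the punycode form that Safari already loads, which is the behaviour the bug summary asks for.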
Ahmad Saleem
Test Case - (taken from Mozilla Bugzilla from Comment 1) - https://bug309671.bmoattachments.org/attachment.cgi?id=206800
I noticed that (3) and (4) show a dialog box and the output goes to the next line rather than staying on one line. Firefox Nightly 103 shows those tests on one line. For the other cases, it matches Safari 15.5.
In Chrome, the first two match Safari 15.5, but (3) and (4) are weird and do not match any other browser.
Thanks!
Anne van Kesteren
Yeah, this has been working correctly since the URL parser was revamped.