WebKit Bugzilla
RESOLVED CONFIGURATION CHANGED
13167
Unescape %-escaped hostnames and convert them to punycode before DNS lookup
https://bugs.webkit.org/show_bug.cgi?id=13167
Jungshik Shin
Reported
2007-03-22 17:55:50 PDT
ftp://ftp.rfc-editor.org/in-notes/rfc3986.txt
RFC 3986, section 3.2.2, specifies that %-encoded hostnames need to be supported. Needless to say, IDN support needs to be added first (is it now supported in the trunk?). If it is not yet supported, has it been filed as a bug here? My quick (not so thorough) search turned up nothing. The corresponding Gecko bug is at
https://bugzilla.mozilla.org/show_bug.cgi?id=309671
Mark Rowe (bdash)
Comment 1
2007-03-23 04:01:36 PDT
I believe the support for IDN is presently at the WebKit level. A good test URL is
http://www.xn--sailor-183m.com/
-- Safari loads it correctly and handles
http://www.sailor月.com/
in the URL bar correctly too. It doesn't handle
http://www.sailor%e6%9c%88.com/
though, which is what you mention in this bug report. The behaviour that I observe is that
http://www.sailor%e6%9c%88.com/
is converted into
http://www.sailor月.com/
in the Safari address bar, but the page load fails due to www.sailor%e6%9c%88.com being used in the DNS lookup.
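The three spellings of the host label in this comment can be related with a short sketch. It is only an illustration in Python of the decoding and encoding steps, not WebKit's code; the variable names are made up for the example, and it assumes the escaped bytes are UTF-8.

```python
from urllib.parse import unquote

# The label as it appears in the %-escaped test URL above.
escaped_label = "sailor%e6%9c%88"

# Percent-decoding (assuming UTF-8) yields the form shown in the address bar.
unicode_label = unquote(escaped_label)

# Punycode plus the "xn--" ACE prefix yields the form that DNS can resolve.
ace_label = "xn--" + unicode_label.encode("punycode").decode("ascii")

print(unicode_label)
print(ace_label)
```

The failure described above corresponds to skipping both steps and handing the escaped label to the resolver unchanged.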
Mark Rowe (bdash)
Comment 2
2007-03-23 04:42:25 PDT
Sigh. The mangled URL is intended to be the kanji character equivalent of the %-escaped triplet.
Jungshik Shin
Comment 3
2007-03-23 10:23:31 PDT
Thanks for the info. Indeed, WebKit trunk supports IDN. Can you tell me when it was fixed? I've just tried
http://www.청와대.kr
and it worked fine. (Before submitting a comment with non-ASCII characters, make sure that View | Encoding is set to UTF-8. If you had done that, you wouldn't have had the problem you mentioned in comment #2.)
Mark Rowe (bdash)
Comment 4
2007-03-23 10:31:36 PDT
As far as I am aware, Safari 2.0 supports IDN correctly too. Unless I am mistaken it is not a recent addition to WebKit. As far as UTF-8 goes, your comment shows up with garbled characters too as Bugzilla doesn't specify any character set in its HTTP headers or document header. I should look at fixing this on the server side so that all pages are served as UTF-8 and forms are submitted as the same.
Jungshik Shin
Comment 5
2007-03-23 10:55:28 PDT
(In reply to comment #4)
> As far as I am aware, Safari 2.0 supports IDN correctly too. Unless I am mistaken it is not a recent addition to WebKit.
Thanks a lot for the info. Indeed, Safari 2.0.4 on my Mac supports it well. I should have tried it before asking.
> As far as UTF-8 goes, your comment shows up with garbled characters too as Bugzilla doesn't specify any character set in its HTTP headers or document header.
Of course, I'm well aware of that. :-) I thought it was obvious that you should set View | Encoding to UTF-8 when reading my comment. :-) In your case, characters not covered by the encoding in effect when you submitted the comment (most likely ISO-8859-1 or Windows-1252) were converted to NCRs and stored that way in the Bugzilla DB, so simply changing the encoding on the browser side does not give back the original. In my case, UTF-8 byte sequences are stored in the DB and 'emitted' to the browser, so just changing the encoding works.
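The difference between the two stored forms can be modelled in a few lines of Python. This is a rough sketch of the behaviour described here, not of Bugzilla or of any browser's form-submission code; the variable names are made up, and `xmlcharrefreplace` stands in for the browser's NCR fallback.

```python
text = "月"

# Submitted while the page encoding is ISO-8859-1: the character is not
# representable, so it is replaced by a numeric character reference (NCR).
stored_ncr = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# Submitted while the page encoding is UTF-8: the raw bytes are stored.
stored_utf8 = text.encode("utf-8")

# Reading the UTF-8 bytes as ISO-8859-1 gives garbled text, but the bytes
# are intact, so switching the view to UTF-8 recovers the original.
garbled = stored_utf8.decode("iso-8859-1")
recovered = stored_utf8.decode("utf-8")

# The NCR is plain ASCII; no choice of view encoding changes those bytes.
ncr_as_utf8 = stored_ncr.decode("utf-8")
ncr_as_latin1 = stored_ncr.decode("iso-8859-1")

print(stored_ncr, recovered == text, ncr_as_utf8 == ncr_as_latin1)
```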
> I should look at fixing this on the server side so that all pages are served as UTF-8 and forms are submitted as the same.
It took bugzilla.mozilla.org 5+ years to fix that problem! WebKit Bugzilla has only 13k bugs, and I guess most of them are straight ASCII, so it should be easier. See
http://bugzilla.mozilla.org/show_bug.cgi?id=126266
(and the bugs that were duped to it or spun off from it) for the long and winding road they took.
Mark Rowe (bdash)
Comment 6
2007-04-27 03:00:14 PDT
<rdar://problem/5166146>
Rosyna
Comment 7
2007-05-14 05:16:21 PDT
radr://4379131 I believe is also this exact bug.
Eric Seidel (no email)
Comment 8
2008-01-17 01:28:55 PST
My guess is that this bug lies in: static DeprecatedString encodeHostname(const DeprecatedString &s) which uses uidna_IDNToASCII (I believe to handle unicode # escapes). If that's true, then uidna_IDNToASCII probably doesn't handle % escapes and we'd just have to fix them up first before passing it through. This is all just a guess however.
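The ordering guessed at here (undo the % escapes first, then run the IDN-to-ASCII conversion) might look roughly like the following. It is a sketch in Python using the standard library's `idna` codec in place of ICU's uidna_IDNToASCII; the function name `normalize_host_for_dns` is hypothetical and none of this is WebKit's actual encodeHostname.

```python
from urllib.parse import urlsplit, urlunsplit, unquote


def normalize_host_for_dns(url: str) -> str:
    """Return url with its host percent-decoded and converted to ASCII (IDNA)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Step 1: fix up the % escapes, treating the bytes as UTF-8.
    # errors="strict" raises instead of silently substituting U+FFFD.
    decoded = unquote(host, encoding="utf-8", errors="strict")

    # Step 2: only now apply the IDN-to-ASCII conversion.
    ascii_host = decoded.encode("idna").decode("ascii")

    # Keep an explicit port if there was one. Userinfo is dropped to keep
    # the sketch short.
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(normalize_host_for_dns("http://www.sailor%e6%9c%88.com/"))
```

Hosts that are already plain ASCII pass through both steps unchanged, so the fix-up would be safe to apply to every URL before the lookup.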
Ahmad Saleem
Comment 9
2022-06-01 05:50:28 PDT
Test Case - (taken from Mozilla Bugzilla, Comment 1) -
https://bug309671.bmoattachments.org/attachment.cgi?id=206800
I noticed that (3) and (4) show a dialog box and the output goes to the next line rather than staying on one line. Firefox Nightly 103 shows those tests on one line. For the others, it matches Safari 15.5. In Chrome, the first two match Safari 15.5, but (3) and (4) are weird and do not match any other browser. Thanks!
Anne van Kesteren
Comment 10
2023-03-26 02:20:13 PDT
Yeah, this has been working correctly since the URL parser was revamped.