RFC 3986, section 3.2.2, specifies that percent-encoded (%-encoded) hostnames need to be supported.
Needless to say, IDN support has to be in place first (is it supported in trunk now?). If not, has a bug been filed for it here? My quick (not very thorough) search turned up nothing.
The corresponding Gecko bug is at
I believe IDN support is currently implemented at the WebKit level. A good test URL is http://www.xn--sailor-183m.com/. Safari loads it correctly and also handles http://www.sailor月.com/ in the URL bar correctly. It doesn't handle http://www.sailor%e6%9c%88.com/, though, which is the case this bug report describes. What I observe is that http://www.sailor%e6%9c%88.com/ is converted to http://www.sailor月.com/ in the Safari address bar, but the page load fails because www.sailor%e6%9c%88.com is used for the DNS lookup.
Sigh. The mangled URL is meant to be the kanji character (月) that the %-escaped triplet encodes.
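For reference, the triplet %e6%9c%88 is the UTF-8 encoding of U+6708 (月). The minimal C++ sketch below decodes UTF-8 bytes back into code points to show this. The function name decodeUtf8 is hypothetical, and the sketch skips checks a real decoder needs (overlong forms, surrogates).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into Unicode code points.
// Sketch only: overlong forms and surrogates are not rejected; a malformed
// lead or continuation byte is replaced with U+FFFD.
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        char32_t cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }          // ASCII
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }   // 110xxxxx
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }   // 1110xxxx
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }   // 11110xxx
        else { out.push_back(0xFFFD); ++i; continue; }

        bool ok = i + len <= bytes.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            ok = (b & 0xC0) == 0x80;      // continuation bytes are 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);  // append 6 payload bits
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For example, decodeUtf8("\xE6\x9C\x88") yields a one-character string holding U+6708.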
Thanks for the info. Indeed, WebKit trunk supports IDN. Can you tell me when it was fixed?
I've just tried http://www.청와대.kr and it worked fine. (Before submitting a comment with non-ASCII characters, make sure View | Encoding is set to UTF-8. Had you done that, you wouldn't have had the problem you mentioned in comment #2.)
As far as I am aware, Safari 2.0 supports IDN correctly too. Unless I am mistaken it is not a recent addition to WebKit.
As far as UTF-8 goes, your comment shows up with garbled characters too as Bugzilla doesn't specify any character set in its HTTP headers or document header. I should look at fixing this on the server side so that all pages are served as UTF-8 and forms are submitted as the same.
(In reply to comment #4)
> As far as I am aware, Safari 2.0 supports IDN correctly too. Unless I am
> mistaken it is not a recent addition to WebKit.
Thanks a lot for the info. Indeed, Safari 2.0.4 on my Mac supports it well. I should have tried it before asking.
> As far as UTF-8 goes, your comment shows up with garbled characters too as
> Bugzilla doesn't specify any character set in its HTTP headers or document
Of course, I'm well aware of that. :-) I thought it was obvious that you should set View | Encoding to UTF-8 when reading my comment. :-) In your case, characters not covered by the encoding in effect when you submitted your comment (most likely ISO-8859-1 or Windows-1252) were converted to NCRs (numeric character references) and stored that way in the Bugzilla DB, so simply changing the encoding on the browser side does not give back the original. In my case, UTF-8 byte sequences are stored in the DB and emitted to the browser as-is, so just changing the encoding works.
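To illustrate the NCR conversion described above, here is a minimal C++ sketch that assumes the form was submitted as ISO-8859-1. The name encodeLatin1WithNcrs is hypothetical; it mimics what browsers do with characters the submission encoding cannot represent.

```cpp
#include <string>

// Hypothetical sketch: encode text as ISO-8859-1 the way a browser submits a
// form in that encoding. Code points up to U+00FF become a single byte; every
// other code point becomes a decimal numeric character reference ("&#NNNN;").
// Windows-1252 maps bytes 0x80-0x9F differently; that case is ignored here.
std::string encodeLatin1WithNcrs(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            out.push_back(static_cast<char>(cp));  // representable: one byte
        } else {
            out += "&#" + std::to_string(static_cast<unsigned long>(cp)) + ";";
        }
    }
    return out;
}
```

With this, U"sailor\u6708" becomes the ASCII text "sailor&#26376;". Because the stored result is plain ASCII, changing the browser's encoding afterwards has nothing to reinterpret.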
> I should look at fixing this on the server side so that all pages are
> served as UTF-8 and forms are submitted as the same.
It took bugzilla.mozilla.org 5+ years to fix that problem!!! WebKit Bugzilla has only 13k bugs, and I guess most of them are plain ASCII, so it should be easier. See http://bugzilla.mozilla.org/show_bug.cgi?id=126266 (and the bugs that were duped to it or spun off from it) for the long and winding road they took.
rdar://4379131 is, I believe, also this exact bug.
My guess is that this bug lies in:
static DeprecatedString encodeHostname(const DeprecatedString &s)
which uses uidna_IDNToASCII (I believe to convert Unicode hostnames to their ASCII form).
If that's true, then uidna_IDNToASCII probably doesn't handle % escapes, and we'd just have to decode them first before passing the hostname through.
This is all just a guess, however.
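Following that guess, a pre-pass that percent-decodes the hostname before the IDN conversion could look roughly like this minimal C++17 sketch. percentDecodeHost is a hypothetical name, not an existing WebKit function, and it works on std::string rather than DeprecatedString. The decoded bytes would still need to be validated as UTF-8 and converted to UTF-16 before being handed to uidna_IDNToASCII, which is not called here.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical pre-pass: percent-decode a hostname into raw bytes before it
// would be passed to an IDN-to-ASCII conversion (not performed here).
// Returns std::nullopt for a malformed escape such as "%g1" or a trailing "%e".
std::optional<std::string> percentDecodeHost(const std::string& host) {
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;  // not a hex digit
    };
    std::string out;
    for (std::size_t i = 0; i < host.size(); ++i) {
        if (host[i] != '%') { out.push_back(host[i]); continue; }
        if (i + 2 >= host.size()) return std::nullopt;  // need two hex digits
        int hi = hexValue(host[i + 1]);
        int lo = hexValue(host[i + 2]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out.push_back(static_cast<char>(hi * 16 + lo));  // one decoded octet
        i += 2;  // skip the two hex digits just consumed
    }
    return out;
}
```

For example, "www.sailor%e6%9c%88.com" decodes to the UTF-8 bytes for www.sailor月.com, which an IDN conversion could then turn into its xn-- form, while a malformed escape like "%g1" is rejected.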