Bug 13167

Summary: Unescape %-escaped hostnames and convert them to punycode before DNS lookup
Product: WebKit
Reporter: Jungshik Shin <jshin>
Component: Page Loading
Assignee: Nobody <webkit-unassigned>
Status: RESOLVED CONFIGURATION CHANGED
Severity: Normal
CC: achristensen, adele, ahmad.saleem792, annevk, ap, mrowe, webkit-bugs
Priority: P2
Keywords: InRadar
Version: 523.x (Safari 3)
Hardware: All
OS: All
URL: http://sailor%e6%9c%88.com/

Jungshik Shin
Reported 2007-03-22 17:55:50 PDT
RFC 3986, section 3.2.2 (ftp://ftp.rfc-editor.org/in-notes/rfc3986.txt) specifies that %-encoded hostnames need to be supported. Needless to say, IDN support needs to be added first (is it now supported in the trunk?). If it is not yet supported, has it been filed as a bug here? My quick (and not very thorough) search turned up nothing. The corresponding Gecko bug is at https://bugzilla.mozilla.org/show_bug.cgi?id=309671
Mark Rowe (bdash)
Comment 1 2007-03-23 04:01:36 PDT
I believe IDN support presently lives at the WebKit level. A good test URL is http://www.xn--sailor-183m.com/ -- Safari loads it correctly and handles http://www.sailor&#26376;.com/ in the URL bar correctly too. It doesn't handle http://www.sailor%e6%9c%88.com/ though, which is what you mention in this bug report. The behaviour that I observe is that http://www.sailor%e6%9c%88.com/ is converted into http://www.sailor&#26376;.com/ in the Safari address bar, but the page load fails because www.sailor%e6%9c%88.com is used in the DNS lookup.
Mark Rowe (bdash)
Comment 2 2007-03-23 04:42:25 PDT
Sigh. The mangled URL is intended to be the kanji character equivalent of the %-escaped triplet.
Jungshik Shin
Comment 3 2007-03-23 10:23:31 PDT
Thanks for the info. Indeed, WebKit trunk supports IDN. Can you tell me when it was fixed? I've just tried http://www.청와대.kr and it worked fine. (Before submitting a comment with non-ASCII characters, make sure that View | Encoding is set to UTF-8. If you had done that, you wouldn't have had the problem you mentioned in comment #2.)
Mark Rowe (bdash)
Comment 4 2007-03-23 10:31:36 PDT
As far as I am aware, Safari 2.0 supports IDN correctly too. Unless I am mistaken it is not a recent addition to WebKit. As far as UTF-8 goes, your comment shows up with garbled characters too as Bugzilla doesn't specify any character set in its HTTP headers or document header. I should look at fixing this on the server side so that all pages are served as UTF-8 and forms are submitted as the same.
Jungshik Shin
Comment 5 2007-03-23 10:55:28 PDT
(In reply to comment #4)
> As far as I am aware, Safari 2.0 supports IDN correctly too. Unless I am
> mistaken it is not a recent addition to WebKit.

Thanks a lot for the info. Indeed, Safari 2.0.4 on my Mac supports it well. I should have tried it before asking.

> As far as UTF-8 goes, your comment shows up with garbled characters too as
> Bugzilla doesn't specify any character set in its HTTP headers or document
> header.

Of course, I'm well aware of that. :-) I thought it was obvious that you should set View | Encoding to UTF-8 when reading my comment. :-) In your case, characters not covered by the encoding in effect when you submitted the comment (most likely ISO-8859-1 or Windows-1252) were converted to NCRs and stored that way in the Bugzilla DB, so simply changing the encoding on the browser side does not give back the original. In my case, UTF-8 byte sequences are stored in the DB and 'emitted' to a browser, so just changing the encoding works.

> I should look at fixing this on the server side so that all pages are
> served as UTF-8 and forms are submitted as the same.

It took bugzilla.mozilla.org 5+ years to fix that problem! WebKit Bugzilla has only 13k bugs, and I guess most of them are straight ASCII, so it should be easier. See http://bugzilla.mozilla.org/show_bug.cgi?id=126266 (and the bugs that were duped to it or spun off from it) about the long and winding road they took.
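
The difference between the two storage cases described above can be illustrated with a small Python sketch. This is only an illustration of the encoding behaviour, not a description of what Bugzilla or any browser actually runs; the function names are hypothetical, and ISO-8859-1 is assumed as the encoding in effect.

```python
import html


def stored_as_utf8_then_misread(text: str) -> tuple[str, str]:
    """UTF-8 bytes are stored; a viewer first misreads them as ISO-8859-1.

    Returns (garbled view, view after switching the encoding to UTF-8).
    """
    stored = text.encode("utf-8")
    garbled = stored.decode("latin-1")
    # Switching the viewer's encoding reinterprets the same stored bytes.
    recovered = stored.decode("utf-8")
    return garbled, recovered


def stored_as_ncr(text: str) -> tuple[str, str]:
    """Characters outside ISO-8859-1 are replaced by NCRs before storage.

    Returns (stored text viewed as UTF-8, text after explicit NCR unescaping).
    """
    stored = text.encode("latin-1", errors="xmlcharrefreplace")
    # The stored bytes are plain ASCII, so no choice of encoding restores them.
    viewed = stored.decode("utf-8")
    return viewed, html.unescape(viewed)


print(stored_as_utf8_then_misread("\u6708"))
print(stored_as_ncr("\u6708"))
```

In the first case the original character comes back once the bytes are read as UTF-8; in the second the literal text `&#26376;` is what is stored, and only an explicit unescaping step recovers the character.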
Mark Rowe (bdash)
Comment 6 2007-04-27 03:00:14 PDT
Rosyna
Comment 7 2007-05-14 05:16:21 PDT
radr://4379131 I believe is also this exact bug.
Eric Seidel (no email)
Comment 8 2008-01-17 01:28:55 PST
My guess is that this bug lies in: static DeprecatedString encodeHostname(const DeprecatedString &s) which uses uidna_IDNToASCII (I believe to handle unicode # escapes). If that's true, then uidna_IDNToASCII probably doesn't handle % escapes and we'd just have to fix them up first before passing it through. This is all just a guess however.
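
The order of operations guessed at above (undo the % escapes first, then run the IDN-to-ASCII step) can be sketched in Python. This is a minimal illustration under stated assumptions, not WebKit's code: `host_for_dns` is a hypothetical name, the escapes are assumed to be UTF-8, and Python's built-in `idna` codec (IDNA 2003) stands in for `uidna_IDNToASCII`.

```python
from urllib.parse import unquote


def host_for_dns(raw_host: str) -> str:
    """Percent-decode a hostname as UTF-8, then convert it to its ASCII form.

    Hypothetical helper: raises UnicodeError if the escapes are not valid
    UTF-8 or if the decoded name cannot be converted by the idna codec.
    """
    # Step 1: turn %e6%9c%88 back into the character it encodes.
    decoded = unquote(raw_host, encoding="utf-8", errors="strict")
    # Step 2: apply the IDN-to-ASCII conversion, label by label.
    return decoded.encode("idna").decode("ascii")


print(host_for_dns("www.sailor%e6%9c%88.com"))  # www.xn--sailor-183m.com
print(host_for_dns("www.example.com"))          # www.example.com
```

With the escaped hostname from this report, the result is the same punycode name that comment 1 cites as a working test URL.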
Ahmad Saleem
Comment 9 2022-06-01 05:50:28 PDT
Test case (taken from comment 1 of the Mozilla Bugzilla bug): https://bug309671.bmoattachments.org/attachment.cgi?id=206800 I noticed that (3) and (4) show a dialog box and the output wraps onto the next line rather than staying on one line. Firefox Nightly 103 shows those tests on one line. The other tests match Safari 15.5. In Chrome, the first two match Safari 15.5, but (3) and (4) are weird and do not match any other browser. Thanks!
Anne van Kesteren
Comment 10 2023-03-26 02:20:13 PDT
Yeah, this has been working correctly since the URL parser was revamped.