Bug 144194

Summary: Support IDN2008 instead of IDN2003
Product: WebKit Reporter: Darin Adler <darin>
Component: Page LoadingAssignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED    
Severity: Normal CC: annevk, ap, buildbot, michael, rniwa
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=161841
https://bugs.webkit.org/show_bug.cgi?id=165885
Attachments:
Description Flags
Patch
none
Patch
none
Archive of layout-test-results from ews101 for mac-yosemite
none
Archive of layout-test-results from ews104 for mac-yosemite-wk2
none
Archive of layout-test-results from ews113 for mac-yosemite
none
Archive of layout-test-results from ews125 for ios-simulator-wk2
none
Patch
none
Patch
none
Patch none

Description Darin Adler 2015-04-25 11:27:03 PDT
Support IDN2008 instead of IDN2003
Comment 1 Darin Adler 2015-04-25 11:28:23 PDT
Created attachment 251641 [details]
Patch
Comment 2 Darin Adler 2015-04-25 11:31:04 PDT
Here’s what’s wrong with the patch I uploaded:

1) I haven’t been able to verify that this correctly causes any websites to be loaded differently. I tried testing with <http://fußball.de>, but that just redirects to <http://fussball.de>.

2) There are no test cases yet!

3) UIDNA_DEFAULT may not be the correct flags to pass to uidna_openUTS46. We need to decide whether to pass one or more of UIDNA_USE_STD3_RULES, UIDNA_CHECK_BIDI, UIDNA_CHECK_CONTEXTJ, UIDNA_NONTRANSITIONAL_TO_ASCII, and UIDNA_NONTRANSITIONAL_TO_UNICODE.
Comment 3 Darin Adler 2015-04-25 11:31:54 PDT
<rdar://problem/8809208>
Comment 4 Darin Adler 2015-10-18 15:24:04 PDT
Alexey, any idea what we should do next here?
Comment 5 Alexey Proskuryakov 2015-10-18 22:40:41 PDT
I like the next steps alluded to in comment 2.

If I were doing this, I'd try to find any real world examples of why the change is a net positive, given that we'd have to start displaying http://💩.la as punycode. Reading through UTS#46 may provide enough insight.
Comment 6 Michael 2016-04-10 03:47:31 PDT
Summary:
The .de NIC (denic.de) will implement IDNA2008 from 2010-11-16 onwards,
especially allowing for ß (\u00df) in domain names. Hence, the automatic
translation of ß to ss may result in looking up the wrong domain name, allowing
for spoofing attacks.
(DENIC will run a sunrise period (2010-10-26 to 2010-11-15) during which
holders of domains with ss will be allowed top register the respective ß domain
in advance.)

http://www.denic.de/en/domains/internationalized-domain-names/sharp-s.html

ß and ss are not exchangable in German. ss instead of ß is just a makeshift. Germans expect ß to usually just work if
umlauts work (which already do for a while).

Steps to Reproduce:
1. Start Safari on OS X Mountain Lion or on iOS
2. Open the domain "http://www.heß.de" (a family name).


Expected Results:
Safari will change the "ß" character to "ss" and open "http://www.hess.de" which is a completely different family name (last name). For example: "Michael Heß" and "Peter Hess".


FireFox Nightly Version 46.0a1 did it now correct.

Here are some informations:
https://hg.mozilla.org/mozilla-central/rev/ac843b130537
https://hg.mozilla.org/mozilla-central/rev/dd3d6c83f354
https://hg.mozilla.org/mozilla-central/rev/f23234d57557

And the FireFox Bug Thread for this issue:
https://bugzilla.mozilla.org/show_bug.cgi?id=479520


You can check the following Websites in FireFox Nightly Version 46.0a1 (2016-01-24) to understand:

http://www.roessner.de

http://www.rössner.de
(is linked to www.roessner.de because it's the same owner, but it's also a separate name in Germany).

http://www.rößner.de

For example:
Bill Roessner
Jack Rössner
Mike Rößner

The browsers do it right with "roessner" and "rössner" and separate it because these are different names.

Now, it's time to do it also right with the "ß" character.


For now i have to use the URL in Safari:

http://www.xn--rner-vna1l.de

if i want to go to:

http://www.rößner.de


thanks & kind regards

Michael
Comment 7 Alex Christensen 2016-11-16 10:05:33 PST
Created attachment 294948 [details]
Patch
Comment 8 Alex Christensen 2016-11-16 10:57:55 PST
IDN2003 and UTS#46 are functionally quite similar for URL parsing.  The biggest difference I could find was their treatment of U+2132 in http://www.unicode.org/reports/tr46/#IDNAComparison

As for the uidna_openUTS46 flags:
Chromium uses UIDNA_CHECK_BIDI <https://cs.chromium.org/chromium/src/components/url_formatter/url_formatter.cc?q=UIDNA_CHECK_BIDI&sq=package:chromium&dr=C&l=510>
Firefox uses UIDNA_CHECK_BIDI, UIDNA_CHECK_CONTEXTJ, and effectively UIDNA_NONTRANSITIONAL_TO_UNICODE
<https://dxr.mozilla.org/mozilla-central/source/netwerk/dns/nsIDNService.cpp#139>
If I understand the spec correctly, it basically says to just use UIDNA_DEFAULT. https://url.spec.whatwg.org/#concept-domain-to-ascii maybe it should be changed? Anne?
Comment 9 Build Bot 2016-11-16 11:04:46 PST
Comment on attachment 294948 [details]
Patch

Attachment 294948 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/2526540

New failing tests:
fast/url/invalid-idn.html
fast/encoding/idn-security.html
fast/url/idna2008.html
fast/url/idna2003.html
Comment 10 Build Bot 2016-11-16 11:04:49 PST
Created attachment 294952 [details]
Archive of layout-test-results from ews101 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 11 Build Bot 2016-11-16 11:09:45 PST
Comment on attachment 294948 [details]
Patch

Attachment 294948 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/2526548

New failing tests:
fast/url/invalid-idn.html
fast/url/idna2008.html
fast/url/idna2003.html
Comment 12 Build Bot 2016-11-16 11:09:49 PST
Created attachment 294954 [details]
Archive of layout-test-results from ews104 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 13 Build Bot 2016-11-16 11:12:56 PST
Comment on attachment 294948 [details]
Patch

Attachment 294948 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/2526543

New failing tests:
fast/url/invalid-idn.html
fast/encoding/idn-security.html
fast/url/idna2008.html
fast/url/idna2003.html
Comment 14 Build Bot 2016-11-16 11:13:01 PST
Created attachment 294956 [details]
Archive of layout-test-results from ews113 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews113  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 15 Build Bot 2016-11-16 11:20:17 PST
Comment on attachment 294948 [details]
Patch

Attachment 294948 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/2526549

New failing tests:
fast/url/idna2008.html
fast/url/invalid-idn.html
fast/url/idna2003.html
Comment 16 Build Bot 2016-11-16 11:20:22 PST
Created attachment 294957 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.6
Comment 17 Alex Christensen 2016-11-16 11:26:01 PST
Created attachment 294958 [details]
Patch
Comment 18 Alexey Proskuryakov 2016-11-16 11:45:06 PST
Comment on attachment 294958 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=294958&action=review

> LayoutTests/fast/encoding/idn-security-expected.txt:89
> -PASS testIDNRoundTrip(0x337) is 'punycode'
> +FAIL testIDNRoundTrip(0x337) should be punycode. Was %u0337.

Looks like there are many changes from PASS to FAIL here. Why is that good?
Comment 19 Alex Christensen 2016-11-16 12:07:44 PST
It is an intentional deviation from what we used to do, and it increases our compliance with the spec and compatibility with other browsers.
Comment 20 Alexey Proskuryakov 2016-11-16 12:15:37 PST
A better way to reflect that in a test is to change the expected value, instead of changing the result to FAIL.

Alex, can you address Darin's questions in comment 2? Does the patch fix the examples in comment 6?
Comment 21 Alex Christensen 2016-11-16 12:21:09 PST
The examples in comment 6 are not fixed in my proposed patch because my proposed patch matches the spec and Chrome, which use UTS#46, which I believe is what is intended by "Transitional_Processing" in the spec.  Maybe we should just switch to IDN2008 without UTS#46, but I'd like Anne's input.
Comment 22 Alex Christensen 2016-11-16 14:08:48 PST
I think we should just switch do IDN2008 to prevent these German homograph attacks.
Comment 23 Michael 2016-11-16 14:50:11 PST
Please don't give up and help us Germans with "ß" in our surnames!

We are waiting since 2010 that the Browsers interpret our Domains right.

Exactly today 6 years before i register my domain and my dream is, that i can finally enter the domain:

http://www.rößner.de

instead of

http://www.xn--rner-vna1l.de

in the following Browsers:

- Safari
- Chrome
- Opera
- MS Edge


Mozilla FireFox do it right.


Thank you so much & Kind Regards


Michael
Comment 24 Anne van Kesteren 2016-11-17 05:41:26 PST
I don't see how you can get around using UTS46. I don't think IDNA2008 without UTS46 really has a processing model. E.g., lowercasing "A" would be undefined.

Transitional_Processing of UTS46 matches IDNA2003 pretty closely and is currently what the URL Standard requires since that's what the majority of implementations do. Nontransitional_Processing would be closer to IDNA2008 behavior, while still doing things like lowercasing "A" to "a".

I used to be somewhat opposed to switching away from IDNA2003-like behavior since it's not backwards compatible and can cause issues, but Firefox manages to ship it without complaints (https://bugzilla.mozilla.org/show_bug.cgi?id=1218179 is the bug where the switch was flipped), so maybe it's not that bad.

https://github.com/whatwg/url/issues/110 and https://github.com/whatwg/url/issues/53 track IDNA issues in the URL Standard and UTS46.
Comment 25 Michael 2016-11-17 07:28:24 PST
(In reply to comment #24)
> but Firefox
> manages to ship it without complaints
> (https://bugzilla.mozilla.org/show_bug.cgi?id=1218179 is the bug where the
> switch was flipped), so maybe it's not that bad.

Thats the point.

Maybe the developers and or the community should now take the chance to turn on the way of progress.

Give it a try. Please.
And the discussed risks will disappear and vanish more and more.


Kind Regards

Michael
Comment 26 Alex Christensen 2016-11-18 11:50:06 PST
Created attachment 295171 [details]
Patch
Comment 27 Darin Adler 2016-11-18 12:47:33 PST
Comment on attachment 295171 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=295171&action=review

Code change seems OK to me, but I am concerned that we have insufficient test coverage.

> LayoutTests/ChangeLog:10
> +        * fast/encoding/idn-security-expected.txt:
> +        * fast/url/idna2003-expected.txt:
> +        * fast/url/idna2008-expected.txt:

Why are any of these tests still expecting behavior different from what our new code is doing? If we have a new understanding of what is correct behavior, I suggest we update the tests, not just the expected results. If we still think some of our behavior is incorrect, then in that case it would be appropriate to have some test results that say FAIL, but I don’t think it’s OK to have tests that say FAIL even though that failure represents our best understanding of correct behavior, since these are tests that we wrote ourselves for the WebKit project.

Is it valuable to still have three different tests, or should we consolidate them?

Do we need additional test coverage beyond what is here and what you added to the URLParser.cpp test?
Comment 28 Alex Christensen 2016-11-18 13:13:56 PST
I'll update idn-security.html because it looks like we wrote it starting in r23963.
idn2003.html and idn2008.html should be unchanged.  They test conformance with idn2003 and idn2008 respectively, and those standards don't change.  We intentionally deviate from both of them because of UTS #46.
I'll also note that some of my tests were based on https://tools.ietf.org/html/rfc5893
Comment 29 Alex Christensen 2016-11-18 14:34:02 PST
Created attachment 295195 [details]
Patch
Comment 30 Alex Christensen 2016-11-18 14:47:58 PST
http://trac.webkit.org/changeset/208902
Comment 31 Alex Christensen 2016-11-30 10:19:40 PST
http://trac.webkit.org/changeset/208905