Bug 144194

Summary: Support IDN2008 instead of IDN2003
Product: WebKit Reporter: Darin Adler <darin>
Component: Page LoadingAssignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED    
Severity: Normal CC: annevk, ap, buildbot, michael, rniwa
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=161841
https://bugs.webkit.org/show_bug.cgi?id=165885
Attachments:
Description Flags
Patch
none
Patch
none
Archive of layout-test-results from ews101 for mac-yosemite
none
Archive of layout-test-results from ews104 for mac-yosemite-wk2
none
Archive of layout-test-results from ews113 for mac-yosemite
none
Archive of layout-test-results from ews125 for ios-simulator-wk2
none
Patch
none
Patch
none
Patch none

Darin Adler
Reported 2015-04-25 11:27:03 PDT
Support IDN2008 instead of IDN2003
Attachments
Patch (6.03 KB, patch)
2015-04-25 11:28 PDT, Darin Adler
no flags
Patch (7.59 KB, patch)
2016-11-16 10:05 PST, Alex Christensen
no flags
Archive of layout-test-results from ews101 for mac-yosemite (961.27 KB, application/zip)
2016-11-16 11:04 PST, Build Bot
no flags
Archive of layout-test-results from ews104 for mac-yosemite-wk2 (956.98 KB, application/zip)
2016-11-16 11:09 PST, Build Bot
no flags
Archive of layout-test-results from ews113 for mac-yosemite (1.63 MB, application/zip)
2016-11-16 11:13 PST, Build Bot
no flags
Archive of layout-test-results from ews125 for ios-simulator-wk2 (9.37 MB, application/zip)
2016-11-16 11:20 PST, Build Bot
no flags
Patch (19.17 KB, patch)
2016-11-16 11:26 PST, Alex Christensen
no flags
Patch (21.19 KB, patch)
2016-11-18 11:50 PST, Alex Christensen
no flags
Patch (26.41 KB, patch)
2016-11-18 14:34 PST, Alex Christensen
no flags
Darin Adler
Comment 1 2015-04-25 11:28:23 PDT
Darin Adler
Comment 2 2015-04-25 11:31:04 PDT
Here’s what’s wrong with the patch I uploaded: 1) I haven’t been able to verify that this correctly causes any websites to be loaded differently. I tried testing with <http://fußball.de>, but that just redirects to <http://fussball.de>. 2) There are no test cases yet! 3) UIDNA_DEFAULT may not be the correct flags to pass to uidna_openUTS46. We need to decide whether to pass one or more of UIDNA_USE_STD3_RULES, UIDNA_CHECK_BIDI, UIDNA_CHECK_CONTEXTJ, UIDNA_NONTRANSITIONAL_TO_ASCII, and UIDNA_NONTRANSITIONAL_TO_UNICODE.
Darin Adler
Comment 3 2015-04-25 11:31:54 PDT
Darin Adler
Comment 4 2015-10-18 15:24:04 PDT
Alexey, any idea what we should do next here?
Alexey Proskuryakov
Comment 5 2015-10-18 22:40:41 PDT
I like the next steps alluded to in comment 2. If I were doing this, I'd try to find any real world examples of why the change is a net positive, given that we'd have to start displaying http://💩.la as punycode. Reading through UTS#46 may provide enough insight.
Michael
Comment 6 2016-04-10 03:47:31 PDT
Summary: The .de NIC (denic.de) will implement IDNA2008 from 2010-11-16 onwards, especially allowing for ß (\u00df) in domain names. Hence, the automatic translation of ß to ss may result in looking up the wrong domain name, allowing for spoofing attacks. (DENIC will run a sunrise period (2010-10-26 to 2010-11-15) during which holders of domains with ss will be allowed top register the respective ß domain in advance.) http://www.denic.de/en/domains/internationalized-domain-names/sharp-s.html ß and ss are not exchangable in German. ss instead of ß is just a makeshift. Germans expect ß to usually just work if umlauts work (which already do for a while). Steps to Reproduce: 1. Start Safari on OS X Mountain Lion or on iOS 2. Open the domain "http://www.heß.de" (a family name). Expected Results: Safari will change the "ß" character to "ss" and open "http://www.hess.de" which is a completely different family name (last name). For example: "Michael Heß" and "Peter Hess". FireFox Nightly Version 46.0a1 did it now correct. Here are some informations: https://hg.mozilla.org/mozilla-central/rev/ac843b130537 https://hg.mozilla.org/mozilla-central/rev/dd3d6c83f354 https://hg.mozilla.org/mozilla-central/rev/f23234d57557 And the FireFox Bug Thread for this issue: https://bugzilla.mozilla.org/show_bug.cgi?id=479520 You can check the following Websites in FireFox Nightly Version 46.0a1 (2016-01-24) to understand: http://www.roessner.de http://www.rössner.de (is linked to www.roessner.de because it's the same owner, but it's also a separate name in Germany). http://www.rößner.de For example: Bill Roessner Jack Rössner Mike Rößner The browsers do it right with "roessner" and "rössner" and separate it because these are different names. Now, it's time to do it also right with the "ß" character. For now i have to use the URL in Safari: http://www.xn--rner-vna1l.de if i want to go to: http://www.rößner.de thanks & kind regards Michael
Alex Christensen
Comment 7 2016-11-16 10:05:33 PST
Alex Christensen
Comment 8 2016-11-16 10:57:55 PST
IDN2003 and UTS#46 are functionally quite similar for URL parsing. The biggest difference I could find was their treatment of U+2132 in http://www.unicode.org/reports/tr46/#IDNAComparison As for the uidna_openUTS46 flags: Chromium uses UIDNA_CHECK_BIDI <https://cs.chromium.org/chromium/src/components/url_formatter/url_formatter.cc?q=UIDNA_CHECK_BIDI&sq=package:chromium&dr=C&l=510> Firefox uses UIDNA_CHECK_BIDI, UIDNA_CHECK_CONTEXTJ, and effectively UIDNA_NONTRANSITIONAL_TO_UNICODE <https://dxr.mozilla.org/mozilla-central/source/netwerk/dns/nsIDNService.cpp#139> If I understand the spec correctly, it basically says to just use UIDNA_DEFAULT. https://url.spec.whatwg.org/#concept-domain-to-ascii maybe it should be changed? Anne?
Build Bot
Comment 9 2016-11-16 11:04:46 PST
Comment on attachment 294948 [details] Patch Attachment 294948 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/2526540 New failing tests: fast/url/invalid-idn.html fast/encoding/idn-security.html fast/url/idna2008.html fast/url/idna2003.html
Build Bot
Comment 10 2016-11-16 11:04:49 PST
Created attachment 294952 [details] Archive of layout-test-results from ews101 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews101 Port: mac-yosemite Platform: Mac OS X 10.10.5
Build Bot
Comment 11 2016-11-16 11:09:45 PST
Comment on attachment 294948 [details] Patch Attachment 294948 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/2526548 New failing tests: fast/url/invalid-idn.html fast/url/idna2008.html fast/url/idna2003.html
Build Bot
Comment 12 2016-11-16 11:09:49 PST
Created attachment 294954 [details] Archive of layout-test-results from ews104 for mac-yosemite-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews104 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Build Bot
Comment 13 2016-11-16 11:12:56 PST
Comment on attachment 294948 [details] Patch Attachment 294948 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/2526543 New failing tests: fast/url/invalid-idn.html fast/encoding/idn-security.html fast/url/idna2008.html fast/url/idna2003.html
Build Bot
Comment 14 2016-11-16 11:13:01 PST
Created attachment 294956 [details] Archive of layout-test-results from ews113 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews113 Port: mac-yosemite Platform: Mac OS X 10.10.5
Build Bot
Comment 15 2016-11-16 11:20:17 PST
Comment on attachment 294948 [details] Patch Attachment 294948 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/2526549 New failing tests: fast/url/idna2008.html fast/url/invalid-idn.html fast/url/idna2003.html
Build Bot
Comment 16 2016-11-16 11:20:22 PST
Created attachment 294957 [details] Archive of layout-test-results from ews125 for ios-simulator-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews125 Port: ios-simulator-wk2 Platform: Mac OS X 10.11.6
Alex Christensen
Comment 17 2016-11-16 11:26:01 PST
Alexey Proskuryakov
Comment 18 2016-11-16 11:45:06 PST
Comment on attachment 294958 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=294958&action=review > LayoutTests/fast/encoding/idn-security-expected.txt:89 > -PASS testIDNRoundTrip(0x337) is 'punycode' > +FAIL testIDNRoundTrip(0x337) should be punycode. Was %u0337. Looks like there are many changes from PASS to FAIL here. Why is that good?
Alex Christensen
Comment 19 2016-11-16 12:07:44 PST
It is an intentional deviation from what we used to do, and it increases our compliance with the spec and compatibility with other browsers.
Alexey Proskuryakov
Comment 20 2016-11-16 12:15:37 PST
A better way to reflect that in a test is to change the expected value, instead of changing the result to FAIL. Alex, can you address Darin's questions in comment 2? Does the patch fix the examples in comment 6?
Alex Christensen
Comment 21 2016-11-16 12:21:09 PST
The examples in comment 6 are not fixed in my proposed patch because my proposed patch matches the spec and Chrome, which use UTS#46, which I believe is what is intended by "Transitional_Processing" in the spec. Maybe we should just switch to IDN2008 without UTS#46, but I'd like Anne's input.
Alex Christensen
Comment 22 2016-11-16 14:08:48 PST
I think we should just switch do IDN2008 to prevent these German homograph attacks.
Michael
Comment 23 2016-11-16 14:50:11 PST
Please don't give up and help us Germans with "ß" in our surnames! We are waiting since 2010 that the Browsers interpret our Domains right. Exactly today 6 years before i register my domain and my dream is, that i can finally enter the domain: http://www.rößner.de instead of http://www.xn--rner-vna1l.de in the following Browsers: - Safari - Chrome - Opera - MS Edge Mozilla FireFox do it right. Thank you so much & Kind Regards Michael
Anne van Kesteren
Comment 24 2016-11-17 05:41:26 PST
I don't see how you can get around using UTS46. I don't think IDNA2008 without UTS46 really has a processing model. E.g., lowercasing "A" would be undefined. Transitional_Processing of UTS46 matches IDNA2003 pretty closely and is currently what the URL Standard requires since that's what the majority of implementations do. Nontransitional_Processing would be closer to IDNA2008 behavior, while still doing things like lowercasing "A" to "a". I used to be somewhat opposed to switching away from IDNA2003-like behavior since it's not backwards compatible and can cause issues, but Firefox manages to ship it without complaints (https://bugzilla.mozilla.org/show_bug.cgi?id=1218179 is the bug where the switch was flipped), so maybe it's not that bad. https://github.com/whatwg/url/issues/110 and https://github.com/whatwg/url/issues/53 track IDNA issues in the URL Standard and UTS46.
Michael
Comment 25 2016-11-17 07:28:24 PST
(In reply to comment #24) > but Firefox > manages to ship it without complaints > (https://bugzilla.mozilla.org/show_bug.cgi?id=1218179 is the bug where the > switch was flipped), so maybe it's not that bad. Thats the point. Maybe the developers and or the community should now take the chance to turn on the way of progress. Give it a try. Please. And the discussed risks will disappear and vanish more and more. Kind Regards Michael
Alex Christensen
Comment 26 2016-11-18 11:50:06 PST
Darin Adler
Comment 27 2016-11-18 12:47:33 PST
Comment on attachment 295171 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=295171&action=review Code change seems OK to me, but I am concerned that we have insufficient test coverage. > LayoutTests/ChangeLog:10 > + * fast/encoding/idn-security-expected.txt: > + * fast/url/idna2003-expected.txt: > + * fast/url/idna2008-expected.txt: Why are any of these tests still expecting behavior different from what our new code is doing? If we have a new understanding of what is correct behavior, I suggest we update the tests, not just the expected results. If we still think some of our behavior is incorrect, then in that case it would be appropriate to have some test results that say FAIL, but I don’t think it’s OK to have tests that say FAIL even though that failure represents our best understanding of correct behavior, since these are tests that we wrote ourselves for the WebKit project. Is it valuable to still have three different tests, or should we consolidate them? Do we need additional test coverage beyond what is here and what you added to the URLParser.cpp test?
Alex Christensen
Comment 28 2016-11-18 13:13:56 PST
I'll update idn-security.html because it looks like we wrote it starting in r23963. idn2003.html and idn2008.html should be unchanged. They test conformance with idn2003 and idn2008 respectively, and those standards don't change. We intentionally deviate from both of them because of UTS #46. I'll also note that some of my tests were based on https://tools.ietf.org/html/rfc5893
Alex Christensen
Comment 29 2016-11-18 14:34:02 PST
Alex Christensen
Comment 30 2016-11-18 14:47:58 PST
Alex Christensen
Comment 31 2016-11-30 10:19:40 PST
Note You need to log in before you can comment on or make changes to this bug.