Bug 54271 - Safari Extensions broken due to KURL canonicalization added in r78044
Summary: Safari Extensions broken due to KURL canonicalization added in r78044
Status: RESOLVED WORKSFORME
Alias: None
Product: WebKit
Classification: Unclassified
Component: Platform (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P1 Blocker
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-02-10 21:51 PST by Eric Seidel (no email)
Modified: 2011-02-12 11:30 PST (History)
9 users (show)

See Also:


Attachments
Patch (9.47 KB, patch)
2011-02-11 04:02 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch (10.53 KB, patch)
2011-02-11 04:33 PST, Eric Seidel (no email)
timothy: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2011-02-10 21:51:19 PST
safari-extension: scheme needs special case-sensitive treatment in KURL

safari-extensions broke after http://trac.webkit.org/changeset/78044 (or so I'm told).

See further discussion in:
https://bugs.webkit.org/show_bug.cgi?id=54084#c4
Comment 1 Eric Seidel (no email) 2011-02-11 04:02:23 PST
Created attachment 82120 [details]
Patch
Comment 2 Early Warning System Bot 2011-02-11 04:17:04 PST
Attachment 82120 [details] did not build on qt:
Build output: http://queues.webkit.org/results/7884571
Comment 3 Eric Seidel (no email) 2011-02-11 04:24:06 PST
append() was never used on non-apple ports.  Fixing it now.
Comment 4 Eric Seidel (no email) 2011-02-11 04:33:56 PST
Created attachment 82121 [details]
Patch
Comment 5 Timothy Hatcher 2011-02-11 08:02:55 PST
Comment on attachment 82121 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=82121&action=review

> Source/WebCore/platform/KURL.cpp:1354
> +            // safari-extension: host contents are case sensitive (in violation of RFC 2396).
> +            // https://bugs.webkit.org/show_bug.cgi?id=54271

No where in RFC 2396 do I see that safari-extension is in violation.

> This authority component is typically defined by an Internet-based
> server or a scheme-specific registry of naming authorities.

And:

> The structure of a registry-based naming authority is specific to the
> URI scheme, but constrained to the allowed characters for an
> authority component.
>
>     reg_name      = 1*( unreserved | escaped | "$" | "," |
>                          ";" | ":" | "@" | "&" | "=" | "+" )

And "unreserved" is defined as (expanded) "lowalpha | upalpha | digit | mark".

Only if the scheme uses "Server-based Naming Authority" does the case-insensitive domain name logic apply.

> Hostnames take the form described in Section 3 of [RFC1034] and section 2.1 of [RFC1123]

So I disagree that "safari-extension" is in violation, and that any domain canonicalization should only apply to schemes we know treat the athority as server-based. I would also argue that this should apply to security origin too. Only the scheme is called out as being case-insensitive for all URIs.
Comment 6 Adam Roben (:aroben) 2011-02-11 08:11:17 PST
Section 6 "URI Normalization and Equivalence" seems relevant:

In many cases, different URI strings may actually identify the
identical resource. For example, the host names used in URL are
actually case insensitive, and the URL <http://www.XEROX.com> is
equivalent to <http://www.xerox.com>. In general, the rules for
equivalence and definition of a normal form, if any, are scheme
dependent. When a scheme uses elements of the common syntax, it will
also use the common syntax equivalence rules, namely that the scheme
and hostname are case insensitive and a URL with an explicit ":port",
where the port is the default for the scheme, is equivalent to one
where the port is elided.
Comment 7 Adam Roben (:aroben) 2011-02-11 08:23:03 PST
I guess Tim is arguing that the authority part of safari-extension: URLs falls under "Registry-based Naming Authority", not "Server-based Naming Authority". I think I agree with him. Here's my attempt to clarify the argument.

From section 3.2 "Authority Component":

> Many URI schemes include a top hierarchical element for a naming
> authority, such that the namespace defined by the remainder of the
> URI is governed by that authority. 

This certainly applies to safari-extension: URLs.

> This authority component is typically defined by an Internet-based
> server or a scheme-specific registry of naming authorities.

safari-extension: URLs definitely don't use an "Internet-based server", and they do use a "scheme-specific registry of naming authorities".

In addition, section 3.2.2 "Server-based Naming Authority" specifically mentions "URL schemes that involve the direct use of an IP-based protocol to a specified server on the Internet", which does not describe safari-extension: URLs.

So, given that safari-extension: URLs do not use a server-based naming authority, what rules apply to their syntax? Section 3.2.1 "Registry-based Naming Authority" says only this:

> The structure of a registry-based naming authority is specific to the
> URI scheme, but constrained to the allowed characters for an
> authority component.
>
>    reg_name      = 1*( unreserved | escaped | "$" | "," |
>                        ";" | ":" | "@" | "&" | "=" | "+" )

Note that "hostname" isn't defined here at all; it's only defined for URIs that use a server-based naming authority. So presumably the prose in section 6 that mentions hostnames being case-insensitive doesn't apply here; safari-extension: URLs have no hostnames at all.
Comment 8 Alexey Proskuryakov 2011-02-11 10:37:40 PST
+        * platform/mac/fast/url/script-tests/TEMPLATE.html: Added.
+        * platform/mac/fast/url/script-tests/safari-extension.js: Added.

Please don't add a script-tests directory.
Comment 9 Timothy Hatcher 2011-02-11 11:55:22 PST
Should we revert this KURL change so we can take our time considering the right fix?
Comment 10 Alexey Proskuryakov 2011-02-11 12:22:41 PST
I would encourage reverting if we cannot come up with a solution very quickly. We should also take a closer look at other schemes that may be affected by this change.
Comment 11 Eric Seidel (no email) 2011-02-11 13:47:02 PST
I'm confused by the r-.  Was it for the comment?

It seems there were no negative comments about the code change.
Comment 12 Adam Barth 2011-02-11 14:14:39 PST
As discussed with Eric, this issue is coupled with the fact that KURL don't have a notion of a hierarchal URI scheme.
Comment 13 Adam Barth 2011-02-11 14:15:38 PST
More specifically, it attempts to infer whether a scheme is hierarchical based on its structure, which is different from how other browsers behave.
Comment 14 Eric Seidel (no email) 2011-02-11 14:24:29 PST
I'm working on a new patch which only lowers the host name for schemes which are known to be "standard".
http://www.google.com/codesearch/p?hl=en#R_oU-jn3RFk/trunk/src/url_util.cc&q=standardurl%20package:http://google-url%5C.googlecode%5C.com&l=62
Comment 15 Eric Seidel (no email) 2011-02-11 14:28:56 PST
(In reply to comment #13)
> More specifically, it attempts to infer whether a scheme is hierarchical based on its structure, which is different from how other browsers behave.

An example of what Adam is referring to:
https://bugs.webkit.org/show_bug.cgi?id=10323
Comment 16 Timothy Hatcher 2011-02-11 14:40:40 PST
The r- was not for the comment specifically — though the comment is incorrect according to my interpretation of the RFCs.

The r- was because special casing the "safari-extension" scheme is not the right solution. It sounds like you are working on the right solution now.
Comment 17 Timothy Hatcher 2011-02-12 09:28:08 PST
The original change that broke this was rolled out by Eric in r78395. Should we close this then?
Comment 18 Adam Barth 2011-02-12 11:30:36 PST
Sure.  This patch in this bug will likely be useful when we re-land hostname canonicalization, but this issue is resolved.