Bug 61097

Summary: Implement HTML5 location resolveURL
Product: WebKit Reporter: Erik Arvidsson <arv>
Component: DOMAssignee: Erik Arvidsson <arv>
Status: RESOLVED WONTFIX    
Severity: Normal CC: abarth, andreip, annevk, ap, darin, dglazkov, dominicc, eae, ian, ojan, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
URL: http://dev.w3.org/html5/spec/Overview.html#dom-location-resolveurl
Attachments:
Description Flags
Patch
none
Patch: expanded the tests
none
Archive of layout-test-results from ec2-cr-linux-01
none
Patch
none
Patch: Use KURL::string so that the return URL is not decoded ap: review-

Description Erik Arvidsson 2011-05-18 16:34:54 PDT
Implement the resolveURL method on the Location interface
Comment 1 Erik Arvidsson 2011-05-19 13:40:22 PDT
Created attachment 94108 [details]
Patch
Comment 2 Alexey Proskuryakov 2011-05-19 14:13:30 PDT
Comment on attachment 94108 [details]
Patch

I'm wondering why prettyURL is used.

Could you please add some tests with percent escaped URLs (both base and relative part), as well as with non-ASCII ones?

Does any other browser implement this, so that we could compare the behavior in edge cases?
Comment 3 Erik Arvidsson 2011-05-19 14:15:20 PDT
No other browser implements this yet.

I'll add some more tests.

Thanks for looking at the patch.
Comment 4 Erik Arvidsson 2011-05-19 15:41:16 PDT
It is unclear to me what we want here. location.href uses prettyURL which does the following

/a/%2F/b   =>  /a//b

Do we want to do the same thing here or would it be better to use string() which would not decode the percent encoded parts?
Comment 5 Erik Arvidsson 2011-05-19 16:16:51 PDT
Created attachment 94139 [details]
Patch: expanded the tests
Comment 6 Alexey Proskuryakov 2011-05-19 16:44:18 PDT
I don't know how to answer these questions. But I think that they need to be answered before adding this new feature. ResoleURL is mostly useful because of its promise to do "the right thing" - and if we are going to change behavior later, it can be a problem for developers using it.
Comment 7 Erik Arvidsson 2011-05-19 16:59:41 PDT
(In reply to comment #6)
> I don't know how to answer these questions. But I think that they need to be answered before adding this new feature. ResoleURL is mostly useful because of its promise to do "the right thing" - and if we are going to change behavior later, it can be a problem for developers using it.

I'm just following the current behavior of location.href in this patch. We already have open bugs that Safari does not follow any of the other browsers and as far as I know Adam already spent a lot of effort trying to resolve this. See  https://bugs.webkit.org/show_bug.cgi?id=30225 for more details.
Comment 8 WebKit Review Bot 2011-05-19 17:13:07 PDT
Comment on attachment 94139 [details]
Patch: expanded the tests

Attachment 94139 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8719220

New failing tests:
fast/dom/location-resolve-url.html
Comment 9 WebKit Review Bot 2011-05-19 17:13:12 PDT
Created attachment 94149 [details]
Archive of layout-test-results from ec2-cr-linux-01

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-01  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 10 Erik Arvidsson 2011-05-19 17:33:55 PDT
Comment on attachment 94139 [details]
Patch: expanded the tests

I need to update the tests to support both percent encoded and non encoded
Comment 11 Adam Barth 2011-05-19 17:34:19 PDT
Comment on attachment 94139 [details]
Patch: expanded the tests

View in context: https://bugs.webkit.org/attachment.cgi?id=94139&action=review

> Source/WebCore/page/Location.cpp:289
> +    return resolvedURL.hasPath() ? resolvedURL.prettyURL() : resolvedURL.prettyURL() + "/";

This looks great, but can you make a static helper function to share this code with Location::href?

> Source/WebCore/workers/WorkerLocation.cpp:89
> +    return resolvedURL.hasPath() ? resolvedURL.prettyURL() : resolvedURL.prettyURL() + "/";

Same here.  This code is copy/pasted just two lines above.  Maybe this should be a function on KURL?
Comment 12 Alexey Proskuryakov 2011-05-19 17:37:33 PDT
> I'm just following the current behavior of location.href in this patch.

I don't think that this is a good explanation of why an new API can be implemented with a behavior that's not understood well.
Comment 13 Adam Barth 2011-05-19 17:43:14 PDT
> I don't think that this is a good explanation of why an new API can be implemented with a behavior that's not understood well.

I don't think we should block implementing this API on unifying URL behavior across the web.  URLs have crappy interoperability.  That's a problem we should fix, but it's not a problem that needs to be fixed before adding this API.
Comment 14 Dominic Cooney 2011-05-19 17:56:52 PDT
Comment on attachment 94139 [details]
Patch: expanded the tests

View in context: https://bugs.webkit.org/attachment.cgi?id=94139&action=review

> Source/WebCore/page/Location.cpp:278
> +String Location::resolveURL(const String& urlString, ExceptionCode& ec) const

Where’s the SOP check for this?
Comment 15 Alexey Proskuryakov 2011-05-19 17:59:13 PDT
I disagree.

What the use case for having the API if we can't even say what results it's going to return? Clearly it's not needed for compatibility, and clearly it's not needed for implementation experience, since implementation is trivial. So why is it needed at all?

Hopefully there is some middle ground between "we have no idea" and "unify URL behavior across the web". If we can say something like "always use resolveURL() to get an URL that can be safely used with such and such existing API", and maintain that across releases, then I would agree that it's a good thing.

Some other practical use case would make me convinced, too.
Comment 16 Adam Barth 2011-05-19 18:14:43 PDT
> What the use case for having the API if we can't even say what results it's going to return?

The use case is to let web sites resolve relative URLs without having to create a goofy HTMLAnchorElement.  This feature is just missing from the web platform.

> Clearly it's not needed for compatibility, and clearly it's not needed for implementation experience, since implementation is trivial. So why is it needed at all?

Because it's useful to web site authors.

By your logic, we shouldn't implement any new web platform feature that relates to URLs because it's not needed for compatibility and URL processing isn't well-defined.
Comment 17 Adam Barth 2011-05-19 18:33:00 PDT
To flesh out a use case:

1) I want to obtain an OAuth token.
2) I want to have a callback URL relative to the current page, but I don't want to hard code the base URL of the page.

To generate the callback URL for OAuth, I use location.resolveURL("/oauthcallback.php"), which gives me the right URL to include in the initial message regardless of the current host.

To work around the lack of this API today, folks use regular expressions over location.href, which is pretty ridiculous and error prone.
Comment 18 Erik Arvidsson 2011-05-19 18:38:29 PDT
(In reply to comment #14)
> (From update of attachment 94139 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=94139&action=review
> 
> > Source/WebCore/page/Location.cpp:278
> > +String Location::resolveURL(const String& urlString, ExceptionCode& ec) const
> 
> Where’s the SOP check for this?

SOP? Same Origin Principle? This does not request anything so there is no need for any check like that.
Comment 19 Erik Arvidsson 2011-05-19 19:19:27 PDT
Created attachment 94160 [details]
Patch
Comment 20 WebKit Review Bot 2011-05-19 20:09:59 PDT
Comment on attachment 94160 [details]
Patch

Attachment 94160 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8722050
Comment 21 Adam Barth 2011-05-19 21:28:44 PDT
Comment on attachment 94160 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=94160&action=review

r- for compile break.  One comment below:

> Source/WebCore/platform/KURL.h:154
> +    String prettyURLWithTrailingSlash() const;

You probably need this in KURLGoogleWhateverThatForkedFileIsCalled.cpp as well.  Is prettyURL called elsewhere without adding a trailing slash?  prettyURLWithTrailingSlash isn't the greatest name because there isn't always a trailing slash, as in:

http://example.com/foo?bar

Also, does this code handle the following input correctly:

http://example.com?bar

I guess it depends on KURL canonicalizes things.  I don't think hasPath() will ever be false for GURL because GURL is more agressive about canonicalizing.  Another interesting test case is:

http:

but now we're getting into UA-specific URL behaviors.
Comment 22 WebKit Review Bot 2011-05-19 21:41:46 PDT
Comment on attachment 94160 [details]
Patch

Attachment 94160 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/8719260
Comment 23 Alexey Proskuryakov 2011-05-19 22:35:20 PDT
> To generate the callback URL for OAuth, I use location.resolveURL("/oauthcallback.php"), which gives me the right URL to include in the initial message regardless of the current host.

Thank you, that's a valid use case, and it is good to have it documented here. There are some requirements that follow from it.

1) It should be reasonably easy to pass the resolved URL to a server.
2) The server should get a URL it can make a request to.

The pretty URL mostly passes (1) by the virtue of being a String, and even a valid UTF-16 sequence (which is not trivial, as we decode an arbitrary sequence of percent escapes). However, the server needs an ASCII string to make an HTTP request, so either side would need to perform an encoding step to make ASCII.

(2) is clearly not fulfilled though. The browser's best idea of what such a URL looks like is not the pretty URL, but what it does when trying to request a resource.

From your use case, it appears that the definition of resolveURL(url) should be: the URL the browser would use to request a subresource (such as an image from an IMG element) with the argument as a URL (e.g. src attribute of the IMG element).

if we can all agree with this, then the implementation needs to be corrected.
Comment 24 WebKit Review Bot 2011-05-19 23:24:57 PDT
Comment on attachment 94160 [details]
Patch

Attachment 94160 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/8722085
Comment 25 Ojan Vafai 2011-05-20 10:28:47 PDT
The value returned to resolveURL should also match the one returned by location.href and location.pathname, no? It seems like Safari should not be returning the pretty URL for those (Firefox, Chrome and IE don't).
Comment 26 Ojan Vafai 2011-05-20 10:50:16 PDT
(In reply to comment #25)
> The value returned to resolveURL should also match the one returned by location.href and location.pathname, no? It seems like Safari should not be returning the pretty URL for those (Firefox, Chrome and IE don't).

Thinking about this more, I think it's fine for everyone to do the desired behavior (i.e. not the pretty URL). Making location.href and location.pathname can be done as a followup and this is an incremental step in that direction that does not have backward-compat concerns.
Comment 27 Alexey Proskuryakov 2011-05-20 10:58:50 PDT
> The value returned to resolveURL should also match the one returned by location.href and location.pathname, no?

I'm not sure how to interpret this requirement. Are you saying that location.resolveURL(location.pathname) should always be equal to location.href? If you think that it should, please tell us why.

Not having this equivalence does not affect the single known use case for resolveURL. I don't see any other case where it would matter.

> It seems like Safari should not be returning the pretty URL for those (Firefox, Chrome and IE don't).

It is very possible that location.href and pathname need to be fixed. The former is tracked in bug 30225. There is no bug for location.pathname, but there is one for HTMLAnchorElement.pathname (bug 22899).

Unless there is a strong reason why these should match (i.e. stronger than our engineering instincts), I suggest focusing on resolveURL alone in this particular bug.
Comment 28 Alexey Proskuryakov 2011-05-20 10:59:34 PDT
> Making location.href and location.pathname can be done as a followup and this is an incremental step in that direction that does not have backward-compat concerns.

Agreed.
Comment 29 Erik Arvidsson 2011-05-20 11:03:28 PDT
Created attachment 94241 [details]
Patch: Use KURL::string so that the return URL is not decoded
Comment 30 Darin Adler 2011-05-20 11:05:58 PDT
Comment on attachment 94241 [details]
Patch: Use KURL::string so that the return URL is not decoded

What about all other functions of prettyURL? Does it do nothing useful?
Comment 31 Adam Barth 2011-05-20 11:20:11 PDT
> What about all other functions of prettyURL? Does it do nothing useful?

I've filed https://bugs.webkit.org/show_bug.cgi?id=61201 to investigate this.
Comment 32 Erik Arvidsson 2011-05-20 11:23:02 PDT
(In reply to comment #30)
> (From update of attachment 94241 [details])
> What about all other functions of prettyURL? Does it do nothing useful?

I'm not fully sure what you are referring to here?

prettyURL is still used in places where the main use of the URL is to present it in a more readable form for humans.
Comment 33 Darin Adler 2011-05-20 11:23:45 PDT
(In reply to comment #32)
> I'm not fully sure what you are referring to here?
> 
> prettyURL is still used in places where the main use of the URL is to present it in a more readable form for humans.

Was there any benefit to using prettyURL here, or was it only causing problems and offering no benefits?
Comment 34 Alexey Proskuryakov 2011-05-20 11:38:07 PDT
Looking good to me.

I'd like to see a test that has non-ASCII characters in domain name, to verify that IDNA encoding is used as expected.

Perhaps it would be better to make this an http test, just to be closer to normal usage.

+document.querySelector('base').href = 'http://www.webkit.org/\u03B1/\u03B2/';

This is interesting! It would be good to test an escape sequence that cannot be decoded to a valid UTF-8 string here (like %d0%d0). Also, I suggest adding <meta charset="utf-8"> to the test to make it more predictable in browsers that have control over how URL paths are encoded.

Is resolveURL expected to preserve credentials from main document's URL (like http://user:pass@example.org)? I can't really tell from the use case, and I'm not sure what we do for images or for XMLHttpRequest. This is worth a regression test once we have the answer.
Comment 35 Erik Arvidsson 2011-05-20 15:18:07 PDT
I'm going on sabbatical for 3 months so I unassigned myself. If anyone wants to take over, feel free to do so.
Comment 36 Alexey Proskuryakov 2011-05-23 12:02:40 PDT
Comment on attachment 94241 [details]
Patch: Use KURL::string so that the return URL is not decoded

r- to get this out of review queue, since some more work is still needed.

> Is resolveURL expected to preserve credentials from main document's URL

My thinking is that probably not. We shouldn't disclose user credentials to a 3rd party server.
Comment 37 Erik Arvidsson 2011-08-16 17:15:54 PDT
I'm back and I still want to resolve this.

Adam, do you want me to take over bug 30225 as well?
Comment 38 Adam Barth 2011-08-16 17:18:11 PDT
> Adam, do you want me to take over bug 30225 as well?

If you'd be willing.  I've lost context on these issues.
Comment 39 Anne van Kesteren 2011-08-17 01:24:44 PDT
Should resolveURL not be removed now we get the global URL object?
Comment 40 Erik Arvidsson 2011-08-17 11:33:54 PDT
I agree that Location resolveURL is already better covered by the URL proposal.

Adam, what is the status of the URL spec?
Comment 41 Adam Barth 2011-08-17 11:40:13 PDT
> Adam, what is the status of the URL spec?

Draft is here:

https://docs.google.com/document/d/1r_VTFKApVOaNIkocrg0z-t7lZgzisTuGTXkdzAk4gLU/edit?hl=en_US

The W3C WebApps group wants to take up work on the spec.  I need to find some time to learn how to upload the document to their servers.