Bug 46982 - Setting href protocol attribute on malformed URL is incorrect when page is served over HTTP
Summary: Setting href protocol attribute on malformed URL is incorrect when page is se...
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on: 47348
Blocks:
  Show dependency treegraph
 
Reported: 2010-10-01 07:04 PDT by Steve Block
Modified: 2022-08-05 09:15 PDT (History)
9 users (show)

See Also:


Attachments
Test case (5.22 KB, text/plain)
2010-10-01 07:08 PDT, Steve Block
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Steve Block 2010-10-01 07:04:26 PDT
LayoutTest fast/dom/HTMLAnchorElement/set-href-attribute-protocol.html tests setting the protocol attribute of the href of an anchor element.

A particular part of this test is as follows ...

debug("Set protocol to http on malformed URL");
a.href = "foo:??bar";
a.protocol = "http";
shouldBe("a.href", "'http:??bar'");

I noticed that when the test is served over HTTP, rather than as a file:// URL  as is currently the case, this test fails as follows ...

Chromium (including ToT WebKit): FAIL - http://host/??bar
Safari (including ToT WebKit)  : FAIL - http://host/path??bar
Android (including ToT WebKit) : FAIL - http://host/path??bar

The test passes on recent versions of FF and IE. A test case, which duplicates the existing test as an HTTP test, is attached.

The spec - http://dev.w3.org/html5/spec/urls.html - suggests that the expected result in the test is correct irrespective of the scheme over which the page is served, so it looks to me like this should be fixed.
Comment 1 Steve Block 2010-10-01 07:08:59 PDT
Created attachment 69464 [details]
Test case
Comment 2 Yael 2010-10-01 17:29:17 PDT
I can make the test case even smaller to show the error:

var a = document.createElement('a');

a.href="http:??bar";
alert(a.href);



The reason for the failure is in KURL.cpp, line 401.
if (p[1] != '/' && equalIgnoringCase(base.protocol(), String(str, p - str)) && base.isHierarchical()) 

We compare the protocol of the base url to the protocol of the new utl, and if they match, and the base url is hierarchical, then we assume that either the new url is hierarchical or relative. I don't think that assumption is valid.
Comment 3 Steve Block 2010-10-04 05:28:10 PDT
Considering your simpler test case ...
  Served from http://host/path
  document.createElement('a').href = 'http:?query';

'http://host/path?query' is the correct result provided we're following the non-strict parsing rules defined at http://tools.ietf.org/html/rfc3986#section-5.2.2. I think that the logic you mention at line 401 of KURL.cpp is providing exactly this non-strict parsing - we ignore the scheme if it's the same as that of the base URI.

If we were to use the strict parsing rules, then the result should be 'http:?query'. This matches the case when the scheme doesn't match that of the base URI, as is the case when the layout tests are served from file://.


As for the test case I originally mentioned, this still looks like a bug to me, as the protocol is being set on the resolved URL, which should be 'foo:?query' ...
  Served from http://host/path
  a.href = "foo:?query";
  a.protocol = "http";
  shouldBe("a.href", "'http:?query'");

Chromium (including ToT WebKit): FAIL - http://host/?query
Safari (including ToT WebKit)  : FAIL - http://host/path?query
Android (including ToT WebKit) : FAIL - http://host/path?query

Note that the reverse case works as expected ...
  Served from http://host/path
  a.href = "http:?query";
  a.protocol = "foo";
  shouldBe("a.href", "'foo://host/path?query'");
Comment 4 Steve Block 2010-10-07 07:18:15 PDT
> As for the test case I originally mentioned, this still looks like a bug to me, as the protocol is being set on the
> resolved URL, which should be 'foo:?query' ...
>   Served from http://host/path
>   a.href = "foo:?query";
>   a.protocol = "http";
>   shouldBe("a.href", "'http:?query'");
> 
> Chromium (including ToT WebKit): FAIL - http://host/?query
> Safari (including ToT WebKit)  : FAIL - http://host/path?query
> Android (including ToT WebKit) : FAIL - http://host/path?query

Having looked into this more carefully, I'm starting to think the Safari/Android behaviour is correct. The spec says that href attribute is subject to changes in base URL and should be re-resolved when the base URL changes. So the href attribute should 'store' the unresolved URL, and it is resolved relative to the base URL only when the getter is called. So it makes sense that ..

a.href = "foo:?query";
a.protocol = "http";
shouldBe("a.href", "'http://host/path?query'");

... should be identical in behaviour to ...

a.href = "http:?query";
shouldBe("a.href", "'http://host/path?query'");

Can anybody confirm that this desired behaviour is correct? If so, then only Chromium is buggy.

In any case, I think I've found a related, more fundamental bug, which should probably be fixed first - Bug 47348
Comment 5 Adam Barth 2010-10-07 10:10:13 PDT
Sadly, this stuff is a mess.  I'm interested to hear what Brett thinks.
Comment 6 Brett Wilson (Google) 2010-10-07 23:10:10 PDT
To answer the question, if you have href="http://google.com" and you ask for the path component, should it be empty or should it be "/"? Another question would he "HTTP://google.com/" should the protocol be "http" or "HTTP". Currently Chromium will definitely give you a slash and "http" because it resolves the full URL all the time.

We have three different path canonicalization routines currently. One for hierarchical (http and the like), one for file URLs, and one for data/javascript URLs and the like.

Say the page says a.path = "/foo?bar#baz" I think we have to escape the ? and the # or else this won't just be setting the path. Yet if the algorithm is supposed to be "parse this URL and do a string substitution without canonicalization" then we would need yet a fourth mode where we just escape characters that change the parsing of the URL.

I think our behavior makes sense and I haven't heard of any real world problems with it. Changing this, even if it would make it more correct, would introduce a lot of complexity for almost no benefit.
Comment 7 Alexey Proskuryakov 2010-10-07 23:23:07 PDT
But isn't it even less desirable to change KURL to match?
Comment 8 Adam Barth 2010-10-07 23:24:37 PDT
(In reply to comment #7)
> But isn't it even less desirable to change KURL to match?

The most desirable thing, IMHO, is to unfork our URL code.  As long as we have two pieces of code doing the same job, we're going to have two different behaviors.
Comment 9 Brett Wilson (Google) 2010-10-07 23:35:40 PDT
I wasn't arguing that KURL should be changed in this respect.
Comment 10 Steve Block 2010-10-08 02:25:35 PDT
OK, so it sounds like this should be WONTFIX. I'd appreciate your feedback on Bug 47348 though, to see it that should be fixed.
Comment 11 Steve Block 2010-10-14 05:21:51 PDT
The thinking is that this isn't worth fixing due to the complexity it would introduce.

However, this assumption may change once Bug 47348 is fixed, as that may involve changing much of this code. Leaving this bug open until Bug 47348 is fixed.
Comment 12 Ahmad Saleem 2022-08-05 07:52:02 PDT
I took the test case from attached and changed it into JSFiddle:

Link - https://jsfiddle.net/9drs1ofw/show

Following are results across browsers:

*** Safari 15.6 on macOS 12.5 ***

FAIL a.href should be https://www.mydomain.com/path/. Was null://www.mydomain.com/path/.

^ Safari fail only in this.

*** Firefox Nightly 105 ***

FAIL a.href should be https://www.mydomain.com/path/. Was null://www.mydomain.com/path/.

*** Chrome Canary 106 ***

FAIL a.href should be https://www.mydomain.com/path/. Was null://www.mydomain.com/path/.

FAIL a.href should be http:??bar. Was http:/??bar.


_______

Do we need to fix the failing test case? Just wanted to share updated test results. Thanks!
Comment 13 Brent Fulgham 2022-08-05 09:15:31 PDT
It seems like the standard has moved in this space, and all browsers agree on our current behavior. Seems like WONTFIX to me.