RESOLVED WONTFIX 46982
Setting href protocol attribute on malformed URL is incorrect when page is served over HTTP
https://bugs.webkit.org/show_bug.cgi?id=46982
Summary Setting href protocol attribute on malformed URL is incorrect when page is se...
Steve Block
Reported 2010-10-01 07:04:26 PDT
LayoutTest fast/dom/HTMLAnchorElement/set-href-attribute-protocol.html tests setting the protocol attribute of the href of an anchor element. A particular part of this test is as follows ... debug("Set protocol to http on malformed URL"); a.href = "foo:??bar"; a.protocol = "http"; shouldBe("a.href", "'http:??bar'"); I noticed that when the test is served over HTTP, rather than as a file:// URL as is currently the case, this test fails as follows ... Chromium (including ToT WebKit): FAIL - http://host/??bar Safari (including ToT WebKit) : FAIL - http://host/path??bar Android (including ToT WebKit) : FAIL - http://host/path??bar The test passes on recent versions of FF and IE. A test case, which duplicates the existing test as an HTTP test, is attached. The spec - http://dev.w3.org/html5/spec/urls.html - suggests that the expected result in the test is correct irrespective of the scheme over which the page is served, so it looks to me like this should be fixed.
Attachments
Test case (5.22 KB, text/plain)
2010-10-01 07:08 PDT, Steve Block
no flags
Steve Block
Comment 1 2010-10-01 07:08:59 PDT
Created attachment 69464 [details] Test case
Yael
Comment 2 2010-10-01 17:29:17 PDT
I can make the test case even smaller to show the error: var a = document.createElement('a'); a.href="http:??bar"; alert(a.href); The reason for the failure is in KURL.cpp, line 401. if (p[1] != '/' && equalIgnoringCase(base.protocol(), String(str, p - str)) && base.isHierarchical()) We compare the protocol of the base url to the protocol of the new utl, and if they match, and the base url is hierarchical, then we assume that either the new url is hierarchical or relative. I don't think that assumption is valid.
Steve Block
Comment 3 2010-10-04 05:28:10 PDT
Considering your simpler test case ... Served from http://host/path document.createElement('a').href = 'http:?query'; 'http://host/path?query' is the correct result provided we're following the non-strict parsing rules defined at http://tools.ietf.org/html/rfc3986#section-5.2.2. I think that the logic you mention at line 401 of KURL.cpp is providing exactly this non-strict parsing - we ignore the scheme if it's the same as that of the base URI. If we were to use the strict parsing rules, then the result should be 'http:?query'. This matches the case when the scheme doesn't match that of the base URI, as is the case when the layout tests are served from file://. As for the test case I originally mentioned, this still looks like a bug to me, as the protocol is being set on the resolved URL, which should be 'foo:?query' ... Served from http://host/path a.href = "foo:?query"; a.protocol = "http"; shouldBe("a.href", "'http:?query'"); Chromium (including ToT WebKit): FAIL - http://host/?query Safari (including ToT WebKit) : FAIL - http://host/path?query Android (including ToT WebKit) : FAIL - http://host/path?query Note that the reverse case works as expected ... Served from http://host/path a.href = "http:?query"; a.protocol = "foo"; shouldBe("a.href", "'foo://host/path?query'");
Steve Block
Comment 4 2010-10-07 07:18:15 PDT
> As for the test case I originally mentioned, this still looks like a bug to me, as the protocol is being set on the > resolved URL, which should be 'foo:?query' ... > Served from http://host/path > a.href = "foo:?query"; > a.protocol = "http"; > shouldBe("a.href", "'http:?query'"); > > Chromium (including ToT WebKit): FAIL - http://host/?query > Safari (including ToT WebKit) : FAIL - http://host/path?query > Android (including ToT WebKit) : FAIL - http://host/path?query Having looked into this more carefully, I'm starting to think the Safari/Android behaviour is correct. The spec says that href attribute is subject to changes in base URL and should be re-resolved when the base URL changes. So the href attribute should 'store' the unresolved URL, and it is resolved relative to the base URL only when the getter is called. So it makes sense that .. a.href = "foo:?query"; a.protocol = "http"; shouldBe("a.href", "'http://host/path?query'"); ... should be identical in behaviour to ... a.href = "http:?query"; shouldBe("a.href", "'http://host/path?query'"); Can anybody confirm that this desired behaviour is correct? If so, then only Chromium is buggy. In any case, I think I've found a related, more fundamental bug, which should probably be fixed first - Bug 47348
Adam Barth
Comment 5 2010-10-07 10:10:13 PDT
Sadly, this stuff is a mess. I'm interested to hear what Brett thinks.
Brett Wilson (Google)
Comment 6 2010-10-07 23:10:10 PDT
To answer the question, if you have href="http://google.com" and you ask for the path component, should it be empty or should it be "/"? Another question would he "HTTP://google.com/" should the protocol be "http" or "HTTP". Currently Chromium will definitely give you a slash and "http" because it resolves the full URL all the time. We have three different path canonicalization routines currently. One for hierarchical (http and the like), one for file URLs, and one for data/javascript URLs and the like. Say the page says a.path = "/foo?bar#baz" I think we have to escape the ? and the # or else this won't just be setting the path. Yet if the algorithm is supposed to be "parse this URL and do a string substitution without canonicalization" then we would need yet a fourth mode where we just escape characters that change the parsing of the URL. I think our behavior makes sense and I haven't heard of any real world problems with it. Changing this, even if it would make it more correct, would introduce a lot of complexity for almost no benefit.
Alexey Proskuryakov
Comment 7 2010-10-07 23:23:07 PDT
But isn't it even less desirable to change KURL to match?
Adam Barth
Comment 8 2010-10-07 23:24:37 PDT
(In reply to comment #7) > But isn't it even less desirable to change KURL to match? The most desirable thing, IMHO, is to unfork our URL code. As long as we have two pieces of code doing the same job, we're going to have two different behaviors.
Brett Wilson (Google)
Comment 9 2010-10-07 23:35:40 PDT
I wasn't arguing that KURL should be changed in this respect.
Steve Block
Comment 10 2010-10-08 02:25:35 PDT
OK, so it sounds like this should be WONTFIX. I'd appreciate your feedback on Bug 47348 though, to see it that should be fixed.
Steve Block
Comment 11 2010-10-14 05:21:51 PDT
The thinking is that this isn't worth fixing due to the complexity it would introduce. However, this assumption may change once Bug 47348 is fixed, as that may involve changing much of this code. Leaving this bug open until Bug 47348 is fixed.
Ahmad Saleem
Comment 12 2022-08-05 07:52:02 PDT
I took the test case from attached and changed it into JSFiddle: Link - https://jsfiddle.net/9drs1ofw/show Following are results across browsers: *** Safari 15.6 on macOS 12.5 *** FAIL a.href should be https://www.mydomain.com/path/. Was null://www.mydomain.com/path/. ^ Safari fail only in this. *** Firefox Nightly 105 *** FAIL a.href should be https://www.mydomain.com/path/. Was null://www.mydomain.com/path/. *** Chrome Canary 106 *** FAIL a.href should be https://www.mydomain.com/path/. Was null://www.mydomain.com/path/. FAIL a.href should be http:??bar. Was http:/??bar. _______ Do we need to fix the failing test case? Just wanted to share updated test results. Thanks!
Brent Fulgham
Comment 13 2022-08-05 09:15:31 PDT
It seems like the standard has moved in this space, and all browsers agree on our current behavior. Seems like WONTFIX to me.
Note You need to log in before you can comment on or make changes to this bug.