Bug 31593 - HTMLAnchorElement is inconsistent with its internal handling of the value returned for the href attribute.
Summary: HTMLAnchorElement is inconsistent with its internal handling of the value ret...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Ben Murdoch
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-17 11:43 PST by Ben Murdoch
Modified: 2009-11-19 16:16 PST (History)
2 users (show)

See Also:


Attachments
Proposed patch (3.01 KB, patch)
2009-11-17 11:53 PST, Ben Murdoch
darin: review-
Details | Formatted Diff | Diff
Alternative Patch. (7.46 KB, patch)
2009-11-18 04:55 PST, Ben Murdoch
darin: review+
benm: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ben Murdoch 2009-11-17 11:43:12 PST
During KURL::init(), trailing and leading spaces are removed from the URL. I think this step should also remove leading and trailing tabs, carriage returns and newlines.

Path and layout test to follow.
Comment 1 Ben Murdoch 2009-11-17 11:53:37 PST
Created attachment 43373 [details]
Proposed patch

Proposed patch
Comment 2 Darin Adler 2009-11-17 11:57:56 PST
Comment on attachment 43373 [details]
Proposed patch

There is insufficient context for this change. Does this affect websites? If so, how? What is the motivation for this change? Website incompatibilities? Behavior differences from other browsers?

This is too little test coverage. The new test covers only the newline character.

"I think this should change" is interesting, but not sufficient motivation for the change on its own.
Comment 3 Ben Murdoch 2009-11-17 12:31:14 PST
Hi Darin, apologies for the vagueness.

(In reply to comment #2)
> (From update of attachment 43373 [details])
> There is insufficient context for this change. Does this affect websites? If
> so, how? What is the motivation for this change? Website incompatibilities?
> Behavior differences from other browsers?

I discovered this issue while working on an Android browser bug. As I understand, currently it seems that if there is a leading newline in the href attribute of an <a> tag, that newline is only removed later by the deprecatedParseUrl() function (for example in HitTestResult::absoluteLinkURL() when the hit detection code is run). We do additional hit detection on Android to what is in WebCore, so I thought it made most sense to update KURLs behavior rather than making a call to a deprecated function in the Android code. As there is code to trim whitespace in KURL already, I thought it was sensible to augment that. I added trimming of tabs and carriage returns as I would consider them "whitespace" characters, too.

> 
> This is too little test coverage. The new test covers only the newline
> character.

I will fix this up and send a new patch if you are happy with the general idea of this patch.

Thanks, Ben
Comment 4 Ben Murdoch 2009-11-17 12:35:24 PST
+darin 

Sorry, forgot to add you on CC so you would know I replied :)

Thanks, Ben
Comment 5 Darin Adler 2009-11-17 12:37:37 PST
(In reply to comment #3)
> I discovered this issue while working on an Android browser bug. As I
> understand, currently it seems that if there is a leading newline in the href
> attribute of an <a> tag, that newline is only removed later by the
> deprecatedParseUrl() function (for example in HitTestResult::absoluteLinkURL()
> when the hit detection code is run). We do additional hit detection on Android
> to what is in WebCore, so I thought it made most sense to update KURLs behavior
> rather than making a call to a deprecated function in the Android code. As
> there is code to trim whitespace in KURL already, I thought it was sensible to
> augment that. I added trimming of tabs and carriage returns as I would consider
> them "whitespace" characters, too.

That's not the right way to fix this bug. To fix that bug should consistently use deprecatedParseUrl when constructing URLs from attribute values until we find a way to eliminate that function entirely.

Changing KURL would be OK to do if there are other arguments for doing so, but is not a good way to address that bug, since it addresses some differences between the code paths, but leaves other things different.
Comment 6 Ben Murdoch 2009-11-17 13:05:27 PST
(In reply to comment #5)

> That's not the right way to fix this bug. To fix that bug should consistently
> use deprecatedParseUrl when constructing URLs from attribute values until we
> find a way to eliminate that function entirely.
> 
> Changing KURL would be OK to do if there are other arguments for doing so, but
> is not a good way to address that bug, since it addresses some differences
> between the code paths, but leaves other things different.

OK, fair enough. In that case, would a patch to HTMLAnchorElement::href() to make it use deprecatedParseURL be acceptable (it currently does not)?

Thanks, Ben
Comment 7 Darin Adler 2009-11-17 21:54:20 PST
(In reply to comment #6)
> In that case, would a patch to HTMLAnchorElement::href() to
> make it use deprecatedParseURL be acceptable (it currently does not)?

Yes.
Comment 8 Ben Murdoch 2009-11-18 04:55:10 PST
Created attachment 43427 [details]
Alternative Patch.

Alternative patch as discussed above.
Comment 9 Darin Adler 2009-11-18 08:27:05 PST
You should retitle the bug to describe the problem seen by websites rather than your original thought on what to change to fix it.
Comment 10 Darin Adler 2009-11-18 08:30:11 PST
Comment on attachment 43427 [details]
Alternative Patch.

> +        When retrieving the href attribute of an A tag, we should run the attribute value through deprecatedParseURL until a good replacement for that function is found.

That's not really a bug title. It would be better to describe the problem and the real world symptom in both the bug and the change log.

This patch still seems too detached from what effect it's having on websites. There's not even a mention of what other browsers do or what HTML5 prescribes.

The key here is to do the same work in the href function that is done in HTMLAnchorElement::defaultEventHandler and HTMLAnchorElement::parseMappedAttribute. And I think that's what the comment should indicate. Consistency is the issue.

I would have liked to see more test cases about characters that should *not* be stripped, such as non-breaking spaces, \v, and the like.

r=me as is, although I'm not super-happy with how this is described in the change log
Comment 11 Ben Murdoch 2009-11-18 09:19:38 PST
Thanks for the review Darin.

I have updated the bug title and will edit the Changelogs on landing to match.

The HTML5 spec mandates that the href attribute must be a valid Web Address, which refers to the HTML5 Web Address spec and that states that when parsing a web address w, "Strip leading and trailing space characters from w." There doesn't appear to be a strict definition of "space characters", but running the layout test in Chrome, FF and IE it seems that they strip spaces, newlines, CR and tabs (as in they pass the test) and so this patch brings WebKit in line with that.

Thanks, Ben
Comment 12 Ben Murdoch 2009-11-18 09:20:26 PST
Comment on attachment 43427 [details]
Alternative Patch.

cq- so I can change the changelogs prior to landing.
Comment 13 Darin Adler 2009-11-18 09:23:48 PST
An even better way to handle this is would probably be to use [ReflectURL] in the IDL file and remove the HTMLAnchorElement::href() function entirely.
Comment 14 Ben Murdoch 2009-11-18 09:27:34 PST
(In reply to comment #13)
> An even better way to handle this is would probably be to use [ReflectURL] in
> the IDL file and remove the HTMLAnchorElement::href() function entirely.

Interesting, I wasn't aware of that. I will hold off landing and try it.
Comment 15 Darin Adler 2009-11-18 09:31:09 PST
(In reply to comment #14)
> Interesting, I wasn't aware of that. I will hold off landing and try it.

If any code is calling HTMLAnchorElement::href() directly rather than through DOM bindings, then we will still need to keep the function around. Generally, though, we want to synthesize these simple getters rather than hand-coding them. The one for URLs does indeed call deprecatedParseURL.
Comment 16 Ben Murdoch 2009-11-18 09:41:40 PST
(In reply to comment #15)
> If any code is calling HTMLAnchorElement::href() directly rather than through
> DOM bindings, then we will still need to keep the function around. Generally,
> though, we want to synthesize these simple getters rather than hand-coding
> them. The one for URLs does indeed call deprecatedParseURL.

OK. The IDL does already specify the ReflectURL property and the function is used elsewhere in the codebase (including the getters for other anchor properties such as hostname, which is how the layout test exposes the bug), so I think I'll land the patch as is (but update the Changelogs with the new bug title).

Thanks, Ben
Comment 17 Ben Murdoch 2009-11-18 10:08:13 PST
Landed as r51118.
Comment 18 Ian 'Hixie' Hickson 2009-11-19 15:32:28 PST
The term "space characters" is defined in the HTML5 spec.
Comment 19 Darin Adler 2009-11-19 16:16:09 PST
The following comment probably belongs in some other bug that's on the subject of our numerous whitespace-related functions:

We may be able to take advantage of HTML5 terminology and use the terms space characters and White_Space characters. Unfortunately, we normally do not use underscores in any of our identifiers, and since "skip whitespace" in HTML5 means "skip space characters", using "WhiteSpace" to mean Unicode White_Space characters might not work well for us.

Sadly, the isASCIISpace function in ASCIICType.h differs from both "space characters" or "White_Space characters". It includes U+000B, and HTML5 space characters does not. It is the subset of Unicode White_Space that fits in ASCII, so cannot be used if non-ASCII characters might be involved. It was named an designed based on the "C" locale version of the <ctype.h> isspace function.

We may find that isASCIISpace is never useful.

I'd love to clean this all up and unify based on a smaller set of characters. And make sure we have test cases.