Bug 46800 - getElementsByTagName unable to find SVG camelCase elements imported into HTML
Summary: getElementsByTagName unable to find SVG camelCase elements imported into HTML
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Rob Buis
URL: http://a.deveria.com/tests/lg-qselect...
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-29 05:31 PDT by Alexis Deveria
Modified: 2012-11-27 11:19 PST (History)
16 users (show)

See Also:


Attachments
test case (523 bytes, text/html)
2010-09-29 07:33 PDT, Alexis Deveria
no flags Details
test case in XHTML (This test case passes) (913 bytes, application/xhtml+xml)
2011-09-28 02:34 PDT, Dirk Schulze
no flags Details
Patch (10.02 KB, patch)
2012-03-19 19:31 PDT, Rob Buis
no flags Details | Formatted Diff | Diff
Patch (23.95 KB, patch)
2012-04-08 11:07 PDT, Rob Buis
no flags Details | Formatted Diff | Diff
Patch (24.72 KB, patch)
2012-04-08 15:20 PDT, Rob Buis
zimmermann: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexis Deveria 2010-09-29 05:31:58 PDT
Neither document.getElementsByTagName('linearGradient') nor document.querySelectorAll('linearGradient') match any linearGradient elements, nor any other elements with camelCase tags. This works as expected in Gecko/Opera.
Comment 1 Alexis Deveria 2010-09-29 07:33:52 PDT
Created attachment 69194 [details]
test case

Alerts the amount of linearGradient elements found (should be 1)
Comment 2 Mike Bostock 2011-09-27 20:56:42 PDT
Any update on this? This is an issue with the D3.js visualization framework: you can create camel-cased elements, but it's impossible to select them again.

I assume this bug has something to do with the HTML elements being case-insensitive, while the SVG elements are case-sensitive. My guess is that the selector is internally downcased, but the SVG element names are not, making them impossible to select.

It'd be great if SVG were not case-sensitive when used inside an HTML document. I'm not sure if that's possible, but it would fix this issue and make SVG more accessible besides.
Comment 3 Dirk Schulze 2011-09-28 01:52:21 PDT
It doesn't even work on creating the linearGradient with the NS of SVG: 


var a = document.createElementNS('http://www.w3.org/2000/svg', 'linearGradient');
alert(a);  --> [object SVGLinearGradientElement]

document.getElementsByTagName('svg')[0].appendChild(a);
document.getElementsByTagName('linearGradient').length --> 0

I'll check the behavior in XHTML.

Btw. the SVG element was found.
Comment 4 Dirk Schulze 2011-09-28 02:32:43 PDT
The example works with a XHTML document. A problem with HTML5?
Comment 5 Dirk Schulze 2011-09-28 02:34:03 PDT
Created attachment 108989 [details]
test case in XHTML (This test case passes)

Same test case, but as XHTML document. I get the expected count of linearGradient elements of 1.
Comment 6 Dirk Schulze 2011-09-28 02:36:50 PDT
Comment on attachment 108989 [details]
test case in XHTML (This test case passes)

Doesn't work with MIME-type text/html, changing to application/xhtml+xml.
Comment 7 Dirk Schulze 2011-09-28 02:56:44 PDT
Looks like it can be reproduced with every SVG element that has a capital letter in it. I tested linearGradient, radialGradient, clipPath, altGlyph and animateColor. All SVG elements that do not have a capital letter in its tag name seem to work. Tested it with pattern, mask and rect without any problems.

A complete list of all SVG element tag names can be found in WebCore/svg/svgtags.in
Comment 8 Dirk Schulze 2011-09-28 02:58:36 PDT
Changing component to HTML DOM. Because the example works with XML DOM.
Comment 9 Dirk Schulze 2011-09-30 01:42:03 PDT
(In reply to comment #2)
> I assume this bug has something to do with the HTML elements being case-insensitive, while the SVG elements are case-sensitive. My guess is that the selector is internally downcased, but the SVG element names are not, making them impossible to select.
> 
> It'd be great if SVG were not case-sensitive when used inside an HTML document. I'm not sure if that's possible, but it would fix this issue and make SVG more accessible besides.

This assumption seems to be correct. I commented out this line:

http://trac.webkit.org/browser/trunk/Source/WebCore/dom/Node.cpp#L1671

And after that the first example works as well.
Comment 10 Anne van Kesteren 2011-09-30 05:57:32 PDT
You need to implement:

http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-document-getelementsbytagname

querySelector should already be case-insensitive.
Comment 11 Dirk Schulze 2011-09-30 12:53:55 PDT
(In reply to comment #10)
> You need to implement:
> 
> http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-document-getelementsbytagname
> 
> querySelector should already be case-insensitive.
"""
Element nodes in the HTML namespace whose local name is localName converted to ASCII lowercase.
Element nodes, not in the HTML namespace, whose local name is localName.
"""

This is exactly done in WebKit right now. The only problem: the current code assumes that every element in HTML documents belongs to HTML namespace.

SVG elements are not included in the HTML namespace. Therefor getElementsByTagName('lineargradient') shouldn't work in HTML as well.
Comment 12 Dirk Schulze 2011-10-05 03:19:16 PDT
(In reply to comment #10)
> You need to implement:
> 
> http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-document-getelementsbytagname
> 
> querySelector should already be case-insensitive.

I just checked Opera with HTML5 Parser. It has exactly the same problem. Maybe I misunderstand the spec. Would be great if you can explain it Anne. My understanding of:

"""
Element nodes in the HTML namespace whose local name is localName converted to ASCII lowercase.
Element nodes, not in the HTML namespace, whose local name is localName.
"""

is, that just for HTML elements (Element nodes in the HTML namespace) the localName should converted to lowercase.

It might make more sense if the localName gets converted to lowercase if we search in an HTML document, independent of the namespace of the element that we search for? Otherwise we would have to check if other namespace contain the element before converting to lowercase and searching in the HTML namespace?
Comment 13 Ian 'Hixie' Hickson 2011-10-05 10:51:02 PDT
Why would that make more sense? Even in an HTML document, the SVG elements still use camelcase.
Comment 14 Dirk Schulze 2011-10-05 10:59:24 PDT
(In reply to comment #13)
> Why would that make more sense? Even in an HTML document, the SVG elements still use camelcase.

Why would it make sense to accept 'localName' with big capitals for elements in HTML namespace than?  Why shouldn't engines be as strict for HTML elements as for elements in other namespaces than? I just think that it is confusing that you don't care about capitalization for HTML elements but must care about capitalization on non HTML elements, even if they are all used in the same HTML document.
Comment 15 Anne van Kesteren 2011-11-09 11:20:20 PST
We care about it for HTML for backwards compatibility. If this API was never added until now it would do case-sensitive matching for everything. Should we add a note to the specification to point out the HTML behavior is strictly for backwards compatibility?
Comment 16 Mike Bostock 2011-12-21 11:22:58 PST
My interpretation is that getElementsByTagName and querySelectorAll should be case-sensitive when matching SVG elements in HTML documents, but case-insensitive when matching HTML elements. The problem is that the case-sensitive check currently applies per-document rather per-element. In Node.cpp:

    if (document()->isHTMLDocument())
        name = localName.lower();

So, if you querySelectorAll("linearGradient"), that should match an HTML element with the name <LiNeaRGradIENt> despite the absurd casing, but only match SVG elements of the same exact case: <linearGradient>.
Comment 17 Mike Bostock 2012-03-10 20:30:51 PST
Any update on this? It continues to be rather unfortunate that there's no way to select camelCased elements within an HTML document; e.g., querySelectorAll("clipPath") returns no results.
Comment 18 Dirk Schulze 2012-03-14 18:18:44 PDT
Adding anttik and kling to the bug. The code seems to have changed, the problem stays the same.
Comment 19 Rob Buis 2012-03-19 19:31:55 PDT
Created attachment 132747 [details]
Patch
Comment 20 Boris Zbarsky 2012-03-19 19:37:49 PDT
Will that do the right thing in non-HTML documents that contain HTML-namespace elements?
Comment 21 Nikolas Zimmermann 2012-03-20 04:21:52 PDT
Comment on attachment 132747 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=132747&action=review

> Source/WebCore/ChangeLog:10
> +        Test: fast/dom/getelementsbyname-localName-matching.html

I think it would be cleanest, if you'd actually test with different document types: foo.xhtml, foo.htm, foo.html, foo.svg as well.
Did you compare your testcase against FF nightlies as well?

> Source/WebCore/dom/TagNodeList.h:49
> +        AtomicString m_loweredLocalName;

Is it worth to cache this? Can't we lower on demand?
Comment 22 Nikolas Zimmermann 2012-03-20 04:23:29 PDT
(In reply to comment #20)
> Will that do the right thing in non-HTML documents that contain HTML-namespace elements?
Should be tested, I agree.
Comment 23 Rob Buis 2012-03-20 04:38:29 PDT
Hi Niko, Boris,

Thanks for the valid remarks. Indeed this could be tested better and Boris usecase is one that will fail with my patch (I think it works with code in ToT). I have a good idea how to fix it tonight, until then feel free to r-.
Cheers,

Rob.

PS: encouraging that the bots remained green.
Comment 24 Boris Zbarsky 2012-03-20 08:35:22 PDT
One more thing: should a separate bug be spun off for querySelector?
Comment 25 Rob Buis 2012-04-08 11:07:56 PDT
Created attachment 136149 [details]
Patch
Comment 26 Rob Buis 2012-04-08 11:13:59 PDT
Hi Niko,

(In reply to comment #21)
> (From update of attachment 132747 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=132747&action=review
> 
> > Source/WebCore/ChangeLog:10
> > +        Test: fast/dom/getelementsbyname-localName-matching.html
> 
> I think it would be cleanest, if you'd actually test with different document types: foo.xhtml, foo.htm, foo.html, foo.svg as well.

I tried to do this with my latest patch.

> Did you compare your testcase against FF nightlies as well?

We match FF 11 on those new test cases.

> > Source/WebCore/dom/TagNodeList.h:49
> > +        AtomicString m_loweredLocalName;
> 
> Is it worth to cache this? Can't we lower on demand?

I don't see how unfortunately.
Cheers,

Rob.
Comment 27 Dirk Schulze 2012-04-08 13:35:20 PDT
Can you please be more explicit with the description on your change log? You change a lot more than you actually mention in the ChangeLog. Also, this just fixes getElementsByTagName, right? Can you either rename this bug and open a new bug for query selector, or open a sub bug for getElementsByTagName please?
Comment 28 Rob Buis 2012-04-08 14:38:28 PDT
Change bug title to reflect we deal with getElementsByTagName only here. Bug 83438 tracks the querySelector part.
Comment 29 Rob Buis 2012-04-08 15:20:56 PDT
Created attachment 136158 [details]
Patch
Comment 30 Rob Buis 2012-04-08 15:21:35 PDT
Hi Dirk,

(In reply to comment #27)
> Can you please be more explicit with the description on your change log? You change a lot more than you actually mention in the ChangeLog. Also, this just fixes getElementsByTagName, right? Can you either rename this bug and open a new bug for query selector, or open a sub bug for getElementsByTagName please?

Should all be done now :)
Cheers,

Rob.
Comment 31 Nikolas Zimmermann 2012-04-09 08:57:28 PDT
Comment on attachment 136158 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=136158&action=review

Patch looks great. I'm not touching r? yet, as I have a question:

> Source/WebCore/dom/TagNodeList.cpp:59
> +    : TagNodeList(rootNode, namespaceURI, localName)
> +    , m_loweredLocalName(localName.lower())

TagNodeList stores the localName, so why not use m_localName.lower() instead of storing m_loweredLocalName if you need it?

> Source/WebCore/dom/TagNodeList.cpp:67
> +        const AtomicString &localName_ = testNode->isHTMLElement() ? m_loweredLocalName : m_localName;

The & is misplaced, and the variable name looks weird :-)
const AtomicString& localName = testNode->isHTMLElement() ? m_localName.lower() : m_localName.

I think we shouldn't store m_loweredLocalName, unless this shows hot in profiles.
Comment 32 Rob Buis 2012-04-09 11:37:41 PDT
Hi Niko,

(In reply to comment #31)
> (From update of attachment 136158 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=136158&action=review
> 
> Patch looks great. I'm not touching r? yet, as I have a question:

Sure.

> > Source/WebCore/dom/TagNodeList.cpp:59
> > +    : TagNodeList(rootNode, namespaceURI, localName)
> > +    , m_loweredLocalName(localName.lower())
> 
> TagNodeList stores the localName, so why not use m_localName.lower() instead of storing m_loweredLocalName if you need it?

See below.

> > Source/WebCore/dom/TagNodeList.cpp:67
> > +        const AtomicString &localName_ = testNode->isHTMLElement() ? m_loweredLocalName : m_localName;
> 
> The & is misplaced, and the variable name looks weird :-)

Hmm, should the style checker have caught the &?
I agree that the variable name is not the best :)

> const AtomicString& localName = testNode->isHTMLElement() ? m_localName.lower() : m_localName.

I could add something like that before landing, I should not put up a new patch and get all the bots working again for a one line change I think.

> I think we shouldn't store m_loweredLocalName, unless this shows hot in profiles.

I do think it will be hot. getElementsByTagName will be used a lot in a html context, so it will lowere() for every node it traverses. I think that is why the original code does lower() in Node.cpp and not in the TagNodeList. Note that following the exact matching rules require some comparisons to the lowered one and some to the exact localName, depending on whether the Node is html or not. I considered doing case insensitive comparison instead of lowercase but it seems much slower to me.
Comment 33 Rob Buis 2012-04-09 12:30:26 PDT
> > I think we shouldn't store m_loweredLocalName, unless this shows hot in profiles.
> 
> I do think it will be hot. getElementsByTagName will be used a lot in a html context, so it will lowere() for every node it traverses. I think that is why the original code does lower() in Node.cpp and not in the TagNodeList. Note that following the exact matching rules require some comparisons to the lowered one and some to the exact localName, depending on whether the Node is html or not. I considered doing case insensitive comparison instead of lowercase but it seems much slower to me.

Sorry I meant it *will be* hot, i.e. requirement for a lowered localName will be high.
Comment 34 Nikolas Zimmermann 2012-04-10 00:48:15 PDT
Comment on attachment 136158 [details]
Patch

You're right, r=me. Please land with the variable name fixes and the & position fix.
Comment 35 Rob Buis 2012-04-10 09:02:59 PDT
Fix was landed in r113710.
Comment 36 Mike Bostock 2012-05-14 09:45:17 PDT
Thank you for fixing getElementsByTagName, but will someone please follow up with a fix to querySelector + querySelectorAll (i.e., Bug 83438)? This frequently trips up people trying to use linearGradient, textPath, foreignObject, clipPath, etc. with D3.js.
Comment 37 Mike Bostock 2012-11-27 11:19:51 PST
Apologies for the ping, but any chance someone could fix the querySelector part of this bug (Bug 83438)?