Bug 14955

Summary: Implement getElementsByClassName
Product: WebKit Reporter: David Smith <catfish.man>
Component: DOMAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Enhancement CC: annevk, ap, dev+webkit, gavin.sharp, sam
Priority: P2    
Version: 523.x (Safari 3)   
Hardware: Mac   
OS: OS X 10.4   
URL: https://bugzilla.mozilla.org/show_bug.cgi?id=357450
Attachments:
Description Flags
Preliminary patch to implement getElementsByClassName
none
Updated to not have nasty code duplication
none
Now with changelog and test
none
Without getElementsByClassNameNS, and with null/undefined tests
none
Adds form feed to the whitespace list, and copyright headers
none
Actually include the header
eric: review-
Updated to address review comments
none
More review fixage
eric: review-
updated patch
darin: review-
updated patch (now with more Vector) mitz: review+

Description David Smith 2007-08-13 00:29:25 PDT
HTML5 specifies a getElementsByClassName method on Document and Element. Firefox 3 supports this with a slightly modified API, which Hixie says he plans to update the HTML5 spec to match. Firefox has gotten large (77x over js implementations, 8x over XPath) speed improvements for this functionality from doing it natively.
Comment 1 David Smith 2007-08-13 00:43:16 PDT
Created attachment 15948 [details]
Preliminary patch to implement getElementsByClassName

This isn't done (no changelog, no testcases and very limited manual testing, one known wart), but it does work, and is compatible with the Firefox implementation as far as I know. Feedback is welcome, particularly on the part that handles parsing the class string. Darin suggested generating a dummy Element, setting its class, then getting its classList, but I wasn't able to get that working (and I'm not convinced that it's less nasty than the code duplication). The current method is copy-pasted from NamedMappedAttrMap::parseClassAttribute, which is definitely less than ideal.
Comment 2 David Smith 2007-08-15 20:06:18 PDT
Created attachment 15991 [details]
Updated to not have nasty code duplication

The updated version of this has moved parseClassAttribute into AtomicStringList (which is renamed to ClassNameList, since that's all it's used for), which allows getting rid of the duplicated code without weird hacks.

Still no changelog or tests, but the code should be ready for review.
Comment 3 Adam Roben (:aroben) 2007-08-15 20:11:25 PDT
Comment on attachment 15991 [details]
Updated to not have nasty code duplication

I'm not sure the renaming of AtomicStringList is desirable. Perhaps ClassNameList should be a subclass of AtomicStringList that just adds the parseClassAttribute method? I could be convinced otherwise, though.

It also seems like parseClassAttribute could be a static method that functions as a named constructor.
Comment 4 David Smith 2007-08-15 22:55:31 PDT
(In reply to comment #3)
> (From update of attachment 15991 [details] [edit])
> I'm not sure the renaming of AtomicStringList is desirable. Perhaps
> ClassNameList should be a subclass of AtomicStringList that just adds the
> parseClassAttribute method? I could be convinced otherwise, though.

AtomicStringList effectively *is* already ClassNameList; it's used nowhere in the code except to store lists of class names. Maciej has pointed out that it's not even a particularly good structure to use for that, but fixing it is beyond the scope of this patch.

> 
> It also seems like parseClassAttribute could be a static method that functions
> as a named constructor.
> 

Seems plausible. I'll try it out and see how it feels.
Comment 5 David Smith 2007-08-16 16:02:49 PDT
I did a bit of performance testing on this just for fun; Native vs. prototype.js's xpath version vs. prototype.js's DOM traversal version

native: 303ms
xpath: 5,706ms
js/dom: 26,660ms

:)

I also tried out aroben's second suggestion, but it seemed like a bit of a wash. Made the getElementsByClassName callsite a little nicer, and the NamedMappedAttrMap one a little less nice. What about making a regular constructor that takes a class attribute string and calls through to parseClassAttribute? Is that sort of minor convenience method generally considered worthwhile in webkit?
Comment 6 David Smith 2007-08-22 10:49:30 PDT
Created attachment 16071 [details]
Now with changelog and test
Comment 7 Sam Weinig 2007-08-24 11:34:48 PDT
Have you tested the behavior of passing js null and js undefined to getElementsByClassName and getElementsByClassNameNS?
Comment 8 David Smith 2007-09-01 16:57:57 PDT
Created attachment 16176 [details]
Without getElementsByClassNameNS, and with null/undefined tests

getElementsByClassNameNS doesn't make any sense, so I've removed it (as discussed on irc). This also tests null/undefined as suggested.
Comment 9 Maciej Stachowiak 2007-09-08 05:25:35 PDT
Some test cases for getElementsByClassName:

http://tc.labs.opera.com/apis/getElementsByClassName/

(The Opera alpha now supports it).
Comment 10 Maciej Stachowiak 2007-09-08 05:37:14 PDT
Anne van Kesteren points out that this implementation doesn't treat \f as whitespace so it would fail his tests. I didn't double-check that the spec requires this treatment.
Comment 11 Anne van Kesteren 2007-09-09 13:08:08 PDT
FWIW: the specification currently doesn't define the string argument. However, it seems likely that it resuses the definitions for getting tokens out of string that is used for class= currently which includes the aforementioned character.
Comment 12 David Smith 2007-09-09 13:32:51 PDT
(In reply to comment #11)
> FWIW: the specification currently doesn't define the string argument. However,
> it seems likely that it resuses the definitions for getting tokens out of
> string that is used for class= currently which includes the aforementioned
> character.
> 

Interesting. That would imply that WebKit is mis-handling class= as well, since I reused the whitespace logic from it for getElementsByClassName.
Comment 13 David Smith 2007-09-09 23:25:54 PDT
Created attachment 16239 [details]
Adds form feed to the whitespace list, and copyright headers

This just adds form feed to the whitespace list (for *all* class name parsing, please let me know if that should be split out into another patch. It's about 6 characters on a line that's already being modified, so I figured it would be fine here), and adds my copyright header to the files I made more than tiny changes on (thanks for pointing that out olliej!).
Comment 14 David Smith 2007-09-09 23:42:19 PDT
Created attachment 16240 [details]
Actually include the header

Somehow either I hadn't been svn adding ClassNameList.h this whole time, or (based on what filemerge says), svn-apply didn't pick up on the rename. This is the same as the previous one, except it should actually work :)
Comment 15 Eric Seidel (no email) 2007-10-01 12:17:23 PDT
Comment on attachment 16240 [details]
Actually include the header

Ok, a few comments:
+    for(const ClassNameList *t = this; t; t = t->next()) 
+    { 
+        if(t->string() == str) 
+            return true; 

Those don't agree with the style guidelines.  { should be on the same line as for, and if( should be if (
http://webkit.org/coding/coding-style.html

No need for this-> here:
+void ClassNameList::parseClassAttribute(const String& classStr, const bool inCompatMode)
+{
+    this->clear();

+    String classAttr = inCompatMode ? 
+        (classStr.impl()->isLower() ? classStr : String(classStr.impl()->lower())) :
+        classStr;
doesnt' need to call all the impl() stuff, just call .lower() on string.

To me "start" and "end" are more readable than "ePos" and "sPos", but that's not a huge issue.

Style guidelines again:
+ClassNodeList::ClassNodeList(PassRefPtr<Node> rootNode, const AtomicString& className)
+: NodeList(rootNode), m_namespaceURI(nullAtom)


Why?
+    m_classList = *(new ClassNameList());
That shouldn't be needed.

Style again:
+    const ClassNameList* t = static_cast<Element*>(testNode)->getClassList();
+    
+    for (const ClassNameList* c = &m_classList; c; c = c->next())
+    {
+        if(t->contains(c->string()))
+           continue;

also, "t" is not a very readable/useful variable name

This is also more easily written as:
+    for (const ClassNameList* c = &m_classList; c; c = c->next())
+    {
+        if(t->contains(c->string()))
+           continue;
+
+        return false;

if (!contains)
    return false;

IMO.

Shouldn't this be "isEmpty()" ?

+PassRefPtr<NodeList> Element::getElementsByClassName(const String& className)
+{
+    if (className.isNull())
+        return 0;


Otherwise looks OK.  I think the style stuff should be corrected and a new patch posted before landing though.

Thanks for the patch!
Comment 16 David Smith 2007-10-01 23:36:29 PDT
Created attachment 16496 [details]
Updated to address review comments

I ended up removing the empty args optimization entirely, since it was buggy, and probably not useful.
Comment 17 David Smith 2007-10-02 00:22:59 PDT
Created attachment 16497 [details]
More review fixage

Addressing stuff I missed before, and stuff mentioned on IRC.
Comment 18 Eric Seidel (no email) 2007-10-02 23:43:57 PDT
Comment on attachment 16497 [details]
More review fixage

I am surprised.  I had figured class names were always case insensitive.  It does not appear to be the case with your patch.  Does contains() need to lower() the incoming string when in compatMode?

This is not needed, please remove:
+    m_classList = ClassNameList();

Constructor initializers are generally one on each line, with the comma (or colon) leading the line (see the style guidelines). This allows easy use of #if ENABLE(FOO) around certain initializers as well as easy removal/addition with the smallest possible diff.

Maybe it's the need to return "false" when m_impl == 0, that has caused isLower() not to be implemented before.  One could argue that isLower should return true for "" or null.  Not sure it matters.  I'm OK having a  funny isLower as such.  It's also OK to call string->impl()->toLower() (and NOT wrap it in String(), the compiler will do that for you.)  I know I encouraged you to add isLower. Doesn't matter either way, your call.

Need to fix the one small style issue, the extra initialization and answer regarding case insensitive lookups.  I also think we need a compat-mode test and some case-sensitivity and white-space sensitivity tests before this can land.

Thanks again for taking the time to work on this.
Comment 19 Anne van Kesteren 2007-10-03 09:40:09 PDT
FYI: class names are always case-_sensitive_ as far as getElementsByClassName() is concerned. No need to introduce limited quirks mode behavior into a new API.
Comment 20 Maciej Stachowiak 2007-10-15 06:08:11 PDT
I'd love to see the last few issues fixed up to land this on trunk.
Comment 21 David Smith 2007-10-15 07:50:56 PDT
(In reply to comment #20)
> I'd love to see the last few issues fixed up to land this on trunk.
> 

I haven't forgotten, just haven't had time to work on it. I'll try to set aside some time this weekend to finish it up.

I do have one question regarding case-sensitivity though: now that code is shared between regular class parsing and getElementsByClassName parsing it becomes more difficult to only have a quirks mode for regular class parsing. Should I leave it as is, or add another argument to the function to let the caller control whether quirks mode is allowed (or a third option if I'm missing some better way of doing it)?
Comment 22 Alexey Proskuryakov 2007-11-17 23:37:58 PST
(In reply to comment #19)
> FYI: class names are always case-_sensitive_ as far as getElementsByClassName()
> is concerned. No need to introduce limited quirks mode behavior into a new API.

I'm not sure if I agree - a Web developer doesn't need to know whether an API is old or new - consistent behavior might be better, even if it means supporting old quirks.
Comment 23 Sam Weinig 2007-11-30 14:34:19 PST
This doesn't seem to work with a lot of Anne's tests.  It is failing on tests that set a class on the documentElement (<html>) and I think it is failing to find this element because we only search below the root node.  I am curious if the the element that getElementsByClassName is called on is supposed to include it self in the list, or only it's descendants.
Comment 24 Sam Weinig 2007-11-30 14:41:14 PST
Actually, the spec is pretty specific about this.  So it is a bug for implementation of document.getElementsByClassName to just call documentElement.getElementsByClassName because that won't include itself.
Comment 25 David Smith 2007-12-03 16:33:15 PST
(In reply to comment #22)
> (In reply to comment #19)
> > FYI: class names are always case-_sensitive_ as far as getElementsByClassName()
> > is concerned. No need to introduce limited quirks mode behavior into a new API.
> 
> I'm not sure if I agree - a Web developer doesn't need to know whether an API
> is old or new - consistent behavior might be better, even if it means
> supporting old quirks.
> 

I've spoken with Mozilla and Opera developers about this, and brought it up with Hixie and #whatwg; so far four people (counting the two of us) are in favor of maintaining the case insensitivity quirk, and everyone else doesn't care at all. I'll try to remember to send an email to the list about changing the spec to reflect this.
Comment 26 Sam Weinig 2007-12-12 17:02:24 PST
Created attachment 17870 [details]
updated patch

This updates the patch to allow document as the root node, cleaned up things a bit and adds Anne's test suite to the layouttests.  Only test 014.html as it originally was failed, but I have updated it to match our current quirksmode behavior and added a comment.  Think we should land this with the current functionality.  Objections.

I also kept David's name in the ChangeLog.
Comment 27 Darin Adler 2007-12-12 17:55:10 PST
Comment on attachment 17870 [details]
updated patch

There's an extra change log entry here, for bug 15313.

I really wish we didn't use a list for ClassNameList. A Vector<AtomicString>* would work better, I think. It's really bad that deleting a ClassNameList uses O(n) stack in the destructor.

Is it really good to indent ChildNodeList.h as part of this patch? There's also an unneeded forward declaration added to that file. Please land that separately if at all.

We should use foldCase() instead of lower() for contexts like this one.

The check of isLower() in parseClassAttribute is unnecessary. String::lower() should already do that.

 41     , m_namespaceURI(nullAtom)

Isn't nullAtom the default anyway?

Otherwise looks good to me. I'll say review- for now because I want you to consider the comments above.
Comment 28 David Smith 2007-12-12 19:37:10 PST
I have already filed http://bugs.webkit.org/show_bug.cgi?id=15760 on the O(n) stack issue (it's a crasher). I think that bug would be a better place to address changing from a linked list to a vector. The current implementation is as it is merely because it's a rename of AtomicStringList rather than a new class.
Comment 29 Sam Weinig 2007-12-13 14:22:22 PST
Created attachment 17882 [details]
updated patch (now with more Vector)
Comment 30 mitz 2007-12-13 22:00:55 PST
Comment on attachment 17882 [details]
updated patch (now with more Vector)

 122     bool isLower() const { return m_impl ? m_impl->isLower() : false; }

The null string is all-lowercase (and all-uppercase), so this should say true.
Comment 31 mitz 2007-12-14 11:01:26 PST
Comment on attachment 17882 [details]
updated patch (now with more Vector)

r=me
Comment 32 Sam Weinig 2007-12-14 13:57:19 PST
Fix landed in r28722.