Bug 14955 - Implement getElementsByClassName
: Implement getElementsByClassName
Status: RESOLVED FIXED
: WebKit
HTML DOM
: 523.x (Safari 3)
: Macintosh Mac OS X 10.4
: P2 Enhancement
Assigned To:
: https://bugzilla.mozilla.org/show_bug...
:
:
:
  Show dependency treegraph
 
Reported: 2007-08-13 00:29 PST by
Modified: 2007-12-14 13:57 PST (History)


Attachments
Preliminary patch to implement getElementsByClassName (7.95 KB, patch)
2007-08-13 00:43 PST, David Smith
no flags Review Patch | Details | Formatted Diff | Diff
Updated to not have nasty code duplication (29.00 KB, patch)
2007-08-15 20:06 PST, David Smith
no flags Review Patch | Details | Formatted Diff | Diff
Now with changelog and test (37.46 KB, patch)
2007-08-22 10:49 PST, David Smith
no flags Review Patch | Details | Formatted Diff | Diff
Without getElementsByClassNameNS, and with null/undefined tests (35.34 KB, patch)
2007-09-01 16:57 PST, David Smith
no flags Review Patch | Details | Formatted Diff | Diff
Adds form feed to the whitespace list, and copyright headers (32.85 KB, patch)
2007-09-09 23:25 PST, David Smith
no flags Review Patch | Details | Formatted Diff | Diff
Actually include the header (34.65 KB, patch)
2007-09-09 23:42 PST, David Smith
eric: review-
Review Patch | Details | Formatted Diff | Diff
Updated to address review comments (34.56 KB, patch)
2007-10-01 23:36 PST, David Smith
no flags Review Patch | Details | Formatted Diff | Diff
More review fixage (37.36 KB, patch)
2007-10-02 00:22 PST, David Smith
eric: review-
Review Patch | Details | Formatted Diff | Diff
updated patch (75.22 KB, patch)
2007-12-12 17:02 PST, Sam Weinig
darin: review-
Review Patch | Details | Formatted Diff | Diff
updated patch (now with more Vector) (71.00 KB, patch)
2007-12-13 14:22 PST, Sam Weinig
mitz: review+
Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2007-08-13 00:29:25 PST
HTML5 specifies a getElementsByClassName method on Document and Element. Firefox 3 supports this with a slightly modified API, which Hixie says he plans to update the HTML5 spec to match. Firefox has gotten large (77x over js implementations, 8x over XPath) speed improvements for this functionality from doing it natively.
------- Comment #1 From 2007-08-13 00:43:16 PST -------
Created an attachment (id=15948) [details]
Preliminary patch to implement getElementsByClassName

This isn't done (no changelog, no testcases and very limited manual testing, one known wart), but it does work, and is compatible with the Firefox implementation as far as I know. Feedback is welcome, particularly on the part that handles parsing the class string. Darin suggested generating a dummy Element, setting its class, then getting its classList, but I wasn't able to get that working (and I'm not convinced that it's less nasty than the code duplication). The current method is copy-pasted from NamedMappedAttrMap::parseClassAttribute, which is definitely less than ideal.
------- Comment #2 From 2007-08-15 20:06:18 PST -------
Created an attachment (id=15991) [details]
Updated to not have nasty code duplication

The updated version of this has moved parseClassAttribute into AtomicStringList (which is renamed to ClassNameList, since that's all it's used for), which allows getting rid of the duplicated code without weird hacks.

Still no changelog or tests, but the code should be ready for review.
------- Comment #3 From 2007-08-15 20:11:25 PST -------
(From update of attachment 15991 [details])
I'm not sure the renaming of AtomicStringList is desirable. Perhaps ClassNameList should be a subclass of AtomicStringList that just adds the parseClassAttribute method? I could be convinced otherwise, though.

It also seems like parseClassAttribute could be a static method that functions as a named constructor.
------- Comment #4 From 2007-08-15 22:55:31 PST -------
(In reply to comment #3)
> (From update of attachment 15991 [details] [edit])
> I'm not sure the renaming of AtomicStringList is desirable. Perhaps
> ClassNameList should be a subclass of AtomicStringList that just adds the
> parseClassAttribute method? I could be convinced otherwise, though.

AtomicStringList effectively *is* already ClassNameList; it's used nowhere in the code except to store lists of class names. Maciej has pointed out that it's not even a particularly good structure to use for that, but fixing it is beyond the scope of this patch.

> 
> It also seems like parseClassAttribute could be a static method that functions
> as a named constructor.
> 

Seems plausible. I'll try it out and see how it feels.
------- Comment #5 From 2007-08-16 16:02:49 PST -------
I did a bit of performance testing on this just for fun; Native vs. prototype.js's xpath version vs. prototype.js's DOM traversal version

native: 303ms
xpath: 5,706ms
js/dom: 26,660ms

:)

I also tried out aroben's second suggestion, but it seemed like a bit of a wash. Made the getElementsByClassName callsite a little nicer, and the NamedMappedAttrMap one a little less nice. What about making a regular constructor that takes a class attribute string and calls through to parseClassAttribute? Is that sort of minor convenience method generally considered worthwhile in webkit?
------- Comment #6 From 2007-08-22 10:49:30 PST -------
Created an attachment (id=16071) [details]
Now with changelog and test
------- Comment #7 From 2007-08-24 11:34:48 PST -------
Have you tested the behavior of passing js null and js undefined to getElementsByClassName and getElementsByClassNameNS?
------- Comment #8 From 2007-09-01 16:57:57 PST -------
Created an attachment (id=16176) [details]
Without getElementsByClassNameNS, and with null/undefined tests

getElementsByClassNameNS doesn't make any sense, so I've removed it (as discussed on irc). This also tests null/undefined as suggested.
------- Comment #9 From 2007-09-08 05:25:35 PST -------
Some test cases for getElementsByClassName:

http://tc.labs.opera.com/apis/getElementsByClassName/

(The Opera alpha now supports it).
------- Comment #10 From 2007-09-08 05:37:14 PST -------
Anne van Kesteren points out that this implementation doesn't treat \f as whitespace so it would fail his tests. I didn't double-check that the spec requires this treatment.
------- Comment #11 From 2007-09-09 13:08:08 PST -------
FWIW: the specification currently doesn't define the string argument. However, it seems likely that it resuses the definitions for getting tokens out of string that is used for class= currently which includes the aforementioned character.
------- Comment #12 From 2007-09-09 13:32:51 PST -------
(In reply to comment #11)
> FWIW: the specification currently doesn't define the string argument. However,
> it seems likely that it resuses the definitions for getting tokens out of
> string that is used for class= currently which includes the aforementioned
> character.
> 

Interesting. That would imply that WebKit is mis-handling class= as well, since I reused the whitespace logic from it for getElementsByClassName.
------- Comment #13 From 2007-09-09 23:25:54 PST -------
Created an attachment (id=16239) [details]
Adds form feed to the whitespace list, and copyright headers

This just adds form feed to the whitespace list (for *all* class name parsing, please let me know if that should be split out into another patch. It's about 6 characters on a line that's already being modified, so I figured it would be fine here), and adds my copyright header to the files I made more than tiny changes on (thanks for pointing that out olliej!).
------- Comment #14 From 2007-09-09 23:42:19 PST -------
Created an attachment (id=16240) [details]
Actually include the header

Somehow either I hadn't been svn adding ClassNameList.h this whole time, or (based on what filemerge says), svn-apply didn't pick up on the rename. This is the same as the previous one, except it should actually work :)
------- Comment #15 From 2007-10-01 12:17:23 PST -------
(From update of attachment 16240 [details])
Ok, a few comments:
+    for(const ClassNameList *t = this; t; t = t->next()) 
+    { 
+        if(t->string() == str) 
+            return true; 

Those don't agree with the style guidelines.  { should be on the same line as for, and if( should be if (
http://webkit.org/coding/coding-style.html

No need for this-> here:
+void ClassNameList::parseClassAttribute(const String& classStr, const bool inCompatMode)
+{
+    this->clear();

+    String classAttr = inCompatMode ? 
+        (classStr.impl()->isLower() ? classStr : String(classStr.impl()->lower())) :
+        classStr;
doesnt' need to call all the impl() stuff, just call .lower() on string.

To me "start" and "end" are more readable than "ePos" and "sPos", but that's not a huge issue.

Style guidelines again:
+ClassNodeList::ClassNodeList(PassRefPtr<Node> rootNode, const AtomicString& className)
+: NodeList(rootNode), m_namespaceURI(nullAtom)


Why?
+    m_classList = *(new ClassNameList());
That shouldn't be needed.

Style again:
+    const ClassNameList* t = static_cast<Element*>(testNode)->getClassList();
+    
+    for (const ClassNameList* c = &m_classList; c; c = c->next())
+    {
+        if(t->contains(c->string()))
+           continue;

also, "t" is not a very readable/useful variable name

This is also more easily written as:
+    for (const ClassNameList* c = &m_classList; c; c = c->next())
+    {
+        if(t->contains(c->string()))
+           continue;
+
+        return false;

if (!contains)
    return false;

IMO.

Shouldn't this be "isEmpty()" ?

+PassRefPtr<NodeList> Element::getElementsByClassName(const String& className)
+{
+    if (className.isNull())
+        return 0;


Otherwise looks OK.  I think the style stuff should be corrected and a new patch posted before landing though.

Thanks for the patch!
------- Comment #16 From 2007-10-01 23:36:29 PST -------
Created an attachment (id=16496) [details]
Updated to address review comments

I ended up removing the empty args optimization entirely, since it was buggy, and probably not useful.
------- Comment #17 From 2007-10-02 00:22:59 PST -------
Created an attachment (id=16497) [details]
More review fixage

Addressing stuff I missed before, and stuff mentioned on IRC.
------- Comment #18 From 2007-10-02 23:43:57 PST -------
(From update of attachment 16497 [details])
I am surprised.  I had figured class names were always case insensitive.  It does not appear to be the case with your patch.  Does contains() need to lower() the incoming string when in compatMode?

This is not needed, please remove:
+    m_classList = ClassNameList();

Constructor initializers are generally one on each line, with the comma (or colon) leading the line (see the style guidelines). This allows easy use of #if ENABLE(FOO) around certain initializers as well as easy removal/addition with the smallest possible diff.

Maybe it's the need to return "false" when m_impl == 0, that has caused isLower() not to be implemented before.  One could argue that isLower should return true for "" or null.  Not sure it matters.  I'm OK having a  funny isLower as such.  It's also OK to call string->impl()->toLower() (and NOT wrap it in String(), the compiler will do that for you.)  I know I encouraged you to add isLower. Doesn't matter either way, your call.

Need to fix the one small style issue, the extra initialization and answer regarding case insensitive lookups.  I also think we need a compat-mode test and some case-sensitivity and white-space sensitivity tests before this can land.

Thanks again for taking the time to work on this.
------- Comment #19 From 2007-10-03 09:40:09 PST -------
FYI: class names are always case-_sensitive_ as far as getElementsByClassName() is concerned. No need to introduce limited quirks mode behavior into a new API.
------- Comment #20 From 2007-10-15 06:08:11 PST -------
I'd love to see the last few issues fixed up to land this on trunk.
------- Comment #21 From 2007-10-15 07:50:56 PST -------
(In reply to comment #20)
> I'd love to see the last few issues fixed up to land this on trunk.
> 

I haven't forgotten, just haven't had time to work on it. I'll try to set aside some time this weekend to finish it up.

I do have one question regarding case-sensitivity though: now that code is shared between regular class parsing and getElementsByClassName parsing it becomes more difficult to only have a quirks mode for regular class parsing. Should I leave it as is, or add another argument to the function to let the caller control whether quirks mode is allowed (or a third option if I'm missing some better way of doing it)?
------- Comment #22 From 2007-11-17 23:37:58 PST -------
(In reply to comment #19)
> FYI: class names are always case-_sensitive_ as far as getElementsByClassName()
> is concerned. No need to introduce limited quirks mode behavior into a new API.

I'm not sure if I agree - a Web developer doesn't need to know whether an API is old or new - consistent behavior might be better, even if it means supporting old quirks.
------- Comment #23 From 2007-11-30 14:34:19 PST -------
This doesn't seem to work with a lot of Anne's tests.  It is failing on tests that set a class on the documentElement (<html>) and I think it is failing to find this element because we only search below the root node.  I am curious if the the element that getElementsByClassName is called on is supposed to include it self in the list, or only it's descendants.
------- Comment #24 From 2007-11-30 14:41:14 PST -------
Actually, the spec is pretty specific about this.  So it is a bug for implementation of document.getElementsByClassName to just call documentElement.getElementsByClassName because that won't include itself.
------- Comment #25 From 2007-12-03 16:33:15 PST -------
(In reply to comment #22)
> (In reply to comment #19)
> > FYI: class names are always case-_sensitive_ as far as getElementsByClassName()
> > is concerned. No need to introduce limited quirks mode behavior into a new API.
> 
> I'm not sure if I agree - a Web developer doesn't need to know whether an API
> is old or new - consistent behavior might be better, even if it means
> supporting old quirks.
> 

I've spoken with Mozilla and Opera developers about this, and brought it up with Hixie and #whatwg; so far four people (counting the two of us) are in favor of maintaining the case insensitivity quirk, and everyone else doesn't care at all. I'll try to remember to send an email to the list about changing the spec to reflect this.
------- Comment #26 From 2007-12-12 17:02:24 PST -------
Created an attachment (id=17870) [details]
updated patch

This updates the patch to allow document as the root node, cleaned up things a bit and adds Anne's test suite to the layouttests.  Only test 014.html as it originally was failed, but I have updated it to match our current quirksmode behavior and added a comment.  Think we should land this with the current functionality.  Objections.

I also kept David's name in the ChangeLog.
------- Comment #27 From 2007-12-12 17:55:10 PST -------
(From update of attachment 17870 [details])
There's an extra change log entry here, for bug 15313.

I really wish we didn't use a list for ClassNameList. A Vector<AtomicString>* would work better, I think. It's really bad that deleting a ClassNameList uses O(n) stack in the destructor.

Is it really good to indent ChildNodeList.h as part of this patch? There's also an unneeded forward declaration added to that file. Please land that separately if at all.

We should use foldCase() instead of lower() for contexts like this one.

The check of isLower() in parseClassAttribute is unnecessary. String::lower() should already do that.

 41     , m_namespaceURI(nullAtom)

Isn't nullAtom the default anyway?

Otherwise looks good to me. I'll say review- for now because I want you to consider the comments above.
------- Comment #28 From 2007-12-12 19:37:10 PST -------
I have already filed http://bugs.webkit.org/show_bug.cgi?id=15760 on the O(n) stack issue (it's a crasher). I think that bug would be a better place to address changing from a linked list to a vector. The current implementation is as it is merely because it's a rename of AtomicStringList rather than a new class.
------- Comment #29 From 2007-12-13 14:22:22 PST -------
Created an attachment (id=17882) [details]
updated patch (now with more Vector)
------- Comment #30 From 2007-12-13 22:00:55 PST -------
(From update of attachment 17882 [details])
 122     bool isLower() const { return m_impl ? m_impl->isLower() : false; }

The null string is all-lowercase (and all-uppercase), so this should say true.
------- Comment #31 From 2007-12-14 11:01:26 PST -------
(From update of attachment 17882 [details])
r=me
------- Comment #32 From 2007-12-14 13:57:19 PST -------
Fix landed in r28722.