RESOLVED FIXED 14955
Implement getElementsByClassName
https://bugs.webkit.org/show_bug.cgi?id=14955
Summary Implement getElementsByClassName
David Smith
Reported 2007-08-13 00:29:25 PDT
HTML5 specifies a getElementsByClassName method on Document and Element. Firefox 3 supports this with a slightly modified API, which Hixie says he plans to update the HTML5 spec to match. Firefox has gotten large (77x over js implementations, 8x over XPath) speed improvements for this functionality from doing it natively.
Attachments
Preliminary patch to implement getElementsByClassName (7.95 KB, patch)
2007-08-13 00:43 PDT, David Smith
no flags
Updated to not have nasty code duplication (29.00 KB, patch)
2007-08-15 20:06 PDT, David Smith
no flags
Now with changelog and test (37.46 KB, patch)
2007-08-22 10:49 PDT, David Smith
no flags
Without getElementsByClassNameNS, and with null/undefined tests (35.34 KB, patch)
2007-09-01 16:57 PDT, David Smith
no flags
Adds form feed to the whitespace list, and copyright headers (32.85 KB, patch)
2007-09-09 23:25 PDT, David Smith
no flags
Actually include the header (34.65 KB, patch)
2007-09-09 23:42 PDT, David Smith
eric: review-
Updated to address review comments (34.56 KB, patch)
2007-10-01 23:36 PDT, David Smith
no flags
More review fixage (37.36 KB, patch)
2007-10-02 00:22 PDT, David Smith
eric: review-
updated patch (75.22 KB, patch)
2007-12-12 17:02 PST, Sam Weinig
darin: review-
updated patch (now with more Vector) (71.00 KB, patch)
2007-12-13 14:22 PST, Sam Weinig
mitz: review+
David Smith
Comment 1 2007-08-13 00:43:16 PDT
Created attachment 15948 [details] Preliminary patch to implement getElementsByClassName This isn't done (no changelog, no testcases and very limited manual testing, one known wart), but it does work, and is compatible with the Firefox implementation as far as I know. Feedback is welcome, particularly on the part that handles parsing the class string. Darin suggested generating a dummy Element, setting its class, then getting its classList, but I wasn't able to get that working (and I'm not convinced that it's less nasty than the code duplication). The current method is copy-pasted from NamedMappedAttrMap::parseClassAttribute, which is definitely less than ideal.
David Smith
Comment 2 2007-08-15 20:06:18 PDT
Created attachment 15991 [details] Updated to not have nasty code duplication The updated version of this has moved parseClassAttribute into AtomicStringList (which is renamed to ClassNameList, since that's all it's used for), which allows getting rid of the duplicated code without weird hacks. Still no changelog or tests, but the code should be ready for review.
Adam Roben (:aroben)
Comment 3 2007-08-15 20:11:25 PDT
Comment on attachment 15991 [details] Updated to not have nasty code duplication I'm not sure the renaming of AtomicStringList is desirable. Perhaps ClassNameList should be a subclass of AtomicStringList that just adds the parseClassAttribute method? I could be convinced otherwise, though. It also seems like parseClassAttribute could be a static method that functions as a named constructor.
David Smith
Comment 4 2007-08-15 22:55:31 PDT
(In reply to comment #3) > (From update of attachment 15991 [details] [edit]) > I'm not sure the renaming of AtomicStringList is desirable. Perhaps > ClassNameList should be a subclass of AtomicStringList that just adds the > parseClassAttribute method? I could be convinced otherwise, though. AtomicStringList effectively *is* already ClassNameList; it's used nowhere in the code except to store lists of class names. Maciej has pointed out that it's not even a particularly good structure to use for that, but fixing it is beyond the scope of this patch. > > It also seems like parseClassAttribute could be a static method that functions > as a named constructor. > Seems plausible. I'll try it out and see how it feels.
David Smith
Comment 5 2007-08-16 16:02:49 PDT
I did a bit of performance testing on this just for fun; Native vs. prototype.js's xpath version vs. prototype.js's DOM traversal version native: 303ms xpath: 5,706ms js/dom: 26,660ms :) I also tried out aroben's second suggestion, but it seemed like a bit of a wash. Made the getElementsByClassName callsite a little nicer, and the NamedMappedAttrMap one a little less nice. What about making a regular constructor that takes a class attribute string and calls through to parseClassAttribute? Is that sort of minor convenience method generally considered worthwhile in webkit?
David Smith
Comment 6 2007-08-22 10:49:30 PDT
Created attachment 16071 [details] Now with changelog and test
Sam Weinig
Comment 7 2007-08-24 11:34:48 PDT
Have you tested the behavior of passing js null and js undefined to getElementsByClassName and getElementsByClassNameNS?
David Smith
Comment 8 2007-09-01 16:57:57 PDT
Created attachment 16176 [details] Without getElementsByClassNameNS, and with null/undefined tests getElementsByClassNameNS doesn't make any sense, so I've removed it (as discussed on irc). This also tests null/undefined as suggested.
Maciej Stachowiak
Comment 9 2007-09-08 05:25:35 PDT
Some test cases for getElementsByClassName: http://tc.labs.opera.com/apis/getElementsByClassName/ (The Opera alpha now supports it).
Maciej Stachowiak
Comment 10 2007-09-08 05:37:14 PDT
Anne van Kesteren points out that this implementation doesn't treat \f as whitespace so it would fail his tests. I didn't double-check that the spec requires this treatment.
Anne van Kesteren
Comment 11 2007-09-09 13:08:08 PDT
FWIW: the specification currently doesn't define the string argument. However, it seems likely that it resuses the definitions for getting tokens out of string that is used for class= currently which includes the aforementioned character.
David Smith
Comment 12 2007-09-09 13:32:51 PDT
(In reply to comment #11) > FWIW: the specification currently doesn't define the string argument. However, > it seems likely that it resuses the definitions for getting tokens out of > string that is used for class= currently which includes the aforementioned > character. > Interesting. That would imply that WebKit is mis-handling class= as well, since I reused the whitespace logic from it for getElementsByClassName.
David Smith
Comment 13 2007-09-09 23:25:54 PDT
Created attachment 16239 [details] Adds form feed to the whitespace list, and copyright headers This just adds form feed to the whitespace list (for *all* class name parsing, please let me know if that should be split out into another patch. It's about 6 characters on a line that's already being modified, so I figured it would be fine here), and adds my copyright header to the files I made more than tiny changes on (thanks for pointing that out olliej!).
David Smith
Comment 14 2007-09-09 23:42:19 PDT
Created attachment 16240 [details] Actually include the header Somehow either I hadn't been svn adding ClassNameList.h this whole time, or (based on what filemerge says), svn-apply didn't pick up on the rename. This is the same as the previous one, except it should actually work :)
Eric Seidel (no email)
Comment 15 2007-10-01 12:17:23 PDT
Comment on attachment 16240 [details] Actually include the header Ok, a few comments: + for(const ClassNameList *t = this; t; t = t->next()) + { + if(t->string() == str) + return true; Those don't agree with the style guidelines. { should be on the same line as for, and if( should be if ( http://webkit.org/coding/coding-style.html No need for this-> here: +void ClassNameList::parseClassAttribute(const String& classStr, const bool inCompatMode) +{ + this->clear(); + String classAttr = inCompatMode ? + (classStr.impl()->isLower() ? classStr : String(classStr.impl()->lower())) : + classStr; doesnt' need to call all the impl() stuff, just call .lower() on string. To me "start" and "end" are more readable than "ePos" and "sPos", but that's not a huge issue. Style guidelines again: +ClassNodeList::ClassNodeList(PassRefPtr<Node> rootNode, const AtomicString& className) +: NodeList(rootNode), m_namespaceURI(nullAtom) Why? + m_classList = *(new ClassNameList()); That shouldn't be needed. Style again: + const ClassNameList* t = static_cast<Element*>(testNode)->getClassList(); + + for (const ClassNameList* c = &m_classList; c; c = c->next()) + { + if(t->contains(c->string())) + continue; also, "t" is not a very readable/useful variable name This is also more easily written as: + for (const ClassNameList* c = &m_classList; c; c = c->next()) + { + if(t->contains(c->string())) + continue; + + return false; if (!contains) return false; IMO. Shouldn't this be "isEmpty()" ? +PassRefPtr<NodeList> Element::getElementsByClassName(const String& className) +{ + if (className.isNull()) + return 0; Otherwise looks OK. I think the style stuff should be corrected and a new patch posted before landing though. Thanks for the patch!
David Smith
Comment 16 2007-10-01 23:36:29 PDT
Created attachment 16496 [details] Updated to address review comments I ended up removing the empty args optimization entirely, since it was buggy, and probably not useful.
David Smith
Comment 17 2007-10-02 00:22:59 PDT
Created attachment 16497 [details] More review fixage Addressing stuff I missed before, and stuff mentioned on IRC.
Eric Seidel (no email)
Comment 18 2007-10-02 23:43:57 PDT
Comment on attachment 16497 [details] More review fixage I am surprised. I had figured class names were always case insensitive. It does not appear to be the case with your patch. Does contains() need to lower() the incoming string when in compatMode? This is not needed, please remove: + m_classList = ClassNameList(); Constructor initializers are generally one on each line, with the comma (or colon) leading the line (see the style guidelines). This allows easy use of #if ENABLE(FOO) around certain initializers as well as easy removal/addition with the smallest possible diff. Maybe it's the need to return "false" when m_impl == 0, that has caused isLower() not to be implemented before. One could argue that isLower should return true for "" or null. Not sure it matters. I'm OK having a funny isLower as such. It's also OK to call string->impl()->toLower() (and NOT wrap it in String(), the compiler will do that for you.) I know I encouraged you to add isLower. Doesn't matter either way, your call. Need to fix the one small style issue, the extra initialization and answer regarding case insensitive lookups. I also think we need a compat-mode test and some case-sensitivity and white-space sensitivity tests before this can land. Thanks again for taking the time to work on this.
Anne van Kesteren
Comment 19 2007-10-03 09:40:09 PDT
FYI: class names are always case-_sensitive_ as far as getElementsByClassName() is concerned. No need to introduce limited quirks mode behavior into a new API.
Maciej Stachowiak
Comment 20 2007-10-15 06:08:11 PDT
I'd love to see the last few issues fixed up to land this on trunk.
David Smith
Comment 21 2007-10-15 07:50:56 PDT
(In reply to comment #20) > I'd love to see the last few issues fixed up to land this on trunk. > I haven't forgotten, just haven't had time to work on it. I'll try to set aside some time this weekend to finish it up. I do have one question regarding case-sensitivity though: now that code is shared between regular class parsing and getElementsByClassName parsing it becomes more difficult to only have a quirks mode for regular class parsing. Should I leave it as is, or add another argument to the function to let the caller control whether quirks mode is allowed (or a third option if I'm missing some better way of doing it)?
Alexey Proskuryakov
Comment 22 2007-11-17 23:37:58 PST
(In reply to comment #19) > FYI: class names are always case-_sensitive_ as far as getElementsByClassName() > is concerned. No need to introduce limited quirks mode behavior into a new API. I'm not sure if I agree - a Web developer doesn't need to know whether an API is old or new - consistent behavior might be better, even if it means supporting old quirks.
Sam Weinig
Comment 23 2007-11-30 14:34:19 PST
This doesn't seem to work with a lot of Anne's tests. It is failing on tests that set a class on the documentElement (<html>) and I think it is failing to find this element because we only search below the root node. I am curious if the the element that getElementsByClassName is called on is supposed to include it self in the list, or only it's descendants.
Sam Weinig
Comment 24 2007-11-30 14:41:14 PST
Actually, the spec is pretty specific about this. So it is a bug for implementation of document.getElementsByClassName to just call documentElement.getElementsByClassName because that won't include itself.
David Smith
Comment 25 2007-12-03 16:33:15 PST
(In reply to comment #22) > (In reply to comment #19) > > FYI: class names are always case-_sensitive_ as far as getElementsByClassName() > > is concerned. No need to introduce limited quirks mode behavior into a new API. > > I'm not sure if I agree - a Web developer doesn't need to know whether an API > is old or new - consistent behavior might be better, even if it means > supporting old quirks. > I've spoken with Mozilla and Opera developers about this, and brought it up with Hixie and #whatwg; so far four people (counting the two of us) are in favor of maintaining the case insensitivity quirk, and everyone else doesn't care at all. I'll try to remember to send an email to the list about changing the spec to reflect this.
Sam Weinig
Comment 26 2007-12-12 17:02:24 PST
Created attachment 17870 [details] updated patch This updates the patch to allow document as the root node, cleaned up things a bit and adds Anne's test suite to the layouttests. Only test 014.html as it originally was failed, but I have updated it to match our current quirksmode behavior and added a comment. Think we should land this with the current functionality. Objections. I also kept David's name in the ChangeLog.
Darin Adler
Comment 27 2007-12-12 17:55:10 PST
Comment on attachment 17870 [details] updated patch There's an extra change log entry here, for bug 15313. I really wish we didn't use a list for ClassNameList. A Vector<AtomicString>* would work better, I think. It's really bad that deleting a ClassNameList uses O(n) stack in the destructor. Is it really good to indent ChildNodeList.h as part of this patch? There's also an unneeded forward declaration added to that file. Please land that separately if at all. We should use foldCase() instead of lower() for contexts like this one. The check of isLower() in parseClassAttribute is unnecessary. String::lower() should already do that. 41 , m_namespaceURI(nullAtom) Isn't nullAtom the default anyway? Otherwise looks good to me. I'll say review- for now because I want you to consider the comments above.
David Smith
Comment 28 2007-12-12 19:37:10 PST
I have already filed http://bugs.webkit.org/show_bug.cgi?id=15760 on the O(n) stack issue (it's a crasher). I think that bug would be a better place to address changing from a linked list to a vector. The current implementation is as it is merely because it's a rename of AtomicStringList rather than a new class.
Sam Weinig
Comment 29 2007-12-13 14:22:22 PST
Created attachment 17882 [details] updated patch (now with more Vector)
mitz
Comment 30 2007-12-13 22:00:55 PST
Comment on attachment 17882 [details] updated patch (now with more Vector) 122 bool isLower() const { return m_impl ? m_impl->isLower() : false; } The null string is all-lowercase (and all-uppercase), so this should say true.
mitz
Comment 31 2007-12-14 11:01:26 PST
Comment on attachment 17882 [details] updated patch (now with more Vector) r=me
Sam Weinig
Comment 32 2007-12-14 13:57:19 PST
Fix landed in r28722.
Note You need to log in before you can comment on or make changes to this bug.