Bug 22699 - Enable NodeList caching for getElementsByTagName
Summary: Enable NodeList caching for getElementsByTagName
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P2 Normal
Assignee: David Smith
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-05 16:21 PST by David Smith
Modified: 2009-01-02 16:26 PST (History)
2 users (show)

See Also:


Attachments
Initial patch (13.01 KB, patch)
2008-12-24 11:13 PST, David Smith
darin: review-
Details | Formatted Diff | Diff
Addresses most review comments (13.51 KB, patch)
2008-12-26 13:56 PST, David Smith
no flags Details | Formatted Diff | Diff
Sigh, forgot to save a file before diffing. (13.59 KB, patch)
2008-12-26 14:24 PST, David Smith
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Smith 2008-12-05 16:21:08 PST
Currently it's rather slow as a result of not having it.
Comment 1 Sam Weinig 2008-12-05 19:15:32 PST
To do this, we will probably need to create HashTraits for QualifiedName and since we aren't interested in hashing on the prefix, and adaptor for just hashing on the namespace and localname.
Comment 2 David Smith 2008-12-24 11:13:19 PST
Created attachment 26239 [details]
Initial patch

This appears to work well (passes tests, goes quickly). I'm hesitant about a lot of the hashing stuff though, as I haven't worked with it before.
Comment 3 Darin Adler 2008-12-25 10:34:50 PST
Comment on attachment 26239 [details]
Initial patch

> +        data->setNodeLists(std::auto_ptr<NodeListsNodeData>(new NodeListsNodeData));

In .cpp files we normally don't use the "std::" prefix. Instead we do "using namespace std" at the top of the file. In this file I think that's already done so you can just omit the "std::" here.

> +    return TagNodeList::create(this, namespaceURI.isEmpty() ? nullAtom : AtomicString(namespaceURI), name, result.first->second);

It would be good to change the namespaceURI argument to an AtomicString rather than converting it here. Unrelated to your patch though.

> -struct QNameHash {
> +struct QNameImplHash {

I don't think this name change is a good one. A single hash struct/class can support hashing multiple different types as long as there's no ambiguity, so this could hash both Impl and non-Impl if we later implemented hashing for non-Impl. I think adding the Impl to the name is not an improvement. I think that instead we should combine both the new QNameHash and the old QNameImplHash and put both in the header.

> +    // Golden ratio - arbitrary start value to avoid mapping all 0's to all 0's
> +    static const unsigned PHI = 0x9e3779b9U;

This should be a single constant shared by the various hash functions. I don't think it's good to have it defined inside the QualifiedName class. A step in the wrong direction. Lets find a place to put it so we can share it.

> +static inline unsigned hashComponents(const QualifiedNameComponents& buf)

This is now in a header, so it should not be marked "static", since that gives it internal linkage. Things in headers should almost never have internal linkage.

> +    typedef HashMap<QualifiedName::QualifiedName, DynamicNodeList::Caches*, QNameHash, HashTraits<WebCore::QualifiedName> > TagCacheMap;

How does this compile? Does QualifiedName::QualifiedName actually work?

No need for the WebCore:: in WebCore::QualifiedName.

Instead of explicitly specifying HashTraits<QualifiedName>, you should make that be the default. The way to do that is to specialize DefaultHash for the QualifiedName type. For examples, look at KURL.h, AtomicString.h, PlatformString.h, and StringImpl.h. Don't emulate IntSizeHash.h which puts the default in a separate header. It's important that the default be specialized in the same header that defines the type, even if the actual hash structure is defined in another header, because otherwise you could end up getting the wrong default depending on what headers you include.

There's enough that could be improved that I'm going to say review-, but this looks like great work that's on the right track.

I worry a little that this uses more memory to trade off better speed. Are there some speed tests that demonstrate the importance of this optimization on the real web?
Comment 4 David Smith 2008-12-25 13:19:42 PST
(In reply to comment #3)
> (From update of attachment 26239 [details] [review])
> > +        data->setNodeLists(std::auto_ptr<NodeListsNodeData>(new NodeListsNodeData));
> 
> In .cpp files we normally don't use the "std::" prefix. Instead we do "using
> namespace std" at the top of the file. In this file I think that's already done
> so you can just omit the "std::" here.

Will do.

> 
> > +    return TagNodeList::create(this, namespaceURI.isEmpty() ? nullAtom : AtomicString(namespaceURI), name, result.first->second);
> 
> It would be good to change the namespaceURI argument to an AtomicString rather
> than converting it here. Unrelated to your patch though.
> 
> > -struct QNameHash {
> > +struct QNameImplHash {
> 
> I don't think this name change is a good one. A single hash struct/class can
> support hashing multiple different types as long as there's no ambiguity, so
> this could hash both Impl and non-Impl if we later implemented hashing for
> non-Impl. I think adding the Impl to the name is not an improvement. I think
> that instead we should combine both the new QNameHash and the old QNameImplHash
> and put both in the header.

Makes sense.

> 
> > +    // Golden ratio - arbitrary start value to avoid mapping all 0's to all 0's
> > +    static const unsigned PHI = 0x9e3779b9U;
> 
> This should be a single constant shared by the various hash functions. I don't
> think it's good to have it defined inside the QualifiedName class. A step in
> the wrong direction. Lets find a place to put it so we can share it.
> 

I moved it there since it conflicted with a definition in another location. I should see if I can just remove it entirely and use the existing public location.

> > +static inline unsigned hashComponents(const QualifiedNameComponents& buf)
> 
> This is now in a header, so it should not be marked "static", since that gives
> it internal linkage. Things in headers should almost never have internal
> linkage.
>

Will do.
 
> > +    typedef HashMap<QualifiedName::QualifiedName, DynamicNodeList::Caches*, QNameHash, HashTraits<WebCore::QualifiedName> > TagCacheMap;
> 
> How does this compile? Does QualifiedName::QualifiedName actually work?
> 
> No need for the WebCore:: in WebCore::QualifiedName.
> 

Not sure what I was doing there, heh. It compiles, but that's definitely odd. Will fix.

> Instead of explicitly specifying HashTraits<QualifiedName>, you should make
> that be the default. The way to do that is to specialize DefaultHash for the
> QualifiedName type. For examples, look at KURL.h, AtomicString.h,
> PlatformString.h, and StringImpl.h. Don't emulate IntSizeHash.h which puts the
> default in a separate header. It's important that the default be specialized in
> the same header that defines the type, even if the actual hash structure is
> defined in another header, because otherwise you could end up getting the wrong
> default depending on what headers you include.

OK, I'll check out those headers and fix that.

> 
> There's enough that could be improved that I'm going to say review-, but this
> looks like great work that's on the right track.
> 
> I worry a little that this uses more memory to trade off better speed. Are
> there some speed tests that demonstrate the importance of this optimization on
> the real web?
> 

The testcase I've been using is an as-yet unreleased version of a popular benchmark, so I was hesitant to include a link in this bug. The speedup is quite dramatic (>50x in some cases), but since I haven't heard complaints about getElementsByTagName being slow in WebKit, perhaps it's a nonissue? I'll ping you on irc in the next day or two if you're around.
Comment 5 Darin Adler 2008-12-25 21:43:40 PST
(In reply to comment #4)
> The testcase I've been using is an as-yet unreleased version of a popular
> benchmark, so I was hesitant to include a link in this bug. The speedup is
> quite dramatic (>50x in some cases), but since I haven't heard complaints about
> getElementsByTagName being slow in WebKit, perhaps it's a nonissue? I'll ping
> you on irc in the next day or two if you're around.

I don't mind adding the caching -- we just have to keep in mind that there's a memory/speed tradeoff when making changes like this one. Seems fine to add it.
Comment 6 Darin Adler 2008-12-25 21:45:19 PST
(In reply to comment #4)
> > > -struct QNameHash {

Since this is going to be in a header and not just private to one file, I also suggest naming it QualifiedNameHash instead of QNameHash.
Comment 7 David Smith 2008-12-26 13:56:40 PST
Created attachment 26260 [details]
Addresses most review comments

I wasn't sure what to do with PHI unfortunately. There didn't seem to be a header that both made sense to put it in and was included by the right files, and it seems like overkill to make a new header for it.
Comment 8 David Smith 2008-12-26 14:24:00 PST
Created attachment 26261 [details]
Sigh, forgot to save a file before diffing.

Sorry about the bugspam.
Comment 9 Darin Adler 2009-01-02 10:21:58 PST
(In reply to comment #7)
> I wasn't sure what to do with PHI unfortunately. There didn't seem to be a
> header that both made sense to put it in and was included by the right files,
> and it seems like overkill to make a new header for it. 

I think that PHI should go in <wtf/HashFunctions.h>, with a suitable name other than PHI, perhaps stringHashingStartValue.
Comment 10 Darin Adler 2009-01-02 10:29:05 PST
Comment on attachment 26261 [details]
Sigh, forgot to save a file before diffing.

> +        * dom/QualifiedName.h: Move QNameHash to the header and make it work on QualifiedNames
> +        (WebCore::hashComponents):
> +        (WebCore::QualifiedNameHash::hash):
> +        (WebCore::QualifiedNameHash::equal):

If you're not going to comment on the individual functions in the .h file, please just remove the function names. Either way is OK with me. In cpp files it's probably less good to remove the function names, even if you aren't commenting on each one.

> +        (WTF::):

Please don't leave bogus lines like this one in the change log.

> +    pair<NodeListsNodeData::TagCacheMap::iterator, bool> result = data->nodeLists()->m_tagNodeListCaches.add(QualifiedName(nullAtom, name, namespaceURI), 0);
> +    if (result.second)
> +        result.first->second = new DynamicNodeList::Caches;
> +    
> +    return TagNodeList::create(this, namespaceURI.isEmpty() ? nullAtom : AtomicString(namespaceURI), name, result.first->second);

It's inefficient to convert namespaceURI to an AtomicString twice here. It should be made into an AtomicString only once and reused. The best way to do this is to change the argument into an AtomicString in the first place in Node.h. This should be done for isDefaultNamespace, lookupPrefix, lookupNamespacePrefix, and getElementsByTagNameNS, since all those functions just end up making an AtomicString a moment later. Maybe in a separate patch, though.

> +    static unsigned hash(const QualifiedName& name) {
> +        return hash(name.impl());
> +    }

Either the function definition should to be all on one line, or use our normal brace style (function start braces go on their own line).

> +    static unsigned hash(const QualifiedName::QualifiedNameImpl* name) {    
> +        QualifiedNameComponents c = { name->m_prefix.impl(), name->m_localName.impl(), name->m_namespace.impl() };
> +        return hashComponents(c);
> +    }

Same comment here.

> +    static bool equal(const QualifiedName& a, const QualifiedName& b) { return a == b; }
> +    
> +    static bool equal(const QualifiedName::QualifiedNameImpl* a, const QualifiedName::QualifiedNameImpl* b) { return a == b; }

I think these would read better if they didn't have a space between them.

r=me
Comment 11 David Smith 2009-01-02 16:26:45 PST
Landed after addressing the final review comments in r39563