Bug 13021

Summary: XPath can be very slow
Product: WebKit Reporter: Mark Rowe (bdash) <mrowe>
Component: XMLAssignee: Alexey Proskuryakov <ap>
Status: RESOLVED FIXED    
Severity: Major CC: ap
Priority: P2    
Version: 523.x (Safari 3)   
Hardware: Mac   
OS: OS X 10.4   
URL: http://ejohn.org/apps/classname/xpath.html?r0=1173429259
Attachments:
Description Flags
partial fix 1
none
partial fix 2
none
partial fix 2 v2
none
partial fix 2 v3
none
partial fix 3
darin: review+
partial fix 3 v2 darin: review+

Description Mark Rowe (bdash) 2007-03-09 00:53:08 PST
After reading <http://ejohn.org/blog/getelementsbyclassname-speed-comparison/> I grabbed a copy of the test pages to see how we compared to Firefox.  Our results on the XPath test (at http://ejohn.org/apps/classname/xpath.html?r0=1173429259) are horrible.  We frequently take 20+ seconds, while Firefox comes in at around 5 seconds.

The document uses prototype.js's getElementsByClassName implementation, which will use XPath if available.  It constructs a path of the form ".//*[contains(concat(' ', @class, ' '), ' " + className + " ')]", where className is the class being searched for.

A quick glance at the process with Shark suggests we're spending a lot of time dispatching DOM subtree modified events (XPathResult attaches a listener for these) from below Attribute::createAttrIfNeeded.
Comment 1 Alexey Proskuryakov 2007-03-09 12:03:55 PST
I believe the problem is that //* enumerates all element children multiple times, removing the duplicates later. I have noticed this by code inspection before, but didn't expect //* to be a common idiom.

I'm going to try some possible optimizations.
Comment 2 Mark Rowe (bdash) 2007-03-09 23:56:46 PST
Alexey, I'd encourage you to use Shark to see exactly where the time is being spent in this case.  As I mentioned I was seeing a large amount of time spent under Attribute::createAttrIfNeeded, specifically related to processing of DOM subtree modified events.  I'm not even sure why enumerating elements would lead to DOM subtrees being modified...
Comment 3 Alexey Proskuryakov 2007-03-10 03:33:02 PST
I cannot see how the time spent in Attribute::createAttrIfNeeded() is related to processing DOM events - but it's clearly another factor to the overall slowness, because it's needlessly called during XPath attribute node enumeration.
Comment 4 Mark Rowe (bdash) 2007-03-10 05:16:19 PST
Attribute::createAttrIfNeeded creates a DOM text node for the attribute value.  When it adds itself to the Attribute via appendChild this results in a subtree modification event being dispatched.  It seems a bit bizarre that the event is dispatched in this case because it seems that the DOM is not being modified as such, rather it is being lazily created.
Comment 5 Alexey Proskuryakov 2007-03-10 13:04:32 PST
Created attachment 13578 [details]
partial fix 1

I'm still not sure what the main culprit is (the DOM modification events do not seem to be a huge contributor, according to Shark), but this patch improves the performance a bit. The results are very inconsistent, though - sometimes I measure a 10% improvement, other times, a slowdown :-/
Comment 6 Darin Adler 2007-03-10 20:18:22 PST
Comment on attachment 13578 [details]
partial fix 1

r=me

One thing that's clearly a slow idiom is having functions with a return type that's a Vector. Vector is not good for these purposes because returning one involves allocating an entire new vector every time. It's better to pass a Vector by value. Or we can wrap the vector so it works better as a return value type. Or we can carefully code so we get the C++ return value optimization; but that might not help enough.
Comment 7 Alexey Proskuryakov 2007-03-11 00:40:31 PST
Comment on attachment 13578 [details]
partial fix 1

Committed revision 20102, clearing the review flag.

It's particularly inefficient that NodeVector is a vector of RefPtrs. It has to hold onto its nodes because Attr nodes usually don't have any other references. Of course, re-creating Attr nodes all the time doesn't help, too.
Comment 8 Alexey Proskuryakov 2007-03-11 03:14:51 PDT
See <https://bugzilla.mozilla.org/show_bug.cgi?id=208172> and <http://www.fiveanddime.net/firefox-1.5-source/mozilla/extensions/transformiix/docs/optimized-xpath.html.html> for ideas that were considered for Mozilla optimized XPath implementation.
Comment 9 Alexey Proskuryakov 2007-03-11 13:30:46 PDT
(In reply to comment #4)
> It seems a bit bizarre that the event is dispatched in this case because 
> it seems that the DOM is not being modified as such, rather it is being
> lazily created.

AFAICT, these mutation events don't even go anywhere - they appear to be directly dispatched to the attribute's text node, which cannot have any listeners attached (because it has been just created). I'm going to verify this theory, and possibly add an optimization.
Comment 10 Alexey Proskuryakov 2007-03-13 13:35:24 PDT
Created attachment 13617 [details]
partial fix 2

This brings us almost on par with Firefox (16ms vs. 17ms on my G4, down from 48ms). But it comes with a number of comments:

1) I do not like short-circuiting appendChild() like this, it would be much better to just make it fast. However, that's definitely not a stabilization period work (and even changes in Attr.cpp look questionable in this regard).

2) Merging nodesInAxis and nodeMatches saved 24ms, and short-circuiting appendChild saved another 7ms.

3) However, only making the second change (short-circuiting appendChild) actually made the performance worse (by 4ms)! I can only guess why that happened - and my guess is that XPath evaluation creates a lot of memory fragmentation, causing weird effects with fastMalloc.

4) Running the test for the second time gives a noticeably worse result (24ms). Again, I can only blame fastMalloc, but without much evidence.

5) There are still a lot of attack vectors for improving performance in this test. For example, I really want to make NodeVector a Vector<Node*> instead of Vector<RefPtr<Node> >. Also, "@class" really needs to use getAttributeNode() instead of enumerating (and instantiating in DOM) all attributes.
Comment 11 Alexey Proskuryakov 2007-03-14 13:13:59 PDT
(In reply to comment #10)
> This brings us almost on par with Firefox (16ms vs. 17ms on my G4, down from
> 48ms).

Comparing to Firefox 3 beta makes things less rosy (10ms there).
Comment 12 Alexey Proskuryakov 2007-03-14 23:37:17 PDT
Created attachment 13642 [details]
partial fix 2 v2

Backed out the changes to EvaluationContext - I forgot about custom namespace resolvers that can do anything when called.
Comment 13 Alexey Proskuryakov 2007-03-20 13:02:46 PDT
Created attachment 13724 [details]
partial fix 2 v3

Updated for current TOT.
Comment 14 Darin Adler 2007-03-24 20:18:45 PDT
Comment on attachment 13724 [details]
partial fix 2 v3

+        // This does everything appendChild() would do in this situation, but much more efficiently.

Misleading comment because this does what appendChild would do *if* m_ignoreChildrenChanged was set.

+                RefPtr<Node> n = attrs->item(i);
+                if (nodeMatches(n.get()))
+                    nodes.append(n);

Would be more efficient to append n.release().

+            bool nodeMatches(Node* node) const;

Don't need the name "node" here.

r=me
Comment 15 Alexey Proskuryakov 2007-03-25 01:57:00 PDT
Comment on attachment 13724 [details]
partial fix 2 v3

Committed revision 20487, clearing review flag from a landed patch.

(In reply to comment #14)

> Misleading comment because this does what appendChild would do *if*
> m_ignoreChildrenChanged was set.

  Updated the comment.

> Would be more efficient to append n.release().

  Fixed.

> Don't need the name "node" here.

  Fixed.
Comment 16 Alexey Proskuryakov 2007-03-25 08:41:49 PDT
Created attachment 13814 [details]
partial fix 3

This makes us faster than Firefox 2, but still slower than Firefox 3 beta (13.5 ms on my machine vs. 16 ms and 10 ms). Results on other benchmarks (such as <http://www.andrewdupont.net/test/xpath/>) are also decent now, though not always stellar.

I intend to close this bug after this lands, and to pause working on XPath optimization during the stabilization period.
Comment 17 Darin Adler 2007-03-27 22:02:59 PDT
Comment on attachment 13814 [details]
partial fix 3

+    Vector<UChar, 1024> result;
+    [...]
+    return String(result.data(), result.size());

I think it might be a slightly better idiom here is to use a plain old Vector<UChar>, then use String::adopt for the return value. Not sure. There will be slightly more allocation while accumulating the vector, but saves one allocation at the resturn site.

-    return nodes;
+    return v;

I don't understand that change; can't we still return nodes?

+            // Steals nodes from value.
+            Value(NodeSet& value) : m_type(NodeSetValue), m_data(new ValueData) { value.swap(m_data->m_nodeSet); }

I think maybe this is a little to dangerous to be a public constructor. Can you instead make it a private constructor and use a named construction function to expose this? Or maybe a clearer way would be to just construct an empty Value and then call a function named "adopt" that takes a NodeSet&. At the very least, I think this should be labeled "explicit".

Great changes, r=me
Comment 18 Alexey Proskuryakov 2007-03-29 22:22:07 PDT
(In reply to comment #17)
> I think it might be a slightly better idiom here is to use a plain old
> Vector<UChar>, then use String::adopt for the return value. Not sure.

  I've just tried using String::adopt, and got a huge drop in performance (~1ms).

> I don't understand that change; can't we still return nodes?

  I made this change to save on constructing a new Value, given that we already have a Value containing the node-set to return.

  However, measuring the performance now, I'm surprised to see that constructing a new Value seems to be slightly faster than copying - the function doesn't look like one where return value optimization could work in. Could be a random fluctuation, going to check the generated code.

> I think maybe this is a little to dangerous to be a public constructor. Can you
> instead make it a private constructor and use a named construction function to
> expose this? Or maybe a clearer way would be to just construct an empty Value
> and then call a function named "adopt" that takes a NodeSet&. At the very
> least, I think this should be labeled "explicit".

  Yes, I also dislike how it works now. Going to try safer approaches.
Comment 19 Alexey Proskuryakov 2007-03-30 12:56:39 PDT
Created attachment 13890 [details]
partial fix 3 v2

I have checked the generated code for "return v" vs. "return nodes", and the former was definitely better (it enabled return value optimization). So, looks like my measurement just showed a random fluctuation, since this code is not particularly hot.

Changed Value to make it nicer and safer to use.

These changes didn't affect performance, so thanks to the improvements from bug 13190 it's just under 12 ms on my machine now (vs. 16 ms in Firefox 2 and 10 ms in Firefox 3).
Comment 20 Darin Adler 2007-03-30 13:32:51 PDT
Comment on attachment 13890 [details]
partial fix 3 v2

r=me
Comment 21 Alexey Proskuryakov 2007-03-30 13:50:07 PDT
Committed revision 20620, closing.

Opening a new bug to track optimized XPath implementation.