Summary: | XPath can be very slow | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Mark Rowe (bdash) <mrowe> | ||||||||||||||
Component: | XML | Assignee: | Alexey Proskuryakov <ap> | ||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||
Severity: | Major | CC: | ap | ||||||||||||||
Priority: | P2 | ||||||||||||||||
Version: | 523.x (Safari 3) | ||||||||||||||||
Hardware: | Mac | ||||||||||||||||
OS: | OS X 10.4 | ||||||||||||||||
URL: | http://ejohn.org/apps/classname/xpath.html?r0=1173429259 | ||||||||||||||||
Attachments: |
|
Description
Mark Rowe (bdash)
2007-03-09 00:53:08 PST
I believe the problem is that //* enumerates all element children multiple times, removing the duplicates later. I have noticed this by code inspection before, but didn't expect //* to be a common idiom. I'm going to try some possible optimizations. Alexey, I'd encourage you to use Shark to see exactly where the time is being spent in this case. As I mentioned I was seeing a large amount of time spent under Attribute::createAttrIfNeeded, specifically related to processing of DOM subtree modified events. I'm not even sure why enumerating elements would lead to DOM subtrees being modified... I cannot see how the time spent in Attribute::createAttrIfNeeded() is related to processing DOM events - but it's clearly another factor to the overall slowness, because it's needlessly called during XPath attribute node enumeration. Attribute::createAttrIfNeeded creates a DOM text node for the attribute value. When it adds itself to the Attribute via appendChild this results in a subtree modification event being dispatched. It seems a bit bizarre that the event is dispatched in this case because it seems that the DOM is not being modified as such, rather it is being lazily created. Created attachment 13578 [details]
partial fix 1
I'm still not sure what the main culprit is (the DOM modification events do not seem to be a huge contributor, according to Shark), but this patch improves the performance a bit. The results are very inconsistent, though - sometimes I measure a 10% improvement, other times, a slowdown :-/
Comment on attachment 13578 [details]
partial fix 1
r=me
One thing that's clearly a slow idiom is having functions with a return type that's a Vector. Vector is not good for these purposes because returning one involves allocating an entire new vector every time. It's better to pass a Vector by value. Or we can wrap the vector so it works better as a return value type. Or we can carefully code so we get the C++ return value optimization; but that might not help enough.
Comment on attachment 13578 [details]
partial fix 1
Committed revision 20102, clearing the review flag.
It's particularly inefficient that NodeVector is a vector of RefPtrs. It has to hold onto its nodes because Attr nodes usually don't have any other references. Of course, re-creating Attr nodes all the time doesn't help, too.
See <https://bugzilla.mozilla.org/show_bug.cgi?id=208172> and <http://www.fiveanddime.net/firefox-1.5-source/mozilla/extensions/transformiix/docs/optimized-xpath.html.html> for ideas that were considered for Mozilla optimized XPath implementation. (In reply to comment #4) > It seems a bit bizarre that the event is dispatched in this case because > it seems that the DOM is not being modified as such, rather it is being > lazily created. AFAICT, these mutation events don't even go anywhere - they appear to be directly dispatched to the attribute's text node, which cannot have any listeners attached (because it has been just created). I'm going to verify this theory, and possibly add an optimization. Created attachment 13617 [details]
partial fix 2
This brings us almost on par with Firefox (16ms vs. 17ms on my G4, down from 48ms). But it comes with a number of comments:
1) I do not like short-circuiting appendChild() like this, it would be much better to just make it fast. However, that's definitely not a stabilization period work (and even changes in Attr.cpp look questionable in this regard).
2) Merging nodesInAxis and nodeMatches saved 24ms, and short-circuiting appendChild saved another 7ms.
3) However, only making the second change (short-circuiting appendChild) actually made the performance worse (by 4ms)! I can only guess why that happened - and my guess is that XPath evaluation creates a lot of memory fragmentation, causing weird effects with fastMalloc.
4) Running the test for the second time gives a noticeably worse result (24ms). Again, I can only blame fastMalloc, but without much evidence.
5) There are still a lot of attack vectors for improving performance in this test. For example, I really want to make NodeVector a Vector<Node*> instead of Vector<RefPtr<Node> >. Also, "@class" really needs to use getAttributeNode() instead of enumerating (and instantiating in DOM) all attributes.
(In reply to comment #10) > This brings us almost on par with Firefox (16ms vs. 17ms on my G4, down from > 48ms). Comparing to Firefox 3 beta makes things less rosy (10ms there). Created attachment 13642 [details]
partial fix 2 v2
Backed out the changes to EvaluationContext - I forgot about custom namespace resolvers that can do anything when called.
Created attachment 13724 [details]
partial fix 2 v3
Updated for current TOT.
Comment on attachment 13724 [details]
partial fix 2 v3
+ // This does everything appendChild() would do in this situation, but much more efficiently.
Misleading comment because this does what appendChild would do *if* m_ignoreChildrenChanged was set.
+ RefPtr<Node> n = attrs->item(i);
+ if (nodeMatches(n.get()))
+ nodes.append(n);
Would be more efficient to append n.release().
+ bool nodeMatches(Node* node) const;
Don't need the name "node" here.
r=me
Comment on attachment 13724 [details] partial fix 2 v3 Committed revision 20487, clearing review flag from a landed patch. (In reply to comment #14) > Misleading comment because this does what appendChild would do *if* > m_ignoreChildrenChanged was set. Updated the comment. > Would be more efficient to append n.release(). Fixed. > Don't need the name "node" here. Fixed. Created attachment 13814 [details] partial fix 3 This makes us faster than Firefox 2, but still slower than Firefox 3 beta (13.5 ms on my machine vs. 16 ms and 10 ms). Results on other benchmarks (such as <http://www.andrewdupont.net/test/xpath/>) are also decent now, though not always stellar. I intend to close this bug after this lands, and to pause working on XPath optimization during the stabilization period. Comment on attachment 13814 [details]
partial fix 3
+ Vector<UChar, 1024> result;
+ [...]
+ return String(result.data(), result.size());
I think it might be a slightly better idiom here is to use a plain old Vector<UChar>, then use String::adopt for the return value. Not sure. There will be slightly more allocation while accumulating the vector, but saves one allocation at the resturn site.
- return nodes;
+ return v;
I don't understand that change; can't we still return nodes?
+ // Steals nodes from value.
+ Value(NodeSet& value) : m_type(NodeSetValue), m_data(new ValueData) { value.swap(m_data->m_nodeSet); }
I think maybe this is a little to dangerous to be a public constructor. Can you instead make it a private constructor and use a named construction function to expose this? Or maybe a clearer way would be to just construct an empty Value and then call a function named "adopt" that takes a NodeSet&. At the very least, I think this should be labeled "explicit".
Great changes, r=me
(In reply to comment #17) > I think it might be a slightly better idiom here is to use a plain old > Vector<UChar>, then use String::adopt for the return value. Not sure. I've just tried using String::adopt, and got a huge drop in performance (~1ms). > I don't understand that change; can't we still return nodes? I made this change to save on constructing a new Value, given that we already have a Value containing the node-set to return. However, measuring the performance now, I'm surprised to see that constructing a new Value seems to be slightly faster than copying - the function doesn't look like one where return value optimization could work in. Could be a random fluctuation, going to check the generated code. > I think maybe this is a little to dangerous to be a public constructor. Can you > instead make it a private constructor and use a named construction function to > expose this? Or maybe a clearer way would be to just construct an empty Value > and then call a function named "adopt" that takes a NodeSet&. At the very > least, I think this should be labeled "explicit". Yes, I also dislike how it works now. Going to try safer approaches. Created attachment 13890 [details] partial fix 3 v2 I have checked the generated code for "return v" vs. "return nodes", and the former was definitely better (it enabled return value optimization). So, looks like my measurement just showed a random fluctuation, since this code is not particularly hot. Changed Value to make it nicer and safer to use. These changes didn't affect performance, so thanks to the improvements from bug 13190 it's just under 12 ms on my machine now (vs. 16 ms in Firefox 2 and 10 ms in Firefox 3). Comment on attachment 13890 [details]
partial fix 3 v2
r=me
Committed revision 20620, closing. Opening a new bug to track optimized XPath implementation. |