Bug 74665 - Poor XPath performance when evaluating an expression that returns a lot of nodes
Summary: Poor XPath performance when evaluating an expression that returns a lot of nodes
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: XML (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alexey Proskuryakov
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2011-12-15 16:42 PST by Alexey Proskuryakov
Modified: 2011-12-16 11:01 PST (History)
0 users

See Also:


Attachments
proposed fix (4.41 KB, patch)
2011-12-15 16:49 PST, Alexey Proskuryakov
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Proskuryakov 2011-12-15 16:42:47 PST
Steps to reproduce: open <http://opensource.apple.com/source/SQLite/SQLite-74/derived_source/sqlite3.c>. After loading, Safari will freeze for a long time.

The reason is that Safari Reader evaluates an XPath expression, which returns about 180K nodes. Sorting such a node set takes a long time.

The sorting function is optimized for small node sets in large documents, and this is the opposite of it.

<rdar://problem/10517146>
Comment 1 Alexey Proskuryakov 2011-12-15 16:49:14 PST
Created attachment 119519 [details]
proposed fix
Comment 2 Darin Adler 2011-12-15 17:02:18 PST
Comment on attachment 119519 [details]
proposed fix

View in context: https://bugs.webkit.org/attachment.cgi?id=119519&action=review

> Source/WebCore/xml/XPathNodeSet.cpp:38
> +// When a node set is large, sorting it by traversing the whole document is better (we can
> +// assume that we aren't dealing with documents that we cannot even traverse in reasonable time).
> +const unsigned traversalSortCutoff = 10000;

Maybe comparing the node set to the total number of nodes in the document would be better. I guess we don’t really know what that is, so a hardcoded numbers seems fine.

> Source/WebCore/xml/XPathNodeSet.cpp:203
> +        if (node->isAttributeNode())
> +            containsAttributeNodes = true;

Since isAttributeNode is a virtual function we might want to do the much faster checks that have flags right in the Node first:

    if (!node->isElementNode() && !node->isTextNode() && node->isAttributeNode())

We might even consider putting that optimization into the isAttributeNode function itself.

> Source/WebCore/xml/XPathNodeSet.cpp:211
> +        if (nodes.contains(n))
> +            sortedNodes.append(n);

Would it be more or less efficient to remove each node from the set as it’s handled?

> Source/WebCore/xml/XPathNodeSet.cpp:216
> +        if (!containsAttributeNodes || !n->hasAttributes())
> +            continue;
> +
> +        NamedNodeMap* attributes = n->attributes();

It would be more efficient to write it like this:

    if (!containsAttributeNodes || !n->isElementNode())
        continue;

    NamedNodeMap* attributes = toElement(n)->attributes(true /* read-only */);
    if (!attributes)
        continue;

Avoids doing some work twice in the hasAttributes and attributes functions.

> Source/WebCore/xml/XPathNodeSet.cpp:226
> +    const_cast<Vector<RefPtr<Node> >& >(m_nodes).swap(sortedNodes);

No need for the space after the "&" character.

Is const_cast preferred over mutable?

I think a typedef for Vector<RefPtr<Node> > would help make this more readable.
Comment 3 Alexey Proskuryakov 2011-12-16 11:01:03 PST
Committed <http://trac.webkit.org/changeset/103082>.

> We might even consider putting that optimization into the isAttributeNode function itself.

Seems more appropriate to optimize isAttributeNode when needed. In this particular instance, this micro optimization feels unwarranted, as we do many more operations per node than this virtual call.

> Would it be more or less efficient to remove each node from the set as it’s handled?

My gut feeling is that it would be less efficient. But I didn't measure.

> It would be more efficient to write it like this:

Done.

> Is const_cast preferred over mutable?

I think so. The vector is not mutable, it's only sortable. We don't want to change it in other const methods.