Bug 26005 - Optimization for XPath //* does not preserve context size
: Optimization for XPath //* does not preserve context size
Status: RESOLVED FIXED
Product: WebKit
Classification: Unclassified
Component: XML
: 528+ (Nightly build)
: All All
: P2 Normal
Assigned To: Alexey Proskuryakov
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-05-25 00:33 PDT by Alexey Proskuryakov
Modified: 2009-05-28 09:26 PDT (History)
2 users (show)

See Also:


Attachments
test case (4.14 KB, text/html)
2009-05-25 00:37 PDT, Alexey Proskuryakov
no flags Details
test case showing before and after dom append (590 bytes, text/html)
2009-05-27 06:01 PDT, Phred
no flags Details
proposed fix (29.73 KB, patch)
2009-05-28 03:05 PDT, Alexey Proskuryakov
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Proskuryakov 2009-05-25 00:33:46 PDT
//* is an abbreviated form for /descendant-or-self::node()/child::*. When applying predicates to the last step, the context should contain all children elements of a given node, so e.g. //*[2] locates all second children elements in document.

We wrongfully optimize this to /descendant:node()/self::*, so the context for predicates always contains a single element.
Comment 1 Alexey Proskuryakov 2009-05-25 00:37:14 PDT
Created attachment 30645 [details]
test case
Comment 2 Phred 2009-05-27 06:01:25 PDT
Created attachment 30705 [details]
test case showing before and after dom append

As best I can discern the spec http://www.w3.org/TR/xpath#path-abbrev :

"//para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node"

There exist no parent document for the context node in the original URL, thus there should be no answer (null). Firefox 2 returned null and Opera currently returns null in this case. However once it is appended to the document, the answer refers to the root element as it should.

The bug here is that you are executing from the current document root regardless of whether the context node exist in the document or not.
Comment 3 Phred 2009-05-27 06:08:55 PDT
The previous comment was actually meant for Bug 25931.
Comment 4 Alexey Proskuryakov 2009-05-28 03:05:11 PDT
Created attachment 30733 [details]
proposed fix
Comment 5 Darin Adler 2009-05-28 09:04:54 PDT
Comment on attachment 30733 [details]
proposed fix

> + * Copyright (C) 2005 Frerich Raabe <raabe@kde.org>
> + * Copyright (C) 2006, 2009 Apple, Inc.

No comma in Apple Inc.

> +    switch (axis) {
> +        case Step::AttributeAxis:
> +            return Node::ATTRIBUTE_NODE;
> +        case Step::NamespaceAxis:
> +            return Node::XPATH_NAMESPACE_NODE;
> +        default:
> +            return Node::ELEMENT_NODE;
> +    }

I like to leave out the "default" case so that gcc can warn me about unhandled enum values. This is practical when the cases end with return statements, since the default return statement can go outside the switch, perhaps with an ASSERT_NOT_REACHED().

> +                // In XPath land, namespace nodes are not accessible on the attribute axis.
> +                if (node->namespaceURI() == "http://www.w3.org/2000/xmlns/")
> +                    return false;

Since namespaces are atomic strings, comparing to an AtomicString constant may be slightly more efficient, and it would also be nice to have that string in some common place.

r=me
Comment 6 Alexey Proskuryakov 2009-05-28 09:26:07 PDT
Committed <http://trac.webkit.org/changeset/44232>.

(In reply to comment #5)
> No comma in Apple Inc.

Fixed.

> I like to leave out the "default" case so that gcc can warn me about unhandled
> enum values. This is practical when the cases end with return statements, since
> the default return statement can go outside the switch, perhaps with an
> ASSERT_NOT_REACHED().

I think that in this case, leaving element axis as default is more natural.

> Since namespaces are atomic strings, comparing to an AtomicString constant may
> be slightly more efficient, and it would also be nice to have that string in
> some common place.

Agreed, will keep this in mind for future patches.