Bug 50366

Summary: XPath lexer misinterprets expression starting with "div".
Product: WebKit Reporter: Yonathan Randolph <yonathan>
Component: XMLAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, commit-queue, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
XPath parser patch.
none
Unit test for parsing div, mod, and *.
none
Updated patch with exhaustive test and slight bug fix.
none
Mollify check-webkit-style a little. none

Description Yonathan Randolph 2010-12-01 18:22:42 PST
The following expression fails because the first token is interpreted as binary operator "div", not the step "div".
document.evaluate('div', document.body, null, XPathResult.SINGLE_NODE_VALUE, null)
(Error: INVALID_EXPRESSION_ERR: DOM XPath Exception 51)

It's easy to work around the bug:
document.evaluate('./div', document.body, null, XPathResult.SINGLE_NODE_VALUE, null)
Comment 1 Yonathan Randolph 2010-12-01 18:30:13 PST
Created attachment 75344 [details]
XPath parser patch.

Note: if you're not seeing the error, try adding some whitespace to the XPath: "  div". It fails in Chrome at least.

Does anyone know what tests I should add for this? I didn't see unit tests in the last few changes to XPathParser.cpp.
Comment 2 Alexey Proskuryakov 2010-12-01 23:00:47 PST
The only text in the spec I can see that's related is this disambiguation rule in <http://www.w3.org/TR/xpath/#exprlex>:

"Otherwise, the token must not be recognized as a MultiplyOperator, an OperatorName, a NodeType, a FunctionName, or an AxisName."

I'm not sure how exactly to interpret this - there is no case that allows an OperatorName in disambiguation rules! The proposed patch agrees with common sense, but may be operating at a level that's different from what's implied by the spec.

> Does anyone know what tests I should add for this?

See existing tests in LayoutTests/fast/xpath, for example document-order.html or complex-id.html. The most recent changes to this file didn't affect observed behavior, so they didn't have regression tests.

Besides 'div', there should be a test for 'mod'.
Comment 3 Yonathan Randolph 2010-12-02 15:45:25 PST
Created attachment 75424 [details]
Unit test for parsing div, mod, and *.

(In reply to comment #2)
> The only text in the spec I can see that's related is this disambiguation rule in <http://www.w3.org/TR/xpath/#exprlex>:
> 
> "Otherwise, the token must not be recognized as a MultiplyOperator, an OperatorName, a NodeType, a FunctionName, or an AxisName."
> 
> I'm not sure how exactly to interpret this - there is no case that allows an OperatorName in disambiguation rules! The proposed patch agrees with common sense, but may be operating at a level that's different from what's implied by the spec.

No, I had to read it twice too. The first bullet point tells you the only cases an ambiguous binary operators can happen:

"If there is a preceding token and the preceding token is not one of @, ::, (, [, , or an Operator, then a * must be recognized as a MultiplyOperator and an NCName must be recognized as an OperatorName."

The bug was that the parser didn't properly check for "If there is a preceding token". It tested m_nextPos (which changes with whitespace) rather than the previous token value.

> 
> > Does anyone know what tests I should add for this?
> 
> See existing tests in LayoutTests/fast/xpath, for example document-order.html or complex-id.html. The most recent changes to this file didn't affect observed behavior, so they didn't have regression tests.
> 
> Besides 'div', there should be a test for 'mod'.

I've added a test.
Comment 4 Alexey Proskuryakov 2010-12-02 16:05:03 PST
Looks great. Could you please combine these into a single patch, and mark it for review (set review? flag when adding an attachment)?

+<p>Test that an NCName and * are interpreted as an operator when it is in
+binary operator context, and as a NameTest otherwise..

Nit: two periods at the end.

-/* Returns whether the last parsed token matches the [32] Operator rule
- * (check http://www.w3.org/TR/xpath#exprlex). Necessary to disambiguate
- * the tokens.
+/* Returns whether the current token can possibly be a binary operator, given
+ * the previous token. Necessary to disambiguate some of the operators
+ * (* (multiply), div, and, or, mod) in the [32] Operator rule
+ * (check http://www.w3.org/TR/xpath#exprlex).
  */

Nit: this is a pre-existing issue, but we don't use C comments in C++ code as a matter of coding style.

-    if (isOperatorContext()) {
+    if (isBinaryOperatorContext()) {
         if (name == "and") //### hash?
             return Token(AND);

I'm wondering what this ### comment is about.

+++ fast/xpath/ambiguous-operators.html	(revision 0)

I'd try to add as many tests as I can think of, even if redundant. Say, "div/div", "(12) div 4" or "div + div".
Comment 5 Yonathan Randolph 2010-12-02 20:59:53 PST
Created attachment 75461 [details]
Updated patch with exhaustive test and slight bug fix.

I've done what you've asked for.

> -    if (isOperatorContext()) {
> +    if (isBinaryOperatorContext()) {
>          if (name == "and") //### hash?
>              return Token(AND);
> 
> I'm wondering what this ### comment is about.

I don't understand it either.

> I'd try to add as many tests as I can think of, even if redundant. Say, "div/div", "(12) div 4" or "div + div".
I've begrudgingly made the test exhaustive (I think). Turns out this was actually a good idea, because I found a case I didn't notice earlier: node test immediately following comma.
Comment 6 WebKit Review Bot 2010-12-02 21:02:44 PST
Attachment 75461 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/fast/xpath/ambiguous-operators-expected.txt', u'LayoutTests/fast/xpath/ambiguous-operators.html', u'WebCore/xml/XPathParser.cpp', u'WebCore/xml/XPathParser.h']" exit_code: 1
WebCore/xml/XPathParser.cpp:134:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Total errors found: 1 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 7 Yonathan Randolph 2010-12-02 21:25:32 PST
Created attachment 75463 [details]
Mollify check-webkit-style a little.

Also I forgot to include ChangeLogs to svn diff earlier.
Comment 8 Alexey Proskuryakov 2010-12-02 22:11:27 PST
Comment on attachment 75463 [details]
Mollify check-webkit-style a little.

r=me. Also marking approved for commit queue - it's been having some trouble lately, so let's see if it works.

> Turns out this was actually a good idea, because I found a case I didn't notice earlier: node test immediately following comma.

I'd glad to hear that!
Comment 9 Alexey Proskuryakov 2010-12-02 22:14:06 PST
Interestingly, Firefox 3.6 fails two of the subtests:

FAIL - - div: raised [Exception... "The expression cannot be converted to return the specified type." code: "52" nsresult: "0x805b0034 (NS_ERROR_DOM_TYPE_ERR)" location: "file:///Users/ap/Safari/OpenSource/LayoutTests/fast/xpath/xpath-test-pre.js Line: 28"]

FAIL - - *: raised [Exception... "The expression cannot be converted to return the specified type." code: "52" nsresult: "0x805b0034 (NS_ERROR_DOM_TYPE_ERR)" location: "file:///Users/ap/Safari/OpenSource/LayoutTests/fast/xpath/xpath-test-pre.js Line: 28"]
Comment 10 WebKit Commit Bot 2010-12-03 00:10:07 PST
The commit-queue encountered the following flaky tests while processing attachment 75463 [details]:

inspector/styles-disable-inherited.html

Please file bugs against the tests.  These tests were authored by pfeldman@chromium.org.  The commit-queue is continuing to process your patch.
Comment 11 WebKit Commit Bot 2010-12-03 00:12:26 PST
Comment on attachment 75463 [details]
Mollify check-webkit-style a little.

Clearing flags on attachment: 75463

Committed r73247: <http://trac.webkit.org/changeset/73247>
Comment 12 WebKit Commit Bot 2010-12-03 00:12:32 PST
All reviewed patches have been landed.  Closing bug.
Comment 13 Yonathan Randolph 2010-12-04 23:28:52 PST
(In reply to comment #9)
> Interestingly, Firefox 3.6 fails two of the subtests:

By the way, it seems that Gecko fails because it removes "redundant" double minus operators, and then the return value doesn't get coerced to number anymore. I filed the bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=616774