Bug 48101 - Yarr gives different results for /(?:a*?){2,}/
Summary: Yarr gives different results for /(?:a*?){2,}/
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Gavin Barraclough
URL: https://bugzilla.mozilla.org/show_bug...
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-21 16:43 PDT by Oliver Hunt
Modified: 2010-11-29 08:29 PST (History)
4 users (show)

See Also:


Attachments
The patch (49.18 KB, patch)
2010-11-27 13:45 PST, Gavin Barraclough
sam: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Oliver Hunt 2010-10-21 16:43:01 PDT
The discussion in the referenced bug concludes that yarr behaviour is wrong in this case.
Comment 1 Gavin Barraclough 2010-11-27 13:45:31 PST
Created attachment 74965 [details]
The patch

No performance impact, will update linked mozilla bug with info this afternoon.
Comment 2 Sam Weinig 2010-11-28 12:46:22 PST
Comment on attachment 74965 [details]
The patch

View in context: https://bugs.webkit.org/attachment.cgi?id=74965&action=review

Any perf change?  r+.

> JavaScriptCore/yarr/RegexInterpreter.cpp:1552
> +        // then fix this up at the end! - simplyfying this should make it mch clearer.

Typo: mch -> much.  You may also want to file a bug on this and put the bug number in the comment.
Comment 3 Sam Weinig 2010-11-28 12:46:52 PST
(In reply to comment #2)
> (From update of attachment 74965 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=74965&action=review
> 
> Any perf change? 

Ignore that, I see your comment above.
Comment 4 Gavin Barraclough 2010-11-28 16:45:55 PST
Fixed in r72781.
Comment 5 Darin Adler 2010-11-29 08:29:06 PST
Comment on attachment 74965 [details]
The patch

View in context: https://bugs.webkit.org/attachment.cgi?id=74965&action=review

> JavaScriptCore/ChangeLog:7
> +        The test cases in the linked mozilla bug demostrate a couple of

Typo: demostrate

> JavaScriptCore/ChangeLog:14
> +        matches. In the case of parenthese-single matches (quantity count

Is “parenthese-single” the intended phrase or a typo?

> JavaScriptCore/ChangeLog:16
> +        of terminal subpattern matches we do currenty check, however there

Typo: currenty

> JavaScriptCore/ChangeLog:28
> +        Terminal sunpattern matching contains a second bug, too. The frame

Typo: sunpattern

> JavaScriptCore/yarr/RegexCompiler.cpp:739
> +    //   * a set of parens at the end of the regular expression (last term in any of the alternatives of the main body disjunction).
> +    //   * where the parens are non-capturing, and quantified unbounded greedy (*).
> +    //   * where the parens do not contain any capturing subpatterns.

Non-parallel grammar here.

> JavaScriptCore/yarr/RegexCompiler.cpp:748
> +        for (unsigned i =0; i < alternatives.size(); ++i) {

Missing space after the "=". Should also probably use size_t for the type here instead of unsigned.

> JavaScriptCore/yarr/RegexCompiler.cpp:754
> +                    && term.quantityCount == UINT_MAX

Better to use a named constant instead of UINT_MAX here. We’d normally use numeric_limits<unsigned>::max() instead of UINT_MAX, by the way, but neither seems right since the maximum value has a special meaning.

> JavaScriptCore/yarr/RegexInterpreter.cpp:722
> +                    // Technically this access to inputPosition should be accessing the begin term's
> +                    // inputPosition, but for repeats other than fixed these values should be
> +                    // the same anyway! (we don't pre-check for greedy or non-greedy matches.)

Should capitalize the word “we” here.

> JavaScriptCore/yarr/RegexInterpreter.cpp:780
> +        // should always be returned as a successful match - we should never becktrack to here.

Typo: becktrack

>> JavaScriptCore/yarr/RegexInterpreter.cpp:1552
>> +        // then fix this up at the end! - simplyfying this should make it mch clearer.
> 
> Typo: mch -> much.  You may also want to file a bug on this and put the bug number in the comment.

Typo: simplyfying

> JavaScriptCore/yarr/RegexJIT.cpp:957
> +            Jump success = (term.quantityType == QuantifierFixedCount) ?
> +                jump() :
> +                branch32(NotEqual, index, Address(stackPointerRegister, (parenthesesFrameLocation * sizeof(void*))));

We normally put the ? and the : at the starts of the lines, not the ends.
Comment 6 Darin Adler 2010-11-29 08:29:31 PST
Comment on attachment 74965 [details]
The patch

View in context: https://bugs.webkit.org/attachment.cgi?id=74965&action=review

> LayoutTests/fast/regex/script-tests/repeat-match-waldemar.js:38
>  Property changes on: LayoutTests/fast/regex/script-tests/repeat-match-waldemar.js
>  ___________________________________________________________________
>  Added: svn:executable
>    + *

This should not be checked in executable.