Bug 16321 - new RegExp("[\u0097]{4,6}", "gmy") crashes in DEBUG builds
Summary: new RegExp("[\u0097]{4,6}", "gmy") crashes in DEBUG builds
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.4
: P1 Normal
Assignee: Darin Adler
URL:
Keywords: HasReduction, InRadar, Regression
Depends on:
Blocks:
 
Reported: 2007-12-06 00:37 PST by Eric Seidel (no email)
Modified: 2007-12-06 12:26 PST (History)
1 user (show)

See Also:


Attachments
Test case (will ASSERT) (64 bytes, text/html)
2007-12-06 07:39 PST, David Kilzer (:ddkilzer)
no flags Details
more stressful test case (171 bytes, text/html)
2007-12-06 09:38 PST, Darin Adler
no flags Details
patch (4.53 KB, patch)
2007-12-06 09:48 PST, Darin Adler
eric: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2007-12-06 00:37:37 PST
new RegExp("[\u0097]{4,6}", "gmy") crashes in DEBUG builds

(probably release builds too)

ASSERTION FAILED: code - codestart <= length
(/Stuff/Projects/WebKit/JavaScriptCore/pcre/pcre_compile.cpp:2804 JSRegExp* jsRegExpCompile(const UChar*, int, JSRegExpIgnoreCaseOption, JSRegExpMultilineOption, unsigned int*, const char**))
Comment 1 Eric Seidel (no email) 2007-12-06 00:43:10 PST
This one too:
new RegExp("[\Ñ]{2,4}");
Comment 2 Eric Seidel (no email) 2007-12-06 00:48:26 PST
And this one: new RegExp("[\£]{3,4}", "gmy");
Comment 3 Eric Seidel (no email) 2007-12-06 00:50:58 PST
Reduction: new RegExp("[\u0097]{4,6}");
Comment 4 David Kilzer (:ddkilzer) 2007-12-06 07:39:16 PST
Created attachment 17748 [details]
Test case (will ASSERT)
Comment 5 David Kilzer (:ddkilzer) 2007-12-06 07:39:56 PST
Does not crash WebKit nightly r28482 (a Release build), so the ASSERT() condition may need to be changed?

Comment 6 Darin Adler 2007-12-06 08:09:01 PST
No, the assertion is definitely correct.
Comment 7 David Kilzer (:ddkilzer) 2007-12-06 08:25:01 PST
<rdar://problem/5632992>
Comment 8 Darin Adler 2007-12-06 08:40:42 PST
These test cases are extended character classes followed by repeat sequences. Should be easy to track down.
Comment 9 Darin Adler 2007-12-06 08:53:12 PST
The preflight is computing a length of 16, the compiled regular expression has a length of 17.
Comment 10 Darin Adler 2007-12-06 09:34:21 PST
If we want to see if this crashed in older builds, we should repeat the expression multiple times, and use a max number much higher than 6.
Comment 11 Darin Adler 2007-12-06 09:35:58 PST
Famous last words, but once again I don't see how this could be a regression. The logic has not changed since Leopard. Last time I was mistaken about this.
Comment 12 Darin Adler 2007-12-06 09:38:36 PST
Created attachment 17750 [details]
more stressful test case
Comment 13 Darin Adler 2007-12-06 09:48:24 PST
Created attachment 17752 [details]
patch
Comment 14 Eric Seidel (no email) 2007-12-06 10:30:58 PST
Comment on attachment 17752 [details]
patch

Looks good.  r=me
Comment 15 Eric Seidel (no email) 2007-12-06 10:32:59 PST
I should note, the ASSERT is actually not quite in the right place.  We ASSERT right after we've walked off the end of the buffer.  We could be more aggressive and ASSERT right before we add the final KET, so as to catch cases like this were we're off by one (we'd still be asserting/returning late for cases which were off by more than 1).  I actually changed my local tree to do that, and still hit some bad off by > 1 cases.  I'll file those additional cases once darin's patch lands.
Comment 16 Darin Adler 2007-12-06 10:36:30 PST
(In reply to comment #15)
> I should note, the ASSERT is actually not quite in the right place.  We ASSERT
> right after we've walked off the end of the buffer.  We could be more
> aggressive and ASSERT right before we add the final KET, so as to catch cases
> like this were we're off by one (we'd still be asserting/returning late for
> cases which were off by more than 1).

I don't understand what you're saying. The current assertion fires if and only if we're running off the end of the buffer. If the assert doesn't fire, then we're not running off the end of the buffer.

The final KET takes 3 bytes and the END opcode takes 1 byte -- I don't see how asserting before writing out those last 4 bytes would be more accurate.

Maybe I'll understand when I see your patch.
Comment 17 Eric Seidel (no email) 2007-12-06 10:45:32 PST
I mispoke.  OP_END instead of OP_KET.

I moved the error return before writing of OP_END and made it check to make sure there was enough space to write OP_END.  Left the ASSERT checking if we were already over the code buffer before writing the OP_END.  That allowed me to continue fuzz testing while ignoring the off-by-one case caused most often by this bug, and was not a permanent solution.  What I was proposing above was moving the return, but still ASSERTing if there was not enough space for the OP_END.

I agree, it's probably not worth the hassle to bother avoiding writing the OP_END.  it's a programmer error if we ever are writing the OP_END off the end of the buffer, no sense in doing contortions in the runtime to workaround that.
Comment 18 Darin Adler 2007-12-06 11:05:49 PST
(In reply to comment #17)
> I mispoke.  OP_END instead of OP_KET.
> 
> I moved the error return before writing of OP_END and made it check to make
> sure there was enough space to write OP_END.  Left the ASSERT checking if we
> were already over the code buffer before writing the OP_END.  That allowed me
> to continue fuzz testing while ignoring the off-by-one case caused most often
> by this bug, and was not a permanent solution.  What I was proposing above was
> moving the return, but still ASSERTing if there was not enough space for the
> OP_END.
> 
> I agree, it's probably not worth the hassle to bother avoiding writing the
> OP_END.  it's a programmer error if we ever are writing the OP_END off the end
> of the buffer, no sense in doing contortions in the runtime to workaround that.

We could allocate a bigger-than-needed buffer in debug versions. Then we could catch overruns and be able to continue testing, I suppose.
Comment 19 Darin Adler 2007-12-06 12:26:10 PST
Committed revision 28491.