Bug 10370

Summary: RegExp fails to match non-ASCII characters against [\S\s]
Product: WebKit
Reporter: Doug Wright <apple>
Component: JavaScriptCore
Assignee: Alexey Proskuryakov <ap>
Status: RESOLVED FIXED
Severity: Major
CC: ap, ddkilzer, hartman.wiki, mrowe, nilcolor
Priority: P2
Keywords: HasReduction
Version: 420+
Hardware: All
OS: OS X 10.4
URL: http://www.dougweb.org/bugzilla/safari/regexpbug/
Attachments:
Description                  Flags
Reduced test case            none
a more complete test case    none
a more complete test case    none
proposed fix                 darin: review+

Doug Wright
Reported 2006-08-12 09:51:19 PDT
See the test case. The second alert() should display a few lines of text, but instead it displays null, because the regexp engine fails as soon as it encounters the character ’.
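For readers unfamiliar with the idiom: `[\S\s]` is a character class that should match any character, line terminators included, because every character is either whitespace (`\s`) or not (`\S`). The TypeScript sketch below shows what the second `alert()` would be expected to receive from a conforming engine. The sample string and the helper name `matchEverything` are illustrative assumptions, not the contents of the attached test case.

```typescript
// Illustrative input (assumption): a few lines of text containing
// U+2019 RIGHT SINGLE QUOTATION MARK, the character that triggered the bug.
const sample: string = "It\u2019s a few\nlines of text";

// [\S\s] means "not whitespace" OR "whitespace", i.e. any UTF-16 code unit,
// including line terminators that "." would not match.
const anyCharRun: RegExp = /[\S\s]+/;

// Hypothetical helper: returns the matched text, or null when there is no match
// (null is what the buggy engine produced for input like this).
function matchEverything(input: string): string | null {
  const m = input.match(anyCharRun);
  return m === null ? null : m[0];
}

// A conforming engine prints the whole string, newline included.
console.log(matchEverything(sample));
```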
Attachments
Reduced test case (703 bytes, text/html)
  2006-08-12 19:20 PDT, Mark Rowe (bdash); no flags
a more complete test case (1.06 KB, text/html)
  2007-09-20 05:53 PDT, Alexey Proskuryakov; no flags
a more complete test case (1.45 KB, text/html)
  2007-09-22 13:07 PDT, Alexey Proskuryakov; no flags
proposed fix (68.01 KB, patch)
  2007-09-29 02:41 PDT, Alexey Proskuryakov; darin: review+
Mark Rowe (bdash)
Comment 1 2006-08-12 19:19:19 PDT
Confirmed with WebKit ToT and 418.8. The character in question is Unicode "RIGHT SINGLE QUOTATION MARK". Reduction forthcoming.
Mark Rowe (bdash)
Comment 2 2006-08-12 19:20:11 PDT
Created attachment 10008: Reduced test case
Alexey Proskuryakov
Comment 3 2007-09-19 03:53:07 PDT
*** Bug 15224 has been marked as a duplicate of this bug. ***
Alexey Proskuryakov
Comment 4 2007-09-19 03:56:13 PDT
As seen in bug 15224, this affects all non-ASCII characters, and causes problems in prototype.js. Looks like a very important bug to me.
Alexey Proskuryakov
Comment 5 2007-09-20 05:53:11 PDT
Created attachment 16333: a more complete test case
Tests other regex special characters, too. Passes in Firefox, and mostly passes in IE7, which apparently doesn't treat Unicode whitespace characters as whitespace.
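The idea behind the extended test can be sketched as follows, under the assumption that it pairs class escapes with their complements: a class such as `[\S\s]`, `[\D\d]` or `[\W\w]` should match every character, and ECMAScript's `\s` covers Unicode space characters such as U+00A0 NO-BREAK SPACE and U+2003 EM SPACE (the part IE7 apparently gets wrong). The sample characters and the helper `complementaryClassFailures` are hypothetical; the attachment itself is not reproduced here.

```typescript
// Classes that combine an escape with its complement; each should match any character.
const complementaryClasses: RegExp[] = [/^[\S\s]$/, /^[\s\S]$/, /^[\D\d]$/, /^[\W\w]$/];

// Sample characters (illustrative): ASCII, a line terminator, and several non-ASCII ones.
const samples: string[] = ["a", "7", " ", "\n", "\u2019", "\u00e9", "\u00a0", "\u2003", "\u4e2d"];

// Formats a character as U+XXXX for readable failure reports.
function codePointLabel(ch: string): string {
  let hex = ch.charCodeAt(0).toString(16).toUpperCase();
  while (hex.length < 4) hex = "0" + hex;
  return "U+" + hex;
}

// Hypothetical helper: lists every (class, character) pair that fails to match.
function complementaryClassFailures(): string[] {
  const failures: string[] = [];
  for (const re of complementaryClasses) {
    for (const ch of samples) {
      if (!re.test(ch)) failures.push(re.source + " on " + codePointLabel(ch));
    }
  }
  return failures;
}

// Per ECMAScript, \s includes U+00A0 and U+2003 (both are space characters).
const unicodeSpacesAreWhitespace: boolean = /^\s$/.test("\u00a0") && /^\s$/.test("\u2003");

console.log(complementaryClassFailures()); // a conforming engine prints []
console.log(unicodeSpacesAreWhitespace);   // a conforming engine prints true
```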
Alexey Proskuryakov
Comment 6 2007-09-22 02:17:25 PDT
This issue is also present in the original PCRE 6.1 and 7.4. From the comments in the code, I'm not sure what the intended Perl behavior is, but the fact that \S and [\S] work differently certainly looks like a bug.
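The inconsistency described here can be checked mechanically: in a conforming engine, a class escape on its own and the same escape wrapped in brackets (`\S` versus `[\S]`) accept exactly the same characters. A minimal TypeScript sketch; the list of escapes, the sample characters and the helper name `bracketMismatches` are illustrative assumptions.

```typescript
// The six class escapes; "\\S" in a string literal is the two characters \S.
const classEscapes: string[] = ["\\d", "\\D", "\\s", "\\S", "\\w", "\\W"];

// Illustrative samples: ASCII and non-ASCII, including U+2019 from the report.
const testChars: string[] = ["a", "5", " ", "_", "\u2019", "\u00e9", "\u00a0", "\u4e2d"];

// Hypothetical helper: reports every character on which \X and [\X] disagree.
function bracketMismatches(): string[] {
  const mismatches: string[] = [];
  for (const esc of classEscapes) {
    const bare = new RegExp("^" + esc + "$");
    const bracketed = new RegExp("^[" + esc + "]$");
    for (const ch of testChars) {
      if (bare.test(ch) !== bracketed.test(ch)) {
        mismatches.push(esc + " vs [" + esc + "] on \\u" + ch.charCodeAt(0).toString(16));
      }
    }
  }
  return mismatches;
}

// An engine with this bug would be expected to report entries such as "\S vs [\S] on \u2019".
console.log(bracketMismatches()); // a conforming engine prints []
```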
Alexey Proskuryakov
Comment 7 2007-09-22 13:07:17 PDT
Created attachment 16349: a more complete test case
Added a test for a closely related issue from <http://bugs.exim.org/show_bug.cgi?id=580>. That bug was recently fixed; see:
svn diff -r218:219 svn://tahini.csx.cam.ac.uk/pcre
I'm going to file the problem with [\S] to PCRE bugzilla soon.
Alexey Proskuryakov
Comment 8 2007-09-22 13:19:12 PDT
(In reply to comment #7)
> svn diff -r218:219 svn://tahini.csx.cam.ac.uk/pcre
I've just found that there's a ViewVC for PCRE: http://vcs.pcre.org/viewvc?view=rev&revision=219
> I'm going to file the problem with [\S] to PCRE bugzilla soon.
http://bugs.exim.org/show_bug.cgi?id=603
Alexey Proskuryakov
Comment 9 2007-09-29 02:41:59 PDT
Created attachment 16446: proposed fix
This is based on an approach suggested by Philip Hazel, and on his fix for the \S{2} vs. \S\S bug. I think this fix is important enough to go to trunk.
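The related `\S{2}` versus `\S\S` problem has the same shape: a quantified escape should behave exactly like the escape written out repeatedly, whatever characters the input contains. The sketch below compares `\S{2}`, `\S\S` and `[\S]{2}` on a few two-character strings; the strings and the helper name `quantifierAgrees` are assumptions for illustration, not taken from the patch.

```typescript
// Three spellings that should accept exactly the same two-character strings.
const twoNonSpaceForms: RegExp[] = [/^\S{2}$/, /^\S\S$/, /^[\S]{2}$/];

// Illustrative inputs mixing ASCII, non-ASCII and whitespace characters.
const pairs: string[] = ["ab", "a\u2019", "\u2019\u2019", "\u00e9\u00e8", "a ", "\u2019\u2003"];

// Hypothetical helper: true when all forms give the same answer for the input.
function quantifierAgrees(input: string): boolean {
  const results = twoNonSpaceForms.map((re) => re.test(input));
  return results.every((r) => r === results[0]);
}

for (const p of pairs) {
  // Expected in a conforming engine: every line ends in "true".
  console.log(JSON.stringify(p), /^\S{2}$/.test(p), quantifierAgrees(p));
}
```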
Darin Adler
Comment 10 2007-10-01 16:54:02 PDT
Comment on attachment 16446: proposed fix
r=me
Alexey Proskuryakov
Comment 11 2007-10-02 21:42:27 PDT
Committed revision 25958 (feature branch).
Alexey Blinov
Comment 12 2007-10-08 00:36:38 PDT
Hi. Is the feature branch available as a nightly build of WebKit (http://nightly.webkit.org/), or do I have to compile it myself?
Mark Rowe (bdash)
Comment 13 2007-10-08 00:39:30 PDT
Nightly builds of the feature branch are available at http://nightly.webkit.org/builds/overview/feature-branch.
Alexey Proskuryakov
Comment 14 2007-11-04 03:20:57 PST
*** Bug 14877 has been marked as a duplicate of this bug. ***