Bug 16207

Summary: JavaScript regular expressions should match UTF-16 code units rather than characters
Product: WebKit
Reporter: Darin Adler <darin>
Component: JavaScriptCore
Assignee: Darin Adler <darin>
Status: RESOLVED FIXED    
Severity: Minor
CC: eric
Priority: P3    
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.4   
Attachments:
Description Flags
patch, speeds up SunSpider aroben: review+

Darin Adler
Reported 2007-11-30 07:02:13 PST
Testing with other browsers indicates that the JavaScript regular expression code needs to treat a surrogate pair as two "characters" rather than a single character in order to match their behavior. This is good news in a way, because it gives us an easy way to make the regular expression engine faster: removing the UTF-16 smarts from most of the engine.
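To illustrate the difference between the two models, here is a minimal standalone sketch. It is not code from the patch; the helper names (isLeadSurrogate, isTrailSurrogate, countCodeUnits, countCodePoints) are hypothetical and exist only for this example. It counts how many "characters" a pattern such as . would have to step over under each interpretation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for this sketch only (not taken from the patch).
static bool isLeadSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static bool isTrailSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Code-unit model: every UTF-16 code unit is one "character",
// so the count is simply the string length.
std::size_t countCodeUnits(const std::u16string& s)
{
    return s.size();
}

// Code-point model: a well-formed surrogate pair counts as a single
// character, so the scanner has to look ahead at every lead surrogate.
std::size_t countCodePoints(const std::u16string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isLeadSurrogate(s[i]) && i + 1 < s.size() && isTrailSurrogate(s[i + 1]))
            ++i; // skip the trail surrogate of the pair
        ++count;
    }
    return count;
}
```

For the single code point U+1D306, written as u"\U0001D306", countCodeUnits returns 2 and countCodePoints returns 1. Under the code-unit model, a pattern like ^.$ would therefore not match that string while ^..$ would, and the matcher never needs the look-ahead branch shown in countCodePoints.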
Attachments
patch, speeds up SunSpider (64.63 KB, patch)
2007-11-30 07:08 PST, Darin Adler
aroben: review+
Darin Adler
Comment 1 2007-11-30 07:08:54 PST
Created attachment 17606: patch, speeds up SunSpider
Adam Roben (:aroben)
Comment 2 2007-11-30 10:08:37 PST
Comment on attachment 17606: patch, speeds up SunSpider

2425 d = *++ptr;

The precedence here seems correct, but potentially confusing. Maybe *(++ptr) would be better?

757 int c = *stack.currentFrame->args.subjectPtr++;

Again, parentheses might make it clearer what precedence you're expecting here (and in the other instances of this expression).

1640 if (stack.currentFrame->args.subjectPtr >= md.end_subject || isNewline(*stack.currentFrame->args.subjectPtr))

Why did you leave the comparison with md.end_subject here but not elsewhere?

r=me
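The precedence points raised in this review can be checked with a small standalone sketch. This is not code from the patch: UChar, readNext, readAndAdvance, atEndOrNewline, and this simplified isNewline are hypothetical stand-ins, and the real isNewline in the patch may accept a different set of characters.

```cpp
#include <cstddef>

// Hypothetical stand-in for the engine's UTF-16 code unit type.
using UChar = char16_t;

// Pre-increment form: advance the pointer, then read. Equivalent to *(++ptr).
UChar readNext(const UChar*& ptr)
{
    return *++ptr;
}

// Post-increment form: read the current unit, then advance.
// Postfix ++ binds tighter than unary *, so this is *(ptr++), not (*ptr)++.
UChar readAndAdvance(const UChar*& ptr)
{
    return *ptr++;
}

// Simplified newline test, an assumption made for this sketch.
static bool isNewline(UChar c)
{
    return c == u'\n' || c == u'\r';
}

// The bounds comparison comes first, so || short-circuits and the
// pointer is only dereferenced when it is still inside the subject.
bool atEndOrNewline(const UChar* ptr, const UChar* end)
{
    return ptr >= end || isNewline(*ptr);
}
```

Given a buffer holding a, b, c and a pointer at its start, readAndAdvance returns a and leaves the pointer on b; a following readNext moves the pointer to c and returns c. The parenthesized spellings suggested above compile to the same thing, so they are purely a readability choice.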
Darin Adler
Comment 3 2007-11-30 10:55:00 PST
Committed revision 28243.