RESOLVED FIXED Bug 16207
JavaScript regular expressions should match UTF-16 code units rather than characters
https://bugs.webkit.org/show_bug.cgi?id=16207
Darin Adler
Reported 2007-11-30 07:02:13 PST
Testing in other browsers indicates that, to match their behavior, the JavaScript regular expression code needs to treat a surrogate pair as two "characters" (two UTF-16 code units) rather than as a single character. This is good news in a way, because it gives us an easy way to make the regular expression engine faster: remove the UTF-16 smarts from most of the engine.
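For illustration, here is a minimal sketch of the behavior described above, written against a regular expression with no flags. The function name surrogatePairMatches and the choice of U+1D306 as the sample character are assumptions made for this example; they do not come from the patch or from the tests that were run.

```javascript
// Hypothetical helper (not part of the patch): shows how a regular
// expression that works on UTF-16 code units sees a surrogate pair.
function surrogatePairMatches() {
  // U+1D306 lies outside the BMP, so UTF-16 stores it as the
  // surrogate pair D834 DF06: two code units, one code point.
  const s = "\uD834\uDF06";

  return {
    // String length is counted in code units.
    length: s.length,
    // '.' consumes exactly one code unit, so a single '.' cannot
    // span the whole pair.
    oneDot: /^.$/.test(s),
    // Two '.' atoms are needed to cover both halves.
    twoDots: /^..$/.test(s),
    // Each half can be matched on its own by a surrogate range.
    highHalf: /^[\uD800-\uDBFF]/.test(s),
    lowHalf: /[\uDC00-\uDFFF]$/.test(s),
  };
}

console.log(surrogatePairMatches());
```

Under code-unit matching, the pair behaves like any other two-unit string, which is why the engine no longer needs to decode UTF-16 while matching.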
Attachments
patch, speeds up SunSpider (64.63 KB, patch)
2007-11-30 07:08 PST, Darin Adler
aroben: review+
Darin Adler
Comment 1 2007-11-30 07:08:54 PST
Created attachment 17606: patch, speeds up SunSpider
Adam Roben (:aroben)
Comment 2 2007-11-30 10:08:37 PST
Comment on attachment 17606: patch, speeds up SunSpider

> 2425 d = *++ptr;
The precedence here seems correct, but potentially confusing. Maybe *(++ptr) would be better?

> 757 int c = *stack.currentFrame->args.subjectPtr++;
Again, parentheses might make it clearer what precedence you're expecting here (and in the other instances of this expression).

> 1640 if (stack.currentFrame->args.subjectPtr >= md.end_subject || isNewline(*stack.currentFrame->args.subjectPtr))
Why did you leave the comparison with md.end_subject here but not elsewhere?

r=me
Darin Adler
Comment 3 2007-11-30 10:55:00 PST
Committed revision 28243.