WebKit Bugzilla
RESOLVED FIXED
180537
YARR: Coalesce constructed character classes
https://bugs.webkit.org/show_bug.cgi?id=180537
Reported by Michael Saboff on 2017-12-07 11:34:40 PST
Currently, when we construct a character class like [abcde], we end up with a check for each character instead of a single check for the range a..e. It is also common for RegExps to be written with something like [\s\S] when the programmer really wanted a . with the newly added 's' (dotAll) flag. In that case we perform many individual character and range checks. Instead, we should coalesce characters and ranges when constructing a character class to reduce the number of resulting checks.
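To illustrate the idea, here is a minimal sketch of coalescing, not the code from the patch. It assumes a simplified model in which every entry of the class is an inclusive range (a single character is a range of length one); `CharRange` and `coalesce` are hypothetical names, and the real YarrPattern.cpp data structures may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical type for this sketch; not the struct used in YarrPattern.cpp.
struct CharRange {
    char32_t begin;
    char32_t end; // inclusive
};

// Merge overlapping or adjacent ranges so that a matcher needs one bounds
// check per run of characters rather than one check per character.
std::vector<CharRange> coalesce(std::vector<CharRange> ranges)
{
    std::sort(ranges.begin(), ranges.end(), [](const CharRange& a, const CharRange& b) {
        return a.begin < b.begin;
    });

    std::vector<CharRange> result;
    for (const CharRange& range : ranges) {
        assert(range.begin <= range.end);
        // A range that overlaps the previous one, or starts right after it,
        // extends the previous range instead of adding a new entry.
        if (!result.empty() && range.begin <= result.back().end + 1)
            result.back().end = std::max(result.back().end, range.end);
        else
            result.push_back(range);
    }
    return result;
}
```

Under these assumptions, the five single-character entries of [abcde] collapse into the single range a..e, in whatever order they were added.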
Attachments
Patch (15.08 KB, patch), 2017-12-07 12:35 PST, Michael Saboff. jfbastien: review+
Radar WebKit Bug Importer
Comment 1
2017-12-07 11:35:39 PST
<rdar://problem/35914557>
Michael Saboff
Comment 2
2017-12-07 12:35:54 PST
Created attachment 328716
Patch
EWS Watchlist
Comment 3
2017-12-07 12:38:23 PST
Attachment 328716 did not pass style-queue:

ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:406: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5]
Total errors found: 1 in 5 files

If any of these errors are false positives, please file a bug against check-webkit-style.
JF Bastien
Comment 4
2017-12-08 09:30:03 PST
Comment on attachment 328716
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328716&action=review
I'm not an expert in this code, but looks fine overall. Minor comments.
> Source/JavaScriptCore/yarr/YarrPattern.cpp:286
> + if (pos + index > 0 && matches[pos + index - 1] == ch - 1) {
pos and index are both unsigned, so this is just checking that it's non-zero? Or was the intent to capture wraparound as well?
> Source/JavaScriptCore/yarr/YarrPattern.cpp:358
> + // each iteration of the loop we will either remove something from the list, or break the loop.
Break the loop, or just break out of it?
> Source/JavaScriptCore/yarr/YarrPattern.cpp:407
> + && m_rangesUnicode[0].begin == 0x80 && m_rangesUnicode[0].end == 0x10ffff)
I don't get the Unicode range comparison. That's the general non-ASCII range; can the user specify invalid codepoint ranges? Or put another way, when is this range *not* the Unicode range?
Michael Saboff
Comment 5
2017-12-08 10:15:09 PST
Comment on attachment 328716
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328716&action=review
>> Source/JavaScriptCore/yarr/YarrPattern.cpp:286
>> + if (pos + index > 0 && matches[pos + index - 1] == ch - 1) {
>
> pos and index are both unsigned, so this is just checking that it's non-zero? Or was the intent to capture wraparound as well?
Just checking that it's non-zero. Due to the range of character values (0..0x10ffff), we can't get close to wrapping around even if there were one character per range.
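As a small illustration of why the guard matters with unsigned operands, the sketch below wraps the quoted condition in a hypothetical helper (`previousIsPredecessor` is not a function from the patch, and `matches` is assumed here to be a plain vector of code points). Without the `> 0` test, `pos + index - 1` would wrap to a very large value when the sum is zero and index past the end of the vector.

```cpp
#include <vector>

// Hypothetical helper for illustration only; not part of the patch.
// Returns true when the entry just before position pos + index holds ch - 1,
// meaning ch could extend a run that ends at that entry.
bool previousIsPredecessor(const std::vector<char32_t>& matches, unsigned pos, unsigned index, char32_t ch)
{
    // pos and index are unsigned, so "> 0" only tests for non-zero. The test
    // must come first: short-circuit evaluation then skips the subscript when
    // pos + index is zero and pos + index - 1 would wrap around.
    return pos + index > 0 && matches[pos + index - 1] == ch - 1;
}
```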
>> Source/JavaScriptCore/yarr/YarrPattern.cpp:358
>> + // each iteration of the loop we will either remove something from the list, or break the loop.
>
> Break the loop, or just break out of it?
Break *out of* the loop.
>> Source/JavaScriptCore/yarr/YarrPattern.cpp:407
>> + && m_rangesUnicode[0].begin == 0x80 && m_rangesUnicode[0].end == 0x10ffff)
>
> I don't get the Unicode range comparison. That's the general non-ASCII range; can the user specify invalid codepoint ranges?
>
> Or put another way, when is this range *not* the Unicode range?
This checks that this character class matches every possible character.
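A rough sketch of such a check follows, under stated assumptions. Only the Unicode half (one range, 0x80..0x10ffff) is visible in the quoted line; the ASCII half (one range, 0..0x7f) and the names `CharRange` and `matchesEveryCharacter` are assumptions made for illustration. The check is only meaningful once ranges have been coalesced, since a class such as [\s\S] would otherwise hold many small ranges that together cover everything.

```cpp
#include <vector>

// Hypothetical type for this sketch; not the struct used in YarrPattern.cpp.
struct CharRange {
    char32_t begin;
    char32_t end; // inclusive
};

// True when the class covers every code point. Assumes both lists were
// already coalesced, so full coverage shows up as exactly one range each.
bool matchesEveryCharacter(const std::vector<CharRange>& asciiRanges, const std::vector<CharRange>& unicodeRanges)
{
    return asciiRanges.size() == 1
        && !asciiRanges[0].begin && asciiRanges[0].end == 0x7f // assumed ASCII half
        && unicodeRanges.size() == 1
        && unicodeRanges[0].begin == 0x80 && unicodeRanges[0].end == 0x10ffff;
}
```

A class that misses even one code point ends up with two or more ranges in one of the lists, or with different bounds, so the function returns false for it.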
Michael Saboff
Comment 6
2017-12-08 10:27:20 PST
Committed r225683: <https://trac.webkit.org/changeset/225683>