Bug 196296 - [YARR] Precompute BMP / non-BMP status when constructing character classes
Summary: [YARR] Precompute BMP / non-BMP status when constructing character classes
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-03-27 08:15 PDT by Michael Saboff
Modified: 2019-03-28 23:05 PDT (History)
7 users (show)

See Also:


Attachments
Patch (35.49 KB, patch)
2019-03-27 09:02 PDT, Michael Saboff
keith_miller: review+
ews-watchlist: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews206 for win-future (12.88 MB, application/zip)
2019-03-27 11:09 PDT, EWS Watchlist
no flags Details
Patch fo landing (37.46 KB, patch)
2019-03-28 10:45 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Updated Patch for landing - fixed MIPS build (36.47 KB, patch)
2019-03-28 11:08 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2019-03-27 08:15:56 PDT
The range of Unicode code points extends to 0x10ffff.  This codespace is broken up into 17 "planes" of 64K code points.  The first plane, 0..0xffff, is called the Basic Multilingual Plane (BMP).  In JavaScript, we use surrogate pairs to represent non-BMP code points.  The RegExp processing differs slightly for BMP and non-BMP matching.  Currently, character classes have a single flag, m_hasNonBMPCharacters, to indicate if the class has any non-BMP characters or ranges.  By adding another flag that indicates if a character class has BMP code points, the processing of character classes in the regular expressions can be optimized when the character class has only characters from either the BMP or non-BMP codespace.
Comment 1 Radar WebKit Bug Importer 2019-03-27 08:16:28 PDT
<rdar://problem/49335655>
Comment 2 Michael Saboff 2019-03-27 09:02:08 PDT
Created attachment 366074 [details]
Patch
Comment 3 EWS Watchlist 2019-03-27 09:03:35 PDT
Attachment 366074 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/yarr/YarrPattern.h:62:  Missing spaces around |  [whitespace/operators] [3]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Keith Miller 2019-03-27 10:28:10 PDT
Comment on attachment 366074 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366074&action=review

r=me.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1813
> -        add32(TrustedImm32(1), countRegister);
>  #ifdef JIT_UNICODE_EXPRESSIONS
>          if (m_decodeSurrogatePairs) {
> -            Jump isBMPChar = branch32(LessThan, character, supplementaryPlanesBase);
> -            op.m_jumps.append(atEndOfInput());
> -            add32(TrustedImm32(1), countRegister);
> -            add32(TrustedImm32(1), index);
> -            isBMPChar.link(this);
> -        }
> +            if (term->characterClass->hasOneCharacterSize() && !term->invert())
> +                add32(TrustedImm32(term->characterClass->hasNonBMPCharacters() ? 2 : 1), countRegister);
> +            else {
> +                add32(TrustedImm32(1), countRegister);
> +                Jump isBMPChar = branch32(LessThan, character, supplementaryPlanesBase);
> +                op.m_jumps.append(atEndOfInput());
> +                add32(TrustedImm32(1), countRegister);
> +                add32(TrustedImm32(1), index);
> +                isBMPChar.link(this);
> +            }
> +        } else
>  #endif
> +            add32(TrustedImm32(1), countRegister);

Seems like this could be a helper method.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1977
> +                add32(TrustedImm32(term->characterClass->hasNonBMPCharacters() ? 2 : 1), index);

Nit: it might be nice to have a helper method to get the width for one class like:

Optional<size_t> CharacterClass::knownCharacterClass()
{
    if (!hasOneCharacterSize())
        return nullopt;
   return hasNonBMPCharacters() ? 2 : 1;
}
Comment 5 Michael Saboff 2019-03-27 10:29:14 PDT
Comment on attachment 366074 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366074&action=review

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1900
> +            

I'll eliminate this extra whitespace.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1916
> +            

I'll eliminate this extra whitespace.
Comment 6 EWS Watchlist 2019-03-27 11:09:09 PDT
Comment on attachment 366074 [details]
Patch

Attachment 366074 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/11685313

New failing tests:
js/regexp-unicode.html
Comment 7 EWS Watchlist 2019-03-27 11:09:21 PDT
Created attachment 366085 [details]
Archive of layout-test-results from ews206 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews206  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 8 Michael Saboff 2019-03-27 13:50:36 PDT
Comment on attachment 366074 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366074&action=review

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:1813
>> +            add32(TrustedImm32(1), countRegister);
> 
> Seems like this could be a helper method.

Done.
Comment 9 Michael Saboff 2019-03-28 10:30:09 PDT
(In reply to Build Bot from comment #7)
> Created attachment 366085 [details]
> Archive of layout-test-results from ews206 for win-future
> 
> The attached test failures were seen while running run-webkit-tests on the
> win-ews.
> Bot: ews206  Port: win-future  Platform:
> CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit

Fixed this issue locally in the YarrInterpreter.
Comment 10 Michael Saboff 2019-03-28 10:32:24 PDT
(In reply to Michael Saboff from comment #9)
> (In reply to Build Bot from comment #7)
> > Created attachment 366085 [details]
> > Archive of layout-test-results from ews206 for win-future
> > 
> > The attached test failures were seen while running run-webkit-tests on the
> > win-ews.
> > Bot: ews206  Port: win-future  Platform:
> > CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
> 
> Fixed this issue locally in the YarrInterpreter.

The jsc-armv7 test failure is the same fixed issue as the win test failure.
Comment 11 Michael Saboff 2019-03-28 10:45:24 PDT
Created attachment 366183 [details]
Patch fo landing
Comment 12 EWS Watchlist 2019-03-28 10:48:26 PDT
Attachment 366183 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/yarr/YarrPattern.h:62:  Missing spaces around |  [whitespace/operators] [3]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 13 Michael Saboff 2019-03-28 11:08:20 PDT
Created attachment 366185 [details]
Updated Patch for landing - fixed MIPS build
Comment 14 EWS Watchlist 2019-03-28 11:10:53 PDT
Attachment 366185 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/yarr/YarrPattern.h:62:  Missing spaces around |  [whitespace/operators] [3]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 15 Michael Saboff 2019-03-28 23:05:58 PDT
Committed r243642: <https://trac.webkit.org/changeset/243642>