Bug 196296

Summary: [YARR] Precompute BMP / non-BMP status when constructing character classes
Product: WebKit Reporter: Michael Saboff <msaboff>
Component: JavaScriptCoreAssignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED    
Severity: Normal CC: ews-watchlist, guijemont, guijemont+jsc-armv7-ews, keith_miller, mark.lam, saam, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
keith_miller: review+, ews-watchlist: commit-queue-
Archive of layout-test-results from ews206 for win-future
none
Patch fo landing
none
Updated Patch for landing - fixed MIPS build none

Description Michael Saboff 2019-03-27 08:15:56 PDT
The range of Unicode code points extends to 0x10ffff.  This codespace is broken up into 17 "planes" of 64K code points.  The first plane, 0..0xffff, is called the Basic Multilingual Plane (BMP).  In JavaScript, we use surrogate pairs to represent non-BMP code points.  The RegExp processing differs slightly for BMP and non-BMP matching.  Currently, character classes have a single flag, m_hasNonBMPCharacters, to indicate if the class has any non-BMP characters or ranges.  By adding another flag that indicates if a character class has BMP code points, the processing of character classes in the regular expressions can be optimized when the character class has only characters from either the BMP or non-BMP codespace.
Comment 1 Radar WebKit Bug Importer 2019-03-27 08:16:28 PDT
<rdar://problem/49335655>
Comment 2 Michael Saboff 2019-03-27 09:02:08 PDT
Created attachment 366074 [details]
Patch
Comment 3 EWS Watchlist 2019-03-27 09:03:35 PDT
Attachment 366074 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/yarr/YarrPattern.h:62:  Missing spaces around |  [whitespace/operators] [3]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Keith Miller 2019-03-27 10:28:10 PDT
Comment on attachment 366074 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366074&action=review

r=me.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1813
> -        add32(TrustedImm32(1), countRegister);
>  #ifdef JIT_UNICODE_EXPRESSIONS
>          if (m_decodeSurrogatePairs) {
> -            Jump isBMPChar = branch32(LessThan, character, supplementaryPlanesBase);
> -            op.m_jumps.append(atEndOfInput());
> -            add32(TrustedImm32(1), countRegister);
> -            add32(TrustedImm32(1), index);
> -            isBMPChar.link(this);
> -        }
> +            if (term->characterClass->hasOneCharacterSize() && !term->invert())
> +                add32(TrustedImm32(term->characterClass->hasNonBMPCharacters() ? 2 : 1), countRegister);
> +            else {
> +                add32(TrustedImm32(1), countRegister);
> +                Jump isBMPChar = branch32(LessThan, character, supplementaryPlanesBase);
> +                op.m_jumps.append(atEndOfInput());
> +                add32(TrustedImm32(1), countRegister);
> +                add32(TrustedImm32(1), index);
> +                isBMPChar.link(this);
> +            }
> +        } else
>  #endif
> +            add32(TrustedImm32(1), countRegister);

Seems like this could be a helper method.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1977
> +                add32(TrustedImm32(term->characterClass->hasNonBMPCharacters() ? 2 : 1), index);

Nit: it might be nice to have a helper method to get the width for one class like:

Optional<size_t> CharacterClass::knownCharacterClass()
{
    if (!hasOneCharacterSize())
        return nullopt;
   return hasNonBMPCharacters() ? 2 : 1;
}
Comment 5 Michael Saboff 2019-03-27 10:29:14 PDT
Comment on attachment 366074 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366074&action=review

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1900
> +            

I'll eliminate this extra whitespace.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1916
> +            

I'll eliminate this extra whitespace.
Comment 6 EWS Watchlist 2019-03-27 11:09:09 PDT
Comment on attachment 366074 [details]
Patch

Attachment 366074 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/11685313

New failing tests:
js/regexp-unicode.html
Comment 7 EWS Watchlist 2019-03-27 11:09:21 PDT
Created attachment 366085 [details]
Archive of layout-test-results from ews206 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews206  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 8 Michael Saboff 2019-03-27 13:50:36 PDT
Comment on attachment 366074 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366074&action=review

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:1813
>> +            add32(TrustedImm32(1), countRegister);
> 
> Seems like this could be a helper method.

Done.
Comment 9 Michael Saboff 2019-03-28 10:30:09 PDT
(In reply to Build Bot from comment #7)
> Created attachment 366085 [details]
> Archive of layout-test-results from ews206 for win-future
> 
> The attached test failures were seen while running run-webkit-tests on the
> win-ews.
> Bot: ews206  Port: win-future  Platform:
> CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit

Fixed this issue locally in the YarrInterpreter.
Comment 10 Michael Saboff 2019-03-28 10:32:24 PDT
(In reply to Michael Saboff from comment #9)
> (In reply to Build Bot from comment #7)
> > Created attachment 366085 [details]
> > Archive of layout-test-results from ews206 for win-future
> > 
> > The attached test failures were seen while running run-webkit-tests on the
> > win-ews.
> > Bot: ews206  Port: win-future  Platform:
> > CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
> 
> Fixed this issue locally in the YarrInterpreter.

The jsc-armv7 test failure is the same fixed issue as the win test failure.
Comment 11 Michael Saboff 2019-03-28 10:45:24 PDT
Created attachment 366183 [details]
Patch fo landing
Comment 12 EWS Watchlist 2019-03-28 10:48:26 PDT
Attachment 366183 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/yarr/YarrPattern.h:62:  Missing spaces around |  [whitespace/operators] [3]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 13 Michael Saboff 2019-03-28 11:08:20 PDT
Created attachment 366185 [details]
Updated Patch for landing - fixed MIPS build
Comment 14 EWS Watchlist 2019-03-28 11:10:53 PDT
Attachment 366185 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/yarr/YarrPattern.h:62:  Missing spaces around |  [whitespace/operators] [3]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 15 Michael Saboff 2019-03-28 23:05:58 PDT
Committed r243642: <https://trac.webkit.org/changeset/243642>