Bug 174646 - Implement Unicode RegExp support in the YARR JIT
Summary: Implement Unicode RegExp support in the YARR JIT
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: Other
Hardware: Other Other
Importance: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-07-18 16:48 PDT by Michael Saboff
Modified: 2017-08-22 15:43 PDT
CC List: 7 users

See Also:


Attachments
Work in Progress (44.61 KB, patch)
2017-07-18 18:03 PDT, Michael Saboff
no flags
Updated Work in Progress with build fixes for 32 bit platforms (44.81 KB, patch)
2017-07-19 10:09 PDT, Michael Saboff
buildbot: commit-queue-
Archive of layout-test-results from ews125 for ios-simulator-wk2 (19.79 MB, application/zip)
2017-07-20 19:45 PDT, Build Bot
no flags
Patch (60.78 KB, patch)
2017-08-22 11:09 PDT, Michael Saboff
no flags
Updated Patch (62.65 KB, patch)
2017-08-22 13:36 PDT, Michael Saboff
fpizlo: review+

Description Michael Saboff 2017-07-18 16:48:16 PDT
Currently, Unicode regular expressions are handled in the Yarr interpreter. We should add Unicode support to the Yarr JIT.

<rdar://problem/33183180>
Comment 1 Michael Saboff 2017-07-18 18:03:39 PDT
Created attachment 315867 [details]
Work in Progress

This passes tests on both Mac and iOS 64-bit, but there is some more tuning that can be done.
Comment 2 Michael Saboff 2017-07-19 10:09:30 PDT
Created attachment 315930 [details]
Updated Work in Progress with build fixes for 32 bit platforms
Comment 3 Build Bot 2017-07-20 19:45:11 PDT
Comment on attachment 315930 [details]
Updated Work in Progress with build fixes for 32 bit platforms

Attachment 315930 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/4158569

New failing tests:
js/regexp-unicode.html
imported/w3c/IndexedDB-private-browsing/idbfactory_open.html
Comment 4 Build Bot 2017-07-20 19:45:13 PDT
Created attachment 316059 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.5
Comment 5 Michael Saboff 2017-08-22 11:09:25 PDT
Created attachment 318771 [details]
Patch
Comment 6 Michael Saboff 2017-08-22 13:36:24 PDT
Created attachment 318789 [details]
Updated Patch

Added check for quantifier overflow of character terms.
Comment 7 Filip Pizlo 2017-08-22 13:53:05 PDT
Comment on attachment 318789 [details]
Updated Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=318789&action=review

LGTM

> Source/JavaScriptCore/ChangeLog:38
> +        function, getEffectiveAddress64(), for ARM64.  It just calls x86Lea64() on X86-64.

Nice.
Comment 8 JF Bastien 2017-08-22 15:00:26 PDT
Comment on attachment 318789 [details]
Updated Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=318789&action=review

lgtm, not an expert in this though....

> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:2067
>  COMPILE_ASSERT(sizeof(Interpreter<UChar>::BackTrackInfoParentheses) == (YarrStackSpaceForBackTrackInfoParentheses * sizeof(uintptr_t)), CheckYarrStackSpaceForBackTrackInfoParentheses);

Move to static_assert?

> Source/JavaScriptCore/yarr/YarrJIT.cpp:2925
> +        , m_canonicalMode(m_pattern.unicode() ? CanonicalMode::Unicode : CanonicalMode::UCS2)

This is a bit weird because Unicode normalization has a concept of "canonical" which doesn't match this: http://unicode.org/faq/normalization.html
Comment 9 Michael Saboff 2017-08-22 15:42:29 PDT
(In reply to JF Bastien from comment #8)
> Comment on attachment 318789 [details]
> Updated Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=318789&action=review
> 
> lgtm, not an expert in this though....
> 
> > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:2067
> >  COMPILE_ASSERT(sizeof(Interpreter<UChar>::BackTrackInfoParentheses) == (YarrStackSpaceForBackTrackInfoParentheses * sizeof(uintptr_t)), CheckYarrStackSpaceForBackTrackInfoParentheses);
> 
> Move to static_assert?

COMPILE_ASSERT resolves to static_assert on appropriate platforms.
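
As a rough illustration of that point, a macro of this kind can simply forward to static_assert. The sketch below is an assumption about the general shape, not a copy of the WebKit definition; SKETCH_COMPILE_ASSERT, BackTrackInfoSketch, and stackSpaceForBackTrackInfoSketch are hypothetical names used only here.

```cpp
#include <cstdint>

// Hypothetical macro, named differently from COMPILE_ASSERT on purpose.
// It forwards the condition to static_assert and uses the stringized
// name as the diagnostic message.
#define SKETCH_COMPILE_ASSERT(exp, name) static_assert((exp), #name)

// Hypothetical stand-in for a backtrack-info record that occupies
// a fixed number of pointer-sized stack slots.
struct BackTrackInfoSketch {
    std::uintptr_t begin;
    std::uintptr_t matchAmount;
};

// Hypothetical constant: number of stack slots reserved for the record.
constexpr unsigned stackSpaceForBackTrackInfoSketch = 2;

// Fails to compile if the record and its reserved stack space disagree.
SKETCH_COMPILE_ASSERT(
    sizeof(BackTrackInfoSketch) == stackSpaceForBackTrackInfoSketch * sizeof(std::uintptr_t),
    CheckStackSpaceForBackTrackInfoSketch);
```

With a definition along these lines, the existing COMPILE_ASSERT line already gives the compile-time check that a direct static_assert would.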

> > Source/JavaScriptCore/yarr/YarrJIT.cpp:2925
> > +        , m_canonicalMode(m_pattern.unicode() ? CanonicalMode::Unicode : CanonicalMode::UCS2)
> 
> This is a bit weird because Unicode normalization has a concept of
> "canonical" which doesn't match this:
> http://unicode.org/faq/normalization.html

This is part of our implementation of Canonicalize() for case folding as specified in the standard at https://tc39.github.io/ecma262/#sec-runtime-semantics-canonicalize-ch.  We use CanonicalMode::Unicode to signify what that section has as "Unicode is true".  UCS2 is the legacy canonicalization.
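
To make the difference between the two modes concrete, here is a minimal sketch of the Canonicalize() rules from that section. It is not the YARR code: canonicalizeSketch, toySimpleCaseFold, and toyToUpper are hypothetical helpers, and the case tables cover only a handful of code points, whereas a real implementation would consult full Unicode data.

```cpp
#include <cstdint>

enum class CanonicalMode { UCS2, Unicode };

// Toy stand-in for simple case folding; covers four code points only.
constexpr char32_t toySimpleCaseFold(char32_t ch)
{
    switch (ch) {
    case U'K': return U'k';
    case U'S': return U's';
    case 0x212A: return U'k'; // KELVIN SIGN folds to 'k'
    case 0x017F: return U's'; // LATIN SMALL LETTER LONG S folds to 's'
    default: return ch;
    }
}

// Toy stand-in for single-character uppercasing; same limited coverage.
constexpr char32_t toyToUpper(char32_t ch)
{
    switch (ch) {
    case U'k': return U'K';
    case U's': return U'S';
    case 0x017F: return U'S'; // long s uppercases to ASCII 'S'
    default: return ch;       // KELVIN SIGN is already uppercase
    }
}

// Sketch of Canonicalize(ch): case folding when "Unicode is true",
// otherwise the legacy uppercase-based rule.
constexpr char32_t canonicalizeSketch(char32_t ch, CanonicalMode mode)
{
    if (mode == CanonicalMode::Unicode)
        return toySimpleCaseFold(ch);
    char32_t upper = toyToUpper(ch);
    // Legacy rule: a non-ASCII character never canonicalizes to ASCII.
    if (ch >= 128 && upper < 128)
        return ch;
    return upper;
}
```

Under these rules U+212A (KELVIN SIGN) and 'k' canonicalize to the same value only in CanonicalMode::Unicode, which is why an ignore-case pattern can match one against the other with the unicode flag but not without it.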
Comment 10 Michael Saboff 2017-08-22 15:43:11 PDT
Committed r221052: <http://trac.webkit.org/changeset/221052>