Bug 67337 - Update RegExp and related classes to use 8 bit strings when available
Summary: Update RegExp and related classes to use 8 bit strings when available
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on: 67339
Blocks: 66161
  Show dependency treegraph
 
Reported: 2011-08-31 16:40 PDT by Michael Saboff
Modified: 2012-01-04 16:35 PST (History)
4 users (show)

See Also:


Attachments
Proposed Patch (44.78 KB, patch)
2011-09-07 16:38 PDT, Michael Saboff
barraclough: review-
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Updated patch addressing the concerns raised with the prior patch. (49.55 KB, patch)
2011-09-08 18:43 PDT, Michael Saboff
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Updated patch to fix build failures (51.37 KB, patch)
2011-09-09 14:27 PDT, Michael Saboff
barraclough: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2011-08-31 16:40:43 PDT
As part of the changes for https://bugs.webkit.org/show_bug.cgi?id=66161 we need to change the JavaScriptCore regular expression code to handle 8 bit strings.  This includes the YARR parser, JIT and interpreter code.
Comment 1 Michael Saboff 2011-09-07 16:38:18 PDT
Created attachment 106671 [details]
Proposed Patch

This patch contains 8bit code paths for the YARR interpreter and JIT.  The implementations of the JIT have been coded for all current architectures, but I have only tested X86_64 and ARM v7.
Comment 2 WebKit Review Bot 2011-09-07 16:41:29 PDT
Attachment 106671 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/assembler/X86Assembler.h:845:  cmpw_ir is incorrectly named. Don't use underscores in your identifier names.  [readability/naming] [4]
Source/JavaScriptCore/assembler/X86Assembler.h:1175:  movzbl_mr is incorrectly named. Don't use underscores in your identifier names.  [readability/naming] [4]
Total errors found: 2 in 18 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Michael Saboff 2011-09-07 17:03:31 PDT
Forgot to mention performance results of this change.  Running command line benchmarks, SunSpider is 0.1% faster (noise level) and V8 is 0.2% slower.
Comment 4 WebKit Review Bot 2011-09-07 18:05:14 PDT
Comment on attachment 106671 [details]
Proposed Patch

Attachment 106671 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/9525004
Comment 5 Gavin Barraclough 2011-09-07 18:18:44 PDT
Comment on attachment 106671 [details]
Proposed Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=106671&action=review

Looks good. Few issues above, my only major concern is is branch16, but would be prudent to get some non-JIT numbers to prove the extra check in the interpreter isn't an overhead.

> Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1074
> +    Jump branch16(RelationalCondition cond, RegisterID left, TrustedImm32 right)

This is actually doing a 32-bit by 16-bit compare.  We should either make this work like the other branch16 method (only consider the low 16 bits), or we should be ripping out branch16 and just using a branch32.  A method called branch16 that compares 32 bits isn't right.

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:884
> +            m_assembler.testl_rr(left, left);

I think this needs to be a testw?

> Source/JavaScriptCore/runtime/RegExp.cpp:308
> +{

It might help understanding to add the following documentary ASSERT:

// If the state is NotCompiled or ParseError, then there is no representation.
// If there is a representation, and the state must be either JITCode or ByteCode.
ASSERT(!!m_representation == (m_state == JITCode || m_state == ByteCode));

> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:170
>  

We should probably have some kind of comment on this class indicating that it is a placeholder for something better one we have 8-bit StringImpls.

> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:209
> +            if (m_charSize == Char8)

We should check the performance impact of this check.
Comment 6 Michael Saboff 2011-09-08 18:43:40 PDT
Created attachment 106826 [details]
Updated patch addressing the concerns raised with the prior patch.

I checked the performance of the interpreter character size check with JIT turned off running V8 RegExp test.  It is 3% slower.  As we change to use the replacement class we can check and tune at that time.

There still may be an export issue with windows that I will rely on the buildbot catching.
Comment 7 WebKit Review Bot 2011-09-08 18:45:39 PDT
Attachment 106826 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/assembler/X86Assembler.h:845:  cmpw_ir is incorrectly named. Don't use underscores in your identifier names.  [readability/naming] [4]
Source/JavaScriptCore/assembler/X86Assembler.h:1175:  movzbl_mr is incorrectly named. Don't use underscores in your identifier names.  [readability/naming] [4]
Total errors found: 2 in 20 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 WebKit Review Bot 2011-09-08 20:11:05 PDT
Comment on attachment 106826 [details]
Updated patch addressing the concerns raised with the prior patch.

Attachment 106826 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/9626306
Comment 9 WebKit Review Bot 2011-09-08 21:11:53 PDT
Comment on attachment 106826 [details]
Updated patch addressing the concerns raised with the prior patch.

Attachment 106826 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/9626325
Comment 10 WebKit Review Bot 2011-09-09 04:47:20 PDT
Comment on attachment 106826 [details]
Updated patch addressing the concerns raised with the prior patch.

Attachment 106826 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9625568
Comment 11 Geoffrey Garen 2011-09-09 10:48:20 PDT
> I checked the performance of the interpreter character size check with JIT turned off running V8 RegExp test.  It is 3% slower.  As we change to use the replacement class we can check and tune at that time.

I'm not convinced by this logic. Why should we expect things to get faster when we rename the abstract iterator class to StringConstCharacterIterator? I expect things not to get faster, since the source of the performance problem -- taking a branch on every character access -- will still be there.
Comment 12 Michael Saboff 2011-09-09 14:26:44 PDT
(In reply to comment #11)
> > I checked the performance of the interpreter character size check with JIT turned off running V8 RegExp test.  It is 3% slower.  As we change to use the replacement class we can check and tune at that time.
> 
> I'm not convinced by this logic. Why should we expect things to get faster when we rename the abstract iterator class to StringConstCharacterIterator? I expect things not to get faster, since the source of the performance problem -- taking a branch on every character access -- will still be there.

I expect that we'll do more than just change names.  One of the tuning possibilities includes moving to a template implementation of the interpreter that should restore all performance back, no per character branch.
Comment 13 Michael Saboff 2011-09-09 14:27:53 PDT
Created attachment 106923 [details]
Updated patch to fix build failures
Comment 14 WebKit Review Bot 2011-09-09 14:30:58 PDT
Attachment 106923 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/assembler/X86Assembler.h:845:  cmpw_ir is incorrectly named. Don't use underscores in your identifier names.  [readability/naming] [4]
Source/JavaScriptCore/assembler/X86Assembler.h:1175:  movzbl_mr is incorrectly named. Don't use underscores in your identifier names.  [readability/naming] [4]
Total errors found: 2 in 23 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 15 Gavin Barraclough 2011-09-12 11:13:41 PDT
Comment on attachment 106923 [details]
Updated patch to fix build failures

Looks great.
Comment 16 Michael Saboff 2011-09-12 15:18:36 PDT
Committed r94981: <http://trac.webkit.org/changeset/94981>
Comment 17 James Robinson 2011-09-12 18:40:11 PDT
This adds a build-time dependency on runtime/UString.cpp from yarr/YarrInterpreter.cpp.  Historically in the Chromium build we've avoided compiling any files in runtime/.  Is this an intentional change in the dependencies for yarr or unintentional?  Would you recommend that we start compiling in all of runtime/ even in non-JSC builds?
Comment 18 James Robinson 2011-09-12 18:56:25 PDT
Filed https://bugs.webkit.org/show_bug.cgi?id=67979 about this issue.
Comment 19 James Robinson 2011-09-12 19:20:21 PDT
WTF::String has a latin1() that looks identical to UString's.  Would it make sense for CharAccess be implemented in terms of WTF::String or WTF::StringImpl instead of JSC::Ustring?
Comment 20 James Robinson 2011-09-12 19:20:39 PDT
(In reply to comment #19)
> WTF::String has a latin1() that looks identical to UString's.  Would it make sense for CharAccess be implemented in terms of WTF::String or WTF::StringImpl instead of JSC::Ustring?

Ack sorry, I meant to post this comment on https://bugs.webkit.org/show_bug.cgi?id=67979
Comment 21 Alexey Proskuryakov 2012-01-04 16:35:05 PST
Comment on attachment 106923 [details]
Updated patch to fix build failures

View in context: https://bugs.webkit.org/attachment.cgi?id=106923&action=review

> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:1479
> +    Interpreter(BytecodePattern* pattern, int* output, const UString input, unsigned start, unsigned length)

Shouldn't this be "const UString&"?