WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 67337
Update RegExp and related classes to use 8 bit strings when available
https://bugs.webkit.org/show_bug.cgi?id=67337
Summary
Update RegExp and related classes to use 8 bit strings when available
Michael Saboff
Reported
2011-08-31 16:40:43 PDT
As part of the changes for
https://bugs.webkit.org/show_bug.cgi?id=66161
we need to change the JavaScriptCore regular expression code to handle 8 bit strings. This includes the YARR parser, JIT and interpreter code.
Attachments
Proposed Patch
(44.78 KB, patch)
2011-09-07 16:38 PDT
,
Michael Saboff
barraclough
: review-
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Updated patch addressing the concerns raised with the prior patch.
(49.55 KB, patch)
2011-09-08 18:43 PDT
,
Michael Saboff
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Updated patch to fix build failures
(51.37 KB, patch)
2011-09-09 14:27 PDT
,
Michael Saboff
barraclough
: review+
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Michael Saboff
Comment 1
2011-09-07 16:38:18 PDT
Created
attachment 106671
[details]
Proposed Patch This patch contains 8bit code paths for the YARR interpreter and JIT. The implementations of the JIT have been coded for all current architectures, but I have only tested X86_64 and ARM v7.
WebKit Review Bot
Comment 2
2011-09-07 16:41:29 PDT
Attachment 106671
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/assembler/X86Assembler.h:845: cmpw_ir is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Source/JavaScriptCore/assembler/X86Assembler.h:1175: movzbl_mr is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Total errors found: 2 in 18 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 3
2011-09-07 17:03:31 PDT
Forgot to mention performance results of this change. Running command line benchmarks, SunSpider is 0.1% faster (noise level) and V8 is 0.2% slower.
WebKit Review Bot
Comment 4
2011-09-07 18:05:14 PDT
Comment on
attachment 106671
[details]
Proposed Patch
Attachment 106671
[details]
did not pass cr-mac-ews (chromium): Output:
http://queues.webkit.org/results/9525004
Gavin Barraclough
Comment 5
2011-09-07 18:18:44 PDT
Comment on
attachment 106671
[details]
Proposed Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=106671&action=review
Looks good. Few issues above, my only major concern is is branch16, but would be prudent to get some non-JIT numbers to prove the extra check in the interpreter isn't an overhead.
> Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1074 > + Jump branch16(RelationalCondition cond, RegisterID left, TrustedImm32 right)
This is actually doing a 32-bit by 16-bit compare. We should either make this work like the other branch16 method (only consider the low 16 bits), or we should be ripping out branch16 and just using a branch32. A method called branch16 that compares 32 bits isn't right.
> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:884 > + m_assembler.testl_rr(left, left);
I think this needs to be a testw?
> Source/JavaScriptCore/runtime/RegExp.cpp:308 > +{
It might help understanding to add the following documentary ASSERT: // If the state is NotCompiled or ParseError, then there is no representation. // If there is a representation, and the state must be either JITCode or ByteCode. ASSERT(!!m_representation == (m_state == JITCode || m_state == ByteCode));
> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:170 >
We should probably have some kind of comment on this class indicating that it is a placeholder for something better one we have 8-bit StringImpls.
> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:209 > + if (m_charSize == Char8)
We should check the performance impact of this check.
Michael Saboff
Comment 6
2011-09-08 18:43:40 PDT
Created
attachment 106826
[details]
Updated patch addressing the concerns raised with the prior patch. I checked the performance of the interpreter character size check with JIT turned off running V8 RegExp test. It is 3% slower. As we change to use the replacement class we can check and tune at that time. There still may be an export issue with windows that I will rely on the buildbot catching.
WebKit Review Bot
Comment 7
2011-09-08 18:45:39 PDT
Attachment 106826
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/assembler/X86Assembler.h:845: cmpw_ir is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Source/JavaScriptCore/assembler/X86Assembler.h:1175: movzbl_mr is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Total errors found: 2 in 20 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 8
2011-09-08 20:11:05 PDT
Comment on
attachment 106826
[details]
Updated patch addressing the concerns raised with the prior patch.
Attachment 106826
[details]
did not pass cr-mac-ews (chromium): Output:
http://queues.webkit.org/results/9626306
WebKit Review Bot
Comment 9
2011-09-08 21:11:53 PDT
Comment on
attachment 106826
[details]
Updated patch addressing the concerns raised with the prior patch.
Attachment 106826
[details]
did not pass cr-mac-ews (chromium): Output:
http://queues.webkit.org/results/9626325
WebKit Review Bot
Comment 10
2011-09-09 04:47:20 PDT
Comment on
attachment 106826
[details]
Updated patch addressing the concerns raised with the prior patch.
Attachment 106826
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/9625568
Geoffrey Garen
Comment 11
2011-09-09 10:48:20 PDT
> I checked the performance of the interpreter character size check with JIT turned off running V8 RegExp test. It is 3% slower. As we change to use the replacement class we can check and tune at that time.
I'm not convinced by this logic. Why should we expect things to get faster when we rename the abstract iterator class to StringConstCharacterIterator? I expect things not to get faster, since the source of the performance problem -- taking a branch on every character access -- will still be there.
Michael Saboff
Comment 12
2011-09-09 14:26:44 PDT
(In reply to
comment #11
)
> > I checked the performance of the interpreter character size check with JIT turned off running V8 RegExp test. It is 3% slower. As we change to use the replacement class we can check and tune at that time. > > I'm not convinced by this logic. Why should we expect things to get faster when we rename the abstract iterator class to StringConstCharacterIterator? I expect things not to get faster, since the source of the performance problem -- taking a branch on every character access -- will still be there.
I expect that we'll do more than just change names. One of the tuning possibilities includes moving to a template implementation of the interpreter that should restore all performance back, no per character branch.
Michael Saboff
Comment 13
2011-09-09 14:27:53 PDT
Created
attachment 106923
[details]
Updated patch to fix build failures
WebKit Review Bot
Comment 14
2011-09-09 14:30:58 PDT
Attachment 106923
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/assembler/X86Assembler.h:845: cmpw_ir is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Source/JavaScriptCore/assembler/X86Assembler.h:1175: movzbl_mr is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Total errors found: 2 in 23 files If any of these errors are false positives, please file a bug against check-webkit-style.
Gavin Barraclough
Comment 15
2011-09-12 11:13:41 PDT
Comment on
attachment 106923
[details]
Updated patch to fix build failures Looks great.
Michael Saboff
Comment 16
2011-09-12 15:18:36 PDT
Committed
r94981
: <
http://trac.webkit.org/changeset/94981
>
James Robinson
Comment 17
2011-09-12 18:40:11 PDT
This adds a build-time dependency on runtime/UString.cpp from yarr/YarrInterpreter.cpp. Historically in the Chromium build we've avoided compiling any files in runtime/. Is this an intentional change in the dependencies for yarr or unintentional? Would you recommend that we start compiling in all of runtime/ even in non-JSC builds?
James Robinson
Comment 18
2011-09-12 18:56:25 PDT
Filed
https://bugs.webkit.org/show_bug.cgi?id=67979
about this issue.
James Robinson
Comment 19
2011-09-12 19:20:21 PDT
WTF::String has a latin1() that looks identical to UString's. Would it make sense for CharAccess be implemented in terms of WTF::String or WTF::StringImpl instead of JSC::Ustring?
James Robinson
Comment 20
2011-09-12 19:20:39 PDT
(In reply to
comment #19
)
> WTF::String has a latin1() that looks identical to UString's. Would it make sense for CharAccess be implemented in terms of WTF::String or WTF::StringImpl instead of JSC::Ustring?
Ack sorry, I meant to post this comment on
https://bugs.webkit.org/show_bug.cgi?id=67979
Alexey Proskuryakov
Comment 21
2012-01-04 16:35:05 PST
Comment on
attachment 106923
[details]
Updated patch to fix build failures View in context:
https://bugs.webkit.org/attachment.cgi?id=106923&action=review
> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:1479 > + Interpreter(BytecodePattern* pattern, int* output, const UString input, unsigned start, unsigned length)
Shouldn't this be "const UString&"?
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug