Bug 67337

Summary: Update RegExp and related classes to use 8 bit strings when available
Product: WebKit Reporter: Michael Saboff <msaboff>
Component: JavaScriptCoreAssignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED    
Severity: Normal CC: dglazkov, ggaren, jamesr, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 67339    
Bug Blocks: 66161    
Attachments:
Description Flags
Proposed Patch
barraclough: review-, webkit.review.bot: commit-queue-
Updated patch addressing the concerns raised with the prior patch.
webkit.review.bot: commit-queue-
Updated patch to fix build failures barraclough: review+

Michael Saboff
Reported 2011-08-31 16:40:43 PDT
As part of the changes for https://bugs.webkit.org/show_bug.cgi?id=66161 we need to change the JavaScriptCore regular expression code to handle 8 bit strings. This includes the YARR parser, JIT and interpreter code.
Attachments
Proposed Patch (44.78 KB, patch)
2011-09-07 16:38 PDT, Michael Saboff
barraclough: review-
webkit.review.bot: commit-queue-
Updated patch addressing the concerns raised with the prior patch. (49.55 KB, patch)
2011-09-08 18:43 PDT, Michael Saboff
webkit.review.bot: commit-queue-
Updated patch to fix build failures (51.37 KB, patch)
2011-09-09 14:27 PDT, Michael Saboff
barraclough: review+
Michael Saboff
Comment 1 2011-09-07 16:38:18 PDT
Created attachment 106671 [details] Proposed Patch This patch contains 8bit code paths for the YARR interpreter and JIT. The implementations of the JIT have been coded for all current architectures, but I have only tested X86_64 and ARM v7.
WebKit Review Bot
Comment 2 2011-09-07 16:41:29 PDT
Attachment 106671 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/assembler/X86Assembler.h:845: cmpw_ir is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Source/JavaScriptCore/assembler/X86Assembler.h:1175: movzbl_mr is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Total errors found: 2 in 18 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 3 2011-09-07 17:03:31 PDT
Forgot to mention performance results of this change. Running command line benchmarks, SunSpider is 0.1% faster (noise level) and V8 is 0.2% slower.
WebKit Review Bot
Comment 4 2011-09-07 18:05:14 PDT
Comment on attachment 106671 [details] Proposed Patch Attachment 106671 [details] did not pass cr-mac-ews (chromium): Output: http://queues.webkit.org/results/9525004
Gavin Barraclough
Comment 5 2011-09-07 18:18:44 PDT
Comment on attachment 106671 [details] Proposed Patch View in context: https://bugs.webkit.org/attachment.cgi?id=106671&action=review Looks good. Few issues above, my only major concern is is branch16, but would be prudent to get some non-JIT numbers to prove the extra check in the interpreter isn't an overhead. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1074 > + Jump branch16(RelationalCondition cond, RegisterID left, TrustedImm32 right) This is actually doing a 32-bit by 16-bit compare. We should either make this work like the other branch16 method (only consider the low 16 bits), or we should be ripping out branch16 and just using a branch32. A method called branch16 that compares 32 bits isn't right. > Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:884 > + m_assembler.testl_rr(left, left); I think this needs to be a testw? > Source/JavaScriptCore/runtime/RegExp.cpp:308 > +{ It might help understanding to add the following documentary ASSERT: // If the state is NotCompiled or ParseError, then there is no representation. // If there is a representation, and the state must be either JITCode or ByteCode. ASSERT(!!m_representation == (m_state == JITCode || m_state == ByteCode)); > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:170 > We should probably have some kind of comment on this class indicating that it is a placeholder for something better one we have 8-bit StringImpls. > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:209 > + if (m_charSize == Char8) We should check the performance impact of this check.
Michael Saboff
Comment 6 2011-09-08 18:43:40 PDT
Created attachment 106826 [details] Updated patch addressing the concerns raised with the prior patch. I checked the performance of the interpreter character size check with JIT turned off running V8 RegExp test. It is 3% slower. As we change to use the replacement class we can check and tune at that time. There still may be an export issue with windows that I will rely on the buildbot catching.
WebKit Review Bot
Comment 7 2011-09-08 18:45:39 PDT
Attachment 106826 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/assembler/X86Assembler.h:845: cmpw_ir is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Source/JavaScriptCore/assembler/X86Assembler.h:1175: movzbl_mr is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Total errors found: 2 in 20 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 8 2011-09-08 20:11:05 PDT
Comment on attachment 106826 [details] Updated patch addressing the concerns raised with the prior patch. Attachment 106826 [details] did not pass cr-mac-ews (chromium): Output: http://queues.webkit.org/results/9626306
WebKit Review Bot
Comment 9 2011-09-08 21:11:53 PDT
Comment on attachment 106826 [details] Updated patch addressing the concerns raised with the prior patch. Attachment 106826 [details] did not pass cr-mac-ews (chromium): Output: http://queues.webkit.org/results/9626325
WebKit Review Bot
Comment 10 2011-09-09 04:47:20 PDT
Comment on attachment 106826 [details] Updated patch addressing the concerns raised with the prior patch. Attachment 106826 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9625568
Geoffrey Garen
Comment 11 2011-09-09 10:48:20 PDT
> I checked the performance of the interpreter character size check with JIT turned off running V8 RegExp test. It is 3% slower. As we change to use the replacement class we can check and tune at that time. I'm not convinced by this logic. Why should we expect things to get faster when we rename the abstract iterator class to StringConstCharacterIterator? I expect things not to get faster, since the source of the performance problem -- taking a branch on every character access -- will still be there.
Michael Saboff
Comment 12 2011-09-09 14:26:44 PDT
(In reply to comment #11) > > I checked the performance of the interpreter character size check with JIT turned off running V8 RegExp test. It is 3% slower. As we change to use the replacement class we can check and tune at that time. > > I'm not convinced by this logic. Why should we expect things to get faster when we rename the abstract iterator class to StringConstCharacterIterator? I expect things not to get faster, since the source of the performance problem -- taking a branch on every character access -- will still be there. I expect that we'll do more than just change names. One of the tuning possibilities includes moving to a template implementation of the interpreter that should restore all performance back, no per character branch.
Michael Saboff
Comment 13 2011-09-09 14:27:53 PDT
Created attachment 106923 [details] Updated patch to fix build failures
WebKit Review Bot
Comment 14 2011-09-09 14:30:58 PDT
Attachment 106923 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/assembler/X86Assembler.h:845: cmpw_ir is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Source/JavaScriptCore/assembler/X86Assembler.h:1175: movzbl_mr is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Total errors found: 2 in 23 files If any of these errors are false positives, please file a bug against check-webkit-style.
Gavin Barraclough
Comment 15 2011-09-12 11:13:41 PDT
Comment on attachment 106923 [details] Updated patch to fix build failures Looks great.
Michael Saboff
Comment 16 2011-09-12 15:18:36 PDT
James Robinson
Comment 17 2011-09-12 18:40:11 PDT
This adds a build-time dependency on runtime/UString.cpp from yarr/YarrInterpreter.cpp. Historically in the Chromium build we've avoided compiling any files in runtime/. Is this an intentional change in the dependencies for yarr or unintentional? Would you recommend that we start compiling in all of runtime/ even in non-JSC builds?
James Robinson
Comment 18 2011-09-12 18:56:25 PDT
James Robinson
Comment 19 2011-09-12 19:20:21 PDT
WTF::String has a latin1() that looks identical to UString's. Would it make sense for CharAccess be implemented in terms of WTF::String or WTF::StringImpl instead of JSC::Ustring?
James Robinson
Comment 20 2011-09-12 19:20:39 PDT
(In reply to comment #19) > WTF::String has a latin1() that looks identical to UString's. Would it make sense for CharAccess be implemented in terms of WTF::String or WTF::StringImpl instead of JSC::Ustring? Ack sorry, I meant to post this comment on https://bugs.webkit.org/show_bug.cgi?id=67979
Alexey Proskuryakov
Comment 21 2012-01-04 16:35:05 PST
Comment on attachment 106923 [details] Updated patch to fix build failures View in context: https://bugs.webkit.org/attachment.cgi?id=106923&action=review > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:1479 > + Interpreter(BytecodePattern* pattern, int* output, const UString input, unsigned start, unsigned length) Shouldn't this be "const UString&"?
Note You need to log in before you can comment on or make changes to this bug.