71331 – Towards 8 Bit Strings: Templatize JSC::Lexer class by character type

RESOLVED FIXED 71331

Towards 8 Bit Strings: Templatize JSC::Lexer class by character type

https://bugs.webkit.org/show_bug.cgi?id=71331

Summary Towards 8 Bit Strings: Templatize JSC::Lexer class by character type

Michael Saboff

Reported 2011-11-01 15:19:21 PDT

The JSC::Lexer class needs to handle source strings of both 8 and 16 bit strings. The current class processes the source using character pointers. Therefore it makes most sense to templatize the class to make it work with both 8 and 16 bit strings without impacting performance. <rdar://problem/10225301>

Attachments
Proposed Patch (63.59 KB, patch) 2011-11-01 17:27 PDT, Michael Saboff	ggaren: review- webkit-ews: commit-queue-	Details Formatted Diff Diff
Updated patch (64.47 KB, patch) 2011-11-02 17:44 PDT, Michael Saboff	webkit-ews: commit-queue-	Details Formatted Diff Diff
Patch with updates to fix build issues (65.56 KB, patch) 2011-11-03 11:01 PDT, Michael Saboff	webkit-ews: commit-queue-	Details Formatted Diff Diff
Updated patch with another build fix (67.51 KB, patch) 2011-11-03 11:50 PDT, Michael Saboff	webkit-ews: commit-queue-	Details Formatted Diff Diff
Patch updated for qt build failure (68.29 KB, patch) 2011-11-03 13:31 PDT, Michael Saboff	webkit-ews: commit-queue-	Details Formatted Diff Diff
Fixed qt build (again) (69.24 KB, patch) 2011-11-03 16:33 PDT, Michael Saboff	ggaren: review-	Details Formatted Diff Diff
Updated patch (75.46 KB, patch) 2011-11-04 16:47 PDT, Michael Saboff	darin: review+	Details Formatted Diff Diff
Show Obsolete (6) View All Add attachment proposed patch, testcase, etc.

Michael Saboff

Comment 1 2011-11-01 17:27:59 PDT

Created attachment 113262 [details] Proposed Patch

WebKit Review Bot

Comment 2 2011-11-01 17:29:52 PDT

Attachment 113262 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/parser/Lexer.cpp:528: One space before end of line comments [whitespace/comments] [5] Total errors found: 1 in 23 files If any of these errors are false positives, please file a bug against check-webkit-style.

Early Warning System Bot

Comment 3 2011-11-01 18:13:02 PDT

Comment on attachment 113262 [details] Proposed Patch Attachment 113262 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/10190113

Collabora GTK+ EWS bot

Comment 4 2011-11-02 01:15:05 PDT

Comment on attachment 113262 [details] Proposed Patch Attachment 113262 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/10148173

Daniel Bates

Comment 5 2011-11-02 03:48:07 PDT

Comment on attachment 113262 [details] Proposed Patch Attachment 113262 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/10256218

Gyuyoung Kim

Comment 6 2011-11-02 04:59:38 PDT

Comment on attachment 113262 [details] Proposed Patch Attachment 113262 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/10259231

Geoffrey Garen

Comment 7 2011-11-02 08:36:49 PDT

A few compilation issues: ../../../Source/JavaScriptCore/parser/SyntaxChecker.h:118: error: declaration of 'const unsigned int JSC::SyntaxChecker::DontBuildKeywords' ../../../Source/JavaScriptCore/parser/Lexer.h:70: error: changes meaning of 'DontBuildKeywords' from 'JSC::LexType JSC::DontBuildKeywords' ../../../Source/JavaScriptCore/parser/SyntaxChecker.h:119: error: declaration of 'const unsigned int JSC::SyntaxChecker::DontBuildStrings' ../../../Source/JavaScriptCore/parser/Lexer.h:69: error: changes meaning of 'DontBuildStrings' from 'JSC::LexType JSC::DontBuildStrings' ../../../Source/JavaScriptCore/runtime/Identifier.cpp:227: error: comparison is always true due to limited range of data type distcc[10251] ERROR: compile ../../../Source/JavaScriptCore/runtime/Identifier.cpp on localhost failed 3>c:\cygwin\home\buildbot\webkit\source\javascriptcore\parser\Lexer.h(196) : warning C4661: 'void JSC::Lexer<T>::copyCodeWithoutBOMs(void)' : no suitable definition provided for explicit template instantiation request 3> with 3> [ 3> T=UChar 3> ] 3> c:\cygwin\home\buildbot\webkit\source\javascriptcore\parser\Lexer.h(141) : see declaration of 'JSC::Lexer<T>::copyCodeWithoutBOMs' 3> with 3> [ 3> T=UChar 3> ]

Geoffrey Garen

Comment 8 2011-11-02 09:28:43 PDT

Comment on attachment 113262 [details] Proposed Patch View in context: https://bugs.webkit.org/attachment.cgi?id=113262&action=review r- because it doesn't build. I think you can get a lot more speed and simplicity out of this patch. See below. > Source/JavaScriptCore/parser/Lexer.cpp:288 > + ASSERT(sizeof(T) == sizeof(char)); Should this be sizeof(LChar)? > Source/JavaScriptCore/parser/Lexer.cpp:289 > + data = reinterpret_cast<const T *>(source.provider()->data()->characters8()); No space before *, please. Shouldn't <T> always be LChar or UChar? If so, no need for cast here. > Source/JavaScriptCore/parser/Lexer.cpp:292 > + data = reinterpret_cast<const T *>(source.provider()->data()->characters16()); No space before *, please. Shouldn't <T> always be LChar or UChar? If so, no need for cast here. > Source/JavaScriptCore/parser/Lexer.cpp:477 > + m_buffer16.append(static_cast<UChar>(c)); No need for a cast between LChar and UChar. > Source/JavaScriptCore/parser/Lexer.cpp:485 > + m_buffer16.append(static_cast<UChar>(c)); No need for a cast between LChar and UChar. > Source/JavaScriptCore/parser/Lexer.cpp:492 > + // FIXME: Change record16 and m_buffer16 to record8 and m_buffer8 below when > + // 8 bit strings are turned on. Better to just not record at all in the fast case. See below. > Source/JavaScriptCore/parser/Lexer.cpp:493 > + int startingOffset = currentOffset(); startingOffset is redundant with identifierStart, and can be removed. > Source/JavaScriptCore/parser/Lexer.cpp:494 > + int startingLineNumber = lineNumber(); You can wait to record startingLineNumber until after the keyword case, since the keyword case does not advance the line number unless it succeeds. > Source/JavaScriptCore/parser/Lexer.cpp:504 > + bool bufferRequired = false; bufferRequired is always false, and can be removed. > Source/JavaScriptCore/parser/Lexer.cpp:508 > + while (true) { > + if (LIKELY(isIdentPart(m_current))) { > + shift(); I think this would read more naturally as: while (isIdentPart(m_current)) shift(); if (m_current == '\\') { ... // Reset everything and return parseIdentifierSlowCase. } > Source/JavaScriptCore/parser/Lexer.cpp:516 > + m_buffer16.resize(0); // FIXME: Change to m_buffer8 No need to resize to 0, since we haven't appended anything. > Source/JavaScriptCore/parser/Lexer.cpp:523 > + if (!bufferRequired) { bufferRequired is always false, so you can remove this case. > Source/JavaScriptCore/parser/Lexer.cpp:529 > + append16(identifierStart, currentCharacter() - identifierStart); // FIXME: Change to append8 > + ident = makeIdentifier(m_buffer16.data(), m_buffer16.size()); // FIXME: Change to m_buffer8 Why append to one buffer just to copy into another? I think you should just call makeIdentifier, passing identifierStart and currentCharacter() - identifierStart, with no intervening append16. Then you can also remove clearing m_buffer16 from the fast case. > Source/JavaScriptCore/parser/Lexer.cpp:540 > + // Keywords must not be recognized if there was an \uXXXX in the identifier. You've already ruled out a Unicode escape above (which is good, because people don't typically name their variables "MyV\u0061riable"). You can remove all of this code. > Source/JavaScriptCore/parser/Lexer.cpp:558 > +template <bool shouldCreateIdentifier> ALWAYS_INLINE JSTokenType Lexer<T>::parseIdentifierSlowCase(JSTokenData* tokenData, unsigned lexType, bool strictMode) Please don't mark the slow case inline. It seems like you can remove some code from the slow case, which already ran in the fast case. You can at least remove the fast check for a keyword at the head of the slow case. > Source/JavaScriptCore/parser/Lexer.cpp:646 > + if (UNLIKELY((m_current == '\\'))) { Since you've decided that this case is unlikely, can it move out into a helper function? That way, the codegen for the likely stuff will improve, and this code can be shared with the slow case. > Source/JavaScriptCore/parser/Lexer.cpp:690 > + // Fast check for characters that require special handling. > + // Catches -1, \n, \r, 0x2028, and 0x2029 as efficiently > + // as possible, and lets through all common ASCII characters. > + if (UNLIKELY(((static_cast<unsigned>(m_current) - 0xE) & 0x2000))) { You can remove this case entirely if you just change the test above from "m_current > 0xff" to "m_current > 0xff || m_current < 0xe". (Unless you think we need to handle the 'bell' character on the fast path ;) ). > Source/JavaScriptCore/parser/Lexer.h:66 > +// FIXME: LexType and ShiftType were moved outside of the Lexer definition due > +// to a clang compiler bug (<rdar://problem/10194295>) that doesn't properly > +// handle fully qualified enums defined within a template class. I think this issue is self-documenting: If anybody tries to move this enum inside the Lexer, it won't compile. > Source/JavaScriptCore/parser/ParserArena.h:69 > + ALWAYS_INLINE const Identifier& IdentifierArena::makeIdentifier(JSGlobalData* globalData, const LChar* characters, size_t length) Since this is just a copy of the existing makeIdentifier, I think it would be better to templatize the existing makeIdentifier, with T* as the type for "characters". > Source/JavaScriptCore/runtime/Identifier.cpp:130 > +struct IdentifierLCharBufferTranslator { Since these two structs are just a copy of UCharBuffer and IdentifierUCharBufferTranslator, I think you should just templatize those, with T* as the type for "s". > Source/JavaScriptCore/runtime/Identifier.cpp:223 > +PassRefPtr<StringImpl> Identifier::add(JSGlobalData* globalData, const LChar* s, int length) Since this is just a copy of Identifier::add for const UChar*, I think you should just templatize that function as well, with T* as the type for "s".

Michael Saboff

Comment 9 2011-11-02 17:44:56 PDT

Created attachment 113408 [details] Updated patch (In reply to comment #8) Most of the suggestions were taken. The ones not taken have comments below. Fixed two build issues. Still concerned that some compilers may not like comparison in template function between and unsigned char and 0xff. > (From update of attachment 113262 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=113262&action=review > > r- because it doesn't build. > > I think you can get a lot more speed and simplicity out of this patch. See below. > > > Source/JavaScriptCore/parser/Lexer.cpp:289 > > + data = reinterpret_cast<const T *>(source.provider()->data()->characters8()); > > Shouldn't <T> always be LChar or UChar? If so, no need for cast here. This is needed because this is a templates function that is compiled for both LChar and UChar types. One of these cast will be needed for each instantiation. > > Source/JavaScriptCore/parser/Lexer.cpp:292 > > + data = reinterpret_cast<const T *>(source.provider()->data()->characters16()); > > Shouldn't <T> always be LChar or UChar? If so, no need for cast here. See previous comment. > > Source/JavaScriptCore/parser/Lexer.cpp:485 > > + m_buffer16.append(static_cast<UChar>(c)); > > No need for a cast between LChar and UChar. This is needed since c is type "int". > > Source/JavaScriptCore/parser/Lexer.cpp:492 > > + // FIXME: Change record16 and m_buffer16 to record8 and m_buffer8 below when > > + // 8 bit strings are turned on. > > Better to just not record at all in the fast case. See below. We need to record in the case that the buffer type and template type (LChar and UChar) differ. The code has been updated accordingly. > > Source/JavaScriptCore/parser/Lexer.cpp:504 > > + bool bufferRequired = false; > > bufferRequired is always false, and can be removed. See previous comment. > > Source/JavaScriptCore/parser/Lexer.cpp:516 > > + m_buffer16.resize(0); // FIXME: Change to m_buffer8 > > No need to resize to 0, since we haven't appended anything. Added an if (bufferRequired) above this. > > Source/JavaScriptCore/parser/Lexer.cpp:529 > > + append16(identifierStart, currentCharacter() - identifierStart); // FIXME: Change to append8 > > + ident = makeIdentifier(m_buffer16.data(), m_buffer16.size()); // FIXME: Change to m_buffer8 > > Why append to one buffer just to copy into another? I think you should just call makeIdentifier, passing identifierStart and currentCharacter() - identifierStart, with no intervening append16. Then you can also remove clearing m_buffer16 from the fast case. Again, we want to buffer when the source string type is different from the type we want to create (eventually 8 bit). > > Source/JavaScriptCore/parser/Lexer.cpp:540 > > + // Keywords must not be recognized if there was an \uXXXX in the identifier. > > You've already ruled out a Unicode escape above (which is good, because people don't typically name their variables "MyV\u0061riable"). You can remove all of this code. The comment doesn't describe the check. The check is for an identifier that matches a keyword and is needed. > > Source/JavaScriptCore/parser/Lexer.cpp:558 > > +template <bool shouldCreateIdentifier> ALWAYS_INLINE JSTokenType Lexer<T>::parseIdentifierSlowCase(JSTokenData* tokenData, unsigned lexType, bool strictMode) > > Please don't mark the slow case inline. > > It seems like you can remove some code from the slow case, which already ran in the fast case. You can at least remove the fast check for a keyword at the head of the slow case. Removed ALWAYS_INLINE and cleaned up the code a little. > > Source/JavaScriptCore/parser/Lexer.cpp:646 > > + if (UNLIKELY((m_current == '\\'))) { > > Since you've decided that this case is unlikely, can it move out into a helper function? That way, the codegen for the likely stuff will improve, and this code can be shared with the slow case. Efectively the fast case makes (or will make) 8 bit strings while the slow case makes 16 bit strings. Therefore we need to handle the escapes besides \uxxxx. > > Source/JavaScriptCore/parser/Lexer.cpp:690 > > + // Fast check for characters that require special handling. > > + // Catches -1, \n, \r, 0x2028, and 0x2029 as efficiently > > + // as possible, and lets through all common ASCII characters. > > + if (UNLIKELY(((static_cast<unsigned>(m_current) - 0xE) & 0x2000))) { > > You can remove this case entirely if you just change the test above from "m_current > 0xff" to "m_current > 0xff || m_current < 0xe". (Unless you think we need to handle the 'bell' character on the fast path ;) ). Done.

Early Warning System Bot

Comment 10 2011-11-02 18:00:03 PDT

Comment on attachment 113408 [details] Updated patch Attachment 113408 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/10256475

Darin Adler

Comment 11 2011-11-03 10:13:49 PDT

Comment on attachment 113408 [details] Updated patch Windows build error: 3>DebuggerCallFrame.cpp 3>c:\cygwin\home\buildbot\webkit\source\javascriptcore\wtf\HashTable.h(368) : warning C4506: no definition for inline function 'JSC::Register JSC::Register::withCallee(JSC::JSObject *)' Qt build error: ../../../Source/JavaScriptCore/runtime/Identifier.h: In static member function 'static WTF::PassRefPtr<WTF::StringImpl> JSC::Identifier::add(JSC::JSGlobalData*, const T*, int) [with T = LChar]': ../../../Source/JavaScriptCore/runtime/Identifier.h:47: instantiated from here ../../../Source/JavaScriptCore/runtime/Identifier.h:160: error: comparison is always true due to limited range of data type A simple way to do that is to do the comparison with 0xFF can be done with an overloaded inline function.

Michael Saboff

Comment 12 2011-11-03 11:01:37 PDT

Created attachment 113525 [details] Patch with updates to fix build issues Added a template function Identifier::canUseSingleCharacterString() with explicit implementations for LChar and UChar to fix qt and win builds. The changes in comment #9 still apply.

Early Warning System Bot

Comment 13 2011-11-03 11:23:06 PDT

Comment on attachment 113525 [details] Patch with updates to fix build issues Attachment 113525 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/10249381

Michael Saboff

Comment 14 2011-11-03 11:50:17 PDT

Created attachment 113533 [details] Updated patch with another build fix Changes to Identifier.h to eliminate inclusion of CallFrame.h to fix builds.

Early Warning System Bot

Comment 15 2011-11-03 12:17:01 PDT

Comment on attachment 113533 [details] Updated patch with another build fix Attachment 113533 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/10285068

Michael Saboff

Comment 16 2011-11-03 13:31:32 PDT

Created attachment 113548 [details] Patch updated for qt build failure Change qt specific code to use JSGlobalData* version of Identifier() constructor instead of removed ExecState* version.

Early Warning System Bot

Comment 17 2011-11-03 13:53:11 PDT

Comment on attachment 113548 [details] Patch updated for qt build failure Attachment 113548 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/10249423

Darin Adler

Comment 18 2011-11-03 15:31:44 PDT

Comment on attachment 113548 [details] Patch updated for qt build failure Qt build still failing

Michael Saboff

Comment 19 2011-11-03 16:33:53 PDT

Created attachment 113574 [details] Fixed qt build (again) Did a find through the qt specific source looking for all Identifier constructors that need to be changed. Found only two, the one I previously changed and the one changed in this patch.

Geoffrey Garen

Comment 20 2011-11-03 17:29:18 PDT

+ while (isIdentPart(m_current)) + shift(); + + if (UNLIKELY(m_current == '\\')) { + setOffsetFromCharOffset(identifierStart); + if (bufferRequired) + m_buffer16.resize(0); // FIXME: Change to m_buffer8 + return parseIdentifierSlowCase<shouldCreateIdentifier>(tokenData, lexType, strictMode); + } I might be missing something here, but I still don't see any appends to m_buffer16 above -- shift() only moves the characters pointer forward, without appending to m_buffer16. So, I still don't think there's any need to resize to 0 here. + if (!bufferRequired) { + int identifierLength = currentCharacter() - identifierStart; + ident = makeIdentifier(identifierStart, identifierLength); Yeah, this is better. + } else { + if (identifierStart != currentCharacter()) + append16(identifierStart, currentCharacter() - identifierStart); // FIXME: Change to append8 + ident = makeIdentifier(m_buffer16.data(), m_buffer16.size()); // FIXME: Change to m_buffer8 + } I don't think you'll be able to unconditionally append to m_buffer8, since isIdentPart() is true even for some 16bit characters. What you really need, in the 16bit lexer case, is a flag telling you whether you saw any 16bit characters. If that flag is true, you can call 16bit makeIdentifier without any copying. If that flag is false, > > > Source/JavaScriptCore/parser/Lexer.cpp:540 > > > + // Keywords must not be recognized if there was an \uXXXX in the identifier. > > > > You've already ruled out a Unicode escape above (which is good, because people don't typically name their variables "MyV\u0061riable"). You can remove all of this code. > The comment doesn't describe the check. The check is for an identifier that matches a keyword and is needed. I should have been clearer here. The head of the function tests for unescaped keywords by calling parseKeyword<>(). The only case where parseKeyword<>() returns an identifier that matches a keyword -- as an identifier and not a keyword -- is the case of escaped keywords ("MyV\u0061riable"). But all escape sequences take the slow path of identifier parsing. Therefore, the fast path can assume (a) its identifiers have already been tested for unescaped keyword-ness and (b) its identifiers do not contain escape sequences. Therefore, it can assume (c) its identifiers do not match any keywords. That's why you can omit this code from the fast path. > > > Source/JavaScriptCore/parser/Lexer.cpp:646 > > > + if (UNLIKELY((m_current == '\\'))) { > > > > Since you've decided that this case is unlikely, can it move out into a helper function? That way, the codegen for the likely stuff will improve, and this code can be shared with the slow case. > Efectively the fast case makes (or will make) 8 bit strings while the slow case makes 16 bit strings. Therefore we need to handle the escapes besides \uxxxx. Yes, we need to handle them. But do we need to handle them inline, instead of in a helper function?

Geoffrey Garen

Comment 21 2011-11-03 17:31:19 PDT

> If that flag is false, If that flag is false, ... you should call a special version of makeIdentifier that takes a 16bit pointer and copies it into an 8bit StringImpl.

Geoffrey Garen

Comment 22 2011-11-04 12:09:07 PDT

Comment on attachment 113574 [details] Fixed qt build (again) Marking r- because I still think there are some improvements to be had here. (Looks like the build is working on all platforms, though. Yay!)

Michael Saboff

Comment 23 2011-11-04 13:58:18 PDT

(In reply to comment #20) > + while (isIdentPart(m_current)) > + shift(); > + > + if (UNLIKELY(m_current == '\\')) { > + setOffsetFromCharOffset(identifierStart); > + if (bufferRequired) > + m_buffer16.resize(0); // FIXME: Change to m_buffer8 > + return parseIdentifierSlowCase<shouldCreateIdentifier>(tokenData, lexType, strictMode); > + } > > I might be missing something here, but I still don't see any appends to m_buffer16 above -- shift() only moves the characters pointer forward, without appending to m_buffer16. So, I still don't think there's any need to resize to 0 here. I have refactored parseIdentifier() into LChar and UChar flavors that are very simple, no buffering. When they see a backslash they both fall out to the same slow case function. > > + if (!bufferRequired) { > + int identifierLength = currentCharacter() - identifierStart; > + ident = makeIdentifier(identifierStart, identifierLength); > > Yeah, this is better. > > + } else { > + if (identifierStart != currentCharacter()) > + append16(identifierStart, currentCharacter() - identifierStart); // FIXME: Change to append8 > + ident = makeIdentifier(m_buffer16.data(), m_buffer16.size()); // FIXME: Change to m_buffer8 > + } > > I don't think you'll be able to unconditionally append to m_buffer8, since isIdentPart() is true even for some 16bit characters. What you really need, in the 16bit lexer case, is a flag telling you whether you saw any 16bit characters. If that flag is true, you can call 16bit makeIdentifier without any copying. If that flag is false, As part of the parseIdentifier() refactoring, I added a new makeIdentifier flavor that will make 8 bit strings when passed a UChar* without any buffering. > > > > > Source/JavaScriptCore/parser/Lexer.cpp:540 > > > > + // Keywords must not be recognized if there was an \uXXXX in the identifier. > > > > > > You've already ruled out a Unicode escape above (which is good, because people don't typically name their variables "MyV\u0061riable"). You can remove all of this code. > > > The comment doesn't describe the check. The check is for an identifier that matches a keyword and is needed. > > I should have been clearer here. The head of the function tests for unescaped keywords by calling parseKeyword<>(). The only case where parseKeyword<>() returns an identifier that matches a keyword -- as an identifier and not a keyword -- is the case of escaped keywords ("MyV\u0061riable"). But all escape sequences take the slow path of identifier parsing. Therefore, the fast path can assume (a) its identifiers have already been tested for unescaped keyword-ness and (b) its identifiers do not contain escape sequences. Therefore, it can assume (c) its identifiers do not match any keywords. That's why you can omit this code from the fast path. I needed to be clearer here as well. This check is needed in the case where the parseKeyword<> check isn't run, that is when remaining is < maxTokenLength. This typically happens at the end of a function when we are reparsing. Consider "function f() { return true; }". Given that maxTokenLength is 11, we won't have 11 characters when parseIdentifier() is called pointing at "true; }", therefore the parseKeyword call at the top of the function is never made. Therefore I changed the test at the bottom to: if (UNLIKELY((remaining < maxTokenLength) && !(lexType & LexIgnoreReservedWords))) { ... > > > > > Source/JavaScriptCore/parser/Lexer.cpp:646 > > > > + if (UNLIKELY((m_current == '\\'))) { > > > > > > Since you've decided that this case is unlikely, can it move out into a helper function? That way, the codegen for the likely stuff will improve, and this code can be shared with the slow case. > > > Efectively the fast case makes (or will make) 8 bit strings while the slow case makes 16 bit strings. Therefore we need to handle the escapes besides \uxxxx. > > Yes, we need to handle them. But do we need to handle them inline, instead of in a helper function? I have a very lean parseString() that doesn't buffer (here is the LChar version): template <> template <bool shouldBuildStrings> ALWAYS_INLINE bool Lexer<LChar>::parseString(JSTokenData* tokenData, bool strictMode) { int stringQuoteCharacter = m_current; shift(); const LChar* stringStart = currentCharacter(); while (m_current != stringQuoteCharacter) { if (UNLIKELY((m_current == '\\') || (m_current < 0xe) || (m_current > 0xff))) return parseStringSlowCase<shouldBuildStrings>(tokenData, stringStart, strictMode); shift(); } if (shouldBuildStrings) tokenData->ident = makeIdentifier(stringStart, currentCharacter() - stringStart); else tokenData->ident = 0; return true; } Unfortunately, this is slower for tests like sunspider string-unpack-code which calls out to the slow case quite a bit (think of parsing a regexp pattern string of "\\b..". The version with inline escape processing is faster. I'm going to play with this for maybe an hour more to try to get the lean version's performance up. Otherwise I'll post a patch with the other changes but with the prior parseString() method.

Michael Saboff

Comment 24 2011-11-04 16:47:01 PDT

Created attachment 113734 [details] Updated patch Made the changes described in comment #23 with the exception of the suggested parseString() changes. I couldn't get the performance of anything close to the suggestion as fast as what was in the prior patch.

Darin Adler

Comment 25 2011-11-07 08:43:09 PST

Comment on attachment 113734 [details] Updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=113734&action=review > Source/JavaScriptCore/parser/Lexer.cpp:290 > + if (sourceString->is8Bit()) { > + ASSERT(sizeof(T) == sizeof(LChar)); > + data = reinterpret_cast<const T*>(sourceString->characters8()); > + } else { > + ASSERT(sizeof(T) == sizeof(UChar)); > + data = reinterpret_cast<const T*>(sourceString->characters16()); > + } If would be better to have the if condition be based on the compile time condition, and the assertion be the runtime condition, rather than vice versa. > Source/JavaScriptCore/parser/Lexer.cpp:488 > +template <bool shouldCreateIdentifier> ALWAYS_INLINE JSTokenType Lexer<LChar>::parseIdentifier(JSTokenData* tokenData, unsigned lexType, bool strictMode) Despite the fact that there are some things the 8-bit and 16-bit version of this function need to do differently, there is still a lot of duplicate code. Is there a way to factor this so more code is shared? Maybe in the future. > Source/JavaScriptCore/parser/Lexer.h:68 > +enum LexType { > + LexIgnoreReservedWords = 1, > + LexDontBuildStrings = 2, > + LexDontBuildKeywords = 4 > +}; I don’t think abbreviating Lexer to Lex here is worthwhile. Also, the existing name for the enum “LexType”, is not good. It should be “LexerFlag” or something along those lines. > Source/JavaScriptCore/parser/Lexer.h:70 > +enum ShiftType { DoBoundsCheck, DoNotBoundsCheck }; The name here, not new, is also not good. This is BoundsCheckPolicy or something like that, not “shift type”. > Source/JavaScriptCore/parser/Lexer.h:135 > + void append16(const UChar* p, size_t length) { m_buffer16.append(p, length); } Better to use a word instead of “p”. I’d use the word characters. > Source/JavaScriptCore/parser/Lexer.h:156 > template <bool shouldBuildIdentifiers> ALWAYS_INLINE JSTokenType parseIdentifier(JSTokenData*, unsigned, bool strictMode); > + template <bool shouldBuildIdentifiers> NEVER_INLINE JSTokenType parseIdentifierSlowCase(JSTokenData*, unsigned, bool strictMode); Not clear what the “unsigned” is. It probably needs an argument name. > Source/JavaScriptCore/parser/SourceCode.h:78 > - SourceCode subExpression(int, int, int); > + SourceCode subExpression(unsigned, unsigned, int); This needs argument names. > Source/WebCore/bindings/js/StringSourceProvider.h:67 > } > + > + > Why whitespace added?

Michael Saboff

Comment 26 2011-11-07 09:54:28 PST

Committed r99436: <http://trac.webkit.org/changeset/99436>

Nikolas Zimmermann

Comment 27 2011-11-07 10:34:00 PST

This broke the Mac/SL build: /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp: In member function 'void JSC::Lexer<T>::append8(const T*, size_t) [with T = unsigned char]': /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp:1529: instantiated from here /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp:439: warning: comparison is always true due to limited range of data type /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp:440: warning: comparison is always true due to limited range of data type /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp: In member function 'void JSC::Lexer<T>::append8(const T*, size_t) [with T = short unsigned int]': /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp:1530: instantiated from here /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp:439: warning: comparison is always true due to limited range of data type Can you fix it?

Michael Saboff

Comment 28 2011-11-07 12:51:24 PST

(In reply to comment #27) > This broke the Mac/SL build: > > /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp: In member function 'void JSC::Lexer<T>::append8(const T*, size_t) [with T = unsigned char]': > /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp:1529: instantiated from here > /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp:439: warning: comparison is always true due to limited range of data type > /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp:440: warning: comparison is always true due to limited range of data type > /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp: In member function 'void JSC::Lexer<T>::append8(const T*, size_t) [with T = short unsigned int]': > /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp:1530: instantiated from here > /Users/nikolaszimmermann/Coding/WebKit/Source/JavaScriptCore/parser/Lexer.cpp:439: warning: comparison is always true due to limited range of data type > > Can you fix it? I'll look into it in a couple of hours.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component JavaScriptCore

Assignee

Michael Saboff

Reported

2011-11-01 15:19 PDT

Modified

2011-11-07 12:51 PST History

CC List

6 users Show

URL

Keywords InRadar

Depends on

71712

Blocks

Dependencies

tree graph