Bug 156953 - [JSC] Optimize JSON.parse string fast path
Summary: [JSC] Optimize JSON.parse string fast path
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-23 07:13 PDT by Yusuke Suzuki
Modified: 2016-04-24 10:07 PDT (History)
8 users (show)

See Also:


Attachments
Patch (12.30 KB, patch)
2016-04-23 07:31 PDT, Yusuke Suzuki
mark.lam: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yusuke Suzuki 2016-04-23 07:13:37 PDT
[JSC] Optimize JSON.parse string fast path
Comment 1 Yusuke Suzuki 2016-04-23 07:31:33 PDT
Created attachment 277149 [details]
Patch
Comment 2 Yusuke Suzuki 2016-04-23 07:34:37 PDT
Comment on attachment 277149 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=277149&action=review

Add comments.

> Source/JavaScriptCore/runtime/LiteralParser.h:74
> +

Convert LiteralParserToken to noncopyable class.

> Source/JavaScriptCore/runtime/LiteralParser.h:156
> +        StringBuilder m_builder;

Now, token's string's ownership is maintained by this builder.
Comment 3 Mark Lam 2016-04-23 09:29:04 PDT
Comment on attachment 277149 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=277149&action=review

r=me with suggestion.  Nice work on improving kraken.

> Source/JavaScriptCore/runtime/LiteralParser.cpp:717
> -                        LiteralParserToken<CharType> stringToken = m_lexer.currentToken();
> -                        m_lexer.next();
> +                        const LiteralParserToken<CharType>& stringToken = m_lexer.currentToken();
>                          if (stringToken.stringIs8Bit)
>                              lastValue = jsString(m_exec, makeIdentifier(stringToken.stringToken8, stringToken.stringLength).string());
>                          else
>                              lastValue = jsString(m_exec, makeIdentifier(stringToken.stringToken16, stringToken.stringLength).string());
> +                        m_lexer.next();

I understand that the reading of the token values needs be done before the call to m_lexer.next() because the lexer could alter the contents of the underlying StringBuilder that backs the token string in next().

My concern is: from reading this code (without digging deeper into the implementation of the lexer), it is not obvious that the validity of the current token is dependent on the internal state of the lexer.  It would be nice if there's some way to assert that the token string we're consuming is the one from the current token.  The only idea I can come up with for doing that assert is to assign some sort of a token ID to the token.  For example:

    template<CharType>
    class LiteralParserTokenPtr {
        LiteralParserTokenPtr(Lexer<CharType>& lexer)
            : m_lexer(lexer)
#if !ASSERTS_DISABLED
            , m_tokenID(lexer.currentTokenID)
#endif
        { }

        ALWAYS_INLINE const LiteralParserToken<CharType>* operator->() const
        {
            ASSERT(m_tokenID == m_lexer.currentTokenID);
            return &m_lexer.m_currentToken;
        }

    private:
        Lexer<CharType>& m_lexer;
#if !ASSERTS_DISABLED
        unsigned m_tokenID;
#endif
    }

    // In Lexer:
    friend class LiteralParserTokenPtr<CharType>;

    LiteralParserTokenPtr<CharType>& currentToken()
    {
        return LiteralParserTokenPtr<CharType>(*this);
    }
    ...
#if !ASSERTS_DISABLED
    unsigned m_currentTokenID { 0 };
#endif

    // In Lexer::next()
    if (!ASSERTS_DISABLED)
        m_currentTokenID++;
    ...

    // When parsing:
    LiteralParserTokenPtr<CharType> token = m_lexer.currentToken();
    lastValue = ... token->stringToken8 ...


This way, we will know if anyone ever tries to use the token after calling lexer.next().  What do you think?
Comment 4 Yusuke Suzuki 2016-04-24 09:48:05 PDT
Comment on attachment 277149 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=277149&action=review

>> Source/JavaScriptCore/runtime/LiteralParser.cpp:717
>> +                        m_lexer.next();
> 
> I understand that the reading of the token values needs be done before the call to m_lexer.next() because the lexer could alter the contents of the underlying StringBuilder that backs the token string in next().
> 
> My concern is: from reading this code (without digging deeper into the implementation of the lexer), it is not obvious that the validity of the current token is dependent on the internal state of the lexer.  It would be nice if there's some way to assert that the token string we're consuming is the one from the current token.  The only idea I can come up with for doing that assert is to assign some sort of a token ID to the token.  For example:
> 
>     template<CharType>
>     class LiteralParserTokenPtr {
>         LiteralParserTokenPtr(Lexer<CharType>& lexer)
>             : m_lexer(lexer)
> #if !ASSERTS_DISABLED
>             , m_tokenID(lexer.currentTokenID)
> #endif
>         { }
> 
>         ALWAYS_INLINE const LiteralParserToken<CharType>* operator->() const
>         {
>             ASSERT(m_tokenID == m_lexer.currentTokenID);
>             return &m_lexer.m_currentToken;
>         }
> 
>     private:
>         Lexer<CharType>& m_lexer;
> #if !ASSERTS_DISABLED
>         unsigned m_tokenID;
> #endif
>     }
> 
>     // In Lexer:
>     friend class LiteralParserTokenPtr<CharType>;
> 
>     LiteralParserTokenPtr<CharType>& currentToken()
>     {
>         return LiteralParserTokenPtr<CharType>(*this);
>     }
>     ...
> #if !ASSERTS_DISABLED
>     unsigned m_currentTokenID { 0 };
> #endif
> 
>     // In Lexer::next()
>     if (!ASSERTS_DISABLED)
>         m_currentTokenID++;
>     ...
> 
>     // When parsing:
>     LiteralParserTokenPtr<CharType> token = m_lexer.currentToken();
>     lastValue = ... token->stringToken8 ...
> 
> 
> This way, we will know if anyone ever tries to use the token after calling lexer.next().  What do you think?

Your suggestion looks nice.

I think switching this with the raw pointer is preferable.

Under debug mode, we use the above adapter. And under the release mode, we use the raw pointer (LiteralParserToken<CharType>*).
Comment 5 Yusuke Suzuki 2016-04-24 10:07:04 PDT
Committed r199968: <http://trac.webkit.org/changeset/199968>