176044 – [JSC] Use table based approach for JSON.stringify's Quote

RESOLVED FIXED 176044

[JSC] Use table based approach for JSON.stringify's Quote

https://bugs.webkit.org/show_bug.cgi?id=176044

Summary [JSC] Use table based approach for JSON.stringify's Quote

Yusuke Suzuki

Reported 2017-08-28 19:55:09 PDT

Now, V8 and SpiderMonkey both use table based approach. And according to arewefastyet data, it seems promising.

Attachments
Patch (14.85 KB, patch) 2017-08-28 21:27 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (14.91 KB, patch) 2017-08-28 21:29 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (14.89 KB, patch) 2017-08-28 21:33 PDT, Yusuke Suzuki	darin: review+	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Yusuke Suzuki

Comment 1 2017-08-28 20:17:54 PDT

https://bugzilla.mozilla.org/show_bug.cgi?id=1343005

Yusuke Suzuki

Comment 2 2017-08-28 20:19:16 PDT

Note that JSON.stringify takes 3.8% of Ember.js Speedometer.

Yusuke Suzuki

Comment 3 2017-08-28 21:27:42 PDT

Created attachment 319240 [details] Patch

Yusuke Suzuki

Comment 4 2017-08-28 21:29:54 PDT

Created attachment 319241 [details] Patch

Yusuke Suzuki

Comment 5 2017-08-28 21:33:48 PDT

Created attachment 319242 [details] Patch

Darin Adler

Comment 6 2017-08-29 09:27:51 PDT

Comment on attachment 319242 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=319242&action=review > Source/WTF/wtf/text/StringBuilderJSON.cpp:2 > + * Copyright (C) 2010, 2013, 2016 Apple Inc. Missing “All rights reserved.” Not sure why it was omitted but someone should add it. > Source/WTF/wtf/text/StringBuilderJSON.cpp:20 > +static const constexpr LChar EscapeTable[256] = { The name of this doesn’t seem specific enough. Maybe escapedFormsForJSON? Does having this table contain 256 entries make the LChar version faster than it would be if it was 128? Because for UChar clearly there is no value to having the table larger than 128. The code below uses hex when referring to this table, masking and such, so I would write the size as 0x100, not 256 (or 0x80). > Source/WTF/wtf/text/StringBuilderJSON.cpp:55 > +template <typename OutputCharacterType, typename InputCharacterType> We should get consistent about whether to include a space before "<" in cases like this. O prefer not to. > Source/WTF/wtf/text/StringBuilderJSON.cpp:58 > + for (const InputCharacterType* end = input + length; input != end; ++input) { I think auto* reads way better in places like this. > Source/WTF/wtf/text/StringBuilderJSON.cpp:59 > + const InputCharacterType character = *input; And like this. > Source/WTF/wtf/text/StringBuilderJSON.cpp:60 > + const LChar escaped = EscapeTable[character & 0xFF]; The use of const here is OK but a bit peculiar. We have many local variables that are never changed and actually writing const seems like overkill. > Source/WTF/wtf/text/StringBuilderJSON.cpp:61 > + if (LIKELY(!escaped || character > 0xFF)) { Is this faster than having a separate UNLIKELY(character > 0xFF) check before looking up the character in the table. > Source/WTF/wtf/text/StringBuilderJSON.cpp:69 > + ASSERT(!(character & 0xFF00)); To me it seems overkill to assert this since it’s so firmly guaranteed by the way the code is designed. Also seems there are clearer ways to write this assertion. For me I would write one of these: ASSERT(character <= 0xFF); ASSERT(character == (character & 0xFF)); But really I would not write any assertion at all. > Source/WTF/wtf/text/StringBuilderJSON.cpp:83 > + // Make sure we have enough buffer space to append this string without having > + // to worry about reallocating in the middle. > + // The 2 is for the '"' quotes on each end. > + // The 6 is for characters that need to be \uNNNN encoded. This computation is a great way to figure out the maximum buffer size needed but is there some opportunity to optimize memory use better for the common case where we need much, much less space than this? To me it seems that for smaller strings we might want to use a stack buffer instead so we don’t resize the result to be unnecessarily large. I think this depends on how we are measuring the performance of this function (performance in the sense of both speed and memory use). > Source/WTF/wtf/text/StringBuilderJSON.cpp:89 > + // This max() is here to allow us to allocate sizes between the range [2^31, 2^32 - 2] because roundUpToPowerOfTwo(1<<31 + some int smaller than 1<<31) == 0. > + allocationSize = std::max(allocationSize, roundUpToPowerOfTwo(allocationSize)); Not new code, but why are we rounding up to a power of two? Seems like this std::max trick belongs inside a variant of the roundUpToPowerOfTwo function instead of here. I think it’s not good that roundUpToPowerOfTwo can turn a large number into zero and we should figure out how to fix that in the definition of that function rather than working around it here. Like define the function do nothing if it can’t round up? Or make a variant that does that? Maybe even consider making a variant that takes Checked<unsigned> too to help make clear the code is safe. But I guess that version would abort if we can’t round up and we just want to round up "only if we can". But wait, why round up at all?

Yusuke Suzuki

Comment 7 2017-08-29 18:17:47 PDT

Comment on attachment 319242 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=319242&action=review Thank you! >> Source/WTF/wtf/text/StringBuilderJSON.cpp:2 >> + * Copyright (C) 2010, 2013, 2016 Apple Inc. > > Missing “All rights reserved.” Not sure why it was omitted but someone should add it. Fixed. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:20 >> +static const constexpr LChar EscapeTable[256] = { > > The name of this doesn’t seem specific enough. Maybe escapedFormsForJSON? > > Does having this table contain 256 entries make the LChar version faster than it would be if it was 128? Because for UChar clearly there is no value to having the table larger than 128. > > The code below uses hex when referring to this table, masking and such, so I would write the size as 0x100, not 256 (or 0x80). Yes, why this has 256 entries is to eliminate 256 mask and > 0xff check in LChar cases in compiler. OK, I've changed this to 0x100. And rename it to escapedFormsForJSON. And it seems 0x80 is super slightly slower (~2.0%). The data is a bit fluctuated, but I think using this code for now is ok since it can remove the above check. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:55 >> +template <typename OutputCharacterType, typename InputCharacterType> > > We should get consistent about whether to include a space before "<" in cases like this. O prefer not to. It seems that a bit old code uses this form (`template<space><`). I've changed this to `template<`. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:58 >> + for (const InputCharacterType* end = input + length; input != end; ++input) { > > I think auto* reads way better in places like this. Changed. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:59 >> + const InputCharacterType character = *input; > > And like this. Changed. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:60 >> + const LChar escaped = EscapeTable[character & 0xFF]; > > The use of const here is OK but a bit peculiar. We have many local variables that are never changed and actually writing const seems like overkill. OK, changed this to auto. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:61 >> + if (LIKELY(!escaped || character > 0xFF)) { > > Is this faster than having a separate UNLIKELY(character > 0xFF) check before looking up the character in the table. It seems that UChar shows not so much difference (or super slightly slower). I think the current one is OK. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:69 >> + ASSERT(!(character & 0xFF00)); > > To me it seems overkill to assert this since it’s so firmly guaranteed by the way the code is designed. Also seems there are clearer ways to write this assertion. For me I would write one of these: > > ASSERT(character <= 0xFF); > ASSERT(character == (character & 0xFF)); > > But really I would not write any assertion at all. OK, I've dropped this assertion. Yeah, now the code clearly states this invariant is ensured. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:83 >> + // The 6 is for characters that need to be \uNNNN encoded. > > This computation is a great way to figure out the maximum buffer size needed but is there some opportunity to optimize memory use better for the common case where we need much, much less space than this? > > To me it seems that for smaller strings we might want to use a stack buffer instead so we don’t resize the result to be unnecessarily large. I think this depends on how we are measuring the performance of this function (performance in the sense of both speed and memory use). I think this is OK. This is StringBuilder's JSON quoting code. So we expect that this StringBuilder will accept more strings after this function. Even if we allocate a bit large space, this space will be filled after this function. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:89 >> + allocationSize = std::max(allocationSize, roundUpToPowerOfTwo(allocationSize)); > > Not new code, but why are we rounding up to a power of two? > > Seems like this std::max trick belongs inside a variant of the roundUpToPowerOfTwo function instead of here. I think it’s not good that roundUpToPowerOfTwo can turn a large number into zero and we should figure out how to fix that in the definition of that function rather than working around it here. Like define the function do nothing if it can’t round up? Or make a variant that does that? Maybe even consider making a variant that takes Checked<unsigned> too to help make clear the code is safe. But I guess that version would abort if we can’t round up and we just want to round up "only if we can". But wait, why round up at all? This is because this code is StringBuilder. What we are doing is we are reserving capacity for the StringBuilder's internal bufffer which capacity is >= the worst quoting result case. So, after this function, this StringBuilder will accept the following strings. So, we should take the policy resizing the buffer to a power of two. (As the same to Vector). In the case of roundUpToPowerOfTwo, I think this is OK for now since this behavior is actually tested in TestWTF (MathExtras.cpp). But the cleaner interface should be taking Checked<unsigned> and abort if rounding up fails. I've opened the bug for this. https://bugs.webkit.org/show_bug.cgi?id=176086

Yusuke Suzuki

Comment 8 2017-08-29 18:22:39 PDT

Committed r221330: <http://trac.webkit.org/changeset/221330>

Radar WebKit Bug Importer

Comment 9 2017-08-29 18:23:17 PDT

<rdar://problem/34149386>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component JavaScriptCore

Assignee

Yusuke Suzuki

Reported

2017-08-28 19:55 PDT

Modified

2017-08-29 18:23 PDT History

CC List

4 users Show

URL

Keywords InRadar

Depends on

Blocks