RESOLVED FIXED 176044
[JSC] Use table based approach for JSON.stringify's Quote
https://bugs.webkit.org/show_bug.cgi?id=176044
Summary [JSC] Use table based approach for JSON.stringify's Quote
Yusuke Suzuki
Reported 2017-08-28 19:55:09 PDT
Now, V8 and SpiderMonkey both use table based approach. And according to arewefastyet data, it seems promising.
Attachments
Patch (14.85 KB, patch)
2017-08-28 21:27 PDT, Yusuke Suzuki
no flags
Patch (14.91 KB, patch)
2017-08-28 21:29 PDT, Yusuke Suzuki
no flags
Patch (14.89 KB, patch)
2017-08-28 21:33 PDT, Yusuke Suzuki
darin: review+
Yusuke Suzuki
Comment 1 2017-08-28 20:17:54 PDT
Yusuke Suzuki
Comment 2 2017-08-28 20:19:16 PDT
Note that JSON.stringify takes 3.8% of Ember.js Speedometer.
Yusuke Suzuki
Comment 3 2017-08-28 21:27:42 PDT
Yusuke Suzuki
Comment 4 2017-08-28 21:29:54 PDT
Yusuke Suzuki
Comment 5 2017-08-28 21:33:48 PDT
Darin Adler
Comment 6 2017-08-29 09:27:51 PDT
Comment on attachment 319242 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=319242&action=review > Source/WTF/wtf/text/StringBuilderJSON.cpp:2 > + * Copyright (C) 2010, 2013, 2016 Apple Inc. Missing “All rights reserved.” Not sure why it was omitted but someone should add it. > Source/WTF/wtf/text/StringBuilderJSON.cpp:20 > +static const constexpr LChar EscapeTable[256] = { The name of this doesn’t seem specific enough. Maybe escapedFormsForJSON? Does having this table contain 256 entries make the LChar version faster than it would be if it was 128? Because for UChar clearly there is no value to having the table larger than 128. The code below uses hex when referring to this table, masking and such, so I would write the size as 0x100, not 256 (or 0x80). > Source/WTF/wtf/text/StringBuilderJSON.cpp:55 > +template <typename OutputCharacterType, typename InputCharacterType> We should get consistent about whether to include a space before "<" in cases like this. O prefer not to. > Source/WTF/wtf/text/StringBuilderJSON.cpp:58 > + for (const InputCharacterType* end = input + length; input != end; ++input) { I think auto* reads way better in places like this. > Source/WTF/wtf/text/StringBuilderJSON.cpp:59 > + const InputCharacterType character = *input; And like this. > Source/WTF/wtf/text/StringBuilderJSON.cpp:60 > + const LChar escaped = EscapeTable[character & 0xFF]; The use of const here is OK but a bit peculiar. We have many local variables that are never changed and actually writing const seems like overkill. > Source/WTF/wtf/text/StringBuilderJSON.cpp:61 > + if (LIKELY(!escaped || character > 0xFF)) { Is this faster than having a separate UNLIKELY(character > 0xFF) check before looking up the character in the table. > Source/WTF/wtf/text/StringBuilderJSON.cpp:69 > + ASSERT(!(character & 0xFF00)); To me it seems overkill to assert this since it’s so firmly guaranteed by the way the code is designed. Also seems there are clearer ways to write this assertion. For me I would write one of these: ASSERT(character <= 0xFF); ASSERT(character == (character & 0xFF)); But really I would not write any assertion at all. > Source/WTF/wtf/text/StringBuilderJSON.cpp:83 > + // Make sure we have enough buffer space to append this string without having > + // to worry about reallocating in the middle. > + // The 2 is for the '"' quotes on each end. > + // The 6 is for characters that need to be \uNNNN encoded. This computation is a great way to figure out the maximum buffer size needed but is there some opportunity to optimize memory use better for the common case where we need much, much less space than this? To me it seems that for smaller strings we might want to use a stack buffer instead so we don’t resize the result to be unnecessarily large. I think this depends on how we are measuring the performance of this function (performance in the sense of both speed and memory use). > Source/WTF/wtf/text/StringBuilderJSON.cpp:89 > + // This max() is here to allow us to allocate sizes between the range [2^31, 2^32 - 2] because roundUpToPowerOfTwo(1<<31 + some int smaller than 1<<31) == 0. > + allocationSize = std::max(allocationSize, roundUpToPowerOfTwo(allocationSize)); Not new code, but why are we rounding up to a power of two? Seems like this std::max trick belongs inside a variant of the roundUpToPowerOfTwo function instead of here. I think it’s not good that roundUpToPowerOfTwo can turn a large number into zero and we should figure out how to fix that in the definition of that function rather than working around it here. Like define the function do nothing if it can’t round up? Or make a variant that does that? Maybe even consider making a variant that takes Checked<unsigned> too to help make clear the code is safe. But I guess that version would abort if we can’t round up and we just want to round up "only if we can". But wait, why round up at all?
Yusuke Suzuki
Comment 7 2017-08-29 18:17:47 PDT
Comment on attachment 319242 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=319242&action=review Thank you! >> Source/WTF/wtf/text/StringBuilderJSON.cpp:2 >> + * Copyright (C) 2010, 2013, 2016 Apple Inc. > > Missing “All rights reserved.” Not sure why it was omitted but someone should add it. Fixed. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:20 >> +static const constexpr LChar EscapeTable[256] = { > > The name of this doesn’t seem specific enough. Maybe escapedFormsForJSON? > > Does having this table contain 256 entries make the LChar version faster than it would be if it was 128? Because for UChar clearly there is no value to having the table larger than 128. > > The code below uses hex when referring to this table, masking and such, so I would write the size as 0x100, not 256 (or 0x80). Yes, why this has 256 entries is to eliminate 256 mask and > 0xff check in LChar cases in compiler. OK, I've changed this to 0x100. And rename it to escapedFormsForJSON. And it seems 0x80 is super slightly slower (~2.0%). The data is a bit fluctuated, but I think using this code for now is ok since it can remove the above check. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:55 >> +template <typename OutputCharacterType, typename InputCharacterType> > > We should get consistent about whether to include a space before "<" in cases like this. O prefer not to. It seems that a bit old code uses this form (`template<space><`). I've changed this to `template<`. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:58 >> + for (const InputCharacterType* end = input + length; input != end; ++input) { > > I think auto* reads way better in places like this. Changed. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:59 >> + const InputCharacterType character = *input; > > And like this. Changed. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:60 >> + const LChar escaped = EscapeTable[character & 0xFF]; > > The use of const here is OK but a bit peculiar. We have many local variables that are never changed and actually writing const seems like overkill. OK, changed this to auto. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:61 >> + if (LIKELY(!escaped || character > 0xFF)) { > > Is this faster than having a separate UNLIKELY(character > 0xFF) check before looking up the character in the table. It seems that UChar shows not so much difference (or super slightly slower). I think the current one is OK. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:69 >> + ASSERT(!(character & 0xFF00)); > > To me it seems overkill to assert this since it’s so firmly guaranteed by the way the code is designed. Also seems there are clearer ways to write this assertion. For me I would write one of these: > > ASSERT(character <= 0xFF); > ASSERT(character == (character & 0xFF)); > > But really I would not write any assertion at all. OK, I've dropped this assertion. Yeah, now the code clearly states this invariant is ensured. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:83 >> + // The 6 is for characters that need to be \uNNNN encoded. > > This computation is a great way to figure out the maximum buffer size needed but is there some opportunity to optimize memory use better for the common case where we need much, much less space than this? > > To me it seems that for smaller strings we might want to use a stack buffer instead so we don’t resize the result to be unnecessarily large. I think this depends on how we are measuring the performance of this function (performance in the sense of both speed and memory use). I think this is OK. This is StringBuilder's JSON quoting code. So we expect that this StringBuilder will accept more strings after this function. Even if we allocate a bit large space, this space will be filled after this function. >> Source/WTF/wtf/text/StringBuilderJSON.cpp:89 >> + allocationSize = std::max(allocationSize, roundUpToPowerOfTwo(allocationSize)); > > Not new code, but why are we rounding up to a power of two? > > Seems like this std::max trick belongs inside a variant of the roundUpToPowerOfTwo function instead of here. I think it’s not good that roundUpToPowerOfTwo can turn a large number into zero and we should figure out how to fix that in the definition of that function rather than working around it here. Like define the function do nothing if it can’t round up? Or make a variant that does that? Maybe even consider making a variant that takes Checked<unsigned> too to help make clear the code is safe. But I guess that version would abort if we can’t round up and we just want to round up "only if we can". But wait, why round up at all? This is because this code is StringBuilder. What we are doing is we are reserving capacity for the StringBuilder's internal bufffer which capacity is >= the worst quoting result case. So, after this function, this StringBuilder will accept the following strings. So, we should take the policy resizing the buffer to a power of two. (As the same to Vector). In the case of roundUpToPowerOfTwo, I think this is OK for now since this behavior is actually tested in TestWTF (MathExtras.cpp). But the cleaner interface should be taking Checked<unsigned> and abort if rounding up fails. I've opened the bug for this. https://bugs.webkit.org/show_bug.cgi?id=176086
Yusuke Suzuki
Comment 8 2017-08-29 18:22:39 PDT
Radar WebKit Bug Importer
Comment 9 2017-08-29 18:23:17 PDT
Note You need to log in before you can comment on or make changes to this bug.