Bug 144754

Summary: Optimize serialization of quoted JSON strings.
Product: WebKit Reporter: Andreas Kling <kling>
Component: JavaScriptCoreAssignee: Andreas Kling <kling>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cgarcia, cmarcelo, commit-queue, darin, ggaren, kling, simon.fraser
Priority: P2 Keywords: Performance
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 144784    
Bug Blocks:    
Attachments:
Description Flags
Patch idea
darin: review+
Patch for landing none

Description Andreas Kling 2015-05-07 13:18:46 PDT
JSON.stringify() can do better than calling StringBuilder::append() over and over.
Comment 1 Andreas Kling 2015-05-07 13:34:39 PDT
Created attachment 252615 [details]
Patch idea
Comment 2 Darin Adler 2015-05-07 15:32:12 PDT
Comment on attachment 252615 [details]
Patch idea

View in context: https://bugs.webkit.org/attachment.cgi?id=252615&action=review

I guess I like this. No big deal to build the JSON logic into StringBuilder. We could always refactor to deal with that later if we like.

> Source/WTF/wtf/text/StringBuilder.cpp:364
> +template <typename OutputCharType, typename InputCharType>

We should write out CharacterType and not use CharType.

> Source/WTF/wtf/text/StringBuilder.cpp:369
> +            *(output++) = input[i++];

I don’t get why this is a separate loop. Why can’t we just do one character at a time and use continue and avoid the extra break below?

    if (input[i] > 0x1F && input[i] != '"' && input[i] != '\\') {
        *output++ = input[i];
        continue;
    }

Is there something I’m missing that would make this super slow?

It’s also a little strange that we use indexing for input but pointer math for output.

> Source/WTF/wtf/text/StringBuilder.cpp:374
> +            *(output++) = '\\';

I think that idiomatically it’s always been *output++ for me, without parentheses.

> Source/WTF/wtf/text/StringBuilder.cpp:402
> +            static const char hexDigits[] = "0123456789abcdef";

It’s a little strange to use a different approach here than lowerNibbleToASCIIHexDigit does in ASCIICType.h. But this is lowercase hex and that is uppercase, and maybe this way of doing things is faster than what that does.

> Source/WTF/wtf/text/StringBuilder.cpp:407
> +            *(output++) = static_cast<LChar>(hexDigits[(ch >> 12) & 0xF]);
> +            *(output++) = static_cast<LChar>(hexDigits[(ch >> 8) & 0xF]);

I think these are always '0'; we can’t get here unless the character is <= 0x1F;

> Source/WTF/wtf/text/StringBuilder.cpp:417
> +    size_t minimumNewCapacity = length() + 2 + (string.length() * 6);

This could be a much lower value when string.is8Bit() is true. I think it’s worth optimizing that. I don’t think you need parentheses. I think you should explain in a comment both what the 2 and the 6 here mean.

I think the word “minimum” here seems a little strange. As far as I can tell, this is a maximum size, not a minimum. I guess minimum means “before rounding up”?
Comment 3 Andreas Kling 2015-05-07 17:19:09 PDT
Committed r183961: <http://trac.webkit.org/changeset/183961>
Comment 4 Simon Fraser (smfr) 2015-05-07 18:36:57 PDT
s/dom/JSON-stringify.html starting timing out after this:
https://build.webkit.org/results/Apple%20Mavericks%20Release%20WK1%20(Tests)/r183961%20(14539)/results.html
Comment 5 Andreas Kling 2015-05-07 20:10:21 PDT
(In reply to comment #4)
> s/dom/JSON-stringify.html starting timing out after this:
> https://build.webkit.org/results/Apple%20Mavericks%20Release%20WK1%20(Tests)/
> r183961%20(14539)/results.html

Investigating..
Comment 6 WebKit Commit Bot 2015-05-07 20:22:18 PDT
Re-opened since this is blocked by bug 144784
Comment 8 Andreas Kling 2015-05-08 00:30:50 PDT
(In reply to comment #5)
> (In reply to comment #4)
> > s/dom/JSON-stringify.html starting timing out after this:
> > https://build.webkit.org/results/Apple%20Mavericks%20Release%20WK1%20(Tests)/
> > r183961%20(14539)/results.html
> 
> Investigating..

Hah, this was happening because I inadvertently made the \uNNNN output be uppercase hexadecimal, and this test was stringifying a JSON object containing all the unicode characters between \u0000 and \uFFFF.

The timeout happened because WebKit spent eons going back and forth with fontd to find a fallback font for every single character in the "Expected \uabcd but got \uABCD etc" message.
Comment 9 Andreas Kling 2015-05-08 00:34:43 PDT
(In reply to comment #7)
> Also this:
> https://build.webkit.org/results/Apple%20Mavericks%20Debug%20WK2%20(Tests)/
> r183976%20(11594)/storage/indexeddb/list-ordering-crash-log.txt

This was happening because I integrated the suggestion about a lower (than 6x) fudge factor for 8-bit strings. "\x00" is a valid 8-bit string, and encodes to "\u0000", so needs 6 characters, not 2. :/
Comment 10 Andreas Kling 2015-05-08 00:52:21 PDT
Created attachment 252697 [details]
Patch for landing
Comment 11 WebKit Commit Bot 2015-05-08 00:53:59 PDT
Attachment 252697 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/text/StringBuilder.cpp:402:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/WTF/wtf/text/StringBuilder.cpp:408:  More than one command on the same line  [whitespace/newline] [4]
Total errors found: 2 in 8 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 12 Andreas Kling 2015-05-08 01:45:48 PDT
Committed r183988: <http://trac.webkit.org/changeset/183988>
Comment 13 Chris Dumez 2015-05-08 13:50:19 PDT
Potential 0.6% regression on JetStream according to the bots.
Comment 14 Darin Adler 2015-05-08 19:33:06 PDT
(In reply to comment #9)
> This was happening because I integrated the suggestion about a lower (than
> 6x) fudge factor for 8-bit strings. "\x00" is a valid 8-bit string, and
> encodes to "\u0000", so needs 6 characters, not 2. :/

Damn, I realized that was wrong, but only after the fact!
Comment 15 Darin Adler 2015-05-08 19:33:21 PDT
(In reply to comment #13)
> Potential 0.6% regression on JetStream according to the bots.

Uh oh!
Comment 16 Darin Adler 2015-05-09 18:24:29 PDT
Maybe it was my “continue” suggestion that caused the problem?
Comment 17 Andreas Kling 2015-05-09 18:55:03 PDT
(In reply to comment #16)
> Maybe it was my “continue” suggestion that caused the problem?

I don't think so. The loop was originally structured that way because it wanted to find the longest possible range to pass to StringBuilder::append(characters, length). The "continue" change was actually a speedup according to run-jsc-benchmarks :)

I can't reproduce any slowdown on any of the JetStream subtests, nor can I understand what would get slower from this change.

Given that the JetStream bot has quickly gone back up to the "pre-regression" level without reverting this change, I'm inclined to believe that what we saw was a bot hiccup.