Bug 29909

Summary: [V8] Chromium's implementation of ScriptString is awful for XHR's responseText
Product: WebKit Reporter: James Robinson <jamesr>
Component: New BugsAssignee: James Robinson <jamesr>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, dglazkov, eric, fishd, iposva, jens, levin, mbelshe
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
One possible ScriptString implementation for v8
fishd: review-
Fixes style issues and makes the default c'tor for ScriptString do no allocations
none
Fixes some bugs
abarth: review-
Uses StringBuilder internally, adds ChangeLog entry
abarth: review-
Implementation using v8::String::Concat() for operator+=
abarth: review-, jamesr: commit-queue-
Uses OwnHandle<v8::String> instead of a v8::Persistent<v8::String> for management none

Description James Robinson 2009-09-29 18:35:25 PDT
ScriptString is implemented in bindings/v8 as a thin wrapper around WebCore::String.  This means every append operation calls into String::append() which allocates a new buffer large enough for the entire string and copies the data in.  For XMLHttpRequest's responseText object, this means every time new data is received from the network a new WebCore::String is allocated and populate with the text received thus far.

This results in memcpy()ing the same data around a lot and making a lot of allocations and deallocations, but it interacts particularly badly with V8 if the readystatechange to 3 handler happens to grab a reference to the responseText object because when this happens V8 grabs a reference to the WebCore::String backing responseText.  Some sites check the responseText.length parameter on readyState 3 which means they will create a handle from V8 on every intermediate WebCore string resulting in quadratic memory use (mitigated by a very small constant factor, but quadratic nonetheless).   For example, if a 1MB XHR is received in 32 chunks of size 32K there are a sequence of WebCore::String's created of sizes:

32k
64k
98k
128k
160k
...
992k
1024k

and a readystatechange event fired at each.  If the script grabs a handle at each readystatechange handler, up to around half a gig could be kept live until V8 GC's the handles holding the WebCore::String objects alive.  This is suboptimal, especially as I suspect it is rare that a script would want to look at the actual contents of responseText before the download completes.
Comment 1 James Robinson 2009-09-29 18:58:34 PDT
Created attachment 40340 [details]
One possible ScriptString implementation for v8

This patch is a possible replacement for ScriptString.  It handles appends by keeping a list of WebCore::String objects and only flattens to a contiguous buffer when the character data is requested.  This impl requires a custom V8ExternalStringResource implementation so that the .length property of the string can be queried without forcing the entire buffer to be flat.

An alternative would be to copy the string straight into V8 as it arrives and let it handle the cheap concatenation since it already has code to do this.  This would require copying the string back out into WebCore on demand as well as a change in the V8 interface to allow appending to a V8-controlled string.
Comment 2 Darin Fisher (:fishd, Google) 2009-09-30 11:47:21 PDT
Comment on attachment 40340 [details]
One possible ScriptString implementation for v8

> Index: WebCore/bindings/v8/ScriptString.h
...
> +// This class is not thread-safe.  To access the data from multiple threads
> +// the caller must serialize into a String object and explicitly copy() the
> +// data.
>  class ScriptString {
>  public:
> +    ScriptString() : m_impl(new ScriptStringImpl) {}

it seems like it would be nice if ScriptString's default constructor did no
allocations.  that's how String works, and it seems nice for cases where you
have a temporary ScriptString on the stack or a ScriptString as a member var.


> Index: WebCore/bindings/v8/ScriptStringImpl.h
...
> + * Copyright (C) 2008, 2009 Google Inc. All rights reserved.

2008?


> +class ScriptStringImpl : public RefCounted<ScriptStringImpl> {
...
> +private:
> +  // Forces all appended strings into a flat buffer (stored as m_buffer).
> +  void UpdateBuffer() const;

nit: indent by 4 spaces


> +  mutable Vector<String> m_appended_strings;

nit: m_appendedStrings per webkit style


> Index: WebCore/bindings/v8/ScriptStringImpl.cpp
...
> + * Copyright (C) 2008, 2009 Google Inc. All rights reserved.

2008?


> +#ifndef ScriptString_h
> +#define ScriptString_h
> +
> +#include "PlatformString.h"
> +#include "ScriptStringImpl.h"
> +
> +namespace WebCore {
> +
> +// This implementation is based on StringBuilder.h with some caching added.
> +
> +ScriptStringImpl::ScriptStringImpl()
> +    : m_length(0)
> +    , m_isNull(false) {
> +}
> +ScriptStringImpl::ScriptStringImpl(const String& s)
> +    : m_buffer(s)
> +    , m_length(s.length())
> +    , m_isNull(s.isNull()) {
> +}

nit: webkit style is to put the opening "{" on a new line


> +const UChar* ScriptStringImpl::data() const {
> +  UpdateBuffer();

nit: indent by 4 spaces


> +  if (!m_buffer.characters()) {
> +    // oh snap!
> +    return NULL;
> +  }
> +  return m_buffer.characters();

nit: the above null check and explicit null returns seems unnecessary.

maybe the data method should just be implemented inline as a call to toString().characters().


> +// This function should never change any observable state, hence the const-ness.
> +void ScriptStringImpl::UpdateBuffer() const {
> +    int count = m_appended_strings.size();
> +
> +    // No append() calls have been made since either construction or the last UpdateBuffer() call, so
> +    // m_buffer is already up to date.
> +    if (count == 0) {
> +      return;

nit: indent by 4 spaces
nit: do not use {}'s when the enclosed statement is one line


> Index: WebCore/bindings/v8/V8Binding.cpp

> +        // TODO(jamesr): get this accounting right

nit: use FIXME instead of TODO(jamesr) per webkit style


> +        // TODO(jamesr): get this accounting right

ditto


> +    //String webcoreString() { return String(m_scriptString); }

delete this commented out code?  it is good practice to avoid leaving
commented out code around unless it has a FIXME comment explaining
what needs to be done before it can be uncommented.


> Index: WebCore/bindings/v8/V8Binding.h

> +    // Convert a ScriptString to a V8 string.
> +    inline v8::Handle<v8::String> v8ScriptString(const ScriptString& string)
> +    {
> +      return v8ExternalScriptString(string);
> +    }

nit: 4 space indent


> +    inline v8::Handle<v8::Value> v8StringOrNull(const ScriptString& str) {
> +       return str.isNull() ? v8::Handle<v8::Value>(v8::Null()) : v8::Handle<v8::Value>(v8ScriptString(str));
> +    }

nit: fix indentation
Comment 3 David Levin 2009-09-30 12:55:33 PDT
Consider running "check-webkit-style" on your change as I think it will catch several issues that Darin pointed out (and perhaps one or two more).
Comment 4 James Robinson 2009-10-05 17:43:55 PDT
Created attachment 40673 [details]
Fixes style issues and makes the default c'tor for ScriptString do no allocations

Sorry about all the style issues, I will try to remember check-webkit-style in the future.  This patch passes check-webkit-style and makes the default c'tor for ScriptString perform no allocations.
Comment 5 James Robinson 2009-10-05 18:48:35 PDT
Created attachment 40676 [details]
Fixes some bugs

I found a few bugs in the implementation through unit testing, this addresses them.
Comment 6 Adam Barth 2009-10-05 20:43:42 PDT
Comment on attachment 40676 [details]
Fixes some bugs

I think you're re-inventing SegmentedString.  Have you investigated whether that's sufficient for your needs?

Also, we need a ChangeLog entry.  You can create one with ./WebKitTools/Scripts/prepare-ChangeLog
Comment 7 James Robinson 2009-10-07 17:10:32 PDT
(In reply to comment #6)
> (From update of attachment 40676 [details])
> I think you're re-inventing SegmentedString.  Have you investigated whether
> that's sufficient for your needs?

SegmentedString does not seem to support tracking the isNull attribute or caching the flattened buffer (multiple calls to toString() always build into a new buffer).  

StringBuilder is also close to my needs, I will try implementing ScriptStringImpl on top of that.  It will require some changes to the StringBuilder API (exposing length and isNull).

> 
> Also, we need a ChangeLog entry.  You can create one with
> ./WebKitTools/Scripts/prepare-ChangeLog

Whoops, my bad.  Will correct.
Comment 8 James Robinson 2009-10-07 17:51:44 PDT
Created attachment 40834 [details]
Uses StringBuilder internally, adds ChangeLog entry

This uses platform/text/StringBuilder to handle appends, which requires expanding the StringBuilder interface a little bit.  Using SegmentedString would also require some modifications (to handle isNull) and StringBuilder looked a lot simpler.
Comment 9 Adam Barth 2009-10-07 18:59:18 PDT
Comment on attachment 40834 [details]
Uses StringBuilder internally, adds ChangeLog entry

This looks great.  Two minor syntactic issues:

1) There are a handful of style problems:

+}
+ScriptStringImpl::ScriptStringImpl(const String& s)

Missing a space between these two linds.

+      return v8ExternalScriptString(string);

Need four space indent here.  (I think there was another indent problem too.)

2) ScriptStringImpl has nothing to do with v8 and should be in platform/text somewhere.  Maybe it should be called CachingStringBuilder ?
Comment 10 James Robinson 2009-10-14 17:04:44 PDT
Created attachment 41195 [details]
Implementation using v8::String::Concat() for operator+=

This is a different approach that uses a new V8 interface to build up the string.  It's significantly better than the the 'pure' C++ approach since V8 is smart enough to not flatten the string until it is truly necessary.  On a test page that downloads a 10MB data file via XHR and queries the responseText.length attribute on every readystatechange event, Chromium with this patch uses on the order of 10x less peak memory and builds up the responseText on the order of 20x times faster.  This is an extreme case but I think it's always a real improvement.

Since this patch depends on a V8 interface (v8::String::Concat) that has not yet migrated in Chromium's DEPS file, it will break the Chromium build if checked in right now so I've marked this patch cq-.  I would still appreciate feedback while waiting for the V8 DEPS roll and will update the patch once it is safe to land.  Thanks!
Comment 11 Adam Barth 2009-10-14 17:33:36 PDT
Comment on attachment 41195 [details]
Implementation using v8::String::Concat() for operator+=

That looks great!  One nit: please use OwnHandle to avoid manually allocating and deallocating the handle.

http://trac.webkit.org/browser/trunk/WebCore/bindings/v8/OwnHandle.h
Comment 12 James Robinson 2009-10-14 18:26:49 PDT
Created attachment 41196 [details]
Uses OwnHandle<v8::String> instead of a v8::Persistent<v8::String> for management

I'm going to double-check with V8 experts about the persistent handle use, but this version passes tests and loads up gmail so I believe it's correct.
Comment 13 Adam Barth 2009-10-14 18:31:41 PDT
Comment on attachment 41196 [details]
Uses OwnHandle<v8::String> instead of a v8::Persistent<v8::String> for management

ftw
Comment 14 Adam Barth 2009-10-19 21:31:56 PDT
Jame, this is set to commit-queue-.  Let us know when you'd like it landed.
Comment 15 James Robinson 2009-10-19 21:43:34 PDT
Comment on attachment 41196 [details]
Uses OwnHandle<v8::String> instead of a v8::Persistent<v8::String> for management

Thanks for the ping.  The V8 change is in and I ran this through Chromium's layout trybots, so this should be good to land now!
Comment 16 Adam Barth 2009-10-19 22:01:46 PDT
Comment on attachment 41196 [details]
Uses OwnHandle<v8::String> instead of a v8::Persistent<v8::String> for management

Clearing flags on attachment: 41196

Committed r49840: <http://trac.webkit.org/changeset/49840>
Comment 17 Adam Barth 2009-10-19 22:01:52 PDT
All reviewed patches have been landed.  Closing bug.