Bug 31415

Summary: [v8] do not copy data twice when converting short v8 string into WebCore::String
Product: WebKit Reporter: anton muhin <antonm>
Component: WebCore Misc.Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, dimich, eric, fishd, jens, vitalyr
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
First take
none
Addressing Vitaly's comments
fishd: review-
Addressing Darin's concerns none

Description anton muhin 2009-11-12 09:47:29 PST
Using WebCore::String::String(const UChar*, int length) leads to another copy of data, so we'd better use String::createUninitialized directly
Comment 1 anton muhin 2009-11-12 09:50:50 PST
Created attachment 43070 [details]
First take
Comment 2 Vitaly Repeshko 2009-11-12 10:05:53 PST
(In reply to comment #1)
> Created an attachment (id=43070) [details]
> First take

LGTM.


> +    static String convertFromV8(v8::Handle<v8::String> v8String, int length) {

ASSERT v8String->Length() == length.

> +        UChar* buffer;
> +        String result = String::createUninitialized(length, buffer);
> +        v8String->Write(reinterpret_cast<uint16_t*>(buffer), 0, length);
> +        return result;
> +    }

Perhaps add a comment that using an inline buffer doesn't make sense here to prevent future attempts to "optimize" this.


> +    static AtomicString convertFromV8(v8::Handle<v8::String> v8String, int length) {

ASSERT v8String->Length() == length.

> +        static const int inlineBufferSize = 16;
> +        if (length <= inlineBufferSize) {
> +            UChar inlineBuffer[inlineBufferSize];
> +            v8String->Write(reinterpret_cast<uint16_t*>(inlineBuffer), 0, length);
> +            return AtomicString(inlineBuffer, length);
> +        } else {
> +            UChar* buffer;
> +            String tmp = String::createUninitialized(length, buffer);
> +            v8String->Write(reinterpret_cast<uint16_t*>(buffer), 0, length);
> +            return AtomicString(tmp);
> +        }
> +    }
Comment 3 anton muhin 2009-11-12 11:09:22 PST
Created attachment 43080 [details]
Addressing Vitaly's comments
Comment 4 anton muhin 2009-11-12 11:11:16 PST
(In reply to comment #2)
> (In reply to comment #1)
> > Created an attachment (id=43070) [details] [details]
> > First take
> 
> LGTM.
> 
> 
> > +    static String convertFromV8(v8::Handle<v8::String> v8String, int length) {
> 
> ASSERT v8String->Length() == length.
> 
> > +        UChar* buffer;
> > +        String result = String::createUninitialized(length, buffer);
> > +        v8String->Write(reinterpret_cast<uint16_t*>(buffer), 0, length);
> > +        return result;
> > +    }
> 
> Perhaps add a comment that using an inline buffer doesn't make sense here to
> prevent future attempts to "optimize" this.

Done.
 
> > +    static AtomicString convertFromV8(v8::Handle<v8::String> v8String, int length) {
> 
> ASSERT v8String->Length() == length.

Done.

Thanks a lot for comments, Vitaly.
Comment 5 Dmitry Titov 2009-11-12 14:48:40 PST
Comment on attachment 43080 [details]
Addressing Vitaly's comments

> Index: WebCore/ChangeLog

> +        Do not use WebCore::String::String(const UChar*, int length) to convert
> +        short v8 strings.
> +
> +        Plus added string traits.

It is good to add the reason for the change to the ChangeLog as well as what was done.
Ideally, perf optimizations would come with some measurement to show the actual win.

> +    static String convertFromV8(v8::Handle<v8::String> v8String, int length) {
> +        ASSERT(v8String->Length() == length);
> +        // NOTE: as of now, String(const UChar*, int) performs String::createUninitialized
> +        // anyway, so no need to optimize like we do for AtomicString below.
> +        UChar* buffer;
> +        String result = String::createUninitialized(length, buffer);
> +        v8String->Write(reinterpret_cast<uint16_t*>(buffer), 0, length);
> +        return result;

It is unfortunate that this introduces almost a duplicate of the code... Could you consider having the StringTraits to have some simple bool flag, like "useInlineBuffer" and have a single copy of the code, something like this:
if (StringTraits<StringType>::useInlineBuffer() && length <= inlineBufferSize)
... use inline buffer ...
} else {
... use createUninitialized ...
}
or some other way to keep the code in one place?
Comment 6 anton muhin 2009-11-13 05:08:50 PST
(In reply to comment #5)
> (From update of attachment 43080 [details])
> > Index: WebCore/ChangeLog
> 
> > +        Do not use WebCore::String::String(const UChar*, int length) to convert
> > +        short v8 strings.
> > +
> > +        Plus added string traits.
> 
> It is good to add the reason for the change to the ChangeLog as well as what
> was done.
> Ideally, perf optimizations would come with some measurement to show the actual
> win.

It has no performance impact or completely minimal.  The idea was not to speed up some things, but to make it clear that this optimization is actually wrong for this particular case.

> 
> > +    static String convertFromV8(v8::Handle<v8::String> v8String, int length) {
> > +        ASSERT(v8String->Length() == length);
> > +        // NOTE: as of now, String(const UChar*, int) performs String::createUninitialized
> > +        // anyway, so no need to optimize like we do for AtomicString below.
> > +        UChar* buffer;
> > +        String result = String::createUninitialized(length, buffer);
> > +        v8String->Write(reinterpret_cast<uint16_t*>(buffer), 0, length);
> > +        return result;
> 
> It is unfortunate that this introduces almost a duplicate of the code... Could
> you consider having the StringTraits to have some simple bool flag, like
> "useInlineBuffer" and have a single copy of the code, something like this:
> if (StringTraits<StringType>::useInlineBuffer() && length <= inlineBufferSize)
> ... use inline buffer ...
> } else {
> ... use createUninitialized ...
> }
> or some other way to keep the code in one place?

Sure we could.  The problem: there are only three lines of the code (including UChar* buffer declaration), so I am not sure that in this particular case it's worth the effort.  But if you still think it's a good idea, I'd definitely do that.

And thanks a lot for comments.
Comment 7 anton muhin 2009-11-17 06:00:12 PST
Comment on attachment 43080 [details]
Addressing Vitaly's comments

ping
Comment 8 Dmitry Titov 2009-11-17 13:52:53 PST
> It has no performance impact or completely minimal.  The idea was not to speed
> up some things, but to make it clear that this optimization is actually wrong
> for this particular case.

I'm not sure I understand the idea of the patch then. It replaces 15 lines of code with 43, to document that certain optimization does not always produce a perf win... Could a comment suffice? 

It seems the purpose of v8StringToWebCoreString templated function is to fold implementation of 2 other functions, v8StringToWebCoreString and v8StringToAtomicWebCoreString, into one, to avoid code duplication. It is strange to branch into 2 separate implementations again from v8StringToWebCoreString template. Perhaps 2 separate implementations from the start would be more readable?
Comment 9 anton muhin 2009-11-18 05:32:37 PST
(In reply to comment #8)
> > It has no performance impact or completely minimal.  The idea was not to speed
> > up some things, but to make it clear that this optimization is actually wrong
> > for this particular case.
> 
> I'm not sure I understand the idea of the patch then. It replaces 15 lines of
> code with 43, to document that certain optimization does not always produce a
> perf win... Could a comment suffice? 

It is an optimization (for small strings we do not copy them twice).  But perf effect as measured by current benchmarks is minimal if any.  However, I think it should be easy to cook up a benchmark that shows the benefit---many conversions of short strings.

I am not sure number of lines removed/added is a good estimator.  E.g. I removed two lines and added 10 to use a common idiom of traits when fetching webcore string from the resource (instead of overloading which is less intuitive imho).

> It seems the purpose of v8StringToWebCoreString templated function is to fold
> implementation of 2 other functions, v8StringToWebCoreString and
> v8StringToAtomicWebCoreString, into one, to avoid code duplication. It is
> strange to branch into 2 separate implementations again from
> v8StringToWebCoreString template. Perhaps 2 separate implementations from the
> start would be more readable?

The problem is there is no trivial prologue (checking if the string is already externalized) and epilogue (which externalizes the string if necessary) which I'd like not to duplicate.
Comment 10 Darin Fisher (:fishd, Google) 2009-11-24 14:55:20 PST
Comment on attachment 43080 [details]
Addressing Vitaly's comments

> Index: WebCore/bindings/v8/V8Binding.cpp

> +template <class S> struct StringTraits
> +{
> +    static S getFromResource(WebCoreStringResource* resource);
> +
> +    static S convertFromV8(v8::Handle<v8::String> v8String, int length);

how about naming these fromStringResource and fromV8String?


> +template<>
> +struct StringTraits<String>
> +{
> +    static String getFromResource(WebCoreStringResource* resource) {
> +        return resource->webcoreString();
> +    }

please place the opening bracket on a new line per webkit style.  same goes
for the other functions in this patch.


> +    static AtomicString convertFromV8(v8::Handle<v8::String> v8String, int length) {
...
> +            return AtomicString(inlineBuffer, length);
> +        } else {

nit: no need for else after return.


R- because of style issues.  otherwise, the substance looks good.
Comment 11 anton muhin 2009-11-25 15:19:08 PST
Created attachment 43872 [details]
Addressing Darin's concerns
Comment 12 anton muhin 2009-11-25 15:21:00 PST
(In reply to comment #10)
> (From update of attachment 43080 [details])
> > Index: WebCore/bindings/v8/V8Binding.cpp
> 
> > +template <class S> struct StringTraits
> > +{
> > +    static S getFromResource(WebCoreStringResource* resource);
> > +
> > +    static S convertFromV8(v8::Handle<v8::String> v8String, int length);
> 
> how about naming these fromStringResource and fromV8String?

Done.

> > +template<>
> > +struct StringTraits<String>
> > +{
> > +    static String getFromResource(WebCoreStringResource* resource) {
> > +        return resource->webcoreString();
> > +    }
> 
> please place the opening bracket on a new line per webkit style.  same goes
> for the other functions in this patch.

Hopefully done---I hope we are allowed to keep { on the line of if, at least that was the case before.
> 
> 
> > +    static AtomicString convertFromV8(v8::Handle<v8::String> v8String, int length) {
> ...
> > +            return AtomicString(inlineBuffer, length);
> > +        } else {
> 
> nit: no need for else after return.
> 
> 
> R- because of style issues.  otherwise, the substance looks good.

Many thanks for review.
Comment 13 Dmitry Titov 2009-11-25 18:05:12 PST
Comment on attachment 43872 [details]
Addressing Darin's concerns

Darin Fisher and I discussed it a bit. Although he's noted the style issues, he didn't mean to say the discussion we had before is concluded.

Usually, perf improvement should measurably improve perf, so if there is no test but you say it can easily be created then lets add a test, measure and know what we are gaining. Having a test also prevents possible regressions. We have a v8-specific set of tests in src/data/dom_perf - could you add a specific test here?

Again, if there is no measurable difference, I don't see why make the change. Agree there could be an extra copy of up to 16 bytes to a stack array. But if it's impossible to measure, why split the code at all?

r- because it needs a test.
Comment 14 Darin Fisher (:fishd, Google) 2009-11-26 00:06:57 PST
I agree with Dmitry.  My comment #10 was made without reviewing the commentary in this bug.  I had just jumped to the patch from the request queue :-(
Comment 15 anton muhin 2009-12-01 09:52:12 PST
(In reply to comment #13)
> (From update of attachment 43872 [details])
> Darin Fisher and I discussed it a bit. Although he's noted the style issues, he
> didn't mean to say the discussion we had before is concluded.
> 
> Usually, perf improvement should measurably improve perf, so if there is no
> test but you say it can easily be created then lets add a test, measure and
> know what we are gaining. Having a test also prevents possible regressions. We
> have a v8-specific set of tests in src/data/dom_perf - could you add a specific
> test here?
> 
> Again, if there is no measurable difference, I don't see why make the change.
> Agree there could be an extra copy of up to 16 bytes to a stack array. But if
> it's impossible to measure, why split the code at all?
> 
> r- because it needs a test.

For Dromaeo DOM core (which is one of our primary benchmarks now and the only benchmark we're lagging Safari for now), that gives ~2.5% speedup for overall score (4 runs, dropping the slowest) and, more importantly, ~10% for DOM query (which is, together with DOM Modification) is there we're behind Safari.
Comment 16 anton muhin 2009-12-03 11:12:10 PST
ping
Comment 17 Dmitry Titov 2009-12-04 10:51:10 PST
Comment on attachment 43872 [details]
Addressing Darin's concerns

Good improvement on Dromaeo! 
r=me
Comment 18 anton muhin 2009-12-04 10:55:52 PST
Thanks a lot for review, Dmitry and Darin,

Attempting to commit-queue+ as is, but if it got rotten would upload fresh patch (just rebaselined to the current WebKit).
Comment 19 WebKit Commit Bot 2009-12-04 12:16:13 PST
Comment on attachment 43872 [details]
Addressing Darin's concerns

Clearing flags on attachment: 43872

Committed r51707: <http://trac.webkit.org/changeset/51707>
Comment 20 WebKit Commit Bot 2009-12-04 12:16:21 PST
All reviewed patches have been landed.  Closing bug.