Bug 158452 - equal(StringView, StringView) for strings should have a fast path for pointer equality
Summary: equal(StringView, StringView) for strings should have a fast path for pointer...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Saam Barati
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-06 17:20 PDT by Saam Barati
Modified: 2016-06-07 14:12 PDT (History)
18 users (show)

See Also:


Attachments
patch (2.89 KB, patch)
2016-06-06 17:57 PDT, Saam Barati
kling: review+
Details | Formatted Diff | Diff
patch for landing (2.02 KB, patch)
2016-06-06 19:03 PDT, Saam Barati
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Saam Barati 2016-06-06 17:20:31 PDT
JSBench does a lot of StringView::operator== that would succeed much faster if just did pointer equality
on the string buffer.
Comment 1 Saam Barati 2016-06-06 17:57:48 PDT
Created attachment 280655 [details]
patch
Comment 2 WebKit Commit Bot 2016-06-06 17:58:51 PDT
Attachment 280655 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/text/StringView.h:138:  The parameter name "a" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/WTF/wtf/text/StringView.h:138:  The parameter name "b" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/WTF/wtf/text/StringImpl.h:142:  The parameter name "a" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/WTF/wtf/text/StringImpl.h:142:  The parameter name "b" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 4 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Andreas Kling 2016-06-06 17:59:10 PDT
Comment on attachment 280655 [details]
patch

r=me
Comment 4 Saam Barati 2016-06-06 18:04:23 PDT
(In reply to comment #2)
> Attachment 280655 [details] did not pass style-queue:
> 
> 
> ERROR: Source/WTF/wtf/text/StringView.h:138:  The parameter name "a" adds no
> information, so it should be removed.  [readability/parameter_name] [5]
> ERROR: Source/WTF/wtf/text/StringView.h:138:  The parameter name "b" adds no
> information, so it should be removed.  [readability/parameter_name] [5]
> ERROR: Source/WTF/wtf/text/StringImpl.h:142:  The parameter name "a" adds no
> information, so it should be removed.  [readability/parameter_name] [5]
> ERROR: Source/WTF/wtf/text/StringImpl.h:142:  The parameter name "b" adds no
> information, so it should be removed.  [readability/parameter_name] [5]
> Total errors found: 4 in 4 files
> 
> 
> If any of these errors are false positives, please file a bug against
> check-webkit-style.

I'll fix before landing
Comment 5 Darin Adler 2016-06-06 18:11:57 PDT
Comment on attachment 280655 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280655&action=review

> Source/WTF/ChangeLog:11
> +        JSBench does a lot of StringView::operator== on StringViews that have
> +        the same underlying data pointer. We can make these types of use cases
> +        much faster by doing an early return in equalCommon when the underlying
> +        StringType's have equal data pointers.

Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much.

For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data.

So I would probably put the optimization in StringView, not in equalCommon.

> Source/WTF/wtf/text/StringImpl.h:339
> +    const void* data() const { return m_data8; }

If we keep this, this function needs a way longer name. It’s super unsafe to use this for anything except equalCommon. Maybe dataFor8BitEquality?
Comment 6 Andreas Kling 2016-06-06 18:22:45 PDT
(In reply to comment #5)
> Comment on attachment 280655 [details]
> patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=280655&action=review
> 
> > Source/WTF/ChangeLog:11
> > +        JSBench does a lot of StringView::operator== on StringViews that have
> > +        the same underlying data pointer. We can make these types of use cases
> > +        much faster by doing an early return in equalCommon when the underlying
> > +        StringType's have equal data pointers.
> 
> Can you help me understand more how this == on two StringViews with the same
> data pointer happens? I didn’t realize we were even using StringView all
> that much.

Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source.

This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad.

Oh, and fair point about giving data() a much scarier name.
Comment 7 Saam Barati 2016-06-06 18:27:21 PDT
(In reply to comment #5)
> Comment on attachment 280655 [details]
> patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=280655&action=review
> 
> > Source/WTF/ChangeLog:11
> > +        JSBench does a lot of StringView::operator== on StringViews that have
> > +        the same underlying data pointer. We can make these types of use cases
> > +        much faster by doing an early return in equalCommon when the underlying
> > +        StringType's have equal data pointers.
> 
> Can you help me understand more how this == on two StringViews with the same
> data pointer happens? I didn’t realize we were even using StringView all
> that much.
We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode
to check for text equality amongst JS programs (and eval programs, etc). We will use this
in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code
(i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object).
JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons
on StringViews in which this optimization is profitable. Specifically, it will save us
from doing an expensive string compare on long strings that are equal.

> For StringImpl, this is likely not a good optimization, because it’s highly
> likely that the StringImpl* itself is == the other, and much less likely
> that we have two distinct ones with the same length pointing at the same
> data.
> 
> So I would probably put the optimization in StringView, not in equalCommon.
> 
Right. I was debating whether to have this one level up or inside equalCommon.
Maybe it's worth taking some data on how much this optimization will help us with StringImpl?
Or should I just skip that and put the inside StringView?

> > Source/WTF/wtf/text/StringImpl.h:339
> > +    const void* data() const { return m_data8; }
> 
> If we keep this, this function needs a way longer name. It’s super unsafe to
> use this for anything except equalCommon. Maybe dataFor8BitEquality?
Sure. If we keep it, I think we should just call it dataForBufferEquality.
The buffer doesn't need to be 8 bit.
Comment 8 Saam Barati 2016-06-06 18:29:29 PDT
Comment on attachment 280655 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280655&action=review

>>>> Source/WTF/ChangeLog:11
>>>> +        StringType's have equal data pointers.
>>> 
>>> Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much.
>>> 
>>> For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data.
>>> 
>>> So I would probably put the optimization in StringView, not in equalCommon.
>> 
>> Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source.
>> 
>> This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad.
>> 
>> Oh, and fair point about giving data() a much scarier name.
> 
> We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode
> to check for text equality amongst JS programs (and eval programs, etc). We will use this
> in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code
> (i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object).
> JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons
> on StringViews in which this optimization is profitable. Specifically, it will save us
> from doing an expensive string compare on long strings that are equal.

I'll add some more clarification in the ChangeLog about where this is used.
Comment 9 Darin Adler 2016-06-06 18:47:23 PDT
Comment on attachment 280655 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280655&action=review

>>>>> Source/WTF/ChangeLog:11
>>>>> +        StringType's have equal data pointers.
>>>> 
>>>> Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much.
>>>> 
>>>> For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data.
>>>> 
>>>> So I would probably put the optimization in StringView, not in equalCommon.
>>> 
>>> Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source.
>>> 
>>> This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad.
>>> 
>>> Oh, and fair point about giving data() a much scarier name.
>> 
>> We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode
>> to check for text equality amongst JS programs (and eval programs, etc). We will use this
>> in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code
>> (i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object).
>> JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons
>> on StringViews in which this optimization is profitable. Specifically, it will save us
>> from doing an expensive string compare on long strings that are equal.
> 
> I'll add some more clarification in the ChangeLog about where this is used.

I think you should do it only for StringView, and then you could do it for StringImpl if you get data that it is helpful there (seems highly unlikely unless the StringImpl functions are already missing a fast path for when the implementation pointers are equal). I can see the == function for StringView checking the pointer equality first before checking the length. Could still use equalCommon.
Comment 10 Saam Barati 2016-06-06 19:03:47 PDT
Created attachment 280660 [details]
patch for landing
Comment 11 WebKit Commit Bot 2016-06-06 20:23:28 PDT
Comment on attachment 280660 [details]
patch for landing

Clearing flags on attachment: 280660

Committed r201738: <http://trac.webkit.org/changeset/201738>
Comment 12 WebKit Commit Bot 2016-06-06 20:23:36 PDT
All reviewed patches have been landed.  Closing bug.
Comment 13 Ryosuke Niwa 2016-06-07 01:53:48 PDT
This was 2-5% progression on JSBench on iOS.
Comment 14 Geoffrey Garen 2016-06-07 09:53:15 PDT
In the case of this speedup, our StringViews point to StringImpls that point to the same data but are not themselves pointer equal.

The way this happens is that two documents create equivalent substrings by parsing the same cached resource.

I believe this condition -- a substring StringImpl that is not pointer equal to another StringImpl but does point to the same data -- will happen any time we parse a resource from the cache (JS, HTML, or CSS), since we do not go to the trouble of uniqueing substrings. 

I'm not sure how commonly two such substrings will compare against each other, since you need some boundary-crossing interface in order for them to meet up. The boundary-crossing interface in this example is the JS source cache.
Comment 15 Darin Adler 2016-06-07 13:59:57 PDT
(In reply to comment #14)
> In the case of this speedup, our StringViews point to StringImpls that point
> to the same data but are not themselves pointer equal.

Are you sure?

> The way this happens is that two documents create equivalent substrings by
> parsing the same cached resource.

Using which StringImpl constructor?
Comment 16 Geoffrey Garen 2016-06-07 14:12:43 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > In the case of this speedup, our StringViews point to StringImpls that point
> > to the same data but are not themselves pointer equal.
> 
> Are you sure?
> 
> > The way this happens is that two documents create equivalent substrings by
> > parsing the same cached resource.
> 
> Using which StringImpl constructor?

Hmmm... Actually, in the case of JSBench, it looks like the strings point to the same characters because they originate as Identifiers in the JavaScript parser.

When navigating webpages, a <script> inside an HTML document would call SourceProvider::getRange:

        virtual StringView source() const = 0;
        StringView getRange(int start, int end) const
        {
            return source().substring(start, end - start);
        }