WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 158452
equal(StringView, StringView) for strings should have a fast path for pointer equality
https://bugs.webkit.org/show_bug.cgi?id=158452
Summary
equal(StringView, StringView) for strings should have a fast path for pointer...
Saam Barati
Reported
2016-06-06 17:20:31 PDT
JSBench does a lot of StringView::operator== that would succeed much faster if just did pointer equality on the string buffer.
Attachments
patch
(2.89 KB, patch)
2016-06-06 17:57 PDT
,
Saam Barati
kling
: review+
Details
Formatted Diff
Diff
patch for landing
(2.02 KB, patch)
2016-06-06 19:03 PDT
,
Saam Barati
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Saam Barati
Comment 1
2016-06-06 17:57:48 PDT
Created
attachment 280655
[details]
patch
WebKit Commit Bot
Comment 2
2016-06-06 17:58:51 PDT
Attachment 280655
[details]
did not pass style-queue: ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "a" adds no information, so it should be removed. [readability/parameter_name] [5] ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "b" adds no information, so it should be removed. [readability/parameter_name] [5] ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "a" adds no information, so it should be removed. [readability/parameter_name] [5] ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "b" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 4 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Andreas Kling
Comment 3
2016-06-06 17:59:10 PDT
Comment on
attachment 280655
[details]
patch r=me
Saam Barati
Comment 4
2016-06-06 18:04:23 PDT
(In reply to
comment #2
)
>
Attachment 280655
[details]
did not pass style-queue: > > > ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "a" adds no > information, so it should be removed. [readability/parameter_name] [5] > ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "b" adds no > information, so it should be removed. [readability/parameter_name] [5] > ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "a" adds no > information, so it should be removed. [readability/parameter_name] [5] > ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "b" adds no > information, so it should be removed. [readability/parameter_name] [5] > Total errors found: 4 in 4 files > > > If any of these errors are false positives, please file a bug against > check-webkit-style.
I'll fix before landing
Darin Adler
Comment 5
2016-06-06 18:11:57 PDT
Comment on
attachment 280655
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=280655&action=review
> Source/WTF/ChangeLog:11 > + JSBench does a lot of StringView::operator== on StringViews that have > + the same underlying data pointer. We can make these types of use cases > + much faster by doing an early return in equalCommon when the underlying > + StringType's have equal data pointers.
Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much. For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data. So I would probably put the optimization in StringView, not in equalCommon.
> Source/WTF/wtf/text/StringImpl.h:339 > + const void* data() const { return m_data8; }
If we keep this, this function needs a way longer name. It’s super unsafe to use this for anything except equalCommon. Maybe dataFor8BitEquality?
Andreas Kling
Comment 6
2016-06-06 18:22:45 PDT
(In reply to
comment #5
)
> Comment on
attachment 280655
[details]
> patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=280655&action=review
> > > Source/WTF/ChangeLog:11 > > + JSBench does a lot of StringView::operator== on StringViews that have > > + the same underlying data pointer. We can make these types of use cases > > + much faster by doing an early return in equalCommon when the underlying > > + StringType's have equal data pointers. > > Can you help me understand more how this == on two StringViews with the same > data pointer happens? I didn’t realize we were even using StringView all > that much.
Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source. This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad. Oh, and fair point about giving data() a much scarier name.
Saam Barati
Comment 7
2016-06-06 18:27:21 PDT
(In reply to
comment #5
)
> Comment on
attachment 280655
[details]
> patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=280655&action=review
> > > Source/WTF/ChangeLog:11 > > + JSBench does a lot of StringView::operator== on StringViews that have > > + the same underlying data pointer. We can make these types of use cases > > + much faster by doing an early return in equalCommon when the underlying > > + StringType's have equal data pointers. > > Can you help me understand more how this == on two StringViews with the same > data pointer happens? I didn’t realize we were even using StringView all > that much.
We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode to check for text equality amongst JS programs (and eval programs, etc). We will use this in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code (i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object). JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons on StringViews in which this optimization is profitable. Specifically, it will save us from doing an expensive string compare on long strings that are equal.
> For StringImpl, this is likely not a good optimization, because it’s highly > likely that the StringImpl* itself is == the other, and much less likely > that we have two distinct ones with the same length pointing at the same > data. > > So I would probably put the optimization in StringView, not in equalCommon. >
Right. I was debating whether to have this one level up or inside equalCommon. Maybe it's worth taking some data on how much this optimization will help us with StringImpl? Or should I just skip that and put the inside StringView?
> > Source/WTF/wtf/text/StringImpl.h:339 > > + const void* data() const { return m_data8; } > > If we keep this, this function needs a way longer name. It’s super unsafe to > use this for anything except equalCommon. Maybe dataFor8BitEquality?
Sure. If we keep it, I think we should just call it dataForBufferEquality. The buffer doesn't need to be 8 bit.
Saam Barati
Comment 8
2016-06-06 18:29:29 PDT
Comment on
attachment 280655
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=280655&action=review
>>>> Source/WTF/ChangeLog:11 >>>> + StringType's have equal data pointers. >>> >>> Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much. >>> >>> For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data. >>> >>> So I would probably put the optimization in StringView, not in equalCommon. >> >> Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source. >> >> This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad. >> >> Oh, and fair point about giving data() a much scarier name. > > We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode > to check for text equality amongst JS programs (and eval programs, etc). We will use this > in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code > (i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object). > JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons > on StringViews in which this optimization is profitable. Specifically, it will save us > from doing an expensive string compare on long strings that are equal.
I'll add some more clarification in the ChangeLog about where this is used.
Darin Adler
Comment 9
2016-06-06 18:47:23 PDT
Comment on
attachment 280655
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=280655&action=review
>>>>> Source/WTF/ChangeLog:11 >>>>> + StringType's have equal data pointers. >>>> >>>> Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much. >>>> >>>> For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data. >>>> >>>> So I would probably put the optimization in StringView, not in equalCommon. >>> >>> Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source. >>> >>> This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad. >>> >>> Oh, and fair point about giving data() a much scarier name. >> >> We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode >> to check for text equality amongst JS programs (and eval programs, etc). We will use this >> in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code >> (i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object). >> JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons >> on StringViews in which this optimization is profitable. Specifically, it will save us >> from doing an expensive string compare on long strings that are equal. > > I'll add some more clarification in the ChangeLog about where this is used.
I think you should do it only for StringView, and then you could do it for StringImpl if you get data that it is helpful there (seems highly unlikely unless the StringImpl functions are already missing a fast path for when the implementation pointers are equal). I can see the == function for StringView checking the pointer equality first before checking the length. Could still use equalCommon.
Saam Barati
Comment 10
2016-06-06 19:03:47 PDT
Created
attachment 280660
[details]
patch for landing
WebKit Commit Bot
Comment 11
2016-06-06 20:23:28 PDT
Comment on
attachment 280660
[details]
patch for landing Clearing flags on attachment: 280660 Committed
r201738
: <
http://trac.webkit.org/changeset/201738
>
WebKit Commit Bot
Comment 12
2016-06-06 20:23:36 PDT
All reviewed patches have been landed. Closing bug.
Ryosuke Niwa
Comment 13
2016-06-07 01:53:48 PDT
This was 2-5% progression on JSBench on iOS.
Geoffrey Garen
Comment 14
2016-06-07 09:53:15 PDT
In the case of this speedup, our StringViews point to StringImpls that point to the same data but are not themselves pointer equal. The way this happens is that two documents create equivalent substrings by parsing the same cached resource. I believe this condition -- a substring StringImpl that is not pointer equal to another StringImpl but does point to the same data -- will happen any time we parse a resource from the cache (JS, HTML, or CSS), since we do not go to the trouble of uniqueing substrings. I'm not sure how commonly two such substrings will compare against each other, since you need some boundary-crossing interface in order for them to meet up. The boundary-crossing interface in this example is the JS source cache.
Darin Adler
Comment 15
2016-06-07 13:59:57 PDT
(In reply to
comment #14
)
> In the case of this speedup, our StringViews point to StringImpls that point > to the same data but are not themselves pointer equal.
Are you sure?
> The way this happens is that two documents create equivalent substrings by > parsing the same cached resource.
Using which StringImpl constructor?
Geoffrey Garen
Comment 16
2016-06-07 14:12:43 PDT
(In reply to
comment #15
)
> (In reply to
comment #14
) > > In the case of this speedup, our StringViews point to StringImpls that point > > to the same data but are not themselves pointer equal. > > Are you sure? > > > The way this happens is that two documents create equivalent substrings by > > parsing the same cached resource. > > Using which StringImpl constructor?
Hmmm... Actually, in the case of JSBench, it looks like the strings point to the same characters because they originate as Identifiers in the JavaScript parser. When navigating webpages, a <script> inside an HTML document would call SourceProvider::getRange: virtual StringView source() const = 0; StringView getRange(int start, int end) const { return source().substring(start, end - start); }
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug