RESOLVED FIXED Bug 158452
equal(StringView, StringView) for strings should have a fast path for pointer equality
https://bugs.webkit.org/show_bug.cgi?id=158452
Summary equal(StringView, StringView) for strings should have a fast path for pointer...
Saam Barati
Reported 2016-06-06 17:20:31 PDT
JSBench does a lot of StringView::operator== that would succeed much faster if just did pointer equality on the string buffer.
Attachments
patch (2.89 KB, patch)
2016-06-06 17:57 PDT, Saam Barati
kling: review+
patch for landing (2.02 KB, patch)
2016-06-06 19:03 PDT, Saam Barati
no flags
Saam Barati
Comment 1 2016-06-06 17:57:48 PDT
WebKit Commit Bot
Comment 2 2016-06-06 17:58:51 PDT
Attachment 280655 [details] did not pass style-queue: ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "a" adds no information, so it should be removed. [readability/parameter_name] [5] ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "b" adds no information, so it should be removed. [readability/parameter_name] [5] ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "a" adds no information, so it should be removed. [readability/parameter_name] [5] ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "b" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 4 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Andreas Kling
Comment 3 2016-06-06 17:59:10 PDT
Comment on attachment 280655 [details] patch r=me
Saam Barati
Comment 4 2016-06-06 18:04:23 PDT
(In reply to comment #2) > Attachment 280655 [details] did not pass style-queue: > > > ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "a" adds no > information, so it should be removed. [readability/parameter_name] [5] > ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "b" adds no > information, so it should be removed. [readability/parameter_name] [5] > ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "a" adds no > information, so it should be removed. [readability/parameter_name] [5] > ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "b" adds no > information, so it should be removed. [readability/parameter_name] [5] > Total errors found: 4 in 4 files > > > If any of these errors are false positives, please file a bug against > check-webkit-style. I'll fix before landing
Darin Adler
Comment 5 2016-06-06 18:11:57 PDT
Comment on attachment 280655 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=280655&action=review > Source/WTF/ChangeLog:11 > + JSBench does a lot of StringView::operator== on StringViews that have > + the same underlying data pointer. We can make these types of use cases > + much faster by doing an early return in equalCommon when the underlying > + StringType's have equal data pointers. Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much. For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data. So I would probably put the optimization in StringView, not in equalCommon. > Source/WTF/wtf/text/StringImpl.h:339 > + const void* data() const { return m_data8; } If we keep this, this function needs a way longer name. It’s super unsafe to use this for anything except equalCommon. Maybe dataFor8BitEquality?
Andreas Kling
Comment 6 2016-06-06 18:22:45 PDT
(In reply to comment #5) > Comment on attachment 280655 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=280655&action=review > > > Source/WTF/ChangeLog:11 > > + JSBench does a lot of StringView::operator== on StringViews that have > > + the same underlying data pointer. We can make these types of use cases > > + much faster by doing an early return in equalCommon when the underlying > > + StringType's have equal data pointers. > > Can you help me understand more how this == on two StringViews with the same > data pointer happens? I didn’t realize we were even using StringView all > that much. Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source. This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad. Oh, and fair point about giving data() a much scarier name.
Saam Barati
Comment 7 2016-06-06 18:27:21 PDT
(In reply to comment #5) > Comment on attachment 280655 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=280655&action=review > > > Source/WTF/ChangeLog:11 > > + JSBench does a lot of StringView::operator== on StringViews that have > > + the same underlying data pointer. We can make these types of use cases > > + much faster by doing an early return in equalCommon when the underlying > > + StringType's have equal data pointers. > > Can you help me understand more how this == on two StringViews with the same > data pointer happens? I didn’t realize we were even using StringView all > that much. We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode to check for text equality amongst JS programs (and eval programs, etc). We will use this in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code (i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object). JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons on StringViews in which this optimization is profitable. Specifically, it will save us from doing an expensive string compare on long strings that are equal. > For StringImpl, this is likely not a good optimization, because it’s highly > likely that the StringImpl* itself is == the other, and much less likely > that we have two distinct ones with the same length pointing at the same > data. > > So I would probably put the optimization in StringView, not in equalCommon. > Right. I was debating whether to have this one level up or inside equalCommon. Maybe it's worth taking some data on how much this optimization will help us with StringImpl? Or should I just skip that and put the inside StringView? > > Source/WTF/wtf/text/StringImpl.h:339 > > + const void* data() const { return m_data8; } > > If we keep this, this function needs a way longer name. It’s super unsafe to > use this for anything except equalCommon. Maybe dataFor8BitEquality? Sure. If we keep it, I think we should just call it dataForBufferEquality. The buffer doesn't need to be 8 bit.
Saam Barati
Comment 8 2016-06-06 18:29:29 PDT
Comment on attachment 280655 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=280655&action=review >>>> Source/WTF/ChangeLog:11 >>>> + StringType's have equal data pointers. >>> >>> Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much. >>> >>> For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data. >>> >>> So I would probably put the optimization in StringView, not in equalCommon. >> >> Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source. >> >> This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad. >> >> Oh, and fair point about giving data() a much scarier name. > > We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode > to check for text equality amongst JS programs (and eval programs, etc). We will use this > in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code > (i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object). > JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons > on StringViews in which this optimization is profitable. Specifically, it will save us > from doing an expensive string compare on long strings that are equal. I'll add some more clarification in the ChangeLog about where this is used.
Darin Adler
Comment 9 2016-06-06 18:47:23 PDT
Comment on attachment 280655 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=280655&action=review >>>>> Source/WTF/ChangeLog:11 >>>>> + StringType's have equal data pointers. >>>> >>>> Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much. >>>> >>>> For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data. >>>> >>>> So I would probably put the optimization in StringView, not in equalCommon. >>> >>> Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source. >>> >>> This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad. >>> >>> Oh, and fair point about giving data() a much scarier name. >> >> We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode >> to check for text equality amongst JS programs (and eval programs, etc). We will use this >> in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code >> (i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object). >> JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons >> on StringViews in which this optimization is profitable. Specifically, it will save us >> from doing an expensive string compare on long strings that are equal. > > I'll add some more clarification in the ChangeLog about where this is used. I think you should do it only for StringView, and then you could do it for StringImpl if you get data that it is helpful there (seems highly unlikely unless the StringImpl functions are already missing a fast path for when the implementation pointers are equal). I can see the == function for StringView checking the pointer equality first before checking the length. Could still use equalCommon.
Saam Barati
Comment 10 2016-06-06 19:03:47 PDT
Created attachment 280660 [details] patch for landing
WebKit Commit Bot
Comment 11 2016-06-06 20:23:28 PDT
Comment on attachment 280660 [details] patch for landing Clearing flags on attachment: 280660 Committed r201738: <http://trac.webkit.org/changeset/201738>
WebKit Commit Bot
Comment 12 2016-06-06 20:23:36 PDT
All reviewed patches have been landed. Closing bug.
Ryosuke Niwa
Comment 13 2016-06-07 01:53:48 PDT
This was 2-5% progression on JSBench on iOS.
Geoffrey Garen
Comment 14 2016-06-07 09:53:15 PDT
In the case of this speedup, our StringViews point to StringImpls that point to the same data but are not themselves pointer equal. The way this happens is that two documents create equivalent substrings by parsing the same cached resource. I believe this condition -- a substring StringImpl that is not pointer equal to another StringImpl but does point to the same data -- will happen any time we parse a resource from the cache (JS, HTML, or CSS), since we do not go to the trouble of uniqueing substrings. I'm not sure how commonly two such substrings will compare against each other, since you need some boundary-crossing interface in order for them to meet up. The boundary-crossing interface in this example is the JS source cache.
Darin Adler
Comment 15 2016-06-07 13:59:57 PDT
(In reply to comment #14) > In the case of this speedup, our StringViews point to StringImpls that point > to the same data but are not themselves pointer equal. Are you sure? > The way this happens is that two documents create equivalent substrings by > parsing the same cached resource. Using which StringImpl constructor?
Geoffrey Garen
Comment 16 2016-06-07 14:12:43 PDT
(In reply to comment #15) > (In reply to comment #14) > > In the case of this speedup, our StringViews point to StringImpls that point > > to the same data but are not themselves pointer equal. > > Are you sure? > > > The way this happens is that two documents create equivalent substrings by > > parsing the same cached resource. > > Using which StringImpl constructor? Hmmm... Actually, in the case of JSBench, it looks like the strings point to the same characters because they originate as Identifiers in the JavaScript parser. When navigating webpages, a <script> inside an HTML document would call SourceProvider::getRange: virtual StringView source() const = 0; StringView getRange(int start, int end) const { return source().substring(start, end - start); }
Note You need to log in before you can comment on or make changes to this bug.