JSBench does a lot of StringView::operator== that would succeed much faster if just did pointer equality on the string buffer.
Created attachment 280655 [details] patch
Attachment 280655 [details] did not pass style-queue: ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "a" adds no information, so it should be removed. [readability/parameter_name] [5] ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "b" adds no information, so it should be removed. [readability/parameter_name] [5] ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "a" adds no information, so it should be removed. [readability/parameter_name] [5] ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "b" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 4 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 280655 [details] patch r=me
(In reply to comment #2) > Attachment 280655 [details] did not pass style-queue: > > > ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "a" adds no > information, so it should be removed. [readability/parameter_name] [5] > ERROR: Source/WTF/wtf/text/StringView.h:138: The parameter name "b" adds no > information, so it should be removed. [readability/parameter_name] [5] > ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "a" adds no > information, so it should be removed. [readability/parameter_name] [5] > ERROR: Source/WTF/wtf/text/StringImpl.h:142: The parameter name "b" adds no > information, so it should be removed. [readability/parameter_name] [5] > Total errors found: 4 in 4 files > > > If any of these errors are false positives, please file a bug against > check-webkit-style. I'll fix before landing
Comment on attachment 280655 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=280655&action=review > Source/WTF/ChangeLog:11 > + JSBench does a lot of StringView::operator== on StringViews that have > + the same underlying data pointer. We can make these types of use cases > + much faster by doing an early return in equalCommon when the underlying > + StringType's have equal data pointers. Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much. For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data. So I would probably put the optimization in StringView, not in equalCommon. > Source/WTF/wtf/text/StringImpl.h:339 > + const void* data() const { return m_data8; } If we keep this, this function needs a way longer name. It’s super unsafe to use this for anything except equalCommon. Maybe dataFor8BitEquality?
(In reply to comment #5) > Comment on attachment 280655 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=280655&action=review > > > Source/WTF/ChangeLog:11 > > + JSBench does a lot of StringView::operator== on StringViews that have > > + the same underlying data pointer. We can make these types of use cases > > + much faster by doing an early return in equalCommon when the underlying > > + StringType's have equal data pointers. > > Can you help me understand more how this == on two StringViews with the same > data pointer happens? I didn’t realize we were even using StringView all > that much. Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source. This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad. Oh, and fair point about giving data() a much scarier name.
(In reply to comment #5) > Comment on attachment 280655 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=280655&action=review > > > Source/WTF/ChangeLog:11 > > + JSBench does a lot of StringView::operator== on StringViews that have > > + the same underlying data pointer. We can make these types of use cases > > + much faster by doing an early return in equalCommon when the underlying > > + StringType's have equal data pointers. > > Can you help me understand more how this == on two StringViews with the same > data pointer happens? I didn’t realize we were even using StringView all > that much. We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode to check for text equality amongst JS programs (and eval programs, etc). We will use this in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code (i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object). JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons on StringViews in which this optimization is profitable. Specifically, it will save us from doing an expensive string compare on long strings that are equal. > For StringImpl, this is likely not a good optimization, because it’s highly > likely that the StringImpl* itself is == the other, and much less likely > that we have two distinct ones with the same length pointing at the same > data. > > So I would probably put the optimization in StringView, not in equalCommon. > Right. I was debating whether to have this one level up or inside equalCommon. Maybe it's worth taking some data on how much this optimization will help us with StringImpl? Or should I just skip that and put the inside StringView? > > Source/WTF/wtf/text/StringImpl.h:339 > > + const void* data() const { return m_data8; } > > If we keep this, this function needs a way longer name. It’s super unsafe to > use this for anything except equalCommon. Maybe dataFor8BitEquality? Sure. If we keep it, I think we should just call it dataForBufferEquality. The buffer doesn't need to be 8 bit.
Comment on attachment 280655 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=280655&action=review >>>> Source/WTF/ChangeLog:11 >>>> + StringType's have equal data pointers. >>> >>> Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much. >>> >>> For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data. >>> >>> So I would probably put the optimization in StringView, not in equalCommon. >> >> Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source. >> >> This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad. >> >> Oh, and fair point about giving data() a much scarier name. > > We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode > to check for text equality amongst JS programs (and eval programs, etc). We will use this > in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code > (i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object). > JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons > on StringViews in which this optimization is profitable. Specifically, it will save us > from doing an expensive string compare on long strings that are equal. I'll add some more clarification in the ChangeLog about where this is used.
Comment on attachment 280655 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=280655&action=review >>>>> Source/WTF/ChangeLog:11 >>>>> + StringType's have equal data pointers. >>>> >>>> Can you help me understand more how this == on two StringViews with the same data pointer happens? I didn’t realize we were even using StringView all that much. >>>> >>>> For StringImpl, this is likely not a good optimization, because it’s highly likely that the StringImpl* itself is == the other, and much less likely that we have two distinct ones with the same length pointing at the same data. >>>> >>>> So I would probably put the optimization in StringView, not in equalCommon. >>> >>> Some of the JS source providers use a WTF::String internally and then just create a StringView wrapper when asked for their source. >>> >>> This change was really targeting the SourceCodeKey hash equality function. I was following the discussion about this on IRC and didn't realize it wasn't even mentioned here. My bad. >>> >>> Oh, and fair point about giving data() a much scarier name. >> >> We use it for SourceCodeKey inside JSC, which will use a StringView from SourceCode >> to check for text equality amongst JS programs (and eval programs, etc). We will use this >> in our unlinked code cache. Unlinked code cache can be used to cache unlinked byte code >> (i.e, bytecode that isn't tied to a scope, and in the case of a program, a global object). >> JSBench will stress this unlinked code cache heavily. In JSBench, we will often do comparisons >> on StringViews in which this optimization is profitable. Specifically, it will save us >> from doing an expensive string compare on long strings that are equal. > > I'll add some more clarification in the ChangeLog about where this is used. I think you should do it only for StringView, and then you could do it for StringImpl if you get data that it is helpful there (seems highly unlikely unless the StringImpl functions are already missing a fast path for when the implementation pointers are equal). I can see the == function for StringView checking the pointer equality first before checking the length. Could still use equalCommon.
Created attachment 280660 [details] patch for landing
Comment on attachment 280660 [details] patch for landing Clearing flags on attachment: 280660 Committed r201738: <http://trac.webkit.org/changeset/201738>
All reviewed patches have been landed. Closing bug.
This was 2-5% progression on JSBench on iOS.
In the case of this speedup, our StringViews point to StringImpls that point to the same data but are not themselves pointer equal. The way this happens is that two documents create equivalent substrings by parsing the same cached resource. I believe this condition -- a substring StringImpl that is not pointer equal to another StringImpl but does point to the same data -- will happen any time we parse a resource from the cache (JS, HTML, or CSS), since we do not go to the trouble of uniqueing substrings. I'm not sure how commonly two such substrings will compare against each other, since you need some boundary-crossing interface in order for them to meet up. The boundary-crossing interface in this example is the JS source cache.
(In reply to comment #14) > In the case of this speedup, our StringViews point to StringImpls that point > to the same data but are not themselves pointer equal. Are you sure? > The way this happens is that two documents create equivalent substrings by > parsing the same cached resource. Using which StringImpl constructor?
(In reply to comment #15) > (In reply to comment #14) > > In the case of this speedup, our StringViews point to StringImpls that point > > to the same data but are not themselves pointer equal. > > Are you sure? > > > The way this happens is that two documents create equivalent substrings by > > parsing the same cached resource. > > Using which StringImpl constructor? Hmmm... Actually, in the case of JSBench, it looks like the strings point to the same characters because they originate as Identifiers in the JavaScript parser. When navigating webpages, a <script> inside an HTML document would call SourceProvider::getRange: virtual StringView source() const = 0; StringView getRange(int start, int end) const { return source().substring(start, end - start); }