Bug 226803

Summary: Avoid some calls to StringView::toString() / StringView::toStringWithoutCopying()
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: WebCore Misc.Assignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: aboxhall, achristensen, apinheiro, benjamin, cfleizach, changseok, cmarcelo, darin, dino, dmazzoni, esprehn+autocc, ews-watchlist, fmalita, glenn, gyuyoung.kim, jcraig, jdiggs, kangil.han, kondapallykalyan, macpherson, menard, mifenton, pdr, sabouhallawa, samuel_white, sam, schenney, sergio, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch none

Description Chris Dumez 2021-06-08 20:00:03 PDT
Avoid some calls to StringView::toString() / StringView::toStringWithoutCopying().
Comment 1 Chris Dumez 2021-06-08 20:02:48 PDT
Created attachment 430936 [details]
Patch
Comment 2 Darin Adler 2021-06-09 08:45:15 PDT
Comment on attachment 430936 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=430936&action=review

> Source/WTF/wtf/URL.cpp:559
> +    if constexpr (std::is_same_v<StringType, StringView>)
> +        return input.toString();
> +    else
> +        return input;

Another way to write this is:

    return makeString(input);

We would have to make sure it’s optimized well enough for the single argument StringView and String cases, but I think it’s better than writing the if statement.

> Source/WTF/wtf/text/TextStream.h:71
>      WTF_EXPORT_PRIVATE TextStream& operator<<(const char*);

If we remove the const char* and String overloads rather than adding an AtomString one, will the StringView overload just be used instead? If so, would be nice to remove them.

Also would be good at some point to find a way to support everything that StringConcatenate does without having to write separate functions for each.

> Source/WebCore/css/parser/CSSPropertyParser.cpp:539
> +    auto stringView = range.consumeIncludingWhitespace().value();

Do we really need to change the variable name?

> Source/WebCore/css/parser/CSSPropertyParser.cpp:3337
> +    for (unsigned i = 0; i < gridRowNames.length(); ++i) {

This should use a loop more like this:

    for (auto character : gridRowNames.codeUnits()) {

> Source/WebCore/css/parser/CSSPropertyParserHelpers.cpp:3566
> +    return isPredefinedCounterStyle(nameToken.id()) ? AtomString { name.convertToASCIILowercase() } : name.toAtomString();

If this is hot enough, should consider a StringView::convertToASCIILowercaseAtom that optimizes the case for short strings at least, and goes directly to an AtomString rather than allocating and destroying a StringImpl if it happens to match an existing AtomString.
Comment 3 Chris Dumez 2021-06-09 08:47:47 PDT
(In reply to Darin Adler from comment #2)
> Comment on attachment 430936 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=430936&action=review
> 
> > Source/WTF/wtf/URL.cpp:559
> > +    if constexpr (std::is_same_v<StringType, StringView>)
> > +        return input.toString();
> > +    else
> > +        return input;
> 
> Another way to write this is:
> 
>     return makeString(input);
> 
> We would have to make sure it’s optimized well enough for the single
> argument StringView and String cases, but I think it’s better than writing
> the if statement.

Will check.

> 
> > Source/WTF/wtf/text/TextStream.h:71
> >      WTF_EXPORT_PRIVATE TextStream& operator<<(const char*);
> 
> If we remove the const char* and String overloads rather than adding an
> AtomString one, will the StringView overload just be used instead? If so,
> would be nice to remove them.

I think I tried that. If I remember correctly, the compiler complained that it was ambiguous for String because it could use the overload taking a StringView or the one taking a bool :/

> Also would be good at some point to find a way to support everything that
> StringConcatenate does without having to write separate functions for each.
> 
> > Source/WebCore/css/parser/CSSPropertyParser.cpp:539
> > +    auto stringView = range.consumeIncludingWhitespace().value();
> 
> Do we really need to change the variable name?

Probably not.

> 
> > Source/WebCore/css/parser/CSSPropertyParser.cpp:3337
> > +    for (unsigned i = 0; i < gridRowNames.length(); ++i) {
> 
> This should use a loop more like this:
> 
>     for (auto character : gridRowNames.codeUnits()) {

OK.
> 
> > Source/WebCore/css/parser/CSSPropertyParserHelpers.cpp:3566
> > +    return isPredefinedCounterStyle(nameToken.id()) ? AtomString { name.convertToASCIILowercase() } : name.toAtomString();
> 
> If this is hot enough, should consider a
> StringView::convertToASCIILowercaseAtom that optimizes the case for short
> strings at least, and goes directly to an AtomString rather than allocating
> and destroying a StringImpl if it happens to match an existing AtomString.

I'll check.
Comment 4 Chris Dumez 2021-06-09 08:57:27 PDT
(In reply to Darin Adler from comment #2)
> Comment on attachment 430936 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=430936&action=review
> 
> > Source/WTF/wtf/URL.cpp:559
> > +    if constexpr (std::is_same_v<StringType, StringView>)
> > +        return input.toString();
> > +    else
> > +        return input;
> 
> Another way to write this is:
> 
>     return makeString(input);
> 
> We would have to make sure it’s optimized well enough for the single
> argument StringView and String cases, but I think it’s better than writing
> the if statement.

It doesn't seem like makeString() would currently work with a StringView parameter as I don't see a StringTypeAdapter for StringView. For String, the code is a little hard to follow but it really doesn't look like it is as efficient as ref'ing the StringImpl. It looks to me that the implementation creates a new StringImpl and then asks the StringTypeAdapter to copy the characters into it.

> 
> > Source/WTF/wtf/text/TextStream.h:71
> >      WTF_EXPORT_PRIVATE TextStream& operator<<(const char*);
> 
> If we remove the const char* and String overloads rather than adding an
> AtomString one, will the StringView overload just be used instead? If so,
> would be nice to remove them.
> 
> Also would be good at some point to find a way to support everything that
> StringConcatenate does without having to write separate functions for each.
> 
> > Source/WebCore/css/parser/CSSPropertyParser.cpp:539
> > +    auto stringView = range.consumeIncludingWhitespace().value();
> 
> Do we really need to change the variable name?
> 
> > Source/WebCore/css/parser/CSSPropertyParser.cpp:3337
> > +    for (unsigned i = 0; i < gridRowNames.length(); ++i) {
> 
> This should use a loop more like this:
> 
>     for (auto character : gridRowNames.codeUnits()) {
> 
> > Source/WebCore/css/parser/CSSPropertyParserHelpers.cpp:3566
> > +    return isPredefinedCounterStyle(nameToken.id()) ? AtomString { name.convertToASCIILowercase() } : name.toAtomString();
> 
> If this is hot enough, should consider a
> StringView::convertToASCIILowercaseAtom that optimizes the case for short
> strings at least, and goes directly to an AtomString rather than allocating
> and destroying a StringImpl if it happens to match an existing AtomString.

(In reply to Chris Dumez from comment #3)
> (In reply to Darin Adler from comment #2)
> > Comment on attachment 430936 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=430936&action=review
> > 
> > > Source/WTF/wtf/URL.cpp:559
> > > +    if constexpr (std::is_same_v<StringType, StringView>)
> > > +        return input.toString();
> > > +    else
> > > +        return input;
> > 
> > Another way to write this is:
> > 
> >     return makeString(input);
> > 
> > We would have to make sure it’s optimized well enough for the single
> > argument StringView and String cases, but I think it’s better than writing
> > the if statement.
> 
> Will check.
> 
> > 
> > > Source/WTF/wtf/text/TextStream.h:71
> > >      WTF_EXPORT_PRIVATE TextStream& operator<<(const char*);
> > 
> > If we remove the const char* and String overloads rather than adding an
> > AtomString one, will the StringView overload just be used instead? If so,
> > would be nice to remove them.
> 
> I think I tried that. If I remember correctly, the compiler complained that
> it was ambiguous for String because it could use the overload taking a
> StringView or the one taking a bool :/
> 
> > Also would be good at some point to find a way to support everything that
> > StringConcatenate does without having to write separate functions for each.
> > 
> > > Source/WebCore/css/parser/CSSPropertyParser.cpp:539
> > > +    auto stringView = range.consumeIncludingWhitespace().value();
> > 
> > Do we really need to change the variable name?
> 
> Probably not.
> 
> > 
> > > Source/WebCore/css/parser/CSSPropertyParser.cpp:3337
> > > +    for (unsigned i = 0; i < gridRowNames.length(); ++i) {
> > 
> > This should use a loop more like this:
> > 
> >     for (auto character : gridRowNames.codeUnits()) {
> 
> OK.
> > 
> > > Source/WebCore/css/parser/CSSPropertyParserHelpers.cpp:3566
> > > +    return isPredefinedCounterStyle(nameToken.id()) ? AtomString { name.convertToASCIILowercase() } : name.toAtomString();
> > 
> > If this is hot enough, should consider a
> > StringView::convertToASCIILowercaseAtom that optimizes the case for short
> > strings at least, and goes directly to an AtomString rather than allocating
> > and destroying a StringImpl if it happens to match an existing AtomString.
> 
> I'll check.
Comment 5 Chris Dumez 2021-06-09 09:00:59 PDT
(In reply to Darin Adler from comment #2)
> Comment on attachment 430936 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=430936&action=review
> 
> > Source/WTF/wtf/URL.cpp:559
> > +    if constexpr (std::is_same_v<StringType, StringView>)
> > +        return input.toString();
> > +    else
> > +        return input;
> 
> Another way to write this is:
> 
>     return makeString(input);
> 
> We would have to make sure it’s optimized well enough for the single
> argument StringView and String cases, but I think it’s better than writing
> the if statement.
> 
> > Source/WTF/wtf/text/TextStream.h:71
> >      WTF_EXPORT_PRIVATE TextStream& operator<<(const char*);
> 
> If we remove the const char* and String overloads rather than adding an
> AtomString one, will the StringView overload just be used instead? If so,
> would be nice to remove them.
> 
> Also would be good at some point to find a way to support everything that
> StringConcatenate does without having to write separate functions for each.
> 
> > Source/WebCore/css/parser/CSSPropertyParser.cpp:539
> > +    auto stringView = range.consumeIncludingWhitespace().value();
> 
> Do we really need to change the variable name?
> 
> > Source/WebCore/css/parser/CSSPropertyParser.cpp:3337
> > +    for (unsigned i = 0; i < gridRowNames.length(); ++i) {
> 
> This should use a loop more like this:
> 
>     for (auto character : gridRowNames.codeUnits()) {
> 
> > Source/WebCore/css/parser/CSSPropertyParserHelpers.cpp:3566
> > +    return isPredefinedCounterStyle(nameToken.id()) ? AtomString { name.convertToASCIILowercase() } : name.toAtomString();
> 
> If this is hot enough, should consider a
> StringView::convertToASCIILowercaseAtom that optimizes the case for short
> strings at least, and goes directly to an AtomString rather than allocating
> and destroying a StringImpl if it happens to match an existing AtomString.

Do you mean that for strings that are short enough, we'd iterate through it and optimize the case where the input StringView is already all lowercase?
Comment 6 Chris Dumez 2021-06-09 09:04:17 PDT
(In reply to Chris Dumez from comment #5)
> (In reply to Darin Adler from comment #2)
> > Comment on attachment 430936 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=430936&action=review
> > 
> > > Source/WTF/wtf/URL.cpp:559
> > > +    if constexpr (std::is_same_v<StringType, StringView>)
> > > +        return input.toString();
> > > +    else
> > > +        return input;
> > 
> > Another way to write this is:
> > 
> >     return makeString(input);
> > 
> > We would have to make sure it’s optimized well enough for the single
> > argument StringView and String cases, but I think it’s better than writing
> > the if statement.
> > 
> > > Source/WTF/wtf/text/TextStream.h:71
> > >      WTF_EXPORT_PRIVATE TextStream& operator<<(const char*);
> > 
> > If we remove the const char* and String overloads rather than adding an
> > AtomString one, will the StringView overload just be used instead? If so,
> > would be nice to remove them.
> > 
> > Also would be good at some point to find a way to support everything that
> > StringConcatenate does without having to write separate functions for each.
> > 
> > > Source/WebCore/css/parser/CSSPropertyParser.cpp:539
> > > +    auto stringView = range.consumeIncludingWhitespace().value();
> > 
> > Do we really need to change the variable name?
> > 
> > > Source/WebCore/css/parser/CSSPropertyParser.cpp:3337
> > > +    for (unsigned i = 0; i < gridRowNames.length(); ++i) {
> > 
> > This should use a loop more like this:
> > 
> >     for (auto character : gridRowNames.codeUnits()) {
> > 
> > > Source/WebCore/css/parser/CSSPropertyParserHelpers.cpp:3566
> > > +    return isPredefinedCounterStyle(nameToken.id()) ? AtomString { name.convertToASCIILowercase() } : name.toAtomString();
> > 
> > If this is hot enough, should consider a
> > StringView::convertToASCIILowercaseAtom that optimizes the case for short
> > strings at least, and goes directly to an AtomString rather than allocating
> > and destroying a StringImpl if it happens to match an existing AtomString.
> 
> Do you mean that for strings that are short enough, we'd iterate through it
> and optimize the case where the input StringView is already all lowercase?

Looks like the version in StringImpl has a similar optimization (without length restriction):
```
template<StringImpl::CaseConvertType type, typename CharacterType>
ALWAYS_INLINE Ref<StringImpl> StringImpl::convertASCIICase(StringImpl& impl, const CharacterType* data, unsigned length)
{
    unsigned failingIndex;
    for (unsigned i = 0; i < length; ++i) {
        CharacterType character = data[i];
        if (type == CaseConvertType::Lower ? UNLIKELY(isASCIIUpper(character)) : LIKELY(isASCIILower(character))) {
            failingIndex = i;
            goto SlowPath;
        }
    }
    return impl;

SlowPath:
    CharacterType* newData;
    auto newImpl = createUninitializedInternalNonEmpty(length, newData);
    copyCharacters(newData, data, failingIndex);
    for (unsigned i = failingIndex; i < length; ++i)
        newData[i] = type == CaseConvertType::Lower ? toASCIILower(data[i]) : toASCIIUpper(data[i]);
    return newImpl;
}
```
Comment 7 Chris Dumez 2021-06-09 09:37:08 PDT
Created attachment 430971 [details]
Patch
Comment 8 Darin Adler 2021-06-09 11:12:10 PDT
(In reply to Chris Dumez from comment #4)
> It doesn't seem like makeString() would currently work with a StringView
> parameter as I don't see a StringTypeAdapter for StringView.

It definitely works. StringTypeAdapter<StringView> is in StringView.h, not StringConcatenate.h.

> For String, the
> code is a little hard to follow but it really doesn't look like it is as
> efficient as ref'ing the StringImpl. It looks to me that the implementation
> creates a new StringImpl and then asks the StringTypeAdapter to copy the
> characters into it.

Yes, I was talking about optimizing that case, not expecting that it was already optimized. We could and should optimize makeString when called with a single String or single StringView, to make it more useful for generic programming.
Comment 9 Darin Adler 2021-06-09 11:15:14 PDT
(In reply to Chris Dumez from comment #5)
> Do you mean that for strings that are short enough, we'd iterate through it
> and optimize the case where the input StringView is already all lowercase?

The point here is to avoid allocating a new StringImpl when there is already an AtomStringImpl with the correct characters. Presumably the reason we are using AtomString is that such duplication happens and we want to save memory when it does, but it’s also nice to save the extra work from allocating/deallocating in that case.

Yes, when the characters already don’t contain any uppercase ASCII, we can just look up the AtomString without even copying characters. That’s what you are talking about above.

But there is another case we could optimize: when the string is short enough, we could copy the lowercased characters into a fixed size buffer without allocating anything on the heap, and then look up the AtomString without allocating/destroying memory on the heap if the string happens to already exist.
Comment 10 EWS 2021-06-09 13:46:31 PDT
Committed r278669 (238651@main): <https://commits.webkit.org/238651@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 430971 [details].
Comment 11 Radar WebKit Bug Importer 2021-06-09 13:47:20 PDT
<rdar://problem/79096267>