Bug 175757 - StringView could use a function to strip leading/trailing characters without allocation
Summary: StringView could use a function to strip leading/trailing characters without ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Sam Weinig
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-08-20 16:43 PDT by Sam Weinig
Modified: 2017-08-24 16:37 PDT (History)
8 users (show)

See Also:


Attachments
Patch (6.22 KB, patch)
2017-08-20 17:02 PDT, Sam Weinig
no flags Details | Formatted Diff | Diff
Patch (6.63 KB, patch)
2017-08-20 19:07 PDT, Sam Weinig
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sam Weinig 2017-08-20 16:43:15 PDT
StringView could use a function to strip leading/trailing characters without allocation
Comment 1 Sam Weinig 2017-08-20 17:02:09 PDT
Created attachment 318607 [details]
Patch
Comment 2 Chris Dumez 2017-08-20 17:30:04 PDT
I think Daniel already has an open bug about this and I vaguely remember Darin not liking the idea. CCing then.
Comment 3 Chris Dumez 2017-08-20 17:32:17 PDT
(In reply to Chris Dumez from comment #2)
> I think Daniel already has an open bug about this and I vaguely remember
> Darin not liking the idea. CCing then.

Maybe https://bugs.webkit.org/show_bug.cgi?id=171731
Comment 4 Sam Weinig 2017-08-20 17:57:36 PDT
I intentionally did not make this a stripWhiteSpace() function as the semantics of that are usually suspect. Rather, this is intended as a building block for the HTML parser idioms.
Comment 5 youenn fablet 2017-08-20 18:31:35 PDT
Comment on attachment 318607 [details]
Patch

I like this.

View in context: https://bugs.webkit.org/attachment.cgi?id=318607&action=review

> Source/WTF/wtf/text/StringView.h:123
> +    StringView stripLeadingAndTrailingMatchedCharacters(MachedCharacterPredicate&&);

s/Mached/Matched/

> Source/WTF/wtf/text/StringView.h:163
> +    template<typename CharacterType, class MachedCharacterPredicate>

Why using both typename and class?

> Tools/TestWebKitAPI/Tests/WTF/StringView.cpp:938
> +    EXPECT_TRUE(stringViewFromLiteral("AAAAAAAAA").stripLeadingAndTrailingMatchedCharacters(isA) == StringView::empty());

For completeness, you could add AAAAAAAAB, BAAAAAAAA and maybe AAABABAAA.
Comment 6 Sam Weinig 2017-08-20 19:07:11 PDT
Created attachment 318609 [details]
Patch
Comment 7 Sam Weinig 2017-08-20 20:16:09 PDT
Thanks for the review! Since Darin had concerns about a related change, I'll wait for him to chime in before landing.
Comment 8 Darin Adler 2017-08-21 08:49:14 PDT
Comment on attachment 318609 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=318609&action=review

I have no objection to adding this.

My concern is about stripping one WTF::String and storing the result in another WTF::String or WTF::AtomicString. When there is nothing to strip, it’s important that we not convert to a StringView and then call toString since it will always copy the string, or call toAtomicString which will have to look up the table in the atomic string table even though it hasn’t been changed.

When stripping a WTF::String or WTF::StringView and then processing the result, it’s great to have a function like this one!

> Source/WTF/ChangeLog:21
> +        For instance, the check (from ScriptElement.cpp:287):
> +
> +        if (!stripLeadingAndTrailingHTMLSpaces(sourceURL).isEmpty()) {
> +            ...
> +        }
> +
> +        currently allocates a string just to make this check. With a new 
> +        stripLeadingAndTrailingHTMLSpaces such as:

I think it would be nice to have the above call site use a “contains” function rather than a “strip” function. I don’t find “strip.isEmpty” to be a clear idiom.

> Source/WTF/wtf/text/StringView.h:123
> +    template<typename MatchedCharacterPredicate>
> +    StringView stripLeadingAndTrailingMatchedCharacters(MatchedCharacterPredicate&&);

I don’t understand why this takes an rvalue reference to the predicate. It seems the function will call the predicate multiple times. That seems like "const&", not "&&", which we normally would use when a function will take ownership of an object or might take ownership of an object.

> Source/WTF/wtf/text/StringView.h:967
> +    if (!start && end == m_length - 1)
> +        return *this;
> +
> +    StringView result(characters + start, end + 1 - start);
> +    result.setUnderlyingString(*this);
> +    return result;

Why not call StringView::substring and keep this function simpler and shorter?
Comment 9 Darin Adler 2017-08-21 08:57:49 PDT
Comment on attachment 318609 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=318609&action=review

>> Source/WTF/ChangeLog:21
>> +        stripLeadingAndTrailingHTMLSpaces such as:
> 
> I think it would be nice to have the above call site use a “contains” function rather than a “strip” function. I don’t find “strip.isEmpty” to be a clear idiom.

I take that back. On reflection, strip.isEmpty is what the call site actually means!
Comment 10 Sam Weinig 2017-08-21 11:49:36 PDT
(In reply to Darin Adler from comment #8)
> Comment on attachment 318609 [details]
> Patch
> 
> > Source/WTF/wtf/text/StringView.h:123
> > +    template<typename MatchedCharacterPredicate>
> > +    StringView stripLeadingAndTrailingMatchedCharacters(MatchedCharacterPredicate&&);
> 
> I don’t understand why this takes an rvalue reference to the predicate. It
> seems the function will call the predicate multiple times. That seems like
> "const&", not "&&", which we normally would use when a function will take
> ownership of an object or might take ownership of an object.

In this case, it's actually not an r-value reference, but rather a universal reference. But, that said, since I never want to take ownership, that is overkill, will switch to const&.

> 
> > Source/WTF/wtf/text/StringView.h:967
> > +    if (!start && end == m_length - 1)
> > +        return *this;
> > +
> > +    StringView result(characters + start, end + 1 - start);
> > +    result.setUnderlyingString(*this);
> > +    return result;
> 
> Why not call StringView::substring and keep this function simpler and
> shorter?

It ends up doing a few unnecessary checks. I have a small refactor in mind to avoid those and simplify some other things.

Thanks for the review.
Comment 11 Sam Weinig 2017-08-21 14:10:26 PDT
Committed r220982: <http://trac.webkit.org/changeset/220982>
Comment 12 Radar WebKit Bug Importer 2017-08-21 14:11:27 PDT
<rdar://problem/33998992>
Comment 13 Darin Adler 2017-08-24 16:37:26 PDT
Comment on attachment 318609 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=318609&action=review

> Source/WTF/wtf/text/StringView.h:975
> +    if (is8Bit())
> +        return stripLeadingAndTrailingMatchedCharacters<LChar>(characters8(), WTFMove(predicate));
> +    return stripLeadingAndTrailingMatchedCharacters<UChar>(characters16(), WTFMove(predicate));

Just occurred to me: Surprised that the <LChar> and <UChar> are needed. I think they are not.