Bug 175757

Summary: StringView could use a function to strip leading/trailing characters without allocation
Product: WebKit Reporter: Sam Weinig <sam>
Component: New BugsAssignee: Sam Weinig <sam>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, buildbot, cdumez, cmarcelo, darin, dbates, webkit-bug-importer, youennf
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=171731
Attachments:
Description Flags
Patch
none
Patch darin: review+

Sam Weinig
Reported 2017-08-20 16:43:15 PDT
StringView could use a function to strip leading/trailing characters without allocation
Attachments
Patch (6.22 KB, patch)
2017-08-20 17:02 PDT, Sam Weinig
no flags
Patch (6.63 KB, patch)
2017-08-20 19:07 PDT, Sam Weinig
darin: review+
Sam Weinig
Comment 1 2017-08-20 17:02:09 PDT
Chris Dumez
Comment 2 2017-08-20 17:30:04 PDT
I think Daniel already has an open bug about this and I vaguely remember Darin not liking the idea. CCing then.
Chris Dumez
Comment 3 2017-08-20 17:32:17 PDT
(In reply to Chris Dumez from comment #2) > I think Daniel already has an open bug about this and I vaguely remember > Darin not liking the idea. CCing then. Maybe https://bugs.webkit.org/show_bug.cgi?id=171731
Sam Weinig
Comment 4 2017-08-20 17:57:36 PDT
I intentionally did not make this a stripWhiteSpace() function as the semantics of that are usually suspect. Rather, this is intended as a building block for the HTML parser idioms.
youenn fablet
Comment 5 2017-08-20 18:31:35 PDT
Comment on attachment 318607 [details] Patch I like this. View in context: https://bugs.webkit.org/attachment.cgi?id=318607&action=review > Source/WTF/wtf/text/StringView.h:123 > + StringView stripLeadingAndTrailingMatchedCharacters(MachedCharacterPredicate&&); s/Mached/Matched/ > Source/WTF/wtf/text/StringView.h:163 > + template<typename CharacterType, class MachedCharacterPredicate> Why using both typename and class? > Tools/TestWebKitAPI/Tests/WTF/StringView.cpp:938 > + EXPECT_TRUE(stringViewFromLiteral("AAAAAAAAA").stripLeadingAndTrailingMatchedCharacters(isA) == StringView::empty()); For completeness, you could add AAAAAAAAB, BAAAAAAAA and maybe AAABABAAA.
Sam Weinig
Comment 6 2017-08-20 19:07:11 PDT
Sam Weinig
Comment 7 2017-08-20 20:16:09 PDT
Thanks for the review! Since Darin had concerns about a related change, I'll wait for him to chime in before landing.
Darin Adler
Comment 8 2017-08-21 08:49:14 PDT
Comment on attachment 318609 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=318609&action=review I have no objection to adding this. My concern is about stripping one WTF::String and storing the result in another WTF::String or WTF::AtomicString. When there is nothing to strip, it’s important that we not convert to a StringView and then call toString since it will always copy the string, or call toAtomicString which will have to look up the table in the atomic string table even though it hasn’t been changed. When stripping a WTF::String or WTF::StringView and then processing the result, it’s great to have a function like this one! > Source/WTF/ChangeLog:21 > + For instance, the check (from ScriptElement.cpp:287): > + > + if (!stripLeadingAndTrailingHTMLSpaces(sourceURL).isEmpty()) { > + ... > + } > + > + currently allocates a string just to make this check. With a new > + stripLeadingAndTrailingHTMLSpaces such as: I think it would be nice to have the above call site use a “contains” function rather than a “strip” function. I don’t find “strip.isEmpty” to be a clear idiom. > Source/WTF/wtf/text/StringView.h:123 > + template<typename MatchedCharacterPredicate> > + StringView stripLeadingAndTrailingMatchedCharacters(MatchedCharacterPredicate&&); I don’t understand why this takes an rvalue reference to the predicate. It seems the function will call the predicate multiple times. That seems like "const&", not "&&", which we normally would use when a function will take ownership of an object or might take ownership of an object. > Source/WTF/wtf/text/StringView.h:967 > + if (!start && end == m_length - 1) > + return *this; > + > + StringView result(characters + start, end + 1 - start); > + result.setUnderlyingString(*this); > + return result; Why not call StringView::substring and keep this function simpler and shorter?
Darin Adler
Comment 9 2017-08-21 08:57:49 PDT
Comment on attachment 318609 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=318609&action=review >> Source/WTF/ChangeLog:21 >> + stripLeadingAndTrailingHTMLSpaces such as: > > I think it would be nice to have the above call site use a “contains” function rather than a “strip” function. I don’t find “strip.isEmpty” to be a clear idiom. I take that back. On reflection, strip.isEmpty is what the call site actually means!
Sam Weinig
Comment 10 2017-08-21 11:49:36 PDT
(In reply to Darin Adler from comment #8) > Comment on attachment 318609 [details] > Patch > > > Source/WTF/wtf/text/StringView.h:123 > > + template<typename MatchedCharacterPredicate> > > + StringView stripLeadingAndTrailingMatchedCharacters(MatchedCharacterPredicate&&); > > I don’t understand why this takes an rvalue reference to the predicate. It > seems the function will call the predicate multiple times. That seems like > "const&", not "&&", which we normally would use when a function will take > ownership of an object or might take ownership of an object. In this case, it's actually not an r-value reference, but rather a universal reference. But, that said, since I never want to take ownership, that is overkill, will switch to const&. > > > Source/WTF/wtf/text/StringView.h:967 > > + if (!start && end == m_length - 1) > > + return *this; > > + > > + StringView result(characters + start, end + 1 - start); > > + result.setUnderlyingString(*this); > > + return result; > > Why not call StringView::substring and keep this function simpler and > shorter? It ends up doing a few unnecessary checks. I have a small refactor in mind to avoid those and simplify some other things. Thanks for the review.
Sam Weinig
Comment 11 2017-08-21 14:10:26 PDT
Radar WebKit Bug Importer
Comment 12 2017-08-21 14:11:27 PDT
Darin Adler
Comment 13 2017-08-24 16:37:26 PDT
Comment on attachment 318609 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=318609&action=review > Source/WTF/wtf/text/StringView.h:975 > + if (is8Bit()) > + return stripLeadingAndTrailingMatchedCharacters<LChar>(characters8(), WTFMove(predicate)); > + return stripLeadingAndTrailingMatchedCharacters<UChar>(characters16(), WTFMove(predicate)); Just occurred to me: Surprised that the <LChar> and <UChar> are needed. I think they are not.
Note You need to log in before you can comment on or make changes to this bug.