WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
239356
Leverage StringView in more places to avoid some String allocations
https://bugs.webkit.org/show_bug.cgi?id=239356
Summary
Leverage StringView in more places to avoid some String allocations
Chris Dumez
Reported
2022-04-14 15:41:18 PDT
Leverage StringView in more places to avoid some String allocations.
Attachments
Patch
(70.49 KB, patch)
2022-04-14 15:48 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(71.43 KB, patch)
2022-04-15 16:14 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Chris Dumez
Comment 1
2022-04-14 15:48:18 PDT
Created
attachment 457653
[details]
Patch
Darin Adler
Comment 2
2022-04-15 15:35:52 PDT
Comment on
attachment 457653
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457653&action=review
> Source/JavaScriptCore/inspector/ContentSearchUtilities.cpp:94 > + result.append(std::pair { lineNumber, line.toString() });
I guess we do need the explicit std::pair; I would omit it if we could.
> Source/WebCore/Modules/fetch/FetchBodyConsumer.cpp:167 > + contentType = StringView(header).substring(contentTypeBegin + contentTypePrefixLength, contentTypeEnd - contentTypeBegin - contentTypePrefixLength).stripLeadingAndTrailingMatchedCharacters(isHTTPSpace).toString();
Is there a very-common case where the header string happens to have nothing that need to be trimmed, no "\r\n", no spaces? I ask because that is the case will become *less* efficient, now copying the String. I wish switching to StringView didn’t make the stripLeadingAndTrailingHTTPSpaces() function call *so* much longer stripLeadingAndTrailingMatchedCharacters(isHTTPSpace). On reflection, seems unnecessarily to have "matched characters" in that function name.
> Source/WebCore/Modules/mediastream/libwebrtc/LibWebRTCRtpTransceiverBackend.cpp:94 > + rtcCodec.name = StringView(codec.mimeType).substring(6).utf8().data();
Since we are setting a std::string here it would be better if CString knew how to convert to std::string without writing .data(), which makes the code look unnecessarily dangerous, although it’s actually correct and fine. Kind of annoying that we need both CString and std::string in WebKit. Longer term, I wonder how many unique great things CString has; maybe we can drop it and switch to std::string. It uses our faster malloc, but otherwise seems to have no significantly different features from std::string.
> Source/WebCore/html/HTMLMapElement.cpp:103 > m_name = mapName;
WTFMove(mapName)
> Source/WebCore/page/Location.cpp:202 > + if (!hash.isEmpty() && hash[0] == '#')
We have startsWith(UChar) function for cases like this.
> Source/WebCore/page/UserContentURLPattern.h:37 > + UserContentURLPattern(StringView pattern)
Should we add explicit?
> Source/WebCore/page/UserContentURLPattern.h:38 > : m_matchSubdomains(false)
Normally we indent.
> Source/WebCore/platform/network/HTTPParsers.cpp:681 > // FIXME: The rest of this should use StringView.
Does this comment still make sense?
Chris Dumez
Comment 3
2022-04-15 15:48:56 PDT
Comment on
attachment 457653
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457653&action=review
>> Source/JavaScriptCore/inspector/ContentSearchUtilities.cpp:94 >> + result.append(std::pair { lineNumber, line.toString() }); > > I guess we do need the explicit std::pair; I would omit it if we could.
You're probably right that we can. I'll try.
>> Source/WebCore/Modules/fetch/FetchBodyConsumer.cpp:167 >> + contentType = StringView(header).substring(contentTypeBegin + contentTypePrefixLength, contentTypeEnd - contentTypeBegin - contentTypePrefixLength).stripLeadingAndTrailingMatchedCharacters(isHTTPSpace).toString(); > > Is there a very-common case where the header string happens to have nothing that need to be trimmed, no "\r\n", no spaces? I ask because that is the case will become *less* efficient, now copying the String. > > I wish switching to StringView didn’t make the stripLeadingAndTrailingHTTPSpaces() function call *so* much longer stripLeadingAndTrailingMatchedCharacters(isHTTPSpace). On reflection, seems unnecessarily to have "matched characters" in that function name.
I was careful about that and I don't think that's true? Previous code had a |header| String, then was allocating a new temporary String (let's call it |tmp|), then passing |tmp| to stripLeadingAndTrailingHTTPSpaces(), which in the common case would return |tmp| (because this case is optimized). However, there was still one String allocation for |tmp|, even in this optimized "common" case. With my code, we convert |header| to a StringView, do all operations on the StringView and then construct a String at the end. Still only one String allocation, and one String allocation is all cases, not just the "common" case. The one case where the previous code could be more efficient is if both the subString() AND the stripLeadingAndTrailingHTTPSpaces() are NO-OPs. But given the logic in this function, I don't think it is common for the substring() to return the original String as-is. Did I miss something?
>> Source/WebCore/Modules/mediastream/libwebrtc/LibWebRTCRtpTransceiverBackend.cpp:94 >> + rtcCodec.name = StringView(codec.mimeType).substring(6).utf8().data(); > > Since we are setting a std::string here it would be better if CString knew how to convert to std::string without writing .data(), which makes the code look unnecessarily dangerous, although it’s actually correct and fine. Kind of annoying that we need both CString and std::string in WebKit. Longer term, I wonder how many unique great things CString has; maybe we can drop it and switch to std::string. It uses our faster malloc, but otherwise seems to have no significantly different features from std::string.
The reason std::string is used here is because of libwebrtc here I believe. I agree it'd be nice if CString would know how to convert by itself though. Will look into that.
>> Source/WebCore/html/HTMLMapElement.cpp:103 >> m_name = mapName; > > WTFMove(mapName)
Indeed.
Darin Adler
Comment 4
2022-04-15 15:58:21 PDT
Comment on
attachment 457653
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457653&action=review
>>> Source/WebCore/Modules/fetch/FetchBodyConsumer.cpp:167 >>> + contentType = StringView(header).substring(contentTypeBegin + contentTypePrefixLength, contentTypeEnd - contentTypeBegin - contentTypePrefixLength).stripLeadingAndTrailingMatchedCharacters(isHTTPSpace).toString(); >> >> Is there a very-common case where the header string happens to have nothing that need to be trimmed, no "\r\n", no spaces? I ask because that is the case will become *less* efficient, now copying the String. >> >> I wish switching to StringView didn’t make the stripLeadingAndTrailingHTTPSpaces() function call *so* much longer stripLeadingAndTrailingMatchedCharacters(isHTTPSpace). On reflection, seems unnecessarily to have "matched characters" in that function name. > > I was careful about that and I don't think that's true? > > Previous code had a |header| String, then was allocating a new temporary String (let's call it |tmp|), then passing |tmp| to stripLeadingAndTrailingHTTPSpaces(), which in the common case would return |tmp| (because this case is optimized). However, there was still one String allocation for |tmp|, even in this optimized "common" case. > > With my code, we convert |header| to a StringView, do all operations on the StringView and then construct a String at the end. Still only one String allocation, and one String allocation is all cases, not just the "common" case. > > The one case where the previous code could be more efficient is if both the subString() AND the stripLeadingAndTrailingHTTPSpaces() are NO-OPs. But given the logic in this function, I don't think it is common for the substring() to return the original String as-is. > > Did I miss something?
I didn’t spot why it’s not common for substring to return the original header as-is; I had guessed there might be a common degenerate case. But I trust you on that.
Chris Dumez
Comment 5
2022-04-15 16:06:21 PDT
Comment on
attachment 457653
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457653&action=review
>>>> Source/WebCore/Modules/fetch/FetchBodyConsumer.cpp:167 >>>> + contentType = StringView(header).substring(contentTypeBegin + contentTypePrefixLength, contentTypeEnd - contentTypeBegin - contentTypePrefixLength).stripLeadingAndTrailingMatchedCharacters(isHTTPSpace).toString(); >>> >>> Is there a very-common case where the header string happens to have nothing that need to be trimmed, no "\r\n", no spaces? I ask because that is the case will become *less* efficient, now copying the String. >>> >>> I wish switching to StringView didn’t make the stripLeadingAndTrailingHTTPSpaces() function call *so* much longer stripLeadingAndTrailingMatchedCharacters(isHTTPSpace). On reflection, seems unnecessarily to have "matched characters" in that function name. >> >> I was careful about that and I don't think that's true? >> >> Previous code had a |header| String, then was allocating a new temporary String (let's call it |tmp|), then passing |tmp| to stripLeadingAndTrailingHTTPSpaces(), which in the common case would return |tmp| (because this case is optimized). However, there was still one String allocation for |tmp|, even in this optimized "common" case. >> >> With my code, we convert |header| to a StringView, do all operations on the StringView and then construct a String at the end. Still only one String allocation, and one String allocation is all cases, not just the "common" case. >> >> The one case where the previous code could be more efficient is if both the subString() AND the stripLeadingAndTrailingHTTPSpaces() are NO-OPs. But given the logic in this function, I don't think it is common for the substring() to return the original String as-is. >> >> Did I miss something? > > I didn’t spot why it’s not common for substring to return the original header as-is; I had guessed there might be a common degenerate case. But I trust you on that.
The header always starts with: const char* contentTypeCharacters = "Content-Type:"; The substring() is meant to get the part of the header line after the "Content-Type:", thus the contentTypePrefixLength in the substring call. Only way substring() would return the same string is if `contentTypeBegin + contentTypePrefixLength` was 0 and `contentTypeEnd - contentTypeBegin - contentTypePrefixLength` was the length of the whole string (or greater). contentTypeBegin + contentTypePrefixLength cannot be 0 because contentTypePrefixLength is a constant greater than 0 (13).
Darin Adler
Comment 6
2022-04-15 16:13:50 PDT
Comment on
attachment 457653
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457653&action=review
>>>>> Source/WebCore/Modules/fetch/FetchBodyConsumer.cpp:167 >>>>> + contentType = StringView(header).substring(contentTypeBegin + contentTypePrefixLength, contentTypeEnd - contentTypeBegin - contentTypePrefixLength).stripLeadingAndTrailingMatchedCharacters(isHTTPSpace).toString(); >>>> >>>> Is there a very-common case where the header string happens to have nothing that need to be trimmed, no "\r\n", no spaces? I ask because that is the case will become *less* efficient, now copying the String. >>>> >>>> I wish switching to StringView didn’t make the stripLeadingAndTrailingHTTPSpaces() function call *so* much longer stripLeadingAndTrailingMatchedCharacters(isHTTPSpace). On reflection, seems unnecessarily to have "matched characters" in that function name. >>> >>> I was careful about that and I don't think that's true? >>> >>> Previous code had a |header| String, then was allocating a new temporary String (let's call it |tmp|), then passing |tmp| to stripLeadingAndTrailingHTTPSpaces(), which in the common case would return |tmp| (because this case is optimized). However, there was still one String allocation for |tmp|, even in this optimized "common" case. >>> >>> With my code, we convert |header| to a StringView, do all operations on the StringView and then construct a String at the end. Still only one String allocation, and one String allocation is all cases, not just the "common" case. >>> >>> The one case where the previous code could be more efficient is if both the subString() AND the stripLeadingAndTrailingHTTPSpaces() are NO-OPs. But given the logic in this function, I don't think it is common for the substring() to return the original String as-is. >>> >>> Did I miss something? >> >> I didn’t spot why it’s not common for substring to return the original header as-is; I had guessed there might be a common degenerate case. But I trust you on that. > > The header always starts with: > const char* contentTypeCharacters = "Content-Type:"; > > The substring() is meant to get the part of the header line after the "Content-Type:", thus the contentTypePrefixLength in the substring call. Only way substring() would return the same string is if `contentTypeBegin + contentTypePrefixLength` was 0 and `contentTypeEnd - contentTypeBegin - contentTypePrefixLength` was the length of the whole string (or greater). > contentTypeBegin + contentTypePrefixLength cannot be 0 because contentTypePrefixLength is a constant greater than 0 (13).
Got it.
Chris Dumez
Comment 7
2022-04-15 16:14:23 PDT
Created
attachment 457731
[details]
Patch
Darin Adler
Comment 8
2022-04-15 16:54:53 PDT
Comment on
attachment 457731
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457731&action=review
> Source/WTF/wtf/text/CString.h:75 > + std::string toStdString() const { return m_buffer ? std::string(m_buffer->data()) : std::string(); }
Why isn’t the function body just: { return data(); }
Chris Dumez
Comment 9
2022-04-15 16:57:57 PDT
(In reply to Darin Adler from
comment #8
)
> Comment on
attachment 457731
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=457731&action=review
> > > Source/WTF/wtf/text/CString.h:75 > > + std::string toStdString() const { return m_buffer ? std::string(m_buffer->data()) : std::string(); } > > Why isn’t the function body just: > > { return data(); }
I wasn't sure it was OK to pass null to std::string(const char*).
Chris Dumez
Comment 10
2022-04-15 16:58:59 PDT
(In reply to Chris Dumez from
comment #9
)
> (In reply to Darin Adler from
comment #8
) > > Comment on
attachment 457731
[details]
> > Patch > > > > View in context: > >
https://bugs.webkit.org/attachment.cgi?id=457731&action=review
> > > > > Source/WTF/wtf/text/CString.h:75 > > > + std::string toStdString() const { return m_buffer ? std::string(m_buffer->data()) : std::string(); } > > > > Why isn’t the function body just: > > > > { return data(); } > > I wasn't sure it was OK to pass null to std::string(const char*).
"If s is a null pointer, if n == npos, or if the range specified by [first,last) is not valid, it causes undefined behavior." from
https://www.cplusplus.com/reference/string/string/string/
Darin Adler
Comment 11
2022-04-15 17:01:22 PDT
Comment on
attachment 457731
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=457731&action=review
>>>> Source/WTF/wtf/text/CString.h:75 >>>> + std::string toStdString() const { return m_buffer ? std::string(m_buffer->data()) : std::string(); } >>> >>> Why isn’t the function body just: >>> >>> { return data(); } >> >> I wasn't sure it was OK to pass null to std::string(const char*). > > "If s is a null pointer, if n == npos, or if the range specified by [first,last) is not valid, it causes undefined behavior." > from
https://www.cplusplus.com/reference/string/string/string/
Oh, the old code had that problem, then! It was just passing .data() without checking for null. But maybe that one particular call site knew it could never be null.
EWS
Comment 12
2022-04-15 21:45:50 PDT
Committed
r292939
(
249704@main
): <
https://commits.webkit.org/249704@main
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 457731
[details]
.
Radar WebKit Bug Importer
Comment 13
2022-04-15 21:46:15 PDT
<
rdar://problem/91842417
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug