Bug 221766

Summary: Reduce string copies when converting from NSString/CFStringRef to WTF::String
Product: WebKit Reporter: Alex Christensen <achristensen>
Component: New BugsAssignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cmarcelo, darin, ews-watchlist, ggaren, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch none

Description Alex Christensen 2021-02-11 11:46:29 PST
Reduce string copies when converting from NSString/CFStringRef to WTF::String
Comment 1 Alex Christensen 2021-02-11 11:52:58 PST
Created attachment 420020 [details]
Patch
Comment 2 Geoffrey Garen 2021-02-11 12:12:40 PST
Comment on attachment 420020 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=420020&action=review

r=me -- but please revert the buffer casting behavior before landing (or post a new patch)

> Source/WTF/wtf/text/cf/StringCF.cpp:52
> +    buffer.resize(size * sizeof(UChar) / sizeof(LChar));
> +
> +    StringBuffer<UChar>& ucharBuffer = *reinterpret_cast<StringBuffer<UChar>*>(&buffer);

I don't think we should do this part. It's not type-safe or strict-aliasing-safe. Also, in order for the resize() to be an optimization, malloc would need to over-allocate the original buffer by 2X. In bmalloc, that will never happen.

So, I recommend just using a new StringBuffer. Or, if I've missed something and there's a reason to believe that resize() will be faster, then I recommend adding a releaseRaw() that gives you the malloc buffer, along with an adopting constructor. That's type-safe and strict-aliasing-safe.

(Side note: I think the argument to resize() should have been just "size".)
Comment 3 Alex Christensen 2021-02-11 12:17:58 PST
Created attachment 420025 [details]
Patch
Comment 4 Geoffrey Garen 2021-02-11 12:19:41 PST
> Also, in order for the resize() to be an optimization,
> malloc would need to over-allocate the original buffer by 2X. In bmalloc,
> that will never happen.

Well, I guess it could be an optimization if the allocation were large and the adjacent memory were free? Anyway, in that case, you'd only save the function call to malloc(); the data copy would still have to restart from scratch. So, I'm not sure it would be a win.
Comment 5 Alex Christensen 2021-02-11 12:21:26 PST
If resize doesn't find adjacent memory, wouldn't it need to copy the bytes that we are going to throw away?  I think two malloc calls is actually most efficient.
I added a scope to free the first before allocating the second
Comment 6 Geoffrey Garen 2021-02-11 12:23:03 PST
Comment on attachment 420025 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=420025&action=review

> Source/WTF/wtf/text/cf/StringCF.cpp:47
> +        StringBuffer<LChar> buffer(size);
>          CFIndex usedBufLen;
> -        CFIndex convertedsize = CFStringGetBytes(str, CFRangeMake(0, size), kCFStringEncodingISOLatin1, 0, false, lcharBuffer.data(), size, &usedBufLen);
> -        if ((convertedsize == size) && (usedBufLen == size)) {
> -            m_impl = StringImpl::create(lcharBuffer.data(), size);
> +        CFIndex convertedSize = CFStringGetBytes(str, CFRangeMake(0, size), kCFStringEncodingISOLatin1, 0, false, buffer.characters(), size, &usedBufLen);
> +        if (convertedSize == size && usedBufLen == size) {

If we ever regret this behavior, I suppose we can preflight it with a CFStringGetFastestEncoding() check?
Comment 7 Geoffrey Garen 2021-02-11 12:23:47 PST
(In reply to Alex Christensen from comment #5)
> If resize doesn't find adjacent memory, wouldn't it need to copy the bytes
> that we are going to throw away?

Yes, good point.
Comment 8 EWS 2021-02-11 13:30:31 PST
commit-queue failed to commit attachment 420025 [details] to WebKit repository. To retry, please set cq+ flag again.
Comment 9 EWS 2021-02-11 14:24:05 PST
Committed r272752: <https://commits.webkit.org/r272752>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 420025 [details].
Comment 10 Radar WebKit Bug Importer 2021-02-11 14:25:27 PST
<rdar://problem/74250930>
Comment 11 Darin Adler 2021-02-12 13:30:24 PST
Comment on attachment 420025 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=420025&action=review

> Source/WTF/wtf/text/cf/StringCF.cpp:48
> +            m_impl = StringImpl::adopt(WTFMove(buffer));

This makes a 2-block StringImpl (BufferOwned), but the old code made a 1-block StringImpl (BufferInternal). Doesn’t that mean we are using more memory now?

> Source/WTF/wtf/text/cf/StringCF.cpp:55
> +    m_impl = StringImpl::adopt(WTFMove(ucharBuffer));

Ditto.
Comment 12 Geoffrey Garen 2021-02-12 14:09:23 PST
> This makes a 2-block StringImpl (BufferOwned), but the old code made a
> 1-block StringImpl (BufferInternal). Doesn’t that mean we are using more
> memory now?

bmalloc small allocations are precise, so I don't think BufferOwned intrinsically uses more memory. But maybe having two objects instead of one increases risk of fragmentation.