Reduce string copies when converting from NSString/CFStringRef to WTF::String
Created attachment 420020 [details] Patch
Comment on attachment 420020 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=420020&action=review r=me -- but please revert the buffer casting behavior before landing (or post a new patch) > Source/WTF/wtf/text/cf/StringCF.cpp:52 > + buffer.resize(size * sizeof(UChar) / sizeof(LChar)); > + > + StringBuffer<UChar>& ucharBuffer = *reinterpret_cast<StringBuffer<UChar>*>(&buffer); I don't think we should do this part. It's not type-safe or strict-aliasing-safe. Also, in order for the resize() to be an optimization, malloc would need to over-allocate the original buffer by 2X. In bmalloc, that will never happen. So, I recommend just using a new StringBuffer. Or, if I've missed something and there's a reason to believe that resize() will be faster, then I recommend adding a releaseRaw() that gives you the malloc buffer, along with an adopting constructor. That's type-safe and strict-aliasing-safe. (Side note: I think the argument to resize() should have been just "size".)
Created attachment 420025 [details] Patch
> Also, in order for the resize() to be an optimization, > malloc would need to over-allocate the original buffer by 2X. In bmalloc, > that will never happen. Well, I guess it could be an optimization if the allocation were large and the adjacent memory were free? Anyway, in that case, you'd only save the function call to malloc(); the data copy would still have to restart from scratch. So, I'm not sure it would be a win.
If resize doesn't find adjacent memory, wouldn't it need to copy the bytes that we are going to throw away? I think two malloc calls is actually most efficient. I added a scope to free the first before allocating the second
Comment on attachment 420025 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=420025&action=review > Source/WTF/wtf/text/cf/StringCF.cpp:47 > + StringBuffer<LChar> buffer(size); > CFIndex usedBufLen; > - CFIndex convertedsize = CFStringGetBytes(str, CFRangeMake(0, size), kCFStringEncodingISOLatin1, 0, false, lcharBuffer.data(), size, &usedBufLen); > - if ((convertedsize == size) && (usedBufLen == size)) { > - m_impl = StringImpl::create(lcharBuffer.data(), size); > + CFIndex convertedSize = CFStringGetBytes(str, CFRangeMake(0, size), kCFStringEncodingISOLatin1, 0, false, buffer.characters(), size, &usedBufLen); > + if (convertedSize == size && usedBufLen == size) { If we ever regret this behavior, I suppose we can preflight it with a CFStringGetFastestEncoding() check?
(In reply to Alex Christensen from comment #5) > If resize doesn't find adjacent memory, wouldn't it need to copy the bytes > that we are going to throw away? Yes, good point.
commit-queue failed to commit attachment 420025 [details] to WebKit repository. To retry, please set cq+ flag again.
Committed r272752: <https://commits.webkit.org/r272752> All reviewed patches have been landed. Closing bug and clearing flags on attachment 420025 [details].
<rdar://problem/74250930>
Comment on attachment 420025 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=420025&action=review > Source/WTF/wtf/text/cf/StringCF.cpp:48 > + m_impl = StringImpl::adopt(WTFMove(buffer)); This makes a 2-block StringImpl (BufferOwned), but the old code made a 1-block StringImpl (BufferInternal). Doesn’t that mean we are using more memory now? > Source/WTF/wtf/text/cf/StringCF.cpp:55 > + m_impl = StringImpl::adopt(WTFMove(ucharBuffer)); Ditto.
> This makes a 2-block StringImpl (BufferOwned), but the old code made a > 1-block StringImpl (BufferInternal). Doesn’t that mean we are using more > memory now? bmalloc small allocations are precise, so I don't think BufferOwned intrinsically uses more memory. But maybe having two objects instead of one increases risk of fragmentation.