Bug 228161

Summary: SharedBuffer::takeData() is a bit dangerous
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: WebCore Misc.Assignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, benjamin, changseok, cmarcelo, darin, eric.carlson, esprehn+autocc, ews-watchlist, glenn, gyuyoung.kim, jer.noble, mifenton, philipj, sergio, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=228096
https://bugs.webkit.org/show_bug.cgi?id=234724
Attachments:
Description Flags
Patch
none
Patch
ews-feeder: commit-queue-
Patch
none
Patch
none
Patch none

Description Chris Dumez 2021-07-21 12:46:28 PDT
SharedBuffer::takeData() is a bit dangerous since SharedBuffer is RefCounted and several object may be sharing ownership of the buffer. Having one owner call takeData() in case ownership is shared leads to bugs such as Bug 228096.
Comment 1 Chris Dumez 2021-07-21 12:51:36 PDT
Created attachment 433948 [details]
Patch
Comment 2 Alex Christensen 2021-07-21 12:57:06 PDT
Comment on attachment 433948 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=433948&action=review

> Source/WebCore/platform/SharedBuffer.cpp:127
> +    Vector<uint8_t> data;

This should use reserveInitialCapacity and uncheckedAppend.
Comment 3 Darin Adler 2021-07-21 12:58:26 PDT
Comment on attachment 433948 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=433948&action=review

> Source/WebCore/platform/SharedBuffer.cpp:127
> +    Vector<uint8_t> data;

To make sure we don’t over allocate vector capacity, we could add this:

    data.reserveInitialCapacity(size());

It means we’d iterate the segments twice, but I think that’s better.
Comment 4 Darin Adler 2021-07-21 12:58:41 PDT
Ah, same comment as Alex!
Comment 5 Chris Dumez 2021-07-21 12:59:49 PDT
Created attachment 433950 [details]
Patch
Comment 6 Darin Adler 2021-07-21 13:01:16 PDT
Might still want to offer a takeData with a RELEASE_ASSERT(buffer.hasOneRef()), because silently copying isn’t so great.
Comment 7 Chris Dumez 2021-07-21 13:03:00 PDT
(In reply to Darin Adler from comment #6)
> Might still want to offer a takeData with a
> RELEASE_ASSERT(buffer.hasOneRef()), because silently copying isn’t so great.

Seems risky to me. Also note that there was no takeData() until I introduced it super recently as an optimization. Getting the optimization sometimes is still better than crashing.
Comment 8 Chris Dumez 2021-07-21 13:24:18 PDT
Created attachment 433951 [details]
Patch
Comment 9 Alex Christensen 2021-07-21 13:49:52 PDT
Comment on attachment 433951 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=433951&action=review

> Source/WTF/wtf/Vector.h:787
> +    template<typename U> ALWAYS_INLINE void uncheckedAppend(const U* u, size_t size) { uncheckedAppend<FailureAction::Crash>(u, size); }

Do we need this one?  Removing it would encourage Span adoption.

> Source/WebCore/platform/SharedBuffer.h:107
> +    Vector<uint8_t> extractData();

Would it be possible to annotate this with && after () so we can only call it on r-values?  As it stands now, I think it's still risky because it will clear the buffer sometimes and sometimes it won't.  If we make it clear that this can only be called on r-values, we indicate in the code that this SharedBuffer reference shouldn't be used after this call.
Comment 10 Chris Dumez 2021-07-21 13:53:55 PDT
(In reply to Alex Christensen from comment #9)
> Comment on attachment 433951 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=433951&action=review
> 
> > Source/WTF/wtf/Vector.h:787
> > +    template<typename U> ALWAYS_INLINE void uncheckedAppend(const U* u, size_t size) { uncheckedAppend<FailureAction::Crash>(u, size); }
> 
> Do we need this one?  Removing it would encourage Span adoption.
> 
> > Source/WebCore/platform/SharedBuffer.h:107
> > +    Vector<uint8_t> extractData();
> 
> Would it be possible to annotate this with && after () so we can only call
> it on r-values?  As it stands now, I think it's still risky because it will
> clear the buffer sometimes and sometimes it won't.  If we make it clear that
> this can only be called on r-values, we indicate in the code that this
> SharedBuffer reference shouldn't be used after this call.

If we do use &&, then maybe it can simply be an optimized overload of copyData()?

Vector<uint8_t> copyData() const &;
Vector<uint8_t> copyData() &&; // Moves the data out instead of copying when hasOneRef
Comment 11 Alex Christensen 2021-07-21 14:24:54 PDT
I like it.
Comment 12 Chris Dumez 2021-07-21 14:41:42 PDT
Created attachment 433958 [details]
Patch
Comment 13 Chris Dumez 2021-07-21 14:44:10 PDT
(In reply to Chris Dumez from comment #10)
> (In reply to Alex Christensen from comment #9)
> > Comment on attachment 433951 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=433951&action=review
> > 
> > > Source/WTF/wtf/Vector.h:787
> > > +    template<typename U> ALWAYS_INLINE void uncheckedAppend(const U* u, size_t size) { uncheckedAppend<FailureAction::Crash>(u, size); }
> > 
> > Do we need this one?  Removing it would encourage Span adoption.
> > 
> > > Source/WebCore/platform/SharedBuffer.h:107
> > > +    Vector<uint8_t> extractData();
> > 
> > Would it be possible to annotate this with && after () so we can only call
> > it on r-values?  As it stands now, I think it's still risky because it will
> > clear the buffer sometimes and sometimes it won't.  If we make it clear that
> > this can only be called on r-values, we indicate in the code that this
> > SharedBuffer reference shouldn't be used after this call.
> 
> If we do use &&, then maybe it can simply be an optimized overload of
> copyData()?
> 
> Vector<uint8_t> copyData() const &;
> Vector<uint8_t> copyData() &&; // Moves the data out instead of copying when
> hasOneRef

I looked in to that but I don't like that it means I have to keep calling WTFMove() in order the get the optimization.
```
void foo(Ref<SharedBuffer>&& buffer)
{
    auto data = buffer.copyData(); // Doesn't get the optimization.
    // I have to do this, which is ugly and people are unlikely to do it in practice:
    auto data = WTFMove(buffer).copyData(); 
}
```
Comment 14 Darin Adler 2021-07-21 14:45:19 PDT
Please still mark copyData() const.
Comment 15 Chris Dumez 2021-07-21 14:46:17 PDT
Created attachment 433959 [details]
Patch
Comment 16 Alex Christensen 2021-07-21 15:03:05 PDT
You could make a static function that does it:
auto vector = SharedBuffer::copyData(WTFMove(buffer));

I think it's beneficial to have WTFMove because it indicates that buffer shouldn't be used after this call or it will get undefined behavior.  There are even tools that check for it.  There aren't tools that check for extractData().  I even think WTFMove(buffer).copyData() is good C++.
Comment 17 Chris Dumez 2021-07-21 15:06:06 PDT
(In reply to Alex Christensen from comment #16)
> You could make a static function that does it:
> auto vector = SharedBuffer::copyData(WTFMove(buffer));
> 
> I think it's beneficial to have WTFMove because it indicates that buffer
> shouldn't be used after this call or it will get undefined behavior.  There
> are even tools that check for it.  There aren't tools that check for
> extractData().  I even think WTFMove(buffer).copyData() is good C++.

I feel the name "extraData" conveys you shouldn't be using the buffer after calling it. I don't want to over-engineer this thing.

WTFMove(buffer).copyData() may be good C++ but it is not common place in WebKit. We made such optimization for String::isolatedCopy() and look how much the optimization is used in WebKit (even though it could be, in many places).
Comment 18 Darin Adler 2021-07-23 12:32:13 PDT
Comment on attachment 433959 [details]
Patch

Should we do both? Still have the takeData overload for && but have extractData as syntactic sugar so it’s easier to call correctly. Seems like a way to "dip our toes in the water" of the && functions and maybe try to make move(x).take the future idiom.
Comment 19 Chris Dumez 2021-07-23 12:35:46 PDT
(In reply to Darin Adler from comment #18)
> Comment on attachment 433959 [details]
> Patch
> 
> Should we do both? Still have the takeData overload for && but have
> extractData as syntactic sugar so it’s easier to call correctly. Seems like
> a way to "dip our toes in the water" of the && functions and maybe try to
> make move(x).take the future idiom.

In this proposal, would `takeData() &&` do the exact same thing as extractData()?
- If so, why it is useful?
- If not, I assume because it omits the hasOneRef() check and never copies, then I don't think this is ever safe. Just because something is an r-value reference doesn't mean we're the only ones holding a ref to it. Therefore, we'd be re-introduce the class of bugs I am trying to get rid of with this patch.
Comment 20 Chris Dumez 2021-07-23 14:31:52 PDT
Comment on attachment 433959 [details]
Patch

Landing as is for now. Will follow-up if needed based on discussion.
Comment 21 Darin Adler 2021-07-23 14:45:33 PDT
(In reply to Chris Dumez from comment #19)
> In this proposal, would `takeData() &&` do the exact same thing as
> extractData()?

Yes. They are synonyms with different syntax.

> - If so, why it is useful?

We can use it to experiment with the different syntax and to let the static analysis tools know what is going on so they can give errors for any use after moving out before overwriting. Some day, after experimenting with it, we may decide that we prefer it, and we could perhaps remove extractData.
Comment 22 EWS 2021-07-23 15:05:12 PDT
Committed r280260 (239926@main): <https://commits.webkit.org/239926@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 433959 [details].
Comment 23 Radar WebKit Bug Importer 2021-07-23 15:06:15 PDT
<rdar://problem/81038029>