Summary: | Distinguish contiguous SharedBuffer from non-contiguous one and guarantee immutability | ||
---|---|---|---|
Product: | WebKit | Reporter: | Jean-Yves Avenard [:jya] <jean-yves.avenard> |
Component: | WebCore Misc. | Assignee: | Jean-Yves Avenard [:jya] <jean-yves.avenard> |
Status: | RESOLVED FIXED | ||
Severity: | Normal | CC: | achristensen, aestes, alecflett, beidson, benjamin, berto, calvaris, cdumez, cgarcia, changseok, darin, eric.carlson, esprehn+autocc, ews-watchlist, galpeter, ggaren, glenn, gustavo, gyuyoung.kim, hi, Hironori.Fujii, hta, japhet, jer.noble, joepeck, jsbell, kangil.han, macpherson, menard, mifenton, mmaxfield, pangle, philipj, pnormand, sam, sergio, tommyw, vjaquez, webkit-bug-importer, youennf |
Priority: | P2 | Keywords: | InRadar |
Version: | Other | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
See Also: |
https://bugs.webkit.org/show_bug.cgi?id=170956 https://bugs.webkit.org/show_bug.cgi?id=233363 https://bugs.webkit.org/show_bug.cgi?id=233511 |
||
Bug Depends on: | 233401, 233411, 233481 | ||
Bug Blocks: | 232424, 233441, 233442, 233677 | ||
Attachments: |
Description
Jean-Yves Avenard [:jya]
2021-11-12 00:26:52 PST
I like that idea. Analogous with StringBuilder and String. (In reply to Darin Adler from comment #2) > I like that idea. > > Analogous with StringBuilder and String. I hesitated with going with a SharedBufferWriter ; but the changes required are more invasive. Turned out, that 90% of the SharedBuffer use in webkit are actually only ever made of a single DataSegment. The remaining are all related to networking and receiving the data in chunk. Such SharedBuffer are ultimately always flatten into a contiguous one at the end of its lifecycle, either because they are transferred over IPC, or because the underneath code (such as images) needs a plain buffer. Could revisit the approach you mention. On second thought, yeah, I see how it can be done easily on top of my current changes. In case this is helpful, some more thoughts: I’m not insisting that you stick so close to the StringBuilder concept, but I do think it’s a really similar relationship. We made String immutable (I did this a long time ago, originally it was not!). It’s possible that we will find that SharedBuffer "builders" could be longer lived than the StringBuilder ever is, though. On the other hand, I don’t see how you "share" something when it’s being mutated! Created attachment 444667 [details]
WIP
Created attachment 444794 [details]
WIP
Created attachment 444924 [details]
WIP
this patch contains the lot: split ContiguousSharedBuffer from SharedBuffer where data() and the various methods requiring a flat/contiguous SharedBuffer are now only available on ContiguousSharedBuffer Introduce SharedBufferBuilder to create a SharedBuilder the latter now being fully immutable with the exception of takeData which checks if there's only a single ref. though this will likely go away, as the extractData operation isn't atomic and as much not thread-safe. I made most used of SharedBuffer that were only ever used with a single DataSegment a ContiguousSharedBuffer. SharedBuffer that always ended being flatten are now made contiguous when no more buffers would be added to a SharedBuffer. Places that only needed a single conversion to a contiguous buffer are now created at the point of use using makeContiguous I will be splitting this patch in two: one with contiguous buffer only the other with the SharedBufferBuilder this ended up being a much bigger effort than expected. Created attachment 444925 [details]
Patch for EWS
Created attachment 444935 [details]
WIP Combined Patch for EWS
Fix gtk/windows compilation, handle case where creation of SharedBuffer with a CFData* results in an empty buffer
Created attachment 444936 [details]
WIP Combined Patch for EWS
gstreamer fixes, more gtk/windows fixes
Created attachment 444951 [details]
WIP Combined Patch for EWS
more gstreamer and gtk/windows fixes, fix RangeResponseGenerator, fix ScriptBufferSourceProvider : need further thought to StringView not owning its content and to keep alive the flatten SharedBuffer without temporarily doubling memory usage
Created attachment 444997 [details]
WIP ContiguousSharedBuffer for EWS
More Windows and gstreamer compilation fixes. Fix API test now that call to data() doesn't modify the SharedBuffer
Created attachment 444998 [details]
WIP ContiguousSharedBuffer for EWS
rebase
Created attachment 444999 [details]
WIP ContiguousSharedBuffer for EWS
rebase2
Created attachment 445000 [details]
WIP ContiguousSharedBuffer for EWS
Created attachment 445002 [details]
WIP ContiguousSharedBuffer for EWS
Created attachment 445003 [details]
WIP ContiguousSharedBuffer for EWS
yet more gstreamer compilation fix, need a build machine really
Created attachment 445005 [details]
WIP ContiguousSharedBuffer for EWS
iOS sim, SharedBufferWin compilation fixes
Created attachment 445008 [details]
WIP ContiguousSharedBuffer for EWS
Windows compilation fixes
Created attachment 445011 [details]
WIP ContiguousSharedBuffer for EWS
Created attachment 445018 [details]
WIP ContiguousSharedBuffer for EWS
Windows JPEG decoder compilation fix
Created attachment 445022 [details]
WIP ContiguousSharedBuffer for EWS
Fix windows wpe crash, hopefully make gtk finally compile
I'm sorry I didn't have time to check this, I can check the gtk/wpe build and attach additional fixes here... GTK/WPE fixes http://sprunge.us/vmcu7f (In reply to Philippe Normand from comment #26) > GTK/WPE fixes http://sprunge.us/vmcu7f Thanks ! was taking me forever with only one error reported at a time :) I believe some makeContiguous() are unnecessary as the original buffer already is. (In reply to Philippe Normand from comment #26) > GTK/WPE fixes http://sprunge.us/vmcu7f When doing things like: > // FIXME: ideally we would encode the content as a stream without having to fetch it all. > - auto* data = resource.data->data(); > + auto* data = resource.data->makeContiguous()->data(); data would be a dangling pointer once the expression is evaluated. I wonder if such an incorrect use should warrant data() to make data() && = delete It's going to be quite painful for some use, but that would be an easy mistake or have data return an object that is refcounted itself. Created attachment 445087 [details]
WIP ContiguousSharedBuffer for EWS
wpe and gtk fixes
Created attachment 445091 [details]
WIP ContiguousSharedBuffer for EWS
last gtk fixes
Created attachment 445095 [details]
WIP ContiguousSharedBuffer for EWS
uploaded wrong stack of patches
Thanks for the patch. If this patch contains new public API please make sure it follows the guidelines for new WebKit2 GTK+ API. See https://trac.webkit.org/wiki/WebKitGTK/AddingNewWebKit2API Created attachment 445179 [details]
WIP ContiguousSharedBuffer for EWS
final preliminary version
Created attachment 445180 [details]
ContiguousSharedBuffer for EWS
final preliminary version
Created attachment 445181 [details]
ContiguousSharedBuffer for review
This is a great idea. One thing I don’t like about this change is that the simpler case, the contiguous buffer, has the longer name. I would call the contiguous immutable one by the shorter name and rename the more complex mutable one that supports segments. (In reply to Darin Adler from comment #37) > This is a great idea. > > One thing I don’t like about this change is that the simpler case, the > contiguous buffer, has the longer name. I would call the contiguous > immutable one by the shorter name and rename the more complex mutable one > that supports segments. After bug 233442 (introducing SharedBufferBuilder), neither will be mutable. What about SharedBuffer which by default is a contiguous one and is the base class. And CompositeSharedBuffer ? there will be a need to reshuffle the logic a bit as there's a lot of place in the code that are expecting a SharedBuffer as base class. Would make the change even more massive. Probably would be simpler still to have CompositeSharedBuffer be the base class and SharedBuffer inherit from it. I guess we can do the renames as mechanical steps later. I’d like to make sure the most-commonly-used class has the shortest name. Created attachment 445502 [details] Patch rebase following bug 233471 Created attachment 445504 [details]
Patch for EWS
(In reply to Darin Adler from comment #40) > I guess we can do the renames as mechanical steps later. I’d like to make > sure the most-commonly-used class has the shortest name. I will be taking this in a follow-up patch. That will make review more readable. Will land all three patches at once. So I believe the patch as-is, is ready for review. (In reply to Darin Adler from comment #40) > I guess we can do the renames as mechanical steps later. I’d like to make > sure the most-commonly-used class has the shortest name. FWIW, there are 440 use of SharedBuffer left vs 422 of ContiguousSharedBuffer :) Created attachment 445551 [details]
Patch for EWS
Fix gtk build
Created attachment 445553 [details]
Patch
Created attachment 445555 [details]
Patch
Created attachment 445704 [details]
Patch
rebase
(In reply to Jean-Yves Avenard [:jya] from comment #44) > (In reply to Darin Adler from comment #40) > > I guess we can do the renames as mechanical steps later. I’d like to make > > sure the most-commonly-used class has the shortest name. > > FWIW, there are 440 use of SharedBuffer left vs 422 of > ContiguousSharedBuffer :) Miscalculated: there are 608 ContiguousSharedBuffer in use following this patch, and 1077 SharedBuffer. Not doubt there's much more SharedBuffer that could be made into contiguous ones permanently. But I'm not sure the contiguous SharedBuffer will be the most commonly used. Created attachment 445918 [details]
Patch
Simplify ContiguousSharedBuffer::createCFData() on non-Fundation
Created attachment 445948 [details]
Patch
rebase
Created attachment 445960 [details]
Patch
fix win crash
Created attachment 445963 [details]
Patch
Created attachment 445964 [details]
Patch
Created attachment 445993 [details]
Patch
Fix SharedBufferPOSIX.cpp, it's not compiled by any of our bots
Comment on attachment 445993 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=445993&action=review > Source/WebCore/ChangeLog:32 > + There's a potential increase of temporary memory usage with the > + ScriptBufferSourceProvider class; bug 233511 is tracking it. Presumably this is a small enough amount of additional memory that it’s OK and we don’t have "unshippable" WebKit until it’s done. > Source/WebCore/platform/SharedBuffer.cpp:102 > + if (m_segments.size() == 1) > + return ContiguousSharedBuffer::create(m_segments[0].segment.copyRef()); Isn’t this less efficient than just returning a pointer to the first segment in this case? How common is this likely to be? There’s a new failure mode in the new design where the same shared buffer repeatedly is converted into a ContiguousSharedBuffer. What guarantees that won’t happen. > Source/WebCore/platform/SharedBuffer.cpp:172 > +size_t SharedBuffer::size() const > +{ > + return m_size; > +} > + > +bool SharedBuffer::isEmpty() const > +{ > + return !size(); > +} > + > +bool SharedBuffer::isContiguous() const > +{ > + return m_contiguous; > +} Should any of these go in the header instead as inlines? > Source/WebCore/platform/SharedBuffer.cpp:226 > +void SharedBuffer::append(const char* data, size_t length) > +{ > + append(reinterpret_cast<const uint8_t*>(data), length); > +} Should this go in the header instead as an inline? > Source/WebCore/platform/SharedBuffer.cpp:349 > bool SharedBuffer::hasOneSegment() const > { > - auto it = begin(); > - return it != end() && ++it == end(); > + return m_segments.size() == 1; > } Should this go in the header instead as an inline? > Source/WebCore/platform/SharedBuffer.cpp:412 > +const char* ContiguousSharedBuffer::dataAsCharPtr() const > +{ > + return reinterpret_cast<const char*>(data()); > +} Should this go in the header instead as an inline? > Source/WebCore/platform/SharedBuffer.cpp:544 > +const char* SharedBufferDataView::dataAsCharPtr() const > +{ > + return reinterpret_cast<const char*>(data()); > +} Should this go in the header instead as an inline? > Source/WebCore/platform/SharedBuffer.h:79 > + static Ref<DataSegment> create(Vector<uint8_t>&& data) > + { > + data.shrinkToFit(); > + return adoptRef(*new DataSegment(WTFMove(data))); > + } For future reference/refactoring: Functions like these probably should not be inlined in the header. It makes the code bigger at each call site if there is more than one, and the constructors themselves can often get inlined in these create functions if they are moved out of the header file, even if the constructors aren’t explicitly marked inline. Same for the other create functions. > Source/WebCore/platform/SharedBuffer.h:140 > +#if !USE(FOUNDATION) > + friend class ContiguousSharedBuffer; // For createCFData > +#endif Do we really need the #if here? > Source/WebCore/platform/SharedBuffer.h:281 > + // Ensure that you can't call append on a ContiguousSharedBuffer directly. > + // When called from the base class, it will assert. > + template <typename... Args> > + void append(Args&&...) = delete; Is there an alternate version of this design that doesn’t have the Liskov Substitutability Principle problem? > Source/WebCore/platform/SharedBuffer.h:283 > + WEBCORE_EXPORT const uint8_t *data() const; Misplaced "*" here. > Source/WebKit/WebProcess/InjectedBundle/InjectedBundlePageLoaderClient.cpp:77 > + data = API::Data::createWithoutCopying((const unsigned char*)contiguousBuffer->data(), contiguousBuffer->size(), releaseSharedBuffer, contiguousBuffer.ptr()); This typecast is not needed, that’s already the correct type. > Source/WebKit/WebProcess/Network/NetworkProcessConnection.cpp:284 > + RefPtr<ContiguousSharedBuffer> buffer = handle.tryWrapInSharedBuffer(); I suggest auto here. > Source/WebKit/WebProcess/WebPage/mac/WebPageMac.mm:569 > + RefPtr<ContiguousSharedBuffer> buffer = frame.editor().dataSelectionForPasteboard(pasteboardType); I suggest auto here. > Source/WebKit/WebProcess/cocoa/VideoFullscreenManager.mm:-235 > -#if HAVE(UIKIT_WEBKIT_INTERNALS) > - return mode == HTMLMediaElementEnums::VideoFullscreenModeStandard; > -#else > return true; > -#endif This seems like an unrelated change that was included in this patch by accident? > Source/WebKit/WebProcess/cocoa/VideoFullscreenManager.mm:242 > -#if PLATFORM(IOS_FAMILY) && !HAVE(UIKIT_WEBKIT_INTERNALS) > +#if PLATFORM(IOS_FAMILY) Ditto. > Source/WebKitLegacy/mac/WebView/WebHTMLRepresentation.mm:233 > + auto *parsedArchiveData = [_private->dataSource _documentLoader]->parsedArchiveData(); This should be just auto or auto*, not "auto *". > Source/WebKitLegacy/mac/WebView/WebResource.mm:167 > - NSData *data = nil; > + > + RetainPtr<NSData> data; Good fix! I was prepping for createNSData and spotted this error, and was about to ask you to fix it, but looks like you spotted it too. > Tools/TestWebKitAPI/Tests/WebCore/SharedBuffer.cpp:46 > + RefPtr<SharedBuffer> buffer = ContiguousSharedBuffer::createWithContentsOfFile(String("not_existing_file")); Would more auto make these better? Comment on attachment 445993 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=445993&action=review >> Source/WebCore/ChangeLog:32 >> + ScriptBufferSourceProvider class; bug 233511 is tracking it. > > Presumably this is a small enough amount of additional memory that it’s OK and we don’t have "unshippable" WebKit until it’s done. Here we have a trade-off between speed and memory efficiency. I would need to discuss with others on what we want to achieve here. We could always flatten the buffer before storing it into a ScriptBufferSourceProvider >> Source/WebCore/platform/SharedBuffer.cpp:102 >> + return ContiguousSharedBuffer::create(m_segments[0].segment.copyRef()); > > Isn’t this less efficient than just returning a pointer to the first segment in this case? How common is this likely to be? > > There’s a new failure mode in the new design where the same shared buffer repeatedly is converted into a ContiguousSharedBuffer. What guarantees that won’t happen. > Isn’t this less efficient than just returning a pointer to the first segment in this case? How common is this likely to be? I made the constructor private to keep with the existing style, forcing to go through the create method. I don't see how this would be less efficient, we just explicitly create the object rather than the let the compiler decides what to do. > There’s a new failure mode in the new design where the same shared buffer repeatedly is converted into a ContiguousSharedBuffer In practice it shouldn't happen. A buffer is either flattened when we read its content like when dealing with external framework that requires a pointer and this is done once. Where multiple conversions could occur on the same object, this is where I tracked where the SharedBuffer was first created and made it a ContiguousSharedBuffer instead. There's no guarantee in the future that it won't happen though. However, after bug 233442 and 233677 , I believe the naming gives more visibility of what is happening and how to do the "right thing" will be more obvious to folks. >> Source/WebCore/platform/SharedBuffer.cpp:172 >> +} > > Should any of these go in the header instead as inlines? it's a bit confusing on which is to be made inline vs which one we don't want to. The issue however is with WEBCORE_EXPORT which can't be used on the class if there are inlines. >> Source/WebCore/platform/SharedBuffer.h:79 >> + } > > For future reference/refactoring: > > Functions like these probably should not be inlined in the header. It makes the code bigger at each call site if there is more than one, and the constructors themselves can often get inlined in these create functions if they are moved out of the header file, even if the constructors aren’t explicitly marked inline. > > Same for the other create functions. I'll move those to the code then. >> Source/WebCore/platform/SharedBuffer.h:140 >> +#endif > > Do we really need the #if here? no. but it's the only place it's needed. I'll remove it. >> Source/WebCore/platform/SharedBuffer.h:281 >> + void append(Args&&...) = delete; > > Is there an alternate version of this design that doesn’t have the Liskov Substitutability Principle problem? This pattern disappear following bug 233442, append no longer exists. This was primarily used as a WIP where most SharedBuffer instances where converted to ContiguousSharedBuffer and could tell at compile time where it was misused. Easier to spot than waiting on an assertion during a test. >> Source/WebKit/WebProcess/cocoa/VideoFullscreenManager.mm:-235 >> -#endif > > This seems like an unrelated change that was included in this patch by accident? oops. Comment on attachment 445993 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=445993&action=review >> Tools/TestWebKitAPI/Tests/WebCore/SharedBuffer.cpp:46 >> + RefPtr<SharedBuffer> buffer = ContiguousSharedBuffer::createWithContentsOfFile(String("not_existing_file")); > > Would more auto make these better? probably, I mostly kept the existing code as-is. Created attachment 446118 [details]
Patch
apply comments
Created attachment 446125 [details]
Patch
apply comments
Created attachment 446127 [details] Patch pick some fly-by fixes from bug 233442 requires for this patch to not cause regressions Created attachment 446525 [details]
Patch
Rebase
Created attachment 446529 [details]
Patch
Fix compilation error following rebase
Created attachment 446812 [details]
Patch
rebase
Created attachment 446870 [details]
Patch
Rebase, convert new code to ContiguousSharedBuffer
Created attachment 446874 [details]
Patch
Fix compilation on iOS, GTK and Windows
Created attachment 446877 [details]
Patch
Created attachment 446929 [details]
Patch
Yet another rebase
Created attachment 446966 [details]
Patch
Fix intermittent crash
Committed r286937 (245163@main): <https://commits.webkit.org/245163@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 446966 [details]. Filed: Bug 241110 – REGRESSION(245163@main) Can't load a very large image as an image document |