Bug 215232

Summary: REGRESSION (r257667): 1.9x more CPU time in IPC::SharedBufferDataReference decoding during Netflix playback
Product: WebKit Reporter: Andy Estes <aestes>
Component: WebKit2Assignee: Andy Estes <aestes>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, darin, eric.carlson, ews-watchlist, ggaren, glenn, jer.noble, philipj, sergio, webkit-bug-importer
Priority: P1 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 208090    
Attachments:
Description Flags
Patch for EWS
none
Patch for EWS
none
Patch for EWS
none
Patch
none
Patch for landing
none
Patch for EWS
none
Patch for landing none

Andy Estes
Reported 2020-08-06 11:49:34 PDT
After r257667, traces show 1.9x more CPU time is spent decoding IPC::SharedBufferDataReference objects during video playback on Netflix.
Attachments
Patch for EWS (6.39 KB, patch)
2020-08-18 11:40 PDT, Andy Estes
no flags
Patch for EWS (44.71 KB, patch)
2020-08-18 21:54 PDT, Andy Estes
no flags
Patch for EWS (50.32 KB, patch)
2020-08-19 03:04 PDT, Andy Estes
no flags
Patch (51.24 KB, patch)
2020-08-19 14:26 PDT, Andy Estes
no flags
Patch for landing (92.81 KB, patch)
2020-08-20 17:19 PDT, Andy Estes
no flags
Patch for EWS (91.81 KB, patch)
2020-08-20 17:52 PDT, Andy Estes
no flags
Patch for landing (91.56 KB, patch)
2020-08-20 18:09 PDT, Andy Estes
no flags
Radar WebKit Bug Importer
Comment 1 2020-08-06 11:51:06 PDT
Andy Estes
Comment 2 2020-08-18 11:40:54 PDT Comment hidden (obsolete)
Andy Estes
Comment 3 2020-08-18 21:54:27 PDT Comment hidden (obsolete)
Andy Estes
Comment 4 2020-08-19 03:04:45 PDT Comment hidden (obsolete)
Geoffrey Garen
Comment 5 2020-08-19 12:15:48 PDT
Comment on attachment 406832 [details] Patch for EWS View in context: https://bugs.webkit.org/attachment.cgi?id=406832&action=review > Source/WebKit/ChangeLog:24 > + 1. While both techniques copy the SharedBuffer only once during encoding, the new technique > + combines the buffer's data segments (by calling SharedBuffer::data()) while the old > + technique avoids this. To fix a regression, a partial revert makes great sense. That said, should we also fix the encoder for SharedBuffer, so that it stops doing this inefficient thing? > Source/WebKit/ChangeLog:30 > + 2. Once a SharedMemory handle is decoded and mapped, the new technique copies its memory > + into a SharedBuffer for use by the message receiver. Unlike the old technique that > + decodes as a pointer into the existing Decoder buffer, the new technique must decode a > + handle, map a SharedMemory allocation, copy it into a new SharedBuffer on the heap, and > + then destroy the handle and SharedMemory. Same question here. > Source/WebKit/Platform/IPC/LegacySharedBufferDataReference.h:36 > +class LegacySharedBufferDataReference { Would be good to add a comment explaining why this should not be used, and what should be used instead.
Andy Estes
Comment 6 2020-08-19 14:26:41 PDT
Andy Estes
Comment 7 2020-08-19 14:37:46 PDT
(In reply to Geoffrey Garen from comment #5) > Comment on attachment 406832 [details] > Patch for EWS > > View in context: > https://bugs.webkit.org/attachment.cgi?id=406832&action=review > > > Source/WebKit/ChangeLog:24 > > + 1. While both techniques copy the SharedBuffer only once during encoding, the new technique > > + combines the buffer's data segments (by calling SharedBuffer::data()) while the old > > + technique avoids this. > > To fix a regression, a partial revert makes great sense. That said, should > we also fix the encoder for SharedBuffer, so that it stops doing this > inefficient thing? Yes, I think we should fix the segment combining issue in a different bug. > > > Source/WebKit/ChangeLog:30 > > + 2. Once a SharedMemory handle is decoded and mapped, the new technique copies its memory > > + into a SharedBuffer for use by the message receiver. Unlike the old technique that > > + decodes as a pointer into the existing Decoder buffer, the new technique must decode a > > + handle, map a SharedMemory allocation, copy it into a new SharedBuffer on the heap, and > > + then destroy the handle and SharedMemory. > > Same question here. Yes, also in a different bug. I'll file these and add FIXMEs.
Andy Estes
Comment 8 2020-08-19 14:40:36 PDT
(Note that https://bugs.webkit.org/attachment.cgi?id=406872 has a much differently worded ChangeLog)
Geoffrey Garen
Comment 9 2020-08-19 15:30:25 PDT
Comment on attachment 406872 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=406872&action=review r=me > Source/WebKit/ChangeLog:43 > + LegacySharedBufferDataReference type, which reverts to the old behaior of producing an behavior > Source/WebKit/ChangeLog:44 > + Resolve this slowdown by reverting the argument encoding changes made to > + IPC::SharedBufferDataReference in r257667. For compatibility with code added after r257667, > + IPC::SharedBufferDataReference retains its new behavior of copying into a SharedBuffer when > + decoded, but this type is now only used in cases where the SharedBuffer can be ref'd instead > + of copied. Pre-r257667 instances of SharedBufferDataReference were changed to a new > + LegacySharedBufferDataReference type, which reverts to the old behaior of producing an > + IPC::DataReference when decoded, avoiding many copies. So LegacySharedBufferDataReference is the fast IPC data type and SharedBufferDataReference is the slow IPC data type? That is... surprising. Are we in danger of new adoptions of SharedBufferDataReference causing perf regressions?
Darin Adler
Comment 10 2020-08-19 15:30:55 PDT
Comment on attachment 406872 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=406872&action=review Great work! So exciting to see this diagnosed. > Source/WebKit/ChangeLog:3 > + REGRESSION (r257667): 1.9x more CPU time in IPC::SharedBufferDataReference decoding during Netflix playback What’s our strategy to protect ourselves from making this kind of mistake again? Maybe we can find a way to count how often we copy data buffers during regression tests? There can’t be too many different places that do that "copy a lot of bytes" operation. > Source/WebKit/ChangeLog:21 > + handle. It makes only one copy like the old encoder did, but pays the added cost of > + combining SharedBuffer data segments prior to copying by needlessly calling > + SharedBuffer::data() (this should be fixed independently). Seems like we have a real opportunity there; doesn’t seem that hard to code! > Source/WebKit/ChangeLog:43 > + LegacySharedBufferDataReference type, which reverts to the old behaior of producing an I’m not sure that "Legacy" accurately captures the difference between these two. Normally we name something Legacy when the dependency on the old behavior is an unwanted remnant of the past, to eventually be eliminated. Here it seems from your description there are reasons to prefer the two different behaviors for different cases of sending data across the process boundary; ideally those would be reflected in the names. Typo: "behaior" > Tools/DumpRenderTree/DumpRenderTree.xcodeproj/project.pbxproj:414 > + BCB281EE0CFA713D007E533E /* Base.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = Base.xcconfig; path = mac/Configurations/Base.xcconfig; sourceTree = "<group>"; }; > + BCB281F00CFA713D007E533E /* DumpRenderTree.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = DumpRenderTree.xcconfig; path = mac/Configurations/DumpRenderTree.xcconfig; sourceTree = "<group>"; }; > + BCB282F40CFA7450007E533E /* DebugRelease.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = DebugRelease.xcconfig; path = mac/Configurations/DebugRelease.xcconfig; sourceTree = "<group>"; }; > + BCB283DE0CFA7C20007E533E /* TestNetscapePlugIn.xcconfig */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.xcconfig; name = TestNetscapePlugIn.xcconfig; path = mac/Configurations/TestNetscapePlugIn.xcconfig; sourceTree = "<group>"; }; Are these changes improvements? Is "4" a better file encoding than "30"? Or is this just a script doing some reverting?
Andy Estes
Comment 11 2020-08-19 17:06:33 PDT
(In reply to Geoffrey Garen from comment #9) > So LegacySharedBufferDataReference is the fast IPC data type and > SharedBufferDataReference is the slow IPC data type? That is... surprising. > > Are we in danger of new adoptions of SharedBufferDataReference causing perf > regressions? (In reply to Darin Adler from comment #10) > I’m not sure that "Legacy" accurately captures the difference between these > two. Normally we name something Legacy when the dependency on the old > behavior is an unwanted remnant of the past, to eventually be eliminated. > Here it seems from your description there are reasons to prefer the two > different behaviors for different cases of sending data across the process > boundary; ideally those would be reflected in the names. How about I rename IPC::LegacySharedBufferDataReference to IPC::SharedBufferDataReference and call the current IPC::SharedBufferDataReference something like IPC::SharedBufferCopy? The thinking that led to "legacy" was that while IPC::DataReference is currently the faster way to decode, IPC::SharedBufferDataReference provides the safer interface for message receivers with SharedBuffer instead of a raw pointer to an unretained buffer. So that LegacySharedBufferDataReference is decoded as a IPC::DataReference felt like a legacy behavior, but really it's a faster-or-safer tradeoff I want to convey. I think a modern design would be to phase out DataReference (and LegacySharedBufferDataReference) and make SharedBuffer faster to use for decoding. One thing I think would help here is to add a type of SharedBuffer data segment that just refs a SharedMemory or IPC::Decoder, but that's out-of-scope for this change.
Darin Adler
Comment 12 2020-08-19 17:21:47 PDT
All of that sounds smart.
Andy Estes
Comment 13 2020-08-20 17:19:37 PDT Comment hidden (obsolete)
Andy Estes
Comment 14 2020-08-20 17:52:39 PDT
Created attachment 406989 [details] Patch for EWS
Andy Estes
Comment 15 2020-08-20 18:09:26 PDT Comment hidden (obsolete)
Andy Estes
Comment 16 2020-08-20 18:17:21 PDT
(In reply to Darin Adler from comment #10) > Comment on attachment 406872 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=406872&action=review > > Great work! So exciting to see this diagnosed. > > > Source/WebKit/ChangeLog:3 > > + REGRESSION (r257667): 1.9x more CPU time in IPC::SharedBufferDataReference decoding during Netflix playback > > What’s our strategy to protect ourselves from making this kind of mistake > again? > > Maybe we can find a way to count how often we copy data buffers during > regression tests? There can’t be too many different places that do that > "copy a lot of bytes" operation. One thing we could do is assert in IPC::SharedBufferCopy's destructor that m_buffer is either null or has more than one ref. That would detect cases where we created a SharedBuffer that the message receiver never ref'd or moved.
EWS
Comment 17 2020-08-21 04:29:24 PDT
Committed r266003: <https://trac.webkit.org/changeset/266003> All reviewed patches have been landed. Closing bug and clearing flags on attachment 406992 [details].
Note You need to log in before you can comment on or make changes to this bug.