132229 – Coalesce responses on network process side

Antti Koivisto

Reported 2014-04-27 02:03:33 PDT

To reduce IPC we should coalesce responses in network process and send them over with single IPC call.

Antti Koivisto

Comment 1 2014-04-27 02:13:12 PDT

Created attachment 230258 [details] patch

Andreas Kling

Comment 2 2014-04-27 02:20:49 PDT

Comment on attachment 230258 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=230258&action=review r=me, this looks really sweet. > Source/WebKit2/ChangeLog:30 > + Add a new message type that covers didReceiveResonse, didReceiveBuffer and didFinishLoading in a single message. Typo, didReceiveResponse. > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.h:31 > +#include <Webcore/ResourceResponse.h> Webcore :| > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.h:58 > + NetworkResourceLoader* m_coalesingLoader; Typo, m_coalescingLoader.

Antti Koivisto

Comment 3 2014-04-27 02:36:10 PDT

Created attachment 230259 [details] updated

Antti Koivisto

Comment 4 2014-04-27 04:54:59 PDT

https://trac.webkit.org/r167853

Antti Koivisto

Comment 5 2014-04-27 07:15:51 PDT

<rdar://problem/16737186>

Tim Horton

Comment 6 2014-04-28 10:27:18 PDT

This regressed an XHR test: http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=http%2Ftests%2Fxmlhttprequest%2Finteractive-state.html

Tim Horton

Comment 7 2014-04-28 10:43:22 PDT

And a multipart test: http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=http%2Ftests%2Fmultipart%2Finvalid-image-data.html

Tim Horton

Comment 8 2014-04-28 10:45:01 PDT

And also http/tests/multipart/multipart-html.php. The bots didn't pick up the crash logs, but they look like this: Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x00000000000008c8 0 com.apple.WebKit2 0x0000000116d5ae2c WebKit::WebDocumentLoader::navigationID() const + 12 (WebDocumentLoader.h:40) 1 com.apple.WebKit2 0x0000000116d559cb WebKit::WebFrameLoaderClient::dispatchDidFailProvisionalLoad(WebCore::ResourceError const&) + 203 (WebFrameLoaderClient.cpp:470) 2 com.apple.WebCore 0x000000011a99cbe8 WebCore::FrameLoader::checkLoadCompleteForThisFrame() + 584 (FrameLoader.cpp:2210) 3 com.apple.WebCore 0x000000011a995974 WebCore::FrameLoader::checkLoadComplete() + 324 (FrameLoader.cpp:2442) 4 com.apple.WebCore 0x000000011a99ec88 WebCore::FrameLoader::receivedMainResourceError(WebCore::ResourceError const&) + 408 (FrameLoader.cpp:2731) 5 com.apple.WebCore 0x000000011a6a4ba4 WebCore::DocumentLoader::mainReceivedError(WebCore::ResourceError const&) + 324 (DocumentLoader.cpp:267) 6 com.apple.WebCore 0x000000011a6a55fe WebCore::DocumentLoader::notifyFinished(WebCore::CachedResource*) + 398 (DocumentLoader.cpp:384) 7 com.apple.WebCore 0x000000011a2c6ccd WebCore::CachedResource::checkNotify() + 109 (CachedResource.cpp:332) 8 com.apple.WebCore 0x000000011a2c6ea1 WebCore::CachedResource::error(WebCore::CachedResource::Status) + 145 (CachedResource.cpp:359) 9 com.apple.WebCore 0x000000011bde8ac9 WebCore::SubresourceLoader::didFail(WebCore::ResourceError const&) + 377 (SubresourceLoader.cpp:339) 10 com.apple.WebKit2 0x0000000116fd6fc5 WebKit::WebResourceLoadScheduler::internallyFailedLoadTimerFired() + 213 (WebResourceLoadScheduler.cpp:166)

WebKit Commit Bot

Comment 9 2014-04-28 10:47:15 PDT

Re-opened since this is blocked by bug 132288

Antti Koivisto

Comment 10 2014-04-29 07:44:14 PDT

Created attachment 230377 [details] disable coalescing for xhr and multipart responses

Alexey Proskuryakov

Comment 11 2014-04-29 09:59:53 PDT

Comment on attachment 230377 [details] disable coalescing for xhr and multipart responses View in context: https://bugs.webkit.org/attachment.cgi?id=230377&action=review > Source/WebKit2/ChangeLog:8 > + To reduce IPC we should coalesce responses in network process and send them over with single IPC call. I don't object to this change, however I also don't understand why "reducing IPC" improves performance. I understand that we see a lot of IPC on profiles, but combining responses into one doesn't get rid of any serialization that we perform - we still need to send requests and responses as many times as before, don't we? A better understanding of where the benefit comes from may help come up with a safer fix that won't be a long-term liability due to added complexity. > Source/WebKit2/ChangeLog:39 > + Add allowResponseCoalescing flag. Main resource and XHR is not allowed to coalesce as it may alter > + observable semantics. One place where coalescing would be a substantial win is actually most XHR use cases. We don't need the partial data for Blob responses, we don't need it for XML responses, and so on. Yet we still need progress notifications for these. For blobs especially, what we do today is buffer the data on WebProcess side, and then send it back to NetworkProcess to register as a blob - crazy! > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.cpp:71 > + static const double responseCoalescingTime = 0.5; How did you come up with this number? This seems to mean that slow loading pages will now be committed half a second later than before, which I expect to be observable. > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.cpp:88 > + loader->send(Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo(m_coalescingResponse, CertificateInfo(m_coalescingResponse), false)); This looks like the best place to send a coalesced response - we already have all the data. What is the reason to not use this technique here? > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.h:61 > + long long m_coalescingResponseEncodedDataLength; Is a signed value appropriate here? > Source/WebKit2/WebProcess/Network/WebResourceLoader.cpp:137 > + Ref<WebResourceLoader> protect(*this); I'm somewhat worried that protecting only the loader is insufficient. I don't know to to tell what would be sufficient. Maybe run API tests with GuardMalloc? But the coverage is very sketchy. Maybe it's OK because the loader protects both Frame and DocumentLoader. But it doesn't protect the Document, which can be closed and replaced. > Source/WebKit2/WebProcess/Network/WebResourceLoader.cpp:143 > + didReceiveData(data, encodedDataLength); Seems like we shouldn't do this if client cancels the load while handling didReceiveResponse. Or does m_coreLoader become null in this case?

Antti Koivisto

Comment 12 2014-04-29 12:04:02 PDT

(In reply to comment #11) > > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.cpp:71 > > + static const double responseCoalescingTime = 0.5; > > How did you come up with this number? It seemed reasonable. Only thing that it really should affect are progressively rendered images (the first display may be delayed up to 0.5s but will be more complete). Scripts and stylesheets are not processed until they finished in any case. > This seems to mean that slow loading pages will now be committed half a second later than before, which I expect to be observable. Coalescing is not enabled for the main document (or XHR). > > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.cpp:88 > > + loader->send(Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo(m_coalescingResponse, CertificateInfo(m_coalescingResponse), false)); > > This looks like the best place to send a coalesced response - we already have all the data. What is the reason to not use this technique here? I guess I don't understand what you mean here. > > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.h:61 > > + long long m_coalescingResponseEncodedDataLength; > > Is a signed value appropriate here? It is the same type as used elsewhere for this data. > > > Source/WebKit2/WebProcess/Network/WebResourceLoader.cpp:137 > > + Ref<WebResourceLoader> protect(*this); > > I'm somewhat worried that protecting only the loader is insufficient. I don't know to to tell what would be sufficient. Maybe run API tests with GuardMalloc? But the coverage is very sketchy. > > Maybe it's OK because the loader protects both Frame and DocumentLoader. But it doesn't protect the Document, which can be closed and replaced. This is the same protection other functions here have. > > Source/WebKit2/WebProcess/Network/WebResourceLoader.cpp:143 > > + didReceiveData(data, encodedDataLength); > > Seems like we shouldn't do this if client cancels the load while handling didReceiveResponse. Or does m_coreLoader become null in this case? This case shouldn't be any different from the separate-messages case. You may have both didReceiveResponse and didReceiveData messages in-flight at the same time.

Alexey Proskuryakov

Comment 13 2014-04-29 12:30:38 PDT

> > > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.cpp:88 > > > + loader->send(Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo(m_coalescingResponse, CertificateInfo(m_coalescingResponse), false)); > > > > This looks like the best place to send a coalesced response - we already have all the data. What is the reason to not use this technique here? > > I guess I don't understand what you mean here. This is a missed optimization - a place where we could send a coalesced response, but don't. > > > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.h:61 > > > + long long m_coalescingResponseEncodedDataLength; > > > > Is a signed value appropriate here? > > It is the same type as used elsewhere for this data. This is not sufficient. You are adding new code, and we don't have to copy old mistakes unless there is a reason to. Using signed values for sizes causes all sorts of bugs. > > Maybe it's OK because the loader protects both Frame and DocumentLoader. But it doesn't protect the Document, which can be closed and replaced. > > This is the same protection other functions here have. Do other functions make multiple calls to client? This new function is very different from others, as it's multi-step. > > > Source/WebKit2/WebProcess/Network/WebResourceLoader.cpp:143 > > > + didReceiveData(data, encodedDataLength); > > > > Seems like we shouldn't do this if client cancels the load while handling didReceiveResponse. Or does m_coreLoader become null in this case? > > This case shouldn't be any different from the separate-messages case. You may have both didReceiveResponse and didReceiveData messages in-flight at the same time. I don't understand this answer. There can be multiple messages, but each gets a lot of additional checks before it reaches WebResourceLoader, they don't share the same stack when executed.

Antti Koivisto

Comment 14 2014-04-29 12:49:39 PDT

> This is a missed optimization - a place where we could send a coalesced response, but don't. Right, this path is an existing optimization that coalesces didReceiveData and didFinishLoading for disk cache hit (along with passing an mmapped buffer). It could be improved further but that is bit of a separate topic. > This is not sufficient. You are adding new code, and we don't have to copy old mistakes unless there is a reason to. > > Using signed values for sizes causes all sorts of bugs. I disagree. Consistency is more important. We should fix this everywhere as a separate change. Conversions cause bugs. > Do other functions make multiple calls to client? This new function is very different from others, as it's multi-step. Yes, didReceiveResource does (which is a similar existing optimization, see above). > I don't understand this answer. There can be multiple messages, but each gets a lot of additional checks before it reaches WebResourceLoader, they don't share the same stack when executed. Maybe you could suggest concrete improvements to the patch as I'm not sure what you are after here.

Alexey Proskuryakov

Comment 15 2014-04-29 13:18:37 PDT

The biggest concrete improvement would be one I requested in the first comment - it would be helpful to understand why this improves our metrics. Without this understanding, and given the quality of implementation concerns that I listed, I'm not willing to give an r+ - the risk of regressions and the cost in complexity are relatively high.

Alexey Proskuryakov

Comment 16 2014-04-29 13:19:14 PDT

Comment on attachment 230377 [details] disable coalescing for xhr and multipart responses View in context: https://bugs.webkit.org/attachment.cgi?id=230377&action=review > Source/WebKit2/ChangeLog:14 > + Disable coalesing for multipart/x-response-mixed as they may generate multiple responses. I think that the code is correct, but the comment is not - multipart/x-response-mixed is not the only kind of multipart response that we support.

Antti Koivisto

Comment 17 2014-04-29 13:35:18 PDT

(In reply to comment #15) > The biggest concrete improvement would be one I requested in the first comment - it would be helpful to understand why this improves our metrics. Doing less work, in this case constructing and sending IPC messages, generally improves performance. I'm not sure what further understanding is needed. > Without this understanding, and given the quality of implementation concerns that I listed, I'm not willing to give an r+ - the risk of regressions and the cost in complexity are relatively high. I haven't seen any actionable concerns. Since you "don't object to this change" I suppose I will seek the review elsewhere.

Alexey Proskuryakov

Comment 18 2014-04-29 14:20:12 PDT

> Doing less work, in this case constructing and sending IPC messages, generally improves performance. I'm not sure what further understanding is needed. I find this explanation implausible. There must be something else at play.

Antti Koivisto

Comment 19 2014-04-29 15:25:26 PDT

> I find this explanation implausible. There must be something else at play. For example the network process side profiles for sending messages (over PLT run) look like this: Running Time Self Symbol Name 501.0ms 6.0% 1,0 bool IPC::Connection::send<Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo>(Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo&&, unsigned long long, unsigned int) 425.0ms 5.1% 1,0 IPC::ArgumentCoder<WebCore::ResourceResponse>::encode(IPC::ArgumentEncoder&, WebCore::ResourceResponse const&) 66.0ms 0.7% 1,0 IPC::Connection::sendMessage(std::__1::unique_ptr<IPC::MessageEncoder, std::__1::default_delete<IPC::MessageEncoder> >, unsigned int) Running Time Self Symbol Name 306.0ms 3.6% 0,0 bool IPC::Connection::send<Messages::WebResourceLoader::DidReceiveData>(Messages::WebResourceLoader::DidReceiveData&&, unsigned long long, unsigned int) 229.0ms 2.7% 0,0 IPC::ArgumentEncoder::encodeVariableLengthByteArray(IPC::DataReference const&) 67.0ms 0.8% 3,0 IPC::Connection::sendMessage(std::__1::unique_ptr<IPC::MessageEncoder, std::__1::default_delete<IPC::MessageEncoder> >, unsigned int) Running Time Self Symbol Name 63.0ms 0.7% 1,0 bool IPC::MessageSender::send<Messages::WebResourceLoader::DidFinishResourceLoad>(Messages::WebResourceLoader::DidFinishResourceLoad const&, unsigned long long, unsigned int) 58.0ms 0.7% 1,0 IPC::MessageSender::sendMessage(std::__1::unique_ptr<IPC::MessageEncoder, std::__1::default_delete<IPC::MessageEncoder> >, unsigned int) 54.0ms 0.6% 1,0 IPC::Connection::sendMessage(std::__1::unique_ptr<IPC::MessageEncoder, std::__1::default_delete<IPC::MessageEncoder> >, unsigned int) 2.0ms 0.0% 2,0 non-virtual thunk to WebKit::NetworkResourceLoader::messageSenderConnection() 1.0ms 0.0% 1,0 DYLD-STUB$$operator new(unsigned long) Simply combining sendMessage calls (practically all combine) wins 1.5% of main thread time. A portion of encodeVariableLengthByteArray overhead (2.7%) will go away as well as we now do do a single encode per resource instead of multiple ones. There are comparable wins in receiving web process side. Does this make it sound more plausible? These things are not free.

Alexey Proskuryakov

Comment 20 2014-04-29 16:15:29 PDT

Don't these include data serialization, which will still happen with this patch?

Antti Koivisto

Comment 21 2014-04-29 22:51:54 PDT

(In reply to comment #20) > Don't these include data serialization, which will still happen with this patch? No, sendMessage functions are pure low level/kernel overhead. The IPC::ArgumentCoder<>::encode functions do the serialization. ResourceResponse serialization is indeed very slow and should be optimised too.

Antti Koivisto

Comment 22 2014-04-29 23:36:37 PDT

On web process side we spend 1.1% of total CPU time simply receiving IPC messages (this includes UI process messages too) and forwarding them to client. This does not include any type-specific decoding. Running Time Self Symbol Name 771.0ms 1.2% 7,0 _dispatch_source_invoke$VARIANT$mp 760.0ms 1.1% 2,0 _dispatch_source_latch_and_call 719.0ms 1.1% 10,0 IPC::Connection::receiveSourceEventHandler() 275.0ms 0.4% 4,0 IPC::MessageDecoder::MessageDecoder(IPC::DataReference const&, WTF::Vector<IPC::Attachment, 0ul, WTF::CrashOnOverflow>) 215.0ms 0.3% 11,0 IPC::Connection::processIncomingMessage(std::__1::unique_ptr<IPC::MessageDecoder, std::__1::default_delete<IPC::MessageDecoder> >) 158.0ms 0.2% 1,0 mach_msg 43.0ms 0.0% 1,0 vm_deallocate 13.0ms 0.0% 12,0 WTF::fastMalloc(unsigned long) 3.0ms 0.0% 3,0 WTF::Vector<char, 4160ul, WTF::CrashOnOverflow>::resize(unsigned long)

Alexey Proskuryakov

Comment 23 2014-04-29 23:56:07 PDT

This is certainly interesting - there aren't a lot of messages sent, after all. Recently I debugged a problem by printing out all messages between all WebKit processes while loading a complex web page, and that was still very manageable. I wonder how many of these messages are didReceiveData ones - perhaps coalescing only these would be sufficient. It has certainly come up before that parsing data in smaller chunks (due to CFNetworks callbacks never being blocked) might cause trouble, although I don't think that's we've ever seen a confirmation. This makes me think about something that this patch appears to regress. How do we handle Content-Disposition: attachment for subresources? A client needs to make a decision in didReceiveResponse before the delegate call even returns. In fact, the API contract is for the client to get callbacks right away, although I'm not sure if that's readily observable for anything but downloads.

Antti Koivisto

Comment 24 2014-04-30 01:21:13 PDT

(In reply to comment #23) > This is certainly interesting - there aren't a lot of messages sent, after all. Recently I debugged a problem by printing out all messages between all WebKit processes while loading a complex web page, and that was still very manageable. It is just that the "IPC is free" theory is wrong. Note that there is also context switch overhead that doesn't show up in profiles. > I wonder how many of these messages are didReceiveData ones - perhaps coalescing only these would be sufficient. It has certainly come up before that parsing data in smaller chunks (due to CFNetworks callbacks never being blocked) might cause trouble, although I don't think that's we've ever seen a confirmation. This approach is strictly better since it coalesces more messages, at least 3->1 in all non-disk-cache cases. Generally there is no reason to keep our internal IPC looking like the external webkit API. > This makes me think about something that this patch appears to regress. How do we handle Content-Disposition: attachment for subresources? A client needs to make a decision in didReceiveResponse before the delegate call even returns. In fact, the API contract is for the client to get callbacks right away, although I'm not sure if that's readily observable for anything but downloads. I don't understand the mechanism this patch would affect "Content-Disposition: attachment" handling. The client is still free to make decisions in didReceiveResponse. But we can certainly disable coalescing in that case if it is a concern.

Alexey Proskuryakov

Comment 25 2014-04-30 09:36:47 PDT

> I don't understand the mechanism this patch would affect "Content-Disposition: attachment" handling. The client is still free to make decisions in didReceiveResponse. But we can certainly disable coalescing in that case if it is a concern. The decision to convert to download must be made during didReceiveResponse handling, not after the delegate returns control to CFNetwork.

Antti Koivisto

Comment 26 2014-04-30 11:03:59 PDT

> The decision to convert to download must be made during didReceiveResponse handling, not after the delegate returns control to CFNetwork. I don't see how that works with existing code that uses asynchronous didReceiveResponse messages.

Alexey Proskuryakov

Comment 27 2014-04-30 11:15:10 PDT

What counts is that delegate calls are synchronous, see -connection:didReceiveResponse: in WebCoreResourceHandleAsOperationQueueDelegate.mm. We do convert that into a async message + async completion message for IPC. Once a completion message is received, -continueDidReceiveResponse is called, which signals a semaphore, and unblocks the delegate. I'm not exactly sure where the code that converts an NSURLConnection to download is, but it has to be there somewhere, running before the delegate returns.

Antti Koivisto

Comment 28 2014-04-30 11:28:02 PDT

(In reply to comment #27) > What counts is that delegate calls are synchronous, see -connection:didReceiveResponse: in WebCoreResourceHandleAsOperationQueueDelegate.mm. > > We do convert that into a async message + async completion message for IPC. Once a completion message is received, -continueDidReceiveResponse is called, which signals a semaphore, and unblocks the delegate. I'm not exactly sure where the code that converts an NSURLConnection to download is, but it has to be there somewhere, running before the delegate returns. The continueDidReceiveResponse callback is only requested for the main resource (the third parameter here): loader->sendAbortingOnFailure(Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo(response, CertificateInfo(response), loader->isLoadingMainResource())); This patch does not affect main resource load.

Alexey Proskuryakov

Comment 29 2014-04-30 11:41:29 PDT

Is it correct behavior that we ignore Content-Disposition for non-main resources? Do other browsers do the same? If it's just a bug, making our performance depend on it would be not great. I'm not sure why we need to go back and forth on this. This is essentially the same question I had in comment 23, and it needs a complete answer.

Csaba Osztrogonác

Comment 30 2014-06-04 03:32:12 PDT

Comment on attachment 230258 [details] patch Cleared Andreas Kling's review+ from obsolete attachment 230258 [details] so that this bug does not appear in http://webkit.org/pending-commit.

Darin Adler

Comment 31 2014-07-12 16:46:48 PDT

Comment on attachment 230377 [details] disable coalescing for xhr and multipart responses Antti, what’s the status on this 3 month old unreviewed patch?

Antti Koivisto

Comment 32 2014-07-13 07:27:49 PDT

Pratik landed similar change in bug 134560

Antti Koivisto

Comment 33 2014-07-13 07:28:30 PDT

*** This bug has been marked as a duplicate of bug 134560 ***

Attachments
patch (13.96 KB, patch) 2014-04-27 02:13 PDT, Antti Koivisto	no flags	Details Formatted Diff Diff
updated (13.69 KB, patch) 2014-04-27 02:36 PDT, Antti Koivisto	no flags	Details Formatted Diff Diff
disable coalescing for xhr and multipart responses (18.50 KB, patch) 2014-04-29 07:44 PDT, Antti Koivisto	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.