Bug 132229 - Coalesce responses on network process side
Summary: Coalesce responses on network process side
Status: RESOLVED DUPLICATE of bug 134560
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on: 132288
Blocks:
  Show dependency treegraph
 
Reported: 2014-04-27 02:03 PDT by Antti Koivisto
Modified: 2014-07-13 07:28 PDT (History)
6 users (show)

See Also:


Attachments
patch (13.96 KB, patch)
2014-04-27 02:13 PDT, Antti Koivisto
no flags Details | Formatted Diff | Diff
updated (13.69 KB, patch)
2014-04-27 02:36 PDT, Antti Koivisto
no flags Details | Formatted Diff | Diff
disable coalescing for xhr and multipart responses (18.50 KB, patch)
2014-04-29 07:44 PDT, Antti Koivisto
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Antti Koivisto 2014-04-27 02:03:33 PDT
To reduce IPC we should coalesce responses in network process and send them over with single IPC call.
Comment 1 Antti Koivisto 2014-04-27 02:13:12 PDT
Created attachment 230258 [details]
patch
Comment 2 Andreas Kling 2014-04-27 02:20:49 PDT
Comment on attachment 230258 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=230258&action=review

r=me, this looks really sweet.

> Source/WebKit2/ChangeLog:30
> +            Add a new message type that covers didReceiveResonse, didReceiveBuffer and didFinishLoading in a single message.

Typo, didReceiveResponse.

> Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.h:31
> +#include <Webcore/ResourceResponse.h>

Webcore :|

> Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.h:58
> +    NetworkResourceLoader* m_coalesingLoader;

Typo, m_coalescingLoader.
Comment 3 Antti Koivisto 2014-04-27 02:36:10 PDT
Created attachment 230259 [details]
updated
Comment 4 Antti Koivisto 2014-04-27 04:54:59 PDT
https://trac.webkit.org/r167853
Comment 5 Antti Koivisto 2014-04-27 07:15:51 PDT
<rdar://problem/16737186>
Comment 8 Tim Horton 2014-04-28 10:45:01 PDT
And also http/tests/multipart/multipart-html.php. The bots didn't pick up the crash logs, but they look like this:

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x00000000000008c8
0   com.apple.WebKit2               0x0000000116d5ae2c WebKit::WebDocumentLoader::navigationID() const + 12 (WebDocumentLoader.h:40)
1   com.apple.WebKit2               0x0000000116d559cb WebKit::WebFrameLoaderClient::dispatchDidFailProvisionalLoad(WebCore::ResourceError const&) + 203 (WebFrameLoaderClient.cpp:470)
2   com.apple.WebCore               0x000000011a99cbe8 WebCore::FrameLoader::checkLoadCompleteForThisFrame() + 584 (FrameLoader.cpp:2210)
3   com.apple.WebCore               0x000000011a995974 WebCore::FrameLoader::checkLoadComplete() + 324 (FrameLoader.cpp:2442)
4   com.apple.WebCore               0x000000011a99ec88 WebCore::FrameLoader::receivedMainResourceError(WebCore::ResourceError const&) + 408 (FrameLoader.cpp:2731)
5   com.apple.WebCore               0x000000011a6a4ba4 WebCore::DocumentLoader::mainReceivedError(WebCore::ResourceError const&) + 324 (DocumentLoader.cpp:267)
6   com.apple.WebCore               0x000000011a6a55fe WebCore::DocumentLoader::notifyFinished(WebCore::CachedResource*) + 398 (DocumentLoader.cpp:384)
7   com.apple.WebCore               0x000000011a2c6ccd WebCore::CachedResource::checkNotify() + 109 (CachedResource.cpp:332)
8   com.apple.WebCore               0x000000011a2c6ea1 WebCore::CachedResource::error(WebCore::CachedResource::Status) + 145 (CachedResource.cpp:359)
9   com.apple.WebCore               0x000000011bde8ac9 WebCore::SubresourceLoader::didFail(WebCore::ResourceError const&) + 377 (SubresourceLoader.cpp:339)
10  com.apple.WebKit2               0x0000000116fd6fc5 WebKit::WebResourceLoadScheduler::internallyFailedLoadTimerFired() + 213 (WebResourceLoadScheduler.cpp:166)
Comment 9 WebKit Commit Bot 2014-04-28 10:47:15 PDT
Re-opened since this is blocked by bug 132288
Comment 10 Antti Koivisto 2014-04-29 07:44:14 PDT
Created attachment 230377 [details]
disable coalescing for xhr and multipart responses
Comment 11 Alexey Proskuryakov 2014-04-29 09:59:53 PDT
Comment on attachment 230377 [details]
disable coalescing for xhr and multipart responses

View in context: https://bugs.webkit.org/attachment.cgi?id=230377&action=review

> Source/WebKit2/ChangeLog:8
> +        To reduce IPC we should coalesce responses in network process and send them over with single IPC call.

I don't object to this change, however I also don't understand why "reducing IPC" improves performance.

I understand that we see a lot of IPC on profiles, but combining responses into one doesn't get rid of any serialization that we perform - we still need to send requests and responses as many times as before, don't we?

A better understanding of where the benefit comes from may help come up with a safer fix that won't be a long-term liability due to added complexity.

> Source/WebKit2/ChangeLog:39
> +            Add allowResponseCoalescing flag. Main resource and XHR is not allowed to coalesce as it may alter
> +            observable semantics.

One place where coalescing would be a substantial win is actually most XHR use cases. We don't need the partial data for Blob responses, we don't need it for XML responses, and so on. Yet we still need progress notifications for these.

For blobs especially, what we do today is buffer the data on WebProcess side, and then send it back to NetworkProcess to register as a blob - crazy!

> Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.cpp:71
> +        static const double responseCoalescingTime = 0.5;

How did you come up with this number?

This seems to mean that slow loading pages will now be committed half a second later than before, which I expect to be observable.

> Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.cpp:88
> +            loader->send(Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo(m_coalescingResponse, CertificateInfo(m_coalescingResponse), false));

This looks like the best place to send a coalesced response - we already have all the data. What is the reason to not use this technique here?

> Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.h:61
> +    long long m_coalescingResponseEncodedDataLength;

Is a signed value appropriate here?

> Source/WebKit2/WebProcess/Network/WebResourceLoader.cpp:137
> +    Ref<WebResourceLoader> protect(*this);

I'm somewhat worried that protecting only the loader is insufficient. I don't know to to tell what would be sufficient. Maybe run API tests with GuardMalloc? But the coverage is very sketchy.

Maybe it's OK because the loader protects both Frame and DocumentLoader. But it doesn't protect the Document, which can be closed and replaced.

> Source/WebKit2/WebProcess/Network/WebResourceLoader.cpp:143
> +        didReceiveData(data, encodedDataLength);

Seems like we shouldn't do this if client cancels the load while handling didReceiveResponse. Or does m_coreLoader become null in this case?
Comment 12 Antti Koivisto 2014-04-29 12:04:02 PDT
(In reply to comment #11)
> > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.cpp:71
> > +        static const double responseCoalescingTime = 0.5;
> 
> How did you come up with this number?

It seemed reasonable. Only thing that it really should affect are progressively rendered images (the first display may be delayed up to 0.5s but will be more complete). Scripts and stylesheets are not processed until they finished in any case.

> This seems to mean that slow loading pages will now be committed half a second later than before, which I expect to be observable.

Coalescing is not enabled for the main document (or XHR).

> > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.cpp:88
> > +            loader->send(Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo(m_coalescingResponse, CertificateInfo(m_coalescingResponse), false));
> 
> This looks like the best place to send a coalesced response - we already have all the data. What is the reason to not use this technique here?

I guess I don't understand what you mean here.

> > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.h:61
> > +    long long m_coalescingResponseEncodedDataLength;
> 
> Is a signed value appropriate here?

It is the same type as used elsewhere for this data.

> 
> > Source/WebKit2/WebProcess/Network/WebResourceLoader.cpp:137
> > +    Ref<WebResourceLoader> protect(*this);
> 
> I'm somewhat worried that protecting only the loader is insufficient. I don't know to to tell what would be sufficient. Maybe run API tests with GuardMalloc? But the coverage is very sketchy.
> 
> Maybe it's OK because the loader protects both Frame and DocumentLoader. But it doesn't protect the Document, which can be closed and replaced.

This is the same protection other functions here have.

> > Source/WebKit2/WebProcess/Network/WebResourceLoader.cpp:143
> > +        didReceiveData(data, encodedDataLength);
> 
> Seems like we shouldn't do this if client cancels the load while handling didReceiveResponse. Or does m_coreLoader become null in this case?

This case shouldn't be any different from the separate-messages case. You may have both didReceiveResponse and didReceiveData messages in-flight at the same time.
Comment 13 Alexey Proskuryakov 2014-04-29 12:30:38 PDT
> > > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.cpp:88
> > > +            loader->send(Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo(m_coalescingResponse, CertificateInfo(m_coalescingResponse), false));
> > 
> > This looks like the best place to send a coalesced response - we already have all the data. What is the reason to not use this technique here?
> 
> I guess I don't understand what you mean here.

This is a missed optimization - a place where we could send a coalesced response, but don't.

> > > Source/WebKit2/NetworkProcess/AsynchronousNetworkLoaderClient.h:61
> > > +    long long m_coalescingResponseEncodedDataLength;
> > 
> > Is a signed value appropriate here?
> 
> It is the same type as used elsewhere for this data.

This is not sufficient. You are adding new code, and we don't have to copy old mistakes unless there is a reason to.

Using signed values for sizes causes all sorts of bugs.

> > Maybe it's OK because the loader protects both Frame and DocumentLoader. But it doesn't protect the Document, which can be closed and replaced.
> 
> This is the same protection other functions here have.

Do other functions make multiple calls to client? This new function is very different from others, as it's multi-step.

> > > Source/WebKit2/WebProcess/Network/WebResourceLoader.cpp:143
> > > +        didReceiveData(data, encodedDataLength);
> > 
> > Seems like we shouldn't do this if client cancels the load while handling didReceiveResponse. Or does m_coreLoader become null in this case?
> 
> This case shouldn't be any different from the separate-messages case. You may have both didReceiveResponse and didReceiveData messages in-flight at the same time.

I don't understand this answer. There can be multiple messages, but each gets a lot of additional checks before it reaches WebResourceLoader, they don't share the same stack when executed.
Comment 14 Antti Koivisto 2014-04-29 12:49:39 PDT
> This is a missed optimization - a place where we could send a coalesced response, but don't.

Right, this path is an existing optimization that coalesces didReceiveData and didFinishLoading for disk cache hit (along with passing an mmapped buffer). It could be improved further but that is bit of a separate topic.

> This is not sufficient. You are adding new code, and we don't have to copy old mistakes unless there is a reason to.
> 
> Using signed values for sizes causes all sorts of bugs.

I disagree. Consistency is more important. We should fix this everywhere as a separate change. Conversions cause bugs.

> Do other functions make multiple calls to client? This new function is very different from others, as it's multi-step.

Yes, didReceiveResource does (which is a similar existing optimization, see above).

> I don't understand this answer. There can be multiple messages, but each gets a lot of additional checks before it reaches WebResourceLoader, they don't share the same stack when executed.

Maybe you could suggest concrete improvements to the patch as I'm not sure what you are after here.
Comment 15 Alexey Proskuryakov 2014-04-29 13:18:37 PDT
The biggest concrete improvement would be one I requested in the first comment - it would be helpful to understand why this improves our metrics.

Without this understanding, and given the quality of implementation concerns that I listed, I'm not willing to give an r+ - the risk of regressions and the cost in complexity are relatively high.
Comment 16 Alexey Proskuryakov 2014-04-29 13:19:14 PDT
Comment on attachment 230377 [details]
disable coalescing for xhr and multipart responses

View in context: https://bugs.webkit.org/attachment.cgi?id=230377&action=review

> Source/WebKit2/ChangeLog:14
> +            Disable coalesing for multipart/x-response-mixed as they may generate multiple responses.

I think that the code is correct, but the comment is not - multipart/x-response-mixed is not the only kind of multipart response that we support.
Comment 17 Antti Koivisto 2014-04-29 13:35:18 PDT
(In reply to comment #15)
> The biggest concrete improvement would be one I requested in the first comment - it would be helpful to understand why this improves our metrics.

Doing less work, in this case constructing and sending IPC messages, generally improves performance. I'm not sure what further understanding is needed.
 
> Without this understanding, and given the quality of implementation concerns that I listed, I'm not willing to give an r+ - the risk of regressions and the cost in complexity are relatively high.

I haven't seen any actionable concerns.

Since you "don't object to this change" I suppose I will seek the review elsewhere.
Comment 18 Alexey Proskuryakov 2014-04-29 14:20:12 PDT
> Doing less work, in this case constructing and sending IPC messages, generally improves performance. I'm not sure what further understanding is needed.

I find this explanation implausible. There must be something else at play.
Comment 19 Antti Koivisto 2014-04-29 15:25:26 PDT
> I find this explanation implausible. There must be something else at play.

For example the network process side profiles for sending messages (over PLT run) look like this:

Running Time	Self		Symbol Name
501.0ms    6.0%	1,0	 	                 bool IPC::Connection::send<Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo>(Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo&&, unsigned long long, unsigned int)
425.0ms    5.1%	1,0	 	                  IPC::ArgumentCoder<WebCore::ResourceResponse>::encode(IPC::ArgumentEncoder&, WebCore::ResourceResponse const&)
66.0ms    0.7%	1,0	 	                  IPC::Connection::sendMessage(std::__1::unique_ptr<IPC::MessageEncoder, std::__1::default_delete<IPC::MessageEncoder> >, unsigned int)

Running Time	Self		Symbol Name
306.0ms    3.6%	0,0	 	                 bool IPC::Connection::send<Messages::WebResourceLoader::DidReceiveData>(Messages::WebResourceLoader::DidReceiveData&&, unsigned long long, unsigned int)
229.0ms    2.7%	0,0	 	                  IPC::ArgumentEncoder::encodeVariableLengthByteArray(IPC::DataReference const&)
67.0ms    0.8%	3,0	 	                  IPC::Connection::sendMessage(std::__1::unique_ptr<IPC::MessageEncoder, std::__1::default_delete<IPC::MessageEncoder> >, unsigned int)

Running Time	Self		Symbol Name
63.0ms    0.7%	1,0	 	               bool IPC::MessageSender::send<Messages::WebResourceLoader::DidFinishResourceLoad>(Messages::WebResourceLoader::DidFinishResourceLoad const&, unsigned long long, unsigned int)
58.0ms    0.7%	1,0	 	                IPC::MessageSender::sendMessage(std::__1::unique_ptr<IPC::MessageEncoder, std::__1::default_delete<IPC::MessageEncoder> >, unsigned int)
54.0ms    0.6%	1,0	 	                 IPC::Connection::sendMessage(std::__1::unique_ptr<IPC::MessageEncoder, std::__1::default_delete<IPC::MessageEncoder> >, unsigned int)
2.0ms    0.0%	2,0	 	                 non-virtual thunk to WebKit::NetworkResourceLoader::messageSenderConnection()
1.0ms    0.0%	1,0	 	                 DYLD-STUB$$operator new(unsigned long)

Simply combining sendMessage calls (practically all combine) wins 1.5% of main thread time. A portion of encodeVariableLengthByteArray overhead (2.7%) will go away as well as we now do do a single encode per resource instead of multiple ones.

There are comparable wins in receiving web process side. Does this make it sound more plausible? These things are not free.
Comment 20 Alexey Proskuryakov 2014-04-29 16:15:29 PDT
Don't these include data serialization, which will still happen with this patch?
Comment 21 Antti Koivisto 2014-04-29 22:51:54 PDT
(In reply to comment #20)
> Don't these include data serialization, which will still happen with this patch?

No, sendMessage functions are pure low level/kernel overhead. The IPC::ArgumentCoder<>::encode functions do the serialization.

ResourceResponse serialization is indeed very slow and should be optimised too.
Comment 22 Antti Koivisto 2014-04-29 23:36:37 PDT
On web process side we spend 1.1% of total CPU time simply receiving IPC messages (this includes UI process messages too) and forwarding them to client. This does not include any type-specific decoding.

Running Time	Self		Symbol Name
771.0ms    1.2%	7,0	 	     _dispatch_source_invoke$VARIANT$mp
760.0ms    1.1%	2,0	 	      _dispatch_source_latch_and_call
719.0ms    1.1%	10,0	 	       IPC::Connection::receiveSourceEventHandler()
275.0ms    0.4%	4,0	 	        IPC::MessageDecoder::MessageDecoder(IPC::DataReference const&, WTF::Vector<IPC::Attachment, 0ul, WTF::CrashOnOverflow>)
215.0ms    0.3%	11,0	 	        IPC::Connection::processIncomingMessage(std::__1::unique_ptr<IPC::MessageDecoder, std::__1::default_delete<IPC::MessageDecoder> >)
158.0ms    0.2%	1,0	 	        mach_msg
43.0ms    0.0%	1,0	 	        vm_deallocate
13.0ms    0.0%	12,0	 	        WTF::fastMalloc(unsigned long)
3.0ms    0.0%	3,0	 	        WTF::Vector<char, 4160ul, WTF::CrashOnOverflow>::resize(unsigned long)
Comment 23 Alexey Proskuryakov 2014-04-29 23:56:07 PDT
This is certainly interesting - there aren't a lot of messages sent, after all. Recently I debugged a problem by printing out all messages between all WebKit processes while loading a complex web page, and that was still very manageable.

I wonder how many of these messages are didReceiveData ones - perhaps coalescing only these would be sufficient. It has certainly come up before that parsing data in smaller chunks (due to CFNetworks callbacks never being blocked) might cause trouble, although I don't think that's we've ever seen a confirmation.

This makes me think about something that this patch appears to regress. How do we handle Content-Disposition: attachment for subresources? A client needs to make a decision in didReceiveResponse before the delegate call even returns. In fact, the API contract is for the client to get callbacks right away, although I'm not sure if that's readily observable for anything but downloads.
Comment 24 Antti Koivisto 2014-04-30 01:21:13 PDT
(In reply to comment #23)
> This is certainly interesting - there aren't a lot of messages sent, after all. Recently I debugged a problem by printing out all messages between all WebKit processes while loading a complex web page, and that was still very manageable.

It is just that the "IPC is free" theory is wrong. Note that there is also context switch overhead that doesn't show up in profiles.

> I wonder how many of these messages are didReceiveData ones - perhaps coalescing only these would be sufficient. It has certainly come up before that parsing data in smaller chunks (due to CFNetworks callbacks never being blocked) might cause trouble, although I don't think that's we've ever seen a confirmation.

This approach is strictly better since it coalesces more messages, at least 3->1 in all non-disk-cache cases.

Generally there is no reason to keep our internal IPC looking like the external webkit API.

> This makes me think about something that this patch appears to regress. How do we handle Content-Disposition: attachment for subresources? A client needs to make a decision in didReceiveResponse before the delegate call even returns. In fact, the API contract is for the client to get callbacks right away, although I'm not sure if that's readily observable for anything but downloads.

I don't understand the mechanism this patch would affect "Content-Disposition: attachment" handling. The client is still free to make decisions in didReceiveResponse. But we can certainly disable coalescing in that case if it is a concern.
Comment 25 Alexey Proskuryakov 2014-04-30 09:36:47 PDT
> I don't understand the mechanism this patch would affect "Content-Disposition: attachment" handling. The client is still free to make decisions in didReceiveResponse. But we can certainly disable coalescing in that case if it is a concern.

The decision to convert to download must be made during didReceiveResponse handling, not after the delegate returns control to CFNetwork.
Comment 26 Antti Koivisto 2014-04-30 11:03:59 PDT
> The decision to convert to download must be made during didReceiveResponse handling, not after the delegate returns control to CFNetwork.

I don't see how that works with existing code that uses asynchronous didReceiveResponse messages.
Comment 27 Alexey Proskuryakov 2014-04-30 11:15:10 PDT
What counts is that delegate calls are synchronous, see -connection:didReceiveResponse: in WebCoreResourceHandleAsOperationQueueDelegate.mm.

We do convert that into a async message + async completion message for IPC. Once a completion message is received, -continueDidReceiveResponse is called, which signals a semaphore, and unblocks the delegate. I'm not exactly sure where the code that converts an NSURLConnection to download is, but it has to be there somewhere, running before the delegate returns.
Comment 28 Antti Koivisto 2014-04-30 11:28:02 PDT
(In reply to comment #27)
> What counts is that delegate calls are synchronous, see -connection:didReceiveResponse: in WebCoreResourceHandleAsOperationQueueDelegate.mm.
> 
> We do convert that into a async message + async completion message for IPC. Once a completion message is received, -continueDidReceiveResponse is called, which signals a semaphore, and unblocks the delegate. I'm not exactly sure where the code that converts an NSURLConnection to download is, but it has to be there somewhere, running before the delegate returns.

The continueDidReceiveResponse callback is only requested for the main resource (the third parameter here):

    loader->sendAbortingOnFailure(Messages::WebResourceLoader::DidReceiveResponseWithCertificateInfo(response, CertificateInfo(response), loader->isLoadingMainResource()));

This patch does not affect main resource load.
Comment 29 Alexey Proskuryakov 2014-04-30 11:41:29 PDT
Is it correct behavior that we ignore Content-Disposition for non-main resources? Do other browsers do the same?

If it's just a bug, making our performance depend on it would be not great.

I'm not sure why we need to go back and forth on this. This is essentially the same question I had in comment 23, and it needs a complete answer.
Comment 30 Csaba Osztrogonác 2014-06-04 03:32:12 PDT
Comment on attachment 230258 [details]
patch

Cleared Andreas Kling's review+ from obsolete attachment 230258 [details] so that this bug does not appear in http://webkit.org/pending-commit.
Comment 31 Darin Adler 2014-07-12 16:46:48 PDT
Comment on attachment 230377 [details]
disable coalescing for xhr and multipart responses

Antti, what’s the status on this 3 month old unreviewed patch?
Comment 32 Antti Koivisto 2014-07-13 07:27:49 PDT
Pratik landed similar change in bug 134560
Comment 33 Antti Koivisto 2014-07-13 07:28:30 PDT

*** This bug has been marked as a duplicate of bug 134560 ***