Bug 118448

Summary: [WK2][Soup] Implement NetworkResourceLoader::didReceiveData
Product: WebKit Reporter: Kwang Yul Seo <skyul>
Component: PlatformAssignee: Kwang Yul Seo <skyul>
Status: RESOLVED WONTFIX    
Severity: Normal CC: ap, beidson, buildbot, cdumez, gns, kbalazs, laszlo.gombos, ossy, rniwa, sam, svillar
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 108832    
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from webkit-ews-15 for mac-mountainlion-wk2
none
Patch
none
Patch beidson: review-

Description Kwang Yul Seo 2013-07-06 22:40:32 PDT
Soup won't get much benefit from using ResourceHandleClient::didReceiveBuffer introduced in r145820 (Bug 112310) because it returns raw pointers and lengths.
Comment 1 Kwang Yul Seo 2013-07-06 22:55:42 PDT
Created attachment 206201 [details]
Patch
Comment 2 Build Bot 2013-07-07 04:44:33 PDT
Comment on attachment 206201 [details]
Patch

Attachment 206201 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/946680

New failing tests:
fullscreen/full-screen-iframe-with-max-width-height.html
Comment 3 Build Bot 2013-07-07 04:44:34 PDT
Created attachment 206207 [details]
Archive of layout-test-results from webkit-ews-15 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-15  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.3
Comment 4 Sam Weinig 2013-07-07 08:02:03 PDT
Since this is touching shared code, this needs to be evaluated by and owner.  Brady/Alexey, can one of you review this?
Comment 5 Brady Eidson 2013-07-07 10:00:11 PDT
Comment on attachment 206201 [details]
Patch

We want to make sure that by default, network process users don't use didReceiveData.  We don't want future code changes to creep in that accidentally add didReceiveData to a platform that shouldn't be using it.

Please keep the cross-platform didReceiveData as-is.

Please add a NetworkResourceLoaderSoup.cpp and implement its own didReceiveData there.
Comment 6 Sam Weinig 2013-07-07 12:28:22 PDT
(In reply to comment #5)
> (From update of attachment 206201 [details])
> We want to make sure that by default, network process users don't use didReceiveData.  We don't want future code changes to creep in that accidentally add didReceiveData to a platform that shouldn't be using it.
> 
> Please keep the cross-platform didReceiveData as-is.
> 
> Please add a NetworkResourceLoaderSoup.cpp and implement its own didReceiveData there.

What would a Soup specific implementation do?  If we don't want to use didReceiveData why should Soup?
Comment 7 Brady Eidson 2013-07-07 16:11:23 PDT
(In reply to comment #6)
> (In reply to comment #5)
> > (From update of attachment 206201 [details] [details])
> > We want to make sure that by default, network process users don't use didReceiveData.  We don't want future code changes to creep in that accidentally add didReceiveData to a platform that shouldn't be using it.
> > 
> > Please keep the cross-platform didReceiveData as-is.
> > 
> > Please add a NetworkResourceLoaderSoup.cpp and implement its own didReceiveData there.
> 
> What would a Soup specific implementation do?  If we don't want to use didReceiveData why should Soup?

The Soup-specific implementation would do what they're trying to do in this cross-platform implementation.

An alternative approach, I guess, would be don't let anybody ever use didReceiveData and force Soup to abstract it away to didReceiveBuffer in their ResourceHandleSoup.
Comment 8 Kwang Yul Seo 2013-07-07 17:35:21 PDT
(In reply to comment #7)
> The Soup-specific implementation would do what they're trying to do in this cross-platform implementation.
> 
> An alternative approach, I guess, would be don't let anybody ever use didReceiveData and force Soup to abstract it away to didReceiveBuffer in their ResourceHandleSoup.

For now, I prefer the former approach because the current soup implementation of ResourceHandle::didReceiveData gets raw pointers and lengths.

Of course, if there is a way to get some sort of buffer object in soup, that's the direction we should do in the next step.
Comment 9 Kwang Yul Seo 2013-07-07 17:56:55 PDT
Created attachment 206212 [details]
Patch
Comment 10 Sergio Villar Senin 2013-07-08 01:17:09 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > The Soup-specific implementation would do what they're trying to do in this cross-platform implementation.
> > 
> > An alternative approach, I guess, would be don't let anybody ever use didReceiveData and force Soup to abstract it away to didReceiveBuffer in their ResourceHandleSoup.
> 
> For now, I prefer the former approach because the current soup implementation of ResourceHandle::didReceiveData gets raw pointers and lengths.
> 
> Of course, if there is a way to get some sort of buffer object in soup, that's the direction we should do in the next step.

Of course there is, actually we're using that approach before adding the HTTP cache to libsoup. The "solution" would be to use the "got-chunk" signal which returns a SoupBuffer object which could be then encapsulated as a SharedBuffer. The problem with that approach is that we wouldn't have HTTP cache.
Comment 11 Kwang Yul Seo 2013-07-08 01:30:17 PDT
(In reply to comment #10)
> Of course there is, actually we're using that approach before adding the HTTP cache to libsoup. The "solution" would be to use the "got-chunk" signal which returns a SoupBuffer object which could be then encapsulated as a SharedBuffer. The problem with that approach is that we wouldn't have HTTP cache.

Oh, thanks for the information. BTW, why wouldn't we have HTTP cache? Is there any problem with "got-chunk" signal?
Comment 12 Sergio Villar Senin 2013-07-08 01:33:08 PDT
(In reply to comment #11)
> (In reply to comment #10)
> > Of course there is, actually we're using that approach before adding the HTTP cache to libsoup. The "solution" would be to use the "got-chunk" signal which returns a SoupBuffer object which could be then encapsulated as a SharedBuffer. The problem with that approach is that we wouldn't have HTTP cache.
> 
> Oh, thanks for the information. BTW, why wouldn't we have HTTP cache? Is there any problem with "got-chunk" signal?

Not really, that's because the cache implementation is tightly tied to the use of the new streams API in libsoup, so if you don't use the streams API to read the contents you won't have the cache. Implementing the cache for the "old" signals like "got-chunk" would have mean creating a lot of fake signals to get it right.
Comment 13 Sergio Villar Senin 2013-07-08 01:38:16 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > Of course there is, actually we're using that approach before adding the HTTP cache to libsoup. The "solution" would be to use the "got-chunk" signal which returns a SoupBuffer object which could be then encapsulated as a SharedBuffer. The problem with that approach is that we wouldn't have HTTP cache.
> > 
> > Oh, thanks for the information. BTW, why wouldn't we have HTTP cache? Is there any problem with "got-chunk" signal?
> 
> Not really, that's because the cache implementation is tightly tied to the use of the new streams API in libsoup, so if you don't use the streams API to read the contents you won't have the cache. Implementing the cache for the "old" signals like "got-chunk" would have mean creating a lot of fake signals to get it right.

BTW I haven't taken a deep look at the code yet, so I don't have a clear idea why using SharedBuffer is better but we can create SoupBuffer objects from the data we get from the libsoup streams with zero cost because we can tell the buffer not to copy the memory but use an already existing chunk of it.
Comment 14 Kwang Yul Seo 2013-07-08 01:51:13 PDT
(In reply to comment #13)
> BTW I haven't taken a deep look at the code yet, so I don't have a clear idea why using SharedBuffer is better but we can create SoupBuffer objects from the data we get from the libsoup streams with zero cost because we can tell the buffer not to copy the memory but use an already existing chunk of it.

didReceiveBuffer is introduced to reduce a useless copy when delivering data buffers to ResourceHandle for platforms with objects that encapsulates a data buffer such as NSData (Mac), CFDataRef (CF), or QByteArray (qt).

The current implementation of ResourceHandleSoup delivers raw pointers and length. So simply wrapping raw pointers and lengths with SoupBuffer objects is not a performance win even though we could avoid copying the memory.

But if it is possible to use didReceiveBuffer without performance penalty, I think it is better to switch to didReceiveBuffer as early as possible because we want to deprecate the char* + length version of didReceiveData in the near future.

See Bug 112310 for more details.
Comment 15 Balazs Kelemen 2013-07-08 02:23:22 PDT
> > Oh, thanks for the information. BTW, why wouldn't we have HTTP cache? Is there any problem with "got-chunk" signal?
> 
> Not really, that's because the cache implementation is tightly tied to the use of the new streams API in libsoup, so if you don't use the streams API to read the contents you won't have the cache. Implementing the cache for the "old" signals like "got-chunk" would have mean creating a lot of fake signals to get it right.

Are you sure it is actually useful to have caching at the soup level? I think relying entirely on WebKit's own cache can be better.
Comment 16 Kwang Yul Seo 2013-07-08 02:29:39 PDT
(In reply to comment #15)
> Are you sure it is actually useful to have caching at the soup level? I think relying entirely on WebKit's own cache can be better.

WebKit provides only the memory cache while libsoup supports disk cache.
Comment 17 Sergio Villar Senin 2013-07-08 04:08:58 PDT
(In reply to comment #15)
> > > Oh, thanks for the information. BTW, why wouldn't we have HTTP cache? Is there any problem with "got-chunk" signal?
> > 
> > Not really, that's because the cache implementation is tightly tied to the use of the new streams API in libsoup, so if you don't use the streams API to read the contents you won't have the cache. Implementing the cache for the "old" signals like "got-chunk" would have mean creating a lot of fake signals to get it right.
> 
> Are you sure it is actually useful to have caching at the soup level? I think relying entirely on WebKit's own cache can be better.

Absolutely sure :), libsoup provides persistent on-disk cache while WebKit's is a memory cache as you know.
Comment 18 Balazs Kelemen 2013-07-08 06:57:36 PDT
(In reply to comment #17)
> (In reply to comment #15)
> > > > Oh, thanks for the information. BTW, why wouldn't we have HTTP cache? Is there any problem with "got-chunk" signal?
> > > 
> > > Not really, that's because the cache implementation is tightly tied to the use of the new streams API in libsoup, so if you don't use the streams API to read the contents you won't have the cache. Implementing the cache for the "old" signals like "got-chunk" would have mean creating a lot of fake signals to get it right.
> > 
> > Are you sure it is actually useful to have caching at the soup level? I think relying entirely on WebKit's own cache can be better.
> 
> Absolutely sure :), libsoup provides persistent on-disk cache while WebKit's is a memory cache as you know.

(In reply to comment #17)
> (In reply to comment #15)
> > > > Oh, thanks for the information. BTW, why wouldn't we have HTTP cache? Is there any problem with "got-chunk" signal?
> > > 
> > > Not really, that's because the cache implementation is tightly tied to the use of the new streams API in libsoup, so if you don't use the streams API to read the contents you won't have the cache. Implementing the cache for the "old" signals like "got-chunk" would have mean creating a lot of fake signals to get it right.
> > 
> > Are you sure it is actually useful to have caching at the soup level? I think relying entirely on WebKit's own cache can be better.
> 
> Absolutely sure :), libsoup provides persistent on-disk cache while WebKit's is a memory cache as you know.

Ok, but let me complaining further. :)
Is libsoup also use a memory cache as well? If yes (which seems unfortunate, but maybe not unbearable), are we sure that we do not duplicate resources in WebKit's cache?
Comment 19 Gustavo Noronha (kov) 2013-07-08 07:32:47 PDT
(In reply to comment #18)
> Is libsoup also use a memory cache as well? If yes (which seems unfortunate, but maybe not unbearable), are we sure that we do not duplicate resources in WebKit's cache?

Nope, just disk cache.
Comment 20 Kwang Yul Seo 2013-07-08 18:07:42 PDT
(In reply to comment #19)
> Nope, just disk cache.

BTW, does libsoup also provide memory cache if we want? In NetworkProcess, we need to use both memory cache and disk cache because NetworkProcess works at ResourceLoader layer which does not have the memory cache.

Please check out Bug 118343. For now, I just enabled disk cache on NetworkProcess, but we'd like to set memory cache too as the Mac port does.
Comment 21 Brady Eidson 2013-07-08 23:33:20 PDT
(In reply to comment #20)
> (In reply to comment #19)
> > Nope, just disk cache.
> 
> BTW, does libsoup also provide memory cache if we want? In NetworkProcess, we need to use both memory cache and disk cache because NetworkProcess works at ResourceLoader layer which does not have the memory cache.
> 
> Please check out Bug 118343. For now, I just enabled disk cache on NetworkProcess, but we'd like to set memory cache too as the Mac port does.

We actually don't rely on CFNetwork's memory caching, which is the analog of what libsoup's memory caching would be.

The WebCore memory-cache is still per-WebProcess, and with the mmap sharing we do with the disk cache we've seen little regression.

Moving the WebCore object cache into a shared location is still in the future, but isn't critical to switch over to NetworkProcess in practice.
Comment 22 Kwang Yul Seo 2013-07-08 23:48:23 PDT
(In reply to comment #21)
> We actually don't rely on CFNetwork's memory caching, which is the analog of what libsoup's memory caching would be.

Really? I thought that the Mac port uses CFNetwork's memory caching because the following code block seems to set the memory cache size for the shared NSURLCache.

void NetworkProcess::platformSetCacheModel(CacheModel cacheModel)
{
...
    NSURLCache *nsurlCache = [NSURLCache sharedURLCache];
    [nsurlCache setMemoryCapacity:urlCacheMemoryCapacity];
...
}

> The WebCore memory-cache is still per-WebProcess, and with the mmap sharing we do with the disk cache we've seen little regression.

That's good. Currently I am looking for a way to implement this in libsoup.

> Moving the WebCore object cache into a shared location is still in the future, but isn't critical to switch over to NetworkProcess in practice.

So what's our plan on shared memory cache? Do we want to move the layer above CachedResourceRequest?
Comment 23 Sergio Villar Senin 2013-07-09 02:28:47 PDT
(In reply to comment #20)
> (In reply to comment #19)
> > Nope, just disk cache.
> 
> BTW, does libsoup also provide memory cache if we want? In NetworkProcess, we need to use both memory cache and disk cache because NetworkProcess works at ResourceLoader layer which does not have the memory cache.
> 
> Please check out Bug 118343. For now, I just enabled disk cache on NetworkProcess, but we'd like to set memory cache too as the Mac port does.

libsoup does not provide any kind of memory cache. My understanding is that this is not the way to go but of course it should be doable. I understood that the idea in WK2 was to move as much as loading code as possible to the network process, contrary to the chromium networking thread which operates at resourcehandle level.

I am not sure about all the implications but one of the advantages of having the memory cache in the networking process is that you don't need to do complex cache synchronizations (apart from the fact that you wouldn't have duplicated resources in the caches). I guess the obvious drawback is that you'd need IPC even for cached resources which would be accessed blazingly fast otherwise from WebProcess located caches.

In any case I remember reading a Google report that showed that the memory wasted by having different memory caches was not that significant as most of them were very small resources (like the I like it fbook icon). So I guess that having multiple memory caches is not a big issue for the moment.
Comment 24 Kwang Yul Seo 2013-07-09 02:55:24 PDT
(In reply to comment #23)
> libsoup does not provide any kind of memory cache. My understanding is that this is not the way to go but of course it should be doable. I understood that the idea in WK2 was to move as much as loading code as possible to the network process, contrary to the chromium networking thread which operates at resourcehandle level.
> 
> I am not sure about all the implications but one of the advantages of having the memory cache in the networking process is that you don't need to do complex cache synchronizations (apart from the fact that you wouldn't have duplicated resources in the caches). I guess the obvious drawback is that you'd need IPC even for cached resources which would be accessed blazingly fast otherwise from WebProcess located caches.
> 
> In any case I remember reading a Google report that showed that the memory wasted by having different memory caches was not that significant as most of them were very small resources (like the I like it fbook icon). So I guess that having multiple memory caches is not a big issue for the moment.

Thanks for the information. 

As Brady Eidson mentioned in comment #21, the current implementation of NetworkProcess tries to save memory by replacing the contents of a cached resource in web processes with MMAP'ed memory of the resource in NetworkProcess. (Bug 112943)

I think this is a good trade-off between memory and code complexity and I'd like to use the same approach in libsoup for now. But it seems it is not easy to get the MMAP'ed pointer of a resource residing the disk cache.
Comment 25 Brady Eidson 2013-07-09 10:29:47 PDT
(In reply to comment #22)
> (In reply to comment #21)
> > We actually don't rely on CFNetwork's memory caching, which is the analog of what libsoup's memory caching would be.
> 
> Really? I thought that the Mac port uses CFNetwork's memory caching because the following code block seems to set the memory cache size for the shared NSURLCache.
> 
> void NetworkProcess::platformSetCacheModel(CacheModel cacheModel)
> {
> ...
>     NSURLCache *nsurlCache = [NSURLCache sharedURLCache];
>     [nsurlCache setMemoryCapacity:urlCacheMemoryCapacity];
> ...
> }

Yah, you're right.  In practice it is often used.

In practice it is also not particularly relevant.  It's great to try to do some libsoup mem caching if at all feasible, but that's for a different bug (discussion in this one is getting way off topic)

> > Moving the WebCore object cache into a shared location is still in the future, but isn't critical to switch over to NetworkProcess in practice.
> 
> So what's our plan on shared memory cache? Do we want to move the layer above CachedResourceRequest?

Almost certainly not.  More details will be forthcoming once we start working on it, but it certainly isn't a blocker.  The advantages are more about coordination and saving a little CPU time here and there, and much less about memory savings.
Comment 26 Kwang Yul Seo 2013-07-12 07:20:32 PDT
(In reply to comment #7)
> An alternative approach, I guess, would be don't let anybody ever use didReceiveData and force Soup to abstract it away to didReceiveBuffer in their ResourceHandleSoup.

I took an alternative approach and changed soup to use didReceiveBuffer in Bug 118598. If is is accepted, I will close this bug in favor of Bug 118598.
Comment 27 Kwang Yul Seo 2013-07-25 16:53:13 PDT
(In reply to comment #26)
> I took an alternative approach and changed soup to use didReceiveBuffer in Bug 118598. If is is accepted, I will close this bug in favor of Bug 118598.

After discussing with Gustavo, using didReceiveData seems to be more reasonable for soup. So I'd like to check if this approach is okay with with Apple reviewers.
Comment 28 Csaba Osztrogonác 2013-08-26 07:09:12 PDT
Comment on attachment 206212 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=206212&action=review

> Source/WebKit2/NetworkProcess/NetworkResourceLoader.cpp:-110
> -template<typename U> bool NetworkResourceLoader::sendAbortingOnFailure(const U& message, unsigned messageSendFlags)
> -{
> -    bool result = messageSenderConnection()->send(message, messageSenderDestinationID(), messageSendFlags);
> -    if (!result)
> -        abort();
> -    return result;
> -}
> -

It was also moved to the header file by r154183.
Comment 29 Csaba Osztrogonác 2013-08-26 07:15:39 PDT
Created attachment 209651 [details]
Patch

updated to ToT, sendAbortingOnFailure was already moved to the header file.
Comment 30 Csaba Osztrogonác 2013-09-26 08:13:57 PDT
(In reply to comment #29)
> Created an attachment (id=209651) [details]
> Patch
> 
> updated to ToT, sendAbortingOnFailure was already moved to the header file.

The patch is still up-to-date. Could you review it please?
Comment 31 Csaba Osztrogonác 2013-10-07 10:19:48 PDT
ping?
Comment 32 Brady Eidson 2013-10-07 12:06:47 PDT
Comment on attachment 209651 [details]
Patch

This patch cannot possibly be right any longer.  NetworkResourceLoader doesn't message the WebProcess; AsynchronousNetworkLoaderClient and SynchronousNetworkLoaderClient do.

Sync XHRs cannot possibly work with this patch, for example.

Note that (A)SynchronousNetworkLoaderClient do not have didReceiveData calls, but only have didReceiveBuffer.

Right now, on non-Mac platforms, they always perform a copy out of the buffer.  This will not be true in the future.  And I don't think we should make the mistake of adding a didReceiveData now just to cover for a networking implementation that doesn't support a buffer callback.
Comment 33 Csaba Osztrogonác 2013-10-10 07:56:39 PDT
(In reply to comment #32)
> (From update of attachment 209651 [details])
> This patch cannot possibly be right any longer.  NetworkResourceLoader doesn't message the WebProcess; AsynchronousNetworkLoaderClient and SynchronousNetworkLoaderClient do.
> 
> Sync XHRs cannot possibly work with this patch, for example.
> 
> Note that (A)SynchronousNetworkLoaderClient do not have didReceiveData calls, but only have didReceiveBuffer.
> 
> Right now, on non-Mac platforms, they always perform a copy out of the buffer.  This will not be true in the future.  And I don't think we should make the mistake of adding a didReceiveData now just to cover for a networking implementation that doesn't support a buffer callback.

:((((( ... because we had to wait for 3 months for the negative review.
Let's start implementing didReceiveBuffer in https://bugs.webkit.org/show_bug.cgi?id=118598