Bug 62581 - WebCore should cache compressed data
Summary: WebCore should cache compressed data
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Pratik Solanki
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2011-06-13 11:28 PDT by Pratik Solanki
Modified: 2012-02-21 11:02 PST (History)
11 users (show)

See Also:


Attachments
Patch (789.71 KB, patch)
2011-06-13 12:49 PDT, Pratik Solanki
no flags Details | Formatted Diff | Diff
Patch v2 (829.80 KB, patch)
2011-06-17 14:13 PDT, Pratik Solanki
no flags Details | Formatted Diff | Diff
Patch v3 - should fix compile issues on other platforms (829.51 KB, patch)
2011-06-17 17:47 PDT, Pratik Solanki
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Pratik Solanki 2011-06-13 11:28:03 PDT
If CFNetwork supports it, WebCore should cache the compressed data returned by servers and handle data decompression itself. That way we can cache just the compressed data and reduce memory usage.
Comment 1 Pratik Solanki 2011-06-13 11:28:18 PDT
<rdar://problem/8989320>
Comment 2 Pratik Solanki 2011-06-13 12:49:40 PDT
Created attachment 96992 [details]
Patch
Comment 3 WebKit Review Bot 2011-06-13 13:05:19 PDT
Comment on attachment 96992 [details]
Patch

Attachment 96992 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8831885
Comment 4 Alexey Proskuryakov 2011-06-13 13:48:45 PDT
Comment on attachment 96992 [details]
Patch

+        No new tests because the option is disabled by default.

We now have a policy that new features should be discussed on webkit-dev first, and a new preprocessor option qualifies as a new feature.

+#if USE(COMPRESSED_DATA_FROM_CFNETWORK)
+- (CFURLRequestRef) _CFURLRequest;
+#endif

I'm not sure if you need to protect that much code with ifdefs. For example, nothing will break if you add this details declaration unconditionally.


+    if (client()->shouldUseCompressedData())
+        wkRequestDoNotDecodeData([nsRequest _CFURLRequest]);

This sounds more like "wants" or "canUse" than "should".
Comment 5 Adam Barth 2011-06-13 13:58:32 PDT
Comment on attachment 96992 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=96992&action=review

> Source/WebCore/loader/ResourceLoader.h:114
> +#if USE(COMPRESSED_DATA_FROM_CFNETWORK)

COMPRESSED_DATA_FROM_CFNETWORK => COMPRESSED_DATA_FROM_NETWORK ?

It seems like you could use this feature with all the various backends.  The CFNETWORK specific parts should be behind USE(CFNETWORK) or whatever.

> Source/WebCore/loader/cache/CachedResource.cpp:647
> +        m_compressedData = adoptRef(new CompressedBuffer(this, m_data.get()));

"adoptRef(new CompressedBuffer" should be CompressedBuffer::create (we allocate all RefCounted types through static "create" methods to make sure everyone calls adoptRef).

> Source/WebCore/loader/mac/CompressedBuffer.h:34
> +class CompressedBuffer : public RefCounted<CompressedBuffer> {

This does not appear to be Mac-specific.

> Source/WebCore/loader/mac/CompressedBuffer.h:44
> +    unsigned m_prevCompressedDataSize;

m_prevCompressedDataSize => m_previousCompressedDataSize <-- please use full words in variable names.
Comment 6 Eric Seidel (no email) 2011-06-13 14:36:04 PDT
Comment on attachment 96992 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=96992&action=review

> Source/WebCore/platform/SharedBuffer.h:119
> +    CFArrayRef dataArray() { return m_dataArray.get(); }

Should this be const?  (I honestly don't know, I didn't read the whole const thread.)
Comment 7 Pratik Solanki 2011-06-14 10:51:29 PDT
Thanks for all the feedback. I'll post an updated patch soon.
Comment 8 Pratik Solanki 2011-06-17 14:13:11 PDT
Created attachment 97651 [details]
Patch v2
Comment 9 Pratik Solanki 2011-06-17 14:14:25 PDT
I reworked the patch so its more like the way PurgeableBuffer works on the mac.
Comment 10 WebKit Review Bot 2011-06-17 14:35:24 PDT
Comment on attachment 97651 [details]
Patch v2

Attachment 97651 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8872661
Comment 11 Eric Seidel (no email) 2011-06-17 14:36:22 PDT
Comment on attachment 97651 [details]
Patch v2

View in context: https://bugs.webkit.org/attachment.cgi?id=97651&action=review

I'm not sure I fully understand the consequence of this change?  I assume this is a big memory win.  It's unclear to me from my first read-through how the underlying systems even use this data.

The point is that we're storing the compressed data on the CachedResource instead of the uncompressed?  Are we ever passing that data somewhere (it doesn't seem so).   For example, I could imagine us want to pass the compressed data through to the graphics system, depending on the compression.

When we're speaking of compression here, are we talking about un-decoded data? (we seem to be caching the text-decode result still).  Or JPEG compression? (I know that we cache the uncompressed frames on BitmapImage for example.  Or are we talking about things like gzip/zip compression?  If so, why wouldn't we just always compress all data before putting it in our cache? or at least compress it when asked to purge?

> Source/WebCore/ChangeLog:10
> +        Add support for storing the uncompressed data received from the network in our cache. This
> +        allows us to hold on to the uncompressed data when CachedResources are deemed dead (i.e.

Don't you mean compressed here?

> Source/WebCore/loader/cache/CachedFont.cpp:135
> +    RefPtr<SharedBuffer> data = CachedResource::data();

I'm confused why we use a RefPtr here, and don't just call data() in each place we used to use m_data?

> Source/WebCore/loader/cache/CachedResourceRequest.cpp:289
> +    return true;

Why is this a correct default?  Why do we even have this accessor if "true" is the default?

> Source/WebCore/loader/cache/CachedScript.cpp:86
> +    RefPtr<SharedBuffer> buffer = CachedResource::data();

Why always explicitly call the superclass::data()?

> Source/WebCore/loader/cache/CachedScript.cpp:104
> +    RefPtr<SharedBuffer> buffer = CachedResource::data();
> +    setEncodedSize(buffer.get() ? buffer->size() : 0);

Bleh.  This feels like we should have an updateEncodedSize() method which does this since we do this in multiple places.  This also causes uneeded ref-churn on the SharedBuffer.
Comment 12 Early Warning System Bot 2011-06-17 14:51:10 PDT
Comment on attachment 97651 [details]
Patch v2

Attachment 97651 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8877733
Comment 13 Pratik Solanki 2011-06-17 15:16:34 PDT
(In reply to comment #11)
> (From update of attachment 97651 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=97651&action=review
> 
> I'm not sure I fully understand the consequence of this change?  I assume this is a big memory win.  It's unclear to me from my first read-through how the underlying systems even use this data.
> 
> The point is that we're storing the compressed data on the CachedResource instead of the uncompressed?

Correct.

>  Are we ever passing that data somewhere (it doesn't seem so).   For example, I could imagine us want to pass the compressed data through to the graphics system, depending on the compression.

Maybe.. for now we're just using it internally in CachedResource.

> When we're speaking of compression here, are we talking about un-decoded data? (we seem to be caching the text-decode result still).  Or JPEG compression? (I know that we cache the uncompressed frames on BitmapImage for example.  Or are we talking about things like gzip/zip compression?

It's gzip compression. Essentially data that comes over the wire in compressed form (Content-Encoding:gzip).

>  If so, why wouldn't we just always compress all data before putting it in our cache? or at least compress it when asked to purge?

We could do that. That adds an extra compression step in WebCore. The advantage here is that the networking stack could just hand us the uncompressed data.

> > Source/WebCore/ChangeLog:10
> > +        Add support for storing the uncompressed data received from the network in our cache. This
> > +        allows us to hold on to the uncompressed data when CachedResources are deemed dead (i.e.
> 
> Don't you mean compressed here?

Yes I did.

> > Source/WebCore/loader/cache/CachedFont.cpp:135
> > +    RefPtr<SharedBuffer> data = CachedResource::data();
> 
> I'm confused why we use a RefPtr here, and don't just call data() in each place we used to use m_data?

I used a local variable since we were using it multiple times below.

> > Source/WebCore/loader/cache/CachedResourceRequest.cpp:289
> > +    return true;
> 
> Why is this a correct default?  Why do we even have this accessor if "true" is the default?

This is part of SubresourceLoaderClient interface.CachedResourceRequest is the only client that sets it to true. Default is false.

> > Source/WebCore/loader/cache/CachedScript.cpp:86
> > +    RefPtr<SharedBuffer> buffer = CachedResource::data();
> 
> Why always explicitly call the superclass::data()?

Hmm. I seem to recall doing that to disambiguate between methods but looking at the code now, I don't see any reason why it should be CachedResource::data(). 

> > Source/WebCore/loader/cache/CachedScript.cpp:104
> > +    RefPtr<SharedBuffer> buffer = CachedResource::data();
> > +    setEncodedSize(buffer.get() ? buffer->size() : 0);
> 
> Bleh.  This feels like we should have an updateEncodedSize() method which does this since we do this in multiple places.  This also causes uneeded ref-churn on the SharedBuffer.

Maybe I could just use SharedBuffer *buffer = data().get(); ?
Comment 14 Pratik Solanki 2011-06-17 15:18:27 PDT
(In reply to comment #12)
> (From update of attachment 97651 [details])
> Attachment 97651 [details] did not pass qt-ews (qt):
> Output: http://queues.webkit.org/results/8877733

I had some local changes that fixed the build errors but somehow lost that before uploading the patch. All non-mac EWS bots will fail to build.
Comment 15 Pratik Solanki 2011-06-17 15:52:28 PDT
(In reply to comment #13)

> > > Source/WebCore/loader/cache/CachedScript.cpp:86
> > > +    RefPtr<SharedBuffer> buffer = CachedResource::data();
> > 
> > Why always explicitly call the superclass::data()?
> 
> Hmm. I seem to recall doing that to disambiguate between methods but looking at the code now, I don't see any reason why it should be CachedResource::data(). 

So if I don't do that, the compiler gets confused since we also have 

     CachedResource::data(PassRefPtr<SharedBuffer> buffer, bool allDataReceived)

which gets overridden in subclasses. Compiler gives me an error saying

CachedCSSStyleSheet.cpp:79: error: no matching function for call to 'WebCore::CachedCSSStyleSheet::data() const'
CachedCSSStyleSheet.h:51: note: candidates are: virtual void WebCore::CachedCSSStyleSheet::data(WTF::PassRefPtr<WebCore::SharedBuffer>, bool)

I could fix that by making the new method be virtual in CachedResource.h and redeclaring it in each of the subclasses header file.  But it might be better just to do CachedResource::data()? or change the name to something else, perhaps.
Comment 16 WebKit Review Bot 2011-06-17 16:39:06 PDT
Comment on attachment 97651 [details]
Patch v2

Attachment 97651 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/8874711
Comment 17 Gustavo Noronha (kov) 2011-06-17 17:34:38 PDT
Comment on attachment 97651 [details]
Patch v2

Attachment 97651 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8873019
Comment 18 Pratik Solanki 2011-06-17 17:47:45 PDT
Created attachment 97673 [details]
Patch v3 - should fix compile issues on other platforms
Comment 19 James Simonsen 2011-06-20 13:13:47 PDT
Hi,

I'm curious why it's desirable to move compressed data into the MemoryCache. Why not just leave it in the network cache?

We already have a code path for decompressing data as it moves from the network layer to WebCore when it's downloaded. And I think most platforms have some sort of a cache in the network layer too, which can be used to cache the compressed data and decompress it as needed.

It seems like a nice division of responsibilities to keep compressed data in the network layer and only deal with decompressed data in WebCore. It also means we don't need to write code in both places to handle decompressing data.

I understand there are probably cases where something lives longer in the memory cache than in the network cache, but it seems better to just fix that instead of duplicating the responsibilities. For instance, we've talked here about having a way to ping back to the network cache on Memory Cache hits, so that the network cache can do a better job holding on to important resources.

If we make the network cache smarter, then we can hold fewer things in the MemoryCache and make it smaller, which seems to be the ultimate goal here. And we won't have to add the extra code paths to handle compressed resources in WebCore.
Comment 20 Pratik Solanki 2011-06-21 16:25:58 PDT
(In reply to comment #19)
> Hi,
> 
> I'm curious why it's desirable to move compressed data into the MemoryCache. Why not just leave it in the network cache?

It would reduce the memory taken up by dead resources in the cache.

> We already have a code path for decompressing data as it moves from the network layer to WebCore when it's downloaded. And I think most platforms have some sort of a cache in the network layer too, which can be used to cache the compressed data and decompress it as needed.
> 
> It seems like a nice division of responsibilities to keep compressed data in the network layer and only deal with decompressed data in WebCore. It also means we don't need to write code in both places to handle decompressing data.

By moving to WebCore, the decompressed data needs to be kept in memory for only as long as the resource is live. And the code isn't really in two places since we call a network layer function to decompress the data. It's just the management of when the decompression happens that is moved to WebCore.

> I understand there are probably cases where something lives longer in the memory cache than in the network cache, but it seems better to just fix that instead of duplicating the responsibilities.

I suppose that could happen because each cache has its own different eviction policy and cache sizes but that's not what I am trying to address.

> For instance, we've talked here about having a way to ping back to the network cache on Memory Cache hits, so that the network cache can do a better job holding on to important resources.
> 
> If we make the network cache smarter, then we can hold fewer things in the MemoryCache and make it smaller, which seems to be the ultimate goal here. And we won't have to add the extra code paths to handle compressed resources in WebCore.

By definition, if we have a cache, then we would likely have unused (or dead) resources in that cache. In that case, this should reduce the memory usage of the resources in cache.
Comment 21 James Simonsen 2011-06-21 17:48:07 PDT
(In reply to comment #20)
> (In reply to comment #19)
> > I'm curious why it's desirable to move compressed data into the MemoryCache. Why not just leave it in the network cache?
> 
> It would reduce the memory taken up by dead resources in the cache.

If this is your goal, then why not just evict all dead resources in the MemoryCache? You should be able to get the compressed versions back from the network cache.

> By moving to WebCore, the decompressed data needs to be kept in memory for only as long as the resource is live.

This is already how it works.

> > If we make the network cache smarter, then we can hold fewer things in the MemoryCache and make it smaller, which seems to be the ultimate goal here. And we won't have to add the extra code paths to handle compressed resources in WebCore.
> 
> By definition, if we have a cache, then we would likely have unused (or dead) resources in that cache. In that case, this should reduce the memory usage of the resources in cache.

But the resources are already cached (and compressed) in the network cache, which may also be in memory. In that case, you're using twice as much memory as you need to. To be fair, this is still an improvement over the current state.

I suspect the real issue here is that the MemoryCache has a better hit rate than the network cache. Or, possibly, going to the network cache is too slow. Whichever it is, it seems fixing that will be more valuable on the whole.
Comment 22 Pratik Solanki 2011-06-23 14:52:50 PDT
(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #19)
> > > I'm curious why it's desirable to move compressed data into the MemoryCache. Why not just leave it in the network cache?
> > 
> > It would reduce the memory taken up by dead resources in the cache.
> 
> If this is your goal, then why not just evict all dead resources in the MemoryCache? You should be able to get the compressed versions back from the network cache.

That's an interesting idea. But I think its better to use MemoryCache since we have a caching algorithm that is tailored for web content.

> > By moving to WebCore, the decompressed data needs to be kept in memory for only as long as the resource is live.
> 
> This is already how it works.

No. We don't have compressed and decompressed data. Since we only get decompressed data from the network, that's what we store in memory.

> > > If we make the network cache smarter, then we can hold fewer things in the MemoryCache and make it smaller, which seems to be the ultimate goal here. And we won't have to add the extra code paths to handle compressed resources in WebCore.
> > 
> > By definition, if we have a cache, then we would likely have unused (or dead) resources in that cache. In that case, this should reduce the memory usage of the resources in cache.
> 
> But the resources are already cached (and compressed) in the network cache, which may also be in memory. In that case, you're using twice as much memory as you need to. To be fair, this is still an improvement over the current state.

But the network cache won't really have a need to keep the uncompressed memory around. It can just hold onto the compressed bits.
Comment 23 Eric Seidel (no email) 2012-02-16 13:55:48 PST
Who should review this 9-month old patch?  Should it be r-'d?
Comment 24 Pratik Solanki 2012-02-21 11:02:03 PST
Comment on attachment 97673 [details]
Patch v3 - should fix compile issues on other platforms

I'm going to obsolete the patch since I am not working on it currently and I'm sure the patch won't apply cleanly anymore and would need to be reworked.