62581 – WebCore should cache compressed data

NEW 62581

WebCore should cache compressed data

https://bugs.webkit.org/show_bug.cgi?id=62581

Summary WebCore should cache compressed data

Pratik Solanki

Reported 2011-06-13 11:28:03 PDT

If CFNetwork supports it, WebCore should cache the compressed data returned by servers and handle data decompression itself. That way we can cache just the compressed data and reduce memory usage.

Attachments
Patch (789.71 KB, patch) 2011-06-13 12:49 PDT, Pratik Solanki	no flags	Details Formatted Diff Diff
Patch v2 (829.80 KB, patch) 2011-06-17 14:13 PDT, Pratik Solanki	no flags	Details Formatted Diff Diff
Patch v3 - should fix compile issues on other platforms (829.51 KB, patch) 2011-06-17 17:47 PDT, Pratik Solanki	no flags	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Pratik Solanki

Comment 1 2011-06-13 11:28:18 PDT

<rdar://problem/8989320>

Pratik Solanki

Comment 2 2011-06-13 12:49:40 PDT

Created attachment 96992 [details] Patch

WebKit Review Bot

Comment 3 2011-06-13 13:05:19 PDT

Comment on attachment 96992 [details] Patch Attachment 96992 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/8831885

Alexey Proskuryakov

Comment 4 2011-06-13 13:48:45 PDT

Comment on attachment 96992 [details] Patch + No new tests because the option is disabled by default. We now have a policy that new features should be discussed on webkit-dev first, and a new preprocessor option qualifies as a new feature. +#if USE(COMPRESSED_DATA_FROM_CFNETWORK) +- (CFURLRequestRef) _CFURLRequest; +#endif I'm not sure if you need to protect that much code with ifdefs. For example, nothing will break if you add this details declaration unconditionally. + if (client()->shouldUseCompressedData()) + wkRequestDoNotDecodeData([nsRequest _CFURLRequest]); This sounds more like "wants" or "canUse" than "should".

Adam Barth

Comment 5 2011-06-13 13:58:32 PDT

Comment on attachment 96992 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=96992&action=review > Source/WebCore/loader/ResourceLoader.h:114 > +#if USE(COMPRESSED_DATA_FROM_CFNETWORK) COMPRESSED_DATA_FROM_CFNETWORK => COMPRESSED_DATA_FROM_NETWORK ? It seems like you could use this feature with all the various backends. The CFNETWORK specific parts should be behind USE(CFNETWORK) or whatever. > Source/WebCore/loader/cache/CachedResource.cpp:647 > + m_compressedData = adoptRef(new CompressedBuffer(this, m_data.get())); "adoptRef(new CompressedBuffer" should be CompressedBuffer::create (we allocate all RefCounted types through static "create" methods to make sure everyone calls adoptRef). > Source/WebCore/loader/mac/CompressedBuffer.h:34 > +class CompressedBuffer : public RefCounted<CompressedBuffer> { This does not appear to be Mac-specific. > Source/WebCore/loader/mac/CompressedBuffer.h:44 > + unsigned m_prevCompressedDataSize; m_prevCompressedDataSize => m_previousCompressedDataSize <-- please use full words in variable names.

Eric Seidel (no email)

Comment 6 2011-06-13 14:36:04 PDT

Comment on attachment 96992 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=96992&action=review > Source/WebCore/platform/SharedBuffer.h:119 > + CFArrayRef dataArray() { return m_dataArray.get(); } Should this be const? (I honestly don't know, I didn't read the whole const thread.)

Pratik Solanki

Comment 7 2011-06-14 10:51:29 PDT

Thanks for all the feedback. I'll post an updated patch soon.

Pratik Solanki

Comment 8 2011-06-17 14:13:11 PDT

Created attachment 97651 [details] Patch v2

Pratik Solanki

Comment 9 2011-06-17 14:14:25 PDT

I reworked the patch so its more like the way PurgeableBuffer works on the mac.

WebKit Review Bot

Comment 10 2011-06-17 14:35:24 PDT

Comment on attachment 97651 [details] Patch v2 Attachment 97651 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/8872661

Eric Seidel (no email)

Comment 11 2011-06-17 14:36:22 PDT

Comment on attachment 97651 [details] Patch v2 View in context: https://bugs.webkit.org/attachment.cgi?id=97651&action=review I'm not sure I fully understand the consequence of this change? I assume this is a big memory win. It's unclear to me from my first read-through how the underlying systems even use this data. The point is that we're storing the compressed data on the CachedResource instead of the uncompressed? Are we ever passing that data somewhere (it doesn't seem so). For example, I could imagine us want to pass the compressed data through to the graphics system, depending on the compression. When we're speaking of compression here, are we talking about un-decoded data? (we seem to be caching the text-decode result still). Or JPEG compression? (I know that we cache the uncompressed frames on BitmapImage for example. Or are we talking about things like gzip/zip compression? If so, why wouldn't we just always compress all data before putting it in our cache? or at least compress it when asked to purge? > Source/WebCore/ChangeLog:10 > + Add support for storing the uncompressed data received from the network in our cache. This > + allows us to hold on to the uncompressed data when CachedResources are deemed dead (i.e. Don't you mean compressed here? > Source/WebCore/loader/cache/CachedFont.cpp:135 > + RefPtr<SharedBuffer> data = CachedResource::data(); I'm confused why we use a RefPtr here, and don't just call data() in each place we used to use m_data? > Source/WebCore/loader/cache/CachedResourceRequest.cpp:289 > + return true; Why is this a correct default? Why do we even have this accessor if "true" is the default? > Source/WebCore/loader/cache/CachedScript.cpp:86 > + RefPtr<SharedBuffer> buffer = CachedResource::data(); Why always explicitly call the superclass::data()? > Source/WebCore/loader/cache/CachedScript.cpp:104 > + RefPtr<SharedBuffer> buffer = CachedResource::data(); > + setEncodedSize(buffer.get() ? buffer->size() : 0); Bleh. This feels like we should have an updateEncodedSize() method which does this since we do this in multiple places. This also causes uneeded ref-churn on the SharedBuffer.

Early Warning System Bot

Comment 12 2011-06-17 14:51:10 PDT

Comment on attachment 97651 [details] Patch v2 Attachment 97651 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/8877733

Pratik Solanki

Comment 13 2011-06-17 15:16:34 PDT

(In reply to comment #11) > (From update of attachment 97651 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=97651&action=review > > I'm not sure I fully understand the consequence of this change? I assume this is a big memory win. It's unclear to me from my first read-through how the underlying systems even use this data. > > The point is that we're storing the compressed data on the CachedResource instead of the uncompressed? Correct. > Are we ever passing that data somewhere (it doesn't seem so). For example, I could imagine us want to pass the compressed data through to the graphics system, depending on the compression. Maybe.. for now we're just using it internally in CachedResource. > When we're speaking of compression here, are we talking about un-decoded data? (we seem to be caching the text-decode result still). Or JPEG compression? (I know that we cache the uncompressed frames on BitmapImage for example. Or are we talking about things like gzip/zip compression? It's gzip compression. Essentially data that comes over the wire in compressed form (Content-Encoding:gzip). > If so, why wouldn't we just always compress all data before putting it in our cache? or at least compress it when asked to purge? We could do that. That adds an extra compression step in WebCore. The advantage here is that the networking stack could just hand us the uncompressed data. > > Source/WebCore/ChangeLog:10 > > + Add support for storing the uncompressed data received from the network in our cache. This > > + allows us to hold on to the uncompressed data when CachedResources are deemed dead (i.e. > > Don't you mean compressed here? Yes I did. > > Source/WebCore/loader/cache/CachedFont.cpp:135 > > + RefPtr<SharedBuffer> data = CachedResource::data(); > > I'm confused why we use a RefPtr here, and don't just call data() in each place we used to use m_data? I used a local variable since we were using it multiple times below. > > Source/WebCore/loader/cache/CachedResourceRequest.cpp:289 > > + return true; > > Why is this a correct default? Why do we even have this accessor if "true" is the default? This is part of SubresourceLoaderClient interface.CachedResourceRequest is the only client that sets it to true. Default is false. > > Source/WebCore/loader/cache/CachedScript.cpp:86 > > + RefPtr<SharedBuffer> buffer = CachedResource::data(); > > Why always explicitly call the superclass::data()? Hmm. I seem to recall doing that to disambiguate between methods but looking at the code now, I don't see any reason why it should be CachedResource::data(). > > Source/WebCore/loader/cache/CachedScript.cpp:104 > > + RefPtr<SharedBuffer> buffer = CachedResource::data(); > > + setEncodedSize(buffer.get() ? buffer->size() : 0); > > Bleh. This feels like we should have an updateEncodedSize() method which does this since we do this in multiple places. This also causes uneeded ref-churn on the SharedBuffer. Maybe I could just use SharedBuffer *buffer = data().get(); ?

Pratik Solanki

Comment 14 2011-06-17 15:18:27 PDT

(In reply to comment #12) > (From update of attachment 97651 [details]) > Attachment 97651 [details] did not pass qt-ews (qt): > Output: http://queues.webkit.org/results/8877733 I had some local changes that fixed the build errors but somehow lost that before uploading the patch. All non-mac EWS bots will fail to build.

Pratik Solanki

Comment 15 2011-06-17 15:52:28 PDT

(In reply to comment #13) > > > Source/WebCore/loader/cache/CachedScript.cpp:86 > > > + RefPtr<SharedBuffer> buffer = CachedResource::data(); > > > > Why always explicitly call the superclass::data()? > > Hmm. I seem to recall doing that to disambiguate between methods but looking at the code now, I don't see any reason why it should be CachedResource::data(). So if I don't do that, the compiler gets confused since we also have CachedResource::data(PassRefPtr<SharedBuffer> buffer, bool allDataReceived) which gets overridden in subclasses. Compiler gives me an error saying CachedCSSStyleSheet.cpp:79: error: no matching function for call to 'WebCore::CachedCSSStyleSheet::data() const' CachedCSSStyleSheet.h:51: note: candidates are: virtual void WebCore::CachedCSSStyleSheet::data(WTF::PassRefPtr<WebCore::SharedBuffer>, bool) I could fix that by making the new method be virtual in CachedResource.h and redeclaring it in each of the subclasses header file. But it might be better just to do CachedResource::data()? or change the name to something else, perhaps.

WebKit Review Bot

Comment 16 2011-06-17 16:39:06 PDT

Comment on attachment 97651 [details] Patch v2 Attachment 97651 [details] did not pass cr-mac-ews (chromium): Output: http://queues.webkit.org/results/8874711

Gustavo Noronha (kov)

Comment 17 2011-06-17 17:34:38 PDT

Comment on attachment 97651 [details] Patch v2 Attachment 97651 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/8873019

Pratik Solanki

Comment 18 2011-06-17 17:47:45 PDT

Created attachment 97673 [details] Patch v3 - should fix compile issues on other platforms

James Simonsen

Comment 19 2011-06-20 13:13:47 PDT

Hi, I'm curious why it's desirable to move compressed data into the MemoryCache. Why not just leave it in the network cache? We already have a code path for decompressing data as it moves from the network layer to WebCore when it's downloaded. And I think most platforms have some sort of a cache in the network layer too, which can be used to cache the compressed data and decompress it as needed. It seems like a nice division of responsibilities to keep compressed data in the network layer and only deal with decompressed data in WebCore. It also means we don't need to write code in both places to handle decompressing data. I understand there are probably cases where something lives longer in the memory cache than in the network cache, but it seems better to just fix that instead of duplicating the responsibilities. For instance, we've talked here about having a way to ping back to the network cache on Memory Cache hits, so that the network cache can do a better job holding on to important resources. If we make the network cache smarter, then we can hold fewer things in the MemoryCache and make it smaller, which seems to be the ultimate goal here. And we won't have to add the extra code paths to handle compressed resources in WebCore.

Pratik Solanki

Comment 20 2011-06-21 16:25:58 PDT

(In reply to comment #19) > Hi, > > I'm curious why it's desirable to move compressed data into the MemoryCache. Why not just leave it in the network cache? It would reduce the memory taken up by dead resources in the cache. > We already have a code path for decompressing data as it moves from the network layer to WebCore when it's downloaded. And I think most platforms have some sort of a cache in the network layer too, which can be used to cache the compressed data and decompress it as needed. > > It seems like a nice division of responsibilities to keep compressed data in the network layer and only deal with decompressed data in WebCore. It also means we don't need to write code in both places to handle decompressing data. By moving to WebCore, the decompressed data needs to be kept in memory for only as long as the resource is live. And the code isn't really in two places since we call a network layer function to decompress the data. It's just the management of when the decompression happens that is moved to WebCore. > I understand there are probably cases where something lives longer in the memory cache than in the network cache, but it seems better to just fix that instead of duplicating the responsibilities. I suppose that could happen because each cache has its own different eviction policy and cache sizes but that's not what I am trying to address. > For instance, we've talked here about having a way to ping back to the network cache on Memory Cache hits, so that the network cache can do a better job holding on to important resources. > > If we make the network cache smarter, then we can hold fewer things in the MemoryCache and make it smaller, which seems to be the ultimate goal here. And we won't have to add the extra code paths to handle compressed resources in WebCore. By definition, if we have a cache, then we would likely have unused (or dead) resources in that cache. In that case, this should reduce the memory usage of the resources in cache.

James Simonsen

Comment 21 2011-06-21 17:48:07 PDT

(In reply to comment #20) > (In reply to comment #19) > > I'm curious why it's desirable to move compressed data into the MemoryCache. Why not just leave it in the network cache? > > It would reduce the memory taken up by dead resources in the cache. If this is your goal, then why not just evict all dead resources in the MemoryCache? You should be able to get the compressed versions back from the network cache. > By moving to WebCore, the decompressed data needs to be kept in memory for only as long as the resource is live. This is already how it works. > > If we make the network cache smarter, then we can hold fewer things in the MemoryCache and make it smaller, which seems to be the ultimate goal here. And we won't have to add the extra code paths to handle compressed resources in WebCore. > > By definition, if we have a cache, then we would likely have unused (or dead) resources in that cache. In that case, this should reduce the memory usage of the resources in cache. But the resources are already cached (and compressed) in the network cache, which may also be in memory. In that case, you're using twice as much memory as you need to. To be fair, this is still an improvement over the current state. I suspect the real issue here is that the MemoryCache has a better hit rate than the network cache. Or, possibly, going to the network cache is too slow. Whichever it is, it seems fixing that will be more valuable on the whole.

Pratik Solanki

Comment 22 2011-06-23 14:52:50 PDT

(In reply to comment #21) > (In reply to comment #20) > > (In reply to comment #19) > > > I'm curious why it's desirable to move compressed data into the MemoryCache. Why not just leave it in the network cache? > > > > It would reduce the memory taken up by dead resources in the cache. > > If this is your goal, then why not just evict all dead resources in the MemoryCache? You should be able to get the compressed versions back from the network cache. That's an interesting idea. But I think its better to use MemoryCache since we have a caching algorithm that is tailored for web content. > > By moving to WebCore, the decompressed data needs to be kept in memory for only as long as the resource is live. > > This is already how it works. No. We don't have compressed and decompressed data. Since we only get decompressed data from the network, that's what we store in memory. > > > If we make the network cache smarter, then we can hold fewer things in the MemoryCache and make it smaller, which seems to be the ultimate goal here. And we won't have to add the extra code paths to handle compressed resources in WebCore. > > > > By definition, if we have a cache, then we would likely have unused (or dead) resources in that cache. In that case, this should reduce the memory usage of the resources in cache. > > But the resources are already cached (and compressed) in the network cache, which may also be in memory. In that case, you're using twice as much memory as you need to. To be fair, this is still an improvement over the current state. But the network cache won't really have a need to keep the uncompressed memory around. It can just hold onto the compressed bits.

Eric Seidel (no email)

Comment 23 2012-02-16 13:55:48 PST

Who should review this 9-month old patch? Should it be r-'d?

Pratik Solanki

Comment 24 2012-02-21 11:02:03 PST

Comment on attachment 97673 [details] Patch v3 - should fix compile issues on other platforms I'm going to obsolete the patch since I am not working on it currently and I'm sure the patch won't apply cleanly anymore and would need to be reworked.

Note You need to log in before you can comment on or make changes to this bug.

Status NEW

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Pratik Solanki

Reported

2011-06-13 11:28 PDT

Modified

2012-02-21 11:02 PST History

CC List

11 users Show

URL

Keywords InRadar

Depends on

Blocks