RESOLVED FIXED 143652
Network Cache: Deduplicate body data
https://bugs.webkit.org/show_bug.cgi?id=143652
Summary Network Cache: Deduplicate body data
Antti Koivisto
Reported 2015-04-12 14:40:05 PDT
We can improve disk space efficiency and use less memory by sharing identical body data between cache entries.
Attachments
patch (56.64 KB, patch)
2015-04-13 01:53 PDT, Antti Koivisto
no flags
patch (56.64 KB, patch)
2015-04-13 02:25 PDT, Antti Koivisto
darin: review+
applied comments (57.08 KB, patch)
2015-04-14 07:55 PDT, Antti Koivisto
no flags
Antti Koivisto
Comment 1 2015-04-13 01:53:13 PDT
Antti Koivisto
Comment 2 2015-04-13 02:25:01 PDT
Chris Dumez
Comment 3 2015-04-13 09:49:28 PDT
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:37 > +#if ENABLE(NETWORK_CACHE) Usually between the main header include (#include "NetworkCacheBlobStorage.h") and the remaining includes (#include "Logging.h" ...) > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:43 > + : m_blobDirectoryPath(blobDirectoryPath.utf8()) Why are we storing the path as a CString when we always use it as a String? I am guessing this has to do with thread-safety somehow? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:60 > + unlink(filePath.data()); Why aren't we using WebCore::deleteFile()? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:85 > + unlink(linkPath.data()); Why aren't we using WebCore::deleteFile()? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:87 > + bool blobExists = access(blobPath.data(), F_OK) != -1; WebCore::fileExists() ? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:94 > + unlink(blobPath.data()); WebCore::deleteFile()? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:144 > + unlink(linkPath.data()); WebCore::deleteFile()? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155 > + return stat.st_nlink - 1; Why the - 1 ? Maybe add a comment for clarity. > Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:310 > +static Data encodeRecordHeader(const Storage::Record& record, SHA1::Digest bodyHash) Why are we copying the Digest? (it is an std::array<uint8_t, 20>).
Darin Adler
Comment 4 2015-04-13 09:52:59 PDT
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155 > + return stat.st_nlink - 1; Any danger of using more links to a single file than the file system supports? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:35 > +#include <wtf/text/WTFString.h> Don’t need this include. > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:50 > + // These are all synchronous and should not be used from the main thread. This comment applies to the constructor of the class too! > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:66 > + const CString m_blobDirectoryPath; Don’t we have thread safety issues with the reference count of this data member? I don’t understand the threading restrictions on this class. Maybe the only function that is safe to call on another thread is approximateSize()? > Source/WebKit2/NetworkProcess/cache/NetworkCacheData.h:135 > +Data mapFile(const CString& path); Unusual style to take a const CString&; wouldn’t we normally take const char* instead? > Source/WebKit2/NetworkProcess/cache/NetworkCacheDataCocoa.mm:148 > + if (a.isNull() || b.isNull()) > + return false; So null != null? Is that really what we want?
Chris Dumez
Comment 5 2015-04-13 09:57:11 PDT
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:121 > + return { }; Don't we need to munmap in this case?
Chris Dumez
Comment 6 2015-04-13 10:06:57 PDT
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheDataCocoa.mm:133 > +SHA1::Digest computeSHA1(const Data& data) Digest is an std::array and therefore cannot cheaply be moved (it is O(n) as it moves each contained element). Maybe we want to use a reference argument for this one, like SHA1::computeHash() does?
Chris Dumez
Comment 7 2015-04-13 10:18:47 PDT
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:610 > + // This cleans unreferences blobs. "unreferenced" ? > Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:642 > + effectiveWorth /= std::min(bodyShareCount, maximumEffectiveShareCount); Maybe I am misreading but shouldn't it be the opposite? The higher the bodyShareCount, the higher the effectiveWorth should be?
Andreas Kling
Comment 8 2015-04-13 11:00:49 PDT
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:107 > + void* map = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); We should make sure to drop PROT_WRITE once we're done writing the resource out to disk.
Antti Koivisto
Comment 9 2015-04-14 05:11:17 PDT
> > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:60 > > + unlink(filePath.data()); > > Why aren't we using WebCore::deleteFile()? I don't use the abstractions in this file because - Existing abstractions don't cover everything needed and I don't want to expand them for no practical benefit. For consistency I use system calls everywhere. - No one seems to be interested in making non-Posix port. - Here specifically I want to to unlink, not delete anything. deleteFile might do wrong thing on non-Posix platform. > Why are we copying the Digest? (it is an std::array<uint8_t, 20>). It is still cheap and it reads better. Compiler may optimize moves.
Antti Koivisto
Comment 10 2015-04-14 05:26:42 PDT
(In reply to comment #4) > Comment on attachment 250632 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=250632&action=review > > > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155 > > + return stat.st_nlink - 1; > > Any danger of using more links to a single file than the file system > supports? Not really. On Apple platforms the limit seems to be 32767. If that fills then the link call will fail. We won't create new cache entries for that resource but it shouldn't be a problem otherwise. > Don’t we have thread safety issues with the reference count of this data > member? I don’t understand the threading restrictions on this class. Maybe > the only function that is safe to call on another thread is > approximateSize()? It is ok as long as the ref count is not accessed from any other thread (and it isn't). Cache is currently a singleton so these s guaranteed to stay alive. But I'll switch to isolatedCopy() based scheme suggested by Anders which reads better.
Antti Koivisto
Comment 11 2015-04-14 05:37:10 PDT
> Maybe I am misreading but shouldn't it be the opposite? The higher the > bodyShareCount, the higher the effectiveWorth should be? Good catch, did some last minute changes here and got this one wrong.
Antti Koivisto
Comment 12 2015-04-14 05:52:10 PDT
> So null != null? Is that really what we want? Debatable. The function is called bytesEqual. Can they be equal if they don't exist? (we differentiate between null and empty here).
Antti Koivisto
Comment 13 2015-04-14 05:54:09 PDT
> Digest is an std::array and therefore cannot cheaply be moved (it is O(n) as > it moves each contained element). Maybe we want to use a reference argument > for this one, like SHA1::computeHash() does? std::array is a fixed size data structure. Copies of instances are constant time.
Antti Koivisto
Comment 14 2015-04-14 07:55:32 PDT
Created attachment 250706 [details] applied comments
WebKit Commit Bot
Comment 15 2015-04-14 11:41:28 PDT
Comment on attachment 250706 [details] applied comments Clearing flags on attachment: 250706 Committed r182803: <http://trac.webkit.org/changeset/182803>
WebKit Commit Bot
Comment 16 2015-04-14 11:41:34 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.