We can improve disk space efficiency and use less memory by sharing identical body data between cache entries.
Created attachment 250631 [details] patch
Created attachment 250632 [details] patch
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:37 > +#if ENABLE(NETWORK_CACHE) Usually between the main header include (#include "NetworkCacheBlobStorage.h") and the remaining includes (#include "Logging.h" ...) > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:43 > + : m_blobDirectoryPath(blobDirectoryPath.utf8()) Why are we storing the path as a CString when we always use it as a String? I am guessing this has to do with thread-safety somehow? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:60 > + unlink(filePath.data()); Why aren't we using WebCore::deleteFile()? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:85 > + unlink(linkPath.data()); Why aren't we using WebCore::deleteFile()? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:87 > + bool blobExists = access(blobPath.data(), F_OK) != -1; WebCore::fileExists() ? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:94 > + unlink(blobPath.data()); WebCore::deleteFile()? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:144 > + unlink(linkPath.data()); WebCore::deleteFile()? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155 > + return stat.st_nlink - 1; Why the - 1 ? Maybe add a comment for clarity. > Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:310 > +static Data encodeRecordHeader(const Storage::Record& record, SHA1::Digest bodyHash) Why are we copying the Digest? (it is an std::array<uint8_t, 20>).
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155 > + return stat.st_nlink - 1; Any danger of using more links to a single file than the file system supports? > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:35 > +#include <wtf/text/WTFString.h> Don’t need this include. > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:50 > + // These are all synchronous and should not be used from the main thread. This comment applies to the constructor of the class too! > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:66 > + const CString m_blobDirectoryPath; Don’t we have thread safety issues with the reference count of this data member? I don’t understand the threading restrictions on this class. Maybe the only function that is safe to call on another thread is approximateSize()? > Source/WebKit2/NetworkProcess/cache/NetworkCacheData.h:135 > +Data mapFile(const CString& path); Unusual style to take a const CString&; wouldn’t we normally take const char* instead? > Source/WebKit2/NetworkProcess/cache/NetworkCacheDataCocoa.mm:148 > + if (a.isNull() || b.isNull()) > + return false; So null != null? Is that really what we want?
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:121 > + return { }; Don't we need to munmap in this case?
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheDataCocoa.mm:133 > +SHA1::Digest computeSHA1(const Data& data) Digest is an std::array and therefore cannot cheaply be moved (it is O(n) as it moves each contained element). Maybe we want to use a reference argument for this one, like SHA1::computeHash() does?
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:610 > + // This cleans unreferences blobs. "unreferenced" ? > Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:642 > + effectiveWorth /= std::min(bodyShareCount, maximumEffectiveShareCount); Maybe I am misreading but shouldn't it be the opposite? The higher the bodyShareCount, the higher the effectiveWorth should be?
Comment on attachment 250632 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:107 > + void* map = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); We should make sure to drop PROT_WRITE once we're done writing the resource out to disk.
> > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:60 > > + unlink(filePath.data()); > > Why aren't we using WebCore::deleteFile()? I don't use the abstractions in this file because - Existing abstractions don't cover everything needed and I don't want to expand them for no practical benefit. For consistency I use system calls everywhere. - No one seems to be interested in making non-Posix port. - Here specifically I want to to unlink, not delete anything. deleteFile might do wrong thing on non-Posix platform. > Why are we copying the Digest? (it is an std::array<uint8_t, 20>). It is still cheap and it reads better. Compiler may optimize moves.
(In reply to comment #4) > Comment on attachment 250632 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=250632&action=review > > > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155 > > + return stat.st_nlink - 1; > > Any danger of using more links to a single file than the file system > supports? Not really. On Apple platforms the limit seems to be 32767. If that fills then the link call will fail. We won't create new cache entries for that resource but it shouldn't be a problem otherwise. > Don’t we have thread safety issues with the reference count of this data > member? I don’t understand the threading restrictions on this class. Maybe > the only function that is safe to call on another thread is > approximateSize()? It is ok as long as the ref count is not accessed from any other thread (and it isn't). Cache is currently a singleton so these s guaranteed to stay alive. But I'll switch to isolatedCopy() based scheme suggested by Anders which reads better.
> Maybe I am misreading but shouldn't it be the opposite? The higher the > bodyShareCount, the higher the effectiveWorth should be? Good catch, did some last minute changes here and got this one wrong.
> So null != null? Is that really what we want? Debatable. The function is called bytesEqual. Can they be equal if they don't exist? (we differentiate between null and empty here).
> Digest is an std::array and therefore cannot cheaply be moved (it is O(n) as > it moves each contained element). Maybe we want to use a reference argument > for this one, like SHA1::computeHash() does? std::array is a fixed size data structure. Copies of instances are constant time.
Created attachment 250706 [details] applied comments
Comment on attachment 250706 [details] applied comments Clearing flags on attachment: 250706 Committed r182803: <http://trac.webkit.org/changeset/182803>
All reviewed patches have been landed. Closing bug.