WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
143652
Network Cache: Deduplicate body data
https://bugs.webkit.org/show_bug.cgi?id=143652
Summary
Network Cache: Deduplicate body data
Antti Koivisto
Reported
2015-04-12 14:40:05 PDT
We can improve disk space efficiency and use less memory by sharing identical body data between cache entries.
Attachments
patch
(56.64 KB, patch)
2015-04-13 01:53 PDT
,
Antti Koivisto
no flags
Details
Formatted Diff
Diff
patch
(56.64 KB, patch)
2015-04-13 02:25 PDT
,
Antti Koivisto
darin
: review+
Details
Formatted Diff
Diff
applied comments
(57.08 KB, patch)
2015-04-14 07:55 PDT
,
Antti Koivisto
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Antti Koivisto
Comment 1
2015-04-13 01:53:13 PDT
Created
attachment 250631
[details]
patch
Antti Koivisto
Comment 2
2015-04-13 02:25:01 PDT
Created
attachment 250632
[details]
patch
Chris Dumez
Comment 3
2015-04-13 09:49:28 PDT
Comment on
attachment 250632
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=250632&action=review
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:37 > +#if ENABLE(NETWORK_CACHE)
Usually between the main header include (#include "NetworkCacheBlobStorage.h") and the remaining includes (#include "Logging.h" ...)
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:43 > + : m_blobDirectoryPath(blobDirectoryPath.utf8())
Why are we storing the path as a CString when we always use it as a String? I am guessing this has to do with thread-safety somehow?
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:60 > + unlink(filePath.data());
Why aren't we using WebCore::deleteFile()?
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:85 > + unlink(linkPath.data());
Why aren't we using WebCore::deleteFile()?
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:87 > + bool blobExists = access(blobPath.data(), F_OK) != -1;
WebCore::fileExists() ?
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:94 > + unlink(blobPath.data());
WebCore::deleteFile()?
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:144 > + unlink(linkPath.data());
WebCore::deleteFile()?
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155 > + return stat.st_nlink - 1;
Why the - 1 ? Maybe add a comment for clarity.
> Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:310 > +static Data encodeRecordHeader(const Storage::Record& record, SHA1::Digest bodyHash)
Why are we copying the Digest? (it is an std::array<uint8_t, 20>).
Darin Adler
Comment 4
2015-04-13 09:52:59 PDT
Comment on
attachment 250632
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=250632&action=review
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155 > + return stat.st_nlink - 1;
Any danger of using more links to a single file than the file system supports?
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:35 > +#include <wtf/text/WTFString.h>
Don’t need this include.
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:50 > + // These are all synchronous and should not be used from the main thread.
This comment applies to the constructor of the class too!
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:66 > + const CString m_blobDirectoryPath;
Don’t we have thread safety issues with the reference count of this data member? I don’t understand the threading restrictions on this class. Maybe the only function that is safe to call on another thread is approximateSize()?
> Source/WebKit2/NetworkProcess/cache/NetworkCacheData.h:135 > +Data mapFile(const CString& path);
Unusual style to take a const CString&; wouldn’t we normally take const char* instead?
> Source/WebKit2/NetworkProcess/cache/NetworkCacheDataCocoa.mm:148 > + if (a.isNull() || b.isNull()) > + return false;
So null != null? Is that really what we want?
Chris Dumez
Comment 5
2015-04-13 09:57:11 PDT
Comment on
attachment 250632
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=250632&action=review
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:121 > + return { };
Don't we need to munmap in this case?
Chris Dumez
Comment 6
2015-04-13 10:06:57 PDT
Comment on
attachment 250632
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=250632&action=review
> Source/WebKit2/NetworkProcess/cache/NetworkCacheDataCocoa.mm:133 > +SHA1::Digest computeSHA1(const Data& data)
Digest is an std::array and therefore cannot cheaply be moved (it is O(n) as it moves each contained element). Maybe we want to use a reference argument for this one, like SHA1::computeHash() does?
Chris Dumez
Comment 7
2015-04-13 10:18:47 PDT
Comment on
attachment 250632
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=250632&action=review
> Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:610 > + // This cleans unreferences blobs.
"unreferenced" ?
> Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:642 > + effectiveWorth /= std::min(bodyShareCount, maximumEffectiveShareCount);
Maybe I am misreading but shouldn't it be the opposite? The higher the bodyShareCount, the higher the effectiveWorth should be?
Andreas Kling
Comment 8
2015-04-13 11:00:49 PDT
Comment on
attachment 250632
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=250632&action=review
> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:107 > + void* map = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
We should make sure to drop PROT_WRITE once we're done writing the resource out to disk.
Antti Koivisto
Comment 9
2015-04-14 05:11:17 PDT
> > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:60 > > + unlink(filePath.data()); > > Why aren't we using WebCore::deleteFile()?
I don't use the abstractions in this file because - Existing abstractions don't cover everything needed and I don't want to expand them for no practical benefit. For consistency I use system calls everywhere. - No one seems to be interested in making non-Posix port. - Here specifically I want to to unlink, not delete anything. deleteFile might do wrong thing on non-Posix platform.
> Why are we copying the Digest? (it is an std::array<uint8_t, 20>).
It is still cheap and it reads better. Compiler may optimize moves.
Antti Koivisto
Comment 10
2015-04-14 05:26:42 PDT
(In reply to
comment #4
)
> Comment on
attachment 250632
[details]
> patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=250632&action=review
> > > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155 > > + return stat.st_nlink - 1; > > Any danger of using more links to a single file than the file system > supports?
Not really. On Apple platforms the limit seems to be 32767. If that fills then the link call will fail. We won't create new cache entries for that resource but it shouldn't be a problem otherwise.
> Don’t we have thread safety issues with the reference count of this data > member? I don’t understand the threading restrictions on this class. Maybe > the only function that is safe to call on another thread is > approximateSize()?
It is ok as long as the ref count is not accessed from any other thread (and it isn't). Cache is currently a singleton so these s guaranteed to stay alive. But I'll switch to isolatedCopy() based scheme suggested by Anders which reads better.
Antti Koivisto
Comment 11
2015-04-14 05:37:10 PDT
> Maybe I am misreading but shouldn't it be the opposite? The higher the > bodyShareCount, the higher the effectiveWorth should be?
Good catch, did some last minute changes here and got this one wrong.
Antti Koivisto
Comment 12
2015-04-14 05:52:10 PDT
> So null != null? Is that really what we want?
Debatable. The function is called bytesEqual. Can they be equal if they don't exist? (we differentiate between null and empty here).
Antti Koivisto
Comment 13
2015-04-14 05:54:09 PDT
> Digest is an std::array and therefore cannot cheaply be moved (it is O(n) as > it moves each contained element). Maybe we want to use a reference argument > for this one, like SHA1::computeHash() does?
std::array is a fixed size data structure. Copies of instances are constant time.
Antti Koivisto
Comment 14
2015-04-14 07:55:32 PDT
Created
attachment 250706
[details]
applied comments
WebKit Commit Bot
Comment 15
2015-04-14 11:41:28 PDT
Comment on
attachment 250706
[details]
applied comments Clearing flags on attachment: 250706 Committed
r182803
: <
http://trac.webkit.org/changeset/182803
>
WebKit Commit Bot
Comment 16
2015-04-14 11:41:34 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug