Bug 143652 - Network Cache: Deduplicate body data
Summary: Network Cache: Deduplicate body data
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-12 14:40 PDT by Antti Koivisto
Modified: 2015-04-14 11:41 PDT (History)
4 users (show)

See Also:


Attachments
patch (56.64 KB, patch)
2015-04-13 01:53 PDT, Antti Koivisto
no flags Details | Formatted Diff | Diff
patch (56.64 KB, patch)
2015-04-13 02:25 PDT, Antti Koivisto
darin: review+
Details | Formatted Diff | Diff
applied comments (57.08 KB, patch)
2015-04-14 07:55 PDT, Antti Koivisto
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Antti Koivisto 2015-04-12 14:40:05 PDT
We can improve disk space efficiency and use less memory by sharing identical body data between cache entries.
Comment 1 Antti Koivisto 2015-04-13 01:53:13 PDT
Created attachment 250631 [details]
patch
Comment 2 Antti Koivisto 2015-04-13 02:25:01 PDT
Created attachment 250632 [details]
patch
Comment 3 Chris Dumez 2015-04-13 09:49:28 PDT
Comment on attachment 250632 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:37
> +#if ENABLE(NETWORK_CACHE)

Usually between the main header include (#include "NetworkCacheBlobStorage.h") and the remaining includes (#include "Logging.h" ...)

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:43
> +    : m_blobDirectoryPath(blobDirectoryPath.utf8())

Why are we storing the path as a CString when we always use it as a String? I am guessing this has to do with thread-safety somehow?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:60
> +            unlink(filePath.data());

Why aren't we using WebCore::deleteFile()?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:85
> +    unlink(linkPath.data());

Why aren't we using WebCore::deleteFile()?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:87
> +    bool blobExists = access(blobPath.data(), F_OK) != -1;

WebCore::fileExists() ?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:94
> +        unlink(blobPath.data());

WebCore::deleteFile()?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:144
> +    unlink(linkPath.data());

WebCore::deleteFile()?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155
> +    return stat.st_nlink - 1;

Why the - 1 ? Maybe add a comment for clarity.

> Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:310
> +static Data encodeRecordHeader(const Storage::Record& record, SHA1::Digest bodyHash)

Why are we copying the Digest? (it is an std::array<uint8_t, 20>).
Comment 4 Darin Adler 2015-04-13 09:52:59 PDT
Comment on attachment 250632 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155
> +    return stat.st_nlink - 1;

Any danger of using more links to a single file than the file system supports?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:35
> +#include <wtf/text/WTFString.h>

Don’t need this include.

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:50
> +    // These are all synchronous and should not be used from the main thread.

This comment applies to the constructor of the class too!

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h:66
> +    const CString m_blobDirectoryPath;

Don’t we have thread safety issues with the reference count of this data member? I don’t understand the threading restrictions on this class. Maybe the only function that is safe to call on another thread is approximateSize()?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheData.h:135
> +Data mapFile(const CString& path);

Unusual style to take a const CString&; wouldn’t we normally take const char* instead?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheDataCocoa.mm:148
> +    if (a.isNull() || b.isNull())
> +        return false;

So null != null? Is that really what we want?
Comment 5 Chris Dumez 2015-04-13 09:57:11 PDT
Comment on attachment 250632 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:121
> +        return { };

Don't we need to munmap in this case?
Comment 6 Chris Dumez 2015-04-13 10:06:57 PDT
Comment on attachment 250632 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review

> Source/WebKit2/NetworkProcess/cache/NetworkCacheDataCocoa.mm:133
> +SHA1::Digest computeSHA1(const Data& data)

Digest is an std::array and therefore cannot cheaply be moved (it is O(n) as it moves each contained element). Maybe we want to use a reference argument for this one, like SHA1::computeHash() does?
Comment 7 Chris Dumez 2015-04-13 10:18:47 PDT
Comment on attachment 250632 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review

> Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:610
> +        // This cleans unreferences blobs.

"unreferenced" ?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp:642
> +        effectiveWorth /= std::min(bodyShareCount, maximumEffectiveShareCount);

Maybe I am misreading but shouldn't it be the opposite? The higher the bodyShareCount, the higher the effectiveWorth should be?
Comment 8 Andreas Kling 2015-04-13 11:00:49 PDT
Comment on attachment 250632 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250632&action=review

> Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:107
> +    void* map = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

We should make sure to drop PROT_WRITE once we're done writing the resource out to disk.
Comment 9 Antti Koivisto 2015-04-14 05:11:17 PDT
> > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:60
> > +            unlink(filePath.data());
> 
> Why aren't we using WebCore::deleteFile()?

I don't use the abstractions in this file because

- Existing abstractions don't cover everything needed and I don't want to expand them for no practical benefit. For consistency I use system calls everywhere.
- No one seems to be interested in making non-Posix port.
- Here specifically I want to to unlink, not delete anything. deleteFile might do wrong thing on non-Posix platform.

> Why are we copying the Digest? (it is an std::array<uint8_t, 20>).

It is still cheap and it reads better. Compiler may optimize moves.
Comment 10 Antti Koivisto 2015-04-14 05:26:42 PDT
(In reply to comment #4)
> Comment on attachment 250632 [details]
> patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=250632&action=review
> 
> > Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp:155
> > +    return stat.st_nlink - 1;
> 
> Any danger of using more links to a single file than the file system
> supports?

Not really. On Apple platforms the limit seems to be 32767. If that fills then the link call will fail. We won't create new cache entries for that resource but it shouldn't be a problem otherwise.

> Don’t we have thread safety issues with the reference count of this data
> member? I don’t understand the threading restrictions on this class. Maybe
> the only function that is safe to call on another thread is
> approximateSize()?

It is ok as long as the ref count is not accessed from any other thread (and it isn't). Cache is currently a singleton so these s guaranteed to stay alive. But I'll switch to isolatedCopy() based scheme suggested by Anders which reads better.
Comment 11 Antti Koivisto 2015-04-14 05:37:10 PDT
> Maybe I am misreading but shouldn't it be the opposite? The higher the
> bodyShareCount, the higher the effectiveWorth should be?

Good catch, did some last minute changes here and got this one wrong.
Comment 12 Antti Koivisto 2015-04-14 05:52:10 PDT
> So null != null? Is that really what we want?

Debatable. The function is called bytesEqual. Can they be equal if they don't exist? (we differentiate between null and empty here).
Comment 13 Antti Koivisto 2015-04-14 05:54:09 PDT
> Digest is an std::array and therefore cannot cheaply be moved (it is O(n) as
> it moves each contained element). Maybe we want to use a reference argument
> for this one, like SHA1::computeHash() does?

std::array is a fixed size data structure. Copies of instances are constant time.
Comment 14 Antti Koivisto 2015-04-14 07:55:32 PDT
Created attachment 250706 [details]
applied comments
Comment 15 WebKit Commit Bot 2015-04-14 11:41:28 PDT
Comment on attachment 250706 [details]
applied comments

Clearing flags on attachment: 250706

Committed r182803: <http://trac.webkit.org/changeset/182803>
Comment 16 WebKit Commit Bot 2015-04-14 11:41:34 PDT
All reviewed patches have been landed.  Closing bug.