Bug 123333

Summary: [curl] Add file cache
Product: WebKit Reporter: Mátyás Mustoha <mmatyas>
Component: WebCore Misc.Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: bfulgham, chenzx, commit-queue, davidsz, galpeter, sipka, zsborbely.u-szeged
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 117300    
Attachments:
Description Flags
work in progress patch
none
work in progress patch 2
none
Proposed patch
bfulgham: review-
Proposed patch 2
bfulgham: review-
Proposed patch 3 none

Description Mátyás Mustoha 2013-10-25 05:27:13 PDT
This is a work-in-progress patch for implementing an on-disc file cache for the curl network backend.
Comment 1 Mátyás Mustoha 2013-10-25 05:30:15 PDT
Created attachment 215165 [details]
work in progress patch
Comment 2 Mátyás Mustoha 2013-11-05 08:44:48 PST
Created attachment 216042 [details]
work in progress patch 2

Updated patch
Comment 3 Chen Zhixiang 2013-11-05 20:05:37 PST
Hi, I've implemented a http discache module for WinCE port's default WinInet backend, and later migrated to cURL.

The impl. is based on Qt's source code, but later trimmed a lot.

The cache file format mainly contains 3 parts: a fixed-size header struct; a http response header, but can be skipped by header-length field of previous struct; the response body.

And it also uses Qt's tricks: such as storing cache file first into a temple file, and later renamed to a url-hashed file name. This may maximize the `ACID` property of low-level file operations.

Hope these information may help you!
Comment 4 Mátyás Mustoha 2013-11-06 04:47:14 PST
Created attachment 216164 [details]
Proposed patch
Comment 5 Brent Fulgham 2013-11-06 09:42:12 PST
Comment on attachment 216164 [details]
Proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=216164&action=review

A very nice start!  There seems to be a bit of excess copying going on, as well as a few coding standard violations in this patch.  Please revise and resubmit so we can push it forward.

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:47
> +CurlCacheEntry::CurlCacheEntry(String url, String cacheDir)

These should be passed by const ref to avoid copying.

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:87
> +    return &m_requestHeaders;

If the m_requestHeaders member always exists, why do we return as a pointer?  Why not return it by reference?

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:90
> +bool CurlCacheEntry::saveCachedData(const char *data, int size)

We write this as "const char* data"

It seems like size should be a "size_t" type.

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:92
> +    PlatformFileHandle contentFile = openFile(m_contentFilename, OpenForWrite);

It's too bad we don't have some kind of smart pointer for file operations (to ensure it is closed for any return state).

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:105
> +bool CurlCacheEntry::loadCachedData(ResourceHandle *job)

We write this as "ResourceHandle* job"

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:117
> +bool CurlCacheEntry::saveResponseHeaders(ResourceResponse &response)

We write this as "ResourceResponse& response".

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:142
> +// load the headers from cache file into memory

This comment isn't really needed.

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:149
> +    // create strings from buffer

This comment doesn't really add anything.

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:167
> +void CurlCacheEntry::setResponseFromCachedHeaders(ResourceResponse &response)

We write this as "ResourceResponse& response".

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:173
> +    HTTPHeaderMap cachedResponseHeaders = m_cachedResponse.httpHeaderFields();

This is making a copy; is that intentional?

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:207
> +void CurlCacheEntry::generateBaseFilename(CString url)

Cstring might be fairly big; shouldn't this be a const reference?

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:220
> +bool CurlCacheEntry::loadFileToBuffer(String filepath, Vector<char> &buffer)

Strings can be big; should probably pass by const reference.

We write this as "Vector<char>& buffer"

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:265
> +bool CurlCacheEntry::parseResponseHeaders(ResourceResponse &response)

We write this as "ResourceResponse& response"

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:271
> +    bool maxAgeIsValid = false;

In general, we prefer to define values as close to the point they are used as possible.  Several of these could be located further down in the method implementation.

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:338
> +

Extra space should be removed.

> Source/WebCore/platform/network/curl/CurlCacheEntry.h:42
> +private:

We put private stuff at the end of our header files, not the top.

> Source/WebCore/platform/network/curl/CurlCacheEntry.h:54
> +    bool loadFileToBuffer(String filepath, Vector<char> &buffer);

See my comments about using const references and proper syntax (e.g., use "Vector<char>& buffer", etc.)

> Source/WebCore/platform/network/curl/CurlCacheEntry.h:58
> +    CurlCacheEntry(String url, String cacheDir);

Ditto const references.

> Source/WebCore/platform/network/curl/CurlCacheEntry.h:62
> +    bool isInMemory() { return m_headerInMemory; }

This should be const.

> Source/WebCore/platform/network/curl/CurlCacheEntry.h:65
> +    bool saveCachedData(const char* data, int size);

It seems like size should be a "size_t" type.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:54
> +void CurlCacheManager::setCacheDirectory(String directory)

Pass by const reference.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:90
> +    long long filesize = -1;

You might be better off using one of the C99 size types, such as "int64_t" to represent a long-long.  What should this be for a 64-bit build, for example? Should it be a 128-bit value?

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:100
> +    int bufferReadSize = 4096;

The "4096" magic number should be declared as a constant somewhere and given a meaningful name.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:158
> +void CurlCacheManager::didReceiveResponse(ResourceHandle *job, ResourceResponse& response)

We write this as "ResourceHandle* job".

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:167
> +

Remove this blank line.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:182
> +

Remove this blank line.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:185
> +void CurlCacheManager::didFinishLoading(String url)

Pass by const reference.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:195
> +bool CurlCacheManager::isCached(String url)

Pass by const reference.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:210
> +HTTPHeaderMap* CurlCacheManager::requestHeaders(String url)

Pass by const reference.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:222
> +void CurlCacheManager::didReceiveData(String url, const char *data, int size)

Pass by const reference.

We write this as "const char* data"

"size" should probably be a size_t, unless you are using -1 as a sentinel value.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:233
> +void CurlCacheManager::saveResponseHeaders(String url, ResourceResponse& response)

Pass url as const reference.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:244
> +void CurlCacheManager::invalidateCacheEntry(String url)

Pass url as const reference.  Why copy a few hundred bytes if you don't need to?

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:256
> +void CurlCacheManager::didFail(String url)

Pass by const reference.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:261
> +void CurlCacheManager::loadCachedData(String url, ResourceHandle *job, ResourceResponse& response)

Pass by const reference.

We write this as "ResourceHandle* job"

> Source/WebCore/platform/network/curl/CurlCacheManager.h:40
> +private:

Private stuff goes at the bottom of the file.

> Source/WebCore/platform/network/curl/CurlCacheManager.h:43
> +    CurlCacheManager(CurlCacheManager const &);

We write this as "CurlCacheManager const&"

> Source/WebCore/platform/network/curl/CurlCacheManager.h:44
> +    void operator=(CurlCacheManager const &);

Ditto.

> Source/WebCore/platform/network/curl/CurlCacheManager.h:55
> +    void loadCachedData(String, ResourceHandle*, ResourceResponse&);

The above should be passing by const reference.

> Source/WebCore/platform/network/curl/CurlCacheManager.h:58
> +    static CurlCacheManager & getInstance()

This should be written as "CurlCacheManager& getInstance".

Move this implementation into the cpp file. You don't want this being emitted into every compilation unit, do you?

> Source/WebCore/platform/network/curl/CurlCacheManager.h:64
> +    String getCacheDirectory() { return m_cacheDir; }

Return as const reference. Make the method const.

> Source/WebCore/platform/network/curl/CurlCacheManager.h:65
> +    void setCacheDirectory(String);

Pass a const reference.

> Source/WebCore/platform/network/curl/CurlCacheManager.h:68
> +    HTTPHeaderMap* requestHeaders(String); // load headers

Pass the above two Strings as const references.

> Source/WebCore/platform/network/curl/CurlCacheManager.h:71
> +    void didReceiveData(String, const char*, int); // save data

Consider making this 'int' type a size_t.

Pass String by const reference.

> Source/WebCore/platform/network/curl/CurlCacheManager.h:73
> +    void didFinishLoading(String);

Pass the above strings as const references.

> Source/WebCore/platform/network/curl/ResourceHandleManager.cpp:953
> +            HTTPHeaderMap* requestHeaders = CurlCacheManager::getInstance().requestHeaders(url);

It seems like requestHeaders should be returned by reference.
Comment 6 Mátyás Mustoha 2013-11-08 05:34:09 PST
Created attachment 216378 [details]
Proposed patch 2

Thanks for the detailed review! I fixed the issues you've mentioned.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:90
> +    long long filesize = -1;

I've used "long long" because getFileSize() in FileSystem.h expects a long long variable as parameter:
bool getFileSize(const String&, long long& result);
Comment 7 Brent Fulgham 2013-11-08 09:51:25 PST
Comment on attachment 216378 [details]
Proposed patch 2

View in context: https://bugs.webkit.org/attachment.cgi?id=216378&action=review

This is looking really good!  I had a couple of more minor things, then I think we're ready to land this.  Thanks for doing this work!

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:105
> +bool CurlCacheEntry::loadCachedData(ResourceHandle *job)

ResourceHandle* please.

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:165
> +void CurlCacheEntry::setResponseFromCachedHeaders(ResourceResponse &response)

ResourceResponse& please.

> Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:326
> +

DOn't need this line.

> Source/WebCore/platform/network/curl/CurlCacheManager.cpp:158
> +        CString urlLatin1 = it->key.latin1();

This is making a copy.  I think you could use a const reference to avoid that.
Comment 8 Mátyás Mustoha 2013-11-09 03:54:52 PST
Created attachment 216491 [details]
Proposed patch 3
Comment 9 Brent Fulgham 2013-11-18 10:28:25 PST
Comment on attachment 216491 [details]
Proposed patch 3

r=me
Comment 10 WebKit Commit Bot 2013-11-18 10:32:54 PST
Comment on attachment 216491 [details]
Proposed patch 3

Clearing flags on attachment: 216491

Committed r159432: <http://trac.webkit.org/changeset/159432>
Comment 11 WebKit Commit Bot 2013-11-18 10:32:56 PST
All reviewed patches have been landed.  Closing bug.