Bug 123333

Summary: [curl] Add file cache
Product: WebKit Reporter: Mátyás Mustoha <mmatyas>
Component: WebCore Misc.Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: bfulgham, chenzx, commit-queue, davidsz, galpeter, sipka, zsborbely.u-szeged
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 117300    
Attachments:
Description Flags
work in progress patch
none
work in progress patch 2
none
Proposed patch
bfulgham: review-
Proposed patch 2
bfulgham: review-
Proposed patch 3 none

Mátyás Mustoha
Reported 2013-10-25 05:27:13 PDT
This is a work-in-progress patch for implementing an on-disc file cache for the curl network backend.
Attachments
work in progress patch (32.98 KB, patch)
2013-10-25 05:30 PDT, Mátyás Mustoha
no flags
work in progress patch 2 (38.82 KB, patch)
2013-11-05 08:44 PST, Mátyás Mustoha
no flags
Proposed patch (36.68 KB, patch)
2013-11-06 04:47 PST, Mátyás Mustoha
bfulgham: review-
Proposed patch 2 (36.70 KB, patch)
2013-11-08 05:34 PST, Mátyás Mustoha
bfulgham: review-
Proposed patch 3 (36.70 KB, patch)
2013-11-09 03:54 PST, Mátyás Mustoha
no flags
Mátyás Mustoha
Comment 1 2013-10-25 05:30:15 PDT
Created attachment 215165 [details] work in progress patch
Mátyás Mustoha
Comment 2 2013-11-05 08:44:48 PST
Created attachment 216042 [details] work in progress patch 2 Updated patch
Chen Zhixiang
Comment 3 2013-11-05 20:05:37 PST
Hi, I've implemented a http discache module for WinCE port's default WinInet backend, and later migrated to cURL. The impl. is based on Qt's source code, but later trimmed a lot. The cache file format mainly contains 3 parts: a fixed-size header struct; a http response header, but can be skipped by header-length field of previous struct; the response body. And it also uses Qt's tricks: such as storing cache file first into a temple file, and later renamed to a url-hashed file name. This may maximize the `ACID` property of low-level file operations. Hope these information may help you!
Mátyás Mustoha
Comment 4 2013-11-06 04:47:14 PST
Created attachment 216164 [details] Proposed patch
Brent Fulgham
Comment 5 2013-11-06 09:42:12 PST
Comment on attachment 216164 [details] Proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=216164&action=review A very nice start! There seems to be a bit of excess copying going on, as well as a few coding standard violations in this patch. Please revise and resubmit so we can push it forward. > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:47 > +CurlCacheEntry::CurlCacheEntry(String url, String cacheDir) These should be passed by const ref to avoid copying. > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:87 > + return &m_requestHeaders; If the m_requestHeaders member always exists, why do we return as a pointer? Why not return it by reference? > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:90 > +bool CurlCacheEntry::saveCachedData(const char *data, int size) We write this as "const char* data" It seems like size should be a "size_t" type. > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:92 > + PlatformFileHandle contentFile = openFile(m_contentFilename, OpenForWrite); It's too bad we don't have some kind of smart pointer for file operations (to ensure it is closed for any return state). > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:105 > +bool CurlCacheEntry::loadCachedData(ResourceHandle *job) We write this as "ResourceHandle* job" > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:117 > +bool CurlCacheEntry::saveResponseHeaders(ResourceResponse &response) We write this as "ResourceResponse& response". > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:142 > +// load the headers from cache file into memory This comment isn't really needed. > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:149 > + // create strings from buffer This comment doesn't really add anything. > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:167 > +void CurlCacheEntry::setResponseFromCachedHeaders(ResourceResponse &response) We write this as "ResourceResponse& response". > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:173 > + HTTPHeaderMap cachedResponseHeaders = m_cachedResponse.httpHeaderFields(); This is making a copy; is that intentional? > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:207 > +void CurlCacheEntry::generateBaseFilename(CString url) Cstring might be fairly big; shouldn't this be a const reference? > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:220 > +bool CurlCacheEntry::loadFileToBuffer(String filepath, Vector<char> &buffer) Strings can be big; should probably pass by const reference. We write this as "Vector<char>& buffer" > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:265 > +bool CurlCacheEntry::parseResponseHeaders(ResourceResponse &response) We write this as "ResourceResponse& response" > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:271 > + bool maxAgeIsValid = false; In general, we prefer to define values as close to the point they are used as possible. Several of these could be located further down in the method implementation. > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:338 > + Extra space should be removed. > Source/WebCore/platform/network/curl/CurlCacheEntry.h:42 > +private: We put private stuff at the end of our header files, not the top. > Source/WebCore/platform/network/curl/CurlCacheEntry.h:54 > + bool loadFileToBuffer(String filepath, Vector<char> &buffer); See my comments about using const references and proper syntax (e.g., use "Vector<char>& buffer", etc.) > Source/WebCore/platform/network/curl/CurlCacheEntry.h:58 > + CurlCacheEntry(String url, String cacheDir); Ditto const references. > Source/WebCore/platform/network/curl/CurlCacheEntry.h:62 > + bool isInMemory() { return m_headerInMemory; } This should be const. > Source/WebCore/platform/network/curl/CurlCacheEntry.h:65 > + bool saveCachedData(const char* data, int size); It seems like size should be a "size_t" type. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:54 > +void CurlCacheManager::setCacheDirectory(String directory) Pass by const reference. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:90 > + long long filesize = -1; You might be better off using one of the C99 size types, such as "int64_t" to represent a long-long. What should this be for a 64-bit build, for example? Should it be a 128-bit value? > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:100 > + int bufferReadSize = 4096; The "4096" magic number should be declared as a constant somewhere and given a meaningful name. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:158 > +void CurlCacheManager::didReceiveResponse(ResourceHandle *job, ResourceResponse& response) We write this as "ResourceHandle* job". > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:167 > + Remove this blank line. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:182 > + Remove this blank line. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:185 > +void CurlCacheManager::didFinishLoading(String url) Pass by const reference. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:195 > +bool CurlCacheManager::isCached(String url) Pass by const reference. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:210 > +HTTPHeaderMap* CurlCacheManager::requestHeaders(String url) Pass by const reference. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:222 > +void CurlCacheManager::didReceiveData(String url, const char *data, int size) Pass by const reference. We write this as "const char* data" "size" should probably be a size_t, unless you are using -1 as a sentinel value. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:233 > +void CurlCacheManager::saveResponseHeaders(String url, ResourceResponse& response) Pass url as const reference. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:244 > +void CurlCacheManager::invalidateCacheEntry(String url) Pass url as const reference. Why copy a few hundred bytes if you don't need to? > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:256 > +void CurlCacheManager::didFail(String url) Pass by const reference. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:261 > +void CurlCacheManager::loadCachedData(String url, ResourceHandle *job, ResourceResponse& response) Pass by const reference. We write this as "ResourceHandle* job" > Source/WebCore/platform/network/curl/CurlCacheManager.h:40 > +private: Private stuff goes at the bottom of the file. > Source/WebCore/platform/network/curl/CurlCacheManager.h:43 > + CurlCacheManager(CurlCacheManager const &); We write this as "CurlCacheManager const&" > Source/WebCore/platform/network/curl/CurlCacheManager.h:44 > + void operator=(CurlCacheManager const &); Ditto. > Source/WebCore/platform/network/curl/CurlCacheManager.h:55 > + void loadCachedData(String, ResourceHandle*, ResourceResponse&); The above should be passing by const reference. > Source/WebCore/platform/network/curl/CurlCacheManager.h:58 > + static CurlCacheManager & getInstance() This should be written as "CurlCacheManager& getInstance". Move this implementation into the cpp file. You don't want this being emitted into every compilation unit, do you? > Source/WebCore/platform/network/curl/CurlCacheManager.h:64 > + String getCacheDirectory() { return m_cacheDir; } Return as const reference. Make the method const. > Source/WebCore/platform/network/curl/CurlCacheManager.h:65 > + void setCacheDirectory(String); Pass a const reference. > Source/WebCore/platform/network/curl/CurlCacheManager.h:68 > + HTTPHeaderMap* requestHeaders(String); // load headers Pass the above two Strings as const references. > Source/WebCore/platform/network/curl/CurlCacheManager.h:71 > + void didReceiveData(String, const char*, int); // save data Consider making this 'int' type a size_t. Pass String by const reference. > Source/WebCore/platform/network/curl/CurlCacheManager.h:73 > + void didFinishLoading(String); Pass the above strings as const references. > Source/WebCore/platform/network/curl/ResourceHandleManager.cpp:953 > + HTTPHeaderMap* requestHeaders = CurlCacheManager::getInstance().requestHeaders(url); It seems like requestHeaders should be returned by reference.
Mátyás Mustoha
Comment 6 2013-11-08 05:34:09 PST
Created attachment 216378 [details] Proposed patch 2 Thanks for the detailed review! I fixed the issues you've mentioned. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:90 > + long long filesize = -1; I've used "long long" because getFileSize() in FileSystem.h expects a long long variable as parameter: bool getFileSize(const String&, long long& result);
Brent Fulgham
Comment 7 2013-11-08 09:51:25 PST
Comment on attachment 216378 [details] Proposed patch 2 View in context: https://bugs.webkit.org/attachment.cgi?id=216378&action=review This is looking really good! I had a couple of more minor things, then I think we're ready to land this. Thanks for doing this work! > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:105 > +bool CurlCacheEntry::loadCachedData(ResourceHandle *job) ResourceHandle* please. > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:165 > +void CurlCacheEntry::setResponseFromCachedHeaders(ResourceResponse &response) ResourceResponse& please. > Source/WebCore/platform/network/curl/CurlCacheEntry.cpp:326 > + DOn't need this line. > Source/WebCore/platform/network/curl/CurlCacheManager.cpp:158 > + CString urlLatin1 = it->key.latin1(); This is making a copy. I think you could use a const reference to avoid that.
Mátyás Mustoha
Comment 8 2013-11-09 03:54:52 PST
Created attachment 216491 [details] Proposed patch 3
Brent Fulgham
Comment 9 2013-11-18 10:28:25 PST
Comment on attachment 216491 [details] Proposed patch 3 r=me
WebKit Commit Bot
Comment 10 2013-11-18 10:32:54 PST
Comment on attachment 216491 [details] Proposed patch 3 Clearing flags on attachment: 216491 Committed r159432: <http://trac.webkit.org/changeset/159432>
WebKit Commit Bot
Comment 11 2013-11-18 10:32:56 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.