Bug 21239 - [CURL] Add on-disk file cache
Summary: [CURL] Add on-disk file cache
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc.
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Marco Barisione
URL:
Keywords: Curl
Depends on:
Blocks:
 
Reported: 2008-09-30 03:59 PDT by Marco Barisione
Modified: 2010-08-19 08:44 PDT
CC List: 8 users

See Also:


Attachments
Implement on-disk cache (32.61 KB, patch)
2009-01-13 11:16 PST, Marco Barisione
Updated patch v0.1 (28.57 KB, patch)
2009-01-15 23:33 PST, Alexander Butenko

Description Marco Barisione 2008-09-30 03:59:39 PDT
At the moment the GTK port doesn't support an on-disk cache.
I'm implementing it in a generic way that can easily be used by both the CURL and soup backends.
Comment 1 Marco Barisione 2009-01-13 11:16:53 PST
Created attachment 26675
Implement on-disk cache

This is an old patch that I didn't have time to finish and propose for review.

In the patch I took care to let multiple processes access the cache at the same time without races, corruption, etc.
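
The thread does not say how the patch achieves this. As a minimal sketch of one common technique (not necessarily what the patch does), a writer can put the data in a temporary file in the cache directory and then rename it into place, so concurrent readers see either the old entry or the complete new one. The function name and arguments are hypothetical.

```python
import os
import tempfile


def write_cache_entry(cache_dir, name, data):
    """Write data to cache_dir/name so readers never see a partial file.

    Hypothetical helper for illustration; not taken from the patch.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Create the temporary file inside cache_dir so the final rename
    # stays on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        final_path = os.path.join(cache_dir, name)
        # os.replace swaps the entry in one step; on POSIX this is atomic.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave stray temporary files behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Two processes writing the same entry this way may overwrite each other, but neither can leave a half-written file visible under the final name.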

From a quick look the obvious problems are:
- Too many .utf8().data() calls; they were only meant as a temporary workaround, as the patch to do a proper conversion was not merged yet.
- Tons of g_print calls for debugging.
- The cache is saved in the current directory and not in a proper location, for testing reasons.
- The cache should be nested: the cache entry with hash aabbccdd.. should be saved in aa/bbccdd...
- Not completely portable to Windows; I have some ideas on how to do that.
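
The nested layout from the list above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the thread does not say which hash the patch uses, so the digest is simply taken as a hex string.

```python
import os


def nested_entry_path(cache_root, hex_digest):
    """Map a hex digest such as 'aabbccdd' to cache_root/aa/bbccdd.

    Hypothetical helper; the first two characters name the subdirectory
    and the remainder names the file inside it.
    """
    if len(hex_digest) < 3:
        raise ValueError("digest too short to split into directory and file")
    return os.path.join(cache_root, hex_digest[:2], hex_digest[2:])
```

Splitting on the first two hex characters spreads entries over at most 256 subdirectories, which keeps any single directory from growing very large.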

ATM there's no way to know the total size of the cache or to remove files but I had some ideas on how to implement it, so ping me on IRC if you want to discuss it.
Comment 2 Alexander Butenko 2009-01-15 23:16:34 PST
OK, here is an update of Marco's patch. There are some issues which I want to discuss here.

> - Too many .utf8().data() calls, they were only meant as a temporary workaround
> as the patch to do a proper conversion was not merged yet.

I switched most of the String variables to char*.

> - Tons of g_print for debugging.

Thanks, the debug output is very nice. The patch still has these calls. What should we do with them? Add ifdef(DEBUG)?
 
> - The cache is saved in the current directory and not in a proper location for
> testing reasons

Fixed. The current location is $HOME/.cache/webkit/. Do we need an API that lets applications override this location? On the one hand, a unified cache directory would be a nice way to handle it: once a user has a few WebKit-based applications, they can all share the same cache. On the other hand, it leaves advanced users without a way to configure it.

> - The cache should be nested, the cache entry with hash aabbccdd.. should be
> saved in aa/bbccdd...

Fixed.

> - Not completely portable to windows, I have some ideas on how to do that.

Sorry, no idea here. I have no experience with Windows development, and I don't have a Windows machine to test on.


- CURL support is broken in my patch. What should we do here? Marco and I are using GIO to send data to the client. Will this approach be portable to all the OSes that WebKit supports? One option is for me to move the GIO code out of the soup file into a separate file and reuse it for CURL.
Comment 3 Alexander Butenko 2009-01-15 23:19:34 PST
- Cookies have an 'Expire' field. Should we honor it and serve non-expired files from the cache without asking the server to confirm that they are not outdated?
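
A minimal sketch of such a freshness check, assuming the question is about the HTTP Expires response header stored with a cache entry (the comment says "Cookies", so this reading is an assumption). The function name is hypothetical. A full HTTP cache would also have to consider Cache-Control, whose max-age directive takes precedence over Expires.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header, now=None):
    """Return True if a cached response may be served without revalidation.

    Hypothetical helper: looks only at the Expires header value.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # An unparsable date is treated as already expired.
        return False
    if expires.tzinfo is None:
        # Assume UTC when the header carries no usable zone information.
        expires = expires.replace(tzinfo=timezone.utc)
    return now < expires
```

When the check fails, the cache would fall back to asking the server, for example with a conditional request.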
Comment 4 Alexander Butenko 2009-01-15 23:33:10 PST
Created attachment 26786
Updated patch v0.1

This patch is not ready to commit yet, but it would be nice if somebody could check it.
Some things are still missing; I will update it soon.
Comment 5 Christian Dywan 2009-02-14 08:25:45 PST
(In reply to comment #4)
> Created an attachment (id=26786) [review]
> Updated patch v0.1
> 
> This patch is not ready to commit yet, but it would be nice if somebody could check it.
> Some things are still missing; I will update it soon.

It was mentioned in discussions on IRC that cache as a libSoup session feature makes much more sense than a WebKit internal implementation. We are only going to support libSoup in the future, and being part of the network interface makes it usable outside of WebKit.
Comment 6 Gustavo Noronha (kov) 2009-02-26 15:34:54 PST
Leaving only Curl as keyword, as this may still be useful for Curl; see Christian Dywan's comments regarding GTK+/Soup.
Comment 7 Gustavo Noronha (kov) 2009-03-05 18:14:36 PST
Moving away from WebKitGTK
Comment 8 Denis Cheremisov 2009-04-03 11:36:18 PDT
So, what is the situation with this issue now? I'm using the latest WebKit, and it seems both Epiphany-WebKit and Midori use the disk too intensively (compared to Opera, Firefox, Chromium, etc.)
Comment 9 Christian Dywan 2009-04-03 14:21:09 PDT
(In reply to comment #8)
> So, what is the situation with this issue now? I'm using the latest WebKit, and it
> seems both Epiphany-WebKit and Midori use the disk too intensively (compared to
> Opera, Firefox, Chromium, etc.)

Xan is working on a SoupSessionFeature for this, i.e. a cache that is implemented in libSoup and can be used by all WebKit applications.
Comment 10 Benjamin Meyer 2009-08-14 07:38:34 PDT
The location of the cache should be configurable and should not point to $HOME/.cache/webkit/, as this is not the WebKit HTTP cache but a curl/soup cache, and a specific implementation at that. Not to mention that it is really gtkwebkit and not webkit. As it is, the cache in this patch is not shared with Chromium or Arora (both on Linux and both using WebKit; not to say we couldn't improve our caches to all work together). If in a year someone comes up with a better way to store the curl/soup cache, they might not want to have both caches in the same directory.

On Linux, $HOME/.cache should not be hard-coded; the location should be determined from the environment variable $XDG_CACHE_HOME, falling back to $HOME/.cache when it is unset or empty. See http://standards.freedesktop.org/basedir-spec/basedir-spec-0.6.html. On OS X and Windows, of course, this is also different and not $HOME/.cache.

Lastly, although I do not see it in the XDG spec, looking at my .cache I see that the directory is being used as $XDG_CACHE_HOME/$COMPANY_NAME/$APPLICATION_NAME/, for example $HOME/.cache/midori/ or $HOME/.cache/Trolltech/demobrowser. Different applications have different needs for their cache. One application, such as an RSS reader, might hit the same few websites for years. For that application it is very valuable that the cache not be deleted, but a web browser's cache usage is different and could completely change every hour.
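
The lookup described in this comment can be sketched as follows. This is an illustration only: the function and parameter names are hypothetical, and the optional vendor directory reflects the layout observed above rather than anything required by the XDG spec.

```python
import os


def app_cache_dir(app_name, vendor=None, environ=None):
    """Return the per-application cache directory on an XDG-style system.

    Hypothetical helper: uses $XDG_CACHE_HOME when it is set and non-empty,
    otherwise falls back to $HOME/.cache.
    """
    env = os.environ if environ is None else environ
    base = env.get("XDG_CACHE_HOME")
    if not base:
        home = env.get("HOME") or os.path.expanduser("~")
        base = os.path.join(home, ".cache")
    parts = [base]
    if vendor:
        parts.append(vendor)
    parts.append(app_name)
    return os.path.join(*parts)
```

For example, app_cache_dir("demobrowser", vendor="Trolltech") with XDG_CACHE_HOME unset and HOME=/home/user gives /home/user/.cache/Trolltech/demobrowser, so each application keeps its own cache and can apply its own eviction policy.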
Comment 11 Christian Dywan 2009-08-14 12:16:16 PDT
To make this clear: WebKitGTK+ is not using CURL anymore and hence this feature request is obsolete as far as WebKitGTK+ is concerned.
Comment 12 Sergio Villar Senin 2010-08-19 07:20:39 PDT
I guess it's better to close this now that we have https://bugs.webkit.org/show_bug.cgi?id=44261
Comment 13 Benjamin Meyer 2010-08-19 08:25:55 PDT
Shouldn't you leave the old bug with lots of useful information open and close the new duplicate bug that contains one link and move that link here?
Comment 14 Sergio Villar Senin 2010-08-19 08:44:00 PDT
(In reply to comment #13)
> Shouldn't you leave the old bug with lots of useful information open and close the new duplicate bug that contains one link and move that link here?

I don't think so. Mainly because although they both target the same feature they are based on very different implementations. The new one is just for discussing that other implementation.

As kov said the curl cache won't be integrated in WebKitGTK+, I thought it could be useful for other people to know about the new plans. Anyway, this can be left open; I have no strong opinion about it.