Bug 14730

Summary: [curl] Cookie handling
Product: WebKit Reporter: Holger Freyther <zecke>
Component: New BugsAssignee: Julien Chaffraix <jchaffraix>
Status: RESOLVED DUPLICATE    
Severity: Normal CC: ahya365, alp, Basuke.Suzuki, christian, davidsz, galpeter, jchaffraix, kevino, markybob, mrowe, snehalkedar, tonikitoo, uws+webkit, xfdbse
Priority: P2 Keywords: Curl
Version: 523.x (Safari 3)   
Hardware: Mac   
OS: OS X 10.4   
Bug Depends on:    
Bug Blocks: 14725    
Attachments:
Description Flags
First draft (cookie handling)
none
Updated patch (corrected most coding style issues)
mrowe: review-
Rewritten patch addressing Marc's comments
none
First part : implement CookieManager stubs & network bindings
zecke: review-
First patch rewritten: should address Zecke's comments
kevino: review-
My "working progress" implementation none

Description Holger Freyther 2007-07-23 11:58:23 PDT
How do we want to handle cookies with Curl? Should we use the cookie jar functionality of curl. In this case we have no control over individual cookies and can not save the jar.
We could do the cookie parsing inside WebCore allowing us to have tight control on which cookies to accept and which to reject (e.g. having signals in our  equivalent to WebFrameLoadDelegate).
Comment 1 Holger Freyther 2007-07-23 13:14:37 PDT
Add the Gtk keyword, even to the curl bugs.
Comment 2 Holger Freyther 2007-07-23 13:18:37 PDT
Add Curl as keyword.
Comment 3 Holger Freyther 2007-09-23 07:53:40 PDT
The OWB/Sandlabs WebKit fork is still using CURL but have made some improvements. In the fork Cookies are handled within WebCore. This makes sense as it gives full control of storing, loading and deleting cookies which would not be easily doable using the CURL APIs.
Comment 4 Julien Chaffraix 2007-11-27 12:48:36 PST
Created attachment 17560 [details]
First draft (cookie handling)

This patch imports the OWB cookie management both for handling, refusing and storing cookies.

The points that remain are :
- writing cookies to a file should occur only when the CookieManager is destroyed. My idea was to do that in the destructor of the ResourceHandleManager but it is never called (my workaround is to write it each time a cookie is added)
- I have left all the original comments and logging for discussion

Any comments appreciated.
Comment 5 Christian Dywan 2007-11-28 01:13:08 PST
(In reply to comment #4)
> This patch imports the OWB cookie management both for handling, refusing and
> storing cookies.
This looks nice. I'll test it a bit today.

> - writing cookies to a file should occur only when the CookieManager is
> destroyed. My idea was to do that in the destructor of the
> ResourceHandleManager but it is never called (my workaround is to write it each
> time a cookie is added)
I think saving changes instantly is a very good idea. However I would favor an sqlite database over a textfile. This would be more flexible and allow multiple webkit clients to access cookies at the same time.

> This fonction tries to analyse the string to obtain the proper format.
Let me guess... you are French, aren't you? ;)
Comment 6 Julien Chaffraix 2007-11-28 08:31:34 PST
> This looks nice. I'll test it a bit today.

Sure. I have tested it and it seems to work well.

> I think saving changes instantly is a very good idea. However I would favor an
> sqlite database over a textfile. This would be more flexible and allow multiple
> webkit clients to access cookies at the same time.

It seems like a good idea, I will dig it. The textfile method is the one used by OWB that's why I imported it as a starting point.

>> This fonction tries to analyse the string to obtain the proper format.
> Let me guess... you are French, aren't you? ;)

The OWB team is French ;) (and I happen to be too)
Comment 7 Alp Toker 2007-11-29 16:44:38 PST
Hey, sorry I haven't had time to look at this properly yet.

I think this is a great addition since there is no http stack or build in cookie support in GTK+ or GNOME.

There are some minor coding style issues to do with whitespace mostly I think.

It's good to use plain text files for interoperability here. It's quite common for applications to go and inspect the cookies a browser stores nowadays by just opening those files since there isn't a better solution.
Comment 8 Christian Dywan 2007-11-29 18:20:20 PST
(In reply to comment #7)
> It's good to use plain text files for interoperability here. It's quite common
> for applications to go and inspect the cookies a browser stores nowadays by
> just opening those files since there isn't a better solution.

I can't quite agree here. SQL is very well known and interoperable, it is already used in WebKit. Other browsers are also using or switching to sqlite. Working around all the drawbacks of text files is painful and I don't see any benefit of that.
Comment 9 Julien Chaffraix 2007-12-02 19:02:40 PST
Created attachment 17660 [details]
Updated patch (corrected most coding style issues)

> There are some minor coding style issues to do with whitespace mostly I think.

I did not paid enough attention to coding style issues. It is now corrected for most of them I hope.

For the SQL versus files issue, the current implementation works with files so we can start with that and see what others think about this issue. FWIW I would tend to agree with Alp on that point as other applications could use our cookie implementation to use the cookie file.
Comment 10 Mark Rowe (bdash) 2007-12-03 03:56:48 PST
Comment on attachment 17660 [details]
Updated patch (corrected most coding style issues)

I don't see any value in using a custom plain text file format.  It adds nothing from an interoperability point of view as any applications wishing to parse it would need to write new code to do so.  Rather than doing this I think using SQLite would be preferable.  Firefox 3, currently in beta, uses SQLite for its cookie store so it could be desirable to match its schema.  I can't think of any good reason to use a plain text file over SQLite when the major browser with which interoperability is desirable has itself switched from plain text files.

There are several other issues with this implementation.  The largest is that ResourceHandleManager is a singleton that is never destroyed.  This means that cookies will never be saved to disk.  Tying the persisting of cookies to disk to the lifetime of ResourceHandleManager seems fundamentally flawed.  What if I load a bunch of cookie-using pages and then the application crashes?  Another issue worth considering is multiple instances of the application.  If I have two instances of the application running, the second to quit will overwrite any cookies stored by the first.  This seems bad.

There are many places within the patch that lack adequate error-handling.  This is especially bad when dealing with data from untrusted sources, eg. the network or file system.

Finally, there are many coding style issues.  String arguments should be passed as "const String&" rather than just "String" to prevent their contents from being copied.  Variables should be declared at their point of first use unless needed, rather than in blocks at the start of a function.

There also appears to be a logic error in getDaysTillFirstOfMonth.  Quoting Wikipedia:
""The Gregorian calendar, the current standard calendar in most of the world, adds a 29th day to February in all years evenly divisible by 4, except for centennial years (those ending in -00) which are not evenly divisible by 400. Thus 1600, 2000 and 2400 are leap years but 1700, 1800, 1900, 2100, 2200 and 2300 are not.""

It's really easy to get these sort of date calculations wrong, so it would definitely be better to reuse existing libraries rather than writing it anew.  We possibly have code that handles this in JavaScriptCore for our date support there, otherwise I'm sure ICU and other platform libraries have this knowledge.
Comment 11 Julien Chaffraix 2008-02-29 16:43:13 PST
Created attachment 19465 [details]
Rewritten patch addressing Marc's comments

It took me some time but I have rewritten most of the first patch. It had too many issues to be really usable : Marc pointed some but there were others.

The new one should address all the comments : 
- for the date issue, I have used curl_getdate as ICU data parsing was way too simple for processing cookies' date.
- the cookies are saved in a sqlite table
- they are saved as soon as they are processed (instead on waiting for the destructor)
- the methods take "const String&" to prevent content's copy

I have added some features :
- we have a real parser that is RFC2109 compliant (with some Mozilla hacks)
- LRU replacement policy for cookies (as required per RFC2109)

The patch is quite huge and if necessary, I should be able to cut it to several pieces to ease review.
Comment 12 Holger Freyther 2008-03-04 19:59:50 PST
(In reply to comment #11)
> Created an attachment (id=19465) [edit]
Some random thoughts after a quick look:

- CookieManager::getCookieManager() looks like it is never returning 0 (unless you get into a OOM situation). So all behavior which is checking the return value is probably wrong, cookiesEnabled is coming to my mind.
- The c'tor of CookieManager is looking wrong, if you set such a thing in the c'tor, undo it in the d'tor? But then again you force it to be a singleton why care?
- getCookie(String,KURL), do you only return one cookie? => getCookies
- You have looked at firefox's SQL schema? Why do you decide to be incompatible with it? If we don't use isHttpOnly, why break their schema, there is nothing wrong with an 'empty' column for the sake of compability
- setCookieJarFile(const char*), do not use char* in WebCore, make the higher levels parse and give you a WebCore::String, you use the latin1() c'tor of WebCore::String which will break for utf-8 filenames..
- the indention of Cookie.h is wrong. the whole class needs to be indented inside the namespace.
- What is preventing you to pass the Cookie by reference/value?
- Same goes for the CookieMap in the CookieManager
- Do you have two different upper limits for Cookies?
- If you say you are RFC compliant, how did you test it?


Comment 13 Julien Chaffraix 2008-03-05 06:46:35 PST
(In reply to comment #12)
> (In reply to comment #11)
> > Created an attachment (id=19465) [edit]
> Some random thoughts after a quick look:
> 

Thanks for taking the time to look at it !

> - CookieManager::getCookieManager() looks like it is never returning 0 (unless
> you get into a OOM situation). So all behavior which is checking the return
> value is probably wrong, cookiesEnabled is coming to my mind.

Granted even thought WebCore::cookieEnabled is only a check for the CookieManager value (null or not).

> - The c'tor of CookieManager is looking wrong, if you set such a thing in the
> c'tor, undo it in the d'tor? But then again you force it to be a singleton why
> care?

What I do in the constructor is :
- setting the default file in which save the CookieJar
- creating the Cookie representation of the cookies saved in the database (as it is only called once, it should be ok)
- saving the shared instance of CookieManager in case it is called later

Only the third point could be arguable IMHO as it duplicates what is done in CookieManager::getCookieManager()

Concerning the destructor, I think you are right : if we are to provide a destructor, I should at least nullify the shared instance. I will take care of that.

> - getCookie(String,KURL), do you only return one cookie? => getCookies

Good catch.

> - You have looked at firefox's SQL schema? Why do you decide to be incompatible
> with it? If we don't use isHttpOnly, why break their schema, there is nothing
> wrong with an 'empty' column for the sake of compability

Yes, apart from the table name (which is mozilla_cookies in FF), the column names are the same.
I did not think about breaking their schema when I removed isHttpOnly so it seems fair to add it again.

> - setCookieJarFile(const char*), do not use char* in WebCore, make the higher
> levels parse and give you a WebCore::String, you use the latin1() c'tor of
> WebCore::String which will break for utf-8 filenames..

I will correct that in the next patch.

> - the indention of Cookie.h is wrong. the whole class needs to be indented
> inside the namespace.

Ditto.

> - What is preventing you to pass the Cookie by reference/value?
> - Same goes for the CookieMap in the CookieManager

Nothing as far as I know. I have started working with pointers when I rewrote the implementation and kept with that convention. I had a look at WebCore and we use both so I thought it would be ok. Should I use reference/value in the next iteration ?

> - Do you have two different upper limits for Cookies?

Yes, a limit per site (enforced in CookieMap) and a global limit (in CookieManager).

> - If you say you are RFC compliant, how did you test it?

I am loosely compliant as I have added some firefox extension which are not standard and merely done for compatibility with some sites.
I have not found any cookie test suite so I did some tests on real websites. It is a bit weak to prove compliance so it should be a good addition to include test cases (I have done some but I could produce a complete test suite).

I intend to break the next patch to ease review so I could add some test cases on this bug.
Comment 14 Julien Chaffraix 2008-03-05 06:50:34 PST
Comment on attachment 19465 [details]
Rewritten patch addressing Marc's comments

Clearing review flag (issues to address + patch to split)
Comment 15 Julien Chaffraix 2008-03-16 18:01:30 PDT
> > - What is preventing you to pass the Cookie by reference/value?
> > - Same goes for the CookieMap in the CookieManager
> 
> Nothing as far as I know. I have started working with pointers when I rewrote
> the implementation and kept with that convention. I had a look at WebCore and
> we use both so I thought it would be ok. Should I use reference/value in the
> next iteration ?
> 

It seems that I was a bit too enthusiastic when I said there was nothing to prevent me from switching: I am using some methods that can return 0 (HashMap::take for example). I also have to use a pointer at several places (to know whether the value has been initialized or not). I could find a way around those issues but I think it would hurt readability in the end.
Comment 16 Julien Chaffraix 2008-03-16 18:06:52 PDT
Created attachment 19814 [details]
First part : implement CookieManager stubs & network bindings
Comment 17 Mark Rowe (bdash) 2008-04-27 20:29:31 PDT
*** Bug 18777 has been marked as a duplicate of this bug. ***
Comment 18 Eric Seidel (no email) 2008-08-01 12:13:09 PDT
Would mark or Holger be willing to review the rest of this?
Comment 19 Holger Freyther 2008-08-27 13:12:20 PDT
Comment on attachment 19814 [details]
First part : implement CookieManager stubs & network bindings

Sorry I think we need another iteration.


-    return cookieJar.get(url.string());
+    if (cookiesEnabled(document))
+        return CookieManager::getCookieManager()->getCookies(url);
+    else
+        return String();

Could we have an early return here?

-    return true;
+    CookieManager* manager = CookieManager::getCookieManager();
+    if (manager)
+        return true;
+    else
+        return false;

return manager; :)



+    // Cookie size should not be above 81920 (per construction and also because we
+    // do not want to  cookie).
+    ASSERT(cookiePairs.length() <= 81920);

make that a runtime option? Why should big cookies make webkit assert?


+    strncpy(cookieChar, cookiePairs.ascii().data(),

what ascii is that? Is that available when build in release mode? I think it is not.


+        ~CookieManager();

can we make that private. With the current d'tor implementation I would feel more happy with that.
Comment 20 marcos pinto 2008-08-28 01:44:46 PDT
I know this isnt a particularly helpful comment, but i really cant wait for this to be implemented.  It's the only thing keeping me from using WebKit instead of gecko.  Cheers to those working on it and I hope it's taken care of in not too long!  Thanks!
Comment 21 snehalkedar 2008-09-11 02:24:45 PDT
if (cookie->domain()[0] != '.' || cookie->domain().find(".", 1) == -1 || cookie->domain().find(".", 1) == cookie->domain().length())
+        return true;
+
+    // The request host should domain match the Domain attribute.
+    int diffPos = url.host().find(cookie->domain());
+    if (diffPos == -1)
+        return true;

can you explain
Comment 22 Julien Chaffraix 2008-09-16 11:22:36 PDT
(In reply to comment #21)
> if (cookie->domain()[0] != '.' || cookie->domain().find(".", 1) == -1 ||
> cookie->domain().find(".", 1) == cookie->domain().length())
> +        return true;
> +
> +    // The request host should domain match the Domain attribute.
> +    int diffPos = url.host().find(cookie->domain());
> +    if (diffPos == -1)
> +        return true;
> 
> can you explain
> 
Yes. A cookie is set for the Domain's subnetwork (see RFC2109 - 4.3.4: Domain selection). The check is in fact wrong: it should be endsWith(). Is it what you were pointing out?
Comment 23 Julien Chaffraix 2008-09-17 09:22:02 PDT
Created attachment 23500 [details]
First patch rewritten: should address Zecke's comments

I have finally taken the time to update this patch. Sorry for the really long delay.
Comment 24 marcos pinto 2008-09-18 15:32:07 PDT
i just finished building webkit with the new (09/17) patch...doesn't seem to be working here, though.  it didnt save my google account login and i had to repeatedly reenter the information when the browser is restarted.  is there anything that i need to be specifying in configure?  
Comment 25 Julien Chaffraix 2008-09-22 09:30:55 PDT
(In reply to comment #24)
> i just finished building webkit with the new (09/17) patch...doesn't seem to be
> working here, though.  it didnt save my google account login and i had to
> repeatedly reenter the information when the browser is restarted.  is there
> anything that i need to be specifying in configure?  

This is the first part of the whole rewrite and it does not contain the ability to save cookies. As it is a long way from being finished, I would advise you to use curl's cookie jar support. It is currently not enabled but setting m_cookieJarFileName in ResourceHandleManager to an existing file should enable it.
Comment 26 monil 2008-12-23 21:48:00 PST
(In reply to comment #25)
> (In reply to comment #24)
> > i just finished building webkit with the new (09/17) patch...doesn't seem to be
> > working here, though.  it didnt save my google account login and i had to
> > repeatedly reenter the information when the browser is restarted.  is there
> > anything that i need to be specifying in configure?  
> 
> This is the first part of the whole rewrite and it does not contain the ability
> to save cookies. As it is a long way from being finished, I would advise you to
> use curl's cookie jar support. It is currently not enabled but setting
> m_cookieJarFileName in ResourceHandleManager to an existing file should enable
> it.
> 

Hi, 
 Iam working on a browser using gtk/webkit. i want to implement cookies functionality in that.
is new patch for cookie implementation in webkit available?
or do we need to use curl's cookie jar support.
Any suggestions?
Comment 27 Gustavo Noronha (kov) 2009-02-27 20:04:15 PST
Removing the Gtk keyword, since GTK+ doesn't use curl anymore.
Comment 28 Eric Seidel (no email) 2009-05-22 19:54:13 PDT
If gtk no longer uses curl do we still need this patch? I guess the cairo win port uses curl?  (If no one does, seems we should pull the curl code from the repo.)
Comment 29 Eric Seidel (no email) 2009-05-22 19:58:00 PDT
Zecke no longer seems like the right reviewer since this patch no longer affects gtk.  Kevin Oliver is the only reviewer for a port which still uses curl (the wx port).
Comment 30 Kevin Ollivier 2009-05-23 09:01:11 PDT
Overall it looks good, but r- because we are leaking the CookieManager. If you're going to have a global object which is not owned by another WebCore or WebKit object, I'd suggest making the CookieManager a RefCounted class so that when the static gets destroyed it will also cause the CookieManager to be destroyed. 

It'd be nice, also, if some basic persistent storage mechanism were present in this patch for testing purposes, though I know you split things up to improve your changes of getting the patch landed. Either way, though, I'd suggest removing the COOKIEFILE option from the next patch, because it's just a stopgap and will be removed in the final implementation. It's confusing to see two different approaches to managing cookies in the same patch.
Comment 31 Szabolcs David 2013-07-03 01:07:25 PDT
Created attachment 205975 [details]
My "working progress" implementation

This is my experimental implementation, a little different approach for testing purposes. It works fine, but it still has some deficiencies, for example: top level domain checking, RFC incompatibility around the domain attribute.
I am working on these weaknesses and I would like to optimize the whole cookie storage method.
Comment 32 Basuke Suzuki 2018-11-14 11:00:36 PST
Curl cookie-backend database implementation was already landed.

*** This bug has been marked as a duplicate of bug 174942 ***