Bug 146589 - [GTK] Add API to WebKitWebsiteDataManager to handle website data
Summary: [GTK] Add API to WebKitWebsiteDataManager to handle website data
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: Gtk
: 147319 (view as bug list)
Depends on: 167281
Blocks: 147319
  Show dependency treegraph
 
Reported: 2015-07-03 06:16 PDT by Carlos Garcia Campos
Modified: 2017-01-23 23:50 PST (History)
5 users (show)

See Also:


Attachments
Patch (66.15 KB, patch)
2015-07-03 06:21 PDT, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
Rebased patch (68.52 KB, patch)
2015-07-27 04:20 PDT, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
Updated patch (68.54 KB, patch)
2015-07-28 03:37 PDT, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
New patch (85.57 KB, patch)
2017-01-21 02:16 PST, Carlos Garcia Campos
mcatanzaro: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carlos Garcia Campos 2015-07-03 06:16:24 PDT
Allow to get origins with memory and disk cached resources, as well as delete caches for individual origins, or clearing the whole cache.
Comment 1 Carlos Garcia Campos 2015-07-03 06:21:54 PDT
Created attachment 256098 [details]
Patch

Includes unit tests and also implements about:data in MiniBrowser to easily play with Website data.
Comment 2 Gustavo Noronha (kov) 2015-07-16 06:41:11 PDT
Comment on attachment 256098 [details]
Patch

All of this seems to make sense to me. That said, do you have something in mind for using the per-origin APIs already?
Comment 3 Carlos Garcia Campos 2015-07-16 06:51:59 PDT
(In reply to comment #2)
> Comment on attachment 256098 [details]
> Patch
> 
> All of this seems to make sense to me. That said, do you have something in
> mind for using the per-origin APIs already?

The idea is to implement a better personal info dialog in ephy, so that you can remove website data only for a particular origin, the same way you currently can delete only the cookies of a particular domain.
Comment 4 Carlos Garcia Campos 2015-07-27 04:20:26 PDT
Created attachment 257556 [details]
Rebased patch
Comment 5 WebKit Commit Bot 2015-07-27 04:23:33 PDT
Attachment 257556 [details] did not pass style-queue:


ERROR: Tools/MiniBrowser/gtk/main.c:396:  Declaration has space between * and variable name in WebKitWebsiteDataManager* manager  [whitespace/declaration] [3]
ERROR: Tools/MiniBrowser/gtk/main.c:424:  Declaration has space between * and variable name in WebKitWebsiteDataManager* manager  [whitespace/declaration] [3]
WARNING: File exempt from style guide. Skipping: "Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.h"
WARNING: File exempt from style guide. Skipping: "Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.h"
WARNING: File exempt from style guide. Skipping: "Source/WebKit2/UIProcess/API/gtk/webkit2.h"
Total errors found: 2 in 17 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Michael Catanzaro 2015-07-27 17:17:16 PDT
Comment on attachment 257556 [details]
Rebased patch

View in context: https://bugs.webkit.org/attachment.cgi?id=257556&action=review

Unofficial +1 on the API changes from me.

I think it would be helpful to add to the documentation of WebKitWebsiteDataManager a short description of the difference between the memory cache and the disk cache. (I presume the memory cache is the page cache, and the disk cache is the network cache?)

> Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:30
> + * @Short_description: A security boundary for Websites

I would rather say "websites" in lowercase.

> Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:35
> + * by Websites. An origin consists of a host name, a protocol, and a port

Same.

> Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:55
> +    CString host;

I don't understand the value of duplicating protocol and host here, when they are already stored in the WebCore::SecurityOrigin. It's also a bit confusing that these are initialized in getters.

> Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:63
> +    WebKitSecurityOrigin* origin = g_slice_new(WebKitSecurityOrigin);

I think we should stop using GSlice and just use plain new/delete instead. I base this off of a comment in the GNOME memory management programming guide: "we don’t want to encourage [use of g_slice_new]. GLib developers are moving towards deprecating it due to its performance slipping significantly behind the libc allocator’s on Linux in recent years." That is literally a comment, as in a commented-out section of the guide, so it doesn't show up on the actual website, but it's there to discourage people from adding guidance on GSlice: https://git.gnome.org/browse/gnome-devel-docs/tree/programming-guidelines/C/memory-management.page#n22

> Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:77
> +    ASSERT(origin);

When you would write an assertion like this, I think it's better to pass by reference instead. I think we've been trying to avoid pass by pointer except in C API and when the pointer is expected to be null.

> Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:166
> + * Returns: The host of the #WebKitSecurityOrigin.

This needs a (nullable) annotation, or (allow-none), since you decided to return nullptr if the string is empty.

Did you see https://blogs.gnome.org/desrt/2014/05/27/allow-none-is-dead-long-live-nullable/ and continue to use (allow-none) for compatibility with old gobject-introspection?

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:512
> +                origins = g_list_prepend(origins, webkitSecurityOriginCreate(origin.get()));

And the same origin won't ever be in multiple records? (Same question applies to the case of the memory cache.)

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:524
> + * @error: return location for error or %NULL to ignore

Shouldn't this be annotated (optional) or (allow-none)? Same for all the other error parameters in this file.

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:651
> + * %WEBKIT_PROCESS_MODEL_SHARED_SECONDARY_PROCESS process model, an empty list will

This is a comma splice; you can fix it by changing the comma to a semicolon, or by splitting it into two sentences if you don't like semicolons.

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.h:94
> +                                                                       GError                  ** error);

Extra space after the **

> Tools/ChangeLog:9
> +        MiniBrowsore to show and handle Website data.

Is MiniBrowsore a typo or a pun? I can't tell. :D

> Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestWebKitWebContext.cpp:736
> +    test->wait(1);

:(
Comment 7 Carlos Garcia Campos 2015-07-27 23:38:58 PDT
(In reply to comment #6)
> Comment on attachment 257556 [details]
> Rebased patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=257556&action=review
> 
> Unofficial +1 on the API changes from me.

Thanks for looking at it.

> I think it would be helpful to add to the documentation of
> WebKitWebsiteDataManager a short description of the difference between the
> memory cache and the disk cache. (I presume the memory cache is the page
> cache, and the disk cache is the network cache?)

Memory cache is not the page cache is the memory cache, used to store any cacheable resources in memory, and disk cache is the HTTP disk cache used by the networking process.

> > Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:30
> > + * @Short_description: A security boundary for Websites
> 
> I would rather say "websites" in lowercase.

Ok.

> > Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:35
> > + * by Websites. An origin consists of a host name, a protocol, and a port
> 
> Same.

Ok.

> > Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:55
> > +    CString host;
> 
> I don't understand the value of duplicating protocol and host here, when
> they are already stored in the WebCore::SecurityOrigin. It's also a bit
> confusing that these are initialized in getters.

This is a common thing we do in the API just to make it more convenient to use. The host and protocol are stored in WebCore::SecurityOrigin as String, but all our APIs return always string in utf8 (like all other GNOME platform APIs). If we return the converted string directly, the returned value should be freed by the user. So, we cache the converted strings to be able to return a const char *, and we do that only on demand, so if you never call get_host we don't have a cached value for nothing.

> > Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:63
> > +    WebKitSecurityOrigin* origin = g_slice_new(WebKitSecurityOrigin);
> 
> I think we should stop using GSlice and just use plain new/delete instead. I
> base this off of a comment in the GNOME memory management programming guide:
> "we don’t want to encourage [use of g_slice_new]. GLib developers are moving
> towards deprecating it due to its performance slipping significantly behind
> the libc allocator’s on Linux in recent years." That is literally a comment,
> as in a commented-out section of the guide, so it doesn't show up on the
> actual website, but it's there to discourage people from adding guidance on
> GSlice:
> https://git.gnome.org/browse/gnome-devel-docs/tree/programming-guidelines/C/
> memory-management.page#n22

Good point, didn't know that.

> > Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:77
> > +    ASSERT(origin);
> 
> When you would write an assertion like this, I think it's better to pass by
> reference instead. I think we've been trying to avoid pass by pointer except
> in C API and when the pointer is expected to be null.

WebKitSecurityOrigin is a boxed type not a C++ class, we always use pointers with GObjects or boxed types. That ASSERT is equivalent to the g_return macros used in public methods, but in this case I used ASSERT because it's a private method.

> > Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:166
> > + * Returns: The host of the #WebKitSecurityOrigin.
> 
> This needs a (nullable) annotation, or (allow-none), since you decided to
> return nullptr if the string is empty.

I thought nullable/allow-none was assumed for char *, I think we don't use that in any of our APIs, we will have to update them all.

> Did you see
> https://blogs.gnome.org/desrt/2014/05/27/allow-none-is-dead-long-live-
> nullable/ and continue to use (allow-none) for compatibility with old
> gobject-introspection?

No.

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:512
> > +                origins = g_list_prepend(origins, webkitSecurityOriginCreate(origin.get()));
> 
> And the same origin won't ever be in multiple records? (Same question
> applies to the case of the memory cache.)

That could probably happen when you call fetchData with more than one type, but we always use fetchData with a single type. 

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:524
> > + * @error: return location for error or %NULL to ignore
> 
> Shouldn't this be annotated (optional) or (allow-none)? Same for all the
> other error parameters in this file.

I thought that GError ** was handled automatically too.

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:651
> > + * %WEBKIT_PROCESS_MODEL_SHARED_SECONDARY_PROCESS process model, an empty list will
> 
> This is a comma splice; you can fix it by changing the comma to a semicolon,
> or by splitting it into two sentences if you don't like semicolons.

I don't mind to use ;

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.h:94
> > +                                                                       GError                  ** error);
> 
> Extra space after the **

Good catch.

> > Tools/ChangeLog:9
> > +        MiniBrowsore to show and handle Website data.
> 
> Is MiniBrowsore a typo or a pun? I can't tell. :D

:-D

> > Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestWebKitWebContext.cpp:736
> > +    test->wait(1);
> 
> :(
Comment 8 Carlos Garcia Campos 2015-07-28 03:37:47 PDT
Created attachment 257641 [details]
Updated patch
Comment 9 Michael Catanzaro 2015-07-28 09:05:28 PDT
(In reply to comment #7)
> 
> This is a common thing we do in the API just to make it more convenient to
> use. The host and protocol are stored in WebCore::SecurityOrigin as String,
> but all our APIs return always string in utf8 (like all other GNOME platform
> APIs). If we return the converted string directly, the returned value should
> be freed by the user. So, we cache the converted strings to be able to
> return a const char *, and we do that only on demand, so if you never call
> get_host we don't have a cached value for nothing.

Surely if we were to take a String s and return s.utf8().data(), that does not need to be freed by the caller. It is a pointer to the CString's internal CStringBuffer. And the CString has a reference-counted CStringBuffer, so the data should be freed automatically when... er, when the function returns. Ah, OK, so that would be bad. WebKit's strings are confusing; I wish it hadn't taken a year for me to realize this.

> > > Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:77
> > > +    ASSERT(origin);
> > 
> > When you would write an assertion like this, I think it's better to pass by
> > reference instead. I think we've been trying to avoid pass by pointer except
> > in C API and when the pointer is expected to be null.
> 
> WebKitSecurityOrigin is a boxed type not a C++ class, we always use pointers
> with GObjects or boxed types.

I am not sure why, though. I think it would still work fine if we were to do pass by reference.

> > > Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:166
> I thought nullable/allow-none was assumed for char *, I think we don't use
> that in any of our APIs, we will have to update them all.

I think that is wrong, unfortunately. I found [1] which says:

gchar* means (type utf8) (transfer full)
const gchar* means (type utf8) (transfer none)

Where (nullable) is conspicuously missing. And it includes this example:

/**
 * gtk_link_button_new_with_label: 
 * @uri: A URI
 * @label: (nullable): A piece of text or NULL
 */
GtkWidget *
gtk_link_button_new_with_label (const gchar *uri,
                                const gchar *label);

So I think we've been doing it wrong.

Looking at [1], it does look like no annotations are needed for GError, so you're right about that. I was looking at some Gio code that used (out), but I guess it was redundant.

[1] https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations
Comment 10 Martin Robinson 2015-07-28 09:19:23 PDT
(In reply to comment #9)
> (In reply to comment #7)
> > 
> > This is a common thing we do in the API just to make it more convenient to
> > use. The host and protocol are stored in WebCore::SecurityOrigin as String,
> > but all our APIs return always string in utf8 (like all other GNOME platform
> > APIs). If we return the converted string directly, the returned value should
> > be freed by the user. So, we cache the converted strings to be able to
> > return a const char *, and we do that only on demand, so if you never call
> > get_host we don't have a cached value for nothing.
> 
> Surely if we were to take a String s and return s.utf8().data(), that does
> not need to be freed by the caller. It is a pointer to the CString's
> internal CStringBuffer. And the CString has a reference-counted
> CStringBuffer, so the data should be freed automatically when... er, when
> the function returns. Ah, OK, so that would be bad. WebKit's strings are
> confusing; I wish it hadn't taken a year for me to realize this.


Yes, exactly. We have the CString members because the return value of utf8() is a temporary. :)
Comment 11 WebKit Commit Bot 2015-07-28 10:25:35 PDT
Attachment 257641 [details] did not pass style-queue:


ERROR: Tools/MiniBrowser/gtk/main.c:396:  Declaration has space between * and variable name in WebKitWebsiteDataManager* manager  [whitespace/declaration] [3]
ERROR: Tools/MiniBrowser/gtk/main.c:424:  Declaration has space between * and variable name in WebKitWebsiteDataManager* manager  [whitespace/declaration] [3]
WARNING: File exempt from style guide. Skipping: "Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.h"
WARNING: File exempt from style guide. Skipping: "Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.h"
WARNING: File exempt from style guide. Skipping: "Source/WebKit2/UIProcess/API/gtk/webkit2.h"
Total errors found: 2 in 17 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 12 WebKit Commit Bot 2015-07-30 14:34:25 PDT
Attachment 257641 [details] did not pass style-queue:


ERROR: Tools/MiniBrowser/gtk/main.c:396:  Declaration has space between * and variable name in WebKitWebsiteDataManager* manager  [whitespace/declaration] [3]
ERROR: Tools/MiniBrowser/gtk/main.c:424:  Declaration has space between * and variable name in WebKitWebsiteDataManager* manager  [whitespace/declaration] [3]
WARNING: File exempt from style guide. Skipping: "Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.h"
WARNING: File exempt from style guide. Skipping: "Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.h"
WARNING: File exempt from style guide. Skipping: "Source/WebKit2/UIProcess/API/gtk/webkit2.h"
Total errors found: 2 in 17 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 13 Michael Catanzaro 2016-09-17 07:53:08 PDT
Comment on attachment 257641 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=257641&action=review

You'll need to update all the Since tags, obviously.

And it'd be nice to see implementation in Ephy before landing, since it's not great to add API that no apps are using, but I guess the MB implementation is enough to show the API is sufficient for the task.

> Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:46
> +struct _WebKitSecurityOrigin {
> +    _WebKitSecurityOrigin(WebCore::SecurityOrigin* coreSecurityOrigin)
> +        : securityOrigin(coreSecurityOrigin)
> +    {
> +    }

Why is it useful to create WebKitSecurityOrigin objects with a null WebCore::SecurityOrigin? I would have expected the constructor to receive a reference, and to use a Ref rather than RefPtr member variable.

> Source/WebKit2/UIProcess/API/gtk/WebKitSecurityOrigin.cpp:79
> +WebCore::SecurityOrigin& webkitSecurityOriginGetSecurityOrigin(WebKitSecurityOrigin* origin)
> +{
> +    ASSERT(origin);
> +    return *origin->securityOrigin;
> +}

This ASSERT indicates to me that allowing null WebCore::SecurityOrigins is probably not appropriate.

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:652
> + * Note that this is not supported for #WebKitWebContext<!-- -->s using
> + * %WEBKIT_PROCESS_MODEL_SHARED_SECONDARY_PROCESS process model; an empty list will
> + * always be returned in such case.

Were we still using the soup cache in shared secondary process mode when you wrote this patch? Hopefully this comment is obsolete and should be removed?

> Tools/MiniBrowser/gtk/main.c:396
> +    WebKitWebsiteDataManager* manager = webkit_web_context_get_website_data_manager(webkit_web_context_get_default());

WebKitWebsiteDataManager *manager

> Tools/MiniBrowser/gtk/main.c:424
> +    AboutDataRequest *dataRequest = aboutDataRequestNew(request);
> +    WebKitWebsiteDataManager* manager = webkit_web_context_get_website_data_manager(webkit_web_context_get_default());

WebKitWebsiteDataManager *manager

> Tools/MiniBrowser/gtk/main.c:493
> +    WebKitUserContentManager *userContentManager = webkit_user_content_manager_new();

Looks like it's leaked?

> Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestWebKitWebContext.cpp:736
> +    // Disk cache delays the storing of initial resources for 1 second to avoid
> +    // affecting early page load. So, wait 1 second here to make sure resources
> +    // have already been stored.
> +    test->wait(1);

This is unfortunate.

> Tools/TestWebKitAPI/gtk/WebKit2Gtk/WebViewTest.cpp:42
> +WebViewTest::WebViewTest(WebKitProcessModel processModel)
> +    : Test(processModel)
> +    , m_webView(WEBKIT_WEB_VIEW(g_object_ref_sink(webkit_web_view_new_with_context(m_webContext.get()))))
> +    , m_mainLoop(g_main_loop_new(nullptr, TRUE))
>  {
>      assertObjectIsDeletedWhenTestFinishes(G_OBJECT(m_webView));
>      g_signal_connect(m_webView, "web-process-crashed", G_CALLBACK(WebViewTest::webProcessCrashed), this);

This is not good. Use delegating constructors to avoid the need to reimplement the constructor body. Make sure there is a constructor that can take both a WebKitUserContentManager and a WebKitProcessModel.
Comment 14 Carlos Garcia Campos 2016-12-30 06:01:19 PST
Comment on attachment 257641 [details]
Updated patch

Clearing flags, I'm working on a new patch using a different API, as discussed in https://lists.webkit.org/pipermail/webkit-gtk/2015-July/002404.html
Comment 15 Carlos Garcia Campos 2017-01-21 01:58:13 PST
*** Bug 147319 has been marked as a duplicate of this bug. ***
Comment 16 Carlos Garcia Campos 2017-01-21 02:16:15 PST
Created attachment 299429 [details]
New patch
Comment 17 Carlos Garcia Campos 2017-01-21 02:17:24 PST
Making this depend on bug #167281 only because unit tests fail.
Comment 18 Michael Catanzaro 2017-01-22 16:38:57 PST
(Note: I'd like to see a patch that uses this API in https://bugzilla.gnome.org/show_bug.cgi?id=741447.)
Comment 19 Michael Catanzaro 2017-01-22 17:46:28 PST
Comment on attachment 299429 [details]
New patch

View in context: https://bugs.webkit.org/attachment.cgi?id=299429&action=review

I like the API.

I would slightly prefer slightly longer but more specific function names: webkit_website_data_manager_fetch_data, _remove_data, _clear_data. What do you think about that?

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:38
> + * A website is normally a set of URLs grouped by domain name using the public suffix list.

Why is the public suffix list used? Keeping in mind that libsoup's public suffix list is often years out of date and that we should almost never be using it. It's my primary concern with this patch.

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:39
> + * You can ge the website name, which is usually the domain, with webkit_website_data_get_name().

ge -> get

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:42
> + * A website can store different types of data in the client side. #WebKitWebsiteDataTypes is an enum containing

Leave a blank line between these paragraphs.

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:43
> + * all the possible data types, use webkit_website_data_get_types() to get the bitmask of data types.

Change the comma to a semicolon

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:175
> +        if (websiteData->record.displayName == "Local documents on your computer")

That's way too fragile; it's going to break whenever someone decides to change the display name. There has got to be a better way to check if the record represents a local file. How about checking if the protocol of the security origin is "file"?

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:176
> +            websiteData->displayName = _("Local files");

You forgot to update POTFILES.in.

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:188
> + * types actually found, not the types queried with webkit_website_data_manager_fetch().

"actually present"?

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:521
> + * @user_data: (closure): the data to pass to callback function

to the callback function

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:579
> + * Use webkit_website_data_manager_clear() if you want to remove the website data for all sites.

sites -> websites (this is the only place you wrote "sites" and it's nice to use consistent terminology)

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:639
> + * Asynchronously clear the website data of the given @types modified since the given @timespan.

This is a bit confusing. Let's say: "modified in the past @timespam microseconds"

> Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataPrivate.h:2
> + * Copyright (C) 2016 Igalia S.L.

Happy new years!

> Tools/MiniBrowser/gtk/main.c:272
> +    if (webkit_website_data_manager_remove_finish(manager, result, NULL))
> +        webkit_web_view_reload(webkit_uri_scheme_request_get_web_view(dataRequest->request));

Why reloading?

> Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestWebsiteData.cpp:282
> +    GList removeList = { itemList->data, nullptr, nullptr };

I think GList removeList = g_list_prepend(nullptr, data) would be a much clearer way to write this. You might recall teaching me that recently. ;) (Note it's duplicated in several places below.)
Comment 20 Carlos Garcia Campos 2017-01-22 23:07:02 PST
(In reply to comment #18)
> (Note: I'd like to see a patch that uses this API in
> https://bugzilla.gnome.org/show_bug.cgi?id=741447.)

I have a wip patch for ephy, I just need to finish it.
Comment 21 Carlos Garcia Campos 2017-01-22 23:17:03 PST
(In reply to comment #19)
> Comment on attachment 299429 [details]
> New patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=299429&action=review
> 
> I like the API.
> 
> I would slightly prefer slightly longer but more specific function names:
> webkit_website_data_manager_fetch_data, _remove_data, _clear_data. What do
> you think about that?

I thought about that but sounded redundant to me, this is data manager.

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:38
> > + * A website is normally a set of URLs grouped by domain name using the public suffix list.
> 
> Why is the public suffix list used? Keeping in mind that libsoup's public
> suffix list is often years out of date and that we should almost never be
> using it. It's my primary concern with this patch.

Because that's how the internal WebKit2 API works. My guess is that you normally want to remove the data for a website, no matter how many url forms that website has. It makes everything easier to handle.

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:39
> > + * You can ge the website name, which is usually the domain, with webkit_website_data_get_name().
> 
> ge -> get
> 
> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:42
> > + * A website can store different types of data in the client side. #WebKitWebsiteDataTypes is an enum containing
> 
> Leave a blank line between these paragraphs.

Ok.

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:43
> > + * all the possible data types, use webkit_website_data_get_types() to get the bitmask of data types.
> 
> Change the comma to a semicolon

Ok.

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:175
> > +        if (websiteData->record.displayName == "Local documents on your computer")
> 
> That's way too fragile; it's going to break whenever someone decides to
> change the display name. There has got to be a better way to check if the
> record represents a local file. How about checking if the protocol of the
> security origin is "file"?

Yes, I agree, I did it this way to avoid a GTK ifdef in displayNameForLocalFiles(). Fortunately unit tests will catch it if it's changed.

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:176
> > +            websiteData->displayName = _("Local files");
> 
> You forgot to update POTFILES.in.

As always :-P

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteData.cpp:188
> > + * types actually found, not the types queried with webkit_website_data_manager_fetch().
> 
> "actually present"?

Sure.

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:521
> > + * @user_data: (closure): the data to pass to callback function
> 
> to the callback function

Ok.

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:579
> > + * Use webkit_website_data_manager_clear() if you want to remove the website data for all sites.
> 
> sites -> websites (this is the only place you wrote "sites" and it's nice to
> use consistent terminology)

Agree.

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataManager.cpp:639
> > + * Asynchronously clear the website data of the given @types modified since the given @timespan.
> 
> This is a bit confusing. Let's say: "modified in the past @timespam
> microseconds"

I avoided using microseconds detail, it's a GTimeSpan and not a guint64 for that reason.

> > Source/WebKit2/UIProcess/API/gtk/WebKitWebsiteDataPrivate.h:2
> > + * Copyright (C) 2016 Igalia S.L.
> 
> Happy new years!

:-)

> > Tools/MiniBrowser/gtk/main.c:272
> > +    if (webkit_website_data_manager_remove_finish(manager, result, NULL))
> > +        webkit_web_view_reload(webkit_uri_scheme_request_get_web_view(dataRequest->request));
> 
> Why reloading?

To update the data and check that whatever you removed is no longer listed.

> > Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestWebsiteData.cpp:282
> > +    GList removeList = { itemList->data, nullptr, nullptr };
> 
> I think GList removeList = g_list_prepend(nullptr, data) would be a much
> clearer way to write this. You might recall teaching me that recently. ;)
> (Note it's duplicated in several places below.)

Then you would be heap allocating just to store a single item list. This is a trick for single item lists.
Comment 22 Michael Catanzaro 2017-01-23 07:17:09 PST
(In reply to comment #21) 
> Because that's how the internal WebKit2 API works. My guess is that you
> normally want to remove the data for a website, no matter how many url forms
> that website has. It makes everything easier to handle.

Is the data stored for each security origin? That's fine then. But then I'm not sure why the public suffix list would be required, then. If it's trying to do unreliable public suffix heuristics to merge multiple subdomains under a single base domain, then I would say that's quite unfortunate and undesirable behavior. Regardless, we probably shouldn't say "public suffix list" in our documentation.
Comment 23 Michael Catanzaro 2017-01-23 20:52:53 PST
Comment on attachment 299429 [details]
New patch

View in context: https://bugs.webkit.org/attachment.cgi?id=299429&action=review

> Source/WebKit2/ChangeLog:33
> +        (webkit_website_data_manager_remove): Remove the website data of the given types for the given WebKitWebsiteData list.
> +        (webkit_website_data_manager_remove_finish):
> +        (webkit_website_data_manager_clear): Clear all website data of the given types modified since the given time span.
> +        (webkit_website_data_manager_clear_finish):

I realized the distinction between _remove and _clear is too confusing. How about renaming _clear to _clear_all?
Comment 24 Carlos Garcia Campos 2017-01-23 22:55:39 PST
(In reply to comment #23)
> Comment on attachment 299429 [details]
> New patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=299429&action=review
> 
> > Source/WebKit2/ChangeLog:33
> > +        (webkit_website_data_manager_remove): Remove the website data of the given types for the given WebKitWebsiteData list.
> > +        (webkit_website_data_manager_remove_finish):
> > +        (webkit_website_data_manager_clear): Clear all website data of the given types modified since the given time span.
> > +        (webkit_website_data_manager_clear_finish):
> 
> I realized the distinction between _remove and _clear is too confusing. How
> about renaming _clear to _clear_all?

That's why it's documented :-) I find clear_all redundant and even more confusing, all what? all data types? of course that's why you pass the types to clear, all websites data? of course since you are not providing any. Doesn't clear implies all in English?
Comment 25 Carlos Garcia Campos 2017-01-23 23:50:08 PST
Committed r211079: <http://trac.webkit.org/changeset/211079>