Bug 154553 - [GTK] Expose user content filters in WebKitUserContentManager
Summary: [GTK] Expose user content filters in WebKitUserContentManager
Status: RESOLVED DUPLICATE of bug 167941
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Adrian Perez
URL:
Keywords:
Depends on: 160363 160364 167941 169037 169607 192562
Blocks:
  Show dependency treegraph
 
Reported: 2016-02-22 12:45 PST by Adrian Perez
Modified: 2018-12-13 00:12 PST (History)
9 users (show)

See Also:


Attachments
[WIP] Expose content filters in WebKitGTK+ (30.11 KB, patch)
2016-06-01 15:43 PDT, Adrian Perez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Adrian Perez 2016-02-22 12:45:02 PST
Expose support for “content blockers” (a.k.a. “user content filters”) in the
WebKitGTK+ API. Like user style sheets and user scripts, this could be added
as methods on WebKitUserContentManager:

  void webkit_user_content_manager_add_filter(WebKitUserContentManager*
                                              WebKitUserContentFilter*);

  void webkit_user_content_manager_remove_all_filters(WebKitUserContentManager*);

Plus functions to instantiate and ref/unref WebKitUserContentFilter objects,
like it's done for WebKitUser{Script,StyleSheet}.

(As per discussion at https://lists.webkit.org/pipermail/webkit-gtk/2016-February/002628.html)
Comment 1 Adrian Perez 2016-02-22 12:49:44 PST
I have started to implement this. The bits to add the new API functions
are easy, but how to parse and compile a ruleset is a bit involved :P

There is interest in this API addition so GNOME Web (Epiphany) can
make use of it. The corresponding bug is:

  https://bugzilla.gnome.org/show_bug.cgi?id=757824
Comment 2 Steef435 2016-05-30 17:50:29 PDT
Are you still working on this? I'm new to WebKit but I wouldn't mind giving it a shot.

It looks like API::UserContentExtensionStore provides a good example of how compiling should be implemented. This class stores compiled extensions to disk though. Is that also the goal of this implementation? If so, in what directory should they be stored? I assume this is only used to save compile time instead of memory, so I'm inclined to just store the compiled extensions in memory as in this case there's no real extension system where the same rules have to be used frequently (for that situation it would be best to use an exposed UserContentExtensionStore but that's out of the scope of this task as is, assuming webkit_user_content_filter_new takes json data and it isn't supposed to be an abstraction of UserContentExtensionStore).

API::UserContentExtension also has a name property which is used by UserContentExtensionStore to create a filename for the compiled extension and to identify it with the WebUserContentControllerProxy. So either a unique string has to be generated for each filter or the user has to specify one with a call to webkit_user_content_filter_new. It seems silly to me to ask the user for such a thing if the filter isn't going to be stored on disk anyway, so I think an automatically generated identifier would be the way to go, using WebCore::createCanonicalUUIDString. Consecutively, WebKitUserContentFilter won't have a public name property in contrast to UserContentExtension.

To save the compile time, extra functions could be added to create a Filter from compiled data and to retrieve compiled data from a Filter.

On the other hand, UserContentExtensionStore could just be exposed and compiled extensions just saved on disk, which would look a lot cleaner from the WebKit side. I wouldn't know a default location to save this data though.

What are your thoughts?
Comment 3 Carlos Garcia Campos 2016-05-31 00:13:48 PDT
(In reply to comment #2)
> Are you still working on this? I'm new to WebKit but I wouldn't mind giving
> it a shot.

Yes, Adrian is still working on this, he has a WIP patch already.

> It looks like API::UserContentExtensionStore provides a good example of how
> compiling should be implemented. This class stores compiled extensions to
> disk though. Is that also the goal of this implementation? If so, in what
> directory should they be stored? I assume this is only used to save compile
> time instead of memory, so I'm inclined to just store the compiled
> extensions in memory as in this case there's no real extension system where
> the same rules have to be used frequently (for that situation it would be
> best to use an exposed UserContentExtensionStore but that's out of the scope
> of this task as is, assuming webkit_user_content_filter_new takes json data
> and it isn't supposed to be an abstraction of UserContentExtensionStore).
> 
> API::UserContentExtension also has a name property which is used by
> UserContentExtensionStore to create a filename for the compiled extension
> and to identify it with the WebUserContentControllerProxy. So either a
> unique string has to be generated for each filter or the user has to specify
> one with a call to webkit_user_content_filter_new. It seems silly to me to
> ask the user for such a thing if the filter isn't going to be stored on disk
> anyway, so I think an automatically generated identifier would be the way to
> go, using WebCore::createCanonicalUUIDString. Consecutively,
> WebKitUserContentFilter won't have a public name property in contrast to
> UserContentExtension.
> 
> To save the compile time, extra functions could be added to create a Filter
> from compiled data and to retrieve compiled data from a Filter.
> 
> On the other hand, UserContentExtensionStore could just be exposed and
> compiled extensions just saved on disk, which would look a lot cleaner from
> the WebKit side. I wouldn't know a default location to save this data though.
> 
> What are your thoughts?

Thanks for your feedback! Adrian can comment on all this, but I think it would also be useful to have some input from apple guys, since I'm not familiar with the content extensions API.
Comment 4 Alex Christensen 2016-05-31 10:38:31 PDT
(In reply to comment #2)
> Are you still working on this? I'm new to WebKit but I wouldn't mind giving
> it a shot.
> 
> It looks like API::UserContentExtensionStore provides a good example of how
> compiling should be implemented. This class stores compiled extensions to
> disk though. Is that also the goal of this implementation? If so, in what
> directory should they be stored? I assume this is only used to save compile
> time instead of memory, so I'm inclined to just store the compiled
> extensions in memory as in this case there's no real extension system where
> the same rules have to be used frequently (for that situation it would be
> best to use an exposed UserContentExtensionStore but that's out of the scope
> of this task as is, assuming webkit_user_content_filter_new takes json data
> and it isn't supposed to be an abstraction of UserContentExtensionStore).
> 
> API::UserContentExtension also has a name property which is used by
> UserContentExtensionStore to create a filename for the compiled extension
> and to identify it with the WebUserContentControllerProxy. So either a
> unique string has to be generated for each filter or the user has to specify
> one with a call to webkit_user_content_filter_new. It seems silly to me to
> ask the user for such a thing if the filter isn't going to be stored on disk
> anyway, so I think an automatically generated identifier would be the way to
> go, using WebCore::createCanonicalUUIDString. Consecutively,
> WebKitUserContentFilter won't have a public name property in contrast to
> UserContentExtension.
The whole feature is designed to use shared mmap'd memory from a file so we don't use a lot of memory with many web processes using the same filters.  The only input is a string containing the json list of regexes.  There are many things in WebKit that are stored in files.
> 
> To save the compile time, extra functions could be added to create a Filter
> from compiled data and to retrieve compiled data from a Filter.
We considered this.  This would save compile time, but we didn't want to expose the ability for content blocker authors to write their own byte code because that would allow authors to do strange things.  For example, right now there is a guarantee that there are no infinite loops in the byte code, and this would not be guaranteed any more.
> 
> On the other hand, UserContentExtensionStore could just be exposed and
> compiled extensions just saved on disk, which would look a lot cleaner from
> the WebKit side. I wouldn't know a default location to save this data though.
That is what we did in cocoa, and that's what it's designed for.  See _WKUserContentExtensionStore
> 
> What are your thoughts?
Comment 5 Adrian Perez 2016-06-01 15:43:51 PDT
Created attachment 280273 [details]
[WIP] Expose content filters in WebKitGTK+

I'm attaching my current very-WIP (builds, but there are crashes
which need debugging) so we have something to comment on. Some notes:

* API::UserContentStore is reused to compile filters, but not exposed
  in the public WebKitGTK+ API. Code using the API passes the JSON
  source to webkit_user_content_filter_new_sync() to start compilation
  in background. Before compiling, a lookup is tried first (to avoid
  uneeded compilations). The identifiers for the filters are the SHA1
  hashes of the JSON source text. The completion callback then uses
  webkit_user_content_filter_new_finish() to obtain a valid
  WebKitUserContentFilter instance.

* API::UserContentStore::defaultStore() is always used: This is okay
  because the on-disk path is derived from the values returned by
  g_get_user_cache_dir() and g_get_prgname(). This way each program
  uses a separate directory automatically, and each user gets its own.
  (For example a similar thing is done in the GTK+ port for the “Local
  Storage” databases.)

* The webkit_user_content_manager_add_content_filter() will enable a
  filter in all the WebKitWebViews which use the WebKitUserContentManager.
Comment 6 Adrian Perez 2016-06-01 16:00:31 PDT
(In reply to comment #3)
> (In reply to comment #2)
> > Are you still working on this? I'm new to WebKit but I wouldn't mind giving
> > it a shot.
> 
> Yes, Adrian is still working on this, he has a WIP patch already.

...which is now attached as a sneak peek ;-)

> > It looks like API::UserContentExtensionStore provides a good example of how
> > compiling should be implemented. This class stores compiled extensions to
> > disk though. Is that also the goal of this implementation?

Yes. It seems like good idea to reuse the mechanism provided by the
API::UserContentExtensionStore class. Also, it seems like a good idea to
cache compiled extensions (more on this below).

> > [...] If so, in what directory should they be stored?

It is stored in the per-user cache directory, and each application gets its
own cache directory (see g_get_user_cache_dir() and g_get_prgname() in GLib).

> > [...] I assume this is only used to save compile
> > time instead of memory, so I'm inclined to just store the compiled
> > extensions in memory as in this case there's no real extension system where
> > the same rules have to be used frequently (for that situation it would be
> > best to use an exposed UserContentExtensionStore but that's out of the scope
> > of this task as is, assuming webkit_user_content_filter_new takes json data
> > and it isn't supposed to be an abstraction of UserContentExtensionStore).

I would rather provide a simple API first which covers an usual "I am making a
browser with ad blocking" use-case. If somebody comes up later with an use-case
later on which does really need an additional degree of flexibility being provided
by the WebKitGTK+ public API, it is better to add new API later — once new API
functions make it to a stable release, we have to maintain it forever, so keeping
it small and simple benefits boths its users and we developers :-)

> > API::UserContentExtension also has a name property which is used by
> > UserContentExtensionStore to create a filename for the compiled extension
> > and to identify it with the WebUserContentControllerProxy. So either a
> > unique string has to be generated for each filter or the user has to specify
> > one with a call to webkit_user_content_filter_new. It seems silly to me to
> > ask the user for such a thing if the filter isn't going to be stored on disk
> > anyway, so I think an automatically generated identifier would be the way to
> > go, using WebCore::createCanonicalUUIDString. Consecutively,
> > WebKitUserContentFilter won't have a public name property in contrast to
> > UserContentExtension.

I have opted for generating the identifiers automatically (more on this below).

> > To save the compile time, extra functions could be added to create a Filter
> > from compiled data and to retrieve compiled data from a Filter.
> > 
> > On the other hand, UserContentExtensionStore could just be exposed and
> > compiled extensions just saved on disk, which would look a lot cleaner from
> > the WebKit side. I wouldn't know a default location to save this data though.
> > 
> > What are your thoughts?
> 
> Thanks for your feedback! Adrian can comment on all this, but I think it
> would also be useful to have some input from apple guys, since I'm not
> familiar with the content extensions API.

As I (briefly) commented in the comment when attaching the WIP patch, I think
it is enough for now to provide a simple API which follows the same model as
the other "user content" things like injected scripts and style sheets:

 * There is a way to instantiate a content filter (or an injected style sheet,
   or an injected script).
 * WebKitUserContentManager provides a method to "enable" using the filter
   (or injected style sheet, or injected script).

This way we can hide the details of how/whether content filters are compiled and
stored and from the point of view of the users of our public API things Just Work™.
Many other things in WebKit(GTK+) already depend on having a writable file system,
and therefore it should not be much of an issue to also have content filters saved
to disk. Note that the identifiers in my WIP patch are the SHA1 hex digest of the
input JSON, which neatly allows to lookup compiled filters from the disk cache: as
long as the input source does not change, nor will the hash, and the cached version
will be always used.
Comment 7 Adrian Perez 2016-06-01 16:06:45 PDT
(In reply to comment #4)
> (In reply to comment #2)
>
> > [...]
> > 
> > API::UserContentExtension also has a name property which is used by
> > UserContentExtensionStore to create a filename for the compiled extension
> > and to identify it with the WebUserContentControllerProxy. So either a
> > unique string has to be generated for each filter or the user has to specify
> > one with a call to webkit_user_content_filter_new. It seems silly to me to
> > ask the user for such a thing if the filter isn't going to be stored on disk
> > anyway, so I think an automatically generated identifier would be the way to
> > go, using WebCore::createCanonicalUUIDString. Consecutively,
> > WebKitUserContentFilter won't have a public name property in contrast to
> > UserContentExtension.
>
> The whole feature is designed to use shared mmap'd memory from a file so we
> don't use a lot of memory with many web processes using the same filters. 
> The only input is a string containing the json list of regexes.  There are
> many things in WebKit that are stored in files.
> 
> > To save the compile time, extra functions could be added to create a Filter
> > from compiled data and to retrieve compiled data from a Filter.
>
> We considered this.  This would save compile time, but we didn't want to
> expose the ability for content blocker authors to write their own byte code
> because that would allow authors to do strange things.  For example, right
> now there is a guarantee that there are no infinite loops in the byte code,
> and this would not be guaranteed any more.

These are a couple of very good reasons to use files, and for us to to reuse
API::UserContentExtensionStore as-is. Plus, the closest we stay to the other
ports and the more shared code we have means that the common code gets more
used and well tested, and we can share the maintenance effort :-)
Comment 8 Carlos Garcia Campos 2016-06-07 01:51:44 PDT
Comment on attachment 280273 [details]
[WIP] Expose content filters in WebKitGTK+

View in context: https://bugs.webkit.org/attachment.cgi?id=280273&action=review

I think we should split the patch, enabling content extensions first, and unskipping the layout tests, and then another patch to add the GTK+ API. It would help a lot to start working on  a patch for epiphany in parallel.

> Source/WebCore/PlatformGTK.cmake:77
> +    loader/ResourceLoadInfo.cpp
> +

This is not platform specific, I think we could add this in the CMakeList.txt

> Source/WebCore/platform/glib/FileSystemGlib.cpp:367
> +// XXX: This probably can go into FileSystemPOSIX.cpp, but if I put it there
> +//      then linking fails.

This is because we don't use FileSystemPOSIX at all in the GTK+ port. The GLib implementation is supposed to work on windows too (even though it's not currently used), so this implementation should use glib so that it will work in all platforms supported by glib. Use unescapedFilename() for both paths and then just g_rename(). We could probably do this in a separate bug/patch.

> Source/WebKit2/Platform/unix/SharedMemoryUnix.cpp:109
> +RefPtr<SharedMemory> SharedMemory::create(void* addr, size_t size, Protection protection)

hmm, we didn't implement ::create(), this could also be moved to its own bug/patch.

> Source/WebKit2/UIProcess/API/gtk/WebKitError.h:150
> + * Enum values used to denote errors happening when handling user content.
> + */

I know this is a wip patch, but remember to add all since: 2.14 tags

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:267
> +    _WebKitUserContentFilterPrivate(): jsonSource(), userContentExtension() { }

Split this in different lines.

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:270
> +    RefPtr<API::UserContentExtension> userContentExtension;

Move member declaration after the methods

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:272
> +    String identifier() const {

Move the { to the next line.

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:284
> +    GTask* task = g_task_new(initable, cancellable, callback, userData);

I think we should make more explicit the ownership of the task, now that we have switched to C++ 14 we can use a GrefPtr here, adopting the ref, and then pass the ownership to the lambda.

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:287
> +        [identifier, task](RefPtr<API::UserContentExtension> userContentExtension, std::error_code error) {

task = WTFMove(task) and identifier will be passed to a secondary thread, so we probably want to pass an isolated copy: identifier = identifier.isolatedCopy()

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:290
> +                WebKitUserContentFilter* userContentFilter = WEBKIT_USER_CONTENT_FILTER(g_task_get_source_object(task));

This is done in both branch of the if, so we could move it before the if

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:292
> +                    [task](RefPtr<API::UserContentExtension> userContentExtension, std::error_code error) {

task = WTFMove(task)

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:298
> +                            userContentFilter->priv->jsonSource = ""; // The JSON source is not needed anymore.

= nullptr

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:305
> +                userContentFilter->priv->userContentExtension = userContentExtension;
> +                userContentFilter->priv->jsonSource = ""; // The JSON source is not needed anymore.

We could add a helper to reduce code duplication here. webkitUserContentFilterSetContentExtension() that sets the extension and clears the json source.

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:329
> +WEBKIT_DEFINE_TYPE_WITH_CODE(
> +    WebKitUserContentFilter, webkit_user_content_filter, G_TYPE_OBJECT,
> +    G_IMPLEMENT_INTERFACE(G_TYPE_ASYNC_INITABLE, webkit_user_content_filter_interface_init))

Move this to the beginning, after the private struct declaration

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:332
> + * webkit_user_content_filter_new_async:

Do not use async suffix, it's obvious this is async, and we don't have a sync version, so better to use just webkit_user_content_filter_new(). If we eventually add a sync version we'll use _sync suffix.

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:360
> +void webkit_user_content_filter_new_async(const gchar* jsonSource,
> +                                          GAsyncReadyCallback callback,
> +                                          gpointer userData) {

This should be a single line and the { should be in the next line.

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:362
> +    g_return_if_fail(jsonSource); WebKitUserContentFilter* userContentFilter =
> +        WEBKIT_USER_CONTENT_FILTER(g_object_new(WEBKIT_TYPE_USER_CONTENT_FILTER,

Split this in lines too,

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:364
> +    userContentFilter->priv->jsonSource = jsonSource;

The json source could be a writable construct only property, so here you could just use g_async_initable_new_async().

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:372
> + * @initable: The #GAsyncInitable passed to the completion callback.

GAsyncInitable should be an implementation detail from the API user point of view. This function should just receive the async result and error

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:381
> + * Returns a #WebKitUserContentFilter if successful. If an error has occured,

Returns:

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:387
> +{

This is public API, you should add g_return macros to check passed in values.

g_return_val_if_fail (G_IS_ASYNC_RESULT(result), nullptr);
g_return_val_if_fail(error == nullptr || *error == nullptr, nullptr);

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:389
> +    if (g_async_initable_init_finish(initable, result, error)) {
> +        return WEBKIT_USER_CONTENT_FILTER(g_object_ref(initable));

And here we would use g_async_initable_new_finish() instead.

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:392
> +    } else {
> +        return nullptr;
> +    }

Don't use else after a return, and do not use { for single line bodies.

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.h:156
> +webkit_user_content_filter_new_async  (const gchar *json_source, GAsyncReadyCallback callback, gpointer user_data);

Move every parameter to its own line

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.h:159
> +webkit_user_content_filter_new_finish (GAsyncInitable *initable, GAsyncResult *result, GError **error);

Ditto.

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContentManager.cpp:275
> + * Since: 2.12

2.14

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContentManager.cpp:292
> + * Since: 2.12

2.14

> Source/WebKit2/UIProcess/API/gtk/WebKitUserContentPrivate.h:33
> +API::UserContentExtension& webkitUserContentFilterGetUserContentExtension(WebKitUserContentFilter *userContentFilter);

WebKitUserContentFilter *userContentFilter -> WebKitUserContentFilter*

> Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestWebKitUserContentManager.cpp:224
> +    char *dummy[1] = { nullptr };
> +    removeOldInjectedContentAndResetLists(test->m_userContentManager.get(), dummy, dummy);

I guess this is still TODO.
Comment 9 Adrian Perez 2016-07-29 16:38:55 PDT
(In reply to comment #8)
> Comment on attachment 280273 [details]
> [WIP] Expose content filters in WebKitGTK+
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=280273&action=review
> 
> I think we should split the patch, enabling content extensions first, and
> unskipping the layout tests, and then another patch to add the GTK+ API. It
> would help a lot to start working on  a patch for epiphany in parallel.

ACK. I have been working on the split, and while doing the first part which
unskips the layout test I noticed that there was missing some changes to the
CMake files to have ENABLE_CONTENT_FILTERING defined, and once that was enabled
it was needed to add more files to the build, and ContentFilterUnblockHandler
needs refactoring as it's only implemented for Cocoa.

The refactor be a separate patch as well, which will add a port-agnostic
ContentFilterUnblockHandlerBase, and move the Cocoa-specific implementation
into WebCore/platform/cocoa/ContentFilterUnblockHandler.{h,mm}. After that
is done, we can add an implementation for the GTK port.

> > Source/WebCore/platform/glib/FileSystemGlib.cpp:367
> > +// XXX: This probably can go into FileSystemPOSIX.cpp, but if I put it there
> > +//      then linking fails.
> 
> This is because we don't use FileSystemPOSIX at all in the GTK+ port. The
> GLib implementation is supposed to work on windows too (even though it's not
> currently used), so this implementation should use glib so that it will work
> in all platforms supported by glib. Use unescapedFilename() for both paths
> and then just g_rename(). We could probably do this in a separate bug/patch.

Moved to separate bug/patch: https://bugs.webkit.org/show_bug.cgi?id=160363

> > Source/WebKit2/Platform/unix/SharedMemoryUnix.cpp:109
> > +RefPtr<SharedMemory> SharedMemory::create(void* addr, size_t size, Protection protection)
> 
> hmm, we didn't implement ::create(), this could also be moved to its own
> bug/patch.

Moved to separate bug/patch: https://bugs.webkit.org/show_bug.cgi?id=160364
Comment 10 Adrian Perez 2017-02-07 06:01:20 PST
I have had some time to work again on this during the flights to/from FOSDEM.
Sometimes there's nothing like being a few hours in an isolated environment
with a laptop to get some stuff done :-) 

(In reply to comment #8)
> Comment on attachment 280273 [details]
> [WIP] Expose content filters in WebKitGTK+
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=280273&action=review
> 
> I think we should split the patch, enabling content extensions first, and
> unskipping the layout tests, and then another patch to add the GTK+ API. It
> would help a lot to start working on  a patch for epiphany in parallel.

TL;DR: we need the API added first, then update WKTR, then unskip the tests.

We cannot unskip first because “Tools/WebKitTestRunner/gtk/TestControllerGtk.cpp”
needs changes to detect whether test cases are inside the “contentextensions/”
subdirectory, and if yes compile and enable the correspoding JSON ruleset file.

The WebKit2 C API does not have public functions for content extensions, so
for example the Mac port #imports the public Objective-C API headers in the
“TestControllerMac.mm” file and use the public API to do the needed configuration
in “TestController::platformConfigureViewForTest()”.
Comment 11 Adrian Perez 2017-02-07 08:41:24 PST
There is one thing that is bothering me with the API as it is currently.
Consider the following actions:

 1.  Instantiate a WebKitUserContentFilter. This picks the JSON source,
     calculates its SHA1 to use as identifier, and compiles it. The last
     step saves it to disk in a way that it can be retrieved if the SHA1
     (the identifier) is known.

 2.  The JSON source changes, and the process above repeated. The new
     compiled extension has a *different* identifier because the SHA1 hash
     of the source has changed.

 3a. The old version of the content extension cannot be removed from disk,
     because the SHA1 hash of the first JSON source is never returned to
     the caller.

 3b. Even if the SHA1 of the JSON source is calculated by the program
     without relying on WebKitGTK+ to return it, we provide no API at
     all to remove compiled content extensions from disk. So WebKitGTK+
     would be leaking all the compiled content extensions on disk. Ouch.

Therefore, I think it makes sense to go all-in and also expose an API for
for the storage of compiled content extensions. That would mean adding the
following to our public API:

* Methods to instantiate a WebKitUserContentDataManager, or to obtain the
  default singleton instance:

    WebKitUserContentDataManager* webkit_user_content_data_managar_get_default (void)
    WebKitUserContentDataManager* webkit_user_content_data_manager_new (const char *path)

* Asynchronous methods to lookup an already-compiled filter, compile a filter
  (with the side effect of being saved to disk), and removing a filter (with
  the side effect of being removed from disk):

    void webkit_user_content_data_manager_find_filter (WebKitUserContentDataManager*,
                                                      const char *identifier,
                                                      GAsyncResultCallback callback,
                                                      gpointer userdata)
    WebKitUserContentFilter*
    webkit_user_content_data_manager_find_filter_finish (WebKitUserContentDataManager*,
                                                         GAsyncResult *result,
                                                         GError **error)

    void webkit_user_content_data_manager_compile_filter (WebKitUserContentDataManager*,
                                                          const char *identifier,
                                                          const char *json_source,
                                                          GAsyncResultCallback callback,
                                                          gpointer userdata)
    WebKitUserContentFilter*
    webkit_user_content_data_manager_compile_filter_finish (WebKitUserContentDataManager*
                                                            GAsyncResult *result,
                                                            GError **error)

    void webkit_user_content_data_manager_remove_filter (WebKitUserContentDataManager*,
                                                       const char *identifier,
                                                       GAsyncResultCallback callback,
                                                       gpointer userdata)
    gboolean
    webkit_user_content_data_manager_remove_filter_finish (WebKitUserContentDataManager*,
                                                           GASyncResult *result,
                                                           GError **error)

* Methods in WebKitUserContentManager to add/remove *active* filters. These have
  effect on content loaded by webviews which use the WebKitUserContentManager, but
  _no effect_ on the compiled filters stored on disk:

    void webkit_user_content_manager_add_filter (WebKitUserContentManager*,
                                                 WebKitUserContentFilter*)
    void webkit_user_content_manager_remove_filter (WebKitUserContentManager*,
                                                    WebKitUserContentFilter*)
    void webkit_user_content_manager_remove_all_filters (WebKitUserContentManager*)

* Last by not least, by having the responsibility of compiling filters moved to the
  new WebKitUserContentDataManager class, WebKitUserContentFilter can be a boxed
  struct, and the public API only needs to provide the webkit_user_content_filter_ref()
  and webkit_user_content_filter_unref() methods. Note that direct instantiation
  would not be possible, and always done using the the lookup/compile methods from 
  WebKitUserContentDataManager.

I think this approach gives a better idea at the API level compiled filters are saved
“somewhere” (i.e. on disk) because there is a WebKitUserContentDataManager class which
is named in a way similar to WebKitWebsiteDataManager, which also deals with persisted
data. Also, this kind of API also transmits the idea that compiling a filter and storing
it on disk is a separate “process”, and it is possible to do it independently without
having to instantiate webviews and so on.

WDYT?
Comment 12 Adrian Perez 2017-02-07 08:49:26 PST
(In reply to comment #11)

> Therefore, I think it makes sense to go all-in and also expose an API for
> for the storage of compiled content extensions. That would mean adding the
> following to our public API:

With the API that I have just proposed, when a browser obtains a new version
of a filters ruleset, it would use the following:

 1. webkit_user_content_data_manager_compile() to compile the new version.
    This is done asynchronously and the browser can continue using the old
    version of the filter in the meanwhile.
 2. webkit_user_content_manager_remove_filter() to stop using the old
    version of the filter.
 3. webkit_user_content_manager_add_filter() to start using the new
    version of the filter.
 4. webkit_user_content_data_manager_remove() to remove the old compiled
    version of the filter from disk.

I think it could even be possible to remove the compiled filter from disk
with webkit_user_content_data_manager_remove() as _first_ step while still
using the filter: it will be unlinked from disk, and the space freed once
the filter gets disabled by webkit_user_content_manager_remove_filter()
and munmap() called and the low-level file descriptor closed. This method
allows reusing the same identifier for the new version of the filter.
Comment 13 Michael Catanzaro 2017-02-07 15:34:49 PST
(In reply to comment #11)
>  3b. Even if the SHA1 of the JSON source is calculated by the program
>      without relying on WebKitGTK+ to return it, we provide no API at
>      all to remove compiled content extensions from disk. So WebKitGTK+
>      would be leaking all the compiled content extensions on disk. Ouch.

It would be desirable that this work automatically, without applications needing to keep track of it themselves. Is that not possible...?

> * Methods in WebKitUserContentManager to add/remove *active* filters. These
> have
>   effect on content loaded by webviews which use the
> WebKitUserContentManager, but
>   _no effect_ on the compiled filters stored on disk:

Ideally applications would only think about what's happening in their web views, and not worry about what's being cached on disk. Is there no reasonable way for WebKit to know when a filter is no longer needed? If not, then perhaps applications will just have to manage it manually like you suggest.
Comment 14 Adrian Perez 2017-02-08 08:46:42 PST
(In reply to comment #13)
> (In reply to comment #11)
> >  3b. Even if the SHA1 of the JSON source is calculated by the program
> >      without relying on WebKitGTK+ to return it, we provide no API at
> >      all to remove compiled content extensions from disk. So WebKitGTK+
> >      would be leaking all the compiled content extensions on disk. Ouch.
> 
> It would be desirable that this work automatically, without applications
> needing to keep track of it themselves. Is that not possible...?

Not quite. Content extensions are essentially a completely separate entity
and they do not go through the usual cache paths (e.g. there's no cache
replacement or anything like that). In my understanding, this is intentional
because of the following reasons:

 - Content extensions are, well *extensions*, which are installed due to
   explicit user actions (“user enables EasyList”, “user disables EasyList”
   (“user removes EasyList”). Therefore we *know* when an extension has to
   be compiled/enabled/disabled/removed.

 - Also, we don't want WebKit to do cache replacement or removing compiled
   content extensions randomly, because there is no way for WebKit to know
   from which source file(s) to recreate them.

 - In contrast, the user does not directly control data saved locally by
   websites, and the code of websites can change at any time out of the
   control of the browser. That's why for website-stored data, we consider
   it “a cache”, while for content extension it is “a _storage_ area”. As
   such, a _storage_ is permanent and explicitly controlled.

> > * Methods in WebKitUserContentManager to add/remove *active* filters. These
> > have
> >   effect on content loaded by webviews which use the
> > WebKitUserContentManager, but
> >   _no effect_ on the compiled filters stored on disk:
> 
> Ideally applications would only think about what's happening in their web
> views, and not worry about what's being cached on disk. Is there no
> reasonable way for WebKit to know when a filter is no longer needed? If not,
> then perhaps applications will just have to manage it manually like you
> suggest.

Thing is, as I tried to explain above (and hopefully succeeded!), content 
extension are *not* cached data. If you think that the “EasyList adblock list”
maps 1:1 to a content extension, then you want the user to be able to enable
or disable it with a checkbox in your UI, or add/remove more. So in the end
the user is under control, therefore the WebKit API should cater to that need.

Something we can discuss is whether we allow users of this API to give their
own identifier or we calculate them and return them after compilation.
Comment 15 Adrian Perez 2017-02-08 09:13:01 PST
One clarification more... When choosing the names for new API elements like
WebKitUserContentDataManager my reasons were:

 - We already have other classes ending in *Manager (and *DataManager),
   let's be consistent with that.

 - We are already using the “UserContent” moniker for other pieces of data
   provided by the user (either final user or a developer who uses WebKitGTK+),
   like injected CSS style sheets and injected JavaScript snippets.

 - I did not want to have “Extension” or “Filter” in the name. The idea
   is that in the future we might have other “user content” data (local
   content not originated from a website). The WebKitUserContentDataManager
   name is generic, so it could handle other kinds of “user content” in
   the future as well. This name would potentially avoid having to add
   other WebKitUserContent${NAME}DataManager classes a the future.

I hope this helps further clarify what's behind the naming choices :-)
Comment 16 Adrian Cochrane 2018-09-07 22:59:00 PDT
Is this bug still active? Is there anything I can do to help?

According to the metadata it hasn't see any activity over the last year.
Comment 17 Adrian Perez 2018-12-10 11:35:12 PST
(In reply to Adrian Cochrane from comment #16)
> Is this bug still active? Is there anything I can do to help?
> 
> According to the metadata it hasn't see any activity over the last year.

I have resumed work on this a couple of weeks ago.
Comment 18 Adrian Perez 2018-12-13 00:12:04 PST
I'm handling this as part of #167941 in the end.

*** This bug has been marked as a duplicate of bug 167941 ***