Bug 154553

Summary: [GTK] Expose user content filters in WebKitUserContentManager
Product: WebKit Reporter: Adrian Perez <aperez>
Component: WebKitGTKAssignee: Adrian Perez <aperez>
Status: RESOLVED DUPLICATE    
Severity: Normal CC: achristensen, alcinnz, benjamin, bugs-noreply, cgarcia, hilobakho, mcatanzaro, sam, steefhegeman
Priority: P2    
Version: Other   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 160363, 160364, 167941, 169037, 169607, 192562    
Bug Blocks:    
Attachments:
Description Flags
[WIP] Expose content filters in WebKitGTK+ none

Adrian Perez
Reported 2016-02-22 12:45:02 PST
Expose support for “content blockers” (a.k.a. “user content filters”) in the WebKitGTK+ API. Like user style sheets and user scripts, this could be added as methods on WebKitUserContentManager: void webkit_user_content_manager_add_filter(WebKitUserContentManager* WebKitUserContentFilter*); void webkit_user_content_manager_remove_all_filters(WebKitUserContentManager*); Plus functions to instantiate and ref/unref WebKitUserContentFilter objects, like it's done for WebKitUser{Script,StyleSheet}. (As per discussion at https://lists.webkit.org/pipermail/webkit-gtk/2016-February/002628.html)
Attachments
[WIP] Expose content filters in WebKitGTK+ (30.11 KB, patch)
2016-06-01 15:43 PDT, Adrian Perez
no flags
Adrian Perez
Comment 1 2016-02-22 12:49:44 PST
I have started to implement this. The bits to add the new API functions are easy, but how to parse and compile a ruleset is a bit involved :P There is interest in this API addition so GNOME Web (Epiphany) can make use of it. The corresponding bug is: https://bugzilla.gnome.org/show_bug.cgi?id=757824
Steef435
Comment 2 2016-05-30 17:50:29 PDT
Are you still working on this? I'm new to WebKit but I wouldn't mind giving it a shot. It looks like API::UserContentExtensionStore provides a good example of how compiling should be implemented. This class stores compiled extensions to disk though. Is that also the goal of this implementation? If so, in what directory should they be stored? I assume this is only used to save compile time instead of memory, so I'm inclined to just store the compiled extensions in memory as in this case there's no real extension system where the same rules have to be used frequently (for that situation it would be best to use an exposed UserContentExtensionStore but that's out of the scope of this task as is, assuming webkit_user_content_filter_new takes json data and it isn't supposed to be an abstraction of UserContentExtensionStore). API::UserContentExtension also has a name property which is used by UserContentExtensionStore to create a filename for the compiled extension and to identify it with the WebUserContentControllerProxy. So either a unique string has to be generated for each filter or the user has to specify one with a call to webkit_user_content_filter_new. It seems silly to me to ask the user for such a thing if the filter isn't going to be stored on disk anyway, so I think an automatically generated identifier would be the way to go, using WebCore::createCanonicalUUIDString. Consecutively, WebKitUserContentFilter won't have a public name property in contrast to UserContentExtension. To save the compile time, extra functions could be added to create a Filter from compiled data and to retrieve compiled data from a Filter. On the other hand, UserContentExtensionStore could just be exposed and compiled extensions just saved on disk, which would look a lot cleaner from the WebKit side. I wouldn't know a default location to save this data though. What are your thoughts?
Carlos Garcia Campos
Comment 3 2016-05-31 00:13:48 PDT
(In reply to comment #2) > Are you still working on this? I'm new to WebKit but I wouldn't mind giving > it a shot. Yes, Adrian is still working on this, he has a WIP patch already. > It looks like API::UserContentExtensionStore provides a good example of how > compiling should be implemented. This class stores compiled extensions to > disk though. Is that also the goal of this implementation? If so, in what > directory should they be stored? I assume this is only used to save compile > time instead of memory, so I'm inclined to just store the compiled > extensions in memory as in this case there's no real extension system where > the same rules have to be used frequently (for that situation it would be > best to use an exposed UserContentExtensionStore but that's out of the scope > of this task as is, assuming webkit_user_content_filter_new takes json data > and it isn't supposed to be an abstraction of UserContentExtensionStore). > > API::UserContentExtension also has a name property which is used by > UserContentExtensionStore to create a filename for the compiled extension > and to identify it with the WebUserContentControllerProxy. So either a > unique string has to be generated for each filter or the user has to specify > one with a call to webkit_user_content_filter_new. It seems silly to me to > ask the user for such a thing if the filter isn't going to be stored on disk > anyway, so I think an automatically generated identifier would be the way to > go, using WebCore::createCanonicalUUIDString. Consecutively, > WebKitUserContentFilter won't have a public name property in contrast to > UserContentExtension. > > To save the compile time, extra functions could be added to create a Filter > from compiled data and to retrieve compiled data from a Filter. > > On the other hand, UserContentExtensionStore could just be exposed and > compiled extensions just saved on disk, which would look a lot cleaner from > the WebKit side. I wouldn't know a default location to save this data though. > > What are your thoughts? Thanks for your feedback! Adrian can comment on all this, but I think it would also be useful to have some input from apple guys, since I'm not familiar with the content extensions API.
Alex Christensen
Comment 4 2016-05-31 10:38:31 PDT
(In reply to comment #2) > Are you still working on this? I'm new to WebKit but I wouldn't mind giving > it a shot. > > It looks like API::UserContentExtensionStore provides a good example of how > compiling should be implemented. This class stores compiled extensions to > disk though. Is that also the goal of this implementation? If so, in what > directory should they be stored? I assume this is only used to save compile > time instead of memory, so I'm inclined to just store the compiled > extensions in memory as in this case there's no real extension system where > the same rules have to be used frequently (for that situation it would be > best to use an exposed UserContentExtensionStore but that's out of the scope > of this task as is, assuming webkit_user_content_filter_new takes json data > and it isn't supposed to be an abstraction of UserContentExtensionStore). > > API::UserContentExtension also has a name property which is used by > UserContentExtensionStore to create a filename for the compiled extension > and to identify it with the WebUserContentControllerProxy. So either a > unique string has to be generated for each filter or the user has to specify > one with a call to webkit_user_content_filter_new. It seems silly to me to > ask the user for such a thing if the filter isn't going to be stored on disk > anyway, so I think an automatically generated identifier would be the way to > go, using WebCore::createCanonicalUUIDString. Consecutively, > WebKitUserContentFilter won't have a public name property in contrast to > UserContentExtension. The whole feature is designed to use shared mmap'd memory from a file so we don't use a lot of memory with many web processes using the same filters. The only input is a string containing the json list of regexes. There are many things in WebKit that are stored in files. > > To save the compile time, extra functions could be added to create a Filter > from compiled data and to retrieve compiled data from a Filter. We considered this. This would save compile time, but we didn't want to expose the ability for content blocker authors to write their own byte code because that would allow authors to do strange things. For example, right now there is a guarantee that there are no infinite loops in the byte code, and this would not be guaranteed any more. > > On the other hand, UserContentExtensionStore could just be exposed and > compiled extensions just saved on disk, which would look a lot cleaner from > the WebKit side. I wouldn't know a default location to save this data though. That is what we did in cocoa, and that's what it's designed for. See _WKUserContentExtensionStore > > What are your thoughts?
Adrian Perez
Comment 5 2016-06-01 15:43:51 PDT
Created attachment 280273 [details] [WIP] Expose content filters in WebKitGTK+ I'm attaching my current very-WIP (builds, but there are crashes which need debugging) so we have something to comment on. Some notes: * API::UserContentStore is reused to compile filters, but not exposed in the public WebKitGTK+ API. Code using the API passes the JSON source to webkit_user_content_filter_new_sync() to start compilation in background. Before compiling, a lookup is tried first (to avoid uneeded compilations). The identifiers for the filters are the SHA1 hashes of the JSON source text. The completion callback then uses webkit_user_content_filter_new_finish() to obtain a valid WebKitUserContentFilter instance. * API::UserContentStore::defaultStore() is always used: This is okay because the on-disk path is derived from the values returned by g_get_user_cache_dir() and g_get_prgname(). This way each program uses a separate directory automatically, and each user gets its own. (For example a similar thing is done in the GTK+ port for the “Local Storage” databases.) * The webkit_user_content_manager_add_content_filter() will enable a filter in all the WebKitWebViews which use the WebKitUserContentManager.
Adrian Perez
Comment 6 2016-06-01 16:00:31 PDT
(In reply to comment #3) > (In reply to comment #2) > > Are you still working on this? I'm new to WebKit but I wouldn't mind giving > > it a shot. > > Yes, Adrian is still working on this, he has a WIP patch already. ...which is now attached as a sneak peek ;-) > > It looks like API::UserContentExtensionStore provides a good example of how > > compiling should be implemented. This class stores compiled extensions to > > disk though. Is that also the goal of this implementation? Yes. It seems like good idea to reuse the mechanism provided by the API::UserContentExtensionStore class. Also, it seems like a good idea to cache compiled extensions (more on this below). > > [...] If so, in what directory should they be stored? It is stored in the per-user cache directory, and each application gets its own cache directory (see g_get_user_cache_dir() and g_get_prgname() in GLib). > > [...] I assume this is only used to save compile > > time instead of memory, so I'm inclined to just store the compiled > > extensions in memory as in this case there's no real extension system where > > the same rules have to be used frequently (for that situation it would be > > best to use an exposed UserContentExtensionStore but that's out of the scope > > of this task as is, assuming webkit_user_content_filter_new takes json data > > and it isn't supposed to be an abstraction of UserContentExtensionStore). I would rather provide a simple API first which covers an usual "I am making a browser with ad blocking" use-case. If somebody comes up later with an use-case later on which does really need an additional degree of flexibility being provided by the WebKitGTK+ public API, it is better to add new API later — once new API functions make it to a stable release, we have to maintain it forever, so keeping it small and simple benefits boths its users and we developers :-) > > API::UserContentExtension also has a name property which is used by > > UserContentExtensionStore to create a filename for the compiled extension > > and to identify it with the WebUserContentControllerProxy. So either a > > unique string has to be generated for each filter or the user has to specify > > one with a call to webkit_user_content_filter_new. It seems silly to me to > > ask the user for such a thing if the filter isn't going to be stored on disk > > anyway, so I think an automatically generated identifier would be the way to > > go, using WebCore::createCanonicalUUIDString. Consecutively, > > WebKitUserContentFilter won't have a public name property in contrast to > > UserContentExtension. I have opted for generating the identifiers automatically (more on this below). > > To save the compile time, extra functions could be added to create a Filter > > from compiled data and to retrieve compiled data from a Filter. > > > > On the other hand, UserContentExtensionStore could just be exposed and > > compiled extensions just saved on disk, which would look a lot cleaner from > > the WebKit side. I wouldn't know a default location to save this data though. > > > > What are your thoughts? > > Thanks for your feedback! Adrian can comment on all this, but I think it > would also be useful to have some input from apple guys, since I'm not > familiar with the content extensions API. As I (briefly) commented in the comment when attaching the WIP patch, I think it is enough for now to provide a simple API which follows the same model as the other "user content" things like injected scripts and style sheets: * There is a way to instantiate a content filter (or an injected style sheet, or an injected script). * WebKitUserContentManager provides a method to "enable" using the filter (or injected style sheet, or injected script). This way we can hide the details of how/whether content filters are compiled and stored and from the point of view of the users of our public API things Just Work™. Many other things in WebKit(GTK+) already depend on having a writable file system, and therefore it should not be much of an issue to also have content filters saved to disk. Note that the identifiers in my WIP patch are the SHA1 hex digest of the input JSON, which neatly allows to lookup compiled filters from the disk cache: as long as the input source does not change, nor will the hash, and the cached version will be always used.
Adrian Perez
Comment 7 2016-06-01 16:06:45 PDT
(In reply to comment #4) > (In reply to comment #2) > > > [...] > > > > API::UserContentExtension also has a name property which is used by > > UserContentExtensionStore to create a filename for the compiled extension > > and to identify it with the WebUserContentControllerProxy. So either a > > unique string has to be generated for each filter or the user has to specify > > one with a call to webkit_user_content_filter_new. It seems silly to me to > > ask the user for such a thing if the filter isn't going to be stored on disk > > anyway, so I think an automatically generated identifier would be the way to > > go, using WebCore::createCanonicalUUIDString. Consecutively, > > WebKitUserContentFilter won't have a public name property in contrast to > > UserContentExtension. > > The whole feature is designed to use shared mmap'd memory from a file so we > don't use a lot of memory with many web processes using the same filters. > The only input is a string containing the json list of regexes. There are > many things in WebKit that are stored in files. > > > To save the compile time, extra functions could be added to create a Filter > > from compiled data and to retrieve compiled data from a Filter. > > We considered this. This would save compile time, but we didn't want to > expose the ability for content blocker authors to write their own byte code > because that would allow authors to do strange things. For example, right > now there is a guarantee that there are no infinite loops in the byte code, > and this would not be guaranteed any more. These are a couple of very good reasons to use files, and for us to to reuse API::UserContentExtensionStore as-is. Plus, the closest we stay to the other ports and the more shared code we have means that the common code gets more used and well tested, and we can share the maintenance effort :-)
Carlos Garcia Campos
Comment 8 2016-06-07 01:51:44 PDT
Comment on attachment 280273 [details] [WIP] Expose content filters in WebKitGTK+ View in context: https://bugs.webkit.org/attachment.cgi?id=280273&action=review I think we should split the patch, enabling content extensions first, and unskipping the layout tests, and then another patch to add the GTK+ API. It would help a lot to start working on a patch for epiphany in parallel. > Source/WebCore/PlatformGTK.cmake:77 > + loader/ResourceLoadInfo.cpp > + This is not platform specific, I think we could add this in the CMakeList.txt > Source/WebCore/platform/glib/FileSystemGlib.cpp:367 > +// XXX: This probably can go into FileSystemPOSIX.cpp, but if I put it there > +// then linking fails. This is because we don't use FileSystemPOSIX at all in the GTK+ port. The GLib implementation is supposed to work on windows too (even though it's not currently used), so this implementation should use glib so that it will work in all platforms supported by glib. Use unescapedFilename() for both paths and then just g_rename(). We could probably do this in a separate bug/patch. > Source/WebKit2/Platform/unix/SharedMemoryUnix.cpp:109 > +RefPtr<SharedMemory> SharedMemory::create(void* addr, size_t size, Protection protection) hmm, we didn't implement ::create(), this could also be moved to its own bug/patch. > Source/WebKit2/UIProcess/API/gtk/WebKitError.h:150 > + * Enum values used to denote errors happening when handling user content. > + */ I know this is a wip patch, but remember to add all since: 2.14 tags > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:267 > + _WebKitUserContentFilterPrivate(): jsonSource(), userContentExtension() { } Split this in different lines. > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:270 > + RefPtr<API::UserContentExtension> userContentExtension; Move member declaration after the methods > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:272 > + String identifier() const { Move the { to the next line. > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:284 > + GTask* task = g_task_new(initable, cancellable, callback, userData); I think we should make more explicit the ownership of the task, now that we have switched to C++ 14 we can use a GrefPtr here, adopting the ref, and then pass the ownership to the lambda. > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:287 > + [identifier, task](RefPtr<API::UserContentExtension> userContentExtension, std::error_code error) { task = WTFMove(task) and identifier will be passed to a secondary thread, so we probably want to pass an isolated copy: identifier = identifier.isolatedCopy() > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:290 > + WebKitUserContentFilter* userContentFilter = WEBKIT_USER_CONTENT_FILTER(g_task_get_source_object(task)); This is done in both branch of the if, so we could move it before the if > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:292 > + [task](RefPtr<API::UserContentExtension> userContentExtension, std::error_code error) { task = WTFMove(task) > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:298 > + userContentFilter->priv->jsonSource = ""; // The JSON source is not needed anymore. = nullptr > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:305 > + userContentFilter->priv->userContentExtension = userContentExtension; > + userContentFilter->priv->jsonSource = ""; // The JSON source is not needed anymore. We could add a helper to reduce code duplication here. webkitUserContentFilterSetContentExtension() that sets the extension and clears the json source. > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:329 > +WEBKIT_DEFINE_TYPE_WITH_CODE( > + WebKitUserContentFilter, webkit_user_content_filter, G_TYPE_OBJECT, > + G_IMPLEMENT_INTERFACE(G_TYPE_ASYNC_INITABLE, webkit_user_content_filter_interface_init)) Move this to the beginning, after the private struct declaration > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:332 > + * webkit_user_content_filter_new_async: Do not use async suffix, it's obvious this is async, and we don't have a sync version, so better to use just webkit_user_content_filter_new(). If we eventually add a sync version we'll use _sync suffix. > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:360 > +void webkit_user_content_filter_new_async(const gchar* jsonSource, > + GAsyncReadyCallback callback, > + gpointer userData) { This should be a single line and the { should be in the next line. > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:362 > + g_return_if_fail(jsonSource); WebKitUserContentFilter* userContentFilter = > + WEBKIT_USER_CONTENT_FILTER(g_object_new(WEBKIT_TYPE_USER_CONTENT_FILTER, Split this in lines too, > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:364 > + userContentFilter->priv->jsonSource = jsonSource; The json source could be a writable construct only property, so here you could just use g_async_initable_new_async(). > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:372 > + * @initable: The #GAsyncInitable passed to the completion callback. GAsyncInitable should be an implementation detail from the API user point of view. This function should just receive the async result and error > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:381 > + * Returns a #WebKitUserContentFilter if successful. If an error has occured, Returns: > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:387 > +{ This is public API, you should add g_return macros to check passed in values. g_return_val_if_fail (G_IS_ASYNC_RESULT(result), nullptr); g_return_val_if_fail(error == nullptr || *error == nullptr, nullptr); > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:389 > + if (g_async_initable_init_finish(initable, result, error)) { > + return WEBKIT_USER_CONTENT_FILTER(g_object_ref(initable)); And here we would use g_async_initable_new_finish() instead. > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.cpp:392 > + } else { > + return nullptr; > + } Don't use else after a return, and do not use { for single line bodies. > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.h:156 > +webkit_user_content_filter_new_async (const gchar *json_source, GAsyncReadyCallback callback, gpointer user_data); Move every parameter to its own line > Source/WebKit2/UIProcess/API/gtk/WebKitUserContent.h:159 > +webkit_user_content_filter_new_finish (GAsyncInitable *initable, GAsyncResult *result, GError **error); Ditto. > Source/WebKit2/UIProcess/API/gtk/WebKitUserContentManager.cpp:275 > + * Since: 2.12 2.14 > Source/WebKit2/UIProcess/API/gtk/WebKitUserContentManager.cpp:292 > + * Since: 2.12 2.14 > Source/WebKit2/UIProcess/API/gtk/WebKitUserContentPrivate.h:33 > +API::UserContentExtension& webkitUserContentFilterGetUserContentExtension(WebKitUserContentFilter *userContentFilter); WebKitUserContentFilter *userContentFilter -> WebKitUserContentFilter* > Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestWebKitUserContentManager.cpp:224 > + char *dummy[1] = { nullptr }; > + removeOldInjectedContentAndResetLists(test->m_userContentManager.get(), dummy, dummy); I guess this is still TODO.
Adrian Perez
Comment 9 2016-07-29 16:38:55 PDT
(In reply to comment #8) > Comment on attachment 280273 [details] > [WIP] Expose content filters in WebKitGTK+ > > View in context: > https://bugs.webkit.org/attachment.cgi?id=280273&action=review > > I think we should split the patch, enabling content extensions first, and > unskipping the layout tests, and then another patch to add the GTK+ API. It > would help a lot to start working on a patch for epiphany in parallel. ACK. I have been working on the split, and while doing the first part which unskips the layout test I noticed that there was missing some changes to the CMake files to have ENABLE_CONTENT_FILTERING defined, and once that was enabled it was needed to add more files to the build, and ContentFilterUnblockHandler needs refactoring as it's only implemented for Cocoa. The refactor be a separate patch as well, which will add a port-agnostic ContentFilterUnblockHandlerBase, and move the Cocoa-specific implementation into WebCore/platform/cocoa/ContentFilterUnblockHandler.{h,mm}. After that is done, we can add an implementation for the GTK port. > > Source/WebCore/platform/glib/FileSystemGlib.cpp:367 > > +// XXX: This probably can go into FileSystemPOSIX.cpp, but if I put it there > > +// then linking fails. > > This is because we don't use FileSystemPOSIX at all in the GTK+ port. The > GLib implementation is supposed to work on windows too (even though it's not > currently used), so this implementation should use glib so that it will work > in all platforms supported by glib. Use unescapedFilename() for both paths > and then just g_rename(). We could probably do this in a separate bug/patch. Moved to separate bug/patch: https://bugs.webkit.org/show_bug.cgi?id=160363 > > Source/WebKit2/Platform/unix/SharedMemoryUnix.cpp:109 > > +RefPtr<SharedMemory> SharedMemory::create(void* addr, size_t size, Protection protection) > > hmm, we didn't implement ::create(), this could also be moved to its own > bug/patch. Moved to separate bug/patch: https://bugs.webkit.org/show_bug.cgi?id=160364
Adrian Perez
Comment 10 2017-02-07 06:01:20 PST
I have had some time to work again on this during the flights to/from FOSDEM. Sometimes there's nothing like being a few hours in an isolated environment with a laptop to get some stuff done :-) (In reply to comment #8) > Comment on attachment 280273 [details] > [WIP] Expose content filters in WebKitGTK+ > > View in context: > https://bugs.webkit.org/attachment.cgi?id=280273&action=review > > I think we should split the patch, enabling content extensions first, and > unskipping the layout tests, and then another patch to add the GTK+ API. It > would help a lot to start working on a patch for epiphany in parallel. TL;DR: we need the API added first, then update WKTR, then unskip the tests. We cannot unskip first because “Tools/WebKitTestRunner/gtk/TestControllerGtk.cpp” needs changes to detect whether test cases are inside the “contentextensions/” subdirectory, and if yes compile and enable the correspoding JSON ruleset file. The WebKit2 C API does not have public functions for content extensions, so for example the Mac port #imports the public Objective-C API headers in the “TestControllerMac.mm” file and use the public API to do the needed configuration in “TestController::platformConfigureViewForTest()”.
Adrian Perez
Comment 11 2017-02-07 08:41:24 PST
There is one thing that is bothering me with the API as it is currently. Consider the following actions: 1. Instantiate a WebKitUserContentFilter. This picks the JSON source, calculates its SHA1 to use as identifier, and compiles it. The last step saves it to disk in a way that it can be retrieved if the SHA1 (the identifier) is known. 2. The JSON source changes, and the process above repeated. The new compiled extension has a *different* identifier because the SHA1 hash of the source has changed. 3a. The old version of the content extension cannot be removed from disk, because the SHA1 hash of the first JSON source is never returned to the caller. 3b. Even if the SHA1 of the JSON source is calculated by the program without relying on WebKitGTK+ to return it, we provide no API at all to remove compiled content extensions from disk. So WebKitGTK+ would be leaking all the compiled content extensions on disk. Ouch. Therefore, I think it makes sense to go all-in and also expose an API for for the storage of compiled content extensions. That would mean adding the following to our public API: * Methods to instantiate a WebKitUserContentDataManager, or to obtain the default singleton instance: WebKitUserContentDataManager* webkit_user_content_data_managar_get_default (void) WebKitUserContentDataManager* webkit_user_content_data_manager_new (const char *path) * Asynchronous methods to lookup an already-compiled filter, compile a filter (with the side effect of being saved to disk), and removing a filter (with the side effect of being removed from disk): void webkit_user_content_data_manager_find_filter (WebKitUserContentDataManager*, const char *identifier, GAsyncResultCallback callback, gpointer userdata) WebKitUserContentFilter* webkit_user_content_data_manager_find_filter_finish (WebKitUserContentDataManager*, GAsyncResult *result, GError **error) void webkit_user_content_data_manager_compile_filter (WebKitUserContentDataManager*, const char *identifier, const char *json_source, GAsyncResultCallback callback, gpointer userdata) WebKitUserContentFilter* webkit_user_content_data_manager_compile_filter_finish (WebKitUserContentDataManager* GAsyncResult *result, GError **error) void webkit_user_content_data_manager_remove_filter (WebKitUserContentDataManager*, const char *identifier, GAsyncResultCallback callback, gpointer userdata) gboolean webkit_user_content_data_manager_remove_filter_finish (WebKitUserContentDataManager*, GASyncResult *result, GError **error) * Methods in WebKitUserContentManager to add/remove *active* filters. These have effect on content loaded by webviews which use the WebKitUserContentManager, but _no effect_ on the compiled filters stored on disk: void webkit_user_content_manager_add_filter (WebKitUserContentManager*, WebKitUserContentFilter*) void webkit_user_content_manager_remove_filter (WebKitUserContentManager*, WebKitUserContentFilter*) void webkit_user_content_manager_remove_all_filters (WebKitUserContentManager*) * Last by not least, by having the responsibility of compiling filters moved to the new WebKitUserContentDataManager class, WebKitUserContentFilter can be a boxed struct, and the public API only needs to provide the webkit_user_content_filter_ref() and webkit_user_content_filter_unref() methods. Note that direct instantiation would not be possible, and always done using the the lookup/compile methods from WebKitUserContentDataManager. I think this approach gives a better idea at the API level compiled filters are saved “somewhere” (i.e. on disk) because there is a WebKitUserContentDataManager class which is named in a way similar to WebKitWebsiteDataManager, which also deals with persisted data. Also, this kind of API also transmits the idea that compiling a filter and storing it on disk is a separate “process”, and it is possible to do it independently without having to instantiate webviews and so on. WDYT?
Adrian Perez
Comment 12 2017-02-07 08:49:26 PST
(In reply to comment #11) > Therefore, I think it makes sense to go all-in and also expose an API for > for the storage of compiled content extensions. That would mean adding the > following to our public API: With the API that I have just proposed, when a browser obtains a new version of a filters ruleset, it would use the following: 1. webkit_user_content_data_manager_compile() to compile the new version. This is done asynchronously and the browser can continue using the old version of the filter in the meanwhile. 2. webkit_user_content_manager_remove_filter() to stop using the old version of the filter. 3. webkit_user_content_manager_add_filter() to start using the new version of the filter. 4. webkit_user_content_data_manager_remove() to remove the old compiled version of the filter from disk. I think it could even be possible to remove the compiled filter from disk with webkit_user_content_data_manager_remove() as _first_ step while still using the filter: it will be unlinked from disk, and the space freed once the filter gets disabled by webkit_user_content_manager_remove_filter() and munmap() called and the low-level file descriptor closed. This method allows reusing the same identifier for the new version of the filter.
Michael Catanzaro
Comment 13 2017-02-07 15:34:49 PST
(In reply to comment #11) > 3b. Even if the SHA1 of the JSON source is calculated by the program > without relying on WebKitGTK+ to return it, we provide no API at > all to remove compiled content extensions from disk. So WebKitGTK+ > would be leaking all the compiled content extensions on disk. Ouch. It would be desirable that this work automatically, without applications needing to keep track of it themselves. Is that not possible...? > * Methods in WebKitUserContentManager to add/remove *active* filters. These > have > effect on content loaded by webviews which use the > WebKitUserContentManager, but > _no effect_ on the compiled filters stored on disk: Ideally applications would only think about what's happening in their web views, and not worry about what's being cached on disk. Is there no reasonable way for WebKit to know when a filter is no longer needed? If not, then perhaps applications will just have to manage it manually like you suggest.
Adrian Perez
Comment 14 2017-02-08 08:46:42 PST
(In reply to comment #13) > (In reply to comment #11) > > 3b. Even if the SHA1 of the JSON source is calculated by the program > > without relying on WebKitGTK+ to return it, we provide no API at > > all to remove compiled content extensions from disk. So WebKitGTK+ > > would be leaking all the compiled content extensions on disk. Ouch. > > It would be desirable that this work automatically, without applications > needing to keep track of it themselves. Is that not possible...? Not quite. Content extensions are essentially a completely separate entity and they do not go through the usual cache paths (e.g. there's no cache replacement or anything like that). In my understanding, this is intentional because of the following reasons: - Content extensions are, well *extensions*, which are installed due to explicit user actions (“user enables EasyList”, “user disables EasyList” (“user removes EasyList”). Therefore we *know* when an extension has to be compiled/enabled/disabled/removed. - Also, we don't want WebKit to do cache replacement or removing compiled content extensions randomly, because there is no way for WebKit to know from which source file(s) to recreate them. - In contrast, the user does not directly control data saved locally by websites, and the code of websites can change at any time out of the control of the browser. That's why for website-stored data, we consider it “a cache”, while for content extension it is “a _storage_ area”. As such, a _storage_ is permanent and explicitly controlled. > > * Methods in WebKitUserContentManager to add/remove *active* filters. These > > have > > effect on content loaded by webviews which use the > > WebKitUserContentManager, but > > _no effect_ on the compiled filters stored on disk: > > Ideally applications would only think about what's happening in their web > views, and not worry about what's being cached on disk. Is there no > reasonable way for WebKit to know when a filter is no longer needed? If not, > then perhaps applications will just have to manage it manually like you > suggest. Thing is, as I tried to explain above (and hopefully succeeded!), content extension are *not* cached data. If you think that the “EasyList adblock list” maps 1:1 to a content extension, then you want the user to be able to enable or disable it with a checkbox in your UI, or add/remove more. So in the end the user is under control, therefore the WebKit API should cater to that need. Something we can discuss is whether we allow users of this API to give their own identifier or we calculate them and return them after compilation.
Adrian Perez
Comment 15 2017-02-08 09:13:01 PST
One clarification more... When choosing the names for new API elements like WebKitUserContentDataManager my reasons were: - We already have other classes ending in *Manager (and *DataManager), let's be consistent with that. - We are already using the “UserContent” moniker for other pieces of data provided by the user (either final user or a developer who uses WebKitGTK+), like injected CSS style sheets and injected JavaScript snippets. - I did not want to have “Extension” or “Filter” in the name. The idea is that in the future we might have other “user content” data (local content not originated from a website). The WebKitUserContentDataManager name is generic, so it could handle other kinds of “user content” in the future as well. This name would potentially avoid having to add other WebKitUserContent${NAME}DataManager classes a the future. I hope this helps further clarify what's behind the naming choices :-)
Adrian Cochrane
Comment 16 2018-09-07 22:59:00 PDT
Is this bug still active? Is there anything I can do to help? According to the metadata it hasn't see any activity over the last year.
Adrian Perez
Comment 17 2018-12-10 11:35:12 PST
(In reply to Adrian Cochrane from comment #16) > Is this bug still active? Is there anything I can do to help? > > According to the metadata it hasn't see any activity over the last year. I have resumed work on this a couple of weeks ago.
Adrian Perez
Comment 18 2018-12-13 00:12:04 PST
I'm handling this as part of #167941 in the end. *** This bug has been marked as a duplicate of bug 167941 ***
Note You need to log in before you can comment on or make changes to this bug.