Bug 29134

Summary: [GTK] Add API to access sub resources
Product: WebKit Reporter: Gustavo Noronha (kov) <gustavo>
Component: WebKitGTKAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: jmalonzo, xan.lopez
Priority: P2 Keywords: Gtk
Version: 528+ (Nightly build)   
Hardware: PC   
OS: All   
Attachments:
Description Flags
proposed API
gustavo: commit-queue-
save as... implementation for epiphany
gustavo: commit-queue-
comments addressed, tests added jmalonzo: review+, gustavo: commit-queue-

Gustavo Noronha (kov)
Reported 2009-09-10 11:43:16 PDT
We now have API to peek into the data that is used to render the page, but only the main resource is currently accessible. It's important to also be able to peek at the images, css, javascript, and all other files that compose the page.
Attachments
proposed API (16.90 KB, patch)
2009-09-10 12:10 PDT, Gustavo Noronha (kov)
gustavo: commit-queue-
save as... implementation for epiphany (10.97 KB, patch)
2009-09-10 12:14 PDT, Gustavo Noronha (kov)
gustavo: commit-queue-
comments addressed, tests added (25.11 KB, patch)
2009-09-11 15:29 PDT, Gustavo Noronha (kov)
jmalonzo: review+
gustavo: commit-queue-
Gustavo Noronha (kov)
Comment 1 2009-09-10 12:10:48 PDT
Created attachment 39366 [details] proposed API Points worth noting: WebView is used to store the hash table because the DataSource object is only created after the first resource was already created; I would appreciate suggestions on where to place the hash table in a way that we don't need to get to the WebView from the DataSource, since I believe the API should be in the DataSource. Some things are done in a way that seems a bit convoluted (the main resource handling, for instance), but that is because I really want the resource objects to be exactly the same through their lifetime, so that you can keep objects you get in resource-request-starting, and even compare them with other references to know if they are the same.
Gustavo Noronha (kov)
Comment 2 2009-09-10 12:14:01 PDT
Created attachment 39367 [details] save as... implementation for epiphany This is the save operation in Epiphany implemented using the sub resources API. It's worth noting that if we can agree on the API for 1.1.15, that should probably release with GNOME 2.28, we can get this in to provide this feature for Epiphany soon rather than later. As long as the implementation works and is stable, it can be perfected/refactored afterwards.
Xan Lopez
Comment 3 2009-09-10 13:37:23 PDT
Comment on attachment 39366 [details] proposed API > @@ -799,6 +808,11 @@ void FrameLoaderClient::finishedLoading(WebCore::DocumentLoader* documentLoader) > > void FrameLoaderClient::provisionalLoadStarted() > { > + WebKitWebView* webView = getViewFromFrame(m_frame); > + > + if (m_frame == webkit_web_view_get_main_frame(webView)) > + webkit_web_view_clear_resources(webView); > + > notImplemented(); > } My usual question, what else do you need to do here? Either add a comment ar remove the notImplemented(). > > @@ -824,6 +838,23 @@ void FrameLoaderClient::dispatchDidFinishLoading(WebCore::DocumentLoader* loader > { > static_cast<WebKit::DocumentLoader*>(loader)->decreaseLoadCount(identifier); > > + WebKitWebView* webView = getViewFromFrame(m_frame); > + GOwnPtr<gchar> identifierString(identifierToString(identifier)); > + WebKitWebResource* webResource = webkit_web_view_get_resource(webView, identifierString.get()); > + > + const char* uri = webkit_web_resource_get_uri(webResource); > + RefPtr<ArchiveResource> coreResource(loader->subresource(KURL(KURL(), uri))); > + > + // If coreResource is NULL here, the resource failed to load, > + // unless it's the main resource. > + if (!coreResource && (webResource != webkit_web_view_get_main_resource(webView))) You don't really need the parenthesis. > + > + webkit_web_resource_init_with_core_resource(webResource, coreResource.get()); > + > // FIXME: This function should notify the application that the resource > // finished loading, maybe using a load-status property in the > // WebKitWebResource object, similar to what we do for WebKitWebFrame' > diff --git a/WebKit/gtk/webkit/webkitwebresource.cpp b/WebKit/gtk/webkit/webkitwebresource.cpp > index 98e781a..17bead1 100644 > --- a/WebKit/gtk/webkit/webkitwebresource.cpp > +++ b/WebKit/gtk/webkit/webkitwebresource.cpp > @@ -60,12 +60,14 @@ G_DEFINE_TYPE(WebKitWebResource, webkit_web_resource, G_TYPE_OBJECT); > static void webkit_web_resource_get_property(GObject* object, guint prop_id, GValue* value, GParamSpec* pspec); > static void webkit_web_resource_set_property(GObject* object, guint prop_id, const GValue* value, GParamSpec* pspec); > > -static void webkit_web_resource_cleanup(WebKitWebResource* webResource) > +static void webkit_web_resource_cleanup(WebKitWebResource* webResource, bool isFinalize) > { > WebKitWebResourcePrivate* priv = webResource->priv; > > - g_free(priv->uri); > - priv->uri = NULL; > + if (isFinalize) { > + g_free(priv->uri); > + priv->uri = NULL; > + } > > g_free(priv->mimeType); > priv->mimeType = NULL; > @@ -98,7 +100,7 @@ static void webkit_web_resource_finalize(GObject* object) > { > WebKitWebResource* webResource = WEBKIT_WEB_RESOURCE(object); > > - webkit_web_resource_cleanup(webResource); > + webkit_web_resource_cleanup(webResource, true); > > G_OBJECT_CLASS(webkit_web_resource_parent_class)->finalize(object); > } > @@ -230,6 +232,20 @@ WebKitWebResource* webkit_web_resource_new_with_core_resource(PassRefPtr<Archive > return webResource; > } > Per discussion on IRC it seems this is not useful, and that even cleaning up the resource on init_with_core_resource is also not useful, so I'd say leave it out. Looks good otherwise, so 1/2 r=me (but we need tests!).
Gustavo Noronha (kov)
Comment 4 2009-09-11 15:29:22 PDT
Created attachment 39481 [details] comments addressed, tests added
Jan Alonzo
Comment 5 2009-09-11 18:15:31 PDT
Comment on attachment 39481 [details] comments addressed, tests added > + > + https://bugs.webkit.org/show_bug.cgi?id=29134 > + [GTK] Add API to access sub resources > + > + Implement getting subresources, and improve testing of > + main, and sub resources loading. > + > + * WebCoreSupport/FrameLoaderClientGtk.cpp: > + (WebKit::identifierToString): > + (WebKit::FrameLoaderClient::dispatchWillSendRequest): > + (WebKit::FrameLoaderClient::assignIdentifierToInitialRequest): > + (WebKit::FrameLoaderClient::provisionalLoadStarted): > + (WebKit::FrameLoaderClient::dispatchDidFinishLoading): > + * tests/testwebresource.c: > + (server_callback): > + (resource_request_starting_cb): > + (load_finished_cb): > + (test_web_resource_loading): > + (resource_request_starting_sub_cb): > + (load_finished_sub_cb): > + (idle_quit_loop_cb): > + (test_web_resource_sub_resource_loading): > + (main): > + * webkit/webkitprivate.h: > + * webkit/webkitwebdatasource.cpp: > + (webkit_web_data_source_get_main_resource): > + (webkit_web_data_source_get_sub_resources): > + * webkit/webkitwebdatasource.h: > + * webkit/webkitwebresource.cpp: > + (webkit_web_resource_init_with_core_resource): > + (webkit_web_resource_get_data): > + * webkit/webkitwebview.cpp: > + (webkit_web_view_dispose): > + (webkit_web_view_finalize): > + (webkit_web_view_init): > + (webkit_web_view_add_resource): > + (webkit_web_view_get_resource): > + (webkit_web_view_get_main_resource): > + (webkit_web_view_clear_resources): > + (webkit_web_view_get_sub_resources): No need to check the method names unless they have specific comments. > +static char* identifierToString(unsigned long identifier) Maybe name this toString instead. > + if (!redirectResponse.isNull()) { > + // This is a redirect, so we need to update the WebResource's knowledge > + // of the URI. > + g_free(webResource->priv->uri); > + webResource->priv->uri = g_strdup(request.url().string().utf8().data()); I would've preferred to set the property instead of accessing private members like this. > -void FrameLoaderClient::assignIdentifierToInitialRequest(unsigned long, WebCore::DocumentLoader*, const ResourceRequest&) > +void FrameLoaderClient::assignIdentifierToInitialRequest(unsigned long identifier, WebCore::DocumentLoader*, const ResourceRequest& request) > { > - notImplemented(); > + webkit_web_view_add_resource(getViewFromFrame(m_frame), identifierToString(identifier), > + WEBKIT_WEB_RESOURCE(g_object_new(WEBKIT_TYPE_WEB_RESOURCE, "uri", request.url().string().utf8().data(), 0))); > } > > void FrameLoaderClient::postProgressStartedNotification() > @@ -799,7 +815,10 @@ void FrameLoaderClient::finishedLoading(WebCore::DocumentLoader* documentLoader) > > void FrameLoaderClient::provisionalLoadStarted() > { > - notImplemented(); > + WebKitWebView* webView = getViewFromFrame(m_frame); > + > + if (m_frame == webkit_web_view_get_main_frame(webView)) > + webkit_web_view_clear_resources(webView); > } > > void FrameLoaderClient::didFinishLoad() { > @@ -824,6 +843,23 @@ void FrameLoaderClient::dispatchDidFinishLoading(WebCore::DocumentLoader* loader > { > static_cast<WebKit::DocumentLoader*>(loader)->decreaseLoadCount(identifier); > > + WebKitWebView* webView = getViewFromFrame(m_frame); > + GOwnPtr<gchar> identifierString(identifierToString(identifier)); > + WebKitWebResource* webResource = webkit_web_view_get_resource(webView, identifierString.get()); > + > + const char* uri = webkit_web_resource_get_uri(webResource); > + RefPtr<ArchiveResource> coreResource(loader->subresource(KURL(KURL(), uri))); > + > + // If coreResource is NULL here, the resource failed to load, > + // unless it's the main resource. > + if (!coreResource && webResource != webkit_web_view_get_main_resource(webView)) You could just get the main resource from the data source here. > + if (!coreResource) > + coreResource = loader->mainResource().releaseRef(); > + > + webkit_web_resource_init_with_core_resource(webResource, coreResource.get()); > + What's this for? > diff --git a/WebKit/gtk/webkit/webkitprivate.h b/WebKit/gtk/webkit/webkitprivate.h > index c471714..38c6869 100644 > --- a/WebKit/gtk/webkit/webkitprivate.h > +++ b/WebKit/gtk/webkit/webkitprivate.h > @@ -140,6 +140,12 @@ extern "C" { > > gboolean disposing; > gboolean usePrimaryForPaste; > + > + // These are hosted here because the DataSource object is > + // created too late in the frame loading process. Can you please expound more about when we need the resource and why it's too late in the frame loading process? Can we have a resource without a data source/document loader? Are we doing something different from Mac here? > + WebKitWebResource* mainResource; > + char* mainResourceIdentifier; > + GHashTable* subResources; Same. Can we have a main resource without a data source/document loader? > webkit_web_resource_new_with_core_resource(PassRefPtr<WebCore::ArchiveResource>); > > + void > + webkit_web_resource_init_with_core_resource(WebKitWebResource*, PassRefPtr<WebCore::ArchiveResource>); > + You could just add a WebResource param to _new_with_core_resource and check for NULL in the impl. > + void > + webkit_web_view_add_resource(WebKitWebView*, char*, WebKitWebResource*); > + > + WebKitWebResource* > + webkit_web_view_get_resource(WebKitWebView*, char*); > + > + WebKitWebResource* > + webkit_web_view_get_main_resource(WebKitWebView*); > + > + void > + webkit_web_view_clear_resources(WebKitWebView*); > + > + GList* > + webkit_web_view_get_sub_resources(WebKitWebView*); > - RefPtr<ArchiveResource> coreResource = priv->loader->mainResource(); > + if (priv->mainresource) > + return priv->mainresource; > > - ASSERT(coreResource); > + WebKitWebFrame* webFrame = webkit_web_data_source_get_web_frame(webDataSource); > + WebKitWebView* webView = getViewFromFrame(webFrame); > > - if (priv->mainresource) > - g_object_unref(priv->mainresource); > + priv->mainresource = WEBKIT_WEB_RESOURCE(g_object_ref(webkit_web_view_get_main_resource(webView))); > > - priv->mainresource = webkit_web_resource_new_with_core_resource(coreResource.release()); How is the view mainresource different from the DS main resource? r-. We need to clarify why we can't use DS for these and expound more on difference between WebView resources to DS resources.
Gustavo Noronha (kov)
Comment 6 2009-09-12 08:00:03 PDT
(In reply to comment #5) > No need to check the method names unless they have specific comments. This is news to me, but I am happy to remove that crap from my commits =). > > +static char* identifierToString(unsigned long identifier) > > Maybe name this toString instead. Sounds like a plan. > > + g_free(webResource->priv->uri); > > + webResource->priv->uri = g_strdup(request.url().string().utf8().data()); > > I would've preferred to set the property instead of accessing private members > like this. This involves removing the G_PARAM_CONSTRUCT_ONLY from the property. Are you OK with doing that? > > + // If coreResource is NULL here, the resource failed to load, > > + // unless it's the main resource. > > + if (!coreResource && webResource != webkit_web_view_get_main_resource(webView)) > > You could just get the main resource from the data source here. Yeah, but then I have to first get the DataSource object. Since I already have the webView here, this seemed sensible =). > > + if (!coreResource) > > + coreResource = loader->mainResource().releaseRef(); > > + > > + webkit_web_resource_init_with_core_resource(webResource, coreResource.get()); > > + > > What's this for? When the WebResource is first created (in willSendRequest before this patch, in assignIdentifier in this patch), WebCore still does not have the ArchiveResource object for the resource. That object is only created when the resource load is finished. Here we fill the WebResource object we created at assignIdentifier, and gave to the user during willSendRequest with the actual, interesting, data. > > + // These are hosted here because the DataSource object is > > + // created too late in the frame loading process. > > Can you please expound more about when we need the resource and why it's too > late in the frame loading process? Can we have a resource without a data > source/document loader? > Are we doing something different from Mac here? Sure. Yeah, Mac doesn't use an object to represent the resource itself, it uses an 'index' object, that you can use to request the object, and it also stores these objects in its WebView, not in the DataSource, so we are doing exactly like Mac in this regard. The problem here is that, when a load starts, the DataSource is still not created when the main resource starts loading (during, say, assignIdentifier). > > + WebKitWebResource* mainResource; > > + char* mainResourceIdentifier; > > + GHashTable* subResources; > > Same. Can we have a main resource without a data source/document loader? Yeah. > > webkit_web_resource_new_with_core_resource(PassRefPtr<WebCore::ArchiveResource>); > > > > + void > > + webkit_web_resource_init_with_core_resource(WebKitWebResource*, PassRefPtr<WebCore::ArchiveResource>); > > + > > You could just add a WebResource param to _new_with_core_resource and check for > NULL in the impl. Right, but I am not creating a new WebResource object there, so the semantics are quite different. If you look at init_with_core_resource's implementation you'll see there's no g_object_new call there. The important thing to keep in mind here is that we want the same WebResource object to remain throughout the various signals/delegates , so that the user can keep a reference the object he got in resource-request-starting, and even compare his pointer to a pointer he gets later on to know if they are the same. This is important since we don't give the identifier to the user - the identifier is the pointer. > > - g_object_unref(priv->mainresource); > > + priv->mainresource = WEBKIT_WEB_RESOURCE(g_object_ref(webkit_web_view_get_main_resource(webView))); > > > > - priv->mainresource = webkit_web_resource_new_with_core_resource(coreResource.release()); > > How is the view mainresource different from the DS main resource? It's not, they are exactly the same. The view is just holding the stuff, because it exists before the DS is created. DS is still the API that exposes them to the user. This is similar to what Mac is doing, like I said: void WebFrameLoaderClient::dispatchWillSendRequest(DocumentLoader* loader, unsig ned long identifier, ResourceRequest& request, const ResourceResponse& redirectR esponse) { [...] if (implementations->willSendRequestFunc) request = (NSURLRequest *)CallResourceLoadDelegate(implementations->willSendRequestFunc, webView, @selector(webView:resource:willSendRequest:redirectResponse:fromDataSource:), [webView _objectForIdentifier:identifier], request.nsURLRequest(), redirectResponse.nsURLResponse(), dataSource(loader)); } Notice the [webView _objectForIdentifier:identifier] message here =). > r-. We need to clarify why we can't use DS for these and expound more on > difference between WebView resources to DS resources. I hope I have clarified these points in the comments here.
Jan Alonzo
Comment 7 2009-09-14 12:57:34 PDT
Comment on attachment 39481 [details] comments addressed, tests added Ok. Possibly renaming get_sub_resources to get_subresources. Also, please fix the agreed changes in the previous comment before landing. r=me.
Gustavo Noronha (kov)
Comment 8 2009-09-14 19:13:19 PDT
Landed as r48382. I am not sure there was agreement on whether we should make the uri property non-CONSTRUCT_ONLY, so I left it as it was, but we can change it if you guys think it's a good choice.
Note You need to log in before you can comment on or make changes to this bug.