Bug 40302

Summary: [GTK] Memory managament for DOM GObject wrappers
Product: WebKit Reporter: Xan Lopez <xan.lopez>
Component: WebKitGTKAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: mrobinson, wk.censorship.bypass.0006, xan.lopez
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Bug Depends on:    
Bug Blocks: 38576    
Attachments:
Description Flags
domcache.diff
mrobinson: review-
domcache.diff
none
domcache.diff
none
domcache.diff
none
domcache.diff none

Description Xan Lopez 2010-06-08 08:09:31 PDT
When a GObject DOM object is created we wrap the core DOM object (say, WebCore::Node) with a native GObject, add a manual C++ ref to the core object, and place the matching deref in the finalize method (note to self, this should probably be done in dispose). So far, this seems sane.

Problem is, before returning the object we do the following: we place the object in a hash map, indexed by the core object, and if a wrapper for this core object is requested again we'll just return the corresponding GObject as-is. This entry in the hash map is, also, cleared on finalize. Since we do not add extra references in this process, you can get into a situation where you have two pointers to a GObject DOM binding that are actually the same object, with only one total reference; if you do g_object_unref on objects you don't need, tragedy will ensue when you try to use the second pointer (even if it's only to also unref it).

The hash map concept seems very reasonable to me in general, so I think what we should do is to always add a new reference when we return an object, even if we are only getting it from the map. This way the rule "always unref DOM objects when you are done with them" will work.

The only remaining problem would be the huge pain in the ass that this kind of API is for C programmers, of course...
Comment 1 webkit censorship bypass 0006 2010-06-11 04:53:05 PDT
(In reply to comment #0)

> The hash map concept seems very reasonable to me in general, so I think what we should do is to always add a new reference when we return an object, even if we are only getting it from the map. This way the rule "always unref DOM objects when you are done with them" will work.

 the description of the problem is somewhat ambiguous, so here are some possible answers:

 if the intent is to create multiple gobjects which point to the same underlying webkit DOM object, this will be a complete disaster from a programming perspective by the time it gets up to high-level language bindings.

 many high-level applications rely on being able to do tests to see if a node is equal (isSameNode).

 removing such functionality can result in total project failure as, when using high-level languages, there is no other non-intrusive method for correctly identifying/comparing DOM objects.  intrusive methods include adding unique IDs to DOM Nodes, using the Node "id" field which then of course interferes with other uses (such as from javascript being executed in parallel with high-level language DOM manipulation).

 surely the problem is solved by simply increasing the refcount when returning the corresponding GObject as-is?  this seems to be the implied course of action recommended, by xan, and, if so, it is a good one.
Comment 2 Martin Robinson 2010-10-21 17:44:46 PDT
Any updates on this issue? I know we've discussed it multiple times since the last comment, but I can't recall if there was any resolution.
Comment 3 Xan Lopez 2010-10-22 18:19:04 PDT
(In reply to comment #2)
> Any updates on this issue? I know we've discussed it multiple times since the last comment, but I can't recall if there was any resolution.

We are basically stuck in finding a reasonable place to collect all the GObject wrappers created for the DOM objects in a page.
Comment 4 Xan Lopez 2010-10-22 18:20:21 PDT
(In reply to comment #3)
> (In reply to comment #2)
> > Any updates on this issue? I know we've discussed it multiple times since the last comment, but I can't recall if there was any resolution.
> 
> We are basically stuck in finding a reasonable place to collect all the GObject wrappers created for the DOM objects in a page.

This of course assumes we want some sort of automatic collection, as opposed to the manual approach suggested in the first comment (which is still trivial to do but still makes the API cumbersome).
Comment 5 Martin Robinson 2010-10-22 18:40:14 PDT
Warning: ugly solution follows. One approach would be to cache the DOM objects and then clean the cache on a timer. This relieves the user from having to worry about memory management (unless they wanted the objects to stick around) and it doesn't break the current API.
Comment 6 Xan Lopez 2010-11-18 15:03:22 PST
Created attachment 74295 [details]
domcache.diff
Comment 7 Martin Robinson 2010-11-18 16:46:12 PST
Comment on attachment 74295 [details]
domcache.diff

View in context: https://bugs.webkit.org/attachment.cgi?id=74295&action=review

Looks good, though I have a few nagging concerns (some of which I've outlined below). Another issue is what happens when a developer uses a DOM object to change the frame location. Isn't there the possibility that it will unref the DOM wrapper out from under them?

> WebCore/bindings/gobject/DOMObjectCache.cpp:72
> +    // Unrefing the objects removes them from the cache in their

Should be "Unreffing" here.

> WebCore/bindings/gobject/DOMObjectCache.cpp:77
> +        DOMObjectCacheData* data = static_cast<DOMObjectCacheData*>(iter->second);

I don't think this static_cast is necessary.

> WebCore/bindings/gobject/DOMObjectCache.cpp:78
> +        if (data && (!view || data->view == view))

Would ASSERT(data) be more appropriate, or are there cases where the data can, in fact, be NULL?

> WebCore/bindings/gobject/DOMObjectCache.cpp:101
> +    return g_object_ref(data->object);

Won't this lead to leaks, since the cache only does one unref when cleared and DOMObjectCache::get can be called multiple times before that?

> WebCore/bindings/gobject/DOMObjectCache.cpp:119
> +    put(static_cast<void*>(objectHandle), wrapper);

I don't think the cast is necessary to get void*.

> WebCore/bindings/gobject/DOMObjectCache.h:22
> +#include <glib-object.h>

Using void* instead of gpointer would prevent the need for this include, which would be a win for compile time.

> WebKit/gtk/WebCoreSupport/FrameLoaderClientGtk.cpp:703
> -    // this is only interesting once we provide an external API for the DOM
> +    if (!ready)
> +        DOMObjectCache::clearByView(getViewFromFrame(m_frame));

Doesn't this mean that when an iframe reloads, all cached DOM objects will be removed for the entire view? I think it would be better to track per-frame.
Comment 8 Xan Lopez 2010-11-19 16:00:11 PST
Created attachment 74436 [details]
domcache.diff

This should address the previous review.
Comment 9 Xan Lopez 2010-11-19 19:42:26 PST
Created attachment 74460 [details]
domcache.diff
Comment 10 Xan Lopez 2010-11-19 19:49:18 PST
Created attachment 74462 [details]
domcache.diff

Make the refCount check more explicit.
Comment 11 Martin Robinson 2010-11-20 10:21:10 PST
Comment on attachment 74462 [details]
domcache.diff

View in context: https://bugs.webkit.org/attachment.cgi?id=74462&action=review

> WebCore/bindings/gobject/DOMObjectCache.cpp:32
> +    gint refCount;

Kind of a nit, but I think I'd prefer this to be named something like timesReturned, since it isn't technically the reference count of the GObject. Also, this could be size_t.

> WebCore/bindings/gobject/DOMObjectCache.cpp:82
> +    Vector<DOMObjectCacheData*>::iterator last = toUnref.end();
> +    for (Vector<DOMObjectCacheData*>::iterator it = toUnref.begin(); it != last; ++it) {
> +        DOMObjectCacheData* data = *it;
> +        while (data->refCount-- > 0)
> +            g_object_unref(data->object);
> +    }

There's some chance here that the object might die before you finish unreffing it, right? Say the user has unreffed only some of the return values. :/ Since there doesn't appear to be a way to peak at the reference count, maybe you should install a weak reference here and stop unreffing when you detect death. Another approach would be to keep information about dead objects in the cache and clear them at some later point. Wow, this is hairy.
Comment 12 Xan Lopez 2010-11-20 14:37:04 PST
(In reply to comment #11)
> Kind of a nit, but I think I'd prefer this to be named something like timesReturned, since it isn't technically the reference count of the GObject. Also, this could be size_t.

Why size_t?

> 
> > WebCore/bindings/gobject/DOMObjectCache.cpp:82
> > +    Vector<DOMObjectCacheData*>::iterator last = toUnref.end();
> > +    for (Vector<DOMObjectCacheData*>::iterator it = toUnref.begin(); it != last; ++it) {
> > +        DOMObjectCacheData* data = *it;
> > +        while (data->refCount-- > 0)
> > +            g_object_unref(data->object);
> > +    }
> 
> There's some chance here that the object might die before you finish unreffing it, right? Say the user has unreffed only some of the return values. :/ Since there doesn't appear to be a way to peak at the reference count, maybe you should install a weak reference here and stop unreffing when you detect death. Another approach would be to keep information about dead objects in the cache and clear them at some later point. Wow, this is hairy.

I think adding a weak ref at the beginning of the process and stopping if it's triggered makes sense.
Comment 13 Xan Lopez 2010-11-20 18:15:00 PST
Created attachment 74490 [details]
domcache.diff

Added the weak_ref trick and renamed refCount to timesReturned. I changed timesReturned to a 'guint' since I'm unsure why it should be a size_t.
Comment 14 Martin Robinson 2010-11-20 21:04:35 PST
Comment on attachment 74490 [details]
domcache.diff

View in context: https://bugs.webkit.org/attachment.cgi?id=74490&action=review

Very nice! This was tricky. Only change I think you should make (other than the two ultra nits below) is to use Frame* instead of void* for all things that deal with frames.

> WebCore/bindings/gobject/DOMObjectCache.cpp:103
> +        ASSERT(data->timesReturned == 0);

I think this has to be ASSERT(!data->timesReturned); to make the style gods happy.

> WebKit/gtk/webkit/webkithittestresult.cpp:81
> +    WebKitHitTestResultPrivate* priv = WEBKIT_HIT_TEST_RESULT(object)->priv;
> +
> +    g_object_unref(priv->innerNode);
> +
> +    G_OBJECT_CLASS(webkit_hit_test_result_parent_class)->dispose(object);
> +}

I think this should be two lines to preserve vertical space:

g_object_unref(WEBKIT_HIT_TEST_RESULT(object)->priv->innerNode);
G_OBJECT_CLASS(webkit_hit_test_result_parent_class)->dispose(object);
Comment 15 Xan Lopez 2010-11-20 22:12:16 PST
Comment on attachment 74490 [details]
domcache.diff

Landed as r72492 with some minor adjustments.
Comment 16 Xan Lopez 2010-11-20 22:12:29 PST
Closing.