Bug 18205 - DOMNode objects are garbage collected although there are strong references
Summary: DOMNode objects are garbage collected although there are strong references
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit API (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P2 Normal
Assignee: Darin Adler
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2008-03-29 04:43 PDT by Frank Illenberger
Modified: 2008-12-12 10:08 PST (History)
6 users (show)

See Also:


Attachments
Testcase demonstrating the problem. (32.54 KB, application/octet-stream)
2008-03-29 04:44 PDT, Frank Illenberger
no flags Details
Crash Log (20.01 KB, text/plain)
2008-03-29 10:29 PDT, Frank Illenberger
no flags Details
Second testcase (33.27 KB, application/octet-stream)
2008-04-05 01:31 PDT, Frank Illenberger
no flags Details
New, simpler and more reproducible test case (46.88 KB, application/octet-stream)
2008-11-05 10:44 PST, Kai Brüning
no flags Details
Fix for the resurrection bug of DOM wrapper objects (2.62 KB, patch)
2008-11-05 10:46 PST, Kai Brüning
darin: review-
Details | Formatted Diff | Diff
Second attempt to fix the resurrection bug for DOM wrapper objects (4.01 KB, patch)
2008-12-06 02:58 PST, Kai Brüning
no flags Details | Formatted Diff | Diff
Fix for the resurrection bug of DOM wrapper objects using BUILDING_ON_TIGER (3.94 KB, patch)
2008-12-07 02:19 PST, Kai Brüning
darin: review-
Details | Formatted Diff | Diff
Fix for the resurrection bug of a varity of wrapper objects (8.82 KB, patch)
2008-12-12 08:33 PST, Kai Brüning
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Frank Illenberger 2008-03-29 04:43:39 PDT
I am using the WebKit.framework inside a ObjC-2.0 application which uses the new Leopard garbage collection. In my custom objects I am holding strong references to DOMNode objects. Sometimes when I access  these nodes using my references after they were removed from their DOMDocument, my app crashes. It seems as if the DOMNodes get collected although I am holding strong references to them.
I attached a test case that demonstrates this issue.
Comment 1 Frank Illenberger 2008-03-29 04:44:24 PDT
Created attachment 20190 [details]
Testcase demonstrating the problem.
Comment 2 Darin Adler 2008-03-29 07:41:56 PDT
I downloaded and built the test application and tried and was not yet able to reproduce this problem.
Comment 3 Frank Illenberger 2008-03-29 10:28:11 PDT
I was able to reproduce the problem on two machines so far: 
a Mac Pro 8-core, and a MacBook Core 2 Duo
On both machines, I could manage to crash the application after 3-4 repeats of the procedure.
I attached the crash log to this bug.
Comment 4 Frank Illenberger 2008-03-29 10:29:19 PDT
Created attachment 20195 [details]
Crash Log
Comment 5 Frank Illenberger 2008-03-29 11:02:06 PDT
When garbage collection is deactivated in the Xcode test project, the crash never occurs.
Comment 6 Mark Rowe (bdash) 2008-03-29 17:15:59 PDT
I was able to reproduce this with the attached test case.
Comment 7 Mark Rowe (bdash) 2008-03-29 17:17:49 PDT
The description of the bug talks about holding strong references to DOM nodes.  Where in the test application is that happening?
Comment 8 Frank Illenberger 2008-03-29 18:01:51 PDT
In the test application the NSUndoManager is holding the strong references. 
But the crash still occurs if you add more strong references by putting all created DOMNodes into an NSArray or an NSDictionary.

Comment 9 Frank Illenberger 2008-04-05 01:31:01 PDT
I managed to create a second test case that better demonstrates what is going wrong with the garbage collection. I attached it to this bug. 
To my mind, this is a severe problem. I still don't understand if it is a bug in WebKit, NSUndoManager or GC itself. 
 

Comment 10 Frank Illenberger 2008-04-05 01:31:39 PDT
Created attachment 20349 [details]
Second testcase
Comment 11 Darin Adler 2008-04-05 06:44:03 PDT
(In reply to comment #9)
> I managed to create a second test case that better demonstrates what is going
> wrong with the garbage collection. I attached it to this bug.
> To my mind, this is a severe problem. I still don't understand if it is a bug
> in WebKit, NSUndoManager or GC itself.

This second test case logs the identity of the parent, but the node, not its parent, is the object that's registered as an undo target. I'm not sure why you think the parent's wrapper shouldn't be garbage collected.
Comment 12 Darin Adler 2008-04-05 07:06:59 PDT
(In reply to comment #11)
> This second test case logs the identity of the parent, but the node, not its
> parent, is the object that's registered as an undo target. I'm not sure why you
> think the parent's wrapper shouldn't be garbage collected.

I changed my copy of the test to log the DOMText object instead of the parent, and I do see signs of what might be a problem. I think it's highly likely this is an NSUndoManager problem, not a WebKit problem. A good way to verify that would be to do the same kind of test with non-WebKit objects.
Comment 13 Darin Adler 2008-04-05 07:09:17 PDT
(In reply to comment #11)
> This second test case logs the identity of the parent, but the node, not its
> parent, is the object that's registered as an undo target. I'm not sure why you
> think the parent's wrapper shouldn't be garbage collected.

OK. My mistake. I was confusing the undo with the redo. I see that in the redo the parent node is part of the invocation. I think there's a good chance here there's a bug with how NSUndoManager handles extra parameters to more-complex NSInvocation objects. You might want to try using an approach where there's a single object that holds references to both the parent and node so you can use the simpler NSUndoManager code path. I bet that would make the bug go away.
Comment 14 Darin Adler 2008-04-05 07:26:28 PDT
OK, I have a different question for you.

Why is it that you think that objects that are put on the redo stack should never be finalized? If you put a new object on the undo stack, then I believe that makes the redo stack get emptied in which case the objects on the redo stack should be finalized soon afterward.

I was easily able to reproduce objects on the redo stack being finalized by changing the test case so the redo didn't involve DOM wrappers at all -- just a custom object, and many instances of this custom class were finalized even though all were added to the redo stack. I think that makes it pretty clear here that nothing here is about DOM wrappers. It seems more like a misunderstanding about undo/redo. But I don't get how that relates to your original crash.
Comment 15 Frank Illenberger 2008-04-05 08:35:51 PDT
Thanks for looking into this.
I did not mean to say that objects on the redo stack should never get finalized. You are completely right, that the cotents of the redo stack gets nuked after a new "do".
What happens in the test case - at least from time to time - is that directly after the first click of the button you see the following at the beginning of the log:

[Session started at 2008-04-05 17:27:56 +0200.]
2008-04-05 17:27:57.556 WebKitGCTest2[200:10b] DOMHTMLHtmlElement 0x10373c0 was put on redo stack.
2008-04-05 17:27:57.557 WebKitGCTest2[200:10b] DOMHTMLHtmlElement 0x10373c0 was finalized.
2008-04-05 17:27:57.557 WebKitGCTest2[200:10b] -----------------


Here you see that the HtmlElement wrapper object was put on the redo stack and finalized directly afterwards. But here no redo or new do was performed. So the reference to the finalized object is still on the redo stack. So performing redo in this state will definitely lead to the crash that my first test case demonstrated. You can also reproduce this in the second test case:

Just add a delayed call to redo to the test action:
- (IBAction)testAction:(id)sender
{
	[self insertNode:[document createTextNode:@"a"] parentNode:document.documentElement];
	[[NSGarbageCollector defaultCollector] collectExhaustively];
	[undoManager performSelector:@selector(undo) withObject:nil afterDelay:0.0];
	[undoManager performSelector:@selector(redo) withObject:nil afterDelay:1.0];
}

And in one out of five runs of the test app, you get a crash after the first press of the button:

[Session started at 2008-04-05 17:33:21 +0200.]
2008-04-05 17:33:23.291 WebKitGCTest2[309:10b] DOMHTMLHtmlElement 0x1013270 was put on redo stack.
2008-04-05 17:33:23.292 WebKitGCTest2[309:10b] DOMHTMLHtmlElement 0x1013270 was finalized.
2008-04-05 17:33:23.293 WebKitGCTest2[309:10b] -----------------

The Debugger has exited due to signal 11 (SIGSEGV).The Debugger has exited due to signal 11 (SIGSEGV).






Comment 16 Frank Illenberger 2008-04-05 09:16:28 PDT
Until now I have not been able to reproduce the issue using regular non-DOMWrapper objects. But as the issue seems to depend on when exactly the GC kicks in, it might be hard to recreate the same timing with regular objects. But I will keep on trying.
Comment 17 Frank Illenberger 2008-04-05 10:25:07 PDT
These crashes are not a theoretical issue for the development of our WebKit application. We are experiencing lots of undo/redo related crashes with DOM wrapper objects, like the one in the first test case.

The only workaround I could think of so far to rescue our app  is to subclass the undo manager to enable it to perform a complete collection before its actions:

@implementation MyUndoManager

- (void)superUndo
{
	[super undo];
}

- (void)undo
{
	[[NSGarbageCollector defaultCollector] collectExhaustively];
	[self performSelector:@selector(superUndo) withObject:nil afterDelay:0.0];
}

- (void)superRedo
{
	[super redo];
}

- (void)redo
{
	[[NSGarbageCollector defaultCollector] collectExhaustively];
	[self performSelector:@selector(superRedo) withObject:nil afterDelay:0.0];
}

@end

Comment 18 Kai Brüning 2008-11-05 10:44:16 PST
Created attachment 24912 [details]
New, simpler and more reproducible test case

After joining the development of the WebKit-based application in question, I hit this bug again in different circumstances.

We now have a new, much simpler and better reproducible test case (this attachment) and we think we understand the reason.

The problem is a race condition in the management of the DOMWrapper cache in WebCore/bindings/objc/DOMInternal.mm. In a garbage collected environment, wrappers are removed from the cache in -[DOMObject finalize]. This is sometimes too late, because garbage collection is done from a separate thread, but finalization is queued at the main thread for all WebScriptObject objects (see +[WebScriptObject initialize]). If a DOM wrapper object is collected and then a wrapper for the same underlying object is requested again before the wrapper object received its finalize message, it is resurrected via the DOMWrapper cache.

I attach a patch which fixes the problem by replacing the HashMap with an NSMapTable using zeroing weak memory for the wrapper value. Additionally I removed the finalize method from DOMObject, since now the garbage collector auto-removes the wrapper from the cache.

This fixes the crashes in the test case and in our application.
Comment 19 Kai Brüning 2008-11-05 10:46:28 PST
Created attachment 24913 [details]
Fix for the resurrection bug of DOM wrapper objects

See previous comment for "New, simpler and more reproducible test case"
Comment 20 Darin Adler 2008-11-05 10:51:35 PST
Comment on attachment 24913 [details]
Fix for the resurrection bug of DOM wrapper objects

Unfortunately I believe that the NSMapTable hash function and implementation is not nearly as efficient as the HashTable hash function. Because of that I'm not sure we can use this implementation. But I could be mistaken; I hope I am.
Comment 21 Kai Brüning 2008-11-05 11:16:16 PST
I was afraid this could be a problem.

Nevertheless I’d say that using zeroing weak memory in some way is a sound solution for this problem (and a solution is needed without doubt to be able to use WebKit with GC).

Unfortunately I do not have enough knowledge about Apples GC to incorporate such a solution into the existing HashTable. If you give me a few pointers to start with, I’d sure be willing to give it a try.

Comment 22 Kai Brüning 2008-12-05 03:00:46 PST
By coincidence, I found the following in Apples "Garbage Collection Programming Guide, Inapplicable Patterns", which I think applies perfectly to this problem:

Resource wrapper objects

A common pattern is to associate an object with an external resource—for example, a file descriptor—that needs "management" or other state that the object coordinates, often across several threads. The typical implementation is to use a non-retaining CFDictionary coupled with a global lock at the lookup and deallocation stages. This pattern does not work when you use garbage collection because there is a timing window during finalization where the object is no longer reachable from a root, yet is still in the dictionary and can be resurrected.

The solution is to use an NSMapTable object. A map table can hold keys, values, or both weakly, and when the objects are discovered unreachable the table is immediately cleared of such entries before any finalization is performed. This prevents resurrection of the object being finalized. For resources created and destroyed within the application, such as file descriptors, this is an adequate solution.

Comment 23 Darin Adler 2008-12-05 09:59:51 PST
Comment on attachment 24913 [details]
Fix for the resurrection bug of DOM wrapper objects

OK, I retract my earlier complaint.

The basic approach of using an NSMapTable seems fine.

I'm going to have to say review-, though, because this code uses things in NSMapTable.h that are supported in Leopard and newer only, and we need code that will also work in Tiger. I believe that we can do most of the work by using the old map table API, the functions such as NSMapRemove, NSMapInsert and NSMapGet. We'll only need to use the new API for creating the new map table and we can use BUILDING_ON_TIGER to make a version that works on Tiger too.

Also, there are tabs in the ChangeLog. Please leave those out next time around.
Comment 24 Alexey Proskuryakov 2008-12-05 10:06:27 PST
See also: <rdar://problem/5528824> (only accessible to Apple employees, sorry).
Comment 25 Kai Brüning 2008-12-06 02:58:30 PST
Created attachment 25809 [details]
Second attempt to fix the resurrection bug for DOM wrapper objects 

Oops, sorry, I overlooked the Tiger-requirement.

I changed the implementation mostly as suggested.

What I do not understand is the suggestion about using BUILDING_ON_TIGER, because this is a compile time switch and I’d say we need a runtime check to create a single binary which works both under Tiger and Leopard. I implemented such a runtime check, but wasn’t able to test it under Tiger because I am lacking a suitable Tiger system (if only Apple wouldn’t be so restrictive about virtualizing Mac OS X...).

I do not know whether my runtime check is the prefered method for such checks in WebKit. Please advice.
Comment 26 David Kilzer (:ddkilzer) 2008-12-06 03:07:19 PST
(In reply to comment #25)
> What I do not understand is the suggestion about using BUILDING_ON_TIGER,
> because this is a compile time switch and I’d say we need a runtime check to
> create a single binary which works both under Tiger and Leopard. I implemented
> such a runtime check, but wasn’t able to test it under Tiger because I am
> lacking a suitable Tiger system (if only Apple wouldn’t be so restrictive about
> virtualizing Mac OS X...).
> 
> I do not know whether my runtime check is the prefered method for such checks
> in WebKit. Please advice.

There is no requirement for a single compiled binary to work on both Tiger and Leopard.  Separate copies of the WebKit framework are compiled for each operating system.  Please look at other uses of BUILDING_ON_TIGER in WebKit sources.
Comment 27 Kai Brüning 2008-12-07 02:19:49 PST
Created attachment 25829 [details]
Fix for the resurrection bug of DOM wrapper objects using BUILDING_ON_TIGER

Got it - that’s the different perspective of a third party application developer.

The patch now uses BUILDING_ON_TIGER.
Comment 28 Darin Adler 2008-12-07 15:08:23 PST
Comment on attachment 25829 [details]
Fix for the resurrection bug of DOM wrapper objects using BUILDING_ON_TIGER

> +    NSMapTableKeyCallBacks keyCallBacks;
> +    keyCallBacks.hash = 0;      // use pointer hashing
> +    keyCallBacks.isEqual = 0;   // use pointer equality
> +    keyCallBacks.retain = 0;
> +    keyCallBacks.release = 0;
> +    keyCallBacks.describe = 0;
> +    keyCallBacks.notAKeyMarker = 0;
> +    
> +    NSMapTableValueCallBacks valueCallBacks;
> +    valueCallBacks.retain = 0;
> +    valueCallBacks.release = 0;
> +    valueCallBacks.describe = describeDOMWrapper;
> +    
> +    DOMWrapperCache = NSCreateMapTable(keyCallBacks, valueCallBacks, 0);

You should use NSNonOwnedPointerMapKeyCallBacks and NSNonRetainedObjectMapValueCallBacks instead of defining your own.

> +    // Use the GC-aware NSMapTable if available (that is, under Leopard and newer).

We normally prefer to have comments like to explain the old version, and not the new one. Eventually we'll delete the old version, and at that point the comment about "if available" is a little strange; it's better if it just gets deleted along with the old unused code.

> +    DOMWrapperCache = [[NSMapTable alloc]
> +                       initWithKeyOptions:NSPointerFunctionsOpaqueMemory | NSPointerFunctionsOpaquePersonality
> +                       valueOptions:NSPointerFunctionsZeroingWeakMemory | NSPointerFunctionsObjectPersonality
> +                       capacity:0];

This is not typical indentation style for code in the WebKit project; we'd normally just indent subsequent lines 4 spaces and not try to line up with the opening square bracket.

Won't the DOMWrapperCache object itself get garbage collected? Does the GC runtime properly handle an object referenced only by a global variable without an explicit call to CFRetain? In the past we've had to explicitly CFRetain in some cases like this.

> -    if (!DOMWrapperCache)
> -        return nil;
> -    return DOMWrapperCache->get(impl);
> +    return DOMWrapperCache ? static_cast<NSObject*> (NSMapGet(DOMWrapperCache, impl)) : nil;

There shouldn't be a space between the > character and the ( character. It's also unnecessary to change this to ?: since the old if statement would have worked fine as is. Unless there's something better about the new style. I think part of the reason for the old style was to be consistent with the other two functions.

>  void addDOMWrapper(NSObject* wrapper, DOMObjectInternal* impl)
>  {
>      if (!DOMWrapperCache)
> -        DOMWrapperCache = new DOMWrapperMap;
> -    DOMWrapperCache->set(impl, wrapper);
> +        createDOMWrapperCache();
> +
> +    NSMapInsert(DOMWrapperCache, impl, wrapper);
>  }

No need to add the blank line here. Better to stay as close to the old version as possible.

>  void removeDOMWrapper(DOMObjectInternal* impl)
>  {
> -    if (!DOMWrapperCache)
> -        return;
> -    DOMWrapperCache->remove(impl);
> +    assert(DOMWrapperCache);
> +    NSMapRemove(DOMWrapperCache, impl);
>  }

The use of "assert" here is incorrect. We use the ASSERT (capitalized) macro from <wtf/Assertions.h>, not the assert macro from <assert.h>.

The old code was safe to call even if the DOM wrapper hadn't been added earlier. The new version will crash instead. I think it's not likely that the old behavior was ever helpful, but it seems unnecessary to change the code to be stricter in this way.

This is fine, but what about the identical code in WebScriptObject.mm that maintains JSWrapperMap? DOMObject is a subclass of WebScriptObject, so that code will be running too.

How did you test this?

I'm going to say review- mainly because of the JSWrapperMap issue. I also have some questions about the basic assumptions in this patch. I'll follow up with those.
Comment 29 Darin Adler 2008-12-07 15:10:46 PST
(In reply to comment #28)
> I also have
> some questions about the basic assumptions in this patch. I'll follow up with
> those.

Sorry. I take that back. Basic assumptions here are fine. But we do need to do the same thing with WebScriptObject and JSWrapperMap. We might want to put the map creation function in a header somewhere so both files can share the same creation function (with the Tiger vs. post-Tiger cases handled).
Comment 30 Darin Adler 2008-12-07 15:14:08 PST
(In reply to comment #29)
> But we do need to do
> the same thing with WebScriptObject and JSWrapperMap.

And the same thing has to be done a third time with wrapperCache in DOMRGBColor.mm, but there the keys are 32-bit integers rather than pointers.
Comment 31 Kai Brüning 2008-12-12 08:33:33 PST
Created attachment 25982 [details]
Fix for the resurrection bug of a varity of wrapper objects

New fix with the changes according to Darins comments and including both the caches in WebScriptObject.mm and DOMRGBColor.mm.

I followed Darins though to share the creation function for the caches. I put it into DOMInternal.h because this header is included in all three places anyway. The function is parameterized by the key personality.


> You should use NSNonOwnedPointerMapKeyCallBacks and
> NSNonRetainedObjectMapValueCallBacks instead of defining your own.

Sure, done. Just didn’t know of their existence.

> We normally prefer to have comments like to explain the old version, and not
> the new one. Eventually we'll delete the old version, and at that point the
> comment about "if available" is a little strange; it's better if it just gets
> deleted along with the old unused code.

Done.

> This is not typical indentation style for code in the WebKit project; we'd
> normally just indent subsequent lines 4 spaces and not try to line up with the
> opening square bracket.

Done - but it’s ugly now.

> Won't the DOMWrapperCache object itself get garbage collected? Does the GC
> runtime properly handle an object referenced only by a global variable without
> an explicit call to CFRetain? In the past we've had to explicitly CFRetain in
> some cases like this.

From the GC guide:

"Typically, the garbage collector treats global object pointers as root objects and so does not consider them candidates for collection (see “Root Set and Reference Types”). Globals of Objective-C objects or other __strong pointer variables, and function-level static variables, are written to with a write-barrier. Note that although this is true for Objective-C or Objective-C++, writing to globals from C or C++ is not supported."

Therefore we should be fine as long as the cache objects are created in .mm files.

> There shouldn't be a space between the > character and the ( character. 

Done.

> It's also unnecessary to change this to ?: since the old if statement would have
> worked fine as is. Unless there's something better about the new style. I think
> part of the reason for the old style was to be consistent with the other two
> functions.

Sure - this happend because I took a detour in the code. Changed it back.

> The use of "assert" here is incorrect. We use the ASSERT (capitalized) macro
> from <wtf/Assertions.h>, not the assert macro from <assert.h>.

Copied it from a few lines below. But now it’s gone again anyway.

> The old code was safe to call even if the DOM wrapper hadn't been added
> earlier. The new version will crash instead. I think it's not likely that the
> old behavior was ever helpful, but it seems unnecessary to change the code to
> be stricter in this way.

Makes sense - done.

> This is fine, but what about the identical code in WebScriptObject.mm that
> maintains JSWrapperMap? DOMObject is a subclass of WebScriptObject, so that
> code will be running too.

Done.

> How did you test this?

Mainly by using the Leopard build with my project, which used to crash due to this bug before and now no longer does.
The Tiger build is untested because unfortunately I do not have a suitable Tiger system.
Is there any way to build for Tiger under Leopard?
Comment 32 Darin Adler 2008-12-12 09:23:15 PST
Comment on attachment 25982 [details]
Fix for the resurrection bug of a varity of wrapper objects

Looking good.

> +2008-12-12  Kai  <set EMAIL_ADDRESS environment variable>

This needs your full name and email address.

> +        WARNING: NO TEST CASES ADDED OR CHANGED

You should remove this warning and replace it with the explanation of why there's no regression test.

The change log should also mention the URL of the bug.

> +#ifdef BUILDING_ON_TIGER
> +// Declarations needed by createWrapperCache()
> +enum {
> +    NSPointerFunctionsOpaquePersonality  = (1 << 8),
> +    NSPointerFunctionsIntegerPersonality = (5 << 8)
> +};
> +typedef NSUInteger NSPointerFunctionsOptions;
> +#endif

Will these constants actually work on Tiger? Can they be defined in terms of something in a header that will be present on Tiger?

I think we should just have two functions, one for object keys and one for integer keys. Inside the file they can both call a common function.

Or if you really think that one function is best, we should just use a boolean (false for objects, true for integers) or our own enum for the argument to our createWrapperCache function; if would be good to keep this relatively messy thing out of the header.

The patch otherwise looks great. I think I'm going to say review+ and land this myself with the fixes above.
Comment 33 Darin Adler 2008-12-12 09:23:42 PST
<rdar://problem/6441200>
Comment 34 Darin Adler 2008-12-12 10:08:08 PST
http://trac.webkit.org/changeset/39246