Bug 87165 - Crash in fast/files/read tests during Garbage Collection
Summary: Crash in fast/files/read tests during Garbage Collection
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2012-05-22 14:16 PDT by Michael Saboff
Modified: 2012-06-11 15:29 PDT (History)
2 users (show)

See Also:


Attachments
Patch (5.64 KB, patch)
2012-05-22 14:25 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Patch (13.87 KB, patch)
2012-06-06 11:36 PDT, Eric U.
no flags Details | Formatted Diff | Diff
Patch (13.92 KB, patch)
2012-06-06 16:12 PDT, Eric U.
msaboff: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2012-05-22 14:16:35 PDT
The file reader tests fast/files/read-blob-async.html and fast/files/read-file-async.html crash when COLLECT_ON_EVERY_ALLOCATION is turned on.

It appears that the overridden FileReader::hasPendingActivity() returns false when m_state is set to DONE, which can allow the related JSFileReader object to be collected.  From inspection, FileWriter has a similar issue.

This can be fixed by eliminating the overriding hasPendingActivity() and using the ActiveDOMObject::hasPendingActivity() and placing setPendingActivity() /unsetPendingActivity() at appropriate places in the code.

<rdar://problem/11501481>
Comment 1 Michael Saboff 2012-05-22 14:25:03 PDT
Created attachment 143361 [details]
Patch
Comment 2 Alexey Proskuryakov 2012-05-22 15:52:23 PDT
What is the situation where m_state is set to DONE, script doesn't have a reference to FileReader, and yet the wrapper needs to be protected? Does the collection happen while a callback is executed?
Comment 3 Michael Saboff 2012-05-22 20:52:49 PDT
(In reply to comment #2)
> What is the situation where m_state is set to DONE, script doesn't have a reference to FileReader, and yet the wrapper needs to be protected? Does the collection happen while a callback is executed?

Yes, a collection can happen when one of the event handlers is running.
Comment 4 Alexey Proskuryakov 2012-05-23 10:49:59 PDT
Interesting. I wonder why the object's wrapper is not protected automatically when it's a target of an event that's being handled. Could this be a bug elsewhere?
Comment 5 Michael Saboff 2012-05-23 11:03:02 PDT
(In reply to comment #4)
> Interesting. I wonder why the object's wrapper is not protected automatically when it's a target of an event that's being handled. Could this be a bug elsewhere?

The problem occurs because the garbage collection happens during GC after the event handler start executing. Because hasPendingActivity() returns false, the JSFileReader and all the event handlers have been GC'ed.  When we invoke the handlers, we need a JSFileReader object and we create a new one, since the old one is gone.  The new one now references the handlers via the FileReader. The error is actually that these handlers themselves were garbage collected.  In all the traces I've seen, we actually die because the ScopeChainNode attached to the JSFunction object has been collected and put on a free list (zapped).  We fail a GC sanity check when visiting the ScopeChainNode.  The JSFunction has been collected, but its memory hasn't been reused.

Remember, we need to induce this crash by turning on COLLECT_ON_EVERY_ALLOCATION, (which really doesn't GC on every allocation but makes GCs happen much more often).  The timing window is small and can be induced by COLLECT_ON_EVERY_ALLOCATION, but it is still a vulnerability.

I think there is a possibility that this issue exists in other code. The pattern to look for is executing a handler after the object's hasPendingActivity() would return false.  I will be looking for this pattern in other code.
Comment 6 Alexey Proskuryakov 2012-05-23 11:36:21 PDT
My question is why JSFileReader is not protected through event.target attribute. I think that it should be.
Comment 7 Alexey Proskuryakov 2012-05-23 12:16:53 PDT
Comment on attachment 143361 [details]
Patch

Discussed this in person. The issue is that garbage collection can happen after state is set to DONE before window.event.target chain is fully established.

Also, the suggested design is similar to what XMLHttpRequest does, which is good for consistency.

r=me, but please add comments explaining why unsetPendingActivity are in the places they are in.
Comment 8 Michael Saboff 2012-05-23 14:58:57 PDT
Committed r118253: <http://trac.webkit.org/changeset/118253>
Comment 9 Eric U. 2012-06-05 10:10:02 PDT
Comment on attachment 143361 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=143361&action=review

I think the safe way to do this is:
	Call setPendingActivity in doOperation, at the calls to writer()->write() and writer()->truncate().
	Don't call setPendingActivity for an abort.
	Call unsetPendingActivity in didFail, didTruncate, and didWrite /after/ the check for m_operationInProgress == OperationAbort.
	Call unsetPendingActivity in completeAbort /after/ the call to doOperation.

This needs a test, too, ideally one that would have crashed or leaked given what was just committed.

> Source/WebCore/Modules/filesystem/FileWriter.cpp:94
> +    setPendingActivity(this);

I'm sorry for the late response, but I don't think this is correct.  You're incrementing the refcount when stop() is called, but if you've gotten to this code, there's a write in progress, so the refcount was already incremented.

> Source/WebCore/Modules/filesystem/FileWriter.cpp:301
> +    unsetPendingActivity(this);

You're dereffing the object here, but if this code is hit as the result of an abort call, there may be an outstanding didFail call that will arrive shortly.
I say "may" because the abort may have been called when m_operationPending was already abort [if the user did write-abort-write-abort before the first abort completed].  If this deref kills the object, we'll crash when the didFail hits.
Comment 10 Eric U. 2012-06-05 10:10:42 PDT
See code review comments.
Comment 11 Michael Saboff 2012-06-05 11:14:15 PDT
(In reply to comment #9)
> (From update of attachment 143361 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=143361&action=review
> 
> I think the safe way to do this is:
>     Call setPendingActivity in doOperation, at the calls to writer()->write() and writer()->truncate().
>     Don't call setPendingActivity for an abort.
>     Call unsetPendingActivity in didFail, didTruncate, and didWrite /after/ the check for m_operationInProgress == OperationAbort.
>     Call unsetPendingActivity in completeAbort /after/ the call to doOperation.
> 
> This needs a test, too, ideally one that would have crashed or leaked given what was just committed.
> 
> > Source/WebCore/Modules/filesystem/FileWriter.cpp:94
> > +    setPendingActivity(this);
> 
> I'm sorry for the late response, but I don't think this is correct.  You're incrementing the refcount when stop() is called, but if you've gotten to this code, there's a write in progress, so the refcount was already incremented.
> 
> > Source/WebCore/Modules/filesystem/FileWriter.cpp:301
> > +    unsetPendingActivity(this);
> 
> You're dereffing the object here, but if this code is hit as the result of an abort call, there may be an outstanding didFail call that will arrive shortly.
> I say "may" because the abort may have been called when m_operationPending was already abort [if the user did write-abort-write-abort before the first abort completed].  If this deref kills the object, we'll crash when the didFail hits.

Thanks for the suggestions.  I will investigate these suggested changes and improving the tests.
Comment 12 Michael Saboff 2012-06-05 11:56:17 PDT
It seems to me that the placement for unsetPendingActivity(this) in didWrite() needs to be inside the if (complete && numAborts == m_numAborts) then block after the signalCompletion call.  The other unsetPendingActivity() calls need to be after signalCompletion().  That way the event handlers are called when hasPendingActivity() returns true.
Comment 13 Eric U. 2012-06-05 13:14:37 PDT
(In reply to comment #12)
> It seems to me that the placement for unsetPendingActivity(this) in didWrite() needs to be inside the if (complete && numAborts == m_numAborts) then block after the signalCompletion call.  The other unsetPendingActivity() calls need to be after signalCompletion().  That way the event handlers are called when hasPendingActivity() returns true.

You do want to restrict it to if (complete), but I don't think you want to guard on numAborts.  You don't really care about those, because: you're just decrementing for the current operation's increment; aborting in the onwrite handler won't actually result in a didFail call, since there wasn't a write in progress, because complete was true; if there's both an abort and a write call during the onwrite handler...damn, that looks like a bug.  We'll set m_operationInProgress to OperationAbort, but then we'll never get a didFail call.

I think we want to set m_operationInProgress to OperationNone before we fire the progress event.

But anyway, the new write call will do its own incrementation, and we'll do our own decrementation.

Let me throw together a patch; this is getting to messy to do in English.
Comment 14 Eric U. 2012-06-06 11:36:12 PDT
Created attachment 146076 [details]
Patch
Comment 15 Eric U. 2012-06-06 13:06:59 PDT
(In reply to comment #14)
> Created an attachment (id=146076) [details]
> Patch

This patch has a test for the bug I talked about in which the didFail never comes; it doesn't have a test for extra refs.  I'm not sure we have any way of detecting extra refs in a layout test, and hitting the extra deref may be tricky/flaky because of the timing.
Comment 16 Eric U. 2012-06-06 16:12:20 PDT
Created attachment 146138 [details]
Patch
Comment 17 Eric U. 2012-06-06 16:14:30 PDT
Thanks Michael--I had to make a one-line change, though.  In doOperation, we shouldn't be clearing m_operationInProgress during abort if it's already abort.  This comes up in nested aborts [see fast/filesystem/file-writer-abort-depth.html].

If I hear no screams, I'll land the fixed version on Monday.
Comment 18 Eric U. 2012-06-11 15:29:49 PDT
Committed r120013: <http://trac.webkit.org/changeset/120013>