Bug 31633 - Need to ensure that Document::postTask does not provide the Task with a dangling pointer to destroyed Document
: Need to ensure that Document::postTask does not provide the Task with a dangl...
Status: RESOLVED FIXED
: WebKit
WebCore Misc.
: 528+ (Nightly build)
: All All
: P2 Normal
Assigned To:
:
:
:
:
  Show dependency treegraph
 
Reported: 2009-11-18 12:12 PST by
Modified: 2010-01-15 13:53 PST (History)


Attachments
Patch. (4.92 KB, patch)
2010-01-13 18:04 PST, Dmitry Titov
darin: review+
dimich: commit‑queue-
Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2009-11-18 12:12:52 PST
Based on a discussion in bug 31615, the Document::postTask, when posting a Task from non-main thread, relies on a Document still be alive when the Task is dispatched on the main thread. This was true for dedicated workers but since then we have shared workers that can outlive a document while using it for loading (XHR and importScripts).

It might be that because of serialization of all actions through main run loop we don't have much troubles right now, but this can be more of a happy accident. Need to review the design.
------- Comment #1 From 2009-12-04 12:49:19 PST -------
BTW, the situation is worse for dedicated workers than it is for SharedWorkers.

Closing a document causes a message to be sent to dedicated workers telling them to shut down, but they can still initiate Document::postTask() calls while that message floats around in the queue.

SharedWorkers can "outlive" their initial parent document, but once the document is detached the reference to it is removed from the SharedWorkerRepository, and so no further postTask() invocations can be sent to it. So the Document code has to handle messages that are already enqueued, but no more messages will be enqueued after the fact.
------- Comment #2 From 2009-12-04 13:10:39 PST -------
As discussed in bug 31615, worker messaging proxy is supposed to keep the document alive for as long as there can still be messages coming from dedicated workers.
------- Comment #3 From 2009-12-04 13:34:34 PST -------
Agreed, so we should be good with dedicated workers.

Following up with a bit more detail on my previous response - Document::detach() calls SharedWorkerRepository::documentDetached(), so the end result is that Documents are guaranteed not to receive any further tasks from SharedWorkers after Document::detach() exits.

However, there doesn't seem to be anything preventing the Document from being freed immediately after Document.detach() is called (I think) - FrameLoader::clear() invokes document->detach(), then shortly thereafter invokes m_frame->setDocument(0) which seems like it could free the document.

I know lots of ideas have been kicked around for this, but I'll throw another
one out there:

1) State that it's illegal to call Document.postTask() after Document.detach()
is invoked, and add an assertion to Document.postTask() to enforce this (possibly just in the "not called from the main thread" case).

2) At the end of Document.detach(), invoke Document::addRef() to increase the
ref count, then invoke postTask() to execute a task to decrement the ref count.
This ensures that any pending tasks for that Document get dispatched before the Document goes away.

Alternatively, we could get rid of #1 and push the code for #2 into
SharedWorkerRepository::documentDetached(), thereby making the worker code
itself responsible for keeping the Document alive until any pending x-thread
tasks have been dispatched. I actually think that's probably the simplest
solution, and is more in line with what we already do for dedicated workers.
------- Comment #4 From 2009-12-04 14:15:27 PST -------
I have some difficulty understanding the proposal.

> 1) State that it's illegal to call Document.postTask() after Document.detach()
> is invoked,

Is it correct that these methods are called from different threads? How can one thread make another not do something, other than by posting a message to that effect? How will it know when the message has been received, processed, and taken effect?

Enforcing behavior with assertions is good, but I'd like to understand what mechanism guarantees this behavior.
------- Comment #5 From 2009-12-04 14:30:57 PST -------
In the case of SharedWorkers, it's inherently enforced that Document->postTask() is only called while holding a mutex that is also grabbed by SharedWorkerRepository::documentDetached() (there's a similar thing happening with MessagePorts with an internal mutex). So you don't have to send a message - merely have an internal reference which is cleared/accessed via a mutex, which we currently do have.
------- Comment #6 From 2009-12-04 15:11:37 PST -------
In the case of dedicated workers, I'm finding comfort in the fact that we can switch to a lock-free queue if we need to.

It seems unfortunate that shared workers need a high level mutex that affects both queues and other code. Maybe that's the only way to implement them, though.
------- Comment #7 From 2009-12-04 15:26:09 PST -------
Given that shared workers only communicate with the parent thread in order to coordinate network access, worker startup and document shutdown, I think the overhead of a mutex is negligible compared to the frequency/cost of those operations.
------- Comment #8 From 2010-01-09 14:23:57 PST -------
Perhaps a solution that is more generic and does not rely on analysis of actual codepath for each use case could be more reliable. Other parts of WebKit code will eventually use ScriptExecutionContext::postTask() and the authors of that code might be unaware of the necessary conditions under which this API is safe. Database in Workers (bug 22725) is one example of code that will likely use it.

Andrew and I discussed potential solutions lately. The one we think is promising is along these lines:

- Have a thread-safe-ref-counted class that keeps a raw pointer to a Document (lets call it DocumentWeakRef). 
- Keep a RefPtr to it from Document. The DocumentWeakRef is created on demand and then never replaced for a given Document.
- Give RefPtr to DocumentWeakRef to each Task.
- When Document dies, it nullifies the pointer to itself inside DocumentWeakRef.
- When Task comes from queues to be executed, if the document pointer in DocumentWeakRef is 0, it does not call Task::performTask(document).

It adds a thread-safe refcount per task creation/destruction. On the other hand, it is only needed for cases when non-main thread posts a task to a Document. For example, posting a message to a Worker is safe because Workers control their run loop.

If this sounds good in general, I'll create a patch.
------- Comment #9 From 2010-01-09 18:47:07 PST -------
The design you describe is quite similar to the way nodes used to point to documents. It's more efficient for document destruction than a design where we have to keep track of multiple weak document pointers and null out each of one of them, and I like that.

We need to be sure that people only read and write the document pointer in one of these handles (DocumentWeakRef objects) on a single thread, the same one where it's legal to ref/deref and destroy documents, so I'd like to see assertions that check that.

Other than that, I can't think of any obvious disadvantages. Lets do it.
------- Comment #10 From 2010-01-13 18:04:33 PST -------
Created an attachment (id=46531) [details]
Patch.
------- Comment #11 From 2010-01-14 11:25:19 PST -------
(From update of attachment 46531 [details])
> +DocumentWeakReference::DocumentWeakReference(Document* doc)
> +    : m_document(doc)
> +{
> +    ASSERT(isMainThread());
> +}
> +
> +Document* DocumentWeakReference::document()
> +{
> +    ASSERT(isMainThread());
> +    return m_document;
> +}
> +
> +void DocumentWeakReference::clear()
> +{
> +    ASSERT(isMainThread());
> +    m_document = 0;
> +}

These super-short functions seem like they ought to be inlined, unless there is a reason not to. If they are only used within this file, then I suggest just marking them inline. If they are used elsewhere we could consider putting them in the header. Perhaps it's the main thread bit that can't be easily be put in the header?

It's a bit strange to have this class's function definitions appear in the file above the definition of the global variable, documentsThatNeedStyleRecalc. I'd prefer they be further down. Maybe even at the end of the file instead of the start.

Argument name should be "document", not "doc".

> +    if (m_weakReference)
> +        m_weakReference->clear();

No need for the null check here.

> +    PerformTaskContext(PassRefPtr<DocumentWeakReference> documentWeakRef, PassOwnPtr<ScriptExecutionContext::Task> task)
> +        : documentWeakRef(documentWeakRef)
>          , task(task)
>      {
>      }

I suggest naming the argument "document" or "documentReference" instead of documentWeakRef.

> +    RefPtr<DocumentWeakReference> documentWeakRef;

I think you should name data member "document" or "documentReference"; no need to repeat the word "weak" or to abbreviate "reference".

> +    if (Document* doc = ptctx->documentWeakRef->document())
> +        ptctx->task->performTask(doc);

Please use the name "document" instead of "doc" here for the local variable.

I also wise the variable "ptctx" was instead named "context".

> +        ASSERT(m_weakReference);
> +        callOnMainThread(performTask, new PerformTaskContext(m_weakReference, task));

I don't think you need this assertion.

> +    static PassRefPtr<DocumentWeakReference> create(Document* doc)
> +    {
> +        return adoptRef(new DocumentWeakReference(doc));
> +    }

Please use the argument name "document" instead of "doc".

I'm going to say review+ because all my comments are essentially stylistic ones; please consider addressing them anyway. I'm disappointed we can't test it!
------- Comment #12 From 2010-01-14 11:26:15 PST -------
(From update of attachment 46531 [details])
Is it always OK to not call the task at all if the document is 0? What if the task needs to deallocate some memory or something? Maybe instead we should call with Document* of 0 and have the tasks handle that case?
------- Comment #13 From 2010-01-14 15:56:18 PST -------
(In reply to comment #12)
> (From update of attachment 46531 [details] [details])
> Is it always OK to not call the task at all if the document is 0? What if the
> task needs to deallocate some memory or something? Maybe instead we should call
> with Document* of 0 and have the tasks handle that case?

I was thinking about having another virtual method, like 'doCleanup()' that would be always called after performTask(document), or alone if the document is 0. This way, even if document is 0 there is always a method to be called and do document-independent things. The reason I didn't add it was that I didn't find a case where it would be actually needed.

Maybe later if we find a case which needs it?
------- Comment #14 From 2010-01-15 10:03:32 PST -------
(In reply to comment #12)
> (From update of attachment 46531 [details] [details])
> Is it always OK to not call the task at all if the document is 0? What if the
> task needs to deallocate some memory or something? 

Tasks can use their destructor for this (and most do, via instance members of type RefPtr, etc). Seems more fragile to pass in a null document pointer (so a poorly written performTask() handler crashes the browser) than it would be to just leak memory.

I don't think that an explicit doCleanup() function is necessary. If an instance really needed to know that performTask() was not called, they could easily accomplish this with a boolean and a check in their destructor.
------- Comment #15 From 2010-01-15 11:34:42 PST -------
(In reply to comment #14)
> Tasks can use their destructor for this (and most do, via instance members of
> type RefPtr, etc).

Good answer.

> Seems more fragile to pass in a null document pointer (so a
> poorly written performTask() handler crashes the browser) than it would be to
> just leak memory.

I don’t agree with this sentence, but it doesn’t matter. The first sentence was sufficient to convince me the design is good.
------- Comment #16 From 2010-01-15 13:53:07 PST -------
Fixed all the style issues Darin suggested.

Landed: http://trac.webkit.org/changeset/53345