Bug 28697 - WebKit crash on WebCore::Node::nodeIndex()
Summary: WebKit crash on WebCore::Node::nodeIndex()
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All Other
: P1 Normal
Assignee: Alexey Proskuryakov
URL:
Keywords: HasReduction, InRadar
Depends on:
Blocks:
 
Reported: 2009-08-24 17:27 PDT by Yaar Schnitman
Modified: 2010-05-10 15:14 PDT (History)
10 users (show)

See Also:


Attachments
more crash reports (38.62 KB, text/html)
2009-08-24 17:35 PDT, Yaar Schnitman
no flags Details
Crash seen in Mac Safari (31.45 KB, text/plain)
2009-11-05 10:35 PST, Eric Seidel (no email)
no flags Details
reduction. Not a layout test yet. (1020 bytes, text/html)
2009-11-05 14:11 PST, Eric Seidel (no email)
no flags Details
Slightly better reduction (1.41 KB, text/html)
2009-11-05 14:24 PST, Eric Seidel (no email)
no flags Details
Even better reduction with more source comments. (1.47 KB, text/html)
2009-11-05 14:40 PST, Eric Seidel (no email)
no flags Details
Even better reduction which highlights that the document.write is executed twice. (1.38 KB, text/html)
2009-11-05 15:07 PST, Eric Seidel (no email)
no flags Details
A LayoutTest ready reduction (will crash Safari) (899 bytes, text/plain)
2009-11-25 21:30 PST, Eric Seidel (no email)
no flags Details
First attempt at a fix. (10.96 KB, patch)
2009-11-29 21:28 PST, Eric Seidel (no email)
darin: review-
Details | Formatted Diff | Diff
a more reliable test (will crash) (1.01 KB, text/html)
2010-05-07 17:13 PDT, Alexey Proskuryakov
no flags Details
proposed fix (11.08 KB, patch)
2010-05-10 14:05 PDT, Alexey Proskuryakov
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yaar Schnitman 2009-08-24 17:27:42 PDT
This bug was detected on Chromium: http://code.google.com/p/chromium/issues/detail?id=17023

The chromium crash reporting tool identified a few different crash stacks that fail on the same method. Here is one such stack trace:

0x6bc0fc59 [chrome.dll- node.cpp:781]WebCore::Node::nodeIndex()
0x6bc00f1e [chrome.dll- rangeboundarypoint.h:100] WebCore::RangeBoundaryPoint::offset()
0x6bc0360f [chrome.dll- range.cpp:1762] WebCore::boundaryTextNodesMerged
0x6bbe863e [chrome.dll- document.cpp:2718] WebCore::Document::textNodesMerged(WebCore::Text *,unsigned int)            
0x6bc0f9d2 [chrome.dll - node.cpp:622] WebCore::Node::normalize()
0x6bc838cf [chrome.dll- v8node.cpp:236] WebCore::NodeInternal::normalizeCallback
0x6c25d0a5 [chrome.dll- builtins.cc:381] v8::internal::Builtin_HandleApiCall
0x6c25a1cf [chrome.dll + 0x006ba1cf] 
0x030e6d9c

I haven't been able to reproduce this crash, but I suspect this had to do with simultaneous Range Selection and Node editing.
Comment 1 Yaar Schnitman 2009-08-24 17:35:54 PDT
Created attachment 38516 [details]
more crash reports

Crash reports suggest that user interaction involving range selection and dom change trigger the crashes.
Comment 2 Yaar Schnitman 2009-08-24 17:39:52 PDT
Digging into the bug, I suspect that the crashes happen when m_childBeforeBoundary is a deallocated node, or a sibling of a deallocated node. 

I believe that there are code paths that deallocate nodes without notifying the range objects. 

One thing that caught my eye is that RangeBoundaryPoint::m_childBeforeBoundary is a Node *, while m_containerNode is a RefPtr<Node>. Shouldn't m_childBeforeBoundary be RefPtr<Node> too?
Comment 3 Yaar Schnitman 2009-08-24 17:40:32 PDT
Might be related to: https://bugs.webkit.org/show_bug.cgi?id=20471
Comment 4 Yaar Schnitman 2009-09-17 11:47:21 PDT
Also crashes on mac:

0x029731e0	 Node.h:124]	 WebCore::Node::nodeIndex() const
0x029892ff	 RangeBoundaryPoint.h:89]	 WebCore::Range::firstNode() const
0x02e4cfed	 webframe_impl.cc:1663]	 WebFrameImpl::SetFindEndstateFocusAndSelection()
0x02e4d237	 webframe_impl.cc:1191]	 WebFrameImpl::stopFinding(bool)
0x02117a31	 render_view.cc:759]	 RenderView::OnStopFinding(bool)
0x021305b0	 ../base/tuple.h:422]	 bool IPC::MessageWithTuple<Tuple1<bool> >::Dispatch<RenderView, void (RenderView::*)(bool)>(IPC::Message const*, RenderView*, void (RenderView::*)(bool))
0x0212adbd	 render_view.cc:376]	 RenderView::OnMessageReceived(IPC::Message const&)
0x01d95c90	 message_router.cc:41]	 MessageRouter::RouteMessage(IPC::Message const&)
0x025820aa	 ../base/tuple.h:422]	 RunnableMethod<IPC::ChannelProxy::Context, void (IPC::ChannelProxy::Context::*)(IPC::Message const&), Tuple1<IPC::Message> >::Run()
0x0218ec4a	 message_loop.cc:314]	 MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&)
0x0218effa	 message_loop.cc:429]	 MessageLoop::DoWork()
0x0219276b	 message_pump_mac.mm:217]	 base::MessagePumpCFRunLoopBase::RunWorkSource(void*)
Comment 5 Yaar Schnitman 2009-09-17 11:47:55 PDT
Since mac & windows, changes platform to all.
Comment 6 Alexey Proskuryakov 2009-09-18 08:21:19 PDT
I don't think it's very useful to aggregate crashes in nodeIndex() - as you write in comment 2, these crashes occur when it's called incorrectly, which can happen in entirely different scenarios.
Comment 7 Yaar Schnitman 2009-10-05 14:45:07 PDT
Not being called correctly is just a hypothesis that has not been proven. I collected the crash reports since they are the only clues for reproducing this (and the last one proved its also on mac).
Comment 8 Eric Seidel (no email) 2009-11-05 10:35:38 PST
Created attachment 42577 [details]
Crash seen in Mac Safari
Comment 9 Eric Seidel (no email) 2009-11-05 10:36:05 PST
Reproducible crasher with reduction.  Marking P1.
Comment 10 Eric Seidel (no email) 2009-11-05 11:02:13 PST
The reduction is in the chromium bug.  I am attempting to reduce it further and attach it here.
Comment 11 Eric Seidel (no email) 2009-11-05 14:11:57 PST
Created attachment 42594 [details]
reduction.  Not a layout test yet.
Comment 12 Eric Seidel (no email) 2009-11-05 14:24:24 PST
Created attachment 42596 [details]
Slightly better reduction
Comment 13 Eric Seidel (no email) 2009-11-05 14:40:24 PST
Created attachment 42598 [details]
Even better reduction with more source comments.

I think there are multiple bugs at play here.  I expect that the range crasher could be reproduced w/o needing any copy event, but I've not been able to do so yet.

I think the cloneContents() is cloneing the <script> tag as well.  I suspect the document.write() From the script tag is executing during the appendChild(), and possibly causing the rest of the document to "fail to insert".  Thus the Nodes may be being "removed from the document" in a way that the Range does not expect and thus the range is never getting updated.

I'm not sure yet.  Darin might have a theory.  I'll CC him.
Comment 14 Eric Seidel (no email) 2009-11-05 15:07:27 PST
Created attachment 42601 [details]
Even better reduction which highlights that the document.write is executed twice.
Comment 15 Eric Seidel (no email) 2009-11-25 19:58:03 PST
OK, so the Range definitely contains a pointer to a deleted Node.  My current theory is that Document::removeChildren() (which is actually Container::removeChildren()) called from Document::implicitOpen() is removing all the root children, but that the root's grandchildren are not having willRemoveChild called, and thus the Document never learns that they're being removed and thus ranges which point to anything deep in the tree end up with invalid Node pointers.

I'll work to validate this theory tomorrow.
Comment 16 Eric Seidel (no email) 2009-11-25 20:18:14 PST
Ok, I found the bug.

In removeChildren():

    // Do any prep work needed before actually starting to detach
    // and remove... e.g. stop loading frames, fire unload events.
    // FIXME: Adding new children from event handlers can cause an infinite loop here.
    for (RefPtr<Node> n = m_firstChild; n; n = n->nextSibling())
        willRemoveChild(n.get());

willRemoveChild() is correctly calling void Range::nodeWillBeRemoved(Node* node)

However, since it's removing all the nodes at once, when the matching node is found:
            boundary.setToBeforeChild(nodeToBeRemoved);
is called, which does:
    m_childBeforeBoundary = child->previousSibling();

however, in this case, nodeToBeRemoved->previousSibling() is also being removed (and in fact, just had willRemoveChild() called for it) but since removeChildren is iterating from first to last node, Range will never notice, and happily set m_childBeforeBoundary to a node which is about to be removed!

I'm not yet sure what the proper fix is.
Comment 17 Eric Seidel (no email) 2009-11-25 20:29:59 PST
    document->nodeWillBeRemoved(child);
is just not designed to be called repeatedly like this.  NodeIterator looks like it could have the same problem depending on if NodePointer isPointerBeforeNode or not.

One solution would be to move the     document->nodeWillBeRemoved(child); call out of dispatchChildRemovalEvents and put it next to where the node will actually be removed.
Comment 18 Eric Seidel (no email) 2009-11-25 21:30:32 PST
Created attachment 43893 [details]
A LayoutTest ready reduction (will crash Safari)
Comment 19 Eric Seidel (no email) 2009-11-25 21:40:16 PST
The "easy" solution would be to get rid of removeChildren() all together.  Since the model of removing lots of children at once does not fit with the:
    document->nodeWillBeRemoved(child);

model.  nodeWillBeRemoved assumes that the node in question is the *only* node about to be removed, and that all siblings and parents are going to stay around.

We could have a separate callback in this case which is nodesChildrenWillBeRemoved?  That would fix the Range case.  Really not sure.  Going to sleep on it.  I'm still curious what Darin's opinion here is.
Comment 20 Eric Seidel (no email) 2009-11-28 15:01:43 PST
So this crash happens anytime we have a Range which points into a subtree which is removed by removeChildren().  Which is actually a rather common call, so although I don't think we've seen a lot of these crashers, it could be rather common.  I'll see if I can come up with a fix this evening.
Comment 21 Eric Seidel (no email) 2009-11-29 21:28:23 PST
Created attachment 44006 [details]
First attempt at a fix.
Comment 22 Eric Seidel (no email) 2009-11-29 21:31:16 PST
Comment on attachment 44006 [details]
First attempt at a fix.

I don't really like this fix because of all the duplicated code.  I also think that this needs a NodeIterator test case before it can be landed, but I'm posting it for review now in case those playing along at home have fundamental architectural complaints that I need to address.  We could land this as-is, but I also think it could definitely be better. I'm most interested in hearing from Darin Adler, since he has worked in this code extensively.
Comment 23 Adam Barth 2009-11-30 12:49:20 PST
style-queue ran check-webkit-style on attachment 44006 [details] without any errors.
Comment 24 Darin Adler 2009-11-30 16:06:35 PST
Comment on attachment 44006 [details]
First attempt at a fix.

> -static ExceptionCode willRemoveChild(Node *child)
> +static void willRemoveChild(const RefPtr<Node>& child)

There's no reason to make this a const RefPtr<Node>& instead of a Node*. I suggest you change it back.

> +static void willRemoveChildrenFromNode(const RefPtr<ContainerNode> node)

I think this should be called willRemoveChildren. I think the argument should be a ContainerNode* and be named container.

> +    // FIXME: Adding new children from event handlers can cause an infinite loop here.
> +    for (RefPtr<Node> child = node->firstChild(); child; child = child->nextSibling()) {
> +        // fire removed from document mutation events.
> +        dispatchChildRemovalEvents(child.get());
> +
> +        if (child->attached())
> +            child->willRemove();
> +    }

Instead of keeping this possible bug alive, maybe we should just gather all the children into a local stack buffer. You could do this:

    Vector<RefPtr<Node>, 32> children;
    for (Node* child = node->firstChild(); child; child = child->nextSibling())
        children.append(child);

And then use the children array for the events and such. But you could do that in another patch I guess.

> +    ASSERT(!eventDispatchForbidden());
> +
>      RefPtr<Node> c = child;
>      RefPtr<Document> document = child->document();
>  
> -    // update auxiliary doc info (e.g. iterators) to note that node is being removed
> -    document->nodeWillBeRemoved(child);
> -
> -    document->incDOMTreeVersion();
> +    // document->incDOMTreeVersion() and document->nodeWillBeRemoved() should be
> +    // handled by the caller as this function may be called in a loop.

This comment would make sense in a change log entry, but does not make sense in the code that remains. If you want to add a comment about calling nodeWillBeRemoved and incDOMTreeVersion, that should probably go before the function rather than in the middle of the implementation.

> +    for (HashSet<NodeIterator*>::const_iterator it = m_nodeIterators.begin(); it != nodeIteratorsEnd; ++it)
> +        for (Node* n = container->firstChild(); n; n = n->nextSibling())
> +            (*it)->nodeWillBeRemoved(n);

This needs braces.

review- because I think you should do at least one tweak
Comment 25 Alexey Proskuryakov 2010-05-05 12:02:25 PDT
<rdar://problem/7946578>
Comment 26 Alexey Proskuryakov 2010-05-05 12:05:29 PDT
Eric, do you intend to finish this fix in the near future?
Comment 27 Eric Seidel (no email) 2010-05-05 13:35:52 PDT
I'm not currently working on this, nor do I plan to return to it in the near future.
Comment 28 Alexey Proskuryakov 2010-05-07 17:13:58 PDT
Created attachment 55438 [details]
a more reliable test (will crash)

The test from proposed patch didn't always crash for me, but this one does.
Comment 29 Alexey Proskuryakov 2010-05-10 14:05:10 PDT
Created attachment 55600 [details]
proposed fix

Addressed Darin comments, fixed the code to avoid entering an infinite loop.
Comment 30 Eric Seidel (no email) 2010-05-10 15:09:32 PDT
Thank you ap.
Comment 31 Alexey Proskuryakov 2010-05-10 15:14:50 PDT
Committed <http://trac.webkit.org/changeset/59098>.

Pelase file separate bugs for any remaining crashes in Node::nodeIndex().