26044 – Crash at Node::nodeIndex()

RESOLVED FIXED 26044

Crash at Node::nodeIndex()

https://bugs.webkit.org/show_bug.cgi?id=26044

Summary Crash at Node::nodeIndex()

Dimitri Glazkov (Google)

Reported 2009-05-27 11:24:05 PDT

Looking at various crash reports, I am seeing a fairly common crash that looks like: *CRASHED* (EXC_BAD_ACCESS / KERN_INVALID_ADDRESS @0x90e43c50) WebCore::Node::nodeIndex() const WebCore::Range::textNodesMerged(WebCore::NodeWithIndex&, unsigned int) WebCore::Document::textNodesMerged(WebCore::Text*, unsigned int) WebCore::Node::normalize() WebCore::jsNodePrototypeFunctionNormalize(JSC::ExecState*, JSC::JSObject*, JSC::JSValuePtr, JSC::ArgList const&) JSC::Interpreter::cti_op_call_NotJSFunction(void*, ...) or for V8: *CRASHED* (EXCEPTION_ACCESS_VIOLATION @0x00720078) WebCore::Node::nodeIndex() WebCore::RangeBoundaryPoint::offset() WebCore::boundaryTextNodesMerged WebCore::Node::normalize() WebCore::NodeInternal::normalizeCallback v8::internal::Builtin_HandleApiCall v8::internal::JavaScriptFrame::is_at_function() It smells like a crash that occurs while the victim is using an RTE of some sort (TinyMCE or somesuch), given that we die in the .normalize() call. Sadly, I don't have any URLs or test cases at the moment. Any points/ideas on what could be happening are appreciated.

Attachments
One attempt at a test case. (2.50 KB, application/x-javascript) 2009-06-03 17:25 PDT, Eric Seidel (no email)	no flags	Details
Speculative fix (5.16 KB, patch) 2009-06-03 17:41 PDT, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
fix; needs regression tests and perhaps to be split up since it fixes two bugs (31.56 KB, patch) 2009-06-04 13:49 PDT, Darin Adler	no flags	Details Formatted Diff Diff
more work in progress (41.83 KB, patch) 2009-06-05 18:06 PDT, Darin Adler	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Dimitri Glazkov (Google)

Comment 1 2009-05-27 11:27:11 PDT

Corresponding bug in Chromium Bug Tracker: http://crbug.com/9815

Alexey Proskuryakov

Comment 2 2009-05-28 04:24:27 PDT

Any info on when it started? r43809 had some changed to normalize(), but there's not enough data to blame this patch.

Dimitri Glazkov (Google)

Comment 3 2009-05-28 09:27:51 PDT

I did some digging, and found that this crash shows up as early as Chromium 2.0.158.0, which pulls WebKit at r39953.

Alexey Proskuryakov

Comment 4 2009-05-28 09:47:19 PDT

See also: bug 20471.

Alexey Proskuryakov

Comment 5 2009-05-28 09:57:26 PDT

<rdar://problem/6704578> This (not the trace from the above bug, but this one) is known to happen when editing notes in Google Notebook sometimes.

Dimitri Glazkov (Google)

Comment 6 2009-05-28 10:04:03 PDT

What an awesome time to crash :) I bet users love this.

Eric Seidel (no email)

Comment 7 2009-06-01 16:34:18 PDT

Alexey, can you please attach some of the crash traces from crashreporter. I find them much more readable than Chrome crash traces. Without a reproducible case, this bug isn't worth much. I'll stare at the code again a bit in a minute.

Jon@Chromium

Comment 8 2009-06-02 13:55:39 PDT

I tried playing with Google Notebook but I could not trigger the crash in WebKit nightly or Chromium. I am sure it crashes because we continue to get lots of reports but I could not find a repro case. I do know that there does not appear to be any extra DLLs causing trouble.

Eric Seidel (no email)

Comment 9 2009-06-02 17:12:31 PDT

Talked with Alexey. He doesn't believe more complete stack traces would be of use, and he can't post them here anyway due to sekrit Safari symbols. These look like the Node pointers are trashed. Certainly the example: http://crash/reportdetail?reportid=72f0a87465b7174 where it crashes (with the already pasted above stack trace) at 0x13. 0x13 doesn't even make sense that any of hte "this" in the stack is NULL. I expect we're dealing with trashed or deleted memory here.

Eric Seidel (no email)

Comment 10 2009-06-03 12:15:35 PDT

WebCore::Node::nodeIndex() WebCore::RangeBoundaryPoint::offset() WebCore::boundaryTextNodesMerged WebCore::Node::normalize() WebCore::NodeInternal::normalizeCallback is missing symbols: normalize is calling: document()->textNodesMerged(nextText.get(), offset); which in turn is calling Range::textNodesMerged and thus boundaryTextNodesMerged

Eric Seidel (no email)

Comment 11 2009-06-03 12:20:50 PDT

Ok, my first theory is that we're not detaching ranges correctly. Ranges hold onto their Nodes, but Documents do not hold onto their attached ranges. So if ranges fail to detach themselves, documents will still try to call methods on invalid ranges. Range::~Range() { if (m_start.container()) m_ownerDocument->detachRange(this); It's possible to change m_start using: void Range::setStart(PassRefPtr<Node> refNode, int offset, ExceptionCode& ec) I'm not sure yet if it's possible to pass refNode = 0.

Eric Seidel (no email)

Comment 12 2009-06-03 12:21:42 PDT

Nevermind, there is a check for: if (!refNode) { ec = NOT_FOUND_ERR; return; } It still might be possible that something else is modifying m_start, thus causing a detach failure. Again, this is only one theory.

Eric Seidel (no email)

Comment 13 2009-06-03 13:39:16 PDT

Actually, looks like we always attach ranges,and only detach them when they have a m_start.container(). That seems like this code: for (var x = 0; x < 10000; x++) document.createRange(); would end up with lots of attached Ranges which never got detached. Going to see if I can make a crashing test case. :)

Eric Seidel (no email)

Comment 14 2009-06-03 13:49:05 PDT

Hum... document.createRange() creates a range which starts and ends at the document node. We need ranges which have no start in order to trip this bug. Still investigating how to create those.

Eric Seidel (no email)

Comment 15 2009-06-03 15:23:16 PDT

Ok, I'm still not sure that this crash is caused by a deleted Range ending up left in the Document. The two places in the code where RangeEndPoint::set() could be called with a null node, seem to be guarded: void Range::textNodesMerged(NodeWithIndex& oldNode, unsigned offset) -- checks node()->previousSibling() void Range::textNodeSplit(Text* oldNode) -- checks oldNode->nextSibling() Then again, these crashes are in a release build. This would of course be a no-brainer to fix if we had a reproducible case...

Eric Seidel (no email)

Comment 16 2009-06-03 17:25:34 PDT

Created attachment 30934 [details] One attempt at a test case.

Eric Seidel (no email)

Comment 17 2009-06-03 17:41:56 PDT

Created attachment 30935 [details] Speculative fix 4 files changed, 52 insertions(+), 11 deletions(-)

Darin Adler

Comment 18 2009-06-04 09:10:11 PDT

Comment on attachment 30935 [details] Speculative fix I have no object to speculative fixes, but I don't see a theory of the bug here. If the theory is that someone m_start.container() becomes 0, could you tell me the hypothetical about how that could happen? Maybe with a little more checking into calls to functions on m_start we can come up with something stronger.

Eric Seidel (no email)

Comment 19 2009-06-04 09:20:02 PDT

(In reply to comment #18) > (From update of attachment 30935 [details] [review]) > I have no object to speculative fixes, but I don't see a theory of the bug > here. If the theory is that someone m_start.container() becomes 0, could you > tell me the hypothetical about how that could happen? Maybe with a little more > checking into calls to functions on m_start we can come up with something > stronger. Sure thing. I can demonstrate m_start.container() becoming zero. The attached test case does that by preventing the insertion of the split text node into the document. // This call will hit an ASSERT due to void Range::textNodeSplit(Text* oldNode) // being called with oldNode->nextSibling() == NULL // In release builds, the ASSERT won't fire and a Range attached to the document // will end up with an invalid start. Ranges with an invalid start aren't // properly detached in ~Range, causing the Document to end up with pointers // to deleted ranges. text.splitText(1); Looking at the the code, RangeBoundaryPoint::set() seems like the only way that one can make a RBP have a .container() == NULL. The only place it's called is from the text node handling code, which ASSERTs that ->nextSibling() and ->previousSibling() are != NULL, but doesn't handle when they are (for the release case). The only case I was able to come up with was using dom mutation events. There may be others though, I'm not sure. I never got my test case to crash due to *this* bug, but I did get it to hit the various ASSERTs which try to guard against RBP's container() ever being NULL. The test case to just hit those ASSERTs is much simpler than the one attached however. ;) Making m_start.container() null was the only way I could think of to get a dead range into the Document's m_ranges list. It's also possible that some unrelated memory smasher has smashed the Document, Node or Range. But a deleted Range left in the Document's m_ranges list seems to be the most likely theory.

Darin Adler

Comment 20 2009-06-04 09:35:46 PDT

(In reply to comment #19) > Sure thing. I can demonstrate m_start.container() becoming zero. This is where we need to concentrate our fixes. We need to fix the mutation hooks to work properly in all cases, with changes either at the call sites or in the Range class. Should be straightforward. Lets do it.

Darin Adler

Comment 21 2009-06-04 09:37:02 PDT

Comment on attachment 30935 [details] Speculative fix Lets fix the problems that cause ranges to get into a bad state instead. We can do this by hardening the DOM mutation calls into Range and/or tightening up the callers.

Eric Seidel (no email)

Comment 22 2009-06-04 09:47:04 PDT

(In reply to comment #20) > (In reply to comment #19) > > Sure thing. I can demonstrate m_start.container() becoming zero. > > This is where we need to concentrate our fixes. We need to fix the mutation > hooks to work properly in all cases, with changes either at the call sites or > in the Range class. > > Should be straightforward. Lets do it. I disagree. :) I did not find it straight-forward yesterday. I spent a while getting this far. Although now that I'm pretty sure this is due to dead Ranges hanging around in m_ranges on the Document, getting further should be easier. I think the real broken-ness is using m_start.container() to mean "attached" or "detached". Whether m_start.container() ever becomes invalid seems a separate question from "does the document still have a pointer to me". One causes bad behavior, one causes crashes. I'm trying to fix the crashes here, and ignoring the bad behavior for the moment. The code seemed to fan out enough for me to mentally figure out what cases could be putting us into a bad state. Fixing design not to crash seemed like the right approach. This bad state is already caught by *numerous* assertions, but none of them are firing in release mode, and we have no reproducible case with which to hit them in debug mode. Please re-consider. I don't think patching this one case where someone uses mutation events is likely to fix all the crashes we've seen from this code (since my impression is so few sites use mutation events anyway). If you stare at the code and see other case besides the mutation event one, I'm happy to fix those too, but no others were obvious to me yesterday.

Darin Adler

Comment 23 2009-06-04 09:47:42 PDT

(In reply to comment #22) > I think the real broken-ness is using m_start.container() to mean "attached" or > "detached". I don't agree. I'll take a cut at fixing this myself.

Eric Seidel (no email)

Comment 24 2009-06-04 09:48:41 PDT

Yes, I expect that m_start.container() being null can probably cause other crashes too... I guess the real problem is w/o a reproducible case I'm a blind man swatting flies.

Eric Seidel (no email)

Comment 25 2009-06-04 09:51:00 PDT

The only two uses of RangeBoundaryPoint::set(): http://trac.webkit.org/browser/trunk/WebCore/dom/Range.cpp#L1758 http://trac.webkit.org/browser/trunk/WebCore/dom/Range.cpp#L1788 (note that they're at the end of the document, further than Safari will scroll down due to the anchor)

Darin Adler

Comment 26 2009-06-04 10:14:19 PDT

(In reply to comment #24) > Yes, I expect that m_start.container() being null can probably cause other > crashes too... I guess the real problem is w/o a reproducible case I'm a blind > man swatting flies. I think your sleuthing on this so far has been great! This won't be hard to fix now.

Darin Adler

Comment 27 2009-06-04 13:49:32 PDT

Created attachment 30960 [details] fix; needs regression tests and perhaps to be split up since it fixes two bugs

Eric Seidel (no email)

Comment 28 2009-06-04 14:12:24 PDT

Comment on attachment 30960 [details] fix; needs regression tests and perhaps to be split up since it fixes two bugs I'm glad you took this on. A couple comments about your intermediary patch: I think we might still want to add a Document::isRangeAttached(Range*) function at least in #ifndef NDEBUG mode to make sure that the range is actually detached in the destructor. It seems like this: 0 if (!isDetached()) 99111 m_ownerDocument->detachRange(this); would now make more sense to just call this->detach(), since after that call isDetached() will still return true! I'm very glad that you made isDetached() its own function, that makes the code much clearer. It's still worries me that we'll end up with these notions of attachment out of sync (as demonstrated in the destructor above). // because the repercussions of setting a container node of a boundary to 0 are serious. seems a bit opaque. We could link to this bug there, or explain that isDetach() depends on that constraint never being violated. When I was writing the patch I considered documenting in Range.h Range's assumption that m_start and m_end will either both be null or neither be null. And that null means detached. Thanks again for taking this on.

Darin Adler

Comment 29 2009-06-05 18:01:17 PDT

> - if (m_start.container()) > - m_ownerDocument->detachRange(this); > + // We unconditionally detach as a speculative fix for https://bugs.webkit.org/show_bug.cgi?id=26044 > + m_ownerDocument->detachRange(this); I've changed my mind. Even though I don't like this comment, I do like this code change. I think we should fix it this way, and by only making this one change to this function. Then we can do the rest of my patch as a separate check-in.

Darin Adler

Comment 30 2009-06-05 18:06:25 PDT

Created attachment 31022 [details] more work in progress

Eric Seidel (no email)

Comment 31 2009-06-11 14:15:15 PDT

Comment on attachment 30935 [details] Speculative fix Darin reviewed this (verbally) at the WebKit party. Will land with updated comment shortly!

Eric Seidel (no email)

Comment 32 2009-06-11 14:21:21 PDT

Hopefully the adjusted comment meets Darin's expectations. Darin plans to do more work here, but that can really be broken off into a separate issue. I'll leave this open for darin to move his patch elsewhere. I'll clear the review flag though. Committing to http://svn.webkit.org/repository/webkit/trunk ... M WebCore/ChangeLog M WebCore/dom/Document.cpp M WebCore/dom/Range.cpp M WebCore/dom/RangeBoundaryPoint.h Committed r44617

Eric Seidel (no email)

Comment 33 2009-06-11 14:21:37 PDT

Comment on attachment 30935 [details] Speculative fix Landed in r44617

Darin Adler

Comment 34 2009-06-14 22:17:23 PDT

This bug is fixed now. The work in progress is a fix for a different bug along with cleanup.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component Text

Assignee

Darin Adler

Reported

2009-05-27 11:24 PDT

Modified

2009-06-14 22:17 PDT History

CC List

5 users Show

URL

Keywords InRadar

Depends on

Blocks