16868 – Gmail crash

RESOLVED FIXED 16868

Gmail crash

https://bugs.webkit.org/show_bug.cgi?id=16868

Summary Gmail crash

Cameron Zwarich (cpst)

Reported 2008-01-13 20:02:07 PST

If you modify the heapAllocate() method of the JS garbage collector so that it always calls collect(), then logging into Gmail will crash in ActivationImp::markChildren(). This does not happen in r29414, so it was almost surely introduced by r29425, the ActivationImp tear-off patch. As far as I can tell via printf debugging, the crash occurs in this section of code: for (size_t i = 0; i < size; ++i) { JSValue* value = localStorage[i].value; if (!value->marked()) value->mark(); } According to the Maciej, the debug build fails some assertions prior to this code, so we can't use it to get the exact place where the code fails without commenting out those assertions. I did some more printf debuging instead. If I make another copy of this loop that doesn't actually mark anything, it succeeds, so the problem is probably that value is an invalid pointer.

Attachments
Stack trace (38.91 KB, text/plain) 2008-01-13 20:46 PST, Cameron Zwarich (cpst)	no flags	Details
Proposed patch (2.40 KB, patch) 2008-01-15 01:01 PST, Cameron Zwarich (cpst)	no flags	Details Formatted Diff Diff
Proposed patch (2.40 KB, patch) 2008-01-15 01:04 PST, Cameron Zwarich (cpst)	no flags	Details Formatted Diff Diff
Revised proposed patch (3.77 KB, patch) 2008-01-15 08:47 PST, Cameron Zwarich (cpst)	no flags	Details Formatted Diff Diff
Revised proposed patch (3.07 KB, patch) 2008-01-15 13:29 PST, Cameron Zwarich (cpst)	no flags	Details Formatted Diff Diff
Revised proposed patch (2.83 KB, patch) 2008-01-16 11:12 PST, Cameron Zwarich (cpst)	no flags	Details Formatted Diff Diff
Revised proposed patch (5.02 KB, patch) 2008-01-16 14:07 PST, Cameron Zwarich (cpst)	mjs: review+	Details Formatted Diff Diff
Show Obsolete (5) View All Add attachment proposed patch, testcase, etc.

Mark Rowe (bdash)

Comment 1 2008-01-13 20:11:29 PST

<rdar://problem/5686108>

Cameron Zwarich (cpst)

Comment 2 2008-01-13 20:46:43 PST

Created attachment 18430 [details] Stack trace Here's a stack trace. I must be so used to line number information in optimized code being bad, because the line number of the crash indicates that it is indeed an attempt to mark an invalid pointer. Also note that since ActivationImp::mark() was called rather than ActivationImp::markChildren() directly, this ActivationImp must have been torn off.

Cameron Zwarich (cpst)

Comment 3 2008-01-14 06:52:32 PST

I tried it myself on debug with r29455. I get the same crash as in release, without any failed assertions.

Cameron Zwarich (cpst)

Comment 4 2008-01-14 07:42:48 PST

I added some code to print out the JS being executed. It's pretty heavily obfuscated, as one might expect. The main thing that stands out is that there is a lot of usage of closures. I also modified the constructor of ExecState to always tear off the ActivationImp immediately after creating it. With this modification, Gmail loads fine, so the problem is likely that a required tear-off is missed. Anyone have any guesses? (In my previous post I meant that the particular ActivationImp is *not* torn off.)

Cameron Zwarich (cpst)

Comment 5 2008-01-14 11:28:24 PST

This is not fixed by r29474, which addresses bug 16871.

Cameron Zwarich (cpst)

Comment 6 2008-01-14 13:29:06 PST

Due to similarities with bug 16871, I added code to ActivationImp::mark() to check if it is called if the ActivationImp is on the stack (this should probably be an ASSERT, now that I think about it), as this could cause memory corruption by calling JSObject::mark(). However, this situation never occurs with this bug.

Cameron Zwarich (cpst)

Comment 7 2008-01-15 01:01:02 PST

Created attachment 18452 [details] Proposed patch Maciej and I debugged this crash for a while, and we came up with two problems: 1) There is no tear-off in the case of cross-window eval(). 2) If m_savedExec is different than m_callingExec, then this can cause some ActivationImp objects to be missed by the garbage collector. This may have been a very obscure issue before the tear-off patch, but the tear-off patch takes the majority of ActivationImp objects out of the JS heap, so the conservative collector won't catch any mistakes. This patch fixes both of these issues. It has been tested with both debug and release settings and passes all tests. I should be able to construct a layout test soon for the first issue, but probably not for the second. One strange thing is that you can't fix the bug by only calling mark() on m_savedExec if it is different than m_callingExec. You also need to call it when it is the same, presumedly to reach a distinct m_savedExec further back down the stack. This might be a potential performance issue, especially in the way it is written here. It is probably a good idea to rewrite this patch so it goes down the callingExec and savedExec chains in parallel, and only recurses when they become distinct, but I have to go to bed. Maciej suggested modifying WebCore so that we can remove m_savedExec entirely. However, that might be a bit of work. We should also investigate the relationship between this bug and bug 16871.

Cameron Zwarich (cpst)

Comment 8 2008-01-15 01:04:00 PST

Created attachment 18453 [details] Proposed patch I submitted the wrong version of the patch. Here is the one I want.

Adam Roben (:aroben)

Comment 9 2008-01-15 07:49:20 PST

Comment on attachment 18453 [details] Proposed patch FYI, this patch does not fix bug 16871 for me (nor does it claim to).

Darin Adler

Comment 10 2008-01-15 07:51:54 PST

Comment on attachment 18453 [details] Proposed patch + if (m_savedExec && m_savedExec != m_callingExec) + m_savedExec->mark(); The ExecState::mark function marks the scope chains of all the execs in the callingExec chain in an iterative way rather than a recursive way. But this new savedExec code uses recursion instead. Also, the code to mark the activation is only done on the top ExecState, which seems wrong. The m_callingExec loop should either use recursion too (not my favorite idea) or we should move the marking of the activation inside that loop.

Darin Adler

Comment 11 2008-01-15 07:53:06 PST

Comment on attachment 18453 [details] Proposed patch I think what I'm saying is that both the savedExec work and the activation work should be done inside the callingExec loop rather than outside.

Cameron Zwarich (cpst)

Comment 12 2008-01-15 07:59:01 PST

(In reply to comment #10) > (From update of attachment 18453 [details] [edit]) > + if (m_savedExec && m_savedExec != m_callingExec) > + m_savedExec->mark(); > > The ExecState::mark function marks the scope chains of all the execs in the > callingExec chain in an iterative way rather than a recursive way. But this new > savedExec code uses recursion instead. That's what I was referring to changing when I said we should go down the callingExec and savedExec chains in parallel. I'll upload a new patch soon that does that. > Also, the code to mark the activation is only done on the top ExecState, which > seems wrong. The m_callingExec loop should either use recursion too (not my > favorite idea) or we should move the marking of the activation inside that > loop. I was going to ask you today if there was any reason you didn't do that in your patch that added the marking of the activation, but I guess not, so I'll bring it into the loop.

Cameron Zwarich (cpst)

Comment 13 2008-01-15 08:47:16 PST

Created attachment 18458 [details] Revised proposed patch This makes the marking non-recursive except in the case where m_savedExec is actually different than m_callingExec. Instead of having the second loop, one could simply change the first loop to set currentSavedExec to currentCallingExec instead of breaking from the loop. I don't know if that is too obfuscated for something so simple.

Geoffrey Garen

Comment 14 2008-01-15 12:48:23 PST

Can you add a regression test to this patch? Typically, we require regression fixes to include test cases demonstrating the fix.

Cameron Zwarich (cpst)

Comment 15 2008-01-15 13:29:50 PST

Created attachment 18461 [details] Revised proposed patch Here is an updated version of the patch. It is less convoluted than before, maybe uses a few more branches in the bad case, and properly deals with the situation where there are multiple ExecStates down the callingExec chain that have distinct savedExec's (can this ever actually happen?). (In reply to comment #14) > Can you add a regression test to this patch? Typically, we require regression > fixes to include test cases demonstrating the fix. I should be able to make a layout test that shows the problems with cross-window eval(). I am not sure about the others. In theory, I should also be able to force a crash with the m_savedExec != m_calledExec situation by explicitly asking for a GC every other line. I will try to have at least one test soon.

Geoffrey Garen

Comment 16 2008-01-15 14:27:46 PST

I was just talking to Maciej about this: I don't think we can rely on the m_savedExecState chain. Any code that begins a new script evaluation in a global context will produce an "orphaned" ExecState that isn't linked to any previous ExecStates. This can happen during synchronous event dispatch, or any other client code where the client decides to evaluate a script from inside a JavaScript callback. Maciej suggested allocating all active ExecStates from a central storage area. That way, GC mark could just traverse all the active ExecStates in the central storage area, and we wouldn't need to rely on m_savedExecState.

Cameron Zwarich (cpst)

Comment 17 2008-01-15 14:33:34 PST

(In reply to comment #16) > I was just talking to Maciej about this: I don't think we can rely on the > m_savedExecState chain. Any code that begins a new script evaluation in a > global context will produce an "orphaned" ExecState that isn't linked to any > previous ExecStates. > > This can happen during synchronous event dispatch, or any other client code > where the client decides to evaluate a script from inside a JavaScript > callback. Can you think of a test case that will crash even with my latest patch? > Maciej suggested allocating all active ExecStates from a central storage area. > That way, GC mark could just traverse all the active ExecStates in the central > storage area, and we wouldn't need to rely on m_savedExecState. Not a bad idea. It might also allow an increase in the recursion limit.

Darin Adler

Comment 18 2008-01-16 10:45:46 PST

Comment on attachment 18461 [details] Revised proposed patch r=me

Darin Adler

Comment 19 2008-01-16 10:50:07 PST

Comment on attachment 18461 [details] Revised proposed patch Maybe I should not have said r=me while Geoff and Cameron were still debating the design, but still I don't see a reason to have this sitting in the review queue.

Cameron Zwarich (cpst)

Comment 20 2008-01-16 11:04:58 PST

I cleared the review flag because Maciej and I agreed I should make one change to the patch by taking out the m_activation marking. No one knows why it is necessary and it doesn't even fix bug 16871, whereas the other change does.

Cameron Zwarich (cpst)

Comment 21 2008-01-16 11:12:09 PST

Created attachment 18477 [details] Revised proposed patch Here is the newest revision of the patch.

Maciej Stachowiak

Comment 22 2008-01-16 12:00:51 PST

All we need now is layout tests.

Cameron Zwarich (cpst)

Comment 23 2008-01-16 14:07:52 PST

Created attachment 18484 [details] Revised proposed patch (In reply to comment #22) > All we need now is layout tests. Here is one for the cross-window eval() problem. It fails with the !switchGlobal check. Do I have to make one for the savedExec GC crash? Revision 29474 purported to fix a GC crash and was checked in without a test.

Maciej Stachowiak

Comment 24 2008-01-16 15:07:11 PST

Comment on attachment 18484 [details] Revised proposed patch Good enough for me. I'l make one for the savedExec crash after the fact since I think I know what happened there. r=me

David Kilzer (:ddkilzer)

Comment 25 2008-01-19 03:27:20 PST

Wasn't this fixed in r29542? If so, this bug should be closed as RESOLVED/FIXED. http://trac.webkit.org/projects/webkit/changeset/29542

David Kilzer (:ddkilzer)

Comment 26 2008-01-19 03:29:12 PST

*** Bug 16894 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P1

Severity Major

Classification Unclassified

Version 528+ (Nightly build)

Hardware Mac

OS OS X 10.5

Product WebKit

Component JavaScriptCore

Assignee

Nobody

Reported

2008-01-13 20:02 PST

Modified

2008-01-20 00:09 PST History

CC List

3 users Show

URL

Keywords InRadar

Duplicates (1)

16894 View as bug list

Depends on

Blocks