Bug 16868 - Gmail crash
Summary: Gmail crash
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P1 Major
Assignee: Nobody
URL:
Keywords: InRadar
: 16894 (view as bug list)
Depends on:
Blocks:
 
Reported: 2008-01-13 20:02 PST by Cameron Zwarich (cpst)
Modified: 2008-01-20 00:09 PST (History)
3 users (show)

See Also:


Attachments
Stack trace (38.91 KB, text/plain)
2008-01-13 20:46 PST, Cameron Zwarich (cpst)
no flags Details
Proposed patch (2.40 KB, patch)
2008-01-15 01:01 PST, Cameron Zwarich (cpst)
no flags Details | Formatted Diff | Diff
Proposed patch (2.40 KB, patch)
2008-01-15 01:04 PST, Cameron Zwarich (cpst)
no flags Details | Formatted Diff | Diff
Revised proposed patch (3.77 KB, patch)
2008-01-15 08:47 PST, Cameron Zwarich (cpst)
no flags Details | Formatted Diff | Diff
Revised proposed patch (3.07 KB, patch)
2008-01-15 13:29 PST, Cameron Zwarich (cpst)
no flags Details | Formatted Diff | Diff
Revised proposed patch (2.83 KB, patch)
2008-01-16 11:12 PST, Cameron Zwarich (cpst)
no flags Details | Formatted Diff | Diff
Revised proposed patch (5.02 KB, patch)
2008-01-16 14:07 PST, Cameron Zwarich (cpst)
mjs: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Cameron Zwarich (cpst) 2008-01-13 20:02:07 PST
If you modify the heapAllocate() method of the JS garbage collector so that it always calls collect(), then logging into Gmail will crash in ActivationImp::markChildren(). This does not happen in r29414, so it was almost surely introduced by r29425, the ActivationImp tear-off patch. As far as I can tell via printf debugging, the crash occurs in this section of code:

    for (size_t i = 0; i < size; ++i) {
        JSValue* value = localStorage[i].value;
        
        if (!value->marked())
            value->mark();
    }

According to the Maciej, the debug build fails some assertions prior to this code, so we can't use it to get the exact place where the code fails without commenting out those assertions. I did some more printf debuging instead. If I make another copy of this loop that doesn't actually mark anything, it succeeds, so the problem is probably that value is an invalid pointer.
Comment 1 Mark Rowe (bdash) 2008-01-13 20:11:29 PST
<rdar://problem/5686108>
Comment 2 Cameron Zwarich (cpst) 2008-01-13 20:46:43 PST
Created attachment 18430 [details]
Stack trace

Here's a stack trace. I must be so used to line number information in optimized code being bad, because the line number of the crash indicates that it is indeed an attempt to mark an invalid pointer. Also note that since ActivationImp::mark() was called rather than ActivationImp::markChildren() directly, this ActivationImp must have been torn off.
Comment 3 Cameron Zwarich (cpst) 2008-01-14 06:52:32 PST
I tried it myself on debug with r29455. I get the same crash as in release, without any failed assertions.
Comment 4 Cameron Zwarich (cpst) 2008-01-14 07:42:48 PST
I added some code to print out the JS being executed. It's pretty heavily obfuscated, as one might expect. The main thing that stands out is that there is a lot of usage of closures. 

I also modified the constructor of ExecState to always tear off the ActivationImp immediately after creating it. With this modification, Gmail loads fine, so the problem is likely that a required tear-off is missed. Anyone have any guesses?

(In my previous post I meant that the particular ActivationImp is *not* torn off.)
Comment 5 Cameron Zwarich (cpst) 2008-01-14 11:28:24 PST
This is not fixed by r29474, which addresses bug 16871.
Comment 6 Cameron Zwarich (cpst) 2008-01-14 13:29:06 PST
Due to similarities with bug 16871, I added code to ActivationImp::mark() to check if it is called if the ActivationImp is on the stack (this should probably be an ASSERT, now that I think about it), as this could cause memory corruption by calling JSObject::mark(). However, this situation never occurs with this bug.
Comment 7 Cameron Zwarich (cpst) 2008-01-15 01:01:02 PST
Created attachment 18452 [details]
Proposed patch

Maciej and I debugged this crash for a while, and we came up with two problems:

1) There is no tear-off in the case of cross-window eval().

2) If m_savedExec is different than m_callingExec, then this can cause some ActivationImp objects to be missed by the garbage collector. This may have been a very obscure issue before the tear-off patch, but the tear-off patch takes the majority of ActivationImp objects out of the JS heap, so the conservative collector won't catch any mistakes.

This patch fixes both of these issues. It has been tested with both debug and release settings and passes all tests. I should be able to construct a layout test soon for the first issue, but probably not for the second.

One strange thing is that you can't fix the bug by only calling mark() on m_savedExec if it is different than m_callingExec. You also need to call it when it is the same, presumedly to reach a distinct m_savedExec further back down the stack. This might be a potential performance issue, especially in the way it is written here. It is probably a good idea to rewrite this patch so it goes down the callingExec and savedExec chains in parallel, and only recurses when they become distinct, but I have to go to bed. Maciej suggested modifying WebCore so that we can remove m_savedExec entirely. However, that might be a bit of work.

We should also investigate the relationship between this bug and bug 16871.
Comment 8 Cameron Zwarich (cpst) 2008-01-15 01:04:00 PST
Created attachment 18453 [details]
Proposed patch

I submitted the wrong version of the patch. Here is the one I want.
Comment 9 Adam Roben (:aroben) 2008-01-15 07:49:20 PST
Comment on attachment 18453 [details]
Proposed patch

FYI, this patch does not fix bug 16871 for me (nor does it claim to).
Comment 10 Darin Adler 2008-01-15 07:51:54 PST
Comment on attachment 18453 [details]
Proposed patch

+    if (m_savedExec && m_savedExec != m_callingExec)
+        m_savedExec->mark();

The ExecState::mark function marks the scope chains of all the execs in the callingExec chain in an iterative way rather than a recursive way. But this new savedExec code uses recursion instead.

Also, the code to mark the activation is only done on the top ExecState, which seems wrong. The m_callingExec loop should either use recursion too (not my favorite idea) or we should move the marking of the activation inside that loop.
Comment 11 Darin Adler 2008-01-15 07:53:06 PST
Comment on attachment 18453 [details]
Proposed patch

I think what I'm saying is that both the savedExec work and the activation work should be done inside the callingExec loop rather than outside.
Comment 12 Cameron Zwarich (cpst) 2008-01-15 07:59:01 PST
(In reply to comment #10)
> (From update of attachment 18453 [details] [edit])
> +    if (m_savedExec && m_savedExec != m_callingExec)
> +        m_savedExec->mark();
> 
> The ExecState::mark function marks the scope chains of all the execs in the
> callingExec chain in an iterative way rather than a recursive way. But this new
> savedExec code uses recursion instead.

That's what I was referring to changing when I said we should go down the callingExec and savedExec chains in parallel. I'll upload a new patch soon that does that.

> Also, the code to mark the activation is only done on the top ExecState, which
> seems wrong. The m_callingExec loop should either use recursion too (not my
> favorite idea) or we should move the marking of the activation inside that
> loop.

I was going to ask you today if there was any reason you didn't do that in your patch that added the marking of the activation, but I guess not, so I'll bring it into the loop.
Comment 13 Cameron Zwarich (cpst) 2008-01-15 08:47:16 PST
Created attachment 18458 [details]
Revised proposed patch

This makes the marking non-recursive except in the case where m_savedExec is actually different than m_callingExec. Instead of having the second loop, one could simply change the first loop to set currentSavedExec to currentCallingExec instead of breaking from the loop. I don't know if that is too obfuscated for something so simple.
Comment 14 Geoffrey Garen 2008-01-15 12:48:23 PST
Can you add a regression test to this patch? Typically, we require regression fixes to include test cases demonstrating the fix.
Comment 15 Cameron Zwarich (cpst) 2008-01-15 13:29:50 PST
Created attachment 18461 [details]
Revised proposed patch

Here is an updated version of the patch. It is less convoluted than before, maybe uses a few more branches in the bad case, and properly deals with the situation where there are multiple ExecStates down the callingExec chain that have distinct savedExec's (can this ever actually happen?).

(In reply to comment #14)
> Can you add a regression test to this patch? Typically, we require regression
> fixes to include test cases demonstrating the fix.

I should be able to make a layout test that shows the problems with cross-window eval(). I am not sure about the others. In theory, I should also be able to force a crash with the m_savedExec != m_calledExec situation by explicitly asking for a GC every other line. I will try to have at least one test soon.
Comment 16 Geoffrey Garen 2008-01-15 14:27:46 PST
I was just talking to Maciej about this: I don't think we can rely on the m_savedExecState chain. Any code that begins a new script evaluation in a global context will produce an "orphaned" ExecState that isn't linked to any previous ExecStates.

This can happen during synchronous event dispatch, or any other client code where the client decides to evaluate a script from inside a JavaScript callback.

Maciej suggested allocating all active ExecStates from a central storage area. That way, GC mark could just traverse all the active ExecStates in the central storage area, and we wouldn't need to rely on m_savedExecState.
Comment 17 Cameron Zwarich (cpst) 2008-01-15 14:33:34 PST
(In reply to comment #16)
> I was just talking to Maciej about this: I don't think we can rely on the
> m_savedExecState chain. Any code that begins a new script evaluation in a
> global context will produce an "orphaned" ExecState that isn't linked to any
> previous ExecStates.
> 
> This can happen during synchronous event dispatch, or any other client code
> where the client decides to evaluate a script from inside a JavaScript
> callback.

Can you think of a test case that will crash even with my latest patch?

> Maciej suggested allocating all active ExecStates from a central storage area.
> That way, GC mark could just traverse all the active ExecStates in the central
> storage area, and we wouldn't need to rely on m_savedExecState.

Not a bad idea. It might also allow an increase in the recursion limit.
Comment 18 Darin Adler 2008-01-16 10:45:46 PST
Comment on attachment 18461 [details]
Revised proposed patch

r=me
Comment 19 Darin Adler 2008-01-16 10:50:07 PST
Comment on attachment 18461 [details]
Revised proposed patch

Maybe I should not have said r=me while Geoff and Cameron were still debating the design, but still I don't see a reason to have this sitting in the review queue.
Comment 20 Cameron Zwarich (cpst) 2008-01-16 11:04:58 PST
I cleared the review flag because Maciej and I agreed I should make one change to the patch by taking out the m_activation marking. No one knows why it is necessary and it doesn't even fix bug 16871, whereas the other change does.
Comment 21 Cameron Zwarich (cpst) 2008-01-16 11:12:09 PST
Created attachment 18477 [details]
Revised proposed patch

Here is the newest revision of the patch.
Comment 22 Maciej Stachowiak 2008-01-16 12:00:51 PST
All we need now is layout tests.
Comment 23 Cameron Zwarich (cpst) 2008-01-16 14:07:52 PST
Created attachment 18484 [details]
Revised proposed patch

(In reply to comment #22)
> All we need now is layout tests.

Here is one for the cross-window eval() problem. It fails with the !switchGlobal check. Do I have to make one for the savedExec GC crash? Revision 29474 purported to fix a GC crash and was checked in without a test.
Comment 24 Maciej Stachowiak 2008-01-16 15:07:11 PST
Comment on attachment 18484 [details]
Revised proposed patch

Good enough for me. I'l make one for the savedExec crash after the fact since I think I know what happened there.

r=me
Comment 25 David Kilzer (:ddkilzer) 2008-01-19 03:27:20 PST
Wasn't this fixed in r29542?  If so, this bug should be closed as RESOLVED/FIXED.

http://trac.webkit.org/projects/webkit/changeset/29542

Comment 26 David Kilzer (:ddkilzer) 2008-01-19 03:29:12 PST
*** Bug 16894 has been marked as a duplicate of this bug. ***