Bug 16909 (exec-crash)

Summary: REGRESSION: Amazon.com crash (ActivationImp)
Product: WebKit Reporter: Cameron Zwarich (cpst) <zwarich>
Component: JavaScriptCoreAssignee: Geoffrey Garen <ggaren>
Status: RESOLVED FIXED    
Severity: Major CC: mjs
Priority: P1 Keywords: InRadar, Regression
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.5   
URL: http://www.amazon.com/
Attachments:
Description Flags
Stack trace
none
ExecState activity log
none
execstate activity log annotated with GC START and GC END markers
none
perl script to find inconsistencies
none
results of perl script (look for the XXX)
none
this patch is a .4% SunSpider regression so it can't go in as-is but I think I know how to make it viable
none
.7% speedup - do we have a winner?
none
The accompanying WebCore patch
none
This patch builds and passes regression tests mjs: review+

Description Cameron Zwarich (cpst) 2008-01-17 12:22:35 PST
Navigate to http://www.amazon.com/ and click on the Amazon logo in the top left corner. Go back and do it again. Repeating this a small number of times leads to a crash.

I haven't tested it yet on an old nightly build, but this crash was probably introduced by the ActivationImp tear-off patch r29425. The address it dies on is at a small offset from 0, it can happen at a number of places in the code, and it does not quite happen deterministically, so it looks like a missed GC mark along the lines of bug 16868 or bug 16871. I'll post a stack trace with a modified GC that collects after every allocation.
Comment 1 Cameron Zwarich (cpst) 2008-01-17 12:23:13 PST
<rdar://problem/5691998>
Comment 2 Cameron Zwarich (cpst) 2008-01-17 13:23:02 PST
Created attachment 18509 [details]
Stack trace

Strangely enough, I can't actually get it to crash with the GC modified to always collect after an allocation. Perhaps it is a timing issue that the extra garbage collection eliminates.

Here is a stack trace of a typical crash.
Comment 3 Maciej Stachowiak 2008-01-17 15:40:40 PST
Alternate steps for a crash on Amazon (which are more reliable for me):

1. Go to Amazon.com
2. Search for "Hoboken Chicken Emergency"
3. Open the link for the first product result in a background tab.

Comment 4 Cameron Zwarich (cpst) 2008-01-17 16:12:18 PST
Maciej's instructions reliably reproduce the crash for me, with a similar looking stack trace to what I posted. Still, changing it to garbage collect every allocation makes the crash impossible to reproduce.  Instead of crashing, it prints a JavaScript object such as

function n2RunEvent(sWhen) { goN2Initializer.run(sWhen); }

or

[object Object]

at the top of the page.
Comment 5 Cameron Zwarich (cpst) 2008-01-17 17:34:08 PST
Created attachment 18518 [details]
ExecState activity log

Tearing off every ActivationImp as soon as it is created fixes the crash, which suggests that the problem is due to a missed reference to an ActivationImp from an ExecState that should be explicitly mark()'d but isn't. Tearing off the activation puts a pointer into the GC heap in m_activation, which is caught by the conservative collector.

Maciej suggested I log ExecState creations, deletions, and markings so that we can check for anything suspicious. I've attached such a log from a session that crashed.
Comment 6 Maciej Stachowiak 2008-01-17 17:54:59 PST
Created attachment 18520 [details]
execstate activity log annotated with GC START and GC END markers
Comment 7 Maciej Stachowiak 2008-01-17 17:55:23 PST
Created attachment 18521 [details]
perl script to find inconsistencies
Comment 8 Maciej Stachowiak 2008-01-17 17:56:24 PST
Created attachment 18522 [details]
results of perl script (look for the XXX)
Comment 9 Maciej Stachowiak 2008-01-18 01:36:52 PST
With more investigation, I found that some live ExecStates fail to get marked specifically because script execution (via the <script> element) sometimes breaks both the callingExec and savedExec chains. I have a fix but it might be bad for performance - will have to test.
Comment 10 Maciej Stachowiak 2008-01-18 01:55:27 PST
Created attachment 18523 [details]
this patch is a .4% SunSpider regression so it can't go in as-is but I think I know how to make it viable
Comment 11 David Kilzer (:ddkilzer) 2008-01-19 11:22:28 PST
<rdar://problem/5696387>
Comment 12 David Kilzer (:ddkilzer) 2008-01-19 11:24:33 PST
(In reply to comment #1)
> <rdar://problem/5691998>

Cameron, please add the "InRadar" keyword to a bug if you file them in Radar.  Thanks!

Comment 13 Geoffrey Garen 2008-01-19 12:58:47 PST
Created attachment 18549 [details]
.7% speedup - do we have a winner?
Comment 14 Geoffrey Garen 2008-01-19 13:20:47 PST
Created attachment 18552 [details]
The accompanying WebCore patch
Comment 15 Geoffrey Garen 2008-01-19 14:25:55 PST
Created attachment 18556 [details]
This patch builds and passes regression tests
Comment 16 Maciej Stachowiak 2008-01-20 16:24:54 PST
Comment on attachment 18556 [details]
This patch builds and passes regression tests

r=me

Thanks for finishing this off. I'd guess we can also reinstate the LocalStorage optimization since all ExecStates will be tracked?
Comment 17 Geoffrey Garen 2008-01-21 15:12:58 PST
I'm working on landing this now.

With a little work, yes, I believe we can re-instate the LocalStorage buffer caching optimization.
Comment 18 Geoffrey Garen 2008-01-21 22:22:57 PST
Committed revision 29710.