Bug 16909 (exec-crash) - REGRESSION: Amazon.com crash (ActivationImp)
Summary: REGRESSION: Amazon.com crash (ActivationImp)
Status: RESOLVED FIXED
Alias: exec-crash
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P1 Major
Assignee: Geoffrey Garen
URL: http://www.amazon.com/
Keywords: InRadar, Regression
Depends on:
Blocks:
 
Reported: 2008-01-17 12:22 PST by Cameron Zwarich (cpst)
Modified: 2008-01-21 22:22 PST (History)
1 user (show)

See Also:


Attachments
Stack trace (27.75 KB, text/plain)
2008-01-17 13:23 PST, Cameron Zwarich (cpst)
no flags Details
ExecState activity log (477.88 KB, text/plain)
2008-01-17 17:34 PST, Cameron Zwarich (cpst)
no flags Details
execstate activity log annotated with GC START and GC END markers (477.94 KB, text/plain)
2008-01-17 17:54 PST, Maciej Stachowiak
no flags Details
perl script to find inconsistencies (573 bytes, text/plain)
2008-01-17 17:55 PST, Maciej Stachowiak
no flags Details
results of perl script (look for the XXX) (478.89 KB, text/plain)
2008-01-17 17:56 PST, Maciej Stachowiak
no flags Details
this patch is a .4% SunSpider regression so it can't go in as-is but I think I know how to make it viable (4.91 KB, patch)
2008-01-18 01:55 PST, Maciej Stachowiak
no flags Details | Formatted Diff | Diff
.7% speedup - do we have a winner? (7.73 KB, patch)
2008-01-19 12:58 PST, Geoffrey Garen
no flags Details | Formatted Diff | Diff
The accompanying WebCore patch (3.02 KB, patch)
2008-01-19 13:20 PST, Geoffrey Garen
no flags Details | Formatted Diff | Diff
This patch builds and passes regression tests (12.69 KB, patch)
2008-01-19 14:25 PST, Geoffrey Garen
mjs: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Cameron Zwarich (cpst) 2008-01-17 12:22:35 PST
Navigate to http://www.amazon.com/ and click on the Amazon logo in the top left corner. Go back and do it again. Repeating this a small number of times leads to a crash.

I haven't tested it yet on an old nightly build, but this crash was probably introduced by the ActivationImp tear-off patch r29425. The address it dies on is at a small offset from 0, it can happen at a number of places in the code, and it does not quite happen deterministically, so it looks like a missed GC mark along the lines of bug 16868 or bug 16871. I'll post a stack trace with a modified GC that collects after every allocation.
Comment 1 Cameron Zwarich (cpst) 2008-01-17 12:23:13 PST
<rdar://problem/5691998>
Comment 2 Cameron Zwarich (cpst) 2008-01-17 13:23:02 PST
Created attachment 18509 [details]
Stack trace

Strangely enough, I can't actually get it to crash with the GC modified to always collect after an allocation. Perhaps it is a timing issue that the extra garbage collection eliminates.

Here is a stack trace of a typical crash.
Comment 3 Maciej Stachowiak 2008-01-17 15:40:40 PST
Alternate steps for a crash on Amazon (which are more reliable for me):

1. Go to Amazon.com
2. Search for "Hoboken Chicken Emergency"
3. Open the link for the first product result in a background tab.

Comment 4 Cameron Zwarich (cpst) 2008-01-17 16:12:18 PST
Maciej's instructions reliably reproduce the crash for me, with a similar looking stack trace to what I posted. Still, changing it to garbage collect every allocation makes the crash impossible to reproduce.  Instead of crashing, it prints a JavaScript object such as

function n2RunEvent(sWhen) { goN2Initializer.run(sWhen); }

or

[object Object]

at the top of the page.
Comment 5 Cameron Zwarich (cpst) 2008-01-17 17:34:08 PST
Created attachment 18518 [details]
ExecState activity log

Tearing off every ActivationImp as soon as it is created fixes the crash, which suggests that the problem is due to a missed reference to an ActivationImp from an ExecState that should be explicitly mark()'d but isn't. Tearing off the activation puts a pointer into the GC heap in m_activation, which is caught by the conservative collector.

Maciej suggested I log ExecState creations, deletions, and markings so that we can check for anything suspicious. I've attached such a log from a session that crashed.
Comment 6 Maciej Stachowiak 2008-01-17 17:54:59 PST
Created attachment 18520 [details]
execstate activity log annotated with GC START and GC END markers
Comment 7 Maciej Stachowiak 2008-01-17 17:55:23 PST
Created attachment 18521 [details]
perl script to find inconsistencies
Comment 8 Maciej Stachowiak 2008-01-17 17:56:24 PST
Created attachment 18522 [details]
results of perl script (look for the XXX)
Comment 9 Maciej Stachowiak 2008-01-18 01:36:52 PST
With more investigation, I found that some live ExecStates fail to get marked specifically because script execution (via the <script> element) sometimes breaks both the callingExec and savedExec chains. I have a fix but it might be bad for performance - will have to test.
Comment 10 Maciej Stachowiak 2008-01-18 01:55:27 PST
Created attachment 18523 [details]
this patch is a .4% SunSpider regression so it can't go in as-is but I think I know how to make it viable
Comment 11 David Kilzer (:ddkilzer) 2008-01-19 11:22:28 PST
<rdar://problem/5696387>
Comment 12 David Kilzer (:ddkilzer) 2008-01-19 11:24:33 PST
(In reply to comment #1)
> <rdar://problem/5691998>

Cameron, please add the "InRadar" keyword to a bug if you file them in Radar.  Thanks!

Comment 13 Geoffrey Garen 2008-01-19 12:58:47 PST
Created attachment 18549 [details]
.7% speedup - do we have a winner?
Comment 14 Geoffrey Garen 2008-01-19 13:20:47 PST
Created attachment 18552 [details]
The accompanying WebCore patch
Comment 15 Geoffrey Garen 2008-01-19 14:25:55 PST
Created attachment 18556 [details]
This patch builds and passes regression tests
Comment 16 Maciej Stachowiak 2008-01-20 16:24:54 PST
Comment on attachment 18556 [details]
This patch builds and passes regression tests

r=me

Thanks for finishing this off. I'd guess we can also reinstate the LocalStorage optimization since all ExecStates will be tracked?
Comment 17 Geoffrey Garen 2008-01-21 15:12:58 PST
I'm working on landing this now.

With a little work, yes, I believe we can re-instate the LocalStorage buffer caching optimization.
Comment 18 Geoffrey Garen 2008-01-21 22:22:57 PST
Committed revision 29710.