Bug 16909 (exec-crash)

Summary: REGRESSION: Amazon.com crash (ActivationImp)
Product: WebKit Reporter: Cameron Zwarich (cpst) <zwarich>
Component: JavaScriptCoreAssignee: Geoffrey Garen <ggaren>
Status: RESOLVED FIXED    
Severity: Major CC: mjs
Priority: P1 Keywords: InRadar, Regression
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.5   
URL: http://www.amazon.com/
Attachments:
Description Flags
Stack trace
none
ExecState activity log
none
execstate activity log annotated with GC START and GC END markers
none
perl script to find inconsistencies
none
results of perl script (look for the XXX)
none
this patch is a .4% SunSpider regression so it can't go in as-is but I think I know how to make it viable
none
.7% speedup - do we have a winner?
none
The accompanying WebCore patch
none
This patch builds and passes regression tests mjs: review+

Cameron Zwarich (cpst)
Reported 2008-01-17 12:22:35 PST
Navigate to http://www.amazon.com/ and click on the Amazon logo in the top left corner. Go back and do it again. Repeating this a small number of times leads to a crash. I haven't tested it yet on an old nightly build, but this crash was probably introduced by the ActivationImp tear-off patch r29425. The address it dies on is at a small offset from 0, it can happen at a number of places in the code, and it does not quite happen deterministically, so it looks like a missed GC mark along the lines of bug 16868 or bug 16871. I'll post a stack trace with a modified GC that collects after every allocation.
Attachments
Stack trace (27.75 KB, text/plain)
2008-01-17 13:23 PST, Cameron Zwarich (cpst)
no flags
ExecState activity log (477.88 KB, text/plain)
2008-01-17 17:34 PST, Cameron Zwarich (cpst)
no flags
execstate activity log annotated with GC START and GC END markers (477.94 KB, text/plain)
2008-01-17 17:54 PST, Maciej Stachowiak
no flags
perl script to find inconsistencies (573 bytes, text/plain)
2008-01-17 17:55 PST, Maciej Stachowiak
no flags
results of perl script (look for the XXX) (478.89 KB, text/plain)
2008-01-17 17:56 PST, Maciej Stachowiak
no flags
this patch is a .4% SunSpider regression so it can't go in as-is but I think I know how to make it viable (4.91 KB, patch)
2008-01-18 01:55 PST, Maciej Stachowiak
no flags
.7% speedup - do we have a winner? (7.73 KB, patch)
2008-01-19 12:58 PST, Geoffrey Garen
no flags
The accompanying WebCore patch (3.02 KB, patch)
2008-01-19 13:20 PST, Geoffrey Garen
no flags
This patch builds and passes regression tests (12.69 KB, patch)
2008-01-19 14:25 PST, Geoffrey Garen
mjs: review+
Cameron Zwarich (cpst)
Comment 1 2008-01-17 12:23:13 PST
Cameron Zwarich (cpst)
Comment 2 2008-01-17 13:23:02 PST
Created attachment 18509 [details] Stack trace Strangely enough, I can't actually get it to crash with the GC modified to always collect after an allocation. Perhaps it is a timing issue that the extra garbage collection eliminates. Here is a stack trace of a typical crash.
Maciej Stachowiak
Comment 3 2008-01-17 15:40:40 PST
Alternate steps for a crash on Amazon (which are more reliable for me): 1. Go to Amazon.com 2. Search for "Hoboken Chicken Emergency" 3. Open the link for the first product result in a background tab.
Cameron Zwarich (cpst)
Comment 4 2008-01-17 16:12:18 PST
Maciej's instructions reliably reproduce the crash for me, with a similar looking stack trace to what I posted. Still, changing it to garbage collect every allocation makes the crash impossible to reproduce. Instead of crashing, it prints a JavaScript object such as function n2RunEvent(sWhen) { goN2Initializer.run(sWhen); } or [object Object] at the top of the page.
Cameron Zwarich (cpst)
Comment 5 2008-01-17 17:34:08 PST
Created attachment 18518 [details] ExecState activity log Tearing off every ActivationImp as soon as it is created fixes the crash, which suggests that the problem is due to a missed reference to an ActivationImp from an ExecState that should be explicitly mark()'d but isn't. Tearing off the activation puts a pointer into the GC heap in m_activation, which is caught by the conservative collector. Maciej suggested I log ExecState creations, deletions, and markings so that we can check for anything suspicious. I've attached such a log from a session that crashed.
Maciej Stachowiak
Comment 6 2008-01-17 17:54:59 PST
Created attachment 18520 [details] execstate activity log annotated with GC START and GC END markers
Maciej Stachowiak
Comment 7 2008-01-17 17:55:23 PST
Created attachment 18521 [details] perl script to find inconsistencies
Maciej Stachowiak
Comment 8 2008-01-17 17:56:24 PST
Created attachment 18522 [details] results of perl script (look for the XXX)
Maciej Stachowiak
Comment 9 2008-01-18 01:36:52 PST
With more investigation, I found that some live ExecStates fail to get marked specifically because script execution (via the <script> element) sometimes breaks both the callingExec and savedExec chains. I have a fix but it might be bad for performance - will have to test.
Maciej Stachowiak
Comment 10 2008-01-18 01:55:27 PST
Created attachment 18523 [details] this patch is a .4% SunSpider regression so it can't go in as-is but I think I know how to make it viable
David Kilzer (:ddkilzer)
Comment 11 2008-01-19 11:22:28 PST
David Kilzer (:ddkilzer)
Comment 12 2008-01-19 11:24:33 PST
(In reply to comment #1) > <rdar://problem/5691998> Cameron, please add the "InRadar" keyword to a bug if you file them in Radar. Thanks!
Geoffrey Garen
Comment 13 2008-01-19 12:58:47 PST
Created attachment 18549 [details] .7% speedup - do we have a winner?
Geoffrey Garen
Comment 14 2008-01-19 13:20:47 PST
Created attachment 18552 [details] The accompanying WebCore patch
Geoffrey Garen
Comment 15 2008-01-19 14:25:55 PST
Created attachment 18556 [details] This patch builds and passes regression tests
Maciej Stachowiak
Comment 16 2008-01-20 16:24:54 PST
Comment on attachment 18556 [details] This patch builds and passes regression tests r=me Thanks for finishing this off. I'd guess we can also reinstate the LocalStorage optimization since all ExecStates will be tracked?
Geoffrey Garen
Comment 17 2008-01-21 15:12:58 PST
I'm working on landing this now. With a little work, yes, I believe we can re-instate the LocalStorage buffer caching optimization.
Geoffrey Garen
Comment 18 2008-01-21 22:22:57 PST
Committed revision 29710.
Note You need to log in before you can comment on or make changes to this bug.