Bug 37554

Summary:

REGRESSION: fast/workers/wrapper-map-gc.html crashes on Snow Leopard Release Bot

Product:

WebKit

Reporter:

Eric Seidel (no email) <eric>

Component:

Tools / Tests

Assignee:

Dmitry Titov <dimich>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

abarth, ap, dimich, ggaren, sam, webkit.review.bot

Priority:

Version:

528+ (Nightly build)

Hardware:

OS:

OS X 10.5

Attachments:

Description	Flags
crash log	none
Patch.	ggaren: review-, dimich: commit-queue-
Added check in ScheduledAction	none
Patch.	ggaren: review-, dimich: commit-queue-
Patch with a more explicit check.	ggaren: review+, dimich: commit-queue-

Eric Seidel (no email)

Reported 2010-04-13 23:42:02 PDT

fast/workers/wrapper-map-gc.html crashed on Snow Leopard Release Bot http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57566%20(8248)/results.html Sorry I don't have a stack trace for you (yet). I believe I've seen this one crash once before but I couldn't find a bug on it. http://trac.webkit.org/browser/trunk/LayoutTests/fast/workers/wrapper-map-gc.html

Attachments
crash log (26.85 KB, text/plain) 2010-04-18 14:20 PDT, Eric Seidel (no email)	no flags	Details
Patch. (2.84 KB, patch) 2010-04-20 20:49 PDT, Dmitry Titov	ggaren: review- dimich: commit-queue-	Details Formatted Diff Diff
Added check in ScheduledAction (3.55 KB, patch) 2010-04-22 11:40 PDT, Dmitry Titov	no flags	Details Formatted Diff Diff
Patch. (3.77 KB, patch) 2010-04-22 11:44 PDT, Dmitry Titov	ggaren: review- dimich: commit-queue-	Details Formatted Diff Diff
Patch with a more explicit check. (6.33 KB, patch) 2010-04-27 17:34 PDT, Dmitry Titov	ggaren: review+ dimich: commit-queue-	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Eric Seidel (no email)

Comment 1 2010-04-14 10:43:24 PDT

Again just now: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57587%20(8273)/results.html

Eric Seidel (no email)

Comment 2 2010-04-14 12:20:34 PDT

Struck again: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57593%20(8279)/results.html Given this has reproduced 3 times just this morning, I expect it should be easy to repro.

Eric Seidel (no email)

Comment 3 2010-04-14 14:36:59 PDT

Yeah, I think something might have recently regressed to cause this to get so flaky. Again just now: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57604%20(8293)/results.html

Eric Seidel (no email)

Comment 4 2010-04-14 22:43:49 PDT

This appears to be crashing most builds. Something is definitely wrong. http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57630%20(8320)/results.html http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57631%20(8322)/results.html http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57633%20(8323)/results.html

Eric Seidel (no email)

Comment 5 2010-04-18 12:20:45 PDT

This test continues to crash super often: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57795%20(8494)/results.html http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57794%20(8492)/results.html http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57790%20(8488)/results.html http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57788%20(8481)/results.html

Eric Seidel (no email)

Comment 6 2010-04-18 14:20:24 PDT

Created attachment 53635 [details] crash log This crash is super-easy to repro locally. run-webkit-tests --debug --iterations 100 fast/workers/wrapper-map-gc.html did it for me. Crash log attached.

Eric Seidel (no email)

Comment 7 2010-04-18 14:23:09 PDT

Comment on attachment 53635 [details] crash log Looks like it hit this ASSERT: ASSERT(isUndefinedOrNull()); http://trac.webkit.org/browser/trunk/JavaScriptCore/runtime/JSValue.cpp#L78 That ASSERT has been there forever (added by ggaren).

Eric Seidel (no email)

Comment 8 2010-04-18 14:23:38 PDT

Maybe Geoff can offer some insight as to what Workers is doing wrong here.

Geoffrey Garen

Comment 9 2010-04-19 16:28:10 PDT

It looks like the JavaScript wrapper being used as 'this' on this test is garbage. Not sure why.

Dmitry Titov

Comment 10 2010-04-20 20:41:42 PDT

It was broken by introducing the possibility for a JS wrapper of WorkerContext be returned as 0 in case of terminated worker, to prevent the re-entry into JS (r57349). It works fine if all the places that get the wrapper before invoking JS check for the value. JSEventListener did not, so if worker's JS execution was terminated between pulling globalObject (returns 0 for terminated worker, there is a check) and getting a wrapper for the 'this' (returns 0 for terminated worker, no check) - the subsequent attempt to run a listener function crashes. Note there is no need to synchronize checks here, even though the termination is requested on one thread and the value of a flag is used on another - once the JS execution on worker thread is set up properly, it's ok to go into it since it will be terminated by JSC::Terminator causing an exception eventually. So, I broke it :-(. My only excuse is that I wrote the test that caught the borkage and I now have a fix :-) Patch coming.

Dmitry Titov

Comment 11 2010-04-20 20:49:54 PDT

Created attachment 53914 [details] Patch.

Eric Seidel (no email)

Comment 12 2010-04-21 04:27:36 PDT

Looks sane to me, but someone with more recent experience in JSC should probably r+ this.

Geoffrey Garen

Comment 13 2010-04-21 09:32:50 PDT

Thanks Dmitry, I think I understand what's going on here now. The ASSERT you ran into reflects a JSC philosophy that it's an error for the runtime ever to encounter a JSValue that is not a standard JavaScript value. We allow clients to construct an empty JSValue() / null C pointer, for their convenience, but we don't allow clients to pass an empty JSValue() / null C pointer into JSC. So, your change in JSWorkerContextBase.cpp is the right thing to do. However, I have two questions about the rest of the patch: 1. If a WorkerContext wants to prevent events from firing, isn't encoding that information in JSEventListener::handleEvent pretty far afield? It would be better if we could separate concerns, so that JSEventListener didn't have to know about the internal implementation details of WorkerContext. I wonder if WorkerContext could just override "virtual bool dispatchEvent(PassRefPtr<Event>);", and have it do nothing in the case where the WorkerContext's m_executionForbidden flag has been set. 2. Doesn't ScheduledAction::execute need to be patched too? It seems to assume that workerContextWrapper() will not return 0.

Geoffrey Garen

Comment 14 2010-04-21 09:33:13 PDT

Comment on attachment 53914 [details] Patch. r- because I think there's a better way to do this.

Dmitry Titov

Comment 15 2010-04-21 14:53:15 PDT

(In reply to comment #13) > However, I have two questions about the rest of the patch: > > 1. If a WorkerContext wants to prevent events from firing, isn't encoding that > information in JSEventListener::handleEvent pretty far afield? It would be > better if we could separate concerns, so that JSEventListener didn't have to > know about the internal implementation details of WorkerContext. I wonder if > WorkerContext could just override "virtual bool > dispatchEvent(PassRefPtr<Event>);", and have it do nothing in the case where > the WorkerContext's m_executionForbidden flag has been set. > > 2. Doesn't ScheduledAction::execute need to be patched too? It seems to assume > that workerContextWrapper() will not return 0. #1 - that was the first idea, for both JSC and V8 implementations - to check some flag on WorkerScriptController early on. However, it translated into need to add this check to so many places.. We have many EventTargets, even in Workers, various evaluate() methods, callbacks (like in databases), even things like conversion of JS types to string can reenter JS (at least in V8 they do). All of those places pull the global object or context in some form in order to enter JS - and many places already check for null, using this pattern (or similar): JSDOMGlobalObject* globalObject = toJSDOMGlobalObject(scriptExecutionContext, m_isolatedWorld.get()); if (!globalObject) return; So it seems logical to avoid lots of extra piping (to pull WorkerScriptController or some Worker-independent way to figure out the scripting engine was terminated) and just make it official that an attempt to pull the global object wrapper before running some JS can return no wrapper and we need checks for that. Also, it seems a better trap for errors, because returning 0 globalObject will likely cause crash, while silent eventual executing of some event handlers may go unnoticed for a while and create more problems down the road. It could also be harder to figure out since the crashes could happen later in the code and vary in signature. In addition to being more centralized, returning null context/globalObject also makes code for JCS and V8 be similar. In case of V8, there are more "js entry" points that has to be guarded against reentry, so having to pull WorkerScriptController (or even more ubiquitous ScriptExecutionContext) to all necessary places would be hard and would simply add another check to existing checks for null context. There is a bit more discussion in bug 37053, which introduced returning 0. It's possible to remove returning 0 from WorkerScriptController::workerContextWrapper() and add a virtual dispatchEvent() as you propose but it immediately breaks test from bug 37053 and perhaps introduces more issues not yet covered by tests. I'd prefer to keep the crash we have now, at least we will be able to recognize the root issue (running JS in terminated worker) because those crashes, while timing-dependent, would look similar. #2 is currently impossible in Workers (due to the way timers work there, via WorkerRunLoop) and I left it w/o check intentionally to make sure it crashes if there is a regression in the way WorkerRunLoop processes tasks. Could you please weight this and let me know what you think? I could have added more info in the beginning.

Dmitry Titov

Comment 16 2010-04-22 11:40:04 PDT

Created attachment 54081 [details] Added check in ScheduledAction I've added a check in Scheduled action, it makes sense to have this checks in all places that may be in worker's code . I'm still feeling the check for the global object wrapper performed down the hierarchy is the right thing. For example, dispatchEvent() invokes a non-virtual fireEventListeners which has a loop over all registered event handler functions and invoke them one after another. If the execution was terminated in the middle of one, we want to avoid invoking the rest.

WebKit Review Bot

Comment 17 2010-04-22 11:41:12 PDT

Attachment 54081 [details] did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style']" exit_code: 1 WebCore/bindings/js/JSEventListener.cpp:118: One space before end of line comments [whitespace/comments] [5] Total errors found: 1 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.

Dmitry Titov

Comment 18 2010-04-22 11:44:25 PDT

Created attachment 54082 [details] Patch. Fixed style issue.

Dmitry Titov

Comment 19 2010-04-26 12:54:12 PDT

I think this is ready for another look. I've changed ScheduledAction. Also decided to leave the check for 0 context low in the call hierarchy and tried to explain the reasons above. Lets iterate? This test crashes ~1 out of 3 runs on SL lately.

Adam Barth

Comment 20 2010-04-27 13:44:23 PDT

Again: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r58328%20(9058)/results.html

Adam Barth

Comment 21 2010-04-27 13:47:31 PDT

Comment on attachment 54082 [details] Patch. I'd like ggaren to take another look, but this crash is happening really often.

Dmitry Titov

Comment 22 2010-04-27 16:45:56 PDT

It's hard to get ggaren for a chat :-) From what I could glean from a brief IRC conversation, he would really like to see more explicit check rather then a null check on a pointer. I tend to agree since there is no precedent of checking a result of toJS(..) for 0. On another hand, it's best not to pull some high-level objects like WorkerScriptController down to JS level to do checks. I'm experimenting with another patch, will upload soon.

Geoffrey Garen

Comment 23 2010-04-27 17:01:06 PDT

It's not good that workerContextWrapper() can magically become 0 at any time from another thread. This means that all clients of workerContextWrapper(), implicit and explicit, must constantly check for 0, or whatever it might transmute into, like jsNull(), even within the subcomponents of a given expression. That's crazy! In this patch, you've missed at least three other place that need to check the magic crash bit: JSValue handleEventFunction = jsFunction->get(exec, Identifier(exec, "handleEvent")); JSC::call(exec, handleEventFunction, callType, callData, jsFunction, args) event->storeResult(ustringToString(retval.toString(exec))); I'm not sure if you've missed other places, too. It's very difficult to reason about a magic multithreaded bit that can cause your program to crash at any time.

Geoffrey Garen

Comment 24 2010-04-27 17:02:16 PDT

Comment on attachment 54082 [details] Patch. Please add tests for the cases that you missed, and fix those cases. I think the right fix would be to move away from a magic threaded bit that can cause WebKit to crash at any time, but maybe someone else has an even better solution.

Dmitry Titov

Comment 25 2010-04-27 17:34:06 PDT

Created attachment 54484 [details] Patch with a more explicit check. I hope this is better. I've added ScriptExecutionContext::isJSExecutionTerminated(), which is always 'false' for Document and becomes 'true' for WorkerContext once Worker.terminate() or WorkerGlobalScope.clsoe() is invoked. Apparently, JSC does not crash on reentry while execution is terminated, it will just cause another exception and almost immediate exit from the JS fragment.

Dmitry Titov

Comment 26 2010-04-27 18:03:09 PDT

(In reply to comment #24) > (From update of attachment 54082 [details]) > Please add tests for the cases that you missed, and fix those cases. > > I think the right fix would be to move away from a magic threaded bit that can > cause WebKit to crash at any time, but maybe someone else has an even better > solution. I think there are no more missed cases, this fixes wrapper-map-gc.html for JSC. There is no more 0 pointers returned from toJS. My understanding of the code is that GlobalData::terminator will raise a new exception on reentry and exit execution. As I understand your recommended to have an explicit check, avoid context pointer to be returned as 0, and you also suggested that it is not necessarily to check before every JS entry point, since JSC (unlike v8) tolerates reentry. The attached patch does this. I don't know how to find all the entry points into JS, how to write tests for that etc. While there is a remote chance of something breaking once JS execution starts throwing termination exception, it doesn't seem practical to try to add checks everywhere. It's better to write more tests with time on using Workers in various modes, find those issues and fix them.

Geoffrey Garen

Comment 27 2010-04-28 10:47:56 PDT

Comment on attachment 54484 [details] Patch with a more explicit check. r=me

Dmitry Titov

Comment 28 2010-04-28 12:40:39 PDT

Landed: http://trac.webkit.org/changeset/58421

Note You need to log in before you can comment on or make changes to this bug.