Summary: | REGRESSION: fast/workers/wrapper-map-gc.html crashes on Snow Leopard Release Bot | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Eric Seidel (no email) <eric> | ||||||||||||
Component: | Tools / Tests | Assignee: | Dmitry Titov <dimich> | ||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||
Severity: | Normal | CC: | abarth, ap, dimich, ggaren, sam, webkit.review.bot | ||||||||||||
Priority: | P1 | ||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||
Hardware: | PC | ||||||||||||||
OS: | OS X 10.5 | ||||||||||||||
Attachments: |
|
Description
Eric Seidel (no email)
2010-04-13 23:42:02 PDT
Again just now: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57587%20(8273)/results.html Struck again: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57593%20(8279)/results.html Given this has reproduced 3 times just this morning, I expect it should be easy to repro. Yeah, I think something might have recently regressed to cause this to get so flaky. Again just now: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57604%20(8293)/results.html This appears to be crashing most builds. Something is definitely wrong. http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57630%20(8320)/results.html http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57631%20(8322)/results.html http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57633%20(8323)/results.html This test continues to crash super often: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57795%20(8494)/results.html http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57794%20(8492)/results.html http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57790%20(8488)/results.html http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57788%20(8481)/results.html Created attachment 53635 [details]
crash log
This crash is super-easy to repro locally.
run-webkit-tests --debug --iterations 100 fast/workers/wrapper-map-gc.html
did it for me. Crash log attached.
Comment on attachment 53635 [details] crash log Looks like it hit this ASSERT: ASSERT(isUndefinedOrNull()); http://trac.webkit.org/browser/trunk/JavaScriptCore/runtime/JSValue.cpp#L78 That ASSERT has been there forever (added by ggaren). Maybe Geoff can offer some insight as to what Workers is doing wrong here. It looks like the JavaScript wrapper being used as 'this' on this test is garbage. Not sure why. It was broken by introducing the possibility for a JS wrapper of WorkerContext be returned as 0 in case of terminated worker, to prevent the re-entry into JS (r57349). It works fine if all the places that get the wrapper before invoking JS check for the value. JSEventListener did not, so if worker's JS execution was terminated between pulling globalObject (returns 0 for terminated worker, there is a check) and getting a wrapper for the 'this' (returns 0 for terminated worker, no check) - the subsequent attempt to run a listener function crashes. Note there is no need to synchronize checks here, even though the termination is requested on one thread and the value of a flag is used on another - once the JS execution on worker thread is set up properly, it's ok to go into it since it will be terminated by JSC::Terminator causing an exception eventually. So, I broke it :-(. My only excuse is that I wrote the test that caught the borkage and I now have a fix :-) Patch coming. Created attachment 53914 [details]
Patch.
Looks sane to me, but someone with more recent experience in JSC should probably r+ this. Thanks Dmitry, I think I understand what's going on here now. The ASSERT you ran into reflects a JSC philosophy that it's an error for the runtime ever to encounter a JSValue that is not a standard JavaScript value. We allow clients to construct an empty JSValue() / null C pointer, for their convenience, but we don't allow clients to pass an empty JSValue() / null C pointer into JSC. So, your change in JSWorkerContextBase.cpp is the right thing to do. However, I have two questions about the rest of the patch: 1. If a WorkerContext wants to prevent events from firing, isn't encoding that information in JSEventListener::handleEvent pretty far afield? It would be better if we could separate concerns, so that JSEventListener didn't have to know about the internal implementation details of WorkerContext. I wonder if WorkerContext could just override "virtual bool dispatchEvent(PassRefPtr<Event>);", and have it do nothing in the case where the WorkerContext's m_executionForbidden flag has been set. 2. Doesn't ScheduledAction::execute need to be patched too? It seems to assume that workerContextWrapper() will not return 0. Comment on attachment 53914 [details]
Patch.
r- because I think there's a better way to do this.
(In reply to comment #13) > However, I have two questions about the rest of the patch: > > 1. If a WorkerContext wants to prevent events from firing, isn't encoding that > information in JSEventListener::handleEvent pretty far afield? It would be > better if we could separate concerns, so that JSEventListener didn't have to > know about the internal implementation details of WorkerContext. I wonder if > WorkerContext could just override "virtual bool > dispatchEvent(PassRefPtr<Event>);", and have it do nothing in the case where > the WorkerContext's m_executionForbidden flag has been set. > > 2. Doesn't ScheduledAction::execute need to be patched too? It seems to assume > that workerContextWrapper() will not return 0. #1 - that was the first idea, for both JSC and V8 implementations - to check some flag on WorkerScriptController early on. However, it translated into need to add this check to so many places.. We have many EventTargets, even in Workers, various evaluate() methods, callbacks (like in databases), even things like conversion of JS types to string can reenter JS (at least in V8 they do). All of those places pull the global object or context in some form in order to enter JS - and many places already check for null, using this pattern (or similar): JSDOMGlobalObject* globalObject = toJSDOMGlobalObject(scriptExecutionContext, m_isolatedWorld.get()); if (!globalObject) return; So it seems logical to avoid lots of extra piping (to pull WorkerScriptController or some Worker-independent way to figure out the scripting engine was terminated) and just make it official that an attempt to pull the global object wrapper before running some JS can return no wrapper and we need checks for that. Also, it seems a better trap for errors, because returning 0 globalObject will likely cause crash, while silent eventual executing of some event handlers may go unnoticed for a while and create more problems down the road. It could also be harder to figure out since the crashes could happen later in the code and vary in signature. In addition to being more centralized, returning null context/globalObject also makes code for JCS and V8 be similar. In case of V8, there are more "js entry" points that has to be guarded against reentry, so having to pull WorkerScriptController (or even more ubiquitous ScriptExecutionContext) to all necessary places would be hard and would simply add another check to existing checks for null context. There is a bit more discussion in bug 37053, which introduced returning 0. It's possible to remove returning 0 from WorkerScriptController::workerContextWrapper() and add a virtual dispatchEvent() as you propose but it immediately breaks test from bug 37053 and perhaps introduces more issues not yet covered by tests. I'd prefer to keep the crash we have now, at least we will be able to recognize the root issue (running JS in terminated worker) because those crashes, while timing-dependent, would look similar. #2 is currently impossible in Workers (due to the way timers work there, via WorkerRunLoop) and I left it w/o check intentionally to make sure it crashes if there is a regression in the way WorkerRunLoop processes tasks. Could you please weight this and let me know what you think? I could have added more info in the beginning. Created attachment 54081 [details]
Added check in ScheduledAction
I've added a check in Scheduled action, it makes sense to have this checks in all places that may be in worker's code .
I'm still feeling the check for the global object wrapper performed down the hierarchy is the right thing. For example, dispatchEvent() invokes a non-virtual fireEventListeners which has a loop over all registered event handler functions and invoke them one after another. If the execution was terminated in the middle of one, we want to avoid invoking the rest.
Attachment 54081 [details] did not pass style-queue:
Failed to run "['WebKitTools/Scripts/check-webkit-style']" exit_code: 1
WebCore/bindings/js/JSEventListener.cpp:118: One space before end of line comments [whitespace/comments] [5]
Total errors found: 1 in 4 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 54082 [details]
Patch.
Fixed style issue.
I think this is ready for another look. I've changed ScheduledAction. Also decided to leave the check for 0 context low in the call hierarchy and tried to explain the reasons above. Lets iterate? This test crashes ~1 out of 3 runs on SL lately. Again: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r58328%20(9058)/results.html Comment on attachment 54082 [details]
Patch.
I'd like ggaren to take another look, but this crash is happening really often.
It's hard to get ggaren for a chat :-) From what I could glean from a brief IRC conversation, he would really like to see more explicit check rather then a null check on a pointer. I tend to agree since there is no precedent of checking a result of toJS(..) for 0. On another hand, it's best not to pull some high-level objects like WorkerScriptController down to JS level to do checks. I'm experimenting with another patch, will upload soon. It's not good that workerContextWrapper() can magically become 0 at any time from another thread. This means that all clients of workerContextWrapper(), implicit and explicit, must constantly check for 0, or whatever it might transmute into, like jsNull(), even within the subcomponents of a given expression. That's crazy! In this patch, you've missed at least three other place that need to check the magic crash bit: JSValue handleEventFunction = jsFunction->get(exec, Identifier(exec, "handleEvent")); JSC::call(exec, handleEventFunction, callType, callData, jsFunction, args) event->storeResult(ustringToString(retval.toString(exec))); I'm not sure if you've missed other places, too. It's very difficult to reason about a magic multithreaded bit that can cause your program to crash at any time. Comment on attachment 54082 [details]
Patch.
Please add tests for the cases that you missed, and fix those cases.
I think the right fix would be to move away from a magic threaded bit that can cause WebKit to crash at any time, but maybe someone else has an even better solution.
Created attachment 54484 [details]
Patch with a more explicit check.
I hope this is better. I've added ScriptExecutionContext::isJSExecutionTerminated(), which is always 'false' for Document and becomes 'true' for WorkerContext once Worker.terminate() or WorkerGlobalScope.clsoe() is invoked.
Apparently, JSC does not crash on reentry while execution is terminated, it will just cause another exception and almost immediate exit from the JS fragment.
(In reply to comment #24) > (From update of attachment 54082 [details]) > Please add tests for the cases that you missed, and fix those cases. > > I think the right fix would be to move away from a magic threaded bit that can > cause WebKit to crash at any time, but maybe someone else has an even better > solution. I think there are no more missed cases, this fixes wrapper-map-gc.html for JSC. There is no more 0 pointers returned from toJS. My understanding of the code is that GlobalData::terminator will raise a new exception on reentry and exit execution. As I understand your recommended to have an explicit check, avoid context pointer to be returned as 0, and you also suggested that it is not necessarily to check before every JS entry point, since JSC (unlike v8) tolerates reentry. The attached patch does this. I don't know how to find all the entry points into JS, how to write tests for that etc. While there is a remote chance of something breaking once JS execution starts throwing termination exception, it doesn't seem practical to try to add checks everywhere. It's better to write more tests with time on using Workers in various modes, find those issues and fix them. Comment on attachment 54484 [details]
Patch with a more explicit check.
r=me
|