WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
37554
REGRESSION: fast/workers/wrapper-map-gc.html crashes on Snow Leopard Release Bot
https://bugs.webkit.org/show_bug.cgi?id=37554
Summary
REGRESSION: fast/workers/wrapper-map-gc.html crashes on Snow Leopard Release Bot
Eric Seidel (no email)
Reported
2010-04-13 23:42:02 PDT
fast/workers/wrapper-map-gc.html crashed on Snow Leopard Release Bot
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57566%20(8248)/results.html
Sorry I don't have a stack trace for you (yet). I believe I've seen this one crash once before but I couldn't find a bug on it.
http://trac.webkit.org/browser/trunk/LayoutTests/fast/workers/wrapper-map-gc.html
Attachments
crash log
(26.85 KB, text/plain)
2010-04-18 14:20 PDT
,
Eric Seidel (no email)
no flags
Details
Patch.
(2.84 KB, patch)
2010-04-20 20:49 PDT
,
Dmitry Titov
ggaren
: review-
dimich
: commit-queue-
Details
Formatted Diff
Diff
Added check in ScheduledAction
(3.55 KB, patch)
2010-04-22 11:40 PDT
,
Dmitry Titov
no flags
Details
Formatted Diff
Diff
Patch.
(3.77 KB, patch)
2010-04-22 11:44 PDT
,
Dmitry Titov
ggaren
: review-
dimich
: commit-queue-
Details
Formatted Diff
Diff
Patch with a more explicit check.
(6.33 KB, patch)
2010-04-27 17:34 PDT
,
Dmitry Titov
ggaren
: review+
dimich
: commit-queue-
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Eric Seidel (no email)
Comment 1
2010-04-14 10:43:24 PDT
Again just now:
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57587%20(8273)/results.html
Eric Seidel (no email)
Comment 2
2010-04-14 12:20:34 PDT
Struck again:
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57593%20(8279)/results.html
Given this has reproduced 3 times just this morning, I expect it should be easy to repro.
Eric Seidel (no email)
Comment 3
2010-04-14 14:36:59 PDT
Yeah, I think something might have recently regressed to cause this to get so flaky. Again just now:
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57604%20(8293)/results.html
Eric Seidel (no email)
Comment 4
2010-04-14 22:43:49 PDT
This appears to be crashing most builds. Something is definitely wrong.
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57630%20(8320)/results.html
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57631%20(8322)/results.html
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57633%20(8323)/results.html
Eric Seidel (no email)
Comment 5
2010-04-18 12:20:45 PDT
This test continues to crash super often:
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57795%20(8494)/results.html
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57794%20(8492)/results.html
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57790%20(8488)/results.html
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r57788%20(8481)/results.html
Eric Seidel (no email)
Comment 6
2010-04-18 14:20:24 PDT
Created
attachment 53635
[details]
crash log This crash is super-easy to repro locally. run-webkit-tests --debug --iterations 100 fast/workers/wrapper-map-gc.html did it for me. Crash log attached.
Eric Seidel (no email)
Comment 7
2010-04-18 14:23:09 PDT
Comment on
attachment 53635
[details]
crash log Looks like it hit this ASSERT: ASSERT(isUndefinedOrNull());
http://trac.webkit.org/browser/trunk/JavaScriptCore/runtime/JSValue.cpp#L78
That ASSERT has been there forever (added by ggaren).
Eric Seidel (no email)
Comment 8
2010-04-18 14:23:38 PDT
Maybe Geoff can offer some insight as to what Workers is doing wrong here.
Geoffrey Garen
Comment 9
2010-04-19 16:28:10 PDT
It looks like the JavaScript wrapper being used as 'this' on this test is garbage. Not sure why.
Dmitry Titov
Comment 10
2010-04-20 20:41:42 PDT
It was broken by introducing the possibility for a JS wrapper of WorkerContext be returned as 0 in case of terminated worker, to prevent the re-entry into JS (
r57349
). It works fine if all the places that get the wrapper before invoking JS check for the value. JSEventListener did not, so if worker's JS execution was terminated between pulling globalObject (returns 0 for terminated worker, there is a check) and getting a wrapper for the 'this' (returns 0 for terminated worker, no check) - the subsequent attempt to run a listener function crashes. Note there is no need to synchronize checks here, even though the termination is requested on one thread and the value of a flag is used on another - once the JS execution on worker thread is set up properly, it's ok to go into it since it will be terminated by JSC::Terminator causing an exception eventually. So, I broke it :-(. My only excuse is that I wrote the test that caught the borkage and I now have a fix :-) Patch coming.
Dmitry Titov
Comment 11
2010-04-20 20:49:54 PDT
Created
attachment 53914
[details]
Patch.
Eric Seidel (no email)
Comment 12
2010-04-21 04:27:36 PDT
Looks sane to me, but someone with more recent experience in JSC should probably r+ this.
Geoffrey Garen
Comment 13
2010-04-21 09:32:50 PDT
Thanks Dmitry, I think I understand what's going on here now. The ASSERT you ran into reflects a JSC philosophy that it's an error for the runtime ever to encounter a JSValue that is not a standard JavaScript value. We allow clients to construct an empty JSValue() / null C pointer, for their convenience, but we don't allow clients to pass an empty JSValue() / null C pointer into JSC. So, your change in JSWorkerContextBase.cpp is the right thing to do. However, I have two questions about the rest of the patch: 1. If a WorkerContext wants to prevent events from firing, isn't encoding that information in JSEventListener::handleEvent pretty far afield? It would be better if we could separate concerns, so that JSEventListener didn't have to know about the internal implementation details of WorkerContext. I wonder if WorkerContext could just override "virtual bool dispatchEvent(PassRefPtr<Event>);", and have it do nothing in the case where the WorkerContext's m_executionForbidden flag has been set. 2. Doesn't ScheduledAction::execute need to be patched too? It seems to assume that workerContextWrapper() will not return 0.
Geoffrey Garen
Comment 14
2010-04-21 09:33:13 PDT
Comment on
attachment 53914
[details]
Patch. r- because I think there's a better way to do this.
Dmitry Titov
Comment 15
2010-04-21 14:53:15 PDT
(In reply to
comment #13
)
> However, I have two questions about the rest of the patch: > > 1. If a WorkerContext wants to prevent events from firing, isn't encoding that > information in JSEventListener::handleEvent pretty far afield? It would be > better if we could separate concerns, so that JSEventListener didn't have to > know about the internal implementation details of WorkerContext. I wonder if > WorkerContext could just override "virtual bool > dispatchEvent(PassRefPtr<Event>);", and have it do nothing in the case where > the WorkerContext's m_executionForbidden flag has been set. > > 2. Doesn't ScheduledAction::execute need to be patched too? It seems to assume > that workerContextWrapper() will not return 0.
#1 - that was the first idea, for both JSC and V8 implementations - to check some flag on WorkerScriptController early on. However, it translated into need to add this check to so many places.. We have many EventTargets, even in Workers, various evaluate() methods, callbacks (like in databases), even things like conversion of JS types to string can reenter JS (at least in V8 they do). All of those places pull the global object or context in some form in order to enter JS - and many places already check for null, using this pattern (or similar): JSDOMGlobalObject* globalObject = toJSDOMGlobalObject(scriptExecutionContext, m_isolatedWorld.get()); if (!globalObject) return; So it seems logical to avoid lots of extra piping (to pull WorkerScriptController or some Worker-independent way to figure out the scripting engine was terminated) and just make it official that an attempt to pull the global object wrapper before running some JS can return no wrapper and we need checks for that. Also, it seems a better trap for errors, because returning 0 globalObject will likely cause crash, while silent eventual executing of some event handlers may go unnoticed for a while and create more problems down the road. It could also be harder to figure out since the crashes could happen later in the code and vary in signature. In addition to being more centralized, returning null context/globalObject also makes code for JCS and V8 be similar. In case of V8, there are more "js entry" points that has to be guarded against reentry, so having to pull WorkerScriptController (or even more ubiquitous ScriptExecutionContext) to all necessary places would be hard and would simply add another check to existing checks for null context. There is a bit more discussion in
bug 37053
, which introduced returning 0. It's possible to remove returning 0 from WorkerScriptController::workerContextWrapper() and add a virtual dispatchEvent() as you propose but it immediately breaks test from
bug 37053
and perhaps introduces more issues not yet covered by tests. I'd prefer to keep the crash we have now, at least we will be able to recognize the root issue (running JS in terminated worker) because those crashes, while timing-dependent, would look similar. #2 is currently impossible in Workers (due to the way timers work there, via WorkerRunLoop) and I left it w/o check intentionally to make sure it crashes if there is a regression in the way WorkerRunLoop processes tasks. Could you please weight this and let me know what you think? I could have added more info in the beginning.
Dmitry Titov
Comment 16
2010-04-22 11:40:04 PDT
Created
attachment 54081
[details]
Added check in ScheduledAction I've added a check in Scheduled action, it makes sense to have this checks in all places that may be in worker's code . I'm still feeling the check for the global object wrapper performed down the hierarchy is the right thing. For example, dispatchEvent() invokes a non-virtual fireEventListeners which has a loop over all registered event handler functions and invoke them one after another. If the execution was terminated in the middle of one, we want to avoid invoking the rest.
WebKit Review Bot
Comment 17
2010-04-22 11:41:12 PDT
Attachment 54081
[details]
did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style']" exit_code: 1 WebCore/bindings/js/JSEventListener.cpp:118: One space before end of line comments [whitespace/comments] [5] Total errors found: 1 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Dmitry Titov
Comment 18
2010-04-22 11:44:25 PDT
Created
attachment 54082
[details]
Patch. Fixed style issue.
Dmitry Titov
Comment 19
2010-04-26 12:54:12 PDT
I think this is ready for another look. I've changed ScheduledAction. Also decided to leave the check for 0 context low in the call hierarchy and tried to explain the reasons above. Lets iterate? This test crashes ~1 out of 3 runs on SL lately.
Adam Barth
Comment 20
2010-04-27 13:44:23 PDT
Again:
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r58328%20(9058)/results.html
Adam Barth
Comment 21
2010-04-27 13:47:31 PDT
Comment on
attachment 54082
[details]
Patch. I'd like ggaren to take another look, but this crash is happening really often.
Dmitry Titov
Comment 22
2010-04-27 16:45:56 PDT
It's hard to get ggaren for a chat :-) From what I could glean from a brief IRC conversation, he would really like to see more explicit check rather then a null check on a pointer. I tend to agree since there is no precedent of checking a result of toJS(..) for 0. On another hand, it's best not to pull some high-level objects like WorkerScriptController down to JS level to do checks. I'm experimenting with another patch, will upload soon.
Geoffrey Garen
Comment 23
2010-04-27 17:01:06 PDT
It's not good that workerContextWrapper() can magically become 0 at any time from another thread. This means that all clients of workerContextWrapper(), implicit and explicit, must constantly check for 0, or whatever it might transmute into, like jsNull(), even within the subcomponents of a given expression. That's crazy! In this patch, you've missed at least three other place that need to check the magic crash bit: JSValue handleEventFunction = jsFunction->get(exec, Identifier(exec, "handleEvent")); JSC::call(exec, handleEventFunction, callType, callData, jsFunction, args) event->storeResult(ustringToString(retval.toString(exec))); I'm not sure if you've missed other places, too. It's very difficult to reason about a magic multithreaded bit that can cause your program to crash at any time.
Geoffrey Garen
Comment 24
2010-04-27 17:02:16 PDT
Comment on
attachment 54082
[details]
Patch. Please add tests for the cases that you missed, and fix those cases. I think the right fix would be to move away from a magic threaded bit that can cause WebKit to crash at any time, but maybe someone else has an even better solution.
Dmitry Titov
Comment 25
2010-04-27 17:34:06 PDT
Created
attachment 54484
[details]
Patch with a more explicit check. I hope this is better. I've added ScriptExecutionContext::isJSExecutionTerminated(), which is always 'false' for Document and becomes 'true' for WorkerContext once Worker.terminate() or WorkerGlobalScope.clsoe() is invoked. Apparently, JSC does not crash on reentry while execution is terminated, it will just cause another exception and almost immediate exit from the JS fragment.
Dmitry Titov
Comment 26
2010-04-27 18:03:09 PDT
(In reply to
comment #24
)
> (From update of
attachment 54082
[details]
) > Please add tests for the cases that you missed, and fix those cases. > > I think the right fix would be to move away from a magic threaded bit that can > cause WebKit to crash at any time, but maybe someone else has an even better > solution.
I think there are no more missed cases, this fixes wrapper-map-gc.html for JSC. There is no more 0 pointers returned from toJS. My understanding of the code is that GlobalData::terminator will raise a new exception on reentry and exit execution. As I understand your recommended to have an explicit check, avoid context pointer to be returned as 0, and you also suggested that it is not necessarily to check before every JS entry point, since JSC (unlike v8) tolerates reentry. The attached patch does this. I don't know how to find all the entry points into JS, how to write tests for that etc. While there is a remote chance of something breaking once JS execution starts throwing termination exception, it doesn't seem practical to try to add checks everywhere. It's better to write more tests with time on using Workers in various modes, find those issues and fix them.
Geoffrey Garen
Comment 27
2010-04-28 10:47:56 PDT
Comment on
attachment 54484
[details]
Patch with a more explicit check. r=me
Dmitry Titov
Comment 28
2010-04-28 12:40:39 PDT
Landed:
http://trac.webkit.org/changeset/58421
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug