Summary: | REGRESSION (??? - r50171): inspector tests crashing at JSC::TypeInfo::type() | ||
---|---|---|---|
Product: | WebKit | Reporter: | Eric Seidel (no email) <eric> |
Component: | Web Inspector (Deprecated) | Assignee: | Nobody <webkit-unassigned> |
Status: | RESOLVED WORKSFORME | ||
Severity: | Normal | CC: | ap, aroben, atwilson, barraclough, dimich, ggaren, joepeck, knorton, levin, oliver, pfeldman, timothy |
Priority: | P1 | Keywords: | InRadar |
Version: | 528+ (Nightly build) | ||
Hardware: | Mac | ||
OS: | OS X 10.5 | ||
Bug Depends on: | 31817, 31615 | ||
Bug Blocks: | 30916, 31268 | ||
Attachments: |
Description
Eric Seidel (no email)
2009-10-27 14:39:52 PDT
I wonder if this could be related to the crash just seen on the Tiger bot in inspector/console- tests: http://build.webkit.org/results/Tiger%20Intel%20Release/r50171%20(5612)/results.html

Looks like it's crashing again this morning. This is a real bug: http://build.webkit.org/results/Leopard%20Intel%20Debug%20(Tests)/r50217%20(6552)/results.html

Where can we get the crash logs? Why isn't the stderr output file a 404? (That would help, since this is likely an ASSERT.)

Correction: "why is".

We don't have an easy way to get crash logs off the bots yet. :( bug 14861.

I expect that
run-webkit-tests --iterations 100 inspector/console-format.html
might reproduce it in a local debug build. The crash is flakey: it doesn't happen every time.

This console failure (internal timeout): http://build.webkit.org/results/Leopard%20Intel%20Debug%20(Tests)/r50239%20(6569)/inspector/console-format-collections-pretty-diff.html may also be related.

I ran the inspector tests 100 times (--iterations 100) locally in debug mode and saw no crashes.

My --iterations 100 run was with a rather old build of WebKit. I'm going to update and run the tests again since these failures only recently started on the Leopard bot.

Sadly inspector/ is currently the most flakey set of tests on the Leopard debug bot. :(

There are two things that we can try:
1) Change DRT so that inspector is enabled for LayoutTests/inspector tests
2) Replace setTimeout(0) with the direct call in tests.

I worry this might be an http-induced crasher leaking into the inspector tests. I remember we've had some flakiness with XHR tests in the past causing random crashes. I'll dig up the bugs.

I'm right now running:
run-webkit-tests --iterations 100 http inspector
to see if this could be related to being run after the http tests.

bug 29344, bug 30726, bug 30519, bug 30392 and bug 29090 are all about flakey http tests, some of which involve unexplained crashers. It is possible the inspector tests are just the most recent victim here.
Just had console-dirxml.html fail on the Tiger bot: http://build.webkit.org/results/Tiger%20Intel%20Release/r50240%20(5657)/results.html

Clearly something is wrong here that's causing flakey inspector/ tests on multiple bots. :(

Created attachment 42064 [details]
crash report from running "run-webkit-tests --iterations 100 http inspector"
I guess I'll CC some of the JSC guys.
CCing a couple JIT guys in case this crash dump looks familiar to them.

Created attachment 42082 [details]
13 crash reports from another run of "run-webkit-tests --iterations 100 http inspector"
I let the run complete. Here are 13 crash reports from the run. Obviously this bug is reproducible. :) Now I guess we just need a reduction...
The 13 reports don't all look the same, but I think there are only two separate stack traces. I'm currently trying to reduce the number of tests required to cause this to fail. Right now I have the set down to about 140. I'll post the list when I have it down to a more reasonable number. Right now that 140 is a subset of the http tests and all of the inspector tests.

If I'm reading the stack traces correctly, it looks like something is trying to toString() a bad JSValue pointer? Is that a correct reading?

There was a bug where quarantine wrappers were not holding their wrapped objects, so those objects were collected on the go. That was causing random crap to take place, including the kind of crash you describe. The quarantine code seemed to be right though - at least it had appropriate mark methods. Could we run this with GC disabled? Or do some stressful GC on inspector tests only? Quarantined objects are only used in inspector and should soon go away.

Created attachment 42147 [details]
180 tests which are known to crash when run together.
I'm still trying to reduce this set.
cat known_to_crash.txt | xargs run-webkit-tests --iterations 100
will lead to the crash.
When this crashes, it seems to crash on the very first inspector test. It's possible that a GC is triggered during an inspector test and that's the reason why it's crashing. I guess I could try sprinkling gc() calls in inspector/console-dir.html and see what happens.

Alexey suggested I try using COLLECT_ON_EVERY_ALLOCATION from Collector.cpp. I built a copy of WebKit with that, and ran all of the inspector/*.html tests under DumpRenderTree. I was not able to produce a crash. I wonder if one of the http tests is smashing memory in some way?

It's strange that all of the crashes seem to have very similar crash points:
0x00000000fffffff0
0x0000000000000001
0x0000000000000fe4
0x0000000000000002
0x00000000fffffff6
Do these values look familiar to anyone in JIT land? The crash point is always:
0 com.apple.JavaScriptCore 0x0052bb81 JSC::TypeInfo::type() const + 9 (JSTypeInfo.h:60)

Why do all of those crash points only have the low 32 bits (8 hex digits) set? We're either getting passed in bogus values or (and this seems more likely) we're truncating the tag bits from a JSValue. It would be good to see if we can find exactly what revision this started at. Eric, are you running on Leopard or Snow Leopard?

I'm running Leopard. So are the bots we've seen this crash on. I believe I've only ever seen this crash on Leopard Debug, although it's possible it crashes on other configurations.

As yet I have been unable to repro -- if you can get a narrower revision range that would be great.

Created attachment 42230 [details]
crash log
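The observation about the crash point values quoted above can be checked mechanically. The following is a minimal sketch in Python; the helper names are hypothetical, and it only does the bit arithmetic on the five values from this thread. It makes no claim about how JavaScriptCore encodes values or what these numbers mean inside the engine.

```python
# Crash point values copied from the crash reports quoted in this thread.
CRASH_POINTS = [
    0x00000000FFFFFFF0,
    0x0000000000000001,
    0x0000000000000FE4,
    0x0000000000000002,
    0x00000000FFFFFFF6,
]


def upper_32_bits(value):
    """Return the high 32 bits of a 64-bit value (hypothetical helper)."""
    return (value >> 32) & 0xFFFFFFFF


def as_signed_32(value):
    """Interpret the low 32 bits as a two's-complement signed integer."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


for point in CRASH_POINTS:
    print(f"{point:#018x}  upper={upper_32_bits(point):#x}  signed32={as_signed_32(point)}")
```

Every value has an all-zero upper half, and the low halves read as small signed integers (-16, 1, 4068, 2, -10). That is at least consistent with the "truncated or bogus value" theories raised in the thread, though it does not distinguish between them.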
(In reply to comment #27)
> As yet I have been unable to repro -- if you can get a narrower revision range
> that would be great

The tests (as well as the testing harness) for inspector were introduced not so long ago, so I think it might be hard to narrow the revision range. It might have been there for a while.

I've added a myriad of assertions but have yet to hit anything prior to the actual crash. It's really bizarre. I may start adding assertions looking for these specific bad values.

This seems to be less common on the bots today, but is not gone. I suspect that xmlhttprequest tests were added and thus changed what objects were live at the time gc() ended up being called during the crashing console tests.

inspector/console-format-collections.html is crashing consistently on the Leopard Debug Test Bot this evening. I assume it's just this bug. I assume that the stars aligned with the addition of some new test such that the gc timing is correct to trigger this bug more frequently again. Or at least that's my (uninformed) theory. :)

Have not seen Leopard bots failing since I queued things carefully in https://bugs.webkit.org/show_bug.cgi?id=30884. Or was it failing?

I haven't seen the bots crash due to this in a while either. But I haven't been paying super-close attention. If it's fixed, do we have any idea what could have fixed it?

I've queued all the interaction between the inspected page and frontend more carefully. In particular this excluded re-entrancy from within the timer fire.

Created attachment 43152 [details]
another crash log on r50935

Created attachment 43183 [details]
crash report from console-dir.html when trying to land bug 31474

https://bugs.webkit.org/show_bug.cgi?id=31474#c5 saw this crash again. There has been a rash of GC-related crashes the last few days, so this may not be related to this particular bug, but it is the same test.

bug 31460 is one example of the other JSC crashes seen in the last 48 hours.
Created attachment 43186 [details]
Crash report from console-dir.html when trying to land bug 31456

Created attachment 43188 [details]
Another crash report from console-dir.html when trying to land bug 31406

I'm not sure that:
https://bugs.webkit.org/attachment.cgi?id=43183
https://bugs.webkit.org/attachment.cgi?id=43186
https://bugs.webkit.org/attachment.cgi?id=43188
are actually related to the original bug in question. They just happen to be console-dir.html crashes of the last couple days. They may be of a different origin.

Looks like we're seeing console-dir.html crashes on the build bots too:
http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r50956%20(7276)/results.html
http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r50933%20(7258)/results.html

Attempting to reduce the set of tests required to produce a crash. I've plugged the set of test cases into the automated test minimizer "tmin": http://code.google.com/p/tmin/wiki/TminManual and we're just gonna hope. :)

Created attachment 43383 [details]
27 tests which are known to crash when run together
cat known_to_crash.txt | xargs run-webkit-tests --iterations 10 --no-launch-safari --debug
is the command I'm using.
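The reduction from 180 tests to 27 and onward follows a pattern that can be sketched in a few lines. The Python below is a simple greedy one-at-a-time reducer, not tmin's algorithm. The function `minimize`, the predicate `still_crashes`, and the placeholder test names are all hypothetical. A real predicate would have to run the candidate list through run-webkit-tests and decide whether a crash occurred, and how to detect that is left open here.

```python
from typing import Callable, List, Sequence


def minimize(tests: Sequence[str],
             still_crashes: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop tests one at a time while the crash still reproduces.

    Repeats full passes until a pass removes nothing, so the result is a
    list from which no single test can be removed without losing the crash.
    """
    current = list(tests)
    changed = True
    while changed:
        changed = False
        for test in list(current):  # iterate over a snapshot of the list
            candidate = [t for t in current if t != test]
            if candidate and still_crashes(candidate):
                current = candidate
                changed = True
    return current


# Simulated predicate for illustration only: pretend the crash needs two
# specific (made-up) tests to run together.
CULPRITS = {"test3.html", "test7.html"}


def simulated_crash(candidate: List[str]) -> bool:
    return CULPRITS.issubset(candidate)


all_tests = [f"test{i}.html" for i in range(10)]
reduced = minimize(all_tests, simulated_crash)
print(reduced)
```

Because the crash in this bug is flakey, a real predicate would likely need to repeat each candidate run several times (as the --iterations runs in this thread do) before concluding that a smaller set no longer crashes. Otherwise a test could be kept or dropped on the strength of one lucky run.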
OK. I've reduced it to 4 tests required to produce the crash:
http/tests/xmlhttprequest/workers/shared-worker-close.html
http/tests/xmlhttprequest/workers/shared-worker-methods.html
inspector/console-dir.html
inspector/console-format.html

This command:
run-webkit-tests --debug --iterations 20 --no-launch-safari http/tests/xmlhttprequest/workers/shared-worker-close.html http/tests/xmlhttprequest/workers/shared-worker-methods.html inspector/console-dir.html inspector/console-format.html
reliably produces a crash for me. Looking now to see if I can condense this down into a single test case.

Looks like this is caused by Shared Workers + gc()

This command crashes reliably for me:
run-webkit-tests --debug --iterations 100 --no-launch-safari http/tests/xmlhttprequest/workers/shared-worker-methods.html

I'll see if I can reduce that single test further. Although at this point I would expect one of the JSC experts should be able to give some theories as to what's going wrong here. :)

I think I have a fix for this. Will create a patch later today. This particular test (xhr in shared workers) fails because of this change: http://trac.webkit.org/changeset/50919 (landed 11/12)

(In reply to comment #47)
> Looks like this is caused by Shared Workers + gc()
>
> This command crashes reliably for me:
> run-webkit-tests --debug --iterations 100 --no-launch-safari
> http/tests/xmlhttprequest/workers/shared-worker-methods.html
>
> I'll see if I can reduce that single test further. Although at this point I
> would expect one of the JSC experts should be able to give some theories as to
> what's going wrong here. :)

Based on that I think the crash we're currently looking at is a different issue from the one this bug refers to (the revision you refer to is after the date this bug was filed).

Perhaps there is more than one cause. One of them is bug 31615. Let's see what remains after that one lands.
With the patch for bug 31615 applied, this command does not crash (before it did):
run-webkit-tests --iterations 1000 --no-launch-safari http/tests/xmlhttprequest/workers/shared-worker-methods.html

Inspired by the diagnosis made in bug 31615 I looked back through the changes just before r50171 again. I wonder if http://trac.webkit.org/changeset/50167 could be related to this at all? XHRs are used from multiple threads, no? Is it safe to call those inspector methods from XHR?

Timeline can only receive events on main thread. I can see timeline being called from within callReadyStateChangeListener only. This one is presumably dispatching events on the main thread for Document context. Workers' contexts should have no timeline agent instances due to the logic in InspectorTimelineAgent::retrieve. So things should be ok, unless marshalling of these events from XHR to main thread is happening later.

*** Bug 31999 has been marked as a duplicate of this bug. ***

Does not seem to be there anymore.