WebKit Bugzilla
RESOLVED FIXED
62297
nwrt: Chromium Win hangs frequently
https://bugs.webkit.org/show_bug.cgi?id=62297
Dimitri Glazkov (Google)
Reported
2011-06-08 10:04:47 PDT
The latest occurrence is:
http://build.webkit.org/builders/Chromium%20Win%20Release%20%28Tests%29/builds/15243/steps/layout-test/logs/stdio
Dimitri Glazkov (Google)
Comment 1
2011-06-08 10:11:37 PDT
It looks like there's a buuuuuunch of orphan processes left over: python.exe, LayoutTestHelper.exe.
Dirk Pranke
Comment 2
2011-06-08 13:49:53 PDT
Right around

2011-06-08 09:36:14,887 8616 stack_utils.py:67 DEBUG raise e
2011-06-08 09:36:14,887 8616 worker.py:148 DEBUG worker/0 cleaning up
2011-06-08 09:36:14,887 8616 worker.py:114 DEBUG worker/0 exiting

you can see one of the threads bailing out, in this case because we tried to delete an old pywebsocket log file and failed. Because this was an unexpected exception, we bail out without trying to clean up, so nothing gets shut down properly and you end up with all of these stale processes. The patch I've posted in bug 62180 will fix this particular issue; it's possible that we should do more to try to clean up on the way out, though.
Tony Chang
Comment 3
2011-06-08 13:54:38 PDT
We should certainly fix the bugs in the scripts if possible, but we should also make sure the buildbot can recover no matter what state we're in. This is what the task_kill step tries to do (clean up stray processes). This doesn't work for python processes because if we kill all python processes, we take down the buildbot process. The chromium win bots try to work around this by having a separate binary called python_slave.exe (I think it's just a copy of python.exe) and running the buildbot slave with that binary. Then it's safe to taskkill /f /im python.exe on the waterfall.
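The point here is that taskkill /im selects processes purely by image name. A simplified model of that selection shows why running the slave as python_slave.exe keeps it off the kill list. It assumes an exact, case-insensitive name comparison and ignores the wildcard and filter options taskkill also supports; the snapshot format and the helper name are hypothetical.

```python
# Simplified model of how "taskkill /f /im python.exe" picks its targets.
# Assumption: exact, case-insensitive image-name comparison (Windows file
# names are case-insensitive); taskkill's wildcard/filter options are ignored.
from typing import Iterable, List, Tuple

Process = Tuple[int, str]  # (pid, image name); hypothetical snapshot format


def pids_matching_image(processes: Iterable[Process], image: str) -> List[int]:
    """Return the PIDs whose image name equals `image`, ignoring case."""
    target = image.lower()
    return [pid for pid, name in processes if name.lower() == target]


if __name__ == "__main__":
    snapshot = [
        (100, "python_slave.exe"),      # buildbot slave, a renamed copy of python.exe
        (200, "python.exe"),            # stray new-run-webkit-tests worker
        (300, "Python.exe"),            # stray pywebsocket server
        (400, "LayoutTestHelper.exe"),  # matched only by its own /im argument
    ]
    print(pids_matching_image(snapshot, "python.exe"))  # [200, 300]
```

Because matching is by name only, the renamed slave survives while every real python.exe is selected.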
Dirk Pranke
Comment 4
2011-06-08 13:57:34 PDT
(In reply to comment #3)
> We should certainly fix the bugs in the scripts if possible, but we should also make sure the buildbot can recover no matter what state we're in. This is what the task_kill step tries to do (clean up stray processes). This doesn't work for python processes because if we kill all python processes, we take down the buildbot process. The chromium win bots try to work around this by having a separate binary called python_slave.exe (I think it's just a copy of python.exe) and running the buildbot slave with that binary. Then it's safe to taskkill /f /im python.exe on the waterfall.
I agree with everything you wrote, and that's an interesting suggestion. Originally I didn't attempt to clean up the workers on an unexpected exception because I figured that might just make a bad thing worse; however, it would be hard to do worse than what seems to be happening on Windows, so maybe I'll try changing the code to always try to clean up the workers and see if that helps.
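A minimal sketch of the "always clean up the workers" idea: try/finally makes cleanup run whether the workers finish normally or one of them raises an unexpected exception (such as the failed pywebsocket log deletion in comment 2). The Worker class and run_workers() are hypothetical stand-ins, not the actual new-run-webkit-tests code.

```python
# Sketch: clean up every worker even when one raises an unexpected exception.
# Worker and run_workers are hypothetical; they are not the real NRWT classes.
from typing import Callable, List


class Worker:
    def __init__(self, name: str, task: Callable[[], object]) -> None:
        self.name = name
        self.task = task
        self.cleaned_up = False

    def run(self) -> None:
        self.task()

    def cleanup(self) -> None:
        # A real harness would stop helper processes here (for example a
        # websocket server or LayoutTestHelper.exe); this only records the call.
        self.cleaned_up = True


def run_workers(workers: List[Worker]) -> None:
    try:
        for worker in workers:
            worker.run()
    finally:
        # Runs on normal exit and on any exception, so no worker is orphaned.
        for worker in workers:
            try:
                worker.cleanup()
            except Exception:
                pass  # a failed cleanup must not prevent cleaning up the rest


if __name__ == "__main__":
    workers = [Worker("worker/0", lambda: int("not a number")),  # raises ValueError
               Worker("worker/1", lambda: None)]
    try:
        run_workers(workers)
    except ValueError:
        pass  # the original error still propagates to the caller
    print([w.cleaned_up for w in workers])  # [True, True]
```

worker/1 never gets to run because worker/0 fails first, but both are still cleaned up, and the original exception is re-raised after the finally block.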
Ryosuke Niwa
Comment 5
2011-06-08 14:24:45 PDT
(In reply to comment #3)
> We should certainly fix the bugs in the scripts if possible, but we should also make sure the buildbot can recover no matter what state we're in. This is what the task_kill step tries to do (clean up stray processes). This doesn't work for python processes because if we kill all python processes, we take down the buildbot process. The chromium win bots try to work around this by having a separate binary called python_slave.exe (I think it's just a copy of python.exe) and running the buildbot slave with that binary. Then it's safe to taskkill /f /im python.exe on the waterfall.
That sounds like a great idea. But I wonder if we can achieve the same effect by using perl, ruby, or some other scripting language.
Dirk Pranke
Comment 6
2011-06-08 15:27:35 PDT
(In reply to comment #5)
> That sounds like a great idea. But I wonder if we can achieve the same effect by using perl, ruby, or some other scripting language.
I'm not sure I follow you. Buildbot, new-run-webkit-tests, and pywebsocket all are written in Python. Tony's point was that we can't simply kill all python processes from taskkill or one of these scripts without killing ourselves or our parents. Are you suggesting that we rewrite one of these so that it isn't in Python?
Ryosuke Niwa
Comment 7
2011-06-08 15:36:41 PDT
(In reply to comment #6)
> I'm not sure I follow you. Buildbot, new-run-webkit-tests, and pywebsocket all are written in Python. Tony's point was that we can't simply kill all python processes from taskkill or one of these scripts without killing ourselves or our parents. Are you suggesting that we rewrite one of these so that it isn't in Python?
Right. We can write a simple perl/ruby script that kills all python instances and starts a new python instance. That'll avoid having to duplicate python.exe and make it easier to deploy across ports.
Dirk Pranke
Comment 8
2011-06-08 15:57:12 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > I'm not sure I follow you. Buildbot, new-run-webkit-tests, and pywebsocket all are written in Python. Tony's point was that we can't simply kill all python processes from taskkill or one of these scripts without killing ourselves or our parents. Are you suggesting that we rewrite one of these so that it isn't in Python?
>
> Right. We can write a simple perl/ruby script that kills all python instances and starts a new python instance. That'll avoid having to duplicate python.exe and make it easier to deploy across ports.
Since buildbot is in python, it can't call a script that kills all python processes, or it itself would be killed (causing the whole build to fail). You're not suggesting we rewrite buildbot, presumably, so I'm not sure how this would work?
Ryosuke Niwa
Comment 9
2011-06-08 15:59:24 PDT
(In reply to comment #8)
> Since buildbot is in python, it can't call a script that kills all python processes, or it itself would be killed (causing the whole build to fail). You're not suggesting we rewrite buildbot, presumably, so I'm not sure how this would work?
Ah, that's a good point. We can't kill buildslave.
Ryosuke Niwa
Comment 10
2011-06-08 15:59:55 PDT
Is it possible to figure out which python process is running buildslave and whitelist it?
Dirk Pranke
Comment 11
2011-06-08 16:15:32 PDT
(In reply to comment #10)
> Is it possible to figure out which python process is running buildslave and whitelist it?
The taskkill /im and killall utilities (system-wide tools that we didn't write) don't give us a way to say "kill everything named X except me, if I'm named X", or any similar flexibility. It is presumably possible to reconstruct that logic ourselves in Python or some other language, but we haven't (yet) done so, and I have no idea how much work it would be; at least on Windows, a decent amount, I think.
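A minimal sketch of the whitelisting asked about in comment 10. It assumes a snapshot of (pid, parent pid, image name) tuples can be obtained somehow; gathering it and actually terminating the returned PIDs are OS-specific and not shown. It protects the calling process and its whole parent chain, which would include the buildslave when the cleanup step is launched from it. The function names are hypothetical, and in practice self_pid would come from os.getpid(). Windows reuses PIDs, so a recorded parent PID can point at an unrelated newer process; the sketch only guards against the resulting loops, not against misidentified parents.

```python
# Sketch of "kill everything named X except me and my ancestors".
# Assumption: a (pid, ppid, image name) snapshot comes from an OS-specific
# source; collecting it and killing the returned PIDs are not shown here.
from typing import Dict, Iterable, List, Set, Tuple

ProcInfo = Tuple[int, int, str]  # (pid, parent pid, image name); hypothetical


def protected_pids(pid: int, parents: Dict[int, int]) -> Set[int]:
    """Return pid and all its ancestors, stopping at unknown parents or cycles."""
    protected: Set[int] = set()
    current = pid
    while current not in protected:  # a repeated PID means a cycle: stop
        protected.add(current)
        if current not in parents:
            break
        current = parents[current]
    return protected


def pids_to_kill(snapshot: Iterable[ProcInfo], image: str, self_pid: int) -> List[int]:
    """PIDs named `image` (case-insensitive), excluding self_pid and its ancestors."""
    procs = list(snapshot)
    parents = {pid: ppid for pid, ppid, _ in procs}
    keep = protected_pids(self_pid, parents)
    target = image.lower()
    return [pid for pid, _, name in procs
            if name.lower() == target and pid not in keep]


if __name__ == "__main__":
    snapshot = [
        (1, 0, "services.exe"),
        (100, 1, "python.exe"),    # buildslave
        (500, 100, "python.exe"),  # this cleanup script, started by the slave
        (200, 1, "python.exe"),    # orphaned worker from an earlier run
        (300, 200, "python.exe"),  # its pywebsocket child
    ]
    print(pids_to_kill(snapshot, "python.exe", self_pid=500))  # [200, 300]
```

Protecting the whole ancestor chain, rather than one configured PID, means the slave does not need to be identified explicitly, at the cost of trusting the parent-PID data.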
Dirk Pranke
Comment 12
2011-12-21 14:11:10 PST
We kill old processes on the build.webkit.org bots now, so I think we can close this. Please reopen if anyone disagrees.