WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
92299
[Chromium-Android] NRWT better handling of DRT deadlocks and crashes
https://bugs.webkit.org/show_bug.cgi?id=92299
Summary
[Chromium-Android] NRWT better handling of DRT deadlocks and crashes
Xianzhu Wang
Reported
2012-07-25 14:14:17 PDT
The current chromium-android NRWT driver has some problems handling deadlocks and crashes: 1. deadlock detector thread kills the driver: The _deadlock_detector() method running in the deadlock detector thread holds a reference of the driver. When the driver exists before deadlock detector thread exits, the driver won't be destroyed because of the reference held by the deadlock detector thread. When the thread exits, stop() will be called from __del__ of the driver which runs 'adb force-stop org.chromium.native_tests'. This causes the current running drt app to be unexpectedly killed. 2. The retry-after-killed logic is not good: the related code is scattered, and the retry counter is global instead of counting the current retries.
Attachments
Patch
(12.24 KB, patch)
2012-07-25 14:32 PDT
,
Xianzhu Wang
no flags
Details
Formatted Diff
Diff
Patch
(11.56 KB, patch)
2012-07-26 15:50 PDT
,
Xianzhu Wang
no flags
Details
Formatted Diff
Diff
Patch
(11.46 KB, patch)
2012-07-27 09:04 PDT
,
Xianzhu Wang
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Xianzhu Wang
Comment 1
2012-07-25 14:32:07 PDT
Created
attachment 154448
[details]
Patch
Xianzhu Wang
Comment 2
2012-07-26 13:23:33 PDT
Dirk, could you review the patch? Thanks!
Dirk Pranke
Comment 3
2012-07-26 13:34:45 PDT
Comment on
attachment 154448
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=154448&action=review
> Tools/Scripts/webkitpy/layout_tests/port/chromium_android.py:402 > + _log.debug('Run adb result: ' + result[:80])
nit: any particular reason to limit this to [:80], instead of the whole output?
> Tools/Scripts/webkitpy/layout_tests/port/chromium_android.py:516 > time.sleep(1)
typically in a slow-spin look like this I'll key this off of time instead of a count, e.g. deadline = time.time() + DRT_START_STO_SECS: while time.time() < deadline: if ...: break time.sleep(0.1) the advantage here is that your sleep granularity isn't quite so coarse; also, I think the logic can be a bit clearer. You have these slow-spins in several places in the code; it would probably be a good idea to wrap them into a loop_with_timeout() function or something to reuse the code to ensure consistency.
> Tools/Scripts/webkitpy/layout_tests/port/chromium_android.py:548 > + time.sleep(1)
same comment re: using deadlines instead of a range and a fixed sleep interval ...
> Tools/Scripts/webkitpy/layout_tests/port/chromium_android.py:592 > + raise AssertionError('DumpRenderTree is killed by Android for too many times!')
why do you need to retry the test? is there anything else running on the android device that might be causing OOMs and/or causing the DRT to get killed? It seems like you should just treat this as a crash and return and let the rest of the infrastructure in NRWT deal with handling potential flakiness?
> Tools/Scripts/webkitpy/layout_tests/port/server_process.py:234 > + _log.debug('unexpected EOF of stdout')
why did you change these?
Xianzhu Wang
Comment 4
2012-07-26 14:14:08 PDT
Comment on
attachment 154448
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=154448&action=review
>> Tools/Scripts/webkitpy/layout_tests/port/chromium_android.py:402 >> + _log.debug('Run adb result: ' + result[:80]) > > nit: any particular reason to limit this to [:80], instead of the whole output?
Some outputs of commands like 'adb logcat -d', 'cat /data/tombstones/tombstones01' are very long, and normally these outputs will be logged after the run_adb_command() call. So I limit the length of output here. Will add comment here.
>> Tools/Scripts/webkitpy/layout_tests/port/chromium_android.py:516 >> time.sleep(1) > > typically in a slow-spin look like this I'll key this off of time instead of a count, e.g. > deadline = time.time() + DRT_START_STO_SECS: > while time.time() < deadline: > if ...: > break > time.sleep(0.1) > > the advantage here is that your sleep granularity isn't quite so coarse; also, I think the logic can be a bit clearer. > > You have these slow-spins in several places in the code; it would probably be a good idea to wrap them into a loop_with_timeout() function or something to reuse the code to ensure consistency.
Good suggestion. Will make the change.
>> Tools/Scripts/webkitpy/layout_tests/port/chromium_android.py:592 >> + raise AssertionError('DumpRenderTree is killed by Android for too many times!') > > why do you need to retry the test? is there anything else running on the android device that might be causing OOMs and/or causing the DRT to get killed? It seems like you should just treat this as a crash and return and let the rest of the infrastructure in NRWT deal with handling potential flakiness?
The Android system kills processes frequently when needed so it doesn't always mean a failure of the test. Hao added the loop when he observed some unexpected failures. However, now we run DRT as an apk so the situation might be different. Perhaps we can try to remove the loop and see how it works.
>> Tools/Scripts/webkitpy/layout_tests/port/server_process.py:234 >> + _log.debug('unexpected EOF of stdout') > > why did you change these?
With _log.warning() the message will be printed on each crashing test on Android, which looks not good. I think a crashing test is a normal case for the driver script, so changed it to debug.
Xianzhu Wang
Comment 5
2012-07-26 14:15:05 PDT
Hao, could you provide more information about retry on killed by the system? How often did it occur?
Dirk Pranke
Comment 6
2012-07-26 14:17:41 PDT
(In reply to
comment #4
)
> >> Tools/Scripts/webkitpy/layout_tests/port/server_process.py:234 > >> + _log.debug('unexpected EOF of stdout') > > > > why did you change these? > > With _log.warning() the message will be printed on each crashing test on Android, which looks not good. I think a crashing test is a normal case for the driver script, so changed it to debug.
I was thinking this might have to do with crashes; if this happens after a crash, I frankly don't care at all. Can you leave this as _log.warning() but only log it if the subprocess is still alive?
Xianzhu Wang
Comment 7
2012-07-26 14:37:11 PDT
(In reply to
comment #6
)
> I was thinking this might have to do with crashes; if this happens after a crash, I frankly don't care at all. Can you leave this as _log.warning() but only log it if the subprocess is still alive?
It is almost always the case on Android. The pipe almost always terminates before the 'adb shell' process when the test crashes. On Android with the _log.warning() (even with 'and not self.poll()') condition, every crash will output the message. However, this is not a big issue, and I'd be glad to keep the warning if it's useful for other ports.
Dirk Pranke
Comment 8
2012-07-26 14:43:37 PDT
(In reply to
comment #7
)
> (In reply to
comment #6
) > > I was thinking this might have to do with crashes; if this happens after a crash, I frankly don't care at all. Can you leave this as _log.warning() but only log it if the subprocess is still alive? > > It is almost always the case on Android. The pipe almost always terminates before the 'adb shell' process when the test crashes. On Android with the _log.warning() (even with 'and not self.poll()') condition, every crash will output the message. > > However, this is not a big issue, and I'd be glad to keep the warning if it's useful for other ports.
Sorry, maybe I wasn't clear. If we hit this condition, and the test didn't crash (on any port), this indicates a bug. If the test did crash, then I don't care about this condition at all (because I wouldn't expect there to be a #EOF). So can you leave this as a warning, but only print it if the test didn't crash?
Xianzhu Wang
Comment 9
2012-07-26 14:55:52 PDT
(In reply to
comment #8
)
> Sorry, maybe I wasn't clear. > If we hit this condition, and the test didn't crash (on any port), this indicates a bug.
This is not true on Android, because the condition itself is our method to detect crash. The 'adb shell' subprocess will be almost always live when a test crashes. When we detect EOF, we know the crash and will kill the 'adb shell' subprocess.
> If the test did crash, then I don't care about this condition at all (because I wouldn't expect there to be a #EOF). So can you leave this as a warning, but only print it if the test didn't crash?
Dirk Pranke
Comment 10
2012-07-26 15:07:28 PDT
(In reply to
comment #9
)
> (In reply to
comment #8
) > > Sorry, maybe I wasn't clear. > > If we hit this condition, and the test didn't crash (on any port), this indicates a bug. > > This is not true on Android, because the condition itself is our method to detect crash. The 'adb shell' subprocess will be almost always live when a test crashes. When we detect EOF, we know the crash and will kill the 'adb shell' subprocess.
Oh, I see. Well, the warning indicates a bug on the other platforms, so either we need to provide a way to override this for android, or leave it alone :).
Xianzhu Wang
Comment 11
2012-07-26 15:50:34 PDT
Created
attachment 154775
[details]
Patch
Dirk Pranke
Comment 12
2012-07-26 15:58:51 PDT
Comment on
attachment 154775
[details]
Patch much cleaner. thanks!
WebKit Review Bot
Comment 13
2012-07-26 20:33:36 PDT
Comment on
attachment 154775
[details]
Patch Rejecting
attachment 154775
[details]
from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2 Last 500 characters of output: Kit/chromium/third_party/yasm/source/patched-yasm --revision 134927 --non-interactive --force --accept theirs-conflict --ignore-externals' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' 48>At revision 134927. ________ running '/usr/bin/python tools/clang/scripts/update.py --mac-only' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' ________ running '/usr/bin/python gyp_webkit' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' Updating webkit projects from gyp files... Full output:
http://queues.webkit.org/results/13378119
Xianzhu Wang
Comment 14
2012-07-26 20:58:09 PDT
Comment on
attachment 154775
[details]
Patch OOPS! Won't CQ update the OOPS line? Try again.
WebKit Review Bot
Comment 15
2012-07-26 22:10:49 PDT
Comment on
attachment 154775
[details]
Patch Rejecting
attachment 154775
[details]
from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2 Last 500 characters of output: Kit/chromium/third_party/yasm/source/patched-yasm --revision 134927 --non-interactive --force --accept theirs-conflict --ignore-externals' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' 48>At revision 134927. ________ running '/usr/bin/python tools/clang/scripts/update.py --mac-only' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' ________ running '/usr/bin/python gyp_webkit' in '/mnt/git/webkit-commit-queue/Source/WebKit/chromium' Updating webkit projects from gyp files... Full output:
http://queues.webkit.org/results/13373177
Adam Barth
Comment 16
2012-07-26 22:36:47 PDT
Comment on
attachment 154775
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=154775&action=review
> Tools/ChangeLog:14 > + Additional information of the change such as approach, rationale. Please add per-function descriptions below (OOPS!).
You need to remove this line yourself.
Xianzhu Wang
Comment 17
2012-07-27 09:04:51 PDT
Created
attachment 154967
[details]
Patch
Xianzhu Wang
Comment 18
2012-07-27 09:05:40 PDT
(In reply to
comment #16
)
> (From update of
attachment 154775
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=154775&action=review
> > > Tools/ChangeLog:14 > > + Additional information of the change such as approach, rationale. Please add per-function descriptions below (OOPS!). > > You need to remove this line yourself.
Oh, I missed that OOPS. I should have removed it when editing ChangeLog.
WebKit Review Bot
Comment 19
2012-07-27 11:42:53 PDT
Comment on
attachment 154967
[details]
Patch Clearing flags on attachment: 154967 Committed
r123900
: <
http://trac.webkit.org/changeset/123900
>
WebKit Review Bot
Comment 20
2012-07-27 11:42:58 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug