88404 – [WTR] process unresponsiveness is reported to the wrong test

RESOLVED FIXED88404

[WTR] process unresponsiveness is reported to the wrong test

https://bugs.webkit.org/show_bug.cgi?id=88404

Summary [WTR] process unresponsiveness is reported to the wrong test

Balazs Kelemen

Reported 2012-06-06 02:45:03 PDT

If the web process become blocked for some reason with a test the test times out. After that TestController::resetStateToConsistentValues is called which sends another message to the web process and waits for the answer. If the web process is still blocked - which is likely the case - it will also times out and the harness reports unresponsiveness for the test. It's clear that the timeout is the result of the previous test so this is not a correct behavior. I think restarting the driver is the correct solution on timeout.

Attachments
Patch (4.21 KB, patch) 2012-06-06 02:53 PDT, Balazs Kelemen	no flags	Details Formatted Diff Diff
Patch (9.35 KB, patch) 2012-06-12 09:55 PDT, Balazs Kelemen	no flags	Details Formatted Diff Diff
Patch (12.34 KB, patch) 2012-06-15 08:15 PDT, Balazs Kelemen	no flags	Details Formatted Diff Diff
Patch (9.55 KB, patch) 2012-06-18 07:38 PDT, Balazs Kelemen	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ec2-cr-linux-03 (984.72 KB, application/zip) 2012-06-18 13:58 PDT, WebKit Review Bot	no flags	Details
refactored patch (9.58 KB, patch) 2012-06-19 02:23 PDT, Balazs Kelemen	zherczeg: review+	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Balazs Kelemen

Comment 1 2012-06-06 02:47:34 PDT

This can be reproduced with a simple endless loop. The next test will trigger reporting unresponsiveness.

Balazs Kelemen

Comment 2 2012-06-06 02:53:39 PDT

Created attachment 145971 [details] Patch

Balazs Kelemen

Comment 3 2012-06-11 16:31:03 PDT

Comment on attachment 145971 [details] Patch It's actually not compatible with nrwt as it does not expect the driver to exit without commanded to if there is no crash/timeout.

Balazs Kelemen

Comment 4 2012-06-12 09:55:33 PDT

Created attachment 147104 [details] Patch

Balazs Kelemen

Comment 5 2012-06-12 09:57:18 PDT

(In reply to comment #3) > (From update of attachment 145971 [details]) > It's actually not compatible with nrwt as it does not expect the driver to exit without commanded to if there is no crash/timeout. No problem after changed to return 1 on error, and no false-positive timeout.

Balazs Kelemen

Comment 6 2012-06-15 08:11:20 PDT

(In reply to comment #5) > (In reply to comment #3) > > (From update of attachment 145971 [details] [details]) > > It's actually not compatible with nrwt as it does not expect the driver to exit without commanded to if there is no crash/timeout. > > No problem after changed to return 1 on error, and no false-positive timeout. After one more test round I realized that it was still not good.

Balazs Kelemen

Comment 7 2012-06-15 08:15:20 PDT

Created attachment 147824 [details] Patch

Balazs Kelemen

Comment 8 2012-06-18 07:18:38 PDT

Comment on attachment 147824 [details] Patch Err, I'm doing it wrong. Thinked over again, it's not consistent with the behavior we did so far, both in drt and wtr.

Balazs Kelemen

Comment 9 2012-06-18 07:38:46 PDT

Created attachment 148098 [details] Patch

WebKit Review Bot

Comment 10 2012-06-18 13:58:52 PDT

Comment on attachment 148098 [details] Patch Attachment 148098 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/12982148 New failing tests: svg/css/getComputedStyle-basic.xhtml

WebKit Review Bot

Comment 11 2012-06-18 13:58:57 PDT

Created attachment 148168 [details] Archive of layout-test-results from ec2-cr-linux-03 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: ec2-cr-linux-03 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick

Balazs Kelemen

Comment 12 2012-06-19 02:23:04 PDT

Created attachment 148292 [details] refactored patch

Zoltan Herczeg

Comment 13 2012-07-10 04:31:31 PDT

Comment on attachment 148292 [details] refactored patch The patch looks sane to me. Let's wait a few more days, and if there will be no objection, I can r+ this. View in context: https://bugs.webkit.org/attachment.cgi?id=148292&action=review > Tools/WebKitTestRunner/TestInvocation.cpp:208 > + char buffer[256]; > + if (!resetDone) { > + pid_t pid = WKPageGetProcessIdentifier(m_mainWebView->page()); > + sprintf(buffer, "#PROCESS UNRESPONSIVE - WebProcess (pid %ld)\n", static_cast<long>(pid)); > + errorMessageToStderr = buffer; > + } Can't we make this nicer?

Balazs Kelemen

Comment 14 2012-07-10 05:31:20 PDT

(In reply to comment #13) > (From update of attachment 148292 [details]) > The patch looks sane to me. Let's wait a few more days, and if there will be no objection, I can r+ this. > > View in context: https://bugs.webkit.org/attachment.cgi?id=148292&action=review > > > Tools/WebKitTestRunner/TestInvocation.cpp:208 > > + char buffer[256]; > > + if (!resetDone) { > > + pid_t pid = WKPageGetProcessIdentifier(m_mainWebView->page()); > > + sprintf(buffer, "#PROCESS UNRESPONSIVE - WebProcess (pid %ld)\n", static_cast<long>(pid)); > > + errorMessageToStderr = buffer; > > + } > > Can't we make this nicer? I'm not sure. I want to keep the dump logic separated in a function and not duplicate it. It's important to end the streams as the test harness expects, with #EOF at the end. So, avoiding the constant sized char buffer seems a bit overkill, I could not imagine something much better than this: const char* msgParts[] = {"#PROCESS UNRESPONSIVE - WebProcess (pid", ")\n"}; unsigned length = strlen(msgParts[0]) + strlen(msgPargs[1]) + 1 + sizeof(pid); OwnPtr<char> msg = adoptPtr(new char[length]); strcpy(msg.get(), msgPargs[0]); sprintf(msg.get() + strlen(msg.get()), " %ld", static_cast<long>(pid)); strcat(msg.get() + strlen(msg.get()), msgParts[1]); but it's definitely more ugly. Also stack allocating seems to be ok here, this is testing code, not an engine hotspot. So, IMHO trying to make it nicer would be only overengineering :)

Dirk Pranke

Comment 15 2012-07-11 12:20:25 PDT

I'm confused ... I believe NRWT already restarts WTR/DRT when a test times out. Are you not seeing this?

Balazs Kelemen

Comment 16 2012-07-12 00:51:20 PDT

(In reply to comment #15) > I'm confused ... I believe NRWT already restarts WTR/DRT when a test times out. Are you not seeing this? There is no problem in the general case, and anyway the bug is not in NRWT but WTR. There is at least two kinds of timeout with WTR: * WTR don't produce output before timer expires in the harness. it's the same as with DRT, and the harness restarts the driver * web process times out but the driver (the child process in NRWT's point of view, which is the UI process) is still responsive, so WTR reports the timeout In the latter case there is a chance that the web process is not only runs the test slowly but it actually became unresponsive. There is a message pass after every test between UI proc - web proc that tell the web process to reset it's values to a clean state. If we don't get answer to this message we know that the web process is unresponsive, so we output a special message to stderr ("WEB PROCESS UNRESPONSIVE ...") and NRWT knows that it needs to restart the driver (since this is the most reliable way to handle an unresponsive web process). Otherwise, if the resetting was done we don't restart the driver since there is no reason to do that. So far so good. The actual bug is that WTR sends the restart message after it dumped the output for the test that make the web process unresponsive and writes the stderr message that time, so the test harness believes that the next test timed out. I hope I have clarified it. :)

Dirk Pranke

Comment 17 2012-07-12 12:48:36 PDT

(In reply to comment #16) > (In reply to comment #15) > > I'm confused ... I believe NRWT already restarts WTR/DRT when a test times out. Are you not seeing this? > > There is no problem in the general case, and anyway the bug is not in NRWT but WTR. > > There is at least two kinds of timeout with WTR: > * WTR don't produce output before timer expires in the harness. it's the same as with DRT, and the harness restarts the driver > * web process times out but the driver (the child process in NRWT's point of view, which is the UI process) is still responsive, so WTR reports the timeout > > In the latter case there is a chance that the web process is not only runs the test slowly but it actually became unresponsive. There is a message pass after every test between UI proc - web proc that tell the web process to reset it's values to a clean state. If we don't get answer to this message we know that the web process is unresponsive, so we output a special message to stderr ("WEB PROCESS UNRESPONSIVE ...") and NRWT knows that it needs to restart the driver (since this is the most reliable way to handle an unresponsive web process). Otherwise, if the resetting was done we don't restart the driver since there is no reason to do that. So far so good. > > The actual bug is that WTR sends the restart message after it dumped the output for the test that make the web process unresponsive and writes the stderr message that time, so the test harness believes that the next test timed out. > > I hope I have clarified it. :) I see. Yes, thank you. I had thought the situation was that the first test was reported as timeout.

Zoltan Herczeg

Comment 18 2012-08-03 04:34:02 PDT

Comment on attachment 148292 [details] refactored patch View in context: https://bugs.webkit.org/attachment.cgi?id=148292&action=review > Tools/WebKitTestRunner/TestInvocation.cpp:203 > + char buffer[256]; 64 should be neough

Balazs Kelemen

Comment 19 2012-08-03 05:02:20 PDT

Landed in http://trac.webkit.org/changeset/124596. Sorry, I forget about the 64 after merge :) I will fix it but first let's see if bots are happy with the patch.

Balazs Kelemen

Comment 20 2012-08-03 05:55:55 PDT

Build fixes: http://trac.webkit.org/changeset/124598 http://trac.webkit.org/changeset/124600

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Balazs Kelemen

Reported

2012-06-06 02:45 PDT

Modified

2012-08-03 05:55 PDT History

CC List

7 users Show

URL

Keywords

Depends on

Blocks