If the web process becomes blocked for some reason during a test, the test times out. After that, TestController::resetStateToConsistentValues is called, which sends another message to the web process and waits for the answer. If the web process is still blocked - which is likely the case - this also times out, and the harness reports unresponsiveness for the following test. It's clear that the timeout is the result of the previous test, so this is not correct behavior. I think restarting the driver is the correct solution on timeout.
This can be reproduced with a simple endless loop; the next test will then be the one reported as unresponsive.
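To make the sequence concrete, here is a minimal sketch of the flow described above (the helper names and return values are made up for illustration; this is not the actual WebKitTestRunner code):

    // Illustrative helpers, not real WTR APIs.
    bool waitForTestOutputWithTimeout(const char* testURL);
    bool waitForResetConfirmationWithTimeout();

    // Illustrative only: roughly what happens around one test when the web process hangs.
    void runOneTest(const char* testURL)
    {
        // The blocked web process never dumps, so this wait times out.
        bool dumped = waitForTestOutputWithTimeout(testURL);

        // TestController::resetStateToConsistentValues() then sends another message
        // to the web process and waits for the reply.
        bool resetDone = waitForResetConfirmationWithTimeout();

        // If the web process is still blocked, this wait times out as well, and the
        // unresponsiveness ends up reported against the next test rather than the
        // one that actually hung.
        if (!dumped && !resetDone) {
            // Proposed fix: restart the driver (and with it the web process) instead
            // of carrying the blocked state over to the next test.
        }
    }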
Created attachment 145971 [details] Patch
Comment on attachment 145971 [details] Patch

It's actually not compatible with NRWT, as NRWT does not expect the driver to exit without being told to when there is no crash/timeout.
Created attachment 147104 [details] Patch
(In reply to comment #3)
> (From update of attachment 145971 [details])
> It's actually not compatible with NRWT, as NRWT does not expect the driver to exit without being told to when there is no crash/timeout.

No problem after changing it to return 1 on error, and no false-positive timeouts.
(In reply to comment #5)
> (In reply to comment #3)
> > (From update of attachment 145971 [details])
> > It's actually not compatible with NRWT, as NRWT does not expect the driver to exit without being told to when there is no crash/timeout.
>
> No problem after changing it to return 1 on error, and no false-positive timeouts.

After one more round of testing I realized that it was still not good.
Created attachment 147824 [details] Patch
Comment on attachment 147824 [details] Patch

Err, I'm doing it wrong. Having thought it over again, it's not consistent with the behavior we have had so far in both DRT and WTR.
Created attachment 148098 [details] Patch
Comment on attachment 148098 [details] Patch

Attachment 148098 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/12982148

New failing tests:
svg/css/getComputedStyle-basic.xhtml
Created attachment 148168 [details]
Archive of layout-test-results from ec2-cr-linux-03

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-03
Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>
Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Created attachment 148292 [details] refactored patch
Comment on attachment 148292 [details] refactored patch

The patch looks sane to me. Let's wait a few more days, and if there are no objections, I can r+ this.

View in context: https://bugs.webkit.org/attachment.cgi?id=148292&action=review

> Tools/WebKitTestRunner/TestInvocation.cpp:208
> +    char buffer[256];
> +    if (!resetDone) {
> +        pid_t pid = WKPageGetProcessIdentifier(m_mainWebView->page());
> +        sprintf(buffer, "#PROCESS UNRESPONSIVE - WebProcess (pid %ld)\n", static_cast<long>(pid));
> +        errorMessageToStderr = buffer;
> +    }

Can't we make this nicer?
(In reply to comment #13)
> (From update of attachment 148292 [details])
> The patch looks sane to me. Let's wait a few more days, and if there are no objections, I can r+ this.
>
> View in context: https://bugs.webkit.org/attachment.cgi?id=148292&action=review
>
> > Tools/WebKitTestRunner/TestInvocation.cpp:208
> > +    char buffer[256];
> > +    if (!resetDone) {
> > +        pid_t pid = WKPageGetProcessIdentifier(m_mainWebView->page());
> > +        sprintf(buffer, "#PROCESS UNRESPONSIVE - WebProcess (pid %ld)\n", static_cast<long>(pid));
> > +        errorMessageToStderr = buffer;
> > +    }
>
> Can't we make this nicer?

I'm not sure. I want to keep the dump logic in one function and not duplicate it, and it's important to end the streams as the test harness expects, with #EOF at the end. So avoiding the fixed-size char buffer seems a bit of an overkill; I could not come up with anything much better than this:

    const char* msgParts[] = { "#PROCESS UNRESPONSIVE - WebProcess (pid", ")\n" };
    unsigned length = strlen(msgParts[0]) + strlen(msgParts[1]) + 1 + 3 * sizeof(pid) + 1;
    OwnPtr<char> msg = adoptPtr(new char[length]);
    strcpy(msg.get(), msgParts[0]);
    sprintf(msg.get() + strlen(msg.get()), " %ld", static_cast<long>(pid));
    strcat(msg.get(), msgParts[1]);

but it's definitely uglier. Also, stack allocation seems to be OK here; this is testing code, not an engine hotspot. So IMHO trying to make it nicer would just be overengineering :)
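For comparison, a bounded snprintf variant would also keep the fixed buffer while avoiding any hand-made size estimate. This is just an illustrative sketch reusing the pid and errorMessageToStderr variables from the snippet quoted above, not what was reviewed or landed:

    // Sketch only: let snprintf enforce the buffer bound instead of sizing it by hand.
    char buffer[64];
    snprintf(buffer, sizeof(buffer), "#PROCESS UNRESPONSIVE - WebProcess (pid %ld)\n", static_cast<long>(pid));
    errorMessageToStderr = buffer;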
I'm confused ... I believe NRWT already restarts WTR/DRT when a test times out. Are you not seeing this?
(In reply to comment #15)
> I'm confused ... I believe NRWT already restarts WTR/DRT when a test times out. Are you not seeing this?

There is no problem in the general case, and anyway the bug is not in NRWT but in WTR.

There are at least two kinds of timeouts with WTR:
* WTR produces no output before the timer in the harness expires. This is the same as with DRT, and the harness restarts the driver.
* The web process times out but the driver (the child process from NRWT's point of view, which is the UI process) is still responsive, so WTR itself reports the timeout.

In the latter case there is a chance that the web process is not just running the test slowly but has actually become unresponsive. After every test a message is passed from the UI process to the web process telling it to reset its values to a clean state. If we get no answer to this message, we know that the web process is unresponsive, so we write a special message to stderr ("#PROCESS UNRESPONSIVE ...") and NRWT knows that it needs to restart the driver (since this is the most reliable way to handle an unresponsive web process). Otherwise, if the reset succeeded, we don't restart the driver, since there is no reason to do that. So far so good.

The actual bug is that WTR sends this reset message only after it has dumped the output for the test that made the web process unresponsive, and writes the stderr message at that point, so the test harness believes that the next test timed out.

I hope I have clarified it. :)
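To make the ordering concrete, here is a rough sketch of the dump sequence the patch aims for (simplified from the reviewed snippet; the function shape and the placeholder for the actual dump are illustrative, not the exact WTR code):

    #include <cstdio>
    #include <sys/types.h>

    // Simplified sketch: emit the results of the test that hung the web process,
    // and report the unresponsiveness on stderr before this test's #EOF, so the
    // harness attributes it to the right test and restarts the driver.
    static void dumpResults(bool resetDone, pid_t webProcessPid)
    {
        char buffer[64];
        const char* errorMessageToStderr = 0;
        if (!resetDone) {
            snprintf(buffer, sizeof(buffer), "#PROCESS UNRESPONSIVE - WebProcess (pid %ld)\n", static_cast<long>(webProcessPid));
            errorMessageToStderr = buffer;
        }

        // ... write the text/pixel dump for the current test to stdout ...

        if (errorMessageToStderr)
            fputs(errorMessageToStderr, stderr);

        // End both streams the way the harness expects.
        fputs("#EOF\n", stdout);
        fputs("#EOF\n", stderr);
        fflush(stdout);
        fflush(stderr);
    }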
(In reply to comment #16)
> There is no problem in the general case, and anyway the bug is not in NRWT but in WTR.
> [...]
> The actual bug is that WTR sends this reset message only after it has dumped the output for the test that made the web process unresponsive, and writes the stderr message at that point, so the test harness believes that the next test timed out.
>
> I hope I have clarified it. :)

I see. Yes, thank you. I had thought the situation was that the first test itself was the one reported as timing out.
Comment on attachment 148292 [details] refactored patch

View in context: https://bugs.webkit.org/attachment.cgi?id=148292&action=review

> Tools/WebKitTestRunner/TestInvocation.cpp:203
> +    char buffer[256];

64 should be enough.
Landed in http://trac.webkit.org/changeset/124596. Sorry, I forgot about the 64 after the merge :) I will fix it, but first let's see whether the bots are happy with the patch.
Build fixes: http://trac.webkit.org/changeset/124598 http://trac.webkit.org/changeset/124600