100686 – WebKitTestRunner flakily hangs when running pixel tests on Qt

RESOLVED FIXED 100686

WebKitTestRunner flakily hangs when running pixel tests on Qt

https://bugs.webkit.org/show_bug.cgi?id=100686

Summary WebKitTestRunner flakily hangs when running pixel tests on Qt

Balazs Kelemen

Reported 2012-10-29 10:11:42 PDT

Well, this is really just a try to catch the issue that makes the pixel tests timing out on the bot sometimes. My theory is that if a forced repaint times out, than it can trigger later when we expect it for a new test and it can confuse our machinery. Actually I could not prove that this is happening becase I could not reproduce the problem locally, but it cannot hurt if we add a check for this.

Attachments
Patch (2.64 KB, patch) 2012-10-29 10:18 PDT, Balazs Kelemen	no flags	Details Formatted Diff Diff
Patch (8.00 KB, patch) 2012-11-06 07:23 PST, Balazs Kelemen	no flags	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Balazs Kelemen

Comment 1 2012-10-29 10:18:58 PDT

Created attachment 171276 [details] Patch

WebKit Review Bot

Comment 2 2012-10-29 10:20:36 PDT

Attachment 171276 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Tools/ChangeLog', u'Tools/WebKitTestRunner..." exit_code: 1 Tools/WebKitTestRunner/qt/TestInvocationQt.cpp:75: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] Total errors found: 1 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.

Balazs Kelemen

Comment 3 2012-10-29 10:23:03 PDT

(In reply to comment #2) > Attachment 171276 [details] did not pass style-queue: > > Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Tools/ChangeLog', u'Tools/WebKitTestRunner..." exit_code: 1 > Tools/WebKitTestRunner/qt/TestInvocationQt.cpp:75: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3] > Total errors found: 1 in 2 files > > > If any of these errors are false positives, please file a bug against check-webkit-style. check-webkit-style is complaining here: fputs("ERROR: unexpected forceRepaintDone callback triggered.\n" " It is likely the result of a timeouted repaint form a previous test.\n", stderr); So it expects the second line to be indented one space less, but I think it would be ugly.

Balazs Kelemen

Comment 4 2012-10-31 02:51:25 PDT

Comment on attachment 171276 [details] Patch According to my testing it will not be enough to eliminate the flakiness.

Balazs Kelemen

Comment 5 2012-10-31 03:13:51 PDT

Let's concentrate on the real bug! As one can see from the history of the bot (http://build.webkit.sed.hu/builders/x86-64%20Linux%20Qt%20Release%20WebKit2%20%28Pixel%20Tests%29) our pixel tests are sometimes start to fail at a given point and than every test is failing, i.e. we timeout waiting for the callback of forceRepaint. It's very bad since in this case the bot have to be stopped manually, otherwise it can run for a whole day. I don't know the exact reason, but what I can tell is that for some reason we don't get the didrenderframe from the web process although it does send it, so we will never push a flushLayerChange operation in the queue of LayerTreeRenderer and never send rendernextframe again and the web process will wait for it forever without rendering a single frame. As I am not too sure how difficult it would be to find the reason behind not getting the message (note that you need to run all tests about 10 times in average even to reproduce it), currently I am thinking about adding yet another private API for the time being for forcing a flushLayerChange. At the first time we timeout we can use it and I believe it would move us towards for the next test. Now I am going to test whether it works.

Jocelyn Turcotte

Comment 6 2012-10-31 05:59:01 PDT

It's best if we can fix this without blindly pushing fixes that add unneeded complexity until it works. If you can reproduce the problem somehow locally or on a bot without having to push changes, you could try to see if this is a result of Qt triggering a cleanup of the nodes, i.e. if ~ContentsSGNode gets called. There could be issues around there where the web and UI processes would be waiting for each other.

Balazs Kelemen

Comment 7 2012-11-05 01:16:08 PST

(In reply to comment #6) > It's best if we can fix this without blindly pushing fixes that add unneeded complexity until it works. If you can reproduce the problem somehow locally or on a bot without having to push changes, you could try to see if this is a result of Qt triggering a cleanup of the nodes, i.e. if ~ContentsSGNode gets called. > I can reproduce it, about once a day :/ Actually the only thing I can do is to add fprintf(stderr,...)'s here and there and watch the output at of the test that timed out and some previous one. > There could be issues around there where the web and UI processes would be waiting for each other. As I understand, ~ContentsSGNode is handled fine, i.e. we throw away all queued update from the LayerTreeRenderer queue, and when we attached to a new node we send a rendernextframe (unconditionally). Ownerships are also seems ok. So I don't see anything that could stop the rendering and syncing loop. First we do an update upon window->show() that makes us send the first rendernextframe, and every didrenderframe triggers a flushLayerChanges in LayerTreeRenderer that also ends in rendernextframe. Anyway, I'm going to try reproducing digging more info via printf's further.

Balazs Kelemen

Comment 8 2012-11-06 07:23:45 PST

Created attachment 172580 [details] Patch

Balazs Kelemen

Comment 9 2012-11-06 07:27:18 PST

(In reply to comment #8) > Created an attachment (id=172580) [details] > Patch This is also a workaround, but I believe a more acceptable one :)

WebKit Review Bot

Comment 10 2012-11-06 11:27:58 PST

Comment on attachment 172580 [details] Patch Attachment 172580 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/14761048 New failing tests: inspector-protocol/debugger-terminate-dedicated-worker-while-paused.html

Csaba Osztrogonác

Comment 11 2012-11-07 07:52:17 PST

Comment on attachment 172580 [details] Patch LGTM, r=me. At least the following tests will pass if this flakiness occur.

Balazs Kelemen

Comment 12 2012-11-07 09:33:36 PST

Comment on attachment 172580 [details] Patch Clearing flags on attachment: 172580 Committed r133768: <http://trac.webkit.org/changeset/133768>

Balazs Kelemen

Comment 13 2012-11-07 09:33:41 PST

All reviewed patches have been landed. Closing bug.

Alexey Proskuryakov

Comment 14 2013-04-29 17:20:18 PDT

Comment on attachment 172580 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=172580&action=review > Tools/WebKitTestRunner/TestController.cpp:697 > - TestInvocation::dumpWebProcessUnresponsiveness("Failed to reset to consistent state before the first test"); > + m_currentInvocation->dumpWebProcessUnresponsiveness(); This change was not correct - there is no current invocation at this point yet. So, WebKitTestRunner just crashes when a debugger is attached to WebProcess before the first resetStateToConsistentValues().

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Balazs Kelemen

Reported

2012-10-29 10:11 PDT

Modified

2013-04-29 17:20 PDT History

CC List

5 users Show

URL

Keywords

Depends on

100676

Blocks

Dependencies

tree graph