Bug 38298

Summary: new-run-webkit-tests can deadlock with Chromium's TestShell
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: Tools / TestsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, commit-queue, dpranke, eric, jorlow, ojan, tony, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Bug Depends on:    
Bug Blocks: 38300    
Attachments:
Description Flags
Sample from new-run-webkit-tests when hung waiting on two TestShells
none
sample from first TestShell
none
sample from second TestShell
none
Patch none

Description Eric Seidel (no email) 2010-04-28 17:46:10 PDT
new-run-webkit-tests can deadlock with Chromium's TestShell

TestShell waits in fgets for a new URL from NRWT.

NRWT waits in .readline() for a line which looks like #EOF.

If TestShell never sends an #EOF, or hangs itself before sending the #EOF, then the two will deadlock.

See attached samples.
Comment 1 Eric Seidel (no email) 2010-04-28 17:47:00 PDT
Created attachment 54645 [details]
Sample from new-run-webkit-tests when hung waiting on two TestShells
Comment 2 Eric Seidel (no email) 2010-04-28 17:48:25 PDT
Created attachment 54646 [details]
sample from first TestShell
Comment 3 Eric Seidel (no email) 2010-04-28 17:48:43 PDT
Created attachment 54647 [details]
sample from second TestShell
Comment 4 Eric Seidel (no email) 2010-04-28 17:49:27 PDT
I'm not sure what exact conditions caused this deadlock, but it's clear from reading the code it's possible.  We may have had this for a long time, or it's possible (but unlikely) that this recently regressed.
Comment 5 Eric Seidel (no email) 2010-04-28 22:13:05 PDT
The hang in TestShell happens at this fgets:
http://src.chromium.org/cgi-bin/gitweb.cgi?p=chromium.git;a=blob;f=webkit/tools/test_shell/test_shell_main.cc;h=37c26bcff0bc9475f051cc0025b688109ed816ca;hb=HEAD#l316

If for any reason the first char of: filenameBuffer is 0 we can hang:

 324           if (!*filenameBuffer)
 325             continue;

This started happening more with my unicode changes, because we started sending unicode bytestreams to test_shell on some systems by mistake.  If the first byte left in the buffer is ever 00, then it hangs forever.  That could happen if IO got interupted due to process switching, etc.
Comment 6 Eric Seidel (no email) 2010-04-28 22:13:48 PDT
I have a fix for the added frequency.

Note, this is STILL a bug in test_shell/NRWT that they can deadlock like this.  It's been this way forever.  But it should occur less often now.
Comment 7 Eric Seidel (no email) 2010-04-28 22:16:08 PDT
Created attachment 54674 [details]
Patch
Comment 8 WebKit Review Bot 2010-04-28 22:21:52 PDT
Attachment 54674 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebKitTools/Scripts/webkitpy/layout_tests/port/chromium_unittest.py:77:  expected 1 blank line, found 0  [pep8/E301] [5]
Total errors found: 1 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Eric Seidel (no email) 2010-04-28 22:36:40 PDT
pep8.py doesn't like my inline function definition, but I wasn't sure what it really wanted either.
Comment 10 Adam Barth 2010-04-28 22:37:17 PDT
Comment on attachment 54674 [details]
Patch

Bow down before our stylebot overlords.
Comment 11 Eric Seidel (no email) 2010-04-28 22:38:42 PDT
Stylebot is wrong.

PEP8 says nothing about inline methods:
http://www.python.org/dev/peps/pep-0008/

See the "blank lines" section.  pep8 is treating this like a "method in a class" which is wrong, this is a method in a method in a class, which is not covered in pep8.
Comment 12 Adam Barth 2010-04-28 22:39:30 PDT
Comment on attachment 54674 [details]
Patch

I'm suspicious, but ok.
Comment 13 Eric Seidel (no email) 2010-04-28 22:39:50 PDT
Comment on attachment 54674 [details]
Patch

Thank you for the review.  If you have a suggested style fix, I'm happy to make one.  But I think adding blank lines around the inline function is silly in this case, and certainly not covered by pep8.
Comment 14 WebKit Commit Bot 2010-04-29 04:29:18 PDT
Comment on attachment 54674 [details]
Patch

Clearing flags on attachment: 54674

Committed r58503: <http://trac.webkit.org/changeset/58503>
Comment 15 WebKit Commit Bot 2010-04-29 04:29:25 PDT
All reviewed patches have been landed.  Closing bug.
Comment 16 WebKit Review Bot 2010-04-29 05:01:30 PDT
http://trac.webkit.org/changeset/58503 might have broken Qt Linux Release