Bug 88414

Summary: [NRWT] XvfbDriver should choose the next free display
Product: WebKit Reporter: Kristóf Kosztyó <kkristof>
Component: Tools / TestsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, cdumez, dpranke, galpeter, ojan, ossy, rakuco, rgabor, webkit.review.bot, zan
Priority: P2 Keywords: NRWT
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 89956, 90441, 98346, 103806    
Bug Blocks: 77730, 89880, 98162    
Attachments:
Description Flags
WIP patch
none
WIP patch
none
proposed fix
none
draft patch
none
proposed fix none

Description Kristóf Kosztyó 2012-06-06 07:08:18 PDT
Now the XfvbDriver chose the next display id with these simple way:

running_displays = len(Executive().running_pids(x_filter)) 
display_id = self._worker_number * 2 + running_displays

But if we ran some parallel nrwt's they could choose an already existing display which can cause problems.
Comment 1 Kristóf Kosztyó 2012-06-06 07:25:24 PDT
Created attachment 146020 [details]
WIP patch

We've discussed with Ossy that maybe we could copy the method the way xvfb-run scripts choose the next unreserved display. It works locally but I want to do more tests.
Comment 2 Csaba Osztrogonác 2012-06-07 02:45:02 PDT
Comment on attachment 146020 [details]
WIP patch

View in context: https://bugs.webkit.org/attachment.cgi?id=146020&action=review

> Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver.py:62
> -        display_id = self._worker_number * 2 + running_displays
> +        display_id = next_free_id()
> +        print 'start new xvfb with %d id' % display_id
>          if pixel_tests:
>              display_id += 1

But what if next_free_id() + 1 isn't free?
Comment 3 Csaba Osztrogonác 2012-06-07 02:49:19 PDT
Comment on attachment 146020 [details]
WIP patch

View in context: https://bugs.webkit.org/attachment.cgi?id=146020&action=review

> Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver.py:54
> +        def next_free_id():
> +            for i in range(99):
> +                if not os.path.exists('/tmp/.X%d-lock' % i):
> +                    return i

Hmm ... I think checking /tmp/.X%d-lock isn't enough ...
On the machine where run 3 buildslave, I have only
these lock files: .X0-lock, .X90-lock .
But there are three xvfb-run.XXXXXX directories
with Xauthority files contain the following numbers: 84, 39, 90.
(These numbers were passed to xvfb-run when we started the bots.)
And there are two sockets in /tmp/.X11-unix dir: X0 and X90.
Comment 4 Kristóf Kosztyó 2012-06-19 00:41:08 PDT
Committed r120688: <http://trac.webkit.org/changeset/120688>
Comment 5 Kristóf Kosztyó 2012-06-19 00:42:37 PDT
(In reply to comment #4)
> Committed r120688: <http://trac.webkit.org/changeset/120688>

Temporarily we disable the xvfb driver on qt
Comment 6 Csaba Osztrogonác 2012-06-21 04:07:21 PDT
Is there any progression?
Comment 7 Kristóf Kosztyó 2012-06-21 06:32:34 PDT
(In reply to comment #6)
> Is there any progression?

Yes there is.
Now I check the first free xvfb id with the check of the already existing .X#-lock files.
After when I found the first free one I lock this number with the webkitpy's FileLock. At locally it works fine also when I run more than one NRWT in parallel.
Comment 8 Kristóf Kosztyó 2012-06-21 06:37:37 PDT
Created attachment 148786 [details]
WIP patch

Work in progress patch with the FileLock. I would like to do some tests with it, but if everything go well I will upload the final patch.
Comment 9 Peter Gal 2012-06-22 02:25:42 PDT
View in context: https://bugs.webkit.org/attachment.cgi?id=148786&action=review

> Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver.py:71
> +        environment['DUMPRENDERTREE_TEMP'] = str(self._driver_tempdir)

I did a little digging: Where will be this _driver_tempdir be set to something? In the parent class it'll be set in the _start method, but we override it here. Or am I missing something here?
Comment 10 Kristóf Kosztyó 2012-06-22 04:27:55 PDT
(In reply to comment #9)
> View in context: https://bugs.webkit.org/attachment.cgi?id=148786&action=review
> 
> > Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver.py:71
> > +        environment['DUMPRENDERTREE_TEMP'] = str(self._driver_tempdir)
> 
> I did a little digging: Where will be this _driver_tempdir be set to something? In the parent class it'll be set in the _start method, but we override it here. Or am I missing something here?

You're right I missed out that.
Comment 11 Kristóf Kosztyó 2012-06-22 08:00:34 PDT
Created attachment 149029 [details]
proposed fix
Comment 12 Dirk Pranke 2012-06-22 14:53:53 PDT
Comment on attachment 149029 [details]
proposed fix

View in context: https://bugs.webkit.org/attachment.cgi?id=149029&action=review

the patch basically looks fine with a couple nits ... I'll leave cq? in case you want to change things.

> Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver.py:53
> +                    _guard_lock_file = self._port._filesystem.join('/tmp', 'WebKitXvfb.lock.%i' % i)

nit: there's no reason to use an underscore in front of the name. I'd probably just combine lines 53 and 54 ...

> Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver.py:58
> +        self.display_id = next_free_id()

should this be self._display_id to indicate that it's a private variable?
Comment 13 Kristóf Kosztyó 2012-06-26 00:58:20 PDT
Thank you.
I commited this in r121236: <http://trac.webkit.org/changeset/121236> if everything go well on the bots I will close the bug.
Comment 14 Kristóf Kosztyó 2012-06-26 01:59:06 PDT
(In reply to comment #13)
> Thank you.
> I commited this in r121236: <http://trac.webkit.org/changeset/121236> if everything go well on the bots I will close the bug.

Rollouted in r121239 because it produced two type of error:
* start the xvfb with the same display
* or everything went fine until it starts getting warnings from the file lock and the tests are timed out or crashed
Comment 15 Csaba Osztrogonác 2012-07-12 03:29:50 PDT
Comment on attachment 149029 [details]
proposed fix

It is obsolete patch.
Comment 16 Csaba Osztrogonác 2012-08-23 03:23:15 PDT
After Balázs's patch - https://trac.webkit.org/changeset/124581 - NRWT doesn't run two DRT for non-pixel and pixel tests, so XVFBDriver's display handling should be refactored with this fix too.
Comment 17 Csaba Osztrogonác 2012-08-23 07:36:17 PDT
(In reply to comment #14)
> (In reply to comment #13)
> > Thank you.
> > I commited this in r121236: <http://trac.webkit.org/changeset/121236> if everything go well on the bots I will close the bug.
> 
> Rollouted in r121239 because it produced two type of error:
> * start the xvfb with the same display
> * or everything went fine until it starts getting warnings from the file lock and the tests are timed out or crashed

I think I got the root of the problem. I saw the following failing webkitpy
test on the Qt buildbot:

  Traceback (most recent call last):
    File "/ramdisk/qt-linux-release/build/Tools/Scripts/webkitpy/common/system/filesystem_unittest.py", line 220, in test_read_and_write_file
      text_contents = fs.read_text_file(binary_path)
    File "/ramdisk/qt-linux-release/build/Tools/Scripts/webkitpy/common/system/filesystem.py", line 213, in read_text_file
      with codecs.open(path, 'r', 'utf8') as f:
    File "/usr/lib/python2.6/codecs.py", line 881, in open
      file = __builtin__.open(filename, mode, buffering)
  IOError: [Errno 2] No such file or directory: '/tmp/tree_unittest_9TdIXB'

[283/1600] webkitpy.common.system.file_lock_integrationtest.FileLockTest.test_lock_lifecycle passed

-------------------------------------

It seems something is wrong with the FileLock class and it 
caused that NRWT started two driver on same display. Could
you check it, please?
Comment 18 Zan Dobersek 2012-08-29 00:59:23 PDT
Instead of checking for the X locks in /tmp directory, how about listing all the current Xorg and Xvfb processes, parsing their commands to get the displays they are using and then using the first free display for the new Xvfb instance? Perhaps -nolock could be passed to new Xvfb processes as well to avoid Xvfb-spawned lock files, but I think using a FileLock to denote that Xvfb (spawned by NRWT) is using a specific display  is a good idea.
Comment 19 Kristóf Kosztyó 2012-09-14 07:29:24 PDT
Created attachment 164148 [details]
draft patch

Now I check the next free display by the running processes. It seems more stable than the xvfb's lock files based approach. If this approach is good I will rewrite some unittest to fit this change.
Comment 20 Kristóf Kosztyó 2012-09-21 02:29:11 PDT
Created attachment 165091 [details]
proposed fix

It is process based choose of the next free display as the draft patch was but it contains unit tests :)
Comment 21 Kristóf Kosztyó 2012-09-28 06:57:54 PDT
Committed r129891: <http://trac.webkit.org/changeset/129891>
Comment 22 Raphael Kubo da Costa (:rakuco) 2012-10-08 01:56:32 PDT
Do you guys have any idea why this keeps failing every once in a while?

See, for example, <http://build.webkit.org/builders/EFL%20Linux%2064-bit%20Debug/builds/6866/steps/webkitpy-test/logs/stdio>:

[1580/1585] webkitpy.layout_tests.port.xvfbdriver_unittest.XvfbDriverTest.test_next_free_display failed:
  Traceback (most recent call last):
    File "/home/buildslave-1/webkit-buildslave/efl-linux-64-debug/build/Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver_unittest.py", line 86, in test_next_free_display
      self.assertEqual(driver._next_free_display(), 2)
  AssertionError: 3 != 2
Comment 23 Zan Dobersek 2012-10-08 02:04:22 PDT
(In reply to comment #22)
> Do you guys have any idea why this keeps failing every once in a while?
> 
> See, for example, <http://build.webkit.org/builders/EFL%20Linux%2064-bit%20Debug/builds/6866/steps/webkitpy-test/logs/stdio>:
> 
> [1580/1585] webkitpy.layout_tests.port.xvfbdriver_unittest.XvfbDriverTest.test_next_free_display failed:
>   Traceback (most recent call last):
>     File "/home/buildslave-1/webkit-buildslave/efl-linux-64-debug/build/Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver_unittest.py", line 86, in test_next_free_display
>       self.assertEqual(driver._next_free_display(), 2)
>   AssertionError: 3 != 2

Set the bug #98346 to be blocking this one.

As for the failure, it's strange. I'm pretty sure that if somehow there are any Xvfb processes left running after the NRWT is done, it still shouldn't affect this unit test as the mock executive is used. As much as I tried, though, I couldn't reproduce even one such failure locally.
Comment 24 Zan Dobersek 2012-10-09 12:39:05 PDT
Reopening so the work can continue.
Comment 25 Csaba Osztrogonác 2012-11-22 03:38:46 PST
As far as I know, xvfbdriver works fine, only the unit test is flakey
because of stucked xvfbs. Am I right?
Comment 26 Zan Dobersek 2012-11-22 05:40:59 PST
(In reply to comment #25)
> As far as I know, xvfbdriver works fine, only the unit test is flakey
> because of stucked xvfbs. Am I right?

As far as I can tell, that's the case.

I think the main problem with the flaky unit tests is that the running Xvfb instances somehow affect them even though they shouldn't.
Comment 27 Zan Dobersek 2012-11-30 09:31:53 PST
Yup, I can reproduce this problem by running NRWT and test-webkitpy in parallel. I'll try to come up with a patch, but probably in another bug.
Comment 28 Zan Dobersek 2013-01-14 23:14:57 PST
(In reply to comment #27)
> Yup, I can reproduce this problem by running NRWT and test-webkitpy in parallel. I'll try to come up with a patch, but probably in another bug.

Handled in bug #103806, landed in r136314.
http://trac.webkit.org/changeset/136314

Closing the bug.