WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
96203
[XvfbDriver] First tests in each worker occasionally crash
https://bugs.webkit.org/show_bug.cgi?id=96203
Summary
[XvfbDriver] First tests in each worker occasionally crash
Zan Dobersek
Reported
2012-09-09 05:21:06 PDT
[XvfbDriver] First tests in each worker occasionally crash
Attachments
Patch
(2.27 KB, patch)
2012-09-09 05:24 PDT
,
Zan Dobersek
no flags
Details
Formatted Diff
Diff
Patch
(4.19 KB, patch)
2012-09-09 06:32 PDT
,
Zan Dobersek
no flags
Details
Formatted Diff
Diff
Patch
(9.67 KB, patch)
2012-09-10 09:59 PDT
,
Zan Dobersek
no flags
Details
Formatted Diff
Diff
Patch
(4.15 KB, patch)
2012-10-01 01:45 PDT
,
Zan Dobersek
no flags
Details
Formatted Diff
Diff
Patch
(3.30 KB, patch)
2012-10-02 08:53 PDT
,
Zan Dobersek
no flags
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Zan Dobersek
Comment 1
2012-09-09 05:24:09 PDT
Created
attachment 162991
[details]
Patch
Zan Dobersek
Comment 2
2012-09-09 05:26:11 PDT
(In reply to
comment #0
)
> [XvfbDriver] First tests in each worker occasionally crash
An example of this:
http://build.webkit.org/results/GTK%20Linux%2064-bit%20Debug/r127965%20(36591)/results.html
2 of those tests are real crashers while the other four were just not able to find the Xvfb display. Waiting a bit before the tests are run should fix this.
Zan Dobersek
Comment 3
2012-09-09 06:32:43 PDT
Created
attachment 162993
[details]
Patch
Philippe Normand
Comment 4
2012-09-09 11:39:52 PDT
Each Xvfb instance creates a lock file in /tmp (ex: /tmp/.X0-lock) and a unix socket (ex: /tmp/.X11-unix/X0). Maybe there's a way to be notified when the unix socket is listening and waiting for a connection?
Zan Dobersek
Comment 5
2012-09-10 09:59:23 PDT
Created
attachment 163155
[details]
Patch
Zan Dobersek
Comment 6
2012-09-10 10:10:12 PDT
(In reply to
comment #5
)
> Created an attachment (id=163155) [details] > Patch
This takes a whole new approach towards solving the problem. I've discovered the Xvfb's -displayfd command line flag. This forces the Xvfb to use no locking mechanisms[1] and to search for a free display[2]. When such a display is found, it is used and reported into the -displayfd argument[3]. The good thing about the display number reporting is that it's done only when the Xvfb is ready to operate[4]. To take that into account, I've directed the display number reporting into stderr of the Xvfb which is now Executive.PIPE (= subprocess.Pipe). The stderr is then read until a display number can be located and closed afterwards. This also fixes
bug #88414
as it removes any usage of lock files and lets Xvfb figure out the next free display. [1]
http://cgit.freedesktop.org/xorg/xserver/tree/os/utils.c#n663
[2]
http://cgit.freedesktop.org/xorg/xserver/tree/os/connection.c#n415
[3]
http://cgit.freedesktop.org/xorg/xserver/tree/os/connection.c#n350
[4]
http://cgit.freedesktop.org/xorg/xserver/tree/dix/main.c#n293
Kristóf Kosztyó
Comment 7
2012-09-11 00:48:17 PDT
(In reply to
comment #6
)
> (In reply to
comment #5
) > > Created an attachment (id=163155) [details] [details] > > Patch > > This takes a whole new approach towards solving the problem. > > I've discovered the Xvfb's -displayfd command line flag. This forces the Xvfb to use no locking mechanisms[1] and to search for a free display[2]. When such a display is found, it is used and reported into the -displayfd argument[3]. > > The good thing about the display number reporting is that it's done only when the Xvfb is ready to operate[4]. To take that into account, I've directed the display number reporting into stderr of the Xvfb which is now Executive.PIPE (= subprocess.Pipe). The stderr is then read until a display number can be located and closed afterwards. > > This also fixes
bug #88414
as it removes any usage of lock files and lets Xvfb figure out the next free display.
Does it work when multiple xvfbs start at the same time? Because that was the reason why I used the file lock.
> > [1]
http://cgit.freedesktop.org/xorg/xserver/tree/os/utils.c#n663
> [2]
http://cgit.freedesktop.org/xorg/xserver/tree/os/connection.c#n415
> [3]
http://cgit.freedesktop.org/xorg/xserver/tree/os/connection.c#n350
> [4]
http://cgit.freedesktop.org/xorg/xserver/tree/dix/main.c#n293
Zan Dobersek
Comment 8
2012-09-11 01:32:59 PDT
(In reply to
comment #7
)
> > The good thing about the display number reporting is that it's done only when the Xvfb is ready to operate[4]. To take that into account, I've directed the display number reporting into stderr of the Xvfb which is now Executive.PIPE (= subprocess.Pipe). The stderr is then read until a display number can be located and closed afterwards. > > > > This also fixes
bug #88414
as it removes any usage of lock files and lets Xvfb figure out the next free display. > Does it work when multiple xvfbs start at the same time? > Because that was the reason why I used the file lock.
Yes, I've tested it with 8 parallel workers and there were no problems, 8 separate displays were opened. I'll try to up the number a bit as well to see what happens, just to be sure. Another thing Kristof just pointed out, the -displayfd is quite a fresh addition, Xvfb 1.13.0 being the first stable version to ship it. 1.13.0 was just released though, 5 days ago. (I'm running Ubuntu 12.10.) This probably delays this approach for some time unless people wish to upgrade systems on the builders or install up-to-date Xvfb.
Philippe Normand
Comment 9
2012-09-12 03:21:06 PDT
The bots use Debian Sid (mostly) and xvfb 1.13 doesn't seem to be there yet.
Zan Dobersek
Comment 10
2012-09-17 07:13:15 PDT
(In reply to
comment #4
)
> Each Xvfb instance creates a lock file in /tmp (ex: /tmp/.X0-lock) and a unix socket (ex: /tmp/.X11-unix/X0). Maybe there's a way to be notified when the unix socket is listening and waiting for a connection?
This could be possible, but I don't see a way how to properly test this approach because of its complexity. Given that the builders don't yet run systems with the newest version of Xvfb, I'd recommend returning to the driver startup delaying approach. If that's OK with everyone interested I'll upload an updated patch.
Philippe Normand
Comment 11
2012-09-17 07:20:35 PDT
What about trying the displayfd option and falling back to the startup delay approach if we detect the host's xvfb doesn't have this new option?
Zan Dobersek
Comment 12
2012-10-01 01:35:59 PDT
(In reply to
comment #11
)
> What about trying the displayfd option and falling back to the startup delay approach if we detect the host's xvfb doesn't have this new option?
The next-free-display solution has been reviewed in
bug #88414
and landed in
r129891
, and is based on listing the various X processes and their display numbers:
http://trac.webkit.org/changeset/129891
I'll reupload the patch with the starup delay approach for review.
Zan Dobersek
Comment 13
2012-10-01 01:45:48 PDT
Created
attachment 166425
[details]
Patch
Dirk Pranke
Comment 14
2012-10-01 11:26:52 PDT
Comment on
attachment 166425
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=166425&action=review
> Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver.py:73 > + time.sleep(self._port.driver_startup_delay_secs())
why don't we just sleep for 1.0 seconds here? it's not clear what the abstraction is buying you ... wouldn't any port using xvfbdriver likely need this?
Zan Dobersek
Comment 15
2012-10-01 11:31:54 PDT
(In reply to
comment #14
)
> (From update of
attachment 166425
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=166425&action=review
> > > Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver.py:73 > > + time.sleep(self._port.driver_startup_delay_secs()) > > why don't we just sleep for 1.0 seconds here? it's not clear what the abstraction is buying you ... wouldn't any port using xvfbdriver likely need this?
The abstraction spares 1 second when running webkitpy tests.
Dirk Pranke
Comment 16
2012-10-01 11:44:03 PDT
(In reply to
comment #15
)
> (In reply to
comment #14
) > > (From update of
attachment 166425
[details]
[details]) > > View in context:
https://bugs.webkit.org/attachment.cgi?id=166425&action=review
> > > > > Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver.py:73 > > > + time.sleep(self._port.driver_startup_delay_secs()) > > > > why don't we just sleep for 1.0 seconds here? it's not clear what the abstraction is buying you ... wouldn't any port using xvfbdriver likely need this? > > The abstraction spares 1 second when running webkitpy tests.
Ah. Looks like this affects ~4 tests, and adds maybe a quarter second to the runtime on my machine if you're running the tests in parallel, so I'm not sure the time savings is worth it, but let me think about this a bit further ...
Dirk Pranke
Comment 17
2012-10-01 11:51:32 PDT
(In reply to
comment #16
)
> (In reply to
comment #15
) > > (In reply to
comment #14
) > > > (From update of
attachment 166425
[details]
[details] [details]) > > > View in context:
https://bugs.webkit.org/attachment.cgi?id=166425&action=review
> > > > > > > Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver.py:73 > > > > + time.sleep(self._port.driver_startup_delay_secs()) > > > > > > why don't we just sleep for 1.0 seconds here? it's not clear what the abstraction is buying you ... wouldn't any port using xvfbdriver likely need this? > > > > The abstraction spares 1 second when running webkitpy tests. > > Ah. Looks like this affects ~4 tests, and adds maybe a quarter second to the runtime on my machine if you're running the tests in parallel, so I'm not sure the time savings is worth it, but let me think about this a bit further ...
Okay, it looks like this only affects the unit tests for XvfbDriver, and all of those are calling make_driver() in XvfbDriverTest(). Can you make the startup delay be a field on the instance set in the constructor and then override it in make_driver?
Zan Dobersek
Comment 18
2012-10-01 12:05:02 PDT
(In reply to
comment #17
)
> (In reply to
comment #16
) > > (In reply to
comment #15
) > > > (In reply to
comment #14
) > > > > (From update of
attachment 166425
[details]
[details] [details] [details]) > > > > View in context:
https://bugs.webkit.org/attachment.cgi?id=166425&action=review
> > > > > > > > > Tools/Scripts/webkitpy/layout_tests/port/xvfbdriver.py:73 > > > > > + time.sleep(self._port.driver_startup_delay_secs()) > > > > > > > > why don't we just sleep for 1.0 seconds here? it's not clear what the abstraction is buying you ... wouldn't any port using xvfbdriver likely need this? > > > > > > The abstraction spares 1 second when running webkitpy tests. > > > > Ah. Looks like this affects ~4 tests, and adds maybe a quarter second to the runtime on my machine if you're running the tests in parallel, so I'm not sure the time savings is worth it, but let me think about this a bit further ... > > Okay, it looks like this only affects the unit tests for XvfbDriver, and all of those are calling make_driver() in XvfbDriverTest(). Can you make the startup delay be a field on the instance set in the constructor and then override it in make_driver?
Right, that works as well, I'll upload a patch tomorrow.
Zan Dobersek
Comment 19
2012-10-02 08:53:25 PDT
Created
attachment 166693
[details]
Patch
Build Bot
Comment 20
2012-10-02 09:17:34 PDT
Comment on
attachment 166693
[details]
Patch
Attachment 166693
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/14130340
New failing tests: http/tests/workers/terminate-during-sync-operation.html
Zan Dobersek
Comment 21
2012-10-02 09:21:33 PDT
(In reply to
comment #20
)
> (From update of
attachment 166693
[details]
) >
Attachment 166693
[details]
did not pass mac-ews (mac): > Output:
http://queues.webkit.org/results/14130340
> > New failing tests: > http/tests/workers/terminate-during-sync-operation.html
I'm pretty sure that's a false alarm.
Zan Dobersek
Comment 22
2012-10-02 11:52:52 PDT
Comment on
attachment 166693
[details]
Patch Clearing flags on attachment: 166693 Committed
r130192
: <
http://trac.webkit.org/changeset/130192
>
Zan Dobersek
Comment 23
2012-10-02 11:53:00 PDT
All reviewed patches have been landed. Closing bug.
Ander Conselvan de Oliveira
Comment 24
2012-10-03 05:42:38 PDT
(In reply to
comment #4
)
> Each Xvfb instance creates a lock file in /tmp (ex: /tmp/.X0-lock) and a unix socket (ex: /tmp/.X11-unix/X0). Maybe there's a way to be notified when the unix socket is listening and waiting for a connection?
From Xserver(1) man page: """ SIGUSR1 [...] When the server starts, it checks to see if it has inherited SIGUSR1 as SIG_IGN instead of the usual SIG_DFL. In this case, the server sends a SIGUSR1 to its parent process after it has set up the various connection schemes. Xdm uses this feature to recognize when connecting to the server is possible. """ This works for launching a single X server. For launching multiple workers, each Xvfb would need a different parent process that sends the "readiness" status to the main launcher, but this should work with the most ancient versions of X.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug