Bug 137725 - layout test on EFL buildbot is too often broken
Summary: layout test on EFL buildbot is too often broken
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-14 17:08 PDT by Gyuyoung Kim
Modified: 2015-09-24 07:16 PDT
Description Gyuyoung Kim 2014-10-14 17:08:32 PDT
The layout tests keep breaking even though I restart the buildbot very often. It looks like the Apache server too often ends up locked after the layout tests have run many times.
We need to fix this problem!


16:53:14.728 14762 Using port 'efl'
16:53:14.728 14762 Test configuration: <, x86, release>
16:53:14.728 14762 Placing test results in /home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/layout-test-results
16:53:14.728 14762 Baseline search path: efl -> wk2 -> generic
16:53:14.728 14762 Using Release build
16:53:14.728 14762 Pixel tests disabled
16:53:14.728 14762 Regular timeout: 35000, slow test timeout: 175000
16:53:14.765 14762 "perl Tools/Scripts/webkit-build-directory --configuration --release --efl" took 0.04s
16:53:14.765 14762 Command line: /home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/jhbuild/jhbuild-wrapper --efl run /home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/WebKitBuild/Release/bin/WebKitTestRunner -
16:53:14.765 14762 
16:53:14.765 14762 Collecting tests ...
16:53:16.245 14762 Parsing expectations ...
16:53:23.550 14762 Found 38184 tests; running 30186, skipping 7998.
16:53:23.550 14762 Checking build ...
16:53:23.600 14762 "Tools/Scripts/build-dumprendertree --release --efl" took 0.04s
16:53:23.600 14762 Output of ['Tools/Scripts/build-dumprendertree', '--release', '--efl']:
16:53:23.646 14762 "Tools/Scripts/build-webkittestrunner --release --efl" took 0.05s
16:53:23.646 14762 Output of ['Tools/Scripts/build-webkittestrunner', '--release', '--efl']:
16:53:23.647 14762 Starting helper ...
16:53:23.647 14762 Checking system dependencies ...
16:53:23.687 14762 "/usr/sbin/apache2 -v" took 0.02s
16:53:23.767 14762 "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/jhbuild/jhbuild-wrapper --efl run which Xvfb" took 0.08s
16:53:23.832 14762 Expect: 29170 passes   (29170 now,    0 wontfix)
16:53:23.833 14762 Expect:   659 failures (  658 now,    1 wontfix)
16:53:23.833 14762 Expect:   357 flaky    (  357 now,    0 wontfix)
16:53:23.833 14762 
16:53:23.920 14762 Sharding tests ...
16:53:23.936 14762 Acquiring http lock ...
16:53:23.937 14762 Creating lock file: /tmp/WebKitHttpd.lock.34
16:53:23.939 14762 Retrieving current lock pid from /tmp/WebKitHttpd.lock.33
16:53:23.939 14762 Checking current lock on pid 12032
16:53:23.939 14762 Removing stuck lock file: /tmp/WebKitHttpd.lock.33
16:53:24.943 14762 Retrieving current lock pid from /tmp/WebKitHttpd.lock.34
16:53:24.944 14762 Checking current lock on pid 14762
16:53:24.944 14762 HTTP lock acquired
16:53:24.944 14762 Starting HTTP server ...
16:53:24.986 14762 "/usr/sbin/apache2 -v" took 0.04s
16:53:25.019 14762 "/usr/sbin/apache2 -v" took 0.03s
16:53:25.020 14762 Starting httpd server, cmd="/usr/sbin/apache2 -f "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/layout-test-results/httpd.conf" -C 'DocumentRoot "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/LayoutTests/http/tests"' -c 'Alias /js-test-resources "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/LayoutTests/resources"' -c 'Alias /media-resources "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/LayoutTests/media"' -c 'TypesConfig "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/LayoutTests/http/conf/mime.types"' -c 'CustomLog "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/layout-test-results/access_log.txt" common' -c 'ErrorLog "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/layout-test-results/error_log.txt"' -C 'User "buildbot"' -c 'PidFile /tmp/WebKit/httpd.pid' -k start -C 'Listen 127.0.0.1:8000' -C 'Listen [::1]:8000' -C 'Listen 127.0.0.1:8080' -C 'Listen [::1]:8080' -C 'Listen 127.0.0.1:8443' -C 'Listen [::1]:8443' -c 'StartServers 2' -c 'MinSpareServers 2' -c 'MaxSpareServers 2' -c 'SSLCertificateFile /home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/LayoutTests/http/conf/webkit-httpd.pem'"
16:53:25.041 14762 Waiting for action: <function <lambda> at 0x7f9fb4369848>
16:53:26.043 14762 Server isn't running at all
16:53:26.043 14762 Flushing stdout
16:53:26.043 14762 Flushing stderr
16:53:26.043 14762 Stopping helper
16:53:26.043 14762 Cleaning up port

ServerError raised: Server exited
Traceback (most recent call last):
  File "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py", line 80, in main
    run_details = run(port, options, args, stderr)
  File "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py", line 419, in run
    run_details = manager.run(args)
  File "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager.py", line 200, in run
    int(self._options.child_processes), retrying=False)
  File "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager.py", line 257, in _run_tests
    return self._runner.run_tests(self._expectations, test_inputs, tests_to_skip, num_workers, needs_http, needs_websockets, retrying)
  File "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py", line 120, in run_tests
    self.start_servers_with_lock(2 * min(num_workers, len(locked_shards)))
  File "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py", line 205, in start_servers_with_lock
    self._port.start_http_server(number_of_servers=number_of_servers)
  File "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/Scripts/webkitpy/port/base.py", line 888, in start_http_server
    server.start()
  File "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/Scripts/webkitpy/layout_tests/servers/http_server_base.py", line 92, in start
    if self._wait_for_action(self._is_server_running_on_all_ports):
  File "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/Scripts/webkitpy/layout_tests/servers/http_server_base.py", line 174, in _wait_for_action
    if action():
  File "/home/buildslave/efl-buildslave-2/efl-linux-64-release-wk2/build/Tools/Scripts/webkitpy/layout_tests/servers/http_server_base.py", line 185, in _is_server_running_on_all_ports
    raise ServerError("Server exited")
ServerError: Server exited
program finished with exit code 254
elapsedTime=11.567065
Comment 1 Csaba Osztrogonác 2014-11-11 00:16:30 PST
It fails again. I checked the apache log on the bot:
https://build.webkit.org/results/EFL%20Linux%2064-bit%20Release%20WK2/r175842%20%2817555%29/error_log.txt

[Mon Nov 10 17:27:53.699810 2014] [core:crit] [pid 22702] (28)No space left on device: AH00001: unable to create or access scoreboard "/tmp/WebKit/httpd.scoreboard" (name-based shared memory failure)

It seems the free space is leaking on the bot.
Comment 2 Gyuyoung Kim 2014-11-11 17:10:37 PST
(In reply to comment #1)
> It fails again. I checked the apache log on the bot:
> https://build.webkit.org/results/EFL%20Linux%2064-bit%20Release%20WK2/
> r175842%20%2817555%29/error_log.txt
> 
> [Mon Nov 10 17:27:53.699810 2014] [core:crit] [pid 22702] (28)No space left
> on device: AH00001: unable to create or access scoreboard
> "/tmp/WebKit/httpd.scoreboard" (name-based shared memory failure)
> 
> It seems the free space is leaking on the bot.

I found an article that deals with this problem. It looks like this problem can occur when shared memory isn't freed.

http://www.kattare.com/docs/faq_view/702/apache-file-exists-unable-to-create-scoreboard-name-based-shared-memory-failure.html

However, I don't know how to fix it other than cleaning up manually every time it happens.
Comment 3 Csaba Osztrogonác 2014-11-11 23:00:47 PST
I remember that we had a similar, though not identical, problem long ago
in the QtWebKit era with leaking semaphores and shared memory.
It wasn't related to Apache, but to a buggy IPC implementation.

Until the proper fix landed, we had a magic script to clean up
the leftovers regularly with a cron job:
ipcs -m| awk '{ print $2 }'|xargs ipcrm shm >/dev/null
ipcs -s| awk '{ print $2 }'|xargs ipcrm sem >/dev/null

I'm not sure it is entirely safe, but it worked, and I can't
remember it ever removing anything necessary.
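
A more conservative variant of the cleanup above can be sketched in Python. This is only a sketch under stated assumptions: it assumes the util-linux `ipcs -m` column layout (key, shmid, owner, perms, bytes, nattch, status), and the sample output and the helper names `parse_ipcs_shm` and `removal_commands` are made up for illustration. Instead of removing every segment, it only selects segments that belong to one user and have no attached processes, and it builds `ipcrm -m <shmid>` commands without running them.

```python
# Hypothetical sample of `ipcs -m` output (util-linux layout assumed).
SAMPLE = """
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 65536      buildbot   600        393216     2          dest
0x0052e2c1 98305      buildbot   600        1048576    0
0x00000000 131074     root       644        4096       0
"""


def parse_ipcs_shm(ipcs_output):
    """Parse text printed by `ipcs -m` into a list of segment dicts."""
    segments = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows have at least six columns and a numeric shmid in the
        # second one; the banner, header, and blank lines are skipped.
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        segments.append({
            "shmid": fields[1],
            "owner": fields[2],
            "nattch": int(fields[5]),
        })
    return segments


def removal_commands(ipcs_output, owner):
    """Build ipcrm commands for unattached segments owned by `owner`."""
    return [
        ["ipcrm", "-m", seg["shmid"]]
        for seg in parse_ipcs_shm(ipcs_output)
        if seg["owner"] == owner and seg["nattch"] == 0
    ]


for command in removal_commands(SAMPLE, "buildbot"):
    print(" ".join(command))
```

On a real bot the text would come from running `ipcs -m` rather than from `SAMPLE`, and each returned command could be passed to `subprocess.call`. Filtering on owner and `nattch` leaves segments that are still in use, or that belong to other users, untouched.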
Comment 4 Gyuyoung Kim 2014-11-13 22:16:09 PST
(In reply to comment #3)
> I remember that we had a similar, though not identical, problem long ago
> in the QtWebKit era with leaking semaphores and shared memory.
> It wasn't related to Apache, but to a buggy IPC implementation.
> 
> Until the proper fix landed, we had a magic script to clean up
> the leftovers regularly with a cron job:
> ipcs -m| awk '{ print $2 }'|xargs ipcrm shm >/dev/null
> ipcs -s| awk '{ print $2 }'|xargs ipcrm sem >/dev/null
> 
> I'm not sure it is entirely safe, but it worked, and I can't
> remember it ever removing anything necessary.

Ossy, I increased the shared memory limit on the EFL buildbot yesterday. In /etc/sysctl.conf, I set "kernel.shmmax" to "4294967296" (4 GB).

The layout tests on the EFL buildbot have looked fine since then. Let's see whether this change resolves the issue.
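
To check whether the limits themselves are the bottleneck, the current values can be read back from procfs. The sketch below assumes a Linux host where `/proc/sys/kernel/shmmax`, `shmall`, and `shmmni` exist, and it assumes Apache's scoreboard on this bot uses System V shared memory, which the thread does not confirm. The helper names are hypothetical. As far as I understand shmget(2), ENOSPC ("No space left on device") is returned when all segment IDs are taken (`shmmni`) or the system-wide total would be exceeded (`shmall`), while `shmmax` only caps the size of a single segment, so the segment count is worth comparing against `shmmni` as well.

```python
import os


def read_shm_limits(sysctl_dir="/proc/sys/kernel"):
    """Read the System V shared memory limits as integers."""
    limits = {}
    for name in ("shmmax", "shmall", "shmmni"):
        with open(os.path.join(sysctl_dir, name)) as f:
            limits[name] = int(f.read().strip())
    return limits


def count_segments(table_text):
    """Count rows in text shaped like /proc/sysvipc/shm (one header line)."""
    rows = [line for line in table_text.splitlines() if line.strip()]
    return max(len(rows) - 1, 0)


def ids_exhausted(limits, segment_count):
    """True when no further segment IDs can be allocated."""
    return segment_count >= limits["shmmni"]
```

A possible use on the bot would be `ids_exhausted(read_shm_limits(), count_segments(open("/proc/sysvipc/shm").read()))`. If that returns True, leaked segments rather than the `shmmax` value would be the more likely cause of the failure.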
Comment 5 Gyuyoung Kim 2014-12-09 07:07:54 PST
(In reply to comment #3)
> I remember that we had a similar, though not identical, problem long ago
> in the QtWebKit era with leaking semaphores and shared memory.
> It wasn't related to Apache, but to a buggy IPC implementation.
> 
> Until the proper fix landed, we had a magic script to clean up
> the leftovers regularly with a cron job:
> ipcs -m| awk '{ print $2 }'|xargs ipcrm shm >/dev/null
> ipcs -s| awk '{ print $2 }'|xargs ipcrm sem >/dev/null

Ossy, I have set up a crontab using the above commands for now. Thanks.
Comment 6 Gyuyoung Kim 2014-12-11 17:03:46 PST
(In reply to comment #5)
> (In reply to comment #3)
> > I remember that we had a similar, though not identical, problem long ago
> > in the QtWebKit era with leaking semaphores and shared memory.
> > It wasn't related to Apache, but to a buggy IPC implementation.
> > 
> > Until the proper fix landed, we had a magic script to clean up
> > the leftovers regularly with a cron job:
> > ipcs -m| awk '{ print $2 }'|xargs ipcrm shm >/dev/null
> > ipcs -s| awk '{ print $2 }'|xargs ipcrm sem >/dev/null
> 
> Ossy, I have set up a crontab using the above commands for now. Thanks.

This workaround seems to work well! Thanks.