Bug 182420 - EWS bots hitting network issues
Summary: EWS bots hitting network issues
Status: RESOLVED CONFIGURATION CHANGED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: Other
Hardware: Unspecified Unspecified
Importance: P1 Normal
Assignee: Aakash Jain
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-02-01 22:38 PST by Aakash Jain
Modified: 2020-03-21 04:17 PDT
CC List: 5 users

See Also:


Attachments

Description Aakash Jain 2018-02-01 22:38:02 PST
Patch 332939 (on https://bugs.webkit.org/show_bug.cgi?id=182036) got stuck on the commit-queue (bot webkit-cq-02): https://webkit-queues.webkit.org/patch/332939/commit-queue

webkit-cq-02 had the following error in its logs:

2018-02-01 21:58:28,594 - Fetching: https://bugs.webkit.org/attachment.cgi?id=332939&action=edit
2018-02-01 21:58:29,005 - Fetching: https://bugs.webkit.org/show_bug.cgi?id=182036&ctype=xml&excludefield=attachmentdata
2018-02-01 21:58:29,303 - Running: webkit-patch --status-host=webkit-queues.webkit.org --bot-id=webkit-cq-02 update --port=mac
2018-02-01 21:58:33,932 - Updated working directory
Traceback (most recent call last):
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/queueengine.py", line 103, in run
    if not self._delegate.process_work_item(work_item):
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 340, in process_work_item
    if task.run():
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/commitqueuetask.py", line 77, in run
    if not self._update():
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py", line 121, in _update
    "Unable to update working directory")
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py", line 101, in _run_command
    self._delegate.command_passed(success_message, patch=self._patch)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 383, in command_passed
    self._update_status(message, patch=patch)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 210, in _update_status
    return self._tool.status_server.update_status(self.name, message, patch, results_file)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 160, in update_status
    return NetworkTransaction().run(lambda: self._post_status_to_server(queue_name, status, patch, results_file))
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/networktransaction.py", line 53, in run
    return request()
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 160, in <lambda>
    return NetworkTransaction().run(lambda: self._post_status_to_server(queue_name, status, patch, results_file))
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 85, in _post_status_to_server
    self._browser.open(update_status_url)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_mechanize.py", line 230, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_opener.py", line 193, in open
    response = urlopen(self, req, data)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 344, in _open
    '_open', req)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 332, in _call_chain
    result = func(*args)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 1142, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 1118, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 60] Operation timed out>

2018-02-01 21:59:48,953 - Exception while preparing queue Sleeping until 2018-02-01 22:01:48 (120 seconds).
2018-02-01 22:01:48,963 - Fetching next work item for commit-queue
Traceback (most recent call last):
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/queueengine.py", line 97, in run
    work_item = self._delegate.next_work_item()
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 334, in next_work_item
    return self._next_patch()
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 217, in _next_patch
    patch_id = self._tool.status_server.next_work_item(self.name)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 128, in next_work_item
    return self._fetch_url(next_patch_url)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 169, in _fetch_url
    return urllib2.urlopen(url, timeout=300).read()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 60] Operation timed out>
2018-02-01 22:03:04,001 - Exception while preparing queue Sleeping until 2018-02-01 22:05:04 (120 seconds).
2018-02-01 22:05:04,002 - Fetching next work item for commit-queue
2018-02-01 22:05:04,203 - No work item. Sleeping until 2018-02-01 22:07:04 (120 seconds).
2018-02-01 22:07:04,204 - Delegate terminated queue.
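Both tracebacks above end in `URLError: <urlopen error [Errno 60] Operation timed out>`. On macOS, errno 60 is `ETIMEDOUT`, so this is a socket-level timeout reported by the OS rather than an HTTP error response from the server, and any retry policy has to recognize it explicitly. The sketch below is a hypothetical helper, `is_transient_network_error`, not part of webkitpy; the set of errno values it treats as transient is an assumption. It is written for Python 3's `urllib.error` (the bots ran Python 2.7, where `urllib2.URLError` exposes the same `reason` attribute).

```python
import errno
import socket
from urllib.error import HTTPError, URLError

# errno values treated as transient (an assumption). errno.ETIMEDOUT is 60 on
# macOS, matching the "[Errno 60]" in the logs, and 110 on Linux.
TRANSIENT_ERRNOS = {errno.ETIMEDOUT, errno.ECONNRESET, errno.ECONNREFUSED}


def is_transient_network_error(exc):
    """Return True if exc looks like a network hiccup worth retrying.

    Hypothetical helper for illustration; not part of webkitpy.
    """
    if isinstance(exc, HTTPError):
        # HTTPError subclasses URLError, so check it first: the server did
        # answer, and only 5xx responses are worth retrying.
        return 500 <= exc.code < 600
    if isinstance(exc, URLError):
        # urlopen wraps the underlying socket error in URLError.reason.
        exc = exc.reason
    if isinstance(exc, socket.timeout):
        return True
    if isinstance(exc, OSError):
        return exc.errno in TRANSIENT_ERRNOS
    # Anything else (e.g. a string reason like "unknown url type") is not retried.
    return False
```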
Comment 1 Tim Horton 2018-02-03 15:45:48 PST
I just got the same thing on https://bugs.webkit.org/show_bug.cgi?id=182460
Comment 2 Aakash Jain 2018-02-03 17:55:44 PST
Workaround: if a bot is stuck processing a patch, unlock the patch at https://webkit-queues.webkit.org/release-lock (a locked patch is released automatically after 2 hours).
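The automatic release described in the workaround amounts to a lock with an expiry time. The following is a minimal sketch of that idea, assuming a simple in-memory table; `PatchLockTable`, its method names, and the injectable clock are hypothetical and do not reflect how webkit-queues.webkit.org actually stores its locks.

```python
import time

LOCK_TIMEOUT_SECONDS = 2 * 60 * 60  # locks expire after 2 hours, per the comment above


class PatchLockTable:
    """Hypothetical in-memory patch locks that expire automatically."""

    def __init__(self, clock=time.time, timeout=LOCK_TIMEOUT_SECONDS):
        self._clock = clock
        self._timeout = timeout
        self._locks = {}  # (queue_name, patch_id) -> time the lock was taken

    def acquire(self, queue_name, patch_id):
        """Lock the patch for this queue; fail while an unexpired lock exists."""
        key = (queue_name, patch_id)
        taken_at = self._locks.get(key)
        now = self._clock()
        if taken_at is not None and now - taken_at < self._timeout:
            return False  # another bot (possibly a stuck one) still holds it
        self._locks[key] = now
        return True

    def release(self, queue_name, patch_id):
        """Manual unlock, analogous to using the release-lock page."""
        self._locks.pop((queue_name, patch_id), None)
```

With an expiry like this, a bot that abandons a patch mid-run blocks it only until the timeout elapses; a manual release just shortens the wait.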
Comment 3 Aakash Jain 2018-02-05 07:59:05 PST
Happened again with https://webkit-queues.webkit.org/patch/333081/commit-queue. I unlocked the patch so that the bot would pick it up again.

Logs:
2018-02-05 07:50:50,621 - Fetching: https://bugs.webkit.org/attachment.cgi?id=333081&action=edit
2018-02-05 07:50:50,902 - Fetching: https://bugs.webkit.org/show_bug.cgi?id=179743&ctype=xml&excludefield=attachmentdata
2018-02-05 07:50:51,232 - Running: webkit-patch --status-host=webkit-queues.webkit.org --bot-id=webkit-cq-02 apply-attachment --no-update --non-interactive 333081 --port=mac
2018-02-05 07:50:56,176 - Applied patch
Traceback (most recent call last):
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/queueengine.py", line 103, in run
    if not self._delegate.process_work_item(work_item):
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 340, in process_work_item
    if task.run():
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/commitqueuetask.py", line 79, in run
    if not self._apply():
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py", line 131, in _apply
    "Patch does not apply")
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py", line 101, in _run_command
    self._delegate.command_passed(success_message, patch=self._patch)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 383, in command_passed
    self._update_status(message, patch=patch)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 210, in _update_status
    return self._tool.status_server.update_status(self.name, message, patch, results_file)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 160, in update_status
    return NetworkTransaction().run(lambda: self._post_status_to_server(queue_name, status, patch, results_file))
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/networktransaction.py", line 53, in run
    return request()
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 160, in <lambda>
    return NetworkTransaction().run(lambda: self._post_status_to_server(queue_name, status, patch, results_file))
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 85, in _post_status_to_server
    self._browser.open(update_status_url)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_mechanize.py", line 230, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_opener.py", line 193, in open
    response = urlopen(self, req, data)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 344, in _open
    '_open', req)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 332, in _call_chain
    result = func(*args)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 1142, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 1118, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 60] Operation timed out>
2018-02-05 07:52:11,198 - Exception while preparing queue Sleeping until 2018-02-05 07:54:11 (120 seconds).
Comment 4 Aakash Jain 2018-02-05 07:59:46 PST
This seems to be happening frequently, and it completely breaks the commit-queue. Raising to P1. Need to investigate ASAP.
Comment 5 Aakash Jain 2018-02-05 08:09:04 PST
We could consider adding a retry when posting status to the webkit-queues server fails. We should also figure out why this network issue has been happening so frequently lately.
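In the tracebacks above, `update_status` already wraps `_post_status_to_server` in a lambda passed to `NetworkTransaction().run`, yet the `URLError` passes straight through `run`, so whatever retry that class performs did not cover this timeout. A retry with exponential backoff could look like the sketch below. It is not webkitpy's `NetworkTransaction`; the name `run_with_retry`, the attempt count, and the delays are illustrative assumptions, and it is written for Python 3.

```python
import time
from urllib.error import HTTPError, URLError


def run_with_retry(request, attempts=4, initial_delay=10.0, sleep=time.sleep):
    """Call request(), retrying network errors with exponential backoff.

    Hypothetical helper for illustration; attempts and delays are assumptions.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except HTTPError as e:
            # A 4xx response will not change on retry, so surface it immediately;
            # 5xx responses are retried like network errors.
            if e.code < 500 or attempt == attempts:
                raise
        except (URLError, OSError):
            # Timeouts and connection errors, e.g. "[Errno 60] Operation timed out".
            if attempt == attempts:
                raise
        sleep(delay)
        delay *= 2  # 10 s, 20 s, 40 s, ... between attempts
```

A caller would pass the same kind of lambda the traceback shows, e.g. `run_with_retry(lambda: post_status(...))`, where `post_status` is a stand-in for whatever performs the POST.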
Comment 6 Jonathan Bedard 2018-02-05 08:30:02 PST
I think a retry is the best option.

If this is being caused by network flakiness (which seems likely), we can investigate the root cause, but that will take a few days, at least. It seems like we need this back up and running reliably as fast as possible.
Comment 7 Wenson Hsieh 2018-02-05 08:30:48 PST
Another datapoint, this time on mac-debug-ews: https://webkit-queues.webkit.org/results/6361047 for https://bugs.webkit.org/show_bug.cgi?id=182472
Comment 8 Tim Horton 2018-02-19 10:42:04 PST
And https://bugs.webkit.org/show_bug.cgi?id=182919#attach_334163

Any idea why this seems to have gotten worse in the last few months?
Comment 9 Aakash Jain 2018-02-20 15:59:58 PST
This issue is happening with both the Bugzilla server and the webkit-queues.webkit.org server. It seems to be an intermittent network issue, possibly something specific to the lab network the machines are on. Debugging it further in <rdar://problem/37716391>.

Meanwhile, I'm adding retry logic for webkit-queues network transactions in https://bugs.webkit.org/show_bug.cgi?id=182987
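To check whether the timeouts are specific to the lab network, one option is to probe both servers on a schedule from an affected bot and from a machine outside the lab, then compare failure counts. A minimal standard-library sketch; the function names, timeout, and interval are assumptions, and it measures only TCP connect success and time, not full HTTP round trips.

```python
import socket
import time


def probe(host, port, timeout=10.0):
    """Try one TCP connection; return (True, seconds) or (False, error text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError as e:  # covers socket.timeout, ETIMEDOUT, ECONNREFUSED, ...
        return False, str(e)


def monitor(targets, rounds, interval=60.0, sleep=time.sleep):
    """Probe each (host, port) once per round and count failures per target."""
    failures = {target: 0 for target in targets}
    for i in range(rounds):
        for host, port in targets:
            ok, detail = probe(host, port)
            if not ok:
                failures[(host, port)] += 1
                print(time.strftime("%Y-%m-%d %H:%M:%S"), host, port, "FAILED:", detail)
        if i < rounds - 1:
            sleep(interval)
    return failures
```

For example, `monitor([("bugs.webkit.org", 443), ("webkit-queues.webkit.org", 80)], rounds=60)` probes once a minute for an hour. Port 80 for webkit-queues is an assumption based on the status-server traceback going through `http_open` with `httplib.HTTPConnection`.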
Comment 10 Aakash Jain 2018-02-26 15:43:13 PST
(In reply to Tim Horton from comment #8)
> Any idea why this seems to have gotten worse in the last few months?

No idea why these network issues have become so frequent now. Perhaps something changed in our lab network that is causing these intermittent failures.

Adding a retry when talking to webkit-queues helped (https://bugs.webkit.org/show_bug.cgi?id=182987). We should add a similar retry to the Bugzilla code as well (the Bugzilla code should use the NetworkTransaction class instead of using mechanize or urllib directly).
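One way to make every Bugzilla request go through a single retrying path, rather than calling mechanize or urllib directly, is a decorator applied to each fetch function. The same pattern would also cover direct calls like the `urllib2.urlopen(url, timeout=300)` in `_fetch_url` from the second traceback. This is a hypothetical Python 3 sketch, not webkitpy's `NetworkTransaction` API: the decorator name `network_retry`, the overall time budget, and the fixed pause are assumptions, and `fetch_bug_xml` is an illustrative stand-in that reuses the bug XML URL seen in the logs.

```python
import functools
import time
import urllib.request
from urllib.error import HTTPError, URLError


def network_retry(budget_seconds=600.0, pause_seconds=15.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Decorator: retry transient failures until budget_seconds have elapsed."""
    def decorate(fetch):
        @functools.wraps(fetch)
        def wrapper(*args, **kwargs):
            deadline = clock() + budget_seconds
            while True:
                try:
                    return fetch(*args, **kwargs)
                except HTTPError as e:
                    # Client errors are not retried; 5xx is retried while time remains.
                    if e.code < 500 or clock() + pause_seconds > deadline:
                        raise
                except (URLError, OSError):
                    # Timeouts and connection errors; give up once the budget is spent.
                    if clock() + pause_seconds > deadline:
                        raise
                sleep(pause_seconds)
        return wrapper
    return decorate


@network_retry()
def fetch_bug_xml(bug_id):
    # Hypothetical Bugzilla fetch; each attempt has its own 60 s socket timeout.
    url = ("https://bugs.webkit.org/show_bug.cgi?id=%d"
           "&ctype=xml&excludefield=attachmentdata" % bug_id)
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()
```

Bounding retries by total elapsed time rather than by attempt count keeps a stuck request from holding a patch much longer than the budget, regardless of how quickly each attempt fails.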
Comment 11 Aakash Jain 2020-03-21 04:17:18 PDT
EWS has been re-implemented from scratch and is now based on Buildbot. If there is a network issue between the worker and the buildbot-master, the build is automatically retried. In most cases of network issues with Bugzilla, the build is either retried or handled appropriately. Please file a new bug if you notice any issues.