Bug 182420
Summary: | EWS bots hitting network issues | ||
---|---|---|---|
Product: | WebKit | Reporter: | Aakash Jain <aakash_jain> |
Component: | Tools / Tests | Assignee: | Aakash Jain <aakash_jain> |
Status: | RESOLVED CONFIGURATION CHANGED | ||
Severity: | Normal | CC: | ap, jbedard, lforschler, thorton, wenson_hsieh |
Priority: | P1 | Keywords: | InRadar |
Version: | Other | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
See Also: |
https://bugs.webkit.org/show_bug.cgi?id=182987 https://bugs.webkit.org/show_bug.cgi?id=183156 https://bugs.webkit.org/show_bug.cgi?id=183463 https://bugs.webkit.org/show_bug.cgi?id=183222 |
Aakash Jain
Patch 332939 (on https://bugs.webkit.org/show_bug.cgi?id=182036) got stuck on commit-queue (bot webkit-cq-02) https://webkit-queues.webkit.org/patch/332939/commit-queue
webkit-cq-02 had the following error in its logs:
2018-02-01 21:58:28,594 - Fetching: https://bugs.webkit.org/attachment.cgi?id=332939&action=edit
2018-02-01 21:58:29,005 - Fetching: https://bugs.webkit.org/show_bug.cgi?id=182036&ctype=xml&excludefield=attachmentdata
2018-02-01 21:58:29,303 - Running: webkit-patch --status-host=webkit-queues.webkit.org --bot-id=webkit-cq-02 update --port=mac
2018-02-01 21:58:33,932 - Updated working directory
Traceback (most recent call last):
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/queueengine.py", line 103, in run
if not self._delegate.process_work_item(work_item):
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 340, in process_work_item
if task.run():
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/commitqueuetask.py", line 77, in run
if not self._update():
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py", line 121, in _update
"Unable to update working directory")
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py", line 101, in _run_command
self._delegate.command_passed(success_message, patch=self._patch)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 383, in command_passed
self._update_status(message, patch=patch)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 210, in _update_status
return self._tool.status_server.update_status(self.name, message, patch, results_file)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 160, in update_status
return NetworkTransaction().run(lambda: self._post_status_to_server(queue_name, status, patch, results_file))
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/networktransaction.py", line 53, in run
return request()
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 160, in <lambda>
return NetworkTransaction().run(lambda: self._post_status_to_server(queue_name, status, patch, results_file))
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 85, in _post_status_to_server
self._browser.open(update_status_url)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_mechanize.py", line 203, in open
return self._mech_open(url, data, timeout=timeout)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_mechanize.py", line 230, in _mech_open
response = UserAgentBase.open(self, request, data)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_opener.py", line 193, in open
response = urlopen(self, req, data)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 344, in _open
'_open', req)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 332, in _call_chain
result = func(*args)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 1142, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 1118, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 60] Operation timed out>
2018-02-01 21:59:48,953 - Exception while preparing queue Sleeping until 2018-02-01 22:01:48 (120 seconds).
2018-02-01 22:01:48,963 - Fetching next work item for commit-queue
Traceback (most recent call last):
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/queueengine.py", line 97, in run
work_item = self._delegate.next_work_item()
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 334, in next_work_item
return self._next_patch()
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 217, in _next_patch
patch_id = self._tool.status_server.next_work_item(self.name)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 128, in next_work_item
return self._fetch_url(next_patch_url)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 169, in _fetch_url
return urllib2.urlopen(url, timeout=300).read()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1227, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1197, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 60] Operation timed out>
2018-02-01 22:03:04,001 - Exception while preparing queue Sleeping until 2018-02-01 22:05:04 (120 seconds).
2018-02-01 22:05:04,002 - Fetching next work item for commit-queue
2018-02-01 22:05:04,203 - No work item. Sleeping until 2018-02-01 22:07:04 (120 seconds).
2018-02-01 22:07:04,204 - Delegate terminated queue.
Tim Horton
I just got the same thing on https://bugs.webkit.org/show_bug.cgi?id=182460
Aakash Jain
Workaround: if a bot is stuck processing a patch, unlock the patch at https://webkit-queues.webkit.org/release-lock (the patch is unlocked automatically after 2 hours).
Aakash Jain
Happened again with https://webkit-queues.webkit.org/patch/333081/commit-queue. I unlocked the patch so that the bot would pick it up again.
Logs:
2018-02-05 07:50:50,621 - Fetching: https://bugs.webkit.org/attachment.cgi?id=333081&action=edit
2018-02-05 07:50:50,902 - Fetching: https://bugs.webkit.org/show_bug.cgi?id=179743&ctype=xml&excludefield=attachmentdata
2018-02-05 07:50:51,232 - Running: webkit-patch --status-host=webkit-queues.webkit.org --bot-id=webkit-cq-02 apply-attachment --no-update --non-interactive 333081 --port=mac
2018-02-05 07:50:56,176 - Applied patch
Traceback (most recent call last):
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/queueengine.py", line 103, in run
if not self._delegate.process_work_item(work_item):
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 340, in process_work_item
if task.run():
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/commitqueuetask.py", line 79, in run
if not self._apply():
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py", line 131, in _apply
"Patch does not apply")
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py", line 101, in _run_command
self._delegate.command_passed(success_message, patch=self._patch)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 383, in command_passed
self._update_status(message, patch=patch)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/queues.py", line 210, in _update_status
return self._tool.status_server.update_status(self.name, message, patch, results_file)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 160, in update_status
return NetworkTransaction().run(lambda: self._post_status_to_server(queue_name, status, patch, results_file))
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/networktransaction.py", line 53, in run
return request()
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 160, in <lambda>
return NetworkTransaction().run(lambda: self._post_status_to_server(queue_name, status, patch, results_file))
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/net/statusserver.py", line 85, in _post_status_to_server
self._browser.open(update_status_url)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_mechanize.py", line 203, in open
return self._mech_open(url, data, timeout=timeout)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_mechanize.py", line 230, in _mech_open
response = UserAgentBase.open(self, request, data)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_opener.py", line 193, in open
response = urlopen(self, req, data)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 344, in _open
'_open', req)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 332, in _call_chain
result = func(*args)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 1142, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/thirdparty/autoinstalled/mechanize/_urllib2_fork.py", line 1118, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 60] Operation timed out>
2018-02-05 07:52:11,198 - Exception while preparing queue Sleeping until 2018-02-05 07:54:11 (120 seconds).
Aakash Jain
This seems to be happening frequently, and it completely breaks the commit-queue. Raising to P1; we need to investigate ASAP.
Aakash Jain
We can consider adding a retry when posting status to the webkit-queues server fails. We should also figure out why this network issue has been happening so frequently lately.
Jonathan Bedard
I think a retry is the best option.
If this is being caused by network flakiness (which seems likely), we can investigate the root cause, but that will take a few days, at least. It seems like we need this back up and running reliably as fast as possible.
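A minimal sketch of the retry idea discussed above, not the actual webkitpy change: call the request again after a growing delay whenever it raises `URLError`, and give up once a total time budget is spent. `run_with_retry`, its parameters and defaults, and the injectable `sleep` are hypothetical names chosen for illustration. The tracebacks in this bug come from Python 2.7 (`urllib2`); the sketch uses the Python 3 equivalent, `urllib.error.URLError`.

```python
import time
from urllib.error import URLError


def run_with_retry(request, initial_backoff_seconds=10, grow_factor=1.5,
                   timeout_seconds=600, sleep=time.sleep):
    """Call request() until it succeeds or the backoff budget is exhausted.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    backoff = initial_backoff_seconds
    total_slept = 0
    while True:
        try:
            return request()
        except URLError:
            # Re-raise once another sleep would exceed the total budget.
            if total_slept + backoff > timeout_seconds:
                raise
            sleep(backoff)
            total_slept += backoff
            backoff *= grow_factor
```

With something like this, a status post that times out once would be repeated after a short pause instead of terminating the whole work item, which is the failure mode shown in the logs above.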
Wenson Hsieh
Another data point, this time on mac-debug-ews: https://webkit-queues.webkit.org/results/6361047 for https://bugs.webkit.org/show_bug.cgi?id=182472
Tim Horton
And https://bugs.webkit.org/show_bug.cgi?id=182919#attach_334163
Any idea why this seems to have gotten worse in the last few months?
Aakash Jain
This issue is happening with both the Bugzilla server and the webkit-queues.webkit.org server. It seems to be an intermittent network issue, maybe something specific to the lab network the machines are on. Debugging it further in <rdar://problem/37716391>.
Meanwhile, I am adding retry logic for webkit-queues network transactions in https://bugs.webkit.org/show_bug.cgi?id=182987
Aakash Jain
(In reply to Tim Horton from comment #8)
> Any idea why this seems to have gotten worse in the last few months?
No idea why these network issues have become so frequent now. Maybe something changed in our lab network that is causing frequent intermittent network issues.
Adding a retry when talking to webkit-queues helped (https://bugs.webkit.org/show_bug.cgi?id=182987). We should add a similar retry to the Bugzilla code as well (the Bugzilla code should use the NetworkTransaction class instead of using mechanize or urllib directly).
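One way to picture routing Bugzilla fetches through a shared retry layer, rather than calling `urllib` directly at each call site, is a decorator. This is a rough sketch under assumptions: `retry_network_errors` and `fetch_bug_xml` are hypothetical names, not webkitpy APIs, and it does not show how the real NetworkTransaction class works. The URL shape and the 300-second timeout are copied from the log lines in this bug, and the code uses Python 3 stdlib names.

```python
import functools
import time
from urllib.error import URLError
from urllib.request import urlopen


def retry_network_errors(attempts=3, delay_seconds=5, sleep=time.sleep):
    """Decorator: retry the wrapped call when it raises URLError (hypothetical)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except URLError:
                    if attempt == attempts:
                        raise  # out of attempts: surface the original error
                    sleep(delay_seconds)
        return wrapper
    return decorator


@retry_network_errors(attempts=3, delay_seconds=5)
def fetch_bug_xml(bug_id, opener=urlopen):
    # The opener is injectable so the function can be exercised without a network.
    url = ("https://bugs.webkit.org/show_bug.cgi?id=%d"
           "&ctype=xml&excludefield=attachmentdata" % bug_id)
    return opener(url, timeout=300).read()
```

Keeping the retry policy in one place means every fetch helper gets the same behavior, and a change to the policy does not require touching each call site.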
Aakash Jain
EWS has been re-implemented from scratch and is now based on Buildbot. If there is a network issue between the worker and buildbot-master, the build is automatically retried. In most cases of network issues with Bugzilla, the build is either retried or handled appropriately. Please file a new bug if you notice any issues.