WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
218994
[ews] Add timeout to network requests
https://bugs.webkit.org/show_bug.cgi?id=218994
Summary
[ews] Add timeout to network requests
Aakash Jain
Reported
2020-11-16 11:35:22 PST
Sometimes network requests can get stuck or take very long time. It can introduce delays on EWS. Few days back (on nov 13), we noticed slowness on EWS when bugs.webkit.org requests were taking extremely long or getting stuck. We should add an explicit timeout to network requests, so that the network request is timely terminated. Also, as per
https://requests.readthedocs.io/en/master/user/quickstart/#timeouts
: "Nearly all production code should use this parameter in nearly all requests. Failure to do so can cause your program to hang indefinitely"
Attachments
Patch
(2.89 KB, patch)
2020-11-16 11:38 PST
,
Aakash Jain
no flags
Details
Formatted Diff
Diff
Patch
(2.89 KB, patch)
2020-11-18 13:44 PST
,
Aakash Jain
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Aakash Jain
Comment 1
2020-11-16 11:38:08 PST
Created
attachment 414261
[details]
Patch
Aakash Jain
Comment 2
2020-11-16 11:44:15 PST
Note that the network requests failures in all these calls are gracefully handled. For e.g.: fetch_data_from_url_with_authentication() is used in get_patch_json/get_bug_json which are used in ValidatePatch step. In case of network request failure, validate-patch step will print a log about network request failure, mark the validate-patch step as warning, and continue with the subsequent build-steps. For load_contributors_from_trac() network request failure, it falls back to reading the contributors.json file from disk. For get_patch_status() network request failure in CheckPatchStatusOnEWSQueues step, commit-queue will continue processing the patch, although it might be inefficient, as it might run the layout-tests even if not strictly necessary.
lingho@apple.com
Comment 3
2020-11-16 11:57:23 PST
I think the timeout value to bugs.webkit.org should probably be set to a much lower value, like 2 seconds. Similarly for trac and ews-build connection, if you have't seen previous timeout, that should probably set to 2 seconds to make it useful.
Jonathan Bedard
Comment 4
2020-11-16 12:36:05 PST
Comment on
attachment 414261
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=414261&action=review
> Tools/CISupport/ews-build/steps.py:402 > + response = requests.get(url, timeout=15, params={'Bugzilla_api_key': self.get_bugzilla_api_key()})
I would second Ling's opinion that this timeout should be lower (I would say 5 seconds) This is the time to wait for any bytes, not the entire response:
https://requests.readthedocs.io/en/master/user/quickstart/
Alexey Proskuryakov
Comment 5
2020-11-16 12:42:53 PST
I'm not necessarily disagreeing, but can you explain why the timeout would be short? My instinct was to suggest making it at least 1 minute. Failing these requests could put the system in a bad state.
Jonathan Bedard
Comment 6
2020-11-16 12:54:41 PST
(In reply to Alexey Proskuryakov from
comment #5
)
> I'm not necessarily disagreeing, but can you explain why the timeout would > be short? My instinct was to suggest making it at least 1 minute. > > Failing these requests could put the system in a bad state.
My thinking is that if we can't receive any response (as in, not a single byte) from the server in 5 seconds, I wouldn't expect things to be different after 1 minute. That being said, I'm not sure that this change would even fix the original problem, was our issue that requests where not connecting to the server? The timeout option for requests doesn't apply to how long it takes to download the response, only to the time it takes for the server to start the response.
lingho@apple.com
Comment 7
2020-11-16 14:06:36 PST
I might have misunderstood the reason the timeouts are to be addded. I thought these timeouts are acceptable and will be handled accordingly. If the end goal is to not have a failed request, I guess there shouldn't even be a timeout specified. If a request eventually gets a timeout (not sure what's the default set up Python but I think the default Linux socket timeout is 60s), retry again and again until it eventually succeeded. As of why ews-build was affected as a whole by unavailable connection on bugs.webkit.org last week, the cause might need to be figured out and fixed differently.
Aakash Jain
Comment 8
2020-11-18 13:44:30 PST
Created
attachment 414481
[details]
Patch
Aakash Jain
Comment 9
2020-11-18 14:02:30 PST
Increased the timeout to 60s. It's not very clear why we saw performance issue on EWS on last Friday, but there were clearly many network requests which were stuck, taking extremely long etc. e.g.: in
https://ews-build.webkit.org/#/builders/46/builds/2265
step #17 (validate-patch) took 2m:42s. Having 60s timeout is still much better than infinite timeout. Also, for most of these network requests, failure only make EWS build slightly inefficient (e.g.: validate-patch step being skipped causing EWS to process obsolete/r- patches). For load_contributors_from_trac(), it will read local copy from disk (which might be slightly old)
Jonathan Bedard
Comment 10
2020-11-18 14:20:03 PST
Comment on
attachment 414481
[details]
Patch I think nit-picking the value this should be is not a good use of our time, so let's land!
EWS
Comment 11
2020-11-18 14:50:58 PST
Committed
r269992
: <
https://trac.webkit.org/changeset/269992
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 414481
[details]
.
Radar WebKit Bug Importer
Comment 12
2020-11-18 14:51:20 PST
<
rdar://problem/71556775
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug