Bug 83230

Summary: nrwt is failing to upload test results on the chromium-mac-leopard bots
Product: WebKit Reporter: Dirk Pranke <dpranke>
Component: New BugsAssignee: Dirk Pranke <dpranke>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, dglazkov, eric, ojan, tony
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
add tests none

Dirk Pranke
Reported 2012-04-04 15:55:22 PDT
nrwt is failing to upload test results on the chromium-mac-leopard bots
Attachments
Patch (4.31 KB, patch)
2012-04-04 15:55 PDT, Dirk Pranke
no flags
Patch (5.41 KB, patch)
2012-04-04 18:10 PDT, Dirk Pranke
no flags
Patch (6.62 KB, patch)
2012-04-05 16:04 PDT, Dirk Pranke
no flags
Patch (2.62 KB, patch)
2012-04-09 13:55 PDT, Dirk Pranke
no flags
add tests (18.00 KB, patch)
2012-04-09 16:15 PDT, Dirk Pranke
no flags
Dirk Pranke
Comment 1 2012-04-04 15:55:51 PDT
Ojan Vafai
Comment 2 2012-04-04 15:59:38 PDT
Comment on attachment 135706 [details] Patch Thanks for digging into this.
Dirk Pranke
Comment 3 2012-04-04 16:01:41 PDT
Comment on attachment 135706 [details] Patch Clearing flags on attachment: 135706 Committed r113256: <http://trac.webkit.org/changeset/113256>
Dirk Pranke
Comment 4 2012-04-04 16:01:50 PDT
All reviewed patches have been landed. Closing bug.
Dirk Pranke
Comment 5 2012-04-04 18:10:38 PDT
Reopening to attach new patch.
Dirk Pranke
Comment 6 2012-04-04 18:10:41 PDT
Ojan Vafai
Comment 7 2012-04-04 18:13:27 PDT
Comment on attachment 135735 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=135735&action=review > Tools/Scripts/webkitpy/common/net/file_uploader.py:119 > + if not self._debug: This should presumably have a FIXME to get rid of this once we figure out what's going wrong.
Dirk Pranke
Comment 8 2012-04-04 18:17:24 PDT
Dirk Pranke
Comment 9 2012-04-04 18:18:22 PDT
(In reply to comment #7) > (From update of attachment 135735 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=135735&action=review > > > Tools/Scripts/webkitpy/common/net/file_uploader.py:119 > > + if not self._debug: > > This should presumably have a FIXME to get rid of this once we figure out what's going wrong. True. Done.
Dirk Pranke
Comment 10 2012-04-04 20:21:33 PDT
Okay, with the latest set of changes (incl. the typo fix in r113277 I *think* I've fixed the issues, and we've uploaded at least one run from the 10.5 Release bot. Will watch further to see what happens, and then do more diagnostics and clean up the patches tomorrow.
Dirk Pranke
Comment 11 2012-04-05 15:17:26 PDT
The root problem appears to be the socket.setdefaulttimeout() call. From threads on the web (e.g., http://code.activestate.com/lists/python-list/328319/ ) it looks like this either doesn't work well with urllib, or with multiprocessing, or both. However, in 2.6 urllib2.open() grew a timeout parameter, so we should be able to just use that instead ... final patch coming momentarily.
Dirk Pranke
Comment 12 2012-04-05 16:04:58 PDT
Ojan Vafai
Comment 13 2012-04-05 16:25:29 PDT
Comment on attachment 135925 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=135925&action=review > Tools/Scripts/webkitpy/common/net/file_uploader.py:111 > + return urllib2.urlopen(request, timeout=(self._deadline - now)) I don't get this. Why aren't we just setting timeout=self._timeout_seconds?
Dirk Pranke
Comment 14 2012-04-05 16:36:57 PDT
(In reply to comment #13) > (From update of attachment 135925 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=135925&action=review > > > Tools/Scripts/webkitpy/common/net/file_uploader.py:111 > > + return urllib2.urlopen(request, timeout=(self._deadline - now)) > > I don't get this. Why aren't we just setting timeout=self._timeout_seconds? Because we might retry the transaction and should (in theory) use an increasingly shorter timeout on the retries. Then again, I'm not convinced that we really need to retry the transaction at all ...
Ojan Vafai
Comment 15 2012-04-05 16:39:39 PDT
(In reply to comment #14) > (In reply to comment #13) > > (From update of attachment 135925 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=135925&action=review > > > > > Tools/Scripts/webkitpy/common/net/file_uploader.py:111 > > > + return urllib2.urlopen(request, timeout=(self._deadline - now)) > > > > I don't get this. Why aren't we just setting timeout=self._timeout_seconds? > > Because we might retry the transaction and should (in theory) use an increasingly shorter timeout on the retries. This is the part that doesn't make sense to me. Why would we want to use a shorter timeout? > Then again, I'm not convinced that we really need to retry the transaction at all ... I agree actually. We certainly don't need to retry 3 times. 1 retry is more than enough.
Dirk Pranke
Comment 16 2012-04-05 16:42:25 PDT
(In reply to comment #15) > > This is the part that doesn't make sense to me. Why would we want to use a shorter timeout? > Because otherwise you might exceed the the original deadline. If you want to wait for no longer than 120 seconds, then waiting for 120 seconds, failing after 10, retrying, and waiting for 120 again could exceed the original deadline. I don't really care about any of this, though; this is all weird-error-case land, and we could certainly either pass 120 hear, or rip out networktransaction completely, or do something else ... This code is also only used by json_results_generator and perftestsrunner (presumably also to upload something to appengine).
Ojan Vafai
Comment 17 2012-04-05 16:56:14 PDT
(In reply to comment #16) > (In reply to comment #15) > > > > This is the part that doesn't make sense to me. Why would we want to use a shorter timeout? > > > > Because otherwise you might exceed the the original deadline. If you want to wait for no longer than 120 seconds, then waiting for 120 seconds, failing after 10, retrying, and waiting for 120 again could exceed the original deadline. > > I don't really care about any of this, though; this is all weird-error-case land, and we could certainly either pass 120 hear, or rip out networktransaction completely, or do something else ... > > This code is also only used by json_results_generator and perftestsrunner (presumably also to upload something to appengine). I see. I guess that's fine.
Ojan Vafai
Comment 18 2012-04-05 16:57:31 PDT
Also, I'm *so* happy this is finally resolved.
Dirk Pranke
Comment 19 2012-04-05 18:14:16 PDT
Dirk Pranke
Comment 20 2012-04-09 13:46:21 PDT
Re-opening, it looks like the last change doesn't actually work on 10.5 :(.
Dirk Pranke
Comment 21 2012-04-09 13:55:53 PDT
Dirk Pranke
Comment 22 2012-04-09 14:09:30 PDT
Another fix landed in http://trac.webkit.org/changeset/113617 ; let's see how this one goes ...
Dirk Pranke
Comment 23 2012-04-09 16:15:19 PDT
Created attachment 136330 [details] add tests
Ojan Vafai
Comment 24 2012-04-09 16:37:16 PDT
Comment on attachment 136330 [details] add tests I think you meant to upload this to a different bug.
Dirk Pranke
Comment 25 2012-04-09 16:49:30 PDT
(In reply to comment #24) > (From update of attachment 136330 [details]) > I think you meant to upload this to a different bug. You're right, stupid confused changelogs ...
Note You need to log in before you can comment on or make changes to this bug.