nrwt is failing to upload test results on the chromium-mac-leopard bots
Created attachment 135706 [details] Patch
Comment on attachment 135706 [details] Patch Thanks for digging into this.
Comment on attachment 135706 [details] Patch Clearing flags on attachment: 135706 Committed r113256: <http://trac.webkit.org/changeset/113256>
All reviewed patches have been landed. Closing bug.
Reopening to attach new patch.
Created attachment 135735 [details] Patch
Comment on attachment 135735 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=135735&action=review > Tools/Scripts/webkitpy/common/net/file_uploader.py:119 > + if not self._debug: This should presumably have a FIXME to get rid of this once we figure out what's going wrong.
Committed r113271: <http://trac.webkit.org/changeset/113271>
(In reply to comment #7) > (From update of attachment 135735 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=135735&action=review > > > Tools/Scripts/webkitpy/common/net/file_uploader.py:119 > > + if not self._debug: > > This should presumably have a FIXME to get rid of this once we figure out what's going wrong. True. Done.
Okay, with the latest set of changes (incl. the typo fix in r113277), I *think* I've fixed the issues, and we've uploaded at least one run from the 10.5 Release bot. Will watch further to see what happens, and then do more diagnostics and clean up the patches tomorrow.
The root problem appears to be the socket.setdefaulttimeout() call. From threads on the web (e.g., http://code.activestate.com/lists/python-list/328319/ ) it looks like this either doesn't work well with urllib, or with multiprocessing, or both. However, in Python 2.6 urllib2.urlopen() grew a timeout parameter, so we should be able to just use that instead ... final patch coming momentarily.
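A minimal sketch of the per-call timeout idea described above. It is written against Python 3's urllib.request (the successor to urllib2) so that it runs on a current interpreter. The helper name open_with_timeout and its opener parameter are hypothetical and are not taken from file_uploader.py.

```python
import urllib.request  # Python 3 counterpart of urllib2


def open_with_timeout(request, timeout_seconds, opener=urllib.request.urlopen):
    """Open `request` with a timeout that applies to this call only.

    Unlike socket.setdefaulttimeout(), this does not change any
    process-wide state, so other sockets in the process are unaffected.
    `opener` is injectable (an assumption for illustration) so the
    helper can be exercised without touching the network.
    """
    return opener(request, timeout=timeout_seconds)
```

With the default opener this would be called as open_with_timeout(url, 120); the timeout then covers only the blocking socket operations of that one request.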
Created attachment 135925 [details] Patch
Comment on attachment 135925 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=135925&action=review > Tools/Scripts/webkitpy/common/net/file_uploader.py:111 > + return urllib2.urlopen(request, timeout=(self._deadline - now)) I don't get this. Why aren't we just setting timeout=self._timeout_seconds?
(In reply to comment #13) > (From update of attachment 135925 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=135925&action=review > > > Tools/Scripts/webkitpy/common/net/file_uploader.py:111 > > + return urllib2.urlopen(request, timeout=(self._deadline - now)) > > I don't get this. Why aren't we just setting timeout=self._timeout_seconds? Because we might retry the transaction and should (in theory) use an increasingly shorter timeout on the retries. Then again, I'm not convinced that we really need to retry the transaction at all ...
(In reply to comment #14) > (In reply to comment #13) > > (From update of attachment 135925 [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=135925&action=review > > > > > Tools/Scripts/webkitpy/common/net/file_uploader.py:111 > > > + return urllib2.urlopen(request, timeout=(self._deadline - now)) > > > > I don't get this. Why aren't we just setting timeout=self._timeout_seconds? > > Because we might retry the transaction and should (in theory) use an increasingly shorter timeout on the retries. This is the part that doesn't make sense to me. Why would we want to use a shorter timeout? > Then again, I'm not convinced that we really need to retry the transaction at all ... I agree actually. We certainly don't need to retry 3 times. 1 retry is more than enough.
(In reply to comment #15) > > This is the part that doesn't make sense to me. Why would we want to use a shorter timeout? > Because otherwise you might exceed the original deadline. If you want to wait for no longer than 120 seconds, then waiting for 120 seconds, failing after 10, retrying, and waiting for 120 again could exceed the original deadline. I don't really care about any of this, though; this is all weird-error-case land, and we could certainly either pass 120 here, or rip out networktransaction completely, or do something else ... This code is also only used by json_results_generator and perftestsrunner (presumably also to upload something to appengine).
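The deadline argument above can be sketched roughly as follows. This is an illustration only, not the actual NetworkTransaction code: run_with_deadline, attempt, max_attempts, and clock are hypothetical names, and the 120-second default simply mirrors the figure used in the comment.

```python
import time


def run_with_deadline(attempt, timeout_seconds=120, max_attempts=2, clock=time.time):
    """Call attempt(remaining_seconds) until it succeeds, the attempts
    run out, or the overall deadline passes.

    Each retry is given only the time left before the original deadline,
    so the total wait cannot exceed timeout_seconds by more than the
    attempt's own overhead.
    """
    deadline = clock() + timeout_seconds
    last_error = None
    for _ in range(max_attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # No time left; do not start another attempt.
        try:
            return attempt(remaining)
        except IOError as error:  # URLError and socket errors derive from this
            last_error = error
    raise last_error or IOError("deadline passed before any attempt was made")
```

If the first attempt fails 10 seconds in, the retry is handed 110 seconds, not a fresh 120. Passing a fixed timeout on every attempt would be simpler, at the cost of a worst-case wait of roughly max_attempts times the timeout.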
(In reply to comment #16) > (In reply to comment #15) > > > > This is the part that doesn't make sense to me. Why would we want to use a shorter timeout? > > > > Because otherwise you might exceed the original deadline. If you want to wait for no longer than 120 seconds, then waiting for 120 seconds, failing after 10, retrying, and waiting for 120 again could exceed the original deadline. > > I don't really care about any of this, though; this is all weird-error-case land, and we could certainly either pass 120 here, or rip out networktransaction completely, or do something else ... > > This code is also only used by json_results_generator and perftestsrunner (presumably also to upload something to appengine). I see. I guess that's fine.
Also, I'm *so* happy this is finally resolved.
Committed r113399: <http://trac.webkit.org/changeset/113399>
Re-opening, it looks like the last change doesn't actually work on 10.5 :(.
Created attachment 136303 [details] Patch
Another fix landed in http://trac.webkit.org/changeset/113617 ; let's see how this one goes ...
Created attachment 136330 [details] add tests
Comment on attachment 136330 [details] add tests I think you meant to upload this to a different bug.
(In reply to comment #24) > (From update of attachment 136330 [details]) > I think you meant to upload this to a different bug. You're right, stupid confused changelogs ...