Bug 83230 - nrwt is failing to upload test results on the chromium-mac-leopard bots
Summary: nrwt is failing to upload test results on the chromium-mac-leopard bots
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Dirk Pranke
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-04 15:55 PDT by Dirk Pranke
Modified: 2012-04-09 16:49 PDT (History)
5 users (show)

See Also:


Attachments
Patch (4.31 KB, patch)
2012-04-04 15:55 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
Patch (5.41 KB, patch)
2012-04-04 18:10 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
Patch (6.62 KB, patch)
2012-04-05 16:04 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
Patch (2.62 KB, patch)
2012-04-09 13:55 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
add tests (18.00 KB, patch)
2012-04-09 16:15 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dirk Pranke 2012-04-04 15:55:22 PDT
nrwt is failing to upload test results on the chromium-mac-leopard bots
Comment 1 Dirk Pranke 2012-04-04 15:55:51 PDT
Created attachment 135706 [details]
Patch
Comment 2 Ojan Vafai 2012-04-04 15:59:38 PDT
Comment on attachment 135706 [details]
Patch

Thanks for digging into this.
Comment 3 Dirk Pranke 2012-04-04 16:01:41 PDT
Comment on attachment 135706 [details]
Patch

Clearing flags on attachment: 135706

Committed r113256: <http://trac.webkit.org/changeset/113256>
Comment 4 Dirk Pranke 2012-04-04 16:01:50 PDT
All reviewed patches have been landed.  Closing bug.
Comment 5 Dirk Pranke 2012-04-04 18:10:38 PDT
Reopening to attach new patch.
Comment 6 Dirk Pranke 2012-04-04 18:10:41 PDT
Created attachment 135735 [details]
Patch
Comment 7 Ojan Vafai 2012-04-04 18:13:27 PDT
Comment on attachment 135735 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=135735&action=review

> Tools/Scripts/webkitpy/common/net/file_uploader.py:119
> +            if not self._debug:

This should presumably have a FIXME to get rid of this once we figure out what's going wrong.
Comment 8 Dirk Pranke 2012-04-04 18:17:24 PDT
Committed r113271: <http://trac.webkit.org/changeset/113271>
Comment 9 Dirk Pranke 2012-04-04 18:18:22 PDT
(In reply to comment #7)
> (From update of attachment 135735 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=135735&action=review
> 
> > Tools/Scripts/webkitpy/common/net/file_uploader.py:119
> > +            if not self._debug:
> 
> This should presumably have a FIXME to get rid of this once we figure out what's going wrong.

True. Done.
Comment 10 Dirk Pranke 2012-04-04 20:21:33 PDT
Okay, with the latest set of changes (incl. the typo fix in r113277 I *think* I've fixed the issues, and we've uploaded at least one run from the 10.5 Release bot. Will watch further to see what happens, and then do more diagnostics and clean up the patches tomorrow.
Comment 11 Dirk Pranke 2012-04-05 15:17:26 PDT
The root problem appears to be the socket.setdefaulttimeout() call. From threads on the web (e.g., http://code.activestate.com/lists/python-list/328319/ ) it looks like this either doesn't work well with urllib, or with multiprocessing, or both. However, in 2.6 urllib2.open() grew a timeout parameter, so we should be able to just use that instead ... final patch coming momentarily.
Comment 12 Dirk Pranke 2012-04-05 16:04:58 PDT
Created attachment 135925 [details]
Patch
Comment 13 Ojan Vafai 2012-04-05 16:25:29 PDT
Comment on attachment 135925 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=135925&action=review

> Tools/Scripts/webkitpy/common/net/file_uploader.py:111
> +            return urllib2.urlopen(request, timeout=(self._deadline - now))

I don't get this. Why aren't we just setting timeout=self._timeout_seconds?
Comment 14 Dirk Pranke 2012-04-05 16:36:57 PDT
(In reply to comment #13)
> (From update of attachment 135925 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=135925&action=review
> 
> > Tools/Scripts/webkitpy/common/net/file_uploader.py:111
> > +            return urllib2.urlopen(request, timeout=(self._deadline - now))
> 
> I don't get this. Why aren't we just setting timeout=self._timeout_seconds?

Because we might retry the transaction and should (in theory) use an increasingly shorter timeout on the retries.

Then again, I'm not convinced that we really need to retry the transaction at all ...
Comment 15 Ojan Vafai 2012-04-05 16:39:39 PDT
(In reply to comment #14)
> (In reply to comment #13)
> > (From update of attachment 135925 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=135925&action=review
> > 
> > > Tools/Scripts/webkitpy/common/net/file_uploader.py:111
> > > +            return urllib2.urlopen(request, timeout=(self._deadline - now))
> > 
> > I don't get this. Why aren't we just setting timeout=self._timeout_seconds?
> 
> Because we might retry the transaction and should (in theory) use an increasingly shorter timeout on the retries.

This is the part that doesn't make sense to me. Why would we want to use a shorter timeout?

> Then again, I'm not convinced that we really need to retry the transaction at all ...

I agree actually. We certainly don't need to retry 3 times. 1 retry is more than enough.
Comment 16 Dirk Pranke 2012-04-05 16:42:25 PDT
(In reply to comment #15)
> 
> This is the part that doesn't make sense to me. Why would we want to use a shorter timeout?
> 

Because otherwise you might exceed the the original deadline. If you want to wait for no longer than 120 seconds, then waiting for 120 seconds, failing after 10, retrying, and waiting for 120 again could exceed the original deadline.

I don't really care about any of this, though; this is all weird-error-case land, and we could certainly either pass 120 hear, or rip out networktransaction completely, or do something else ...

This code is also only used by json_results_generator and perftestsrunner (presumably also to upload something to appengine).
Comment 17 Ojan Vafai 2012-04-05 16:56:14 PDT
(In reply to comment #16)
> (In reply to comment #15)
> > 
> > This is the part that doesn't make sense to me. Why would we want to use a shorter timeout?
> > 
> 
> Because otherwise you might exceed the the original deadline. If you want to wait for no longer than 120 seconds, then waiting for 120 seconds, failing after 10, retrying, and waiting for 120 again could exceed the original deadline.
> 
> I don't really care about any of this, though; this is all weird-error-case land, and we could certainly either pass 120 hear, or rip out networktransaction completely, or do something else ...
> 
> This code is also only used by json_results_generator and perftestsrunner (presumably also to upload something to appengine).

I see. I guess that's fine.
Comment 18 Ojan Vafai 2012-04-05 16:57:31 PDT
Also, I'm *so* happy this is finally resolved.
Comment 19 Dirk Pranke 2012-04-05 18:14:16 PDT
Committed r113399: <http://trac.webkit.org/changeset/113399>
Comment 20 Dirk Pranke 2012-04-09 13:46:21 PDT
Re-opening, it looks like the last change doesn't actually work on 10.5 :(.
Comment 21 Dirk Pranke 2012-04-09 13:55:53 PDT
Created attachment 136303 [details]
Patch
Comment 22 Dirk Pranke 2012-04-09 14:09:30 PDT
Another fix landed in http://trac.webkit.org/changeset/113617 ; let's see how this one goes ...
Comment 23 Dirk Pranke 2012-04-09 16:15:19 PDT
Created attachment 136330 [details]
add tests
Comment 24 Ojan Vafai 2012-04-09 16:37:16 PDT
Comment on attachment 136330 [details]
add tests

I think you meant to upload this to a different bug.
Comment 25 Dirk Pranke 2012-04-09 16:49:30 PDT
(In reply to comment #24)
> (From update of attachment 136330 [details])
> I think you meant to upload this to a different bug.

You're right, stupid confused changelogs ...