Bug 201100 - results.webkit.org: Add endpoints to upload and download archives
Summary: results.webkit.org: Add endpoints to upload and download archives
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Jonathan Bedard
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-08-23 16:22 PDT by Jonathan Bedard
Modified: 2019-08-30 15:04 PDT (History)
5 users (show)

See Also:


Attachments
Patch (14.81 KB, patch)
2019-08-23 16:26 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (14.82 KB, patch)
2019-08-29 14:45 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews210 for win-future (13.95 MB, application/zip)
2019-08-29 17:18 PDT, EWS Watchlist
no flags Details
Patch (14.88 KB, patch)
2019-08-30 07:51 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (17.20 KB, patch)
2019-08-30 09:14 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (17.25 KB, patch)
2019-08-30 10:28 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (19.00 KB, patch)
2019-08-30 12:30 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Bedard 2019-08-23 16:22:19 PDT
Add the upload and download endpoints for managing archive uploads and downloads, respectively.
Comment 1 Jonathan Bedard 2019-08-23 16:26:14 PDT
Created attachment 377175 [details]
Patch
Comment 2 Jonathan Bedard 2019-08-23 16:28:10 PDT
Comment on attachment 377175 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377175&action=review

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:107
> +                commits = [self.commit_controller.register(commit=commit) for commit in json.loads(data.get('commits', '[]'))]

Not a huge fan of this recursively defined json dictionary, but I couldn't come up with a better way.
Comment 3 Aakash Jain 2019-08-29 08:55:41 PDT
Comment on attachment 377175 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377175&action=review

> Tools/resultsdbpy/resultsdbpy/controller/api_routes.py:63
> +        self.add_url_rule('/upload/archive', 'upload_archive', self.archive_controller.endpoint, methods=('GET', 'POST',))

is the extra , after 'POST' intentional?

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:56
> +    ):

What happens if no parameters are passed to this method (since all parameters are optional)? 
Shouldn't we have a validation to exit early in case sufficient parameters are not passed?

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:59
> +        recent = boolean_query(*recent)[0] if recent else True

what does 'recent' indicate and why are we re-assigning it?
I would prefer a explicit if else here for easy readability and clarity.

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:63
> +            for config_suites in self.upload_context.find_suites(configurations=configurations, recent=recent).values():

Seems like if both suite and configuration is None, suites will be empty set. We should just exit early if both suite and configuration are None with a clear error message, rather than relying on generic error message at the end of this method.

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:73
> +                    configurations=configurations, suite=suite, branch=branch[0],

wouldn't it crash if branch is None (which is the default value)?

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:74
> +                    begin=begin, end=end, recent=recent, limit=2,

why limit=2?

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:79
> +                                abort(400, description='Too many archives matching the specified criteria')

It would be nice if the error message specifies all the archives matching the criteria. Otherwise it might be very hard for someone to debug this situation, he would have to hit-and-try.

Also "Too many" => "Multiple". (this error message just check for more than one archives).

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:81
> +                            filename = f'{configuration}@{archive["uuid"]}.zip'.replace(' ', '_')

can you give an example of filename which would be downloaded?

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:95
> +                abort(400, description='Expected meta-data to be json')

Should be re-phrased as an error message, something like: 'Meta-data is not json'.
might also consider including the error message 'e'.

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:100
> +                abort(400, description='Invalid configuration' + str(e))

'e' doesn't specify configuration name. It should be something like: 'Invalid configuration, error: {}'.format(str(e))

Also, what kind of errors are we catching here? Can you give an example of a value which would trigger this case?

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:104
> +                abort(400, description='No test suite specified')

You can consider making it more descriptive by including some examples of test suites in the error message.

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:107
>> +                commits = [self.commit_controller.register(commit=commit) for commit in json.loads(data.get('commits', '[]'))]
> 
> Not a huge fan of this recursively defined json dictionary, but I couldn't come up with a better way.

what are we doing in this line? Uploaded data has commits in it? is it just to associate the test results with a particular commit? If so, why are there multiple commits?

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:113
> +            if 'file' not in request.files:

Seems like this check should be done earlier, before registering commits.

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:59
> +    def test_invalid_upload(self, client, **kwargs):

you can consider adding unit-tests for download as well.

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:70
> +                self.URL + '/api/upload/archive',

where is self.URL defined?

Also, can make this overall URL a separate variable and re-use it.

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:86
> +    def test_upload(self, client, **kwargs):

Please add test-cases for various failure cases as well, e.g.: 'No test suite specified'
Comment 4 Jonathan Bedard 2019-08-29 10:20:19 PDT
Comment on attachment 377175 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377175&action=review

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:56
>> +    ):
> 
> What happens if no parameters are passed to this method (since all parameters are optional)? 
> Shouldn't we have a validation to exit early in case sufficient parameters are not passed?

None, in these cases, acts as a wildcard, so it will return everything. We don't have validation on configurations, begin/end and begin_query_time/end_query_time because @configuration_for_query, @uuid_range_for_query and @time_range_for_query respectively are responsible for validating these parameters.

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:59
>> +        recent = boolean_query(*recent)[0] if recent else True
> 
> what does 'recent' indicate and why are we re-assigning it?
> I would prefer a explicit if else here for easy readability and clarity.

We're basically setting a default value of 'True', but 'recent' is actually a string (since it's coming straight from url parameters), so we need to convert it to a boolean first.

What 'recent' does has a lot to do with your comment on line 63. Basically, we have a record of all configurations ever reported, but that's rarely what we actually want. Usually, we only care about queues who have reported results recently, which is why by default, we tell self.upload_context.find_suites that we only care about recent configurations. The upload_context is responsible for deciding what 'recent' means, under the current configuration, it's 2 weeks, if I recall.

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:63
>> +            for config_suites in self.upload_context.find_suites(configurations=configurations, recent=recent).values():
> 
> Seems like if both suite and configuration is None, suites will be empty set. We should just exit early if both suite and configuration are None with a clear error message, rather than relying on generic error message at the end of this method.

Not exactly.

Basically what the logic here is doing is, if suite are undefined, cross-reference our record of configuration and suite information to populate them with valid values. Suites and configurations are related, so if a user says, for example, 'I would like Production results', those arguments would specifically exclude API tests.

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:73
>> +                    configurations=configurations, suite=suite, branch=branch[0],
> 
> wouldn't it crash if branch is None (which is the default value)?

No, we map 'None' to master/trunk since those are reasonable defaults (and git and svn have different names for that branch if you are tracking multiple repositories, other branches will likely have the same name)

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:74
>> +                    begin=begin, end=end, recent=recent, limit=2,
> 
> why limit=2?

Because we only want a single result, but we also want to be able to tell if the arguments provided refer to multiple archives.

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:79
>> +                                abort(400, description='Too many archives matching the specified criteria')
> 
> It would be nice if the error message specifies all the archives matching the criteria. Otherwise it might be very hard for someone to debug this situation, he would have to hit-and-try.
> 
> Also "Too many" => "Multiple". (this error message just check for more than one archives).

Not sure that we can reasonably do this. We can if the number is small (like 5ish), but getting the exact number of matching archives involves actually pulling those archives out of Cassandra. Unlike our other tables, we can't reasonably pull out hundreds of archives.

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:81
>> +                            filename = f'{configuration}@{archive["uuid"]}.zip'.replace(' ', '_')
> 
> can you give an example of filename which would be downloaded?

It's going to be something like 'High_Sierra_Debug@156706136800.zip

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:100
>> +                abort(400, description='Invalid configuration' + str(e))
> 
> 'e' doesn't specify configuration name. It should be something like: 'Invalid configuration, error: {}'.format(str(e))
> 
> Also, what kind of errors are we catching here? Can you give an example of a value which would trigger this case?

Ultimately, it's the Configuration class that's responsible for these errors, which is why I haven't done anything other than display the error it gave us (it's not obvious from this code, but these errors are not generic Python errors, they're specific to the configuration)

A simple example, though, would be something like: dict(version_number ='string'), that's an invalid Configuration because a version_number needs to be something of the form '#.#.#'. The 'documentation' endpoint explains in detail what exactly a configuration looks like.

>>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:107
>>> +                commits = [self.commit_controller.register(commit=commit) for commit in json.loads(data.get('commits', '[]'))]
>> 
>> Not a huge fan of this recursively defined json dictionary, but I couldn't come up with a better way.
> 
> what are we doing in this line? Uploaded data has commits in it? is it just to associate the test results with a particular commit? If so, why are there multiple commits?

The results database supports tracking changes in multiple repositories, so every archive might be associated with multiple commits. One of these commits will be 'prevailing' (ie, newest) but that's not really the responsibility of the client to know (for git especially, the client might not be aware of the timestamp a commit is landed)

What we're doing in this line is converting the uploaded commit data to commit objects, so we can use the commit uuid (which is basically timestamp + order, since in git, a patch series can be landed at exactly the same time). The only catch is that we might actually have to talk to the underlying SCM service to know when the commit was landed, which is what commit_controller.register(...) will do, if needed.

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:70
>> +                self.URL + '/api/upload/archive',
> 
> where is self.URL defined?
> 
> Also, can make this overall URL a separate variable and re-use it.

It's defined out parent class, FlaskTestCase. It's value is pretty configuration dependent, because we have to pick an open port to run on (if we're running a real web server)
Comment 5 Jonathan Bedard 2019-08-29 14:26:28 PDT
Comment on attachment 377175 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377175&action=review

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:104
>> +                abort(400, description='No test suite specified')
> 
> You can consider making it more descriptive by including some examples of test suites in the error message.

The trouble is, nothing in the results database code doesn't actually know what those test suites are. That list is determined by which suites are reported.
Comment 6 Jonathan Bedard 2019-08-29 14:27:53 PDT
(In reply to Jonathan Bedard from comment #5)
> Comment on attachment 377175 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=377175&action=review
> 
> >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:104
> >> +                abort(400, description='No test suite specified')
> > 
> > You can consider making it more descriptive by including some examples of test suites in the error message.
> 
> The trouble is, nothing in the results database code doesn't actually know
> what those test suites are. That list is determined by which suites are
> reported.

Apparently I can't write.

The code of the results database doesn't have a list of valid suites, that list is determined by uploaded data, so we can't give examples of valid suites when the user is uploading them.
Comment 7 Jonathan Bedard 2019-08-29 14:39:48 PDT
Comment on attachment 377175 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377175&action=review

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:86
>> +    def test_upload(self, client, **kwargs):
> 
> Please add test-cases for various failure cases as well, e.g.: 'No test suite specified'

We actually do test downloading, line 113.
Comment 8 Jonathan Bedard 2019-08-29 14:45:46 PDT
Created attachment 377628 [details]
Patch
Comment 9 EWS Watchlist 2019-08-29 17:18:05 PDT
Comment on attachment 377628 [details]
Patch

Attachment 377628 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/12982085

New failing tests:
js/error-should-not-strong-reference-global-object.html
Comment 10 EWS Watchlist 2019-08-29 17:18:08 PDT
Created attachment 377657 [details]
Archive of layout-test-results from ews210 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews210  Port: win-future  Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Comment 11 Jonathan Bedard 2019-08-30 07:51:25 PDT
Created attachment 377704 [details]
Patch
Comment 12 Jonathan Bedard 2019-08-30 09:14:32 PDT
Created attachment 377717 [details]
Patch
Comment 13 Aakash Jain 2019-08-30 09:44:00 PDT
Comment on attachment 377717 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377717&action=review

> Tools/resultsdbpy/resultsdbpy/controller/api_routes.py:63
> +        self.add_url_rule('/upload/archive', 'upload_archive', self.archive_controller.endpoint, methods=('GET', 'POST'))

Is the URL/route to download is '/upload/archive'? That seems weird.

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:81
> +                            filename = f'{configuration}@{archive["uuid"]}'.replace(' ', '_').replace('.', '-')

Would the @ character work fine on windows? Maybe it's better to user _ or -

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:99
> +                abort(400, description='Meta-data to be json')

Typo: missing "Expected".

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:113
> +                abort(400, description='Expected meta-data to be json')

In this case, is this meta-data, or commits info inside meta-data, which is invalid json?

Please ensure to keep this error message not identical to error message on line 99, so as to make it easier to debug these two scenarios.

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:85
> +        response = self.upload_file(client, f'{self.URL}/api/upload/archive', upload_meta, upload_content)

'/api/upload/archive' is used multiple times in this class, it might be a good idea to make it a variable (probably class variable), and re-use that.
Comment 14 Jonathan Bedard 2019-08-30 10:03:58 PDT
Comment on attachment 377717 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377717&action=review

>> Tools/resultsdbpy/resultsdbpy/controller/api_routes.py:63
>> +        self.add_url_rule('/upload/archive', 'upload_archive', self.archive_controller.endpoint, methods=('GET', 'POST'))
> 
> Is the URL/route to download is '/upload/archive'? That seems weird.

Yes, but only because the endpoint isn't really intended for accessing the archive, it would just be silly to disallow downloading. Ultimately, the user would be expected to use the future /archive endpoint to access the contents of the archive

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:81
>> +                            filename = f'{configuration}@{archive["uuid"]}'.replace(' ', '_').replace('.', '-')
> 
> Would the @ character work fine on windows? Maybe it's better to user _ or -

Can't find anything that says it's disallowed: https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file

>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:113
>> +                abort(400, description='Expected meta-data to be json')
> 
> In this case, is this meta-data, or commits info inside meta-data, which is invalid json?
> 
> Please ensure to keep this error message not identical to error message on line 99, so as to make it easier to debug these two scenarios.

It would be info inside of the meta-data, most likely. This is probably going to most often protect against the json.loads on line 111 failing, although, it's possible (but not expected) that something inside commit_controller.register could go wrong and raise a ValueError.
Comment 15 Jonathan Bedard 2019-08-30 10:28:27 PDT
Created attachment 377723 [details]
Patch
Comment 16 Aakash Jain 2019-08-30 10:37:27 PDT
Comment on attachment 377723 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377723&action=review

> Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:108
> +    def test_invalid_metadata(self, client, **kwargs):

Would be nice to have unit-tests to cover all the error messages, like 'No archive provided', 'Expected meta-data to be json', 'Expected commit meta-data to be json', 'Invalid configuration, error:', 'No archives matching the specified criteria'.
Comment 17 Jonathan Bedard 2019-08-30 12:30:28 PDT
Created attachment 377739 [details]
Patch
Comment 18 Jonathan Bedard 2019-08-30 12:40:40 PDT
(In reply to Aakash Jain from comment #16)
> Comment on attachment 377723 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=377723&action=review
> 
> > Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:108
> > +    def test_invalid_metadata(self, client, **kwargs):
> 
> Would be nice to have unit-tests to cover all the error messages, like 'No
> archive provided', 'Expected meta-data to be json', 'Expected commit
> meta-data to be json', 'Invalid configuration, error:', 'No archives
> matching the specified criteria'.

'All' isn't going to be practical, since some of the functions we're calling can generate error messages too (for example, a syntactically valid commit that does not exist will generate an error), but I tested the error messages that this patch adds, at least.
Comment 19 WebKit Commit Bot 2019-08-30 15:03:52 PDT
Comment on attachment 377739 [details]
Patch

Clearing flags on attachment: 377739

Committed r249348: <https://trac.webkit.org/changeset/249348>
Comment 20 WebKit Commit Bot 2019-08-30 15:03:54 PDT
All reviewed patches have been landed.  Closing bug.
Comment 21 Radar WebKit Bug Importer 2019-08-30 15:04:22 PDT
<rdar://problem/54898162>