201100 – results.webkit.org: Add endpoints to upload and download archives

RESOLVED FIXED 201100

results.webkit.org: Add endpoints to upload and download archives

https://bugs.webkit.org/show_bug.cgi?id=201100

Summary results.webkit.org: Add endpoints to upload and download archives

Jonathan Bedard

Reported 2019-08-23 16:22:19 PDT

Add the upload and download endpoints for managing archive uploads and downloads, respectively.

Attachments
Patch (14.81 KB, patch) 2019-08-23 16:26 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (14.82 KB, patch) 2019-08-29 14:45 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews210 for win-future (13.95 MB, application/zip) 2019-08-29 17:18 PDT, EWS Watchlist	no flags	Details
Patch (14.88 KB, patch) 2019-08-30 07:51 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (17.20 KB, patch) 2019-08-30 09:14 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (17.25 KB, patch) 2019-08-30 10:28 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (19.00 KB, patch) 2019-08-30 12:30 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Show Obsolete (5) View All Add attachment proposed patch, testcase, etc.

Jonathan Bedard

Comment 1 2019-08-23 16:26:14 PDT

Created attachment 377175 [details] Patch

Jonathan Bedard

Comment 2 2019-08-23 16:28:10 PDT

Comment on attachment 377175 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377175&action=review > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:107 > + commits = [self.commit_controller.register(commit=commit) for commit in json.loads(data.get('commits', '[]'))] Not a huge fan of this recursively defined json dictionary, but I couldn't come up with a better way.

Aakash Jain

Comment 3 2019-08-29 08:55:41 PDT

Comment on attachment 377175 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377175&action=review > Tools/resultsdbpy/resultsdbpy/controller/api_routes.py:63 > + self.add_url_rule('/upload/archive', 'upload_archive', self.archive_controller.endpoint, methods=('GET', 'POST',)) is the extra , after 'POST' intentional? > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:56 > + ): What happens if no parameters are passed to this method (since all parameters are optional)? Shouldn't we have a validation to exit early in case sufficient parameters are not passed? > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:59 > + recent = boolean_query(*recent)[0] if recent else True what does 'recent' indicate and why are we re-assigning it? I would prefer a explicit if else here for easy readability and clarity. > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:63 > + for config_suites in self.upload_context.find_suites(configurations=configurations, recent=recent).values(): Seems like if both suite and configuration is None, suites will be empty set. We should just exit early if both suite and configuration are None with a clear error message, rather than relying on generic error message at the end of this method. > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:73 > + configurations=configurations, suite=suite, branch=branch[0], wouldn't it crash if branch is None (which is the default value)? > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:74 > + begin=begin, end=end, recent=recent, limit=2, why limit=2? > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:79 > + abort(400, description='Too many archives matching the specified criteria') It would be nice if the error message specifies all the archives matching the criteria. Otherwise it might be very hard for someone to debug this situation, he would have to hit-and-try. Also "Too many" => "Multiple". (this error message just check for more than one archives). > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:81 > + filename = f'{configuration}@{archive["uuid"]}.zip'.replace(' ', '_') can you give an example of filename which would be downloaded? > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:95 > + abort(400, description='Expected meta-data to be json') Should be re-phrased as an error message, something like: 'Meta-data is not json'. might also consider including the error message 'e'. > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:100 > + abort(400, description='Invalid configuration' + str(e)) 'e' doesn't specify configuration name. It should be something like: 'Invalid configuration, error: {}'.format(str(e)) Also, what kind of errors are we catching here? Can you give an example of a value which would trigger this case? > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:104 > + abort(400, description='No test suite specified') You can consider making it more descriptive by including some examples of test suites in the error message. >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:107 >> + commits = [self.commit_controller.register(commit=commit) for commit in json.loads(data.get('commits', '[]'))] > > Not a huge fan of this recursively defined json dictionary, but I couldn't come up with a better way. what are we doing in this line? Uploaded data has commits in it? is it just to associate the test results with a particular commit? If so, why are there multiple commits? > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:113 > + if 'file' not in request.files: Seems like this check should be done earlier, before registering commits. > Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:59 > + def test_invalid_upload(self, client, **kwargs): you can consider adding unit-tests for download as well. > Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:70 > + self.URL + '/api/upload/archive', where is self.URL defined? Also, can make this overall URL a separate variable and re-use it. > Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:86 > + def test_upload(self, client, **kwargs): Please add test-cases for various failure cases as well, e.g.: 'No test suite specified'

Jonathan Bedard

Comment 4 2019-08-29 10:20:19 PDT

Comment on attachment 377175 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377175&action=review >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:56 >> + ): > > What happens if no parameters are passed to this method (since all parameters are optional)? > Shouldn't we have a validation to exit early in case sufficient parameters are not passed? None, in these cases, acts as a wildcard, so it will return everything. We don't have validation on configurations, begin/end and begin_query_time/end_query_time because @configuration_for_query, @uuid_range_for_query and @time_range_for_query respectively are responsible for validating these parameters. >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:59 >> + recent = boolean_query(*recent)[0] if recent else True > > what does 'recent' indicate and why are we re-assigning it? > I would prefer a explicit if else here for easy readability and clarity. We're basically setting a default value of 'True', but 'recent' is actually a string (since it's coming straight from url parameters), so we need to convert it to a boolean first. What 'recent' does has a lot to do with your comment on line 63. Basically, we have a record of all configurations ever reported, but that's rarely what we actually want. Usually, we only care about queues who have reported results recently, which is why by default, we tell self.upload_context.find_suites that we only care about recent configurations. The upload_context is responsible for deciding what 'recent' means, under the current configuration, it's 2 weeks, if I recall. >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:63 >> + for config_suites in self.upload_context.find_suites(configurations=configurations, recent=recent).values(): > > Seems like if both suite and configuration is None, suites will be empty set. We should just exit early if both suite and configuration are None with a clear error message, rather than relying on generic error message at the end of this method. Not exactly. Basically what the logic here is doing is, if suite are undefined, cross-reference our record of configuration and suite information to populate them with valid values. Suites and configurations are related, so if a user says, for example, 'I would like Production results', those arguments would specifically exclude API tests. >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:73 >> + configurations=configurations, suite=suite, branch=branch[0], > > wouldn't it crash if branch is None (which is the default value)? No, we map 'None' to master/trunk since those are reasonable defaults (and git and svn have different names for that branch if you are tracking multiple repositories, other branches will likely have the same name) >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:74 >> + begin=begin, end=end, recent=recent, limit=2, > > why limit=2? Because we only want a single result, but we also want to be able to tell if the arguments provided refer to multiple archives. >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:79 >> + abort(400, description='Too many archives matching the specified criteria') > > It would be nice if the error message specifies all the archives matching the criteria. Otherwise it might be very hard for someone to debug this situation, he would have to hit-and-try. > > Also "Too many" => "Multiple". (this error message just check for more than one archives). Not sure that we can reasonably do this. We can if the number is small (like 5ish), but getting the exact number of matching archives involves actually pulling those archives out of Cassandra. Unlike our other tables, we can't reasonably pull out hundreds of archives. >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:81 >> + filename = f'{configuration}@{archive["uuid"]}.zip'.replace(' ', '_') > > can you give an example of filename which would be downloaded? It's going to be something like 'High_Sierra_Debug@156706136800.zip >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:100 >> + abort(400, description='Invalid configuration' + str(e)) > > 'e' doesn't specify configuration name. It should be something like: 'Invalid configuration, error: {}'.format(str(e)) > > Also, what kind of errors are we catching here? Can you give an example of a value which would trigger this case? Ultimately, it's the Configuration class that's responsible for these errors, which is why I haven't done anything other than display the error it gave us (it's not obvious from this code, but these errors are not generic Python errors, they're specific to the configuration) A simple example, though, would be something like: dict(version_number ='string'), that's an invalid Configuration because a version_number needs to be something of the form '#.#.#'. The 'documentation' endpoint explains in detail what exactly a configuration looks like. >>> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:107 >>> + commits = [self.commit_controller.register(commit=commit) for commit in json.loads(data.get('commits', '[]'))] >> >> Not a huge fan of this recursively defined json dictionary, but I couldn't come up with a better way. > > what are we doing in this line? Uploaded data has commits in it? is it just to associate the test results with a particular commit? If so, why are there multiple commits? The results database supports tracking changes in multiple repositories, so every archive might be associated with multiple commits. One of these commits will be 'prevailing' (ie, newest) but that's not really the responsibility of the client to know (for git especially, the client might not be aware of the timestamp a commit is landed) What we're doing in this line is converting the uploaded commit data to commit objects, so we can use the commit uuid (which is basically timestamp + order, since in git, a patch series can be landed at exactly the same time). The only catch is that we might actually have to talk to the underlying SCM service to know when the commit was landed, which is what commit_controller.register(...) will do, if needed. >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:70 >> + self.URL + '/api/upload/archive', > > where is self.URL defined? > > Also, can make this overall URL a separate variable and re-use it. It's defined out parent class, FlaskTestCase. It's value is pretty configuration dependent, because we have to pick an open port to run on (if we're running a real web server)

Jonathan Bedard

Comment 5 2019-08-29 14:26:28 PDT

Comment on attachment 377175 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377175&action=review >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:104 >> + abort(400, description='No test suite specified') > > You can consider making it more descriptive by including some examples of test suites in the error message. The trouble is, nothing in the results database code doesn't actually know what those test suites are. That list is determined by which suites are reported.

Jonathan Bedard

Comment 6 2019-08-29 14:27:53 PDT

(In reply to Jonathan Bedard from comment #5) > Comment on attachment 377175 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=377175&action=review > > >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:104 > >> + abort(400, description='No test suite specified') > > > > You can consider making it more descriptive by including some examples of test suites in the error message. > > The trouble is, nothing in the results database code doesn't actually know > what those test suites are. That list is determined by which suites are > reported. Apparently I can't write. The code of the results database doesn't have a list of valid suites, that list is determined by uploaded data, so we can't give examples of valid suites when the user is uploading them.

Jonathan Bedard

Comment 7 2019-08-29 14:39:48 PDT

Comment on attachment 377175 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377175&action=review >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:86 >> + def test_upload(self, client, **kwargs): > > Please add test-cases for various failure cases as well, e.g.: 'No test suite specified' We actually do test downloading, line 113.

Jonathan Bedard

Comment 8 2019-08-29 14:45:46 PDT

Created attachment 377628 [details] Patch

EWS Watchlist

Comment 9 2019-08-29 17:18:05 PDT

Comment on attachment 377628 [details] Patch Attachment 377628 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/12982085 New failing tests: js/error-should-not-strong-reference-global-object.html

EWS Watchlist

Comment 10 2019-08-29 17:18:08 PDT

Created attachment 377657 [details] Archive of layout-test-results from ews210 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews210 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit

Jonathan Bedard

Comment 11 2019-08-30 07:51:25 PDT

Created attachment 377704 [details] Patch

Jonathan Bedard

Comment 12 2019-08-30 09:14:32 PDT

Created attachment 377717 [details] Patch

Aakash Jain

Comment 13 2019-08-30 09:44:00 PDT

Comment on attachment 377717 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377717&action=review > Tools/resultsdbpy/resultsdbpy/controller/api_routes.py:63 > + self.add_url_rule('/upload/archive', 'upload_archive', self.archive_controller.endpoint, methods=('GET', 'POST')) Is the URL/route to download is '/upload/archive'? That seems weird. > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:81 > + filename = f'{configuration}@{archive["uuid"]}'.replace(' ', '_').replace('.', '-') Would the @ character work fine on windows? Maybe it's better to user _ or - > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:99 > + abort(400, description='Meta-data to be json') Typo: missing "Expected". > Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:113 > + abort(400, description='Expected meta-data to be json') In this case, is this meta-data, or commits info inside meta-data, which is invalid json? Please ensure to keep this error message not identical to error message on line 99, so as to make it easier to debug these two scenarios. > Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:85 > + response = self.upload_file(client, f'{self.URL}/api/upload/archive', upload_meta, upload_content) '/api/upload/archive' is used multiple times in this class, it might be a good idea to make it a variable (probably class variable), and re-use that.

Jonathan Bedard

Comment 14 2019-08-30 10:03:58 PDT

Comment on attachment 377717 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377717&action=review >> Tools/resultsdbpy/resultsdbpy/controller/api_routes.py:63 >> + self.add_url_rule('/upload/archive', 'upload_archive', self.archive_controller.endpoint, methods=('GET', 'POST')) > > Is the URL/route to download is '/upload/archive'? That seems weird. Yes, but only because the endpoint isn't really intended for accessing the archive, it would just be silly to disallow downloading. Ultimately, the user would be expected to use the future /archive endpoint to access the contents of the archive >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:81 >> + filename = f'{configuration}@{archive["uuid"]}'.replace(' ', '_').replace('.', '-') > > Would the @ character work fine on windows? Maybe it's better to user _ or - Can't find anything that says it's disallowed: https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file >> Tools/resultsdbpy/resultsdbpy/controller/archive_controller.py:113 >> + abort(400, description='Expected meta-data to be json') > > In this case, is this meta-data, or commits info inside meta-data, which is invalid json? > > Please ensure to keep this error message not identical to error message on line 99, so as to make it easier to debug these two scenarios. It would be info inside of the meta-data, most likely. This is probably going to most often protect against the json.loads on line 111 failing, although, it's possible (but not expected) that something inside commit_controller.register could go wrong and raise a ValueError.

Jonathan Bedard

Comment 15 2019-08-30 10:28:27 PDT

Created attachment 377723 [details] Patch

Aakash Jain

Comment 16 2019-08-30 10:37:27 PDT

Comment on attachment 377723 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377723&action=review > Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:108 > + def test_invalid_metadata(self, client, **kwargs): Would be nice to have unit-tests to cover all the error messages, like 'No archive provided', 'Expected meta-data to be json', 'Expected commit meta-data to be json', 'Invalid configuration, error:', 'No archives matching the specified criteria'.

Jonathan Bedard

Comment 17 2019-08-30 12:30:28 PDT

Created attachment 377739 [details] Patch

Jonathan Bedard

Comment 18 2019-08-30 12:40:40 PDT

(In reply to Aakash Jain from comment #16) > Comment on attachment 377723 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=377723&action=review > > > Tools/resultsdbpy/resultsdbpy/controller/archive_controller_unittest.py:108 > > + def test_invalid_metadata(self, client, **kwargs): > > Would be nice to have unit-tests to cover all the error messages, like 'No > archive provided', 'Expected meta-data to be json', 'Expected commit > meta-data to be json', 'Invalid configuration, error:', 'No archives > matching the specified criteria'. 'All' isn't going to be practical, since some of the functions we're calling can generate error messages too (for example, a syntactically valid commit that does not exist will generate an error), but I tested the error messages that this patch adds, at least.

WebKit Commit Bot

Comment 19 2019-08-30 15:03:52 PDT

Comment on attachment 377739 [details] Patch Clearing flags on attachment: 377739 Committed r249348: <https://trac.webkit.org/changeset/249348>

WebKit Commit Bot

Comment 20 2019-08-30 15:03:54 PDT

All reviewed patches have been landed. Closing bug.

Radar WebKit Bug Importer

Comment 21 2019-08-30 15:04:22 PDT

<rdar://problem/54898162>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Jonathan Bedard

Reported

2019-08-23 16:22 PDT

Modified

2019-08-30 15:04 PDT History

CC List

5 users Show

URL

Keywords InRadar

Depends on

Blocks