Running this command does not return all builds:

    bisect-builds -p mac-highsierra --list

My initial research suggests we have reached a 1 MB payload limit. DynamoDB returns at most 1 MB of data per Scan or Query call, and the same truncation is visible when testing the API Gateway function. The following is returned:

    "LastEvaluatedKey": {
        "revision": { "N": "231282" },
        "identifier": { "S": "mac-highsierra-x86_64-release" }
    },
    "ScannedCount": 5271

LastEvaluatedKey signals that the limit was reached. If it is present, more rows exist, and we need to query again using LastEvaluatedKey as the starting point. Apparently we now have so many builds for this OS that a single response can no longer hold them all.
https://stackoverflow.com/questions/43410541/dynamodb-scan-query-does-not-return-all-the-data
<rdar://problem/40380222>
Looking at the raw JSON returned from the gateway, I'm also seeing this:

    u'LastEvaluatedKey': {u'identifier': {u'S': u'mac-highsierra-x86_64-release'}, u'revision': {u'N': u'231282'}}, u'ScannedCount': 5271}}
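For reference, the key above uses DynamoDB's typed attribute format ("N" for number, "S" for string). A minimal sketch of unwrapping it into plain Python values follows. `unwrap_key` is a hypothetical helper, not part of bisect-builds, and it assumes revisions are always integers.

```python
def unwrap_key(last_evaluated_key):
    """Convert a DynamoDB-typed key, e.g. {"revision": {"N": "231282"}}, to plain values."""
    plain = {}
    for name, typed_value in last_evaluated_key.items():
        # Each attribute is a one-entry dict mapping a type tag to a string value.
        (type_tag, raw), = typed_value.items()
        # Assumption: numeric attributes here are integer revisions.
        plain[name] = int(raw) if type_tag == "N" else raw
    return plain


example_key = {
    "identifier": {"S": "mac-highsierra-x86_64-release"},
    "revision": {"N": "231282"},
}
unwrapped = unwrap_key(example_key)
print(unwrapped)
```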
I don't see a clear way to fix this in the AWS API Gateway. It's a dumb mapping that simply does the query and returns what it finds. It doesn't have any place to add logic to 'continue' the query from the LastEvaluatedKey. I will have to research this more.
I believe to fix this we will need to do the following:

1. Update the API Gateway to accept an optional "startRevision".
2. Teach the bisect-builds script to parse the JSON and, if it finds a non-null value for LastEvaluatedKey, send another request using the LastEvaluatedKey value as the 'start revision' for the API.
3. Accumulate results until we have them all.
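The accumulation step could look roughly like the sketch below. It is not the actual patch: `fetch_all_revisions` and `fake_fetch_page` are hypothetical names, the "Items" field name is an assumption about the response shape, and the page contents are made-up stand-in data so the loop can run without the real gateway.

```python
def fetch_all_revisions(fetch_page):
    """Call fetch_page(start_key) repeatedly until no LastEvaluatedKey comes back."""
    items = []
    start_key = None
    while True:
        page = fetch_page(start_key)["revisions"]
        items.extend(page.get("Items", []))  # "Items" is an assumed field name
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return items


# Stand-in for the HTTP request: two canned pages keyed by start revision.
_FAKE_PAGES = {
    None: {"revisions": {
        "Items": [231280, 231281, 231282],
        "LastEvaluatedKey": {
            "revision": {"N": "231282"},
            "identifier": {"S": "mac-highsierra-x86_64-release"},
        },
    }},
    "231282": {"revisions": {"Items": [231283, 231284]}},
}


def fake_fetch_page(start_key):
    revision = start_key["revision"]["N"] if start_key else None
    return _FAKE_PAGES[revision]


all_items = fetch_all_revisions(fake_fetch_page)
print(all_items)
```

Passing the fetch function in as an argument keeps the loop independent of how the request is actually made, which also makes it easy to exercise without network access.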
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html

ExclusiveStartKey

The primary key of the first item that this operation will evaluate. Use the value that was returned for LastEvaluatedKey in the previous operation.

The data type for ExclusiveStartKey must be String, Number or Binary. No set data types are allowed.

In a parallel scan, a Scan request that includes ExclusiveStartKey must specify the same segment whose previous Scan returned the corresponding value of LastEvaluatedKey.

Type: String to AttributeValue object map
Key Length Constraints: Maximum length of 65535.
Required: No
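In other words, the LastEvaluatedKey from one response is passed back unmodified as ExclusiveStartKey in the next request. A minimal sketch of building the Scan parameters follows. `build_scan_request` and the table name "builds" are hypothetical; only the TableName and ExclusiveStartKey parameter names come from the DynamoDB API.

```python
def build_scan_request(table_name, last_evaluated_key=None):
    """Return Scan parameters, resuming after last_evaluated_key when one is given."""
    request = {"TableName": table_name}
    if last_evaluated_key is not None:
        # Passed back exactly as received, still in typed AttributeValue form.
        request["ExclusiveStartKey"] = last_evaluated_key
    return request


previous_key = {
    "revision": {"N": "231282"},
    "identifier": {"S": "mac-highsierra-x86_64-release"},
}
first_request = build_scan_request("builds")  # "builds" is a made-up table name
next_request = build_scan_request("builds", previous_key)
print(next_request)
```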
Created attachment 343260
Teach bisect-builds to use the LastEvaluatedKey option.
looks good to me.
Committed revision 233057.
Comment on attachment 343260
Teach bisect-builds to use the LastEvaluatedKey option.

View in context: https://bugs.webkit.org/attachment.cgi?id=343260&action=review

> Tools/Scripts/bisect-builds:96
> +def get_api_url(options, LastEvaluatedKey=None):

Nit: Use snake_case instead of CamelCase on LastEvaluatedKey. Also, it isn't immediately obvious what 'LastEvaluatedKey' is supposed to mean or be used for here -- can you add a comment or rename the variable to be more descriptive?

> Tools/Scripts/bisect-builds:236
> + if 'LastEvaluatedKey' in data['revisions']:

Could this need to run multiple times with 'LastEvaluatedKey'? It looks like this is doing something similar to pagination. If so, we should do this in a `while` loop, or re-call fetch_revision_list with a new LastEvaluatedKey while more results need to be fetched.
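One way the rename could look, as a rough sketch rather than the landed code: the endpoint URL below is a placeholder, `continuation_key` is just one possible name, and "startRevision" is the parameter name proposed earlier in this bug, which may not match what the gateway actually accepts.

```python
from types import SimpleNamespace
from urllib.parse import urlencode

# Placeholder endpoint; the real API URL is not shown in this bug.
API_ENDPOINT = "https://api.example.invalid/archives"


def get_api_url(options, continuation_key=None):
    """Build the request URL.

    continuation_key is the LastEvaluatedKey from the previous response;
    it is None for the first page.
    """
    url = "{}/{}".format(API_ENDPOINT, options.platform)
    if continuation_key:
        # Resume after the last revision covered by the previous response.
        url += "?" + urlencode({"startRevision": continuation_key["revision"]["N"]})
    return url


options = SimpleNamespace(platform="mac-highsierra")
first_url = get_api_url(options)
next_url = get_api_url(options, {"revision": {"N": "231282"}})
print(first_url)
print(next_url)
```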