Bug 185790 - bisect-builds --list not showing all builds
Summary: bisect-builds --list not showing all builds
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: Safari 11
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Lucas Forschler
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-05-18 15:34 PDT by Lucas Forschler
Modified: 2018-06-21 14:47 PDT (History)
4 users (show)

See Also:


Attachments
Teach bisect-builds to use the LastEvaluatedKey option. (3.32 KB, patch)
2018-06-21 13:14 PDT, Lucas Forschler
aakash_jain: review+
lforschler: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Lucas Forschler 2018-05-18 15:34:48 PDT
running this command does not get me all builds:
bisect-builds -p mac-highsierra --list

my initial research suggests we have reached the 1MB payload limit on amazon lambda functions.

This is visible when testing the API gateway function. The following is returned:

  "LastEvaluatedKey": {
      "revision": {
        "N": "231282"
      },
      "identifier": {
        "S": "mac-highsierra-x86_64-release"
      }
    },
    "ScannedCount": 5271

LastEvaluatedKey is used to signal when we have reached the limit. If there is an entry there, it means more rows exist. We need to query again, using the LastEvaluatedKey as our starting point.

Apparently, we have so many builds on this OS, that we need to investigate this.
Comment 2 Radar WebKit Bug Importer 2018-05-18 15:35:20 PDT
<rdar://problem/40380222>
Comment 3 Lucas Forschler 2018-05-18 15:51:16 PDT
diving at the raw json returned from the gateway, I'm also seeing this:

u'LastEvaluatedKey': {u'identifier': {u'S': u'mac-highsierra-x86_64-release'}, u'revision': {u'N': u'231282'}}, u'ScannedCount': 5271}}
Comment 4 Lucas Forschler 2018-05-18 15:52:25 PDT
I don't see a clear way to fix this in the AWS API Gateway. It's a dumb mapping that simply does the query and returns what it finds. It doesn't have any place to add logic to 'continue' the query from the LastEvaluatedKey. I will have to research this more.
Comment 5 Lucas Forschler 2018-05-18 15:58:32 PDT
I believe to fix this we will need to do the following:

Update the API Gateway to accept an optional "startRevision".

Teach the bisect-builds script to parse the json, and if it finds a non-null value for LastEvaluatedKey, then we need to send another request, using the LastEvaluatedKey value as the 'start revision' for the API.

Then accumulate results until we get them all.
Comment 6 Lucas Forschler 2018-05-18 16:20:52 PDT
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html
ExclusiveStartKey
The primary key of the first item that this operation will evaluate. Use the value that was returned for LastEvaluatedKey in the previous operation.

The data type for ExclusiveStartKey must be String, Number or Binary. No set data types are allowed.

In a parallel scan, a Scan request that includes ExclusiveStartKey must specify the same segment whose previous Scan returned the corresponding value of LastEvaluatedKey.

Type: String to AttributeValue object map

Key Length Constraints: Maximum length of 65535.

Required: No
Comment 7 Lucas Forschler 2018-06-21 13:14:33 PDT
Created attachment 343260 [details]
Teach bisect-builds to use the LastEvaluatedKey option.
Comment 8 Aakash Jain 2018-06-21 13:24:33 PDT
looks good to me.
Comment 9 Lucas Forschler 2018-06-21 13:59:06 PDT
Committed revision 233057.
Comment 10 Dean Johnson 2018-06-21 14:47:17 PDT
Comment on attachment 343260 [details]
Teach bisect-builds to use the LastEvaluatedKey option.

View in context: https://bugs.webkit.org/attachment.cgi?id=343260&action=review

> Tools/Scripts/bisect-builds:96
> +def get_api_url(options, LastEvaluatedKey=None):

Nit: Use snake_case instead of CamelCase on LastEvaluatedKey.

Also, it isn't immediately obvious what 'LastEvaluatedKey' is supposed to mean or be used for here -- can you add a comment or rename the variable to be more descriptive?

> Tools/Scripts/bisect-builds:236
> +    if 'LastEvaluatedKey' in data['revisions']:

Is it possible to need to run this multiple times with 'LastEvaluatedKey'? It looks like maybe this is doing something similar to pagination?

If so, we should do this in a `while` loop, or re-call fetch_revision_list with a new LastEvaluatedKey while more results need to be fetched.