Bug 190188 - Add retry for test groups with failed build requests.
Summary: Add retry for test groups with failed build requests.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-10-01 23:52 PDT by dewei_zhu
Modified: 2018-10-08 11:40 PDT (History)
2 users (show)

See Also:


Attachments
Patch (51.25 KB, patch)
2018-10-02 00:56 PDT, dewei_zhu
no flags Details | Formatted Diff | Diff
Patch (63.64 KB, patch)
2018-10-03 00:49 PDT, dewei_zhu
no flags Details | Formatted Diff | Diff
Patch (67.79 KB, patch)
2018-10-03 20:56 PDT, dewei_zhu
rniwa: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description dewei_zhu 2018-10-01 23:52:24 PDT
Add retry for test groups with failed build requests.
Comment 1 dewei_zhu 2018-10-02 00:56:45 PDT
Created attachment 351353 [details]
Patch
Comment 2 Ryosuke Niwa 2018-10-02 01:34:24 PDT
Comment on attachment 351353 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=351353&action=review

> Websites/perf.webkit.org/ChangeLog:18
> +                ADD COLUMN testgroup_expected_successful_build_requests integer DEFAULT NULL,
> +                ADD COLUMN testgroup_may_need_additional_build_requests boolean DEFAULT FALSE;

These names are too long. How about just: testgroup_iteration_count & testgroup_may_need_more_requests.

> Websites/perf.webkit.org/public/api/build-requests.php:77
> +    if (count($test_groups_may_need_additional_build_requests))
> +        $db->query('UPDATE analysis_test_groups SET testgroup_may_need_additional_build_requests = TRUE WHERE testgroup_id = ANY($1)',
> +            array('{' . implode(',', array_keys($test_groups_may_need_additional_build_requests)) . '}'));

I don't think we need to coerce queries like this.
Just issue a SQL query per build request.
We can certainly have a map which stores the set of test groups we've already set this flag.

> Websites/perf.webkit.org/public/api/test-groups.php:25
> -    } elseif ($path[0] == 'ready-for-notification') {
> +    } elseif ($path[0] == 'ready-for-further-processing') {

I think we should just add a new API surface instead of replacing an existing one.

> Websites/perf.webkit.org/public/privileged-api/update-test-group.php:19
> +        if (array_key_exists('name', $data) || array_key_exists('hidden', $data) || array_key_exists('mayNeedAdditionalBuildRequests', $data))
> +            exit_with_error('TooManyArgumentsProvided');

Why don't we add a new dedicated API instead?
I think that'd be cleaner than re-using this API and having restrictions like these.

> Websites/perf.webkit.org/public/privileged-api/update-test-group.php:28
>          $values['hidden'] = Database::to_database_boolean($data['hidden']);
> +        $values['may_need_additional_build_requests'] = FALSE;

I think we probably want another check at where we create build requests
that we haven't recently canceled this test group.

> Websites/perf.webkit.org/public/privileged-api/update-test-group.php:58
> +        for ($i = 0; $i < $additional_build_request_count; $i++) {
> +            $processed_commit_set = array();

Why don't we split the part to find the list of commit sets in this group into a separate helper function?

> Websites/perf.webkit.org/public/v3/models/test-group.js:14
> +        this._mayNeedAdditionalBuildRequests = object.mayNeedAdditionalBuildRequests;
> +        this._expectedSuccessfulBuildRequests = object.expectedSuccessfulBuildRequests;

Let's also shorten these names.

> Websites/perf.webkit.org/public/v3/models/test-group.js:231
> +    async addBuildRequests(buildRequestCount)

Just call it "count".

> Websites/perf.webkit.org/public/v3/models/test-group.js:239
> +    async clearNeedAdditionalBuildRequestsFlag()

No need to say "flag".

> Websites/perf.webkit.org/server-tests/resources/mock-data.js:67
> +    addMockData: function (db, statusList, needs_notification=true)

Use camelCase.

> Websites/perf.webkit.org/server-tests/resources/mock-data.js:82
> +            db.insert('analysis_test_groups', {id: 600, task: 500, name: 'some test group', expected_successful_build_requests: 4, needs_notification}),

Ditto here and elsewhere in this test.

> Websites/perf.webkit.org/tools/run-analysis.js:19
> +        {name: '--max-retry-ratio', type: parseFloat, default: 2.5},

Maybe even 3 might be okay.

> Websites/perf.webkit.org/tools/run-analysis.js:67
> +async function processTestGroupsNeedAdditionalBuildRequests(testGroups, maximumRetryRatio)

I don't think we want to use vague terms like "processing".
Let's just say createAdditionalBuildRequestsForTestGroupsWithFailedRequests or something.
We can use a verbose name here.

> Websites/perf.webkit.org/tools/run-analysis.js:80
> +        let maxMissingSuccessfulBuildRequestCount = 0;
> +        for (const commitSet of this.requestedCommitSet()) {
> +            const buildRequests = testGroup.requestsForCommitSet(commitSet);
> +            maxMissingSuccessfulBuildRequestCount = Math.max(maxMissingSuccessfulBuildRequestCount, buildRequests.filter((buildRequest) => !buildRequest.hasCompleted()).length);
> +        }

I don't think this logic is quite right.
buildRequest.hasCompleted() returns false for pending & canceled requests.
r- because of this.

In fact, we probably want to make sure this code handles the case when some request are still pending, scheduled, or canceled.
Comment 3 dewei_zhu 2018-10-03 00:49:11 PDT
Created attachment 351487 [details]
Patch
Comment 4 Ryosuke Niwa 2018-10-03 01:37:56 PDT
Comment on attachment 351487 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=351487&action=review

> Websites/perf.webkit.org/public/api/build-requests.php:70
> +            $db->update_row('analysis_test_groups', 'testgroup', array('id' => $test_group_id), array('may_need_more_requests' => TRUE));
> +            $test_groups_may_need_more_requests[$request_row['request_group']] = TRUE;

Let's move this code out of the if-else to share the code between two cases.

> Websites/perf.webkit.org/public/api/build-requests.php:80
> +            if ($status != 'failed' || array_key_exists($test_group_id, $test_groups_may_need_more_requests))
> +                continue;

And keep this continue when $status != 'failed'

> Websites/perf.webkit.org/public/privileged-api/add-build-requests.php:10
> +    $test_group_id = array_get($data, 'group');

We need to bail out if the test group had been marked as "hidden" here.

> Websites/perf.webkit.org/public/privileged-api/add-build-requests.php:19
> +
> +
> +

Extra blank lines.

> Websites/perf.webkit.org/public/privileged-api/add-build-requests.php:23
> +    if ($current_order < 0)
> +        exit_with_error('NoBuildRequestOfTestType');

I think we should return NoTestingBuildRequests.

> Websites/perf.webkit.org/public/privileged-api/add-build-requests.php:29
> +    for ($i = 0; $i < $additional_build_request_count; $i++) {
> +        $processed_commit_set = array();
> +        foreach ($existing_build_requests as $build_request) {

I think we need to split this into two separate loops:
The first one which finds an ordered list of commit sets (1),
and the second loop which generates N build requests for each commit set we found in (1).
Otherwise, this algorithm is really O(N^2).

$commit_sets = array();
$processed_commit_set = array();
foreach ($existing_build_requests as $build_request) {
    if ($build_request['request_order'] < 0)
        continue;
    $requested_commit_set = $build_request['request_commit_set'];
    if (array_key_exists($requested_commit_set, $processed_commit_set))
        continue;
    $processed_commit_set[$requested_commit_set] = &$build_request;
    array_push($commit_sets, $requested_commit_set);
}
for ($i = 0; $i < $additional_build_request_count; $i++) {
    foreach ($commit_sets as $commit_set) {...}
}

> Websites/perf.webkit.org/public/v3/models/test-group.js:231
> +    async addExtraBuildRequests(count)

This should be called addMoreBuildRequests

> Websites/perf.webkit.org/public/v3/models/test-group.js:239
> +    async clearNeedAdditionalBuildRequests()

This should be called clearMayNeedMoreBuildRequests

> Websites/perf.webkit.org/public/v3/models/test-group.js:297
> +    static fetchAllMayNeedMoreRequests()

fetchAllThatMayNeedMoreRequests / fetchAllWhichMayNeedMoreRequests?

> Websites/perf.webkit.org/tools/js/retry-failed-build-requests.js:7
> +        if (testGroup.isHidden()) {

This early bail out is useless because we filter the results in API anyway.

Also we need to have an API test for this.

> Websites/perf.webkit.org/tools/js/retry-failed-build-requests.js:18
> +            hasCompletedBuildRequestForEachCommitSet &= (completedBuildRequestCount > 0);

I think this logic doesn't work when one side all failed.

> Websites/perf.webkit.org/tools/js/retry-failed-build-requests.js:29
> +        await testGroup.clearNeedAdditionalBuildRequests();

I think it's not sound to clear the flag in a separate API call when when we're creating more build requests.
In fact, clearing the flag may is also racy but let's not worry about that for now.
We should, however, clear the flag automatically in /add-build-requests API.

> Websites/perf.webkit.org/tools/run-analysis.js:20
> +        {name: '--max-retry-ratio', type: parseFloat, default: 3},

We should probably call this a factor since it's not really a ratio.

> Websites/perf.webkit.org/unit-tests/retry-failed-build-requests-tests.js:144
> +    it('should add 1 more build request when one failed build request found', async () => {

This is rather confusing. Just say "should add one more build request when one of the existing requests failed".

> Websites/perf.webkit.org/unit-tests/retry-failed-build-requests-tests.js:145
> +        const data = sampleTestGroup(false, 3, true, false, ["completed", "completed", "completed", "completed", "completed", "failed"]);

Please use a dictionary to pass options to sampleTestGroup.
e.g. sampleTestGroup({needsNotification: true, iterationCount: 3, mayNeedMoreRequests: true, hidden: false}, [~])

> Websites/perf.webkit.org/unit-tests/retry-failed-build-requests-tests.js:174
> +    it('should not schedule more build requests when no completed build request for a commit set', async () => {

should read "when all requests for a commit set had failed".

> Websites/perf.webkit.org/unit-tests/retry-failed-build-requests-tests.js:192
> +    it('should not schedule more build requests when no completed build request is hidden', async () => {

"should not schedule more build requests when the test group is hidden"?

> Websites/perf.webkit.org/unit-tests/retry-failed-build-requests-tests.js:203
> +    it('should not schedule more build request while exceeding maximum retry', async () => {

when* we've already hit the maximum retry count.
Comment 5 Ryosuke Niwa 2018-10-03 01:38:28 PDT
Comment on attachment 351487 [details]
Patch

r- because there are enough problems with this patch. Let's go through another iteration but it's getting close!
Comment 6 Ryosuke Niwa 2018-10-03 01:42:06 PDT
Comment on attachment 351487 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=351487&action=review

> Websites/perf.webkit.org/init-database.sql:284
> +    testgroup_iteration_count integer NOT NULL,

We should probably call this testgroup_initial_repetition_count or something
since the actual number of repetitions can be higher.
Comment 7 dewei_zhu 2018-10-03 15:59:03 PDT
Comment on attachment 351487 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=351487&action=review

>> Websites/perf.webkit.org/tools/js/retry-failed-build-requests.js:7
>> +        if (testGroup.isHidden()) {
> 
> This early bail out is useless because we filter the results in API anyway.
> 
> Also we need to have an API test for this.

Can we remove if condition that checks testGroup.mayNeedMoreRequests() as well as the API also did that?
More general question, should we make assumptions on the testGroups provided in the argument?
Comment 8 dewei_zhu 2018-10-03 20:56:45 PDT
Created attachment 351579 [details]
Patch
Comment 9 Ryosuke Niwa 2018-10-03 21:44:18 PDT
Comment on attachment 351579 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=351579&action=review

> Websites/perf.webkit.org/public/api/build-requests.php:64
> +            $test_group_id = $request_row['request_group'];

Do this outside of if-else.

> Websites/perf.webkit.org/public/api/build-requests.php:79
> +        $test_groups_may_need_more_requests[$request_row['request_group']] = TRUE;

Use $test_group_id once you've done that.

> Websites/perf.webkit.org/public/privileged-api/add-build-requests.php:10
> +    $additional_build_request_count = array_get($data, 'count');

Maybe we should call this addCount?

> Websites/perf.webkit.org/public/privileged-api/add-build-requests.php:45
> +        foreach($commit_sets as $commit_set) {

NIt: Need a space between foreach and (.

> Websites/perf.webkit.org/public/privileged-api/add-build-requests.php:51
> +            insert_build_request_for_configuration($db, $configuration, ++$current_order, $build_request['request_triggerable'],
> +                $build_request['request_platform'], $build_request['request_test'], $build_request['request_group']);

Just use $db->insert_row directly here. I don't think we get any benefit from using this helper function.

> Websites/perf.webkit.org/public/privileged-api/add-build-requests.php:54
> +    $db->update_row('analysis_test_groups', 'testgroup', array('id' => $test_group_id), array('may_need_more_requests' => FALSE));

Hm... on my second thought maybe it's better to clear this flag in a separate API call
so that we can allow addition of more build requests without clearing of this flag.

> Websites/perf.webkit.org/server-tests/api-test-groups.js:94
> +            assert.ok(!data['testGroups'][0].mayNeedMoreRequests);

Maybe use assert.equals to be more explicit about true/false?

> Websites/perf.webkit.org/server-tests/privileged-api-update-test-group-tests.js:208
> +    it('should be created with "may_need_more_requests" field defaults to false', async () => {

This should read: "should create a test group with"

> Websites/perf.webkit.org/server-tests/privileged-api-update-test-group-tests.js:212
> +        let result = await PrivilegedAPI.sendRequest('create-test-group',

Use const?

> Websites/perf.webkit.org/server-tests/privileged-api-update-test-group-tests.js:229
> +        let result = await PrivilegedAPI.sendRequest('create-test-group',

Use const?

> Websites/perf.webkit.org/server-tests/privileged-api-update-test-group-tests.js:252
> +    it('should clear "may_need_more_requests" while hiding test group', async () => {

while -> when, hiding *a* test group.

> Websites/perf.webkit.org/server-tests/privileged-api-update-test-group-tests.js:256
> +        let result = await PrivilegedAPI.sendRequest('create-test-group',

Use const?

> Websites/perf.webkit.org/tools/js/retry-failed-build-requests.js:16
> +            const buildRequests = testGroup.requestsForCommitSet(commitSet);

We should probably filter out non-testing build requests here.

> Websites/perf.webkit.org/tools/js/retry-failed-build-requests.js:19
> +            hasCompletedBuildRequestForEachCommitSet &= (completedBuildRequestCount > 0);

This math is rather confusing. Just do something like:
if (!completedBuildRequestCount)
    hasCompletedBuildRequestForEachCommitSet = false;

> Websites/perf.webkit.org/tools/js/retry-failed-build-requests.js:21
> +            maxMissingBuildRequestCount = Math.max(maxMissingBuildRequestCount, testGroup.initialRepetitionCount() - completedBuildRequestCount - unfinishedBuildRequestCount);

Let's add an assertion that: unfinishedBuildRequestCount <= testGroup.initialRepetitionCount()
and declare a local variable like: potentiallySuccessfulCount = completedBuildRequestCount + unfinishedBuildRequestCount;

> Websites/perf.webkit.org/tools/js/retry-failed-build-requests.js:22
> +            hasUnfinishedBuildRequest = hasUnfinishedBuildRequest || (unfinishedBuildRequestCount > 0);

Ditto about using if.

> Websites/perf.webkit.org/tools/js/retry-failed-build-requests.js:24
> +        const willExceedMaximumRetry = (testGroup.repetitionCount() + maxMissingBuildRequestCount) > maximumRetryFactor * testGroup.initialRepetitionCount();

No need for parenthesis.

> Websites/perf.webkit.org/tools/js/retry-failed-build-requests.js:27
> +        if (maxMissingBuildRequestCount && !willExceedMaximumRetry && hasUnfinishedBuildRequest && !hasCompletedBuildRequestForEachCommitSet)
> +            continue;

I think a cleaner way to do this might be that:
if (!hasCompletedBuildRequestForEachCommitSet) {
    if (maxMissingBuildRequestCount && hasUnfinishedBuildRequest)
        await testGroup.clearMayNeedMoreBuildRequests();
    continue;
}

> Websites/perf.webkit.org/tools/js/retry-failed-build-requests.js:29
> +        if (maxMissingBuildRequestCount && !willExceedMaximumRetry && hasCompletedBuildRequestForEachCommitSet)

Then we don't have to check hasCompletedBuildRequestForEachCommitSet here.

> Websites/perf.webkit.org/unit-tests/retry-failed-build-requests-tests.js:150
> +        const testGroupConfig = {needsNotification:false, initialRepetitionCount: 3, mayNeedMoreRequests: true, hidden: false,

Nit: Need a space between : and false, and ditto elsewhere in this file.
Comment 10 dewei_zhu 2018-10-04 18:32:11 PDT
Landed in r236861.
Comment 11 dewei_zhu 2018-10-08 11:40:41 PDT
rdar://problem/34471207