Bug 191557 - Extend commits table to contain testability warning information.
Summary: Extend commits table to contain testability warning information.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: dewei_zhu
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-11-12 12:50 PST by dewei_zhu
Modified: 2019-01-18 20:33 PST (History)
2 users (show)

See Also:


Attachments
Patch (70.38 KB, patch)
2018-11-12 13:34 PST, dewei_zhu
no flags Details | Formatted Diff | Diff
Patch (60.42 KB, patch)
2018-12-13 17:05 PST, dewei_zhu
no flags Details | Formatted Diff | Diff
Patch (76.20 KB, patch)
2018-12-14 01:45 PST, dewei_zhu
no flags Details | Formatted Diff | Diff
Patch (76.28 KB, patch)
2018-12-17 15:41 PST, dewei_zhu
no flags Details | Formatted Diff | Diff
Patch (75.39 KB, patch)
2018-12-18 22:38 PST, dewei_zhu
no flags Details | Formatted Diff | Diff
Patch (74.10 KB, patch)
2018-12-20 01:59 PST, dewei_zhu
rniwa: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description dewei_zhu 2018-11-12 12:50:09 PST
Perf dashboard should show testability warning when scheduling an A/B task.
Comment 1 dewei_zhu 2018-11-12 13:34:34 PST
Created attachment 354587 [details]
Patch
Comment 2 Ryosuke Niwa 2018-11-12 23:32:53 PST
Comment on attachment 354587 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=354587&action=review

> Websites/perf.webkit.org/init-database.sql:104
> +    commit_testability_warning text DEFAULT NULL,

text is a heavy handed approach. varchar(256) should do it.

> Websites/perf.webkit.org/public/api/update-commit-testability.php:23
> +        if (!array_key_exists('testabilityWarning', $commit_info))
> +            exit_with_error('MissingTestabilityWarning', array('commit' => $commit_info));

Rather than adding a new API, why don't we just update /api/report-commits?

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:193
> +        this._renderRepositoryPanesLazily.evaluate(triggerable, error, this._selectedPlatform,this._repositoryGroupByConfiguration,

Nit: Need a space after ",".

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:305
> +

What's up with the insertion of a new line here?

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:310
> +        const currentComparison = this._commitSetMap['Comparison'];

Is this some kind of a bug fix? If so, can we add a test?
At minimum, please add a description as to what why we're changing this function.

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:312
> +        const sameBaselineConfig = currentBaseline == newBaseline || (currentBaseline && newBaseline && currentBaseline.equals(newBaseline));
> +        const sameComparisionConfig = currentComparison == newComparison || (currentComparison && newComparison && currentComparison.equals(newComparison));

Rather than repeating the same logic twice, why don't we create a lambda?
e.g. areCommitSetsEqual = (a, b) = a == b || (a && a.equals(b));

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:352
> +            this._fetchCommitByRevision(repository, revision);

We shouldn't be fetching revisions as a side effect of rendering them.
r- because of this. It should be done when a commit set is set or when the user starts mutating them.
Otherwise, we'd delay fetching of commit sets by up to 16ms while we wait for requestAnimationFrame.
In general, render() functions shouldn't have side-effects.

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:364
> +        if (!this._commitByRepositoryAndRevision.has(repository))
> +            this._commitByRepositoryAndRevision.set(repository, new Map);

This kind of cache should really exist as a static method on CommitLog instead.

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:370
> +        CommitLog.fetchForSingleRevision(repository, revision).then((commits) => {

For example, fetchForSingleRevision could have such a cache internally, or we can wrap it in a helper function.

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:495
> +            !commitsWithTestabilityWarning.length ? [] : element('tbody', {id: 'testability-warning-table'}, commitsWithTestabilityWarning.map((commit) =>
> +                element('tr', element('td', {colspan: 2}, `${commit.repository().name()} - ${commit.label()}: ${commit.testabilityWarning()}`))))];

This code is way too much to inline like this.
Why don't we add a new method which loops over repositories and generates this tbody instead?

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:528
> -        const input = element('input', {value: revision, oninput: () => {
> +        const input = element('input', {value: revision, onchange: () => {

Why is this change made?
change event wouldn't fire until the input element loses focus. That seems like a regression to me. 
Again, at minimum, this needs a change log description.

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:722
> +            .revision-table #testability-warning-table,
> +            .revision-table #testabilty-warning-table tr,
> +            .revision-table #testability-warning-table td {

Why do these selectors need .revision-table? We don't use testability-warning-table anywhere else.
Also, it's super weird to call a tbody testability-warning-*table*.
Furthermore, none of the CSS properties specified here are needed for the tbody itself.

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:725
> +                font-weight: inherit;

Why do we need this?

> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:727
> +                border: 0px;
> +                padding: 0px;

Or these?

> Websites/perf.webkit.org/public/v3/components/customizable-test-group-form.js:142
> +    _constructTestabilityWarningTable(commitSetMap)

This function doesn't return a table, it returns tbody. It should probably called like _constructTestabilityWarningRows.

> Websites/perf.webkit.org/public/v3/components/customizable-test-group-form.js:149
> +            commitSet.commits().forEach(async (commit) => await CommitLog.fetchForSingleRevision(commit.repository(), commit.revision()));

Again, constructing a view of something shouldn't randomly fetch things.
We need to be doing this right when we get the commit set.
r- because of this.

> Websites/perf.webkit.org/public/v3/components/customizable-test-group-form.js:151
> +            hasCommitWithWarning |= !!commitSet.commitsWithTestabilityWarnings();
> +        }

Once we've moved the above line out of here, we can do:
const hasCommitWithWarning = commitSets.some(() => !!commitSet.commitsWithTestabilityWarnings());

> Websites/perf.webkit.org/public/v3/components/customizable-test-group-form.js:155
> +
> +

Nit: Two blank lines.

> Websites/perf.webkit.org/public/v3/components/customizable-test-group-form.js:159
> +                element('tr', {class: 'testability-warning'}, `${commit.repository().name()} - ${commit.label()}: ${commit.testabilityWarning()}`));

Looks like this code to generate a human readable label is duplicated here and custom-analysis-task-configurator.js
Why don't we add a helper function to CommitLog to share code?

> Websites/perf.webkit.org/public/v3/components/customizable-test-group-form.js:284
> +                    () => this.enqueueToRender(),

Why is this needed? We need an inline change log description here.

> Websites/perf.webkit.org/public/v3/components/customizable-test-group-form.js:306
> +                    this.enqueueToRender();

And here.
Comment 3 dewei_zhu 2018-11-13 13:06:16 PST
Comment on attachment 354587 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=354587&action=review

>> Websites/perf.webkit.org/public/api/update-commit-testability.php:23
>> +            exit_with_error('MissingTestabilityWarning', array('commit' => $commit_info));
> 
> Rather than adding a new API, why don't we just update /api/report-commits?

The major difference is that report-commits does insert conditionally which will create a new entry. This API only focus on updating, we should not insert any entry to table in this API.
Adding if conditions in '/api/report-commits' would make the code a little bit complicated.
Comment 4 Ryosuke Niwa 2018-11-13 13:17:13 PST
(In reply to dewei_zhu from comment #3)
> Comment on attachment 354587 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=354587&action=review
> 
> >> Websites/perf.webkit.org/public/api/update-commit-testability.php:23
> >> +            exit_with_error('MissingTestabilityWarning', array('commit' => $commit_info));
> > 
> > Rather than adding a new API, why don't we just update /api/report-commits?
> 
> The major difference is that report-commits does insert conditionally which
> will create a new entry. This API only focus on updating, we should not
> insert any entry to table in this API.
> Adding if conditions in '/api/report-commits' would make the code a little
> bit complicated.

But fetchReportAndUpdateBuilds is literally calling report-commits and then update-commit-testability. Why can't we simply fold testability information into the first API call and call it done?
Comment 5 dewei_zhu 2018-11-14 19:08:18 PST
Comment on attachment 354587 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=354587&action=review

>>>> Websites/perf.webkit.org/public/api/update-commit-testability.php:23
>>>> +            exit_with_error('MissingTestabilityWarning', array('commit' => $commit_info));
>>> 
>>> Rather than adding a new API, why don't we just update /api/report-commits?
>> 
>> The major difference is that report-commits does insert conditionally which will create a new entry. This API only focus on updating, we should not insert any entry to table in this API.
>> Adding if conditions in '/api/report-commits' would make the code a little bit complicated.
> 
> But fetchReportAndUpdateBuilds is literally calling report-commits and then update-commit-testability. Why can't we simply fold testability information into the first API call and call it done?

I think the problem is to differentiate the intentions of updating testability warning message vs reporting new commits. The existing '/api/report-commits' does not prevent us reporting a new commit with testability warning message, in this case, this commit will be inserted into database. However, with same commit info that posted to the server, it's also a legitimate request to update existing commit, but in this case we should not insert commit if there is no existing commit.
Even though we have script doing the report then update, it would be better if we don't explicitly build this logic.

Another alternative I can think of is that, we make a single query to '/api/report-commits' which contains all  commits need to be inserted/updated. Those commits can be splitted into 3 types:
1. commit after last reported commit with owned commit info and testability warning message.
2. commit after last reported commit with owned commit info
3. commit after last reported commit without owned commit info
4. commit before last report commit with testability warning message

For the commit of type 4, we explicitly avoid fetching owned commit information on the script side (fetching those info can be pretty expensive) even those commit may have owned commit info.
What do you think?
Comment 6 dewei_zhu 2018-11-14 19:15:35 PST
Comment on attachment 354587 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=354587&action=review

>> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:352
>> +            this._fetchCommitByRevision(repository, revision);
> 
> We shouldn't be fetching revisions as a side effect of rendering them.
> r- because of this. It should be done when a commit set is set or when the user starts mutating them.
> Otherwise, we'd delay fetching of commit sets by up to 16ms while we wait for requestAnimationFrame.
> In general, render() functions shouldn't have side-effects.

Could you clarify why it is part of rendering?
'_computeCommitSet' is invoked by '_updateCommitSetMap'. The reason I chose to do the fetching here is this is the central place where all UI manipulation will finally impact '_commitSetMap'.
I think a better place is to do it in '_updateCommitSetMap' after we think either baseline or comparison has changed. How about this?
Comment 7 Ryosuke Niwa 2018-11-14 21:20:11 PST
(In reply to dewei_zhu from comment #5)
> Comment on attachment 354587 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=354587&action=review
> 
> >>>> Websites/perf.webkit.org/public/api/update-commit-testability.php:23
> >>>> +            exit_with_error('MissingTestabilityWarning', array('commit' => $commit_info));
> >>> 
> >>> Rather than adding a new API, why don't we just update /api/report-commits?
> >> 
> >> The major difference is that report-commits does insert conditionally which will create a new entry. This API only focus on updating, we should not insert any entry to table in this API.
> >> Adding if conditions in '/api/report-commits' would make the code a little bit complicated.
> > 
> > But fetchReportAndUpdateBuilds is literally calling report-commits and then update-commit-testability. Why can't we simply fold testability information into the first API call and call it done?
> 
> I think the problem is to differentiate the intentions of updating
> testability warning message vs reporting new commits. The existing
> '/api/report-commits' does not prevent us reporting a new commit with
> testability warning message, in this case, this commit will be inserted into
> database. However, with same commit info that posted to the server, it's
> also a legitimate request to update existing commit, but in this case we
> should not insert commit if there is no existing commit.
> Even though we have script doing the report then update, it would be better
> if we don't explicitly build this logic.

Why does that matter at all given we'd never specify a testability warning without having a corresponding OS version in the first place?

> Another alternative I can think of is that, we make a single query to
> '/api/report-commits' which contains all  commits need to be
> inserted/updated. Those commits can be splitted into 3 types:
> 1. commit after last reported commit with owned commit info and testability
> warning message.
> 2. commit after last reported commit with owned commit info
> 3. commit after last reported commit without owned commit info
> 4. commit before last report commit with testability warning message
> 
> For the commit of type 4, we explicitly avoid fetching owned commit
> information on the script side (fetching those info can be pretty expensive)
> even those commit may have owned commit info.
> What do you think?

One of the things we don't currently support in /api/report-commits is a partial update of a commit. We can totally support that.

This is a classic example of generalizing a problem making a solution a lot simpler. We just need to add a generic support for partially updating a commit, then there is no need to distinguish testability from the rest of fields of a commit.
Comment 8 Ryosuke Niwa 2018-11-14 21:42:23 PST
Comment on attachment 354587 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=354587&action=review

>>> Websites/perf.webkit.org/public/v3/components/custom-analysis-task-configurator.js:352
>>> +            this._fetchCommitByRevision(repository, revision);
>> 
>> We shouldn't be fetching revisions as a side effect of rendering them.
>> r- because of this. It should be done when a commit set is set or when the user starts mutating them.
>> Otherwise, we'd delay fetching of commit sets by up to 16ms while we wait for requestAnimationFrame.
>> In general, render() functions shouldn't have side-effects.
> 
> Could you clarify why it is part of rendering?
> '_computeCommitSet' is invoked by '_updateCommitSetMap'. The reason I chose to do the fetching here is this is the central place where all UI manipulation will finally impact '_commitSetMap'.
> I think a better place is to do it in '_updateCommitSetMap' after we think either baseline or comparison has changed. How about this?

Oh oops, this was misreading of the code. This code is totally fine.
Comment 9 dewei_zhu 2018-12-13 17:05:52 PST
Created attachment 357273 [details]
Patch
Comment 10 Ryosuke Niwa 2018-12-13 17:44:25 PST
Comment on attachment 357273 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=357273&action=review

> Websites/perf.webkit.org/public/api/report-commits.php:7
> +    list($db, $commits) = connect_database_and_sanitize_commits($post_data);

We shouldn't do two unrelated things at once like this. Keep new Database here and pass it to this function.
Also, instead of sanitize call it validate since you're not doing any kind of sanitization / normalization
in this function but rather checking that certain conditions are met.

> Websites/perf.webkit.org/public/api/update-commits.php:14
> +            $db->rollback_transaction();
> +            exit_with_error('InvalidRepository', array('commit' => $commit_info));

We shouldn't start a transaction until all these validations are done. r- because of this.

In general, we want to do all validations upfront as much as possible, and execute all queries at once without any checks.
Having an open transaction incurs a significant database-side cost.
e.g. see http://blog.lerner.co.il/in-postgresql-as-in-life-dont-wait-too-long-to-commit/

> Websites/perf.webkit.org/public/api/update-commits.php:19
> +        foreach($commit_info as $key => $value) {

I'd rather have a bunch of if's instead of looping over keys and checking conditions for each as in:
if ($commit_info['time'])
    $commit_update['time'] = $commit_info['time'];

> Websites/perf.webkit.org/public/api/update-commits.php:21
> +            switch ($key) {
> +                case "repository":

Nit: Wrong indentation. It should look like:
    switch (~) {
    case "~":
        ~
        break;
    }

> Websites/perf.webkit.org/public/include/json-header.php:208
> +    $db = new Database;
> +    if (!$db->connect())
> +        exit_with_error('DatabaseConnectionFailure');
> +

This shouldn't be done in a helper function like this.

> Websites/perf.webkit.org/public/include/json-header.php:211
> +    $report = json_decode($post_data, true);
> +
> +    verify_slave($db, $report);

Neither are these. With these code being baked in this function, 
we won't be able to use this validation logic in any other backend code which may want to validate commit information
which process other non-commit information, or API where commits appear within some other data structures.

> Websites/perf.webkit.org/public/include/json-header.php:232
> +    $committer_data = $committer_query;

The fact this is making a copy of $committer_query is rather subtle.
I think it's better we just constructed these arrays where we call update_or_insert_row instead (as in duplicate the code).

> Websites/perf.webkit.org/public/include/json-header.php:239
> +        $db->rollback_transaction();
> +        exit_with_error('FailedToInsertCommitter', array('committer' => $committer_data));

A helper function shouldn't be rolling back a transaction it didn't start like this.
The caller should be doing this work instead.

Alternatively, we could create a class which knows how to add or update a commit.
That class then start & manage its own transaction.

> Websites/perf.webkit.org/public/include/json-header.php:249
> +        $db->rollback_transaction();
> +        exit_with_error('FailedToFindPreviousCommit', array('commit' => $commit_info));

Ditto.

> Websites/perf.webkit.org/tools/js/os-build-fetcher.js:34
> +        const splittedArrayList = [];
> +        for (let startIndex = 0; startIndex < array.length; startIndex += maxCount)

Why is this function defined here? It's making the diff impossible to read.
Also, we probably don't need this function. We can simply keep calling
commitsToPost.splice(0, maxCount) until commitsToPost is empty.

> Websites/perf.webkit.org/tools/js/os-build-fetcher.js:53
> +        const repository = config['name'];

It's important to clarify whether this variable stores a Repository object, or its name.
So this rename is a regression as far as I'm concerned.

> Websites/perf.webkit.org/tools/js/os-build-fetcher.js:100
> +        const output = await this._subprocess.execute(command);
> +        return JSON.parse(output);

This change would make the syncing script to be incompatible with the old script.
We shouldn't make that kind of backwards incompatible change at the same modifying the database schema.
Make this work with both old & new scripts, or split this part into its own patch.
r- because of this.

In general, we should always make these changes backwards compatible so that
we can isolate the deployment of new version of dashboards from deployment of new helper scripts, etc...
Comment 11 dewei_zhu 2018-12-13 18:54:18 PST
Comment on attachment 357273 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=357273&action=review

>> Websites/perf.webkit.org/public/api/update-commits.php:19
>> +        foreach($commit_info as $key => $value) {
> 
> I'd rather have a bunch of if's instead of looping over keys and checking conditions for each as in:
> if ($commit_info['time'])
>     $commit_update['time'] = $commit_info['time'];

Changing from 'swtich' to 'if' conditions makes checking unrecognized keys a little bit trickier.
Comment 12 Ryosuke Niwa 2018-12-13 19:42:32 PST
(In reply to dewei_zhu from comment #11)
> Comment on attachment 357273 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=357273&action=review
> 
> >> Websites/perf.webkit.org/public/api/update-commits.php:19
> >> +        foreach($commit_info as $key => $value) {
> > 
> > I'd rather have a bunch of if's instead of looping over keys and checking conditions for each as in:
> > if ($commit_info['time'])
> >     $commit_update['time'] = $commit_info['time'];
> 
> Changing from 'swtich' to 'if' conditions makes checking unrecognized keys a
> little bit trickier.

Why would it? You just need to check that upfront. Also, I'm not certain checking for a key we don't care about is all that important...
Comment 13 dewei_zhu 2018-12-14 01:45:54 PST
Created attachment 357308 [details]
Patch
Comment 14 dewei_zhu 2018-12-17 15:41:53 PST
Created attachment 357486 [details]
Patch
Comment 15 dewei_zhu 2018-12-17 15:45:27 PST
Comment on attachment 357486 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=357486&action=review

> Websites/perf.webkit.org/public/api/update-commits.php:18
> +    $commit_modifier->update_commits($commits);

Since the only difference between update-commits vs report-commits API is above invocation, do you prefer the idea that we had a single api that decides to use `update_commits` vs `update_or_insert_commits` based on the existence of `update-only` parameter?
Comment 16 Ryosuke Niwa 2018-12-17 16:18:36 PST
Comment on attachment 357486 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=357486&action=review

> Websites/perf.webkit.org/init-database.sql:104
> +    commit_testability_warning varchar(64) DEFAULT NULL,

Now that I think about more, I don't think _warning adds any more semantics here.
I think we should just call it commit_testability.

>> Websites/perf.webkit.org/public/api/update-commits.php:18
>> +    $commit_modifier->update_commits($commits);
> 
> Since the only difference between update-commits vs report-commits API is above invocation, do you prefer the idea that we had a single api that decides to use `update_commits` vs `update_or_insert_commits` based on the existence of `update-only` parameter?

I think a better approach might be to migrate the existing clients of /api/report-commits to use /api/update-commits?insert=true or something.
Saying that we don't add new entries is less clear than saying we insert one if it doesn't already exist.
It's like doX is a better name than notDoY

> Websites/perf.webkit.org/public/include/commit-modifier.php:3
> +class CommitModifier {

I think the term we nominally use in perf dashboard is update, not modify
so we should probably call this class CommitUpdater.

> Websites/perf.webkit.org/public/include/commit-modifier.php:23
> +    function update_commits($commits)

We need to very clearly distinguish the commit rows / database compatible array of commits from the input to this function.
I think we normally call the latter *_info so I think we should call this $commit_info_list or something.

> Websites/perf.webkit.org/public/include/commit-modifier.php:40
> +        $this->reset();

Why do we need to do this? Nobody is calling this function twice is it?
If anything, we should just initialize all the instance variables in this function
instead of the constructor if there is any plan to do that in the future.
But since we don't do that now, we might as well as forget about this.

> Websites/perf.webkit.org/public/include/commit-modifier.php:43
> +    private function prepare_commit_update($commits)

I really don't like this kind of code structuring where we do prepare, etc...
it doesn't tell us at all what this function does especially if it has side effects.
We should call validate_commits commits in the caller,
and rename this function to something like updates_from_commit_info_list.

> Websites/perf.webkit.org/public/include/commit-modifier.php:45
> +        $this->validate_commits($commits);

Use reference?

> Websites/perf.webkit.org/public/include/commit-modifier.php:47
> +        foreach ($commits as $commit_info) {

Use & so that we use reference.

> Websites/perf.webkit.org/public/include/commit-modifier.php:49
> +            $commit_identifier = array('repository' => $repository_id, 'revision' => $commit_info['revision']);

It's very confusing to call this an identifier. This is really a query parameter.
Maybe $commit_select_query?

> Websites/perf.webkit.org/public/include/commit-modifier.php:76
> +            array_push($this->commit_identifier_and_update_list, array($commit_identifier, $commit_update));

I don't like the use of pair/list like this. In general, we prefer using named associated arrays.
We should just do: array('query' => ~, 'update' => ~)
Also, we should probably use call this $this->updates.

> Websites/perf.webkit.org/public/include/commit-modifier.php:80
> +    private function validate_commits($commits)

Since this function doesn't depend on any instance variable, it should be static.
(Note you call it with self::validate_commits once you made this function static).

Static functions are generally preferable to instance variables
since clarifies that it won't have side effects on instance variables.

> Websites/perf.webkit.org/public/include/commit-modifier.php:90
> +        }

It seems that we should also validate that previousCommit if set is an integer.

> Websites/perf.webkit.org/public/include/commit-modifier.php:114
> +            if ($rollback_on_error)
> +                $this->rollback_with_error('FailedToFindPreviousCommit', array('revision' => $revision));

It's super confusing that this function sometimes rolls back the transition and sometimes not.
I think we should just always do these resolutions upfront so that we don't have to do this.

> Websites/perf.webkit.org/public/include/commit-modifier.php:159
> +        $this->prepare_commit_update_or_insert($commits);

Why can't this function simply validate everything upfront?
Since we're doing this refactoring anyway, we might as well as do that.

> Websites/perf.webkit.org/public/include/commit-modifier.php:162
> +        foreach ($this->commit_data_list as list($commit_data, $owned_commit_list)) {

Ugh... we're gonna create a bunch of copies here.
If we switched to use named associative array, be sure to use a reference here as well.

> Websites/perf.webkit.org/public/include/commit-modifier.php:164
> +            foreach ($owned_commit_list as $owned_commit_data)

Ditto about using a reference.

> Websites/perf.webkit.org/public/include/commit-modifier.php:171
> +    private function prepare_commit_update_or_insert($commits)

Again, validation & resolution of various information should be done in two separate steps.

> Websites/perf.webkit.org/public/include/commit-modifier.php:178
> +                foreach($commit_info['ownedCommits'] as $owned_commit_repository_name => $owned_commit_info) {

Use a reference.

> Websites/perf.webkit.org/public/include/commit-modifier.php:179
> +                    $owned_commit_info['repository'] = $owned_commit_repository_name;

Why is this mutation needed?
It seems to me that passing this as an extra argument to prepare_single_commit_data
with a descriptively named argument would be cleaner.

> Websites/perf.webkit.org/public/include/commit-modifier.php:187
> +    private function prepare_single_commit_data($commit_info, $is_owned_commit)

Again, names such as "prepare" doesn't really tell us what it does.
Here, we're simply constructing the commit data from the commit info,
we should just call it commit_update_from_commit_info or something.

But how is this function different from what's in the loop of prepare_commit_update?
Can't we share some code here?

> Websites/perf.webkit.org/public/include/commit-modifier.php:238
> +    private function resolve_repository($repository_name, $owner_repository_id)
> +    {

We should just merge this function with find_top_level_repository, and do it upfront.
Comment 17 dewei_zhu 2018-12-18 22:38:22 PST
Created attachment 357656 [details]
Patch
Comment 18 Ryosuke Niwa 2018-12-19 20:30:21 PST
Comment on attachment 357656 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=357656&action=review

> Websites/perf.webkit.org/public/include/commit-updater.php:21
> +        $this->updates = array();

This should be a local variable in update_commits

> Websites/perf.webkit.org/public/include/commit-updater.php:38
> +        foreach ($this->updates as $update) {

We shouldn't make a copy here.

> Websites/perf.webkit.org/public/include/commit-updater.php:41
> +            if (array_key_exists('committer', $commit_update)) {

We should store committer in some other field than the update array which contains the content of the database query.

> Websites/perf.webkit.org/public/include/commit-updater.php:42
> +                $committer_id = $this->resolve_committer_by_repository_id_and_author_account($commit_select_query['repository'], $commit_update['committer']);

I think we can just call this function resolve_committer.

> Websites/perf.webkit.org/public/include/commit-updater.php:66
> +            if (array_key_exists('previousCommit', $commit_info))
> +                $commit_update['previous_commit'] = $this->resolve_previous_commit($repository_id, $commit_info['previousCommit']);

We should just resolve this later during update_commits / update_or_insert_commits
so that we can share more code between update case & insertion case.

> Websites/perf.webkit.org/public/include/commit-updater.php:75
> +        }

This function should return a new array.

> Websites/perf.webkit.org/public/include/commit-updater.php:78
> +    private function update_or_insert_commits($commit_info_list)

Why don't we put this function next to update_commits?

> Websites/perf.webkit.org/public/include/commit-updater.php:94
> +    private function construct_commit_data_list(&$commit_info_list)

It's unclear how this function is different from construct_commit_updates.
We should just merge them together into a single function.

> Websites/perf.webkit.org/public/include/commit-updater.php:108
> +    private function &construct_commit_data($commit_info, $owned_repository_name)

We shouldn't have a separate function for this. Just merge it with validate_and_construct_commit_data_from_commit_info.

> Websites/perf.webkit.org/public/include/commit-updater.php:152
> +    private function &validate_and_construct_commit_data_from_commit_info(&$commit_info, $owned_repository_name)

This should be a static function.

> Websites/perf.webkit.org/public/include/commit-updater.php:196
> +            $repository = $this->db->select_or_insert_row('repositories', 'repository', array('name' => $repository_name, 'owner' => $owner_repository_id), NULL, '*');

Why don't we just retrieve the repository ID instead?

> Websites/perf.webkit.org/public/include/commit-updater.php:200
> +        }
> +        else {

Nit: } else {
Comment 19 dewei_zhu 2018-12-20 01:59:39 PST
Created attachment 357803 [details]
Patch
Comment 20 Ryosuke Niwa 2018-12-20 14:59:02 PST
Comment on attachment 357803 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=357803&action=review

> Websites/perf.webkit.org/public/include/commit-updater.php:74
> +    private function &construct_update_data_list(&$commit_info_list, $should_insert)

We can probably just call this construct_update_list.

> Websites/perf.webkit.org/public/include/commit-updater.php:83
> +            $update_data =  array('commit' => $commit_data, 'repository' => $commit_info['repository']);

Nit: Two spaces after =.
I don't think we really need to call this update_data. Just call it update.
Also, use references?

> Websites/perf.webkit.org/public/include/commit-updater.php:95
> +                $this->exit_with_error('OwnedCommitsUpdateNotSupportedYet', array('commit' => $commit_info));

I think a better / less verbose error would be AttemptToUpdateOwnedCommits

> Websites/perf.webkit.org/public/include/commit-updater.php:101
> +                $update_data_list[] = $update_data;

I think we can just use array_push.

> Websites/perf.webkit.org/public/include/commit-updater.php:106
> +            foreach($commit_info['ownedCommits'] as $owned_repository_name => $owned_commit_info) {

Use references?

> Websites/perf.webkit.org/public/include/commit-updater.php:108
> +                $owned_commit_update_data = array('commit' => $owned_commit, 'repository' => $owned_repository_name);

Ditto.

> Websites/perf.webkit.org/public/include/commit-updater.php:111
> +                    $owned_commit_update_data['previous_commit'] = $owned_commit_info['previousCommit'];

Ditto.

> Websites/perf.webkit.org/public/include/commit-updater.php:114
> +                    $owned_commit_update_data['author'] = $owned_commit_info['author'];

Ditto.

> Websites/perf.webkit.org/public/include/commit-updater.php:115
> +                $owned_commit_update_data_list[] = $owned_commit_update_data;

Just use array_push

> Websites/perf.webkit.org/public/include/commit-updater.php:118
> +                $update_data['owned_commits'] = $owned_commit_update_data_list;

Use a reference.

> Websites/perf.webkit.org/public/include/commit-updater.php:120
> +            $update_data_list[] = $update_data;

Use array_push.

> Websites/perf.webkit.org/public/include/commit-updater.php:142
> +                exit_with_error('OwnedCommitShouldNotContainTimestamp', array('commit' => $commit_info));

I think it should be OwnedCommitShouldNot*Have*Timestamp.

> Websites/perf.webkit.org/public/include/commit-updater.php:147
> +        if ($should_insert)
> +            $commit_data['reported'] = true;

We should have a test to check that an API call with insert=false wouldn't set reported to true if there isn't one already.
Comment 21 dewei_zhu 2019-01-16 21:58:36 PST
Landed in r238329.
Comment 22 dewei_zhu 2019-01-16 21:58:58 PST
<rdar://problem/44369873>
Comment 23 dewei_zhu 2019-01-16 21:59:04 PST
rdar://problem/44369873