Bug 184419

Summary: Write a script that detects chart changes by using v3 API.
Product: WebKit
Reporter: dewei_zhu
Component: New Bugs
Assignee: Nobody <webkit-unassigned>
Status: NEW
Severity: Normal
CC: dewei_zhu, rniwa
Priority: P2
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Bug Depends on: 184766, 184958, 185137
Bug Blocks:
Attachments:
Patch (rniwa: review+)

Description dewei_zhu 2018-04-09 09:32:55 PDT
Write a script that detects chart changes by using v3 API.
Comment 1 dewei_zhu 2018-04-09 09:38:20 PDT
Created attachment 337503 [details]
Comment 2 Ryosuke Niwa 2018-04-13 00:49:54 PDT
Comment on attachment 337503 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=337503&action=review

> Websites/perf.webkit.org/public/v3/models/measurement-set.js:293
> +        const asyncTaskIsAvailable = typeof module == 'undefined' && typeof window == 'undefined';

This should be a helper function / static variable on AsyncTask itself.
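A minimal sketch of what hoisting the check onto AsyncTask might look like (the method name isAvailable is an assumption; later comments in this bug refine the body of the check):

```javascript
class AsyncTask {
    // The environment check from the patch, moved onto AsyncTask as a
    // static helper so callers don't repeat it.
    static isAvailable() {
        return typeof module == 'undefined' && typeof window == 'undefined';
    }
}
```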

> Websites/perf.webkit.org/tools/run-analysis.js:2
> +'use strict';

No need to use strict mode.

> Websites/perf.webkit.org/tools/run-analysis.js:26
> +    const configs = JSON.parse(fs.readFileSync(options['--change-detection-config-json'], 'utf-8'));

This isn't really a list/set of configurations so it should be called changeDetectionConfig instead.

> Websites/perf.webkit.org/tools/run-analysis.js:40
> +    constructor(configs, remoteAPI, logger)

Ditto about calling the first argument simply config or changeDetectionConfig.

> Websites/perf.webkit.org/tools/run-analysis.js:42
> +        this._configs = configs;

We should be parsing & validating the parsed config here.

> Websites/perf.webkit.org/tools/run-analysis.js:53
> +        try {
> +            manifest = await Manifest.fetch();
> +        } catch (reason) {

try-catch should be happening in the loop.
In general, we should almost never use try-catch in our codebase.
They hide errors & make debugging much harder.
r- because of this.

> Websites/perf.webkit.org/tools/run-analysis.js:62
> +            } catch(error) {
> +                this._logger.log(`Failed analyzing "${metric.fullName()}" on "${platform.name()}" due to ${error}`);
> +            }

Ditto. We shouldn't be try-catching it here.
Let it propagate all the way up to loop instead.

> Websites/perf.webkit.org/tools/run-analysis.js:81
> +                        configurations.push([platform, metric]);

We should assert that we found valid platform & metric.

> Websites/perf.webkit.org/tools/run-analysis.js:103
> +        if (!segmentedValues || !segmentedValues.length)
> +            return;

We should probably log something here.

> Websites/perf.webkit.org/tools/run-analysis.js:106
> +        const ranges = Statistics.findRangesForChangeDetectionsWithWelchsTTest(rawValues, segmentedValues, 0.99).map((range) => {
> +            return {

Just use the (range) => ({...}) trick.
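For reference, the trick in question: wrapping the object literal in parentheses lets a concise-body arrow function return it directly, without a block body and explicit return.

```javascript
// Verbose form: block body with an explicit return statement.
const verbose = (range) => {
    return {startIndex: range[0], endIndex: range[1]};
};

// Concise form: parentheses keep the braces from being parsed as a
// block body, so the object literal is returned directly.
const concise = (range) => ({startIndex: range[0], endIndex: range[1]});

console.log(JSON.stringify(concise([2, 5])));
// → {"startIndex":2,"endIndex":5}
```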

> Websites/perf.webkit.org/tools/run-analysis.js:117
> +        const allAnalysisTasks = await AnalysisTask.fetchByPlatformAndMetric(platform.id(), metric.id());
> +        const analysisTasks = allAnalysisTasks.filter((task) => task.startTime() && task.endTime());

I'm pretty sure all analysis tasks fetched by AnalysisTask.fetchByPlatformAndMetric are guaranteed to have startTime & endTime.

> Websites/perf.webkit.org/tools/run-analysis.js:125
> +            for (const range of ranges) {

This is O(N^2)... it probably doesn't matter though.

> Websites/perf.webkit.org/tools/run-analysis.js:130
> +                    this._logger.info(`Found overlapped range: ${range.description} with task id: ${task.id()}`);

This is going to be really spammy because we'll be finding the same range over & over once we've created an analysis task.

> Websites/perf.webkit.org/tools/run-analysis.js:137
> +        const filteredRanges = ranges.filter((range) => range.endPoint.time >= startTime && !range.overlappingAnalysisTasks.length)

What's the point of range.endPoint.time >= startTime?

> Websites/perf.webkit.org/tools/run-analysis.js:141
> +        let summary, range = null;
> +        for (range of filteredRanges) {

We're finding the last range??
Why don't we find the range with the biggest change instead?
I'm pretty sure that's what I had in v2.

> Websites/perf.webkit.org/tools/run-analysis.js:158
> +        const content = await AnalysisTask.create(summary, range.startPoint.id, range.endPoint.id);
> +        const analysisTask = await AnalysisTask.fetchById(content['taskId']);
> +        const startCommitSet = range.startPoint.commitSet();
> +        const endCommitSet = range.endPoint.commitSet();
> +        await TestGroup.createAndRefetchTestGroups(analysisTask, 'Confirm', this._configs.confirmTaskRepeatCount, [startCommitSet, endCommitSet]);

This will create a dangling analysis task if the creation of an analysis task succeeds but the creation of a test group fails.
Please use /privileged-api/create-test-group to create both of them at once.
Comment 3 dewei_zhu 2018-04-20 15:32:21 PDT
Comment on attachment 337503 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=337503&action=review

>> Websites/perf.webkit.org/tools/run-analysis.js:137
>> +        const filteredRanges = ranges.filter((range) => range.endPoint.time >= startTime && !range.overlappingAnalysisTasks.length)
> What's the point of range.endPoint.time >= startTime?

I translated the logic from detect-change-script. I think what this is trying to do is to make sure at least part of the range is inside the time range specified by the configuration.

>> Websites/perf.webkit.org/tools/run-analysis.js:141
>> +        for (range of filteredRanges) {
> We're finding the last range??
> Why don't we find the range with the biggest change instead?
> I'm pretty sure that's what I had in v2.

Actually, we don't. But that sounds like a better strategy.
Comment 4 dewei_zhu 2018-04-26 23:51:08 PDT
Created attachment 338979 [details]
Comment 5 Ryosuke Niwa 2018-04-29 17:23:51 PDT
Comment on attachment 338979 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=338979&action=review

r- because there is a significant amount of restructuring of the code we need to do here.
Also r- because we definitely need tests for this new script.

> Websites/perf.webkit.org/public/v3/async-task.js:34
> +        return typeof module == 'undefined' && typeof window == 'undefined';

This doesn't make sense. "window" is always defined inside a browser outside a WebWorker.
r- because of this bug. Also, we clearly need a browser test to make sure this function returns true inside a browser.

> Websites/perf.webkit.org/tools/run-analysis.js:24
> +async function analysisLoop(options)

This function needs to have try-catch to make sure the script doesn't stop running after encountering the first exception.
But we shouldn't have try-catch anywhere else in the codebase.
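A sketch of the shape being asked for (function and parameter names are assumptions): the loop owns the only try-catch, so a failure in one iteration is logged and the next iteration still runs, while errors everywhere else propagate up.

```javascript
// Runs one analysis pass, converting any exception into a log entry so
// the surrounding loop keeps going. This is the only try-catch; errors
// raised deeper in the analysis propagate up to here.
async function runIterationSafely(analyzeOnce, logger) {
    try {
        await analyzeOnce();
    } catch (error) {
        logger.error(`Failed analysis iteration: ${error}`);
    }
}
```

The outer analysisLoop would then just call runIterationSafely, sleep, and repeat.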

> Websites/perf.webkit.org/tools/run-analysis.js:41
> +class Analyzer {

This is an extremely generic name... It doesn't even tell us what it analyzes.
How about MeasurementSetAnalyzer?

> Websites/perf.webkit.org/tools/run-analysis.js:45
> +        this._testingConfigurations = Analyzer.configurationsForTesting(manifest);

If we're exposing Analyzer.configurationsForTesting as a static method,
it would be cleaner for the caller (analysisLoop) to call this method,
and then Analyzer to simply take the result.
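A sketch of that restructuring (the manifest shape and the enabled flag here are stand-ins for illustration, not the real manifest format): the static factory derives the list, and the constructor just stores what it is given.

```javascript
class MeasurementSetAnalyzer {
    // Hypothetical static factory: derives the configurations to analyze
    // from the manifest, so construction itself stays side-effect free.
    static configurationsForTesting(manifest) {
        return manifest.testConfigurations.filter((configuration) => configuration.enabled);
    }

    // The caller (analysisLoop) invokes the factory and passes the result.
    constructor(configurations) {
        this._configurations = configurations;
    }
}

const manifest = {testConfigurations: [
    {name: 'speedometer', enabled: true},
    {name: 'legacy-suite', enabled: false},
]};
const analyzer = new MeasurementSetAnalyzer(MeasurementSetAnalyzer.configurationsForTesting(manifest));
console.log(analyzer._configurations.length); // → 1
```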

> Websites/perf.webkit.org/tools/run-analysis.js:52
> +        this._logger.info("Start detecting.");

"Start detecting" is probably not the most descriptive log.
It's probably better to say "starting analysis" since we're gonna add more analysis features to this script.

> Websites/perf.webkit.org/tools/run-analysis.js:57
> +    static configurationsForTesting(manifest)

Please add a FIXME here to share this code with DashboardPage.

> Websites/perf.webkit.org/tools/run-analysis.js:82
> +    async analyzeTestConfiguration(platform, metric)

This function should probably take a measurement set instead and be renamed to analyzeMeasurementSet.

> Websites/perf.webkit.org/tools/run-analysis.js:84
> +        const measurementSet = MeasurementSet.findSet(platform.id(), metric.id(), platform.lastModified(metric));

Why isn't this done during configurationsForTesting?

> Websites/perf.webkit.org/tools/run-analysis.js:85
> +        this._logger.info(`\n==== Analyzing the last ${this._changeDetectionConfigs.maxDays} days: "${metric.fullName()}" on "${platform.name()}" ====\n`);

It's kind of silly to log the number of days we're analyzing since that's identical for all platforms & metrics.
Why don't we spit this out when we log "start detecting"?

> Websites/perf.webkit.org/tools/run-analysis.js:86
> +        const endTime = Date.now();

This end time would be different for each platform/metric we fetch.
I think we should compute this once when we start the analysis so that they're consistent across platforms & metrics.

> Websites/perf.webkit.org/tools/run-analysis.js:93
> +        let segmentedValues = await measurementSet.fetchSegmentation('segmentTimeSeriesByMaximizingSchwarzCriterion', [], 'current', false);

Use const.

> Websites/perf.webkit.org/tools/run-analysis.js:100
> +        const ranges = Statistics.findRangesForChangeDetectionsWithWelchsTTest(rawValues, segmentedValues, 0.99).map((range) =>({

Either omit 0.99 or make it configurable.

> Websites/perf.webkit.org/tools/run-analysis.js:118
> +                const disjoint = range.endPoint.seriesIndex < taskStartPoint.seriesIndex

Nit: isDisjoint, but disjoint with respect to what?
The name should say what the range is disjoint from.

> Websites/perf.webkit.org/tools/run-analysis.js:122
> +                if (!disjoint)
> +                    range.overlappingAnalysisTasks.push(task);

This is silly. We're collecting the list of overlapping analysis tasks just to filter ranges in the next statement.
We should simply invert the nesting of these two loops, and omit ranges right away.
Also, we don't have to find starting & ending points to check the overlapped-ness;
simply call startTime() / endTime() on the task itself. i.e.
ranges.filter((range) => {
    let hasOverlappingTask = false;
    for (const task of analysisTasks) {
        const taskEndsBeforeRangeStart = task.endTime() < range.startPoint.time;
        const taskStartsAfterRangeEnd = range.endPoint.time < task.startTime();
        if (!(taskEndsBeforeRangeStart || taskStartsAfterRangeEnd))
            hasOverlappingTask = true;
    }
    return !hasOverlappingTask;
});

> Websites/perf.webkit.org/tools/run-analysis.js:128
> +        const filteredRanges = ranges.filter((range) => range.endPoint.time >= startTime && !range.overlappingAnalysisTasks.length)
> +            .sort((a, b) => a.endPoint.time - b.endPoint.time);

Sounds like we should do this filtering before filtering ranges based on overlapping tasks since the latter computation is a lot more expensive.
You can merge that to the above filter code I wrote.
Also, there is no point in sorting them based on start/end time since we're picking the one with the largest change in the next loop.

> Websites/perf.webkit.org/tools/run-analysis.js:130
> +        let summary, rangeWithMostSignificantChange = null, valueChangeSummaryForRangeWithMostSignificantChange = null;

Nit: Don't declare multiple variables in a single statement like this.

> Websites/perf.webkit.org/tools/run-analysis.js:143
> +            const currentChangeIsMoreSignificant = !rangeWithMostSignificantChange
> +                || (valueChangeSummaryForRangeWithMostSignificantChange.changeType === valueChangeSummary.changeType
> +                    && Math.abs(valueChangeSummaryForRangeWithMostSignificantChange.relativeChange) < Math.abs(valueChangeSummary.relativeChange))
> +                || (valueChangeSummaryForRangeWithMostSignificantChange.changeType === progressionString
> +                    && valueChangeSummary.changeType === regressionString);

This is a very complicated logic.
1. We can instead remember whether there was any regression in the previous filtering step,
2. ignore all progressions if there were regressions, and
3. sort them by diff.

But really, I'm not certain if we want to always prioritize regressions over progressions.
e.g. if we had 50% progression and 0.5% regression, we probably want to figure out where the progression occurred
since that probably means the test is broken somehow.
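One way to read steps 1-3 as code (a sketch only, given the caveat above about large progressions; the valueChangeSummary shape follows the quoted patch):

```javascript
// Prefer regressions when any exist, then pick the candidate with the
// largest absolute relative change. No sorting needed: a single pass
// over the candidates suffices.
function mostSignificantRange(ranges, regressionString) {
    const regressions = ranges.filter((range) => range.valueChangeSummary.changeType === regressionString);
    const candidates = regressions.length ? regressions : ranges;
    let mostSignificant = null;
    for (const range of candidates) {
        if (!mostSignificant || Math.abs(range.valueChangeSummary.relativeChange)
                > Math.abs(mostSignificant.valueChangeSummary.relativeChange))
            mostSignificant = range;
    }
    return mostSignificant;
}
```

Note this picks the 0.5% regression over a 50% progression, which is exactly the behavior the last paragraph questions.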

> Websites/perf.webkit.org/unit-tests/measurement-set-tests.js:44
> -            assert.equal(requests[0].url, '../data/measurement-set-1-1.json');
> +            assert.equal(requests[0].url, '/data/measurement-set-1-1.json');

Can we make this refactoring in a separate patch?
Comment 6 dewei_zhu 2018-04-30 17:29:22 PDT
Created attachment 339165 [details]
Comment 7 Ryosuke Niwa 2018-04-30 21:07:13 PDT
Comment on attachment 339165 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=339165&action=review

We still need tests.

> Websites/perf.webkit.org/public/v3/async-task.js:34
> +        return typeof module === 'undefined';

We should just check typeof Worker directly.
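That check would look something like this (a sketch of the suggestion, not the final patch):

```javascript
class AsyncTask {
    // Worker is defined in browsers (and inside WebWorkers themselves)
    // but is not a global in Node.js, so its presence directly answers
    // whether AsyncTask can spawn a worker.
    static isAvailable() {
        return typeof Worker !== 'undefined';
    }
}
```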

> Websites/perf.webkit.org/public/v3/models/bug.js:33
> +
> +

Reduce this to one blank line?

> Websites/perf.webkit.org/tools/run-analysis.js:32
> +        global.RemoteAPI = new RemoteAPI(serverConfig.server);

I don't think this makes sense. We should just construct a local instance.

> Websites/perf.webkit.org/tools/run-analysis.js:43
> +    analyzer.showAnalysisStatistics();

I don't think we need this logging because we're logging whenever we analyze a new configuration.

> Websites/perf.webkit.org/tools/run-analysis.js:105
> +    async analyzeMeasurementSet(platform, metric)

I think it's still better for this function to take in measurementSet.
Finding platform & metric is cheap enough.
We should just assert that they're not null right away.

> Websites/perf.webkit.org/tools/run-analysis.js:108
> +        this._logger.info(`\n==== "${metric.fullName()}" on "${platform.name()}" ====\n`);

I don't think we need to create blank lines before/after this.

> Websites/perf.webkit.org/tools/run-analysis.js:128
> +                description: `series index range: [${range.startIndex} - ${range.endIndex}]`

Remove this.

> Websites/perf.webkit.org/tools/run-analysis.js:147
> +        }).sort((a, b) => {
> +            if (a.valueChangeSummary.changeType === b.valueChangeSummary.changeType)
> +                return Math.abs(a.valueChangeSummary.relativeChange) - Math.abs(b.valueChangeSummary.relativeChange);
> +            return a.valueChangeSummary.changeType === progressionString ? -1 : 1;
> +        });

I don't think we should be sorting ranges like this.
Just find the one with the biggest value instead.

> Websites/perf.webkit.org/tools/run-analysis.js:157
> +            this._logger.log('Detected:', summary);

I don't think we should be making logging like this.
Comment 8 dewei_zhu 2018-05-01 13:47:42 PDT
Comment on attachment 339165 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=339165&action=review

>> Websites/perf.webkit.org/tools/run-analysis.js:32
>> +        global.RemoteAPI = new RemoteAPI(serverConfig.server);
> I don't think this makes sense. We should just construct a local instance.

We need to change the global object because this remote API is used in multiple places, e.g. Manifest.fetch. We may be able to find a way to do this without overwriting the global object.
Comment 9 dewei_zhu 2018-05-01 19:01:17 PDT
Created attachment 339255 [details]
Comment 10 Ryosuke Niwa 2018-05-01 20:00:56 PDT
Comment on attachment 339255 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=339255&action=review

> Websites/perf.webkit.org/browser-tests/time-series-chart-tests.js:6
> +        expect(context.symbols.AsyncTask.isAvailable()).to.be(true);

This has nothing to do with time-series-chart. Please create a new test file instead.

> Websites/perf.webkit.org/tools/run-analysis.js:49
> +class MeasurementSetAnalyzer {

This class should be moved to its own file inside tools/js/

> Websites/perf.webkit.org/tools/run-analysis.js:52
> +        this._changeDetectionConfigs = changeDetectionConfigs;

We shouldn't be storing the JSON we read off of a file like this.
This would make it a lot harder to figure out what you can and can't specify in the configuration.
We should instead make MeasurementSetAnalyzer take each configuration as an argument,
and store each configuration value separately.
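A sketch of that suggestion (parameter names are assumptions drawn from values mentioned elsewhere in this review): each configuration value becomes an explicit, validated constructor argument instead of an opaque JSON blob.

```javascript
class MeasurementSetAnalyzer {
    // Taking each value separately documents, at the constructor, exactly
    // what the configuration file may specify, and lets us validate early.
    constructor(measurementSetList, maxDays, logger) {
        if (typeof maxDays !== 'number' || maxDays <= 0)
            throw new Error(`Invalid maxDays: ${maxDays}`);
        this._measurementSetList = measurementSetList;
        this._maxDays = maxDays;
        this._logger = logger;
    }
}
```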

> Websites/perf.webkit.org/tools/run-analysis.js:103
> +        const rawValues = currentTimeSeries.values();

We should exit early without any error or warning when rawValues is empty.

> Websites/perf.webkit.org/tools/run-analysis.js:106
> +            this._logger.warn(`Failed fetching segmentations for ${metric.fullName()}" on "${platform.name()}"`);

Missing " after "for" and before the full metric name.

> Websites/perf.webkit.org/tools/run-analysis.js:113
> +            this._changeDetectionConfigs.tTestSignificance).map((range) =>({

I really don't think it makes sense to make this configurable.
We almost never used any other value but 0.99.
We can make it configurable in the future when we need to.
Just omit the argument for now.

> Websites/perf.webkit.org/tools/run-analysis.js:136
> +        filteredRanges.forEach((range) => {

I think for (const range of filteredRanges) would probably read better
since the block here doesn't involve return, etc...

> Websites/perf.webkit.org/tools/run-analysis.js:140
> +            // Take a square root on progression relative value, so that we would not favor regression so
> +            // much that we miss some huge progressions.

Instead of this comment, just give it a more descriptive name.

> Websites/perf.webkit.org/tools/run-analysis.js:141
> +            const weightedSignificance = range.valueChangeSummary.changeType === regressionString ?

e.g. weightFavoringRegression.
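The renamed helper might look like this (a sketch based on the square-root comment being replaced): regressions count at full weight, progressions at the square root of their relative change, so a huge progression can still outrank a tiny regression.

```javascript
// Weight favoring regressions: full |relativeChange| for a regression,
// sqrt(|relativeChange|) for a progression, so very large progressions
// are not drowned out entirely.
function weightFavoringRegression(valueChangeSummary, regressionString) {
    const absoluteChange = Math.abs(valueChangeSummary.relativeChange);
    return valueChangeSummary.changeType === regressionString
        ? absoluteChange : Math.sqrt(absoluteChange);
}

console.log(weightFavoringRegression({changeType: 'regression', relativeChange: -0.04}, 'regression'));  // → 0.04
console.log(weightFavoringRegression({changeType: 'progression', relativeChange: 0.25}, 'regression')); // → 0.5
```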

> Websites/perf.webkit.org/tools/run-analysis.js:159
> +        this._logger.info(`Creating analysis task and confirming: "${summary}".`);

We should probably log this after we've created analysis task along with its ID.

> Websites/perf.webkit.org/tools/run-analysis.js:161
> +        await AnalysisTask.create(summary, rangeWithMostSignificantChange.startPoint, rangeWithMostSignificantChange.endPoint,
> +            'Confirm', this._changeDetectionConfigs.confirmTaskRepeatCount);

I don't think it makes sense to have a single iteration count.
If this is a temporary treatment, it's better to hard-code it here with a FIXME instead.

> Websites/perf.webkit.org/unit-tests/measurement-set-analyzer-tests.js:15
> +    beforeEach(() => {
> +        PrivilegedAPI.configure('test', 'password');
> +    });

Why do we need to configure this before each test case?

> Websites/perf.webkit.org/unit-tests/measurement-set-analyzer-tests.js:20
> +            assert.equal(configurations.length, 0);

Use deepEqual with [] so that when this test fails, we'd see the contents.

> Websites/perf.webkit.org/unit-tests/measurement-set-analyzer-tests.js:74
> +        it('should not analyze and show a warning message if failed to fetch segnmentation', async () => {

Nit: segnmentation -> segmentation.
I don't think this test case is valid, however. When there are no values in the time series, we shouldn't emit any error or warning.
Consequently, we need another test case that explicitly tests segmentation failure.

> Websites/perf.webkit.org/unit-tests/measurement-set-analyzer-tests.js:95
> +            assert.deepEqual(logger.warn_logs, ['Failed fetching segmentations for Some test : Some metric" on "Some platform"']);

I don't think we should be emitting this error. When the graph is empty, it's expected that there won't be anything to investigate.
We should be successfully exiting without logging any errors or warnings.

> Websites/perf.webkit.org/unit-tests/measurement-set-analyzer-tests.js:104
> +            const measurementSetAnalyzer = new MeasurementSetAnalyzer(changeDetectionConfigs, [measurementSet], logger);
> +            measurementSetAnalyzer._startTime = 4000;
> +            measurementSetAnalyzer._endTime = 5000;

Modifying private instance variables like this isn't great.
It would mean that whenever we modify MeasurementSetAnalyzer, we have to update all these tests as well.
We should make startTime & endTime simply arguments to MeasurementSetAnalyzer instead.