Bug 188425 - Show t-test results based on individual measurements to analysis task page
Summary: Show t-test results based on individual measurements to analysis task page
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
Depends on:
Reported: 2018-08-08 17:30 PDT by dewei_zhu
Modified: 2018-08-23 16:11 PDT (History)
3 users (show)

See Also:

Patch (71.17 KB, patch)
2018-08-08 17:32 PDT, dewei_zhu
no flags Details | Formatted Diff | Diff
Patch (71.29 KB, patch)
2018-08-08 21:16 PDT, dewei_zhu
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews206 for win-future (13.04 MB, application/zip)
2018-08-09 10:42 PDT, EWS Watchlist
no flags Details
Patch (65.69 KB, patch)
2018-08-22 22:18 PDT, dewei_zhu
rniwa: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description dewei_zhu 2018-08-08 17:30:57 PDT
Added comparision for individual iterations on analysis task page.
Comment 1 dewei_zhu 2018-08-08 17:32:32 PDT
Created attachment 346806 [details]
Comment 2 dewei_zhu 2018-08-08 21:16:44 PDT
Created attachment 346818 [details]
Comment 3 EWS Watchlist 2018-08-09 10:42:47 PDT
Comment on attachment 346818 [details]

Attachment 346818 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/8809653

New failing tests:
Comment 4 EWS Watchlist 2018-08-09 10:42:58 PDT
Created attachment 346843 [details]
Archive of layout-test-results from ews206 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews206  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 5 Ryosuke Niwa 2018-08-21 17:28:51 PDT
Comment on attachment 346818 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=346818&action=review

> Websites/perf.webkit.org/ChangeLog:10
> +        Refactored t-distribution inverse lookup to any degree of freedom with 4 digit precision.

4 digit precision -> 5 significant figures.

> Websites/perf.webkit.org/public/shared/statistics.js:37
> +        for (var probability in tDistributionTable)

Use const?

> Websites/perf.webkit.org/public/shared/statistics.js:71
> +    function sampleMeanAndVarianceForAggregatedInfo(statisticalInfo) {

This is not a descriptive name. "info" doesn't provide any meaningful semantics and "Aggregated"
makes it sound like this is the one that computes statistics of the means since mean is an aggregated value.
In general, we should avoid vague variable or argument names such as "statisticalInfo"

How about sampleMeanAndVarianceFromMultipleSamples? since I think what's important here is that
we're trying to compute the mean & variance of multiple sets of samples.

> Websites/perf.webkit.org/public/shared/statistics.js:108
> +        for (var probability in tDistributionTable) {

Use const?

> Websites/perf.webkit.org/public/shared/statistics.js:295
> +            let mid = low + parseInt((high - low) / 2);
> +            const entry = tDistributionTableForProbability[mid];

This binary search isn't quite right. We need to be binary-searching based on ends, not the index of each entry.

> Websites/perf.webkit.org/public/shared/statistics.js:305
> +    const tDistributionTable = {

I don't think we should rename this table.
It's important to know whether these t-distribution values are one-sided or two-sided values.

> Websites/perf.webkit.org/public/shared/statistics.js:307
> +            {end: 1, value: 3.0777},    {end: 2, value: 1.8856},    {end: 3, value: 1.6377},    {end: 4, value: 1.5332},

For values that only occupy a single slot, I think we're better off omitting the number.
so, that we can see it like:
3.0777, 1.8856, 1.6377, 1.5332
for the first hundred values.
Comment 6 dewei_zhu 2018-08-22 22:18:16 PDT
Created attachment 347900 [details]
Comment 7 Ryosuke Niwa 2018-08-22 22:51:55 PDT
Comment on attachment 347900 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=347900&action=review

> Websites/perf.webkit.org/public/shared/statistics.js:78
> +            size += sample.iterationCount;

We should probably call this sampleSize.
It's important that this function doesn't randomly rely on the way measurement set stores its data.
So it's better for the caller of sampleMeanAndVarianceFromMultipleSamples to be manually converting
form one format to another.

> Websites/perf.webkit.org/public/shared/statistics.js:95
> +    this.probabilityRangeForWelchsTFromTwoSampleSets = function (sampleSet1, sampleSet2) {

T-test is always constructed for two sets of samples so it's kind of redundant to say "from two sample sets".
It's probably better to call this function probabilityRangeForWelchsT*For*MultipleSamples to match sampleMeanAndVarianceFromMultipleSamples.

> Websites/perf.webkit.org/public/shared/statistics.js:128
> +    this._combineWelchsT = function(stat1, stat2) {

This function is really the one that computes Welch's t.
We should probably call this _computeWelchsTFromStatistics.

> Websites/perf.webkit.org/public/shared/statistics.js:293
> +        const binarySearchTable = tDistributionTableForProbability.binarySearchTable;

Probably need a blank line here.
Also, we should probably add an early return where for when degreesOfFreedom is greater than end of the last binary search table.

> Websites/perf.webkit.org/public/shared/statistics.js:297
> +            let mid = low + parseInt((high - low) / 2);

Use Math.floor instead. Also use const.

> Websites/perf.webkit.org/public/shared/statistics.js:310
> +            linearLookupTable: [

I don't think linearLookupTable is a good name here.
We should describe what the table is, not how it's used.
e.g. probabilityToTValue

> Websites/perf.webkit.org/public/shared/statistics.js:322
> +            binarySearchTable: [

e.g. tValuesSortedByProbability

> Websites/perf.webkit.org/public/shared/statistics.js:323
> +                {end: 101, value: 1.2900},  {end: 102, value: 1.2899},  {end: 103, value: 1.2898},  {end: 105, value: 1.2897},

Instead of {end: 101, value: 1.2900}, I'd call these {maxDF: 101, tValue: 1.2900} to make them more descriptive.

> Websites/perf.webkit.org/public/v3/models/test-group.js:172
> +            const significanceLabelForMean = isSignificantForMean? `significant with ${(probabilityRangeForMean.range[0] * 100).toFixed()}% probability` : 'insignificant';

This should probably be extracted as a local lambda function to be used again.

> Websites/perf.webkit.org/public/v3/models/test-group.js:177
> +            const probabilityRangeForIndividual = Statistics.probabilityRangeForWelchsTFromTwoSampleSets(beforeMeasurements, afterMeasurements);

Again, it's bad that Statistics.probabilityRangeForWelchsTFromTwoSampleSets is implicitly relying on the property names used by MeasurementSet.

> Websites/perf.webkit.org/unit-tests/statistics-tests.js:257
> +            return [firstHalf, secondHalf].map((values) => ({sum: Statistics.sum(values), squareSum: Statistics.squareSum(values), iterationCount: values.length}));

As I mentioned in person, we need test cases where this function returns an array of length 1 & 3.
Comment 8 dewei_zhu 2018-08-23 16:11:33 PDT
Landed in r235255