188425 – Show t-test results based on individual measurements to analysis task page

RESOLVED FIXED188425

Show t-test results based on individual measurements to analysis task page

https://bugs.webkit.org/show_bug.cgi?id=188425

Summary Show t-test results based on individual measurements to analysis task page

dewei_zhu

Reported 2018-08-08 17:30:57 PDT

Added comparision for individual iterations on analysis task page.

Attachments
Patch (71.17 KB, patch) 2018-08-08 17:32 PDT, dewei_zhu	no flags	Details Formatted Diff Diff
Patch (71.29 KB, patch) 2018-08-08 21:16 PDT, dewei_zhu	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews206 for win-future (13.04 MB, application/zip) 2018-08-09 10:42 PDT, EWS Watchlist	no flags	Details
Patch (65.69 KB, patch) 2018-08-22 22:18 PDT, dewei_zhu	rniwa: review+	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

dewei_zhu

Comment 1 2018-08-08 17:32:32 PDT

Created attachment 346806 [details] Patch

dewei_zhu

Comment 2 2018-08-08 21:16:44 PDT

Created attachment 346818 [details] Patch

EWS Watchlist

Comment 3 2018-08-09 10:42:47 PDT

Comment on attachment 346818 [details] Patch Attachment 346818 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/8809653 New failing tests: http/tests/security/local-video-source-from-remote.html

EWS Watchlist

Comment 4 2018-08-09 10:42:58 PDT

Created attachment 346843 [details] Archive of layout-test-results from ews206 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews206 Port: win-future Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit

Ryosuke Niwa

Comment 5 2018-08-21 17:28:51 PDT

Comment on attachment 346818 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=346818&action=review > Websites/perf.webkit.org/ChangeLog:10 > + Refactored t-distribution inverse lookup to any degree of freedom with 4 digit precision. 4 digit precision -> 5 significant figures. > Websites/perf.webkit.org/public/shared/statistics.js:37 > + for (var probability in tDistributionTable) Use const? > Websites/perf.webkit.org/public/shared/statistics.js:71 > + function sampleMeanAndVarianceForAggregatedInfo(statisticalInfo) { This is not a descriptive name. "info" doesn't provide any meaningful semantics and "Aggregated" makes it sound like this is the one that computes statistics of the means since mean is an aggregated value. In general, we should avoid vague variable or argument names such as "statisticalInfo" How about sampleMeanAndVarianceFromMultipleSamples? since I think what's important here is that we're trying to compute the mean & variance of multiple sets of samples. > Websites/perf.webkit.org/public/shared/statistics.js:108 > + for (var probability in tDistributionTable) { Use const? > Websites/perf.webkit.org/public/shared/statistics.js:295 > + let mid = low + parseInt((high - low) / 2); > + const entry = tDistributionTableForProbability[mid]; This binary search isn't quite right. We need to be binary-searching based on ends, not the index of each entry. > Websites/perf.webkit.org/public/shared/statistics.js:305 > + const tDistributionTable = { I don't think we should rename this table. It's important to know whether these t-distribution values are one-sided or two-sided values. > Websites/perf.webkit.org/public/shared/statistics.js:307 > + {end: 1, value: 3.0777}, {end: 2, value: 1.8856}, {end: 3, value: 1.6377}, {end: 4, value: 1.5332}, For values that only occupy a single slot, I think we're better off omitting the number. so, that we can see it like: 3.0777, 1.8856, 1.6377, 1.5332 for the first hundred values.

dewei_zhu

Comment 6 2018-08-22 22:18:16 PDT

Created attachment 347900 [details] Patch

Ryosuke Niwa

Comment 7 2018-08-22 22:51:55 PDT

Comment on attachment 347900 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=347900&action=review > Websites/perf.webkit.org/public/shared/statistics.js:78 > + size += sample.iterationCount; We should probably call this sampleSize. It's important that this function doesn't randomly rely on the way measurement set stores its data. So it's better for the caller of sampleMeanAndVarianceFromMultipleSamples to be manually converting form one format to another. > Websites/perf.webkit.org/public/shared/statistics.js:95 > + this.probabilityRangeForWelchsTFromTwoSampleSets = function (sampleSet1, sampleSet2) { T-test is always constructed for two sets of samples so it's kind of redundant to say "from two sample sets". It's probably better to call this function probabilityRangeForWelchsT*For*MultipleSamples to match sampleMeanAndVarianceFromMultipleSamples. > Websites/perf.webkit.org/public/shared/statistics.js:128 > + this._combineWelchsT = function(stat1, stat2) { This function is really the one that computes Welch's t. We should probably call this _computeWelchsTFromStatistics. > Websites/perf.webkit.org/public/shared/statistics.js:293 > + const binarySearchTable = tDistributionTableForProbability.binarySearchTable; Probably need a blank line here. Also, we should probably add an early return where for when degreesOfFreedom is greater than end of the last binary search table. > Websites/perf.webkit.org/public/shared/statistics.js:297 > + let mid = low + parseInt((high - low) / 2); Use Math.floor instead. Also use const. > Websites/perf.webkit.org/public/shared/statistics.js:310 > + linearLookupTable: [ I don't think linearLookupTable is a good name here. We should describe what the table is, not how it's used. e.g. probabilityToTValue > Websites/perf.webkit.org/public/shared/statistics.js:322 > + binarySearchTable: [ e.g. tValuesSortedByProbability > Websites/perf.webkit.org/public/shared/statistics.js:323 > + {end: 101, value: 1.2900}, {end: 102, value: 1.2899}, {end: 103, value: 1.2898}, {end: 105, value: 1.2897}, Instead of {end: 101, value: 1.2900}, I'd call these {maxDF: 101, tValue: 1.2900} to make them more descriptive. > Websites/perf.webkit.org/public/v3/models/test-group.js:172 > + const significanceLabelForMean = isSignificantForMean? `significant with ${(probabilityRangeForMean.range[0] * 100).toFixed()}% probability` : 'insignificant'; This should probably be extracted as a local lambda function to be used again. > Websites/perf.webkit.org/public/v3/models/test-group.js:177 > + const probabilityRangeForIndividual = Statistics.probabilityRangeForWelchsTFromTwoSampleSets(beforeMeasurements, afterMeasurements); Again, it's bad that Statistics.probabilityRangeForWelchsTFromTwoSampleSets is implicitly relying on the property names used by MeasurementSet. > Websites/perf.webkit.org/unit-tests/statistics-tests.js:257 > + return [firstHalf, secondHalf].map((values) => ({sum: Statistics.sum(values), squareSum: Statistics.squareSum(values), iterationCount: values.length})); As I mentioned in person, we need test cases where this function returns an array of length 1 & 3.

dewei_zhu

Comment 8 2018-08-23 16:11:33 PDT

Landed in r235255

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Nobody

Reported

2018-08-08 17:30 PDT

Modified

2018-08-23 16:11 PDT History

CC List

3 users Show

URL

Keywords

Depends on

Blocks