We need a set of shared objects to be used by the tests and the test runner. This patch is separated from the patch attached to https://bugs.webkit.org/show_bug.cgi?id=149053.

Created attachment 262219 [details] Patch

Comment on attachment 262219 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=262219&action=review > PerformanceTests/Animometer/resources/statistics.js:1 > +var Statistics = This is a copy of Speedometer's statistics.js, right? Please use "svn cp" instead of copy-paste. > PerformanceTests/Animometer/resources/statistics.js:95 > +function Experiment() Please put this in a separate file so that syncing statistics.js won't require copying this code into it.

Created attachment 262302 [details] Patch

Comment on attachment 262219 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=262219&action=review >> PerformanceTests/Animometer/resources/statistics.js:1 >> +var Statistics = > > This is a copy of Speedometer's statistics.js, right? > Please use "svn cp" instead of copy-paste. Done. >> PerformanceTests/Animometer/resources/statistics.js:95 >> +function Experiment() > > Please put this in a separate file so that syncing statistics.js won't require copying this code into it. Done.

Comment on attachment 262302 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=262302&action=review > PerformanceTests/Animometer/resources/statistics.js:84 > - var degreesOfFreedom = numberOfSamples - 1; > - if (degreesOfFreedom > cdfForProbability.length) > - throw 'We only support up to ' + deltas.length + ' degrees of freedom'; > + var degreesOfFreedom = Math.min(numberOfSamples - 1, cdfForProbability.length); We can't just assume an arbitrary degrees of freedom. When the number of samples (and therefore the degrees of freedom) is sufficiently large, we should be using normal distributions instead of student's t-distribution. The table you're copying here contains values up until 100 degrees of freedom. I'd say that's sufficiently large that we can start using normal approximation. We should use a numerical approximation of normal distributions as outlined in: https://en.wikipedia.org/wiki/Normal_distribution#Numerical_approximations_for_the_normal_CDF for example.

Comment on attachment 262302 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=262302&action=review >> PerformanceTests/Animometer/resources/statistics.js:84 >> + var degreesOfFreedom = Math.min(numberOfSamples - 1, cdfForProbability.length); > > We can't just assume an arbitrary degrees of freedom. > When the number of samples (and therefore the degrees of freedom) is sufficiently large, > we should be using normal distributions instead of student's t-distribution. > > The table you're copying here contains values up until 100 degrees of freedom. > I'd say that's sufficiently large that we can start using normal approximation. > We should use a numerical approximation of normal distributions as outlined in: > https://en.wikipedia.org/wiki/Normal_distribution#Numerical_approximations_for_the_normal_CDF > for example. Why not? The t distribution tables usually begins by listing the individual values for small degrees of freedom. Once the degree of freedom becomes large (>30), the tables list rows; each row covers a range of 10 degree of freedoms. And then they list rows for a range of 20 degrees of freedom and then they stop. The last row is used for all the degree of freedoms which is larger than the last value before it: https://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values. So my choice to to use the cdfForProbability[cdfForProbability.length - 1] is not arbitrary it is a valid approximation. And this is obvious from the table I copied. The last values for every degree of freedom are equal for 2 or 3 significant figures after the decimal point. That means if we had tables for infinite degree of freedoms, the difference between this approximation and the actual value will be most likely in the fourth significant digit after the decimal point. You are suggesting using the normal approximation but using the last row of the t distribution table is not bad approximation. Besides I am using toFixed(2) for the calculated value anyway because there is no much space and it is also distracting to show more than two digits after the decimal point.

(In reply to comment #6) > Comment on attachment 262302 [details] > Patch > > The t distribution tables usually begins by listing the individual values > for small degrees of freedom. Once the degree of freedom becomes large > (>30), the tables list rows; each row covers a range of 10 degree of > freedoms. And then they list rows for a range of 20 degrees of freedom and > then they stop. The last row is used for all the degree of freedoms which is > larger than the last value before it: > https://en.wikipedia.org/wiki/Student%27s_t- > distribution#Table_of_selected_values. So my choice to to use the > cdfForProbability[cdfForProbability.length - 1] is not arbitrary it is a > valid approximation. It isn't because I added that damn table (by computing them on R), and the last value is not appropriate as the approximation for an arbitrary degrees of freedom. It's for 100 degrees of freedom, and there is 2-3% margins of error compared to when n -> infinity. It would mean that your benchmark can never detect any regression of a magnitude less than say, 5%, reliably because your confidence interval calculation itself has an inherent error of 2-3%. > You are suggesting using the normal approximation but using the last row of > the t distribution table is not bad approximation. Besides I am using > toFixed(2) for the calculated value anyway because there is no much space > and it is also distracting to show more than two digits after the decimal > point. Is your test result guaranteed to be less than 9? If not, toFixed(2) doesn't guarantee the number of significant figures at all since the function only forces two decimal points, not two significant figures. There's a big difference between the two.

Created attachment 262372 [details] Patch

I talked to Jon Lee and Simon Fraser and we agreed on removing the 95% confidence interval delta measurement altogether. In speedometer it is used to measure the confidence interval of the test score mean. That makes since because the sampled data are totally independent. But in our case, we are sampling the test complexities in the same iteration. Since the sampled test complexities are dependent we can't say this measurement is useful in this case. We may use the confidence interval measurement in the future if we decide to run the same test multiple times as speedometer. But for now we do not need it for the iteration sampled data.

statistics.js was removed because the only function I needed from it was sampleStandardDeviation().

Makes sense.

Comment on attachment 262372 [details] Patch r= based on previous reviews and rniwa's comments being addressed.

Comment on attachment 262372 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=262372&action=review r=me provided the following comments are addressed before the patch is landed. > PerformanceTests/Animometer/resources/extensions.js:163 > + th.innerHTML += "<br>[" + message + "]"; For the sake of XSS prevention, I'd suggest manually constructing text nodes instead as in: th.appendChild(document.createElement('br')); th.appendChild(document.createTextNode('[' + message +]')); > PerformanceTests/Animometer/resources/sampler.js:10 > + sampleStandardDeviation: function(numberOfSamples, sum, squareSum) It's a bit odd to refer to the version with Bessel's correction as sampleStandardDeviation since the unqualified "sample standard deviation" usually refers to biased/uncorrected standard deviation. But I suppose anyone reading the code will see the Bessel's correction so it's not a big deal but I'd prefer calling this correctedSampleStandardDeviation or unbiasedSampleStandardDeviation instead. By the way, you removed the comment above this function: http://trac.webkit.org/browser/trunk/PerformanceTests/resources/statistics.js#L44 Please add it back as it's an important documentation.

Comment on attachment 262372 [details] Patch cq- because my comments have not been addressed.

Created attachment 262376 [details] Patch

Comment on attachment 262372 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=262372&action=review >> PerformanceTests/Animometer/resources/extensions.js:163 >> + th.innerHTML += "<br>[" + message + "]"; > > For the sake of XSS prevention, I'd suggest manually constructing text nodes instead as in: > th.appendChild(document.createElement('br')); > th.appendChild(document.createTextNode('[' + message +]')); Done. >> PerformanceTests/Animometer/resources/sampler.js:10 >> + sampleStandardDeviation: function(numberOfSamples, sum, squareSum) > > It's a bit odd to refer to the version with Bessel's correction as sampleStandardDeviation > since the unqualified "sample standard deviation" usually refers to biased/uncorrected standard deviation. > > But I suppose anyone reading the code will see the Bessel's correction so it's not a big deal > but I'd prefer calling this correctedSampleStandardDeviation or unbiasedSampleStandardDeviation instead. > > By the way, you removed the comment above this function: > http://trac.webkit.org/browser/trunk/PerformanceTests/resources/statistics.js#L44 > > Please add it back as it's an important documentation. Function was renamed correctedSampleStandardDeviation() and the comment was added.

Comment on attachment 262372 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=262372&action=review >>> PerformanceTests/Animometer/resources/sampler.js:10 >>> + sampleStandardDeviation: function(numberOfSamples, sum, squareSum) >> >> It's a bit odd to refer to the version with Bessel's correction as sampleStandardDeviation >> since the unqualified "sample standard deviation" usually refers to biased/uncorrected standard deviation. >> >> But I suppose anyone reading the code will see the Bessel's correction so it's not a big deal >> but I'd prefer calling this correctedSampleStandardDeviation or unbiasedSampleStandardDeviation instead. >> >> By the way, you removed the comment above this function: >> http://trac.webkit.org/browser/trunk/PerformanceTests/resources/statistics.js#L44 >> >> Please add it back as it's an important documentation. > > Function was renamed correctedSampleStandardDeviation() and the comment was added. Correction: Function was renamed unbiasedSampleStandardDeviation()

Comment on attachment 262376 [details] Patch Rejecting attachment 262376 [details] from commit-queue. Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-03', 'validate-changelog', '--check-oops', '--non-interactive', 262376, '--port=mac']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit ChangeLog entry in PerformanceTests/ChangeLog contains OOPS!. Full output: http://webkit-queues.webkit.org/results/238447

Created attachment 262377 [details] Patch

Comment on attachment 262377 [details] Patch Clearing flags on attachment: 262377 Committed r190541: <http://trac.webkit.org/changeset/190541>

All reviewed patches have been landed. Closing bug.