Bug 172439

Summary:

Figure out why Firefox's score goes up by 40% on Speedometer 2 compared to 1

Product:

WebKit

Reporter:

Ryosuke Niwa <rniwa>

Component:

Tools / Tests

Assignee:

Nobody <webkit-unassigned>

Status:

RESOLVED WORKSFORME

Severity:

Normal

CC:

addyo, ap, bbouvier+webkit, bugs, bzbarsky, cpeterson, ehsan, ggaren, jdemooij, lforschler, mathias, mjs, nicolas.b.pierron

Priority:

Version:

Safari Technology Preview

Hardware:

Unspecified

OS:

Unspecified

Bug Depends on:

Bug Blocks:

172339

Attachments:

Description	Flags
Firefox speedometer results	none
Safari Tech Preview 30 results	none

Ryosuke Niwa

Reported 2017-05-22 00:06:41 PDT

I've been doing some measurements on Speedometer 2.0 as a sanity check, and I'm seeing that Firefox's score goes up by 40% even though Safari's and Chrome's scores go up by only 5%. That's a big enough difference that we should look into where it comes from.

Attachments
Firefox speedometer results (12.53 KB, text/plain) 2017-05-30 19:47 PDT, Ryosuke Niwa	no flags	Details
Safari Tech Preview 30 results (12.25 KB, text/plain) 2017-05-30 19:47 PDT, Ryosuke Niwa	no flags	Details

Olli Pettay (:smaug)

Comment 1 2017-05-23 06:40:25 PDT

Given that Speedometer measures somewhat random edge cases of the platform, there have been a couple of cases where WebKit/Blink relied on optimizations that go against the spec, such as when to flush layout when accessing a mouse event's coordinates. I don't recall whether that showed up only in v2 or also in v1. There have also been other similar cases, some of which are v2 only.

Geoffrey Garen

Comment 2 2017-05-23 07:42:53 PDT

I don't think that "Speedometer measures somewhat random edge cases" is a good starting premise for improving our benchmark.

Olli Pettay (:smaug)

Comment 3 2017-05-23 07:59:52 PDT

That wasn't meant as a negative comment. Benchmarking is just hard, and one may accidentally end up testing features that aren't yet properly spec'ed or where some browsers have optimizations that go against the spec. I was just trying to explain why things may look quite different in FF.

Addy Osmani

Comment 4 2017-05-23 14:23:42 PDT

Just adding a comment to say that I've been able to reproduce a 40-50% difference in the Firefox scores across both stable and nightly compared to latest stable Chrome and Safari. What surprises me is that automatically running the suite via InteractiveRunner.html (which isn't as intensive) appears to show FF as being slower than either of these other browsers. I'm going to keep digging into what might be causing this disparity in either case.

Ryosuke Niwa

Comment 5 2017-05-23 15:31:58 PDT

(In reply to Addy Osmani from comment #4)
> What surprises me is automatically running the suite via
> InteractiveRunner.html (which isn't as intensive) appears to show FF as
> being slower than either of these other browsers. I'm going to keep digging
> into what might be causing this disparity in either case.

That's not really surprising. Firefox's score has been floating around half of what Safari gets for the last two years or so.

Ryosuke Niwa

Comment 6 2017-05-23 15:36:11 PDT

I'll note that when I initially released Speedometer 1, Firefox was faster than Safari. We just made a whole bunch of improvements to Safari to basically double Safari's score over the last three years.

Ryosuke Niwa

Comment 7 2017-05-23 15:39:42 PDT

For the record, see https://www.cnet.com/news/safari-8-browser-on-yosemite-shows-major-speed-boost/ which clearly shows Firefox 29 and Chrome 38 were both faster than Safari 7.

Addy Osmani

Comment 8 2017-05-23 19:06:48 PDT

Thanks for the pointers on scoring trends, Ryosuke. That's helpful to know. What I'm seeing in Chrome is that we appear to spend more time on Inferno, Ember and React with Redux than Firefox does. I wonder to what extent specific implementations are contributing to the scoring change here vs. the collection of apps as a whole.

Ryosuke Niwa

Comment 9 2017-05-30 19:47:45 PDT

Created attachment 311569 [details] Firefox speedometer results

Ryosuke Niwa

Comment 10 2017-05-30 19:47:59 PDT

Created attachment 311570 [details] Safari Tech Preview 30 results

Ryosuke Niwa

Comment 11 2017-05-30 19:52:49 PDT

Okay, I made measurements of Speedometer 1 and of Speedometer 2 restricted to the same subset of tests as Speedometer 1, using InteractiveRunner.html.

The results seem to indicate that Firefox's runtime goes up by ~20% whereas Safari's runtime goes up by ~15%, despite the fact that the total runtime goes up by 1.9x for Firefox and 2.1x for Safari. This seems to indicate that the new frameworks and libraries we added in Speedometer 2 are somehow more optimized in Firefox than in Safari. This kind of makes sense since we've been optimizing for Speedometer for 2.5 years; we expect idioms used in the libraries and frameworks included in Speedometer 1 to be getting faster over time relative to ones that are not included.

We need to analyze the runtime difference in each framework's test more closely, however. For example, Ember JS's runtime went up from 535ms to 1236ms in Safari whereas Backbone JS's runtime went from 209ms to 112ms. Discrepancies like that are worth further investigation.
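To make the two per-framework figures quoted above easier to compare, here is a minimal sketch that turns them into Speedometer 2 / Speedometer 1 runtime ratios. The four millisecond values come from this comment; the object layout and the `runtimeRatio` helper are hypothetical names used only for illustration and are not part of Speedometer or InteractiveRunner.html.

```javascript
// Safari runtimes in milliseconds, as quoted in the comment above
// (v1 = Speedometer 1, v2 = the same test in Speedometer 2).
const safariRuntimesMs = {
  EmberJS: { v1: 535, v2: 1236 },
  BackboneJS: { v1: 209, v2: 112 },
};

// Hypothetical helper: ratio of the v2 runtime to the v1 runtime.
// A value above 1 means the test takes longer in Speedometer 2.
function runtimeRatio({ v1, v2 }) {
  return v2 / v1;
}

for (const [name, times] of Object.entries(safariRuntimesMs)) {
  const ratio = runtimeRatio(times);
  const direction = ratio > 1 ? "slower" : "faster";
  console.log(
    `${name}: ${times.v1}ms -> ${times.v2}ms (${ratio.toFixed(2)}x, ${direction} in v2)`
  );
}
```

With these inputs the Ember JS ratio is about 2.31x and the Backbone JS ratio is about 0.54x, which is the kind of opposite-direction discrepancy the comment flags for further investigation.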

Jan de Mooij

Comment 12 2017-05-31 07:44:37 PDT

(In reply to Ryosuke Niwa from comment #11)
> For example, Ember JS's runtime went up from 535ms to 1236ms in Safari

One thing I noticed while profiling EmberJS in Firefox is that we spend quite a lot of time in debug code, stuff like this in addObserverForContentKey:

_emberMetalDebug.assert('When using @each to observe the array ' + content + ', the array must return an object', typeof item === 'object');

Here they concatenate an array of objects to a string, so we end up with a string containing "[object Object],[object Object],[object Object]...".

We can all optimize this in our engines, but it seems pretty silly to have this kind of code in these benchmarks/frameworks. Similar issues have been fixed upstream (in Ember) as it came up in other Ember benchmarks too.
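The cost described above can be sketched as follows. This is only an illustration: `debugAssert` and `lazyAssert` are hypothetical stand-ins, not Ember's actual `_emberMetalDebug.assert`, and the lazy variant is just one possible way to avoid the work, not necessarily how it was fixed upstream.

```javascript
// Hypothetical stand-in for a debug assertion helper; NOT Ember's real code.
function debugAssert(message, condition) {
  if (!condition) {
    throw new Error(`Assertion Failed: ${message}`);
  }
}

// An array of plain objects, standing in for the observed `content` array.
const content = [{ id: 1 }, { id: 2 }, { id: 3 }];
const item = content[0];

// The message argument is evaluated before the call, so the whole array is
// stringified on every invocation even though the assertion passes.
const eagerMessage = 'When using @each to observe the array ' + content +
  ', the array must return an object';
debugAssert(eagerMessage, typeof item === 'object');
console.log(eagerMessage);

// One possible alternative (an assumption, for illustration): pass a function
// so the message is only built when the check actually fails.
function lazyAssert(makeMessage, condition) {
  if (!condition) {
    throw new Error(`Assertion Failed: ${makeMessage()}`);
  }
}

let messageBuilds = 0;
lazyAssert(() => {
  messageBuilds += 1;
  return 'When using @each to observe the array ' + content +
    ', the array must return an object';
}, typeof item === 'object');
console.log(messageBuilds); // the message builder never ran
```

The eager form pays for `Array.prototype.toString` on each call, which grows with the size of the array; the lazy form does no string work on the passing path.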

Ryosuke Niwa

Comment 13 2017-05-31 08:07:38 PDT

(In reply to Jan de Mooij from comment #12)
> (In reply to Ryosuke Niwa from comment #11)
> > For example, Ember JS's runtime went up from 535ms to 1236ms in Safari
>
> One thing I noticed while profiling EmberJS in Firefox is that we spend
> quite a lot of time in debug code, stuff like this in
> addObserverForContentKey:
>
> _emberMetalDebug.assert('When using @each to observe the array ' + content
> + ', the array must return an object', typeof item === 'object');
>
> Here they concatenate an array of objects to a string, so we end up with a
> string containing "[object Object],[object Object],[object Object]...".
>
> We can all optimize this in our engines, but it seems pretty silly to have
> this kind of code in these benchmarks/frameworks. Similar issues have been
> fixed upstream (in Ember) as it came up in other Ember benchmarks too.

On one hand, it seems silly, but on the other hand, if it's actually in Ember.js, then it's probably also pushed to production code. At that point, we might be on the hook to optimize that code.

We also see console errors like assertion failures and exceptions getting thrown on production websites all the time. We could argue that those things are silly too, but at the same time, getting rid of them would also introduce a measurement bias, so we need to be careful in "fixing" these silly things libraries and frameworks do.

Addy Osmani

Comment 14 2017-05-31 10:52:20 PDT

> On one hand, it seems silly but on the other hand if it's actually in Ember.js, then it's probably also pushed to the production code. At that point, we might be on the hook to optimize that code.

One observation from the field: I've noticed developers ship a mix of proper production code (stripped of debug statements) and non-stripped code irrespective of framework. I've seen this occur with React, Angular and Ember, at the very least in part because they historically hadn't made it as straightforward to just ship the right thing when deploying.

"getting rid of them would also introduce a measurement bias" is a fair position to have. I'm personally on the fence about assuming all production code doesn't include such debug statements.

> We can all optimize this in our engines, but it seems pretty silly to have this kind of code in these benchmarks/frameworks. Similar issues have been fixed upstream (in Ember) as it came up in other Ember benchmarks too.

Do you have any links to upstream Ember issues or bugfixes around this that we can take a look at? Anything from either Ember core or benchmarks (DBMon?) where this was an issue would be interesting to look at.

Jan de Mooij

Comment 15 2017-06-01 04:28:09 PDT

(In reply to Addy Osmani from comment #14)
> Do you have any links to upstream Ember issues or bugfixes around this that
> we can take a look at? Anything from either Ember core or benchmarks
> (DBMon?) where this was an issue would be interesting to look at.

Sure, see https://bugzilla.mozilla.org/show_bug.cgi?id=1352486#c12

Mathias Bynens

Comment 16 2017-08-18 02:18:32 PDT

This should be revisited in light of the recent updates to Speedometer in trunk. Specifically, the Ember benchmark has been updated to use a modern production build of Ember.

Addy Osmani

Comment 17 2017-08-18 07:40:43 PDT

I recently spoke to Harald Kirschner about Mozilla's investigation into the discrepancies with the S2 runtime numbers. They were unable to discover anything that seemed out of place. They also re-checked the figures after the recent Ember changes to use the production build were applied and didn't see noticeable changes or improvements. Ryosuke, what are the next steps to address this issue? Were you after a cross-browser breakdown of time spent in runtime for each implementation?

Ryosuke Niwa

Comment 18 2017-08-18 13:54:36 PDT

We need to re-measure the performance of Safari, Chrome, and Firefox between Speedometer 1 & 2 and analyze the results. We should also test versions of Safari & Chrome prior to Speedometer 1's release and see if they show an improvement similar to Firefox's. Since our hypothesis is that Safari's & Chrome's scores don't improve simply because they were already optimized for Speedometer 1 content, we'd expect versions of Safari & Chrome prior to Speedometer 1's release to behave much like Firefox instead.

Ryosuke Niwa

Comment 19 2017-08-18 20:29:32 PDT

I've done the same experiment of running Speedometer 1 and Speedometer 2 on Safari 7.0.6, and I'm seeing ~44% progression from Speedometer 1 to Speedometer 2 for the total time of subtests excluding Angular, which no longer runs on Safari 7.0.6 due to the lack of support for Promise.

This confirms my hypothesis that the relatively smaller speedups in Chrome and Safari compared to Firefox come from the fact that Chrome and Safari have been optimized for Speedometer content in the last three years.

Closing this bug given the observation.

Addy Osmani

Comment 20 2017-08-18 20:59:15 PDT

> I've done the same experiment of running Speedometer 1 and Speedometer 2 on Safari 7.0.6, and I'm seeing ~44% progression from Speedometer 1 to Speedometer 2 for the total time of subtests excluding Angular, which no longer runs on Safari 7.0.6 due to the lack of support of Promise.
> This confirms my hypothesis that the relatively smaller speedups in Chrome and Safari compared to Firefox comes from the fact Chrome and Safari have been optimized for Speedometer content in the last three years.
> Closing this bug given the observation.

Our sincere thanks for spending time analyzing the deltas between 1 and 2 in more depth. It's very useful to know that Safari's and Chrome's minor speedups here are probably due to historical investment in looking at the Speedometer benchmark. We appreciate the observation being shared.

With this issue being closed, https://bugs.webkit.org/show_bug.cgi?id=175715 is currently the remaining blocker to finalizing S2. If there are any other framework implementations you would like updated further, we're happy to spend time on that next week.
