96382 – Android's mock scrollbars shows up as a difference in layout test results

RESOLVED FIXED96382

Android's mock scrollbars shows up as a difference in layout test results

https://bugs.webkit.org/show_bug.cgi?id=96382

Summary Android's mock scrollbars shows up as a difference in layout test results

Peter Beverloo

Reported 2012-09-11 06:57:25 PDT

Example output: http://build.webkit.org/results/Chromium%20Android%20Release%20%28Tests%29/r128173%20%2864%29/fast/backgrounds/background-inherit-color-bug-diffs.html With the diff being: http://build.webkit.org/results/Chromium%20Android%20Release%20%28Tests%29/r128173%20%2864%29/fast/backgrounds/background-inherit-color-bug-diff.png It only points out the scrollbars. This is causing a lot of unnecessary failures.

Attachments
Patch (4.75 KB, patch) 2012-09-24 04:21 PDT, Peter Beverloo	no flags	Details Formatted Diff Diff
Fix typos (4.75 KB, patch) 2012-09-24 04:29 PDT, Peter Beverloo	no flags	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Peter Beverloo

Comment 1 2012-09-12 06:03:41 PDT

I'm not sure we can address this without needing new Android baselines for every test that has a scrollbar. Windows, Mac and Linux do the same (see links below). Our scrollbars are painted using the compositor, which we force to be on in the shipping version of the product itself. I think the best way forward would be to have these show up on the pixel results (i.e. what the product itself uses), and then submit a few massive commits with baselines. Ideas / confirmation before I do this would be most welcome! Another example output: http://build.webkit.org/results/Chromium%20Android%20Release%20(Tests)/r128190%20(72)/css2.1/t1202-counters-09-b-diffs.html Mac example: http://build.webkit.org/results/Chromium%20Mac%20Release%20%28Tests%29/r128297%20%2823853%29/fast/block/float/028-diffs.html Windows using mock scrollbars: http://build.webkit.org/results/Chromium%20Win%20Release%20%28Tests%29/r128290%20%2829919%29/fast/block/float/028-diffs.html

Peter Beverloo

Comment 2 2012-09-12 06:25:40 PDT

Android always uses the compositor, and forces it to be on. The scrollbars have been implemented in the compositor as well. For the most accurate results, we'd need to turn the compositor on for all tests. This means that we'd have to add new baselines for (a) all text baselines, as they would now start showing layers, (b) all image baselines with a scrollbar and (c) image baselines needing other adjustments. Adding everything up, this means that we'd need new baselines for pretty much every test. Maybe some intermediary result directory between chromium-linux and chromium-android, i.e. chromium-linux-compositing, could mean that we can still share many results with Linux (when/if they start forcing the compositor)...

Tony Chang

Comment 3 2012-09-12 10:15:34 PDT

FWIW, I don't think it's a big deal to land different pixel results for chromium-android. There are "only" 7218 chromium-linux png files. If there are the same number of chromium-android files, that's not much compared to the 183551 total files in the repository. The main problem is keeping the results up to date, but once the bots start running tests and we hook up garden-o-matic to it, that should be pretty easy.

Peter Beverloo

Comment 4 2012-09-12 10:17:39 PDT

(In reply to comment #3) > FWIW, I don't think it's a big deal to land different pixel results for chromium-android. There are "only" 7218 chromium-linux png files. If there are the same number of chromium-android files, that's not much compared to the 183551 total files in the repository. > > The main problem is keeping the results up to date, but once the bots start running tests and we hook up garden-o-matic to it, that should be pretty easy. Getting garden-o-matic to work is definitely a priority. One nit here: afaik turning on the compositor will also start showing the layers in the normal DumpRenderTree output, which means new text results as well.

James Robinson

Comment 5 2012-09-12 10:30:07 PDT

Layers don't show up in the text output unless the test asks for them by testRunner.layerTreeAsText(). Do you want scrollbars to show up in the .pngs at all on android? They don't occupy space (do they?) so maybe you should just leave them out. That means you can't share pixel baselines with other ports on tests with visible scrollbars, but you aren't sharing behavior either thanks to overlay so that seems appropriate.

Xianzhu Wang

Comment 6 2012-09-12 10:37:04 PDT

How does garden-o-matic distinguish tests to be rebaselined and bugs to be fixed, among the thousands of mismatches? The following are a bit off topic: Perhaps we can make the pixel results automatically rebaselined, but for crash/timeout/text mismatches, I think we still need to manually check one by one to distinguish bugs from rebaselines. So I think keeping the text expectations as many matching is important to reduce the cost. For the scrollbar case, the current code makes the text expectations matching chromium-linux/chromium-win. So does the code for font configurations. For those features, DumpRenderTree doesn't execute the actual code paths of chromium-android. I think we can add several Android-specific test cases for them with the actual paths selected with some Android-specific layout test API. For the other tests the DumpRenderTree code paths are used to reduce the number of mismatches.

Peter Beverloo

Comment 7 2012-09-12 10:37:25 PDT

(In reply to comment #5) > Layers don't show up in the text output unless the test asks for them by testRunner.layerTreeAsText(). > > Do you want scrollbars to show up in the .pngs at all on android? They don't occupy space (do they?) so maybe you should just leave them out. That means you can't share pixel baselines with other ports on tests with visible scrollbars, but you aren't sharing behavior either thanks to overlay so that seems appropriate. I see, that's good news, so we can enable forced compositing. Do you think it'd be fine to now show scrollbars altogether? They're layers in a parallel tree and are drawn a top of the content, so it seems to me that they indeed don't occupy space. My concern is that this would make it impossible to determine whether the content exceeds the result's viewport or not. Since we're not going to be able to share the results anyway, both (a) being closer to what we ship and (b) making it more obvious why a test may be failing in case of scrolling content make me think that we should enable them.

Peter Beverloo

Comment 8 2012-09-12 10:40:03 PDT

(In reply to comment #6) > How does garden-o-matic distinguish tests to be rebaselined and bugs to be fixed, among the thousands of mismatches? > > > The following are a bit off topic: > > Perhaps we can make the pixel results automatically rebaselined, but for crash/timeout/text mismatches, I think we still need to manually check one by one to distinguish bugs from rebaselines. So I think keeping the text expectations as many matching is important to reduce the cost. > > For the scrollbar case, the current code makes the text expectations matching chromium-linux/chromium-win. So does the code for font configurations. For those features, DumpRenderTree doesn't execute the actual code paths of chromium-android. I think we can add several Android-specific test cases for them with the actual paths selected with some Android-specific layout test API. For the other tests the DumpRenderTree code paths are used to reduce the number of mismatches. We shouldn't be having thousands of mismatches to start with, but rather a good baseline of the expected results. Crashes will show up independently. With our own larger set of baselines, we can also remove a number of these exceptions and start testing what we use. Fonts would be an exception here due to the shaping and dimensions, unfortunately.

Dirk Pranke

Comment 9 2012-09-12 10:42:52 PDT

(In reply to comment #6) > How does garden-o-matic distinguish tests to be rebaselined and bugs to be fixed, among the thousands of mismatches? > It doesn't. That is up the dev's discretion. > > The following are a bit off topic: > > Perhaps we can make the pixel results automatically rebaselined, but for crash/timeout/text mismatches, I think we still need to manually check one by one to distinguish bugs from rebaselines. So I think keeping the text expectations as many matching is important to reduce the cost. > I'm not sure I understand what you mean by "automatically rebaselined" here?

Dirk Pranke

Comment 10 2012-09-12 10:43:19 PDT

(In reply to comment #7) > My concern is that this would make it impossible to determine whether the content exceeds the result's viewport or not. The render tree output would at least tell you this.

Xianzhu Wang

Comment 11 2012-09-12 10:48:04 PDT

(In reply to comment #9) > (In reply to comment #6) > > How does garden-o-matic distinguish tests to be rebaselined and bugs to be fixed, among the thousands of mismatches? > > > > It doesn't. That is up the dev's discretion. > > > > > The following are a bit off topic: > > > > Perhaps we can make the pixel results automatically rebaselined, but for crash/timeout/text mismatches, I think we still need to manually check one by one to distinguish bugs from rebaselines. So I think keeping the text expectations as many matching is important to reduce the cost. > > > > I'm not sure I understand what you mean by "automatically rebaselined" here? This is related to my first question. Maybe I'm wrong but it seems impractical to me to check the thousands of mismatches one by one, so I guessed pixel mismatches are rebaselined without checking ("automatically rebaselined").

Dirk Pranke

Comment 12 2012-09-12 10:49:02 PDT

(In reply to comment #11) > > I'm not sure I understand what you mean by "automatically rebaselined" here? > > This is related to my first question. Maybe I'm wrong but it seems impractical to me to check the thousands of mismatches one by one, so I guessed pixel mismatches are rebaselined without checking ("automatically rebaselined"). Ah, I see. No, typically when this happens people actually at least glance at all of the results (or blindly approve them, though that is "discouraged" :).

Xianzhu Wang

Comment 13 2012-09-12 10:53:06 PDT

(In reply to comment #8) > We shouldn't be having thousands of mismatches to start with, but rather a good baseline of the expected results. Crashes will show up independently. > This is true for text expectations because we have created special code paths in layout test mode. However for pixel results, based on our experience in downstream, there would be thousands of mismatches because of the slight differences at the edges of the font strokes. Because of this we didn't run pixel tests in downstream to avoid the workload and hope this be resolved in upstream with the gardening process.

Xianzhu Wang

Comment 14 2012-09-12 11:36:26 PDT

FYI, the following layout test specific code paths have been created to reduce text mismatches of scrollbars (but not to reduce pixel mismatches): - let measurement of scrollbar match chromium-linux: http://trac.webkit.org/changeset/104139 - disable overlay scrollbar: http://trac.webkit.org/changeset/123825 - why the scrollbar is black in layout test mode: http://trac.webkit.org/browser/trunk/Source/WebCore/platform/chromium/ScrollbarThemeChromiumAndroid.cpp?annotate=blame#L165

James Robinson

Comment 15 2012-09-12 12:14:41 PDT

You should have DRT behave in as close to the same way as your actual browser does as possible. Disabling overlay scrollbars seems like a huge mistake, you want to test what users are actually experiencing.

Peter Beverloo

Comment 16 2012-09-12 12:21:09 PDT

We'll (In reply to comment #10) > (In reply to comment #7) > > My concern is that this would make it impossible to determine whether the content exceeds the result's viewport or not. > > The render tree output would at least tell you this. Yes, but is there a reason not to? What do you think about the following steps forward? 1) We'll force compositor mode and threaded compositing for DumpRenderTree on Android, matching the product itself. 2) We'll make the mock scrollbars light grey (so the scroller is visible and it's visually distinguishable from the test's content), if possible, keep the reserved space, and enable the overlay scrollbars again. 3) We'll diagnose further failures, to make sure there isn't any low hanging fruit that may cause major re-baselines after we're done. 4) We'll start a plan to get the tests re-based while verifying every result visually. There are various WebKit committers in London and with some tools we should be able to do this quickly and accurately. Note for (2): Some tests may be depending on the predictable inner layout width, so I don't think we should change that, but this is a triviality. The eventual goal is to match the product as closely as possible. Integration with garden-o-matic is planned and shouldn't be too long away, but we do need a good baseline in the first place in order for that to work.

Peter Beverloo

Comment 17 2012-09-12 12:22:57 PDT

(In reply to comment #16) > Note for (2): Some tests may be depending on the predictable inner layout width, so I don't think we should change that, but this is a triviality. Err, the "but this is a triviality" part is from a sentence I removed. I am not experienced enough with WebKit layout tests to judge what impact removing the reserved space would have, and would like to defer to one of you guys for making that decision.

Tony Chang

Comment 18 2012-09-12 12:24:28 PDT

You should be able to use garden-o-matic to examine and rebaseline the chromium-android results. I've used it in the past for rebaselining hundreds of pixel test changes (I think it took less than an hour).

Xianzhu Wang

Comment 19 2012-09-12 12:28:29 PDT

(In reply to comment #15) > You should have DRT behave in as close to the same way as your actual browser does as possible. Disabling overlay scrollbars seems like a huge mistake, you want to test what users are actually experiencing. I think this is a trade-off and we can reduce the mistake by adding a few tests with an layout test API to enable overlay scrollbars. For the existing tests, most of them are not actually testing scrollbars, keeping them matching with existing baselines can reduce the cost of managing the new baselines.

Peter Beverloo

Comment 20 2012-09-24 04:21:54 PDT

Created attachment 165351 [details] Patch

Sami Kyöstilä

Comment 21 2012-09-24 04:27:44 PDT

Comment on attachment 165351 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=165351&action=review > Source/WebCore/ChangeLog:10 > + bringing the tests closer to what we actually skip. s/skip/ship/ > Source/WebCore/ChangeLog:13 > + take and width on Android, they're rendered on top of the content. Therefore s/and/any/

Peter Beverloo

Comment 22 2012-09-24 04:29:43 PDT

Created attachment 165354 [details] Fix typos

Adam Barth

Comment 23 2012-09-24 11:22:01 PDT

Comment on attachment 165354 [details] Fix typos Removing calls to isRunningLayoutTest LGTM.

WebKit Review Bot

Comment 24 2012-09-24 11:42:03 PDT

Comment on attachment 165354 [details] Fix typos Clearing flags on attachment: 165354 Committed r129394: <http://trac.webkit.org/changeset/129394>

WebKit Review Bot

Comment 25 2012-09-24 11:42:07 PDT

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Peter Beverloo

Reported

2012-09-11 06:57 PDT

Modified

2012-09-24 11:42 PDT History

CC List

7 users Show

URL

Keywords

Depends on

Blocks

96398

Dependencies

tree graph