Bug 96382 - Android's mock scrollbars shows up as a difference in layout test results
Summary: Android's mock scrollbars shows up as a difference in layout test results
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Peter Beverloo
URL:
Keywords:
Depends on:
Blocks: 96398
  Show dependency treegraph
 
Reported: 2012-09-11 06:57 PDT by Peter Beverloo
Modified: 2012-09-24 11:42 PDT (History)
7 users (show)

See Also:


Attachments
Patch (4.75 KB, patch)
2012-09-24 04:21 PDT, Peter Beverloo
no flags Details | Formatted Diff | Diff
Fix typos (4.75 KB, patch)
2012-09-24 04:29 PDT, Peter Beverloo
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Comment 1 Peter Beverloo 2012-09-12 06:03:41 PDT
I'm not sure we can address this without needing new Android baselines for every test that has a scrollbar. Windows, Mac and Linux do the same (see links below).

Our scrollbars are painted using the compositor, which we force to be on in the shipping version of the product itself. I think the best way forward would be to have these show up on the pixel results (i.e. what the product itself uses), and then submit a few massive commits with baselines.

Ideas / confirmation before I do this would be most welcome!


Another example output:
http://build.webkit.org/results/Chromium%20Android%20Release%20(Tests)/r128190%20(72)/css2.1/t1202-counters-09-b-diffs.html

Mac example:
http://build.webkit.org/results/Chromium%20Mac%20Release%20%28Tests%29/r128297%20%2823853%29/fast/block/float/028-diffs.html

Windows using mock scrollbars:
http://build.webkit.org/results/Chromium%20Win%20Release%20%28Tests%29/r128290%20%2829919%29/fast/block/float/028-diffs.html
Comment 2 Peter Beverloo 2012-09-12 06:25:40 PDT
Android always uses the compositor, and forces it to be on. The scrollbars have been implemented in the compositor as well.

For the most accurate results, we'd need to turn the compositor on for all tests. This means that we'd have to add new baselines for (a) all text baselines, as they would now start showing layers, (b) all image baselines with a scrollbar and (c) image baselines needing other adjustments. Adding everything up, this means that we'd need new baselines for pretty much every test.

Maybe some intermediary result directory between chromium-linux and chromium-android, i.e. chromium-linux-compositing, could mean that we can still share many results with Linux (when/if they start forcing the compositor)...
Comment 3 Tony Chang 2012-09-12 10:15:34 PDT
FWIW, I don't think it's a big deal to land different pixel results for chromium-android.  There are "only" 7218 chromium-linux png files.  If there are the same number of chromium-android files, that's not much compared to the 183551 total files in the repository.

The main problem is keeping the results up to date, but once the bots start running tests and we hook up garden-o-matic to it, that should be pretty easy.
Comment 4 Peter Beverloo 2012-09-12 10:17:39 PDT
(In reply to comment #3)
> FWIW, I don't think it's a big deal to land different pixel results for chromium-android.  There are "only" 7218 chromium-linux png files.  If there are the same number of chromium-android files, that's not much compared to the 183551 total files in the repository.
> 
> The main problem is keeping the results up to date, but once the bots start running tests and we hook up garden-o-matic to it, that should be pretty easy.

Getting garden-o-matic to work is definitely a priority. One nit here: afaik turning on the compositor will also start showing the layers in the normal DumpRenderTree output, which means new text results as well.
Comment 5 James Robinson 2012-09-12 10:30:07 PDT
Layers don't show up in the text output unless the test asks for them by testRunner.layerTreeAsText().

Do you want scrollbars to show up in the .pngs at all on android?  They don't occupy space (do they?) so maybe you should just leave them out.  That means you can't share pixel baselines with other ports on tests with visible scrollbars, but you aren't sharing behavior either thanks to overlay so that seems appropriate.
Comment 6 Xianzhu Wang 2012-09-12 10:37:04 PDT
How does garden-o-matic distinguish tests to be rebaselined and bugs to be fixed, among the thousands of mismatches?


The following are a bit off topic:

Perhaps we can make the pixel results automatically rebaselined, but for crash/timeout/text mismatches, I think we still need to manually check one by one to distinguish bugs from rebaselines. So I think keeping the text expectations as many matching is important to reduce the cost.

For the scrollbar case, the current code makes the text expectations matching chromium-linux/chromium-win. So does the code for font configurations. For those features, DumpRenderTree doesn't execute the actual code paths of chromium-android. I think we can add several Android-specific test cases for them with the actual paths selected with some Android-specific layout test API. For the other tests the DumpRenderTree code paths are used to reduce the number of mismatches.
Comment 7 Peter Beverloo 2012-09-12 10:37:25 PDT
(In reply to comment #5)
> Layers don't show up in the text output unless the test asks for them by testRunner.layerTreeAsText().
> 
> Do you want scrollbars to show up in the .pngs at all on android?  They don't occupy space (do they?) so maybe you should just leave them out.  That means you can't share pixel baselines with other ports on tests with visible scrollbars, but you aren't sharing behavior either thanks to overlay so that seems appropriate.

I see, that's good news, so we can enable forced compositing.

Do you think it'd be fine to now show scrollbars altogether? They're layers in a parallel tree and are drawn a top of the content, so it seems to me that they indeed don't occupy space.

My concern is that this would make it impossible to determine whether the content exceeds the result's viewport or not.
Since we're not going to be able to share the results anyway, both (a) being closer to what we ship and (b) making it more obvious why a test may be failing in case of scrolling content make me think that we should enable them.
Comment 8 Peter Beverloo 2012-09-12 10:40:03 PDT
(In reply to comment #6)
> How does garden-o-matic distinguish tests to be rebaselined and bugs to be fixed, among the thousands of mismatches?
> 
> 
> The following are a bit off topic:
> 
> Perhaps we can make the pixel results automatically rebaselined, but for crash/timeout/text mismatches, I think we still need to manually check one by one to distinguish bugs from rebaselines. So I think keeping the text expectations as many matching is important to reduce the cost.
> 
> For the scrollbar case, the current code makes the text expectations matching chromium-linux/chromium-win. So does the code for font configurations. For those features, DumpRenderTree doesn't execute the actual code paths of chromium-android. I think we can add several Android-specific test cases for them with the actual paths selected with some Android-specific layout test API. For the other tests the DumpRenderTree code paths are used to reduce the number of mismatches.

We shouldn't be having thousands of mismatches to start with, but rather a good baseline of the expected results. Crashes will show up independently.

With our own larger set of baselines, we can also remove a number of these exceptions and start testing what we use. Fonts would be an exception here due to the shaping and dimensions, unfortunately.
Comment 9 Dirk Pranke 2012-09-12 10:42:52 PDT
(In reply to comment #6)
> How does garden-o-matic distinguish tests to be rebaselined and bugs to be fixed, among the thousands of mismatches?
> 

It doesn't. That is up the dev's discretion.

> 
> The following are a bit off topic:
> 
> Perhaps we can make the pixel results automatically rebaselined, but for crash/timeout/text mismatches, I think we still need to manually check one by one to distinguish bugs from rebaselines. So I think keeping the text expectations as many matching is important to reduce the cost.
>

I'm not sure I understand what you mean by "automatically rebaselined" here?
Comment 10 Dirk Pranke 2012-09-12 10:43:19 PDT
(In reply to comment #7)
> My concern is that this would make it impossible to determine whether the content exceeds the result's viewport or not.

The render tree output would at least tell you this.
Comment 11 Xianzhu Wang 2012-09-12 10:48:04 PDT
(In reply to comment #9)
> (In reply to comment #6)
> > How does garden-o-matic distinguish tests to be rebaselined and bugs to be fixed, among the thousands of mismatches?
> > 
> 
> It doesn't. That is up the dev's discretion.
> 
> > 
> > The following are a bit off topic:
> > 
> > Perhaps we can make the pixel results automatically rebaselined, but for crash/timeout/text mismatches, I think we still need to manually check one by one to distinguish bugs from rebaselines. So I think keeping the text expectations as many matching is important to reduce the cost.
> >
> 
> I'm not sure I understand what you mean by "automatically rebaselined" here?

This is related to my first question. Maybe I'm wrong but it seems impractical to me to check the thousands of mismatches one by one, so I guessed pixel mismatches are rebaselined without checking ("automatically rebaselined").
Comment 12 Dirk Pranke 2012-09-12 10:49:02 PDT
(In reply to comment #11)
> > I'm not sure I understand what you mean by "automatically rebaselined" here?
> 
> This is related to my first question. Maybe I'm wrong but it seems impractical to me to check the thousands of mismatches one by one, so I guessed pixel mismatches are rebaselined without checking ("automatically rebaselined").

Ah, I see. No, typically when this happens people actually at least glance at all of the results (or blindly approve them, though that is "discouraged" :).
Comment 13 Xianzhu Wang 2012-09-12 10:53:06 PDT
(In reply to comment #8)
 
> We shouldn't be having thousands of mismatches to start with, but rather a good baseline of the expected results. Crashes will show up independently.
> 

This is true for text expectations because we have created special code paths in layout test mode. However for pixel results, based on our experience in downstream, there would be thousands of mismatches because of the slight differences at the edges of the font strokes. Because of this we didn't run pixel tests in downstream to avoid the workload and hope this be resolved in upstream with the gardening process.
Comment 14 Xianzhu Wang 2012-09-12 11:36:26 PDT
FYI, the following layout test specific code paths have been created to reduce text mismatches of scrollbars (but not to reduce pixel mismatches):

  - let measurement of scrollbar match chromium-linux: http://trac.webkit.org/changeset/104139

  - disable overlay scrollbar: http://trac.webkit.org/changeset/123825

  - why the scrollbar is black in layout test mode: http://trac.webkit.org/browser/trunk/Source/WebCore/platform/chromium/ScrollbarThemeChromiumAndroid.cpp?annotate=blame#L165
Comment 15 James Robinson 2012-09-12 12:14:41 PDT
You should have DRT behave in as close to the same way as your actual browser does as possible.  Disabling overlay scrollbars seems like a huge mistake, you want to test what users are actually experiencing.
Comment 16 Peter Beverloo 2012-09-12 12:21:09 PDT
We'll (In reply to comment #10)
> (In reply to comment #7)
> > My concern is that this would make it impossible to determine whether the content exceeds the result's viewport or not.
> 
> The render tree output would at least tell you this.

Yes, but is there a reason not to?

What do you think about the following steps forward?

    1) We'll force compositor mode and threaded compositing for DumpRenderTree on Android, matching the product itself.
    2) We'll make the mock scrollbars light grey (so the scroller is visible and it's visually distinguishable from the test's content), if possible, keep the reserved space, and enable the overlay scrollbars again.
    3) We'll diagnose further failures, to make sure there isn't any low hanging fruit that may cause major re-baselines after we're done.
    4) We'll start a plan to get the tests re-based while verifying every result visually. There are various WebKit committers in London and with some tools we should be able to do this quickly and accurately.


Note for (2): Some tests may be depending on the predictable inner layout width, so I don't think we should change that, but this is a triviality.

The eventual goal is to match the product as closely as possible. Integration with garden-o-matic is planned and shouldn't be too long away, but we do need a good baseline in the first place in order for that to work.
Comment 17 Peter Beverloo 2012-09-12 12:22:57 PDT
(In reply to comment #16)
> Note for (2): Some tests may be depending on the predictable inner layout width, so I don't think we should change that, but this is a triviality.

Err, the "but this is a triviality" part is from a sentence I removed.
I am not experienced enough with WebKit layout tests to judge what impact removing the reserved space would have, and would like to defer to one of you guys for making that decision.
Comment 18 Tony Chang 2012-09-12 12:24:28 PDT
You should be able to use garden-o-matic to examine and rebaseline the chromium-android results.  I've used it in the past for rebaselining hundreds of pixel test changes (I think it took less than an hour).
Comment 19 Xianzhu Wang 2012-09-12 12:28:29 PDT
(In reply to comment #15)
> You should have DRT behave in as close to the same way as your actual browser does as possible.  Disabling overlay scrollbars seems like a huge mistake, you want to test what users are actually experiencing.

I think this is a trade-off and we can reduce the mistake by adding a few tests with an layout test API to enable overlay scrollbars. For the existing tests, most of them are not actually testing scrollbars, keeping them matching with existing baselines can reduce the cost of managing the new baselines.
Comment 20 Peter Beverloo 2012-09-24 04:21:54 PDT
Created attachment 165351 [details]
Patch
Comment 21 Sami Kyöstilä 2012-09-24 04:27:44 PDT
Comment on attachment 165351 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=165351&action=review

> Source/WebCore/ChangeLog:10
> +        bringing the tests closer to what we actually skip.

s/skip/ship/

> Source/WebCore/ChangeLog:13
> +        take and width on Android, they're rendered on top of the content. Therefore

s/and/any/
Comment 22 Peter Beverloo 2012-09-24 04:29:43 PDT
Created attachment 165354 [details]
Fix typos
Comment 23 Adam Barth 2012-09-24 11:22:01 PDT
Comment on attachment 165354 [details]
Fix typos

Removing calls to isRunningLayoutTest LGTM.
Comment 24 WebKit Review Bot 2012-09-24 11:42:03 PDT
Comment on attachment 165354 [details]
Fix typos

Clearing flags on attachment: 165354

Committed r129394: <http://trac.webkit.org/changeset/129394>
Comment 25 WebKit Review Bot 2012-09-24 11:42:07 PDT
All reviewed patches have been landed.  Closing bug.