Summary: | [Chromium-Android] Apply all Linux applicable layout test expectations | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Xianzhu Wang <wangxianzhu> | ||||||||
Component: | Tools / Tests | Assignee: | Xianzhu Wang <wangxianzhu> | ||||||||
Status: | RESOLVED FIXED | ||||||||||
Severity: | Normal | CC: | abarth, dpranke, james.wei, laszlo.gombos, ojan, peter, webkit.review.bot, xingnan.wang | ||||||||
Priority: | P2 | ||||||||||
Version: | 528+ (Nightly build) | ||||||||||
Hardware: | Android | ||||||||||
OS: | Android | ||||||||||
Bug Depends on: | |||||||||||
Bug Blocks: | 66687, 93627 | ||||||||||
Attachments: |
|
Description
Xianzhu Wang
2012-07-30 09:51:40 PDT
My main concern here is maintainability. What happens when someone adds a new entry to TestExpectations for LINUX. Do we expect them to blindly add ANDROID too? If they don't, will we need to fork TestExpectations downstream? These problems all go way once we have chromium-android test bots running upstream so that folks can see whether the test actually fails on Android. Created attachment 155308 [details]
Patch
Comment on attachment 155308 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=155308&action=review > LayoutTests/platform/chromium/TestExpectations:127 > +BUGCR89468 WIN LINUX ANDROID ANDROID LION : css3/selectors3/xml/css3-modsel-d1.xml = IMAGE IMAGE+TEXT > +BUGCR89468 WIN LINUX ANDROID ANDROID LION : css3/selectors3/xml/css3-modsel-d1b.xml = IMAGE IMAGE+TEXT > +BUGCR89468 WIN LINUX ANDROID ANDROID LION : css3/selectors3/xml/css3-modsel-d2.xml = IMAGE IMAGE+TEXT > +BUGCR89468 WIN LINUX ANDROID ANDROID LION : css3/selectors3/xml/css3-modsel-d3.xml = IMAGE IMAGE+TEXT > +BUGCR89468 WIN LINUX ANDROID ANDROID LION : css3/selectors3/xml/css3-modsel-d4.xml = IMAGE IMAGE+TEXT These seem to list ANDROID already. Comment on attachment 155308 [details]
Patch
Remove review request. I think I should only add 'ANDROID' where the test is applicable.
(In reply to comment #1) > My main concern here is maintainability. What happens when someone adds a new entry to TestExpectations for LINUX. Do we expect them to blindly add ANDROID too? If they don't, will we need to fork TestExpectations downstream? > I think they can just ignore android for now. For new failures on Android, we will still use test_expectations_android.txt in downstream until it is completely upstreamed. > These problems all go way once we have chromium-android test bots running upstream so that folks can see whether the test actually fails on Android. > I think they can just ignore android for now. For new failures on Android, we will still use test_expectations_android.txt in downstream until it is completely upstreamed.
Ok. When this patch lands, would you be willing to send a message to the webkit-gardening list to that effect?
(In reply to comment #6) > Ok. When this patch lands, would you be willing to send a message to the webkit-gardening list to that effect? Sure. Created attachment 155319 [details]
Patch
I feel it's a bit early to land this, given that we're not even able to run layout tests on a bot configuration yet, and it therefore would be a void change.. Given the number of updates to the TestExpectations file, it'd definitely need another round of changes when we do activate the tester. Should we wait until we do run layout tests upstream? Furthermore, I expect a very, very big amount of failures related to other issues, i.e. the color and font issues we're seeing right now. There will be plenty of triaging to do. Do you think it may be better to start from zero, when doing that? (In reply to comment #9) > I feel it's a bit early to land this, given that we're not even able to run layout tests on a bot configuration yet, and it therefore would be a void change.. Given the number of updates to the TestExpectations file, it'd definitely need another round of changes when we do activate the tester. Should we wait until we do run layout tests upstream? > I'm also hesitating. I made this change mainly for downstream to fix the current about 200 failures. There is also a downstream change to add the failure expectations in test_expectations_android.txt. I have no preference between them. Another way is to fix ChromiumAndroidPort.test_expectations() so that we can still automatically inherit Linux expectations, but I don't like it because it is still temporary. What's your opinion? > Furthermore, I expect a very, very big amount of failures related to other issues, i.e. the color and font issues we're seeing right now. There will be plenty of triaging to do. Do you think it may be better to start from zero, when doing that? Are these failures already in the downstream test_expectations_android.txt? If not, there must be caused by some difference between upstream/downstream. Comment on attachment 155319 [details]
Patch
Now I feel the same as Peter: it's early to do this. I will do it in downstream first.
(In reply to comment #10) > I'm also hesitating. I made this change mainly for downstream to fix the current about 200 failures. There is also a downstream change to add the failure expectations in test_expectations_android.txt. I have no preference between them. Another way is to fix ChromiumAndroidPort.test_expectations() so that we can still automatically inherit Linux expectations, but I don't like it because it is still temporary. What's your opinion? Downstream we have an Android-specific file with test expectations, but it's hidden by the diff tool. Before deciding which way to go, I think it's important to assess where exactly we are in terms of layout test "performance". > Are these failures already in the downstream test_expectations_android.txt? If not, there must be caused by some difference between upstream/downstream. Parts will be. I'm still in favor of removing some of the text-related code we have in place to mimic Linux as closely as possible. We should be testing code paths that are being used in our product, even if that comes at the cost of having to maintain more baselines. There are tools to do that anyway :). (In reply to comment #12) > Downstream we have an Android-specific file with test expectations, but it's hidden by the diff tool. Before deciding which way to go, I think it's important to assess where exactly we are in terms of layout test "performance". > What do you mean about "layout test performance"? > > Parts will be. I'm still in favor of removing some of the text-related code we have in place to mimic Linux as closely as possible. We should be testing code paths that are being used in our product, even if that comes at the cost of having to maintain more baselines. There are tools to do that anyway :). In fact chromium-linux does the similar things about fonts to match chromium-win expectations. The difference that chromium-linux uses a special fontconfig file while chromium-android uses several lines of hard-coded font rendering style code and Android-specific font configuration files. Without this I guess the complexity of maintenance work would increase by an order of magnitude. Based on our data collected before (around 12/2010), we would need to rebaseline more than 4000 text expectations without the font configurations. (In reply to comment #13) > What do you mean about "layout test performance"? Number of passing v.s. failing tests. > In fact chromium-linux does the similar things about fonts to match chromium-win expectations. The difference that chromium-linux uses a special fontconfig file while chromium-android uses several lines of hard-coded font rendering style code and Android-specific font configuration files. Without this I guess the complexity of maintenance work would increase by an order of magnitude. Based on our data collected before (around 12/2010), we would need to rebaseline more than 4000 text expectations without the font configurations. I'm not sure. Adam, what would you advise? It's the many custom baseline v.s. not testing exactly what we're using question. (In reply to comment #14) > (In reply to comment #13) > > What do you mean about "layout test performance"? > > Number of passing v.s. failing tests. > As of 6/12/2012, 1% tests were failing to be fixed/investigated, 2.5% tests were failing but won't be fixed (because they are not applicable on Android). Note these are Android-specific, i.e. tests that were also failing on Linux are not counted. We have 335 lines (containing '=') in downstream test_expectations_android.txt. Created attachment 157260 [details]
Patch
I think we'd better make the change now otherwise we need to update individual expectations which will cost us more. For the issues of font rendering and color, I think we can work on them concurrently. Comment on attachment 157260 [details]
Patch
I'm fine with this as long as Peter is happy.
While I still feel a bit wary, go for it. We should make up a final plan about layout tests/WebKit infrastructure offline, to make sure that we're all on the same line. I'll start a thread about it tomorrow. Should we instead just make android a subset of linux? In other words, LINUX would imply LUCID and ANDROID instead of just LUCID. This is the same as MAC implying LION, SNOWLEOPARD, MOUNTAINLION. > Should we instead just make android a subset of linux?
Nope. There have been long discussions about that topic, and the result of those discussions is that considering Android to be a flavor of Linux causes more problems than it fixes. We're just being consistent with that decision here.
Comment on attachment 157260 [details]
Patch
Thanks all for review and comments. Just ran 'new-run-webkit-tests --no-pixel-tests fast' with the changed expectations. Results:
8215 tests ran as expected, 169 didn't
Expected to timeout, but passed (2)
Expected to fail, but passed (6)
Regressions: Unexpected text failures : (128)
Regressions: Unexpected image-only failures : (1)
Regressions: Unexpected crashes : (9)
Regressions: Unexpected timeouts : (23)
In the 169 failed tests, 155 of them have been in downstream text_expectations_android.txt or have been rebaselined in downstream (which will be upstreamed). Only 14 of them are real unexpected which might be because of new changes not merged into downstream. I think some of the text mismatches Peter observed were because of this bug.
About the concern about not testing real product paths, my thoughts are:
1) about using the customized fonts, I think this is not another product path because the different Android devices might also use different fonts;
2) about font rendering styles, the differences are about hinting and sub-pixel positioning. This is a trade-off between running real code path and thousands of tests to be rebaselined. In addition, we could test product code path by explicitly set hinting and sub-pixel positioning in particular platform-specific tests, like platform/chromium-linux/fast/text/chromium-linux-text-subpixel-positioning.html.
3) about scroll bar overlay, my opinion is the same as 2).
Comment on attachment 157260 [details] Patch Clearing flags on attachment: 157260 Committed r125094: <http://trac.webkit.org/changeset/125094> All reviewed patches have been landed. Closing bug. *** Bug 90515 has been marked as a duplicate of this bug. *** *** Bug 90520 has been marked as a duplicate of this bug. *** |