http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Mac10.6/builds/14439/steps/webkit_tests/logs/stdio

Exiting early after 0 crashes and 20 timeouts. 21784 tests run.

Regressions: Unexpected tests timed out : (20)
  animations/cross-fade-background-image.html = TIMEOUT
  compositing/geometry/empty-embed-rects.html = TIMEOUT
  compositing/self-painting-layers.html = TIMEOUT
  compositing/transitions/scale-transition-no-start.html = TIMEOUT
  css1/basic/class_as_selector.html = TIMEOUT
  css1/box_properties/acid_test.html = TIMEOUT
  css1/cascade/cascade_order.html = TIMEOUT
  css1/classification/display.html = TIMEOUT
  css1/color_and_background/background.html = TIMEOUT
  css1/conformance/forward_compatible_parsing.html = TIMEOUT
  css1/font_properties/font.html = TIMEOUT
  css1/pseudo/anchor.html = TIMEOUT
  fast/forms/search-rtl.html = TIMEOUT
  fast/images/embed-does-not-propagate-dimensions-to-object-ancestor.html = TIMEOUT
  fast/loader/local-CSS-from-local.html = TIMEOUT
  fast/table/invisible-cell-background.html = TIMEOUT
  fast/text/international/plane2.html = TIMEOUT
  fast/text/justify-ideograph-complex.html = TIMEOUT
  fast/workers/storage/interrupt-database.html = TIMEOUT
  http/tests/appcache/remove-cache.html = TIMEOUT
I'm on this one ...
Created attachment 135477 [details]
Patch
Comment on attachment 135477 [details]
Patch

Bleh. Why not just up the amount of RAM we expect child processes to take? That seems like a less gross hack.
Created attachment 135478 [details]
Add ChangeLog, port same logic to Apple Mac
(In reply to comment #3)
> (From update of attachment 135477 [details])
> Bleh. Why not just up the amount of RAM we expect child processes to take? That seems like a less gross hack.

At the moment, at least on the Chromium SL bot, it doesn't look RAM-related. It looks like we're thrashing on something else, but we have plenty of RAM free.
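For context on the suggestion above, the knob being discussed is how a test harness caps parallelism based on memory rather than just CPU count. The sketch below is illustrative only and is not webkitpy's actual implementation; the function name, the default per-child figure, and the free-memory query are all assumptions:

    # Hypothetical sketch of a memory-aware worker-count heuristic.
    # Not webkitpy's real code; names and numbers are illustrative.
    import multiprocessing
    import os

    def default_child_process_count(expected_ram_per_child_mb=400):
        """Bound the worker count by both CPU count and free memory."""
        cpu_count = multiprocessing.cpu_count()
        try:
            # Rough POSIX free-memory estimate; a real harness would use a
            # platform-specific query (e.g. parsing vm_stat on Mac OS X).
            page_size = os.sysconf('SC_PAGE_SIZE')
            free_pages = os.sysconf('SC_AVPHYS_PAGES')
            free_mb = page_size * free_pages // (1024 * 1024)
        except (ValueError, OSError):
            return cpu_count
        # Raising expected_ram_per_child_mb (as suggested in comment #3)
        # lowers the worker count on memory-constrained bots.
        return max(1, min(cpu_count, free_mb // expected_ram_per_child_mb))

Raising the expected per-child footprint only helps if the bot is actually memory-bound, which is why the reply above questions whether RAM is the bottleneck here.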
I'm going to land this as-is, so that I can get the bot back online and we can get more data. Unfortunately, it's been flaky and aborting early for so long that I can't easily reproduce things or debug them (I've tried rolling back builds on that bot and have run into a sordid list of issues that I need to work through in parallel).
I'll be happy to roll this out if there are other issues or if we really think this is the wrong thing to do.
Committed r113122: <http://trac.webkit.org/changeset/113122>
Re-opening; I don't consider this fixed yet.
Note that we're seeing this quite a bit lately, even after the patch (see, e.g., http://build.chromium.org/p/chromium.webkit/waterfall?builder=Webkit%20Mac10.6&last_time=1336069767 ) ... it's possible that r115490 has made things worse, but I don't know what else might be contributing.
It seems like we're frequently seeing many of the same tests timing out this week, so I'm going to start marking them as flaky timeouts here and we'll see if this contains the problem, or if we're seeing systemic flakiness. Here's the first batch:

  compositing/geometry/outline-change.html
  css3/selectors3/xml/css3-modsel-161.xml
  css3/selectors3/xml/css3-modsel-166.xml
  css3/selectors3/xml/css3-modsel-166a.xml
  editing/deleting/delete-3857753-fix.html
  editing/deleting/delete-3865854-fix.html
  editing/deleting/delete-3928305-fix.html
  editing/execCommand/4747450.html
  editing/execCommand/4786404-1.html
  editing/execCommand/4786404-2.html
  editing/execCommand/4916235.html
  editing/input/caret-at-the-edge-of-input.html
  editing/execCommand/format-block-with-trailing-br.html
  editing/execCommand/format-block-without-body-crash.html
  editing/execCommand/format-block.html
  editing/execCommand/forward-delete-no-scroll.html
  editing/execCommand/hilitecolor.html
  editing/input/emacs-ctrl-o.html
  editing/input/div-first-child-rule-input.html
  editing/input/div-first-child-rule-textarea.html
  editing/input/ime-composition-clearpreedit.html
  editing/input/insert-wrapping-space-in-textarea.html
  editing/input/option-page-up-down.html
  editing/input/page-up-down-scrolls.html
  editing/inserting/12882.html
  editing/inserting/4278698.html
  http/tests/history/back-with-fragment-change.php
  http/tests/history/cross-origin-replace-history-object.html
  http/tests/history/history-navigations-set-referrer.html
  http/tests/history/popstate-fires-with-pending-requests.html
  http/tests/history/redirect-200-refresh-0-seconds.pl
  http/tests/history/redirect-200-refresh-2-seconds.pl
  http/tests/history/redirect-301.html
rniwa: it looks like these editing tests may have started being flaky earlier this week. Can you take a look?
Are you sure they're really timing out? Aren't they just slow?

I don't see any changes that can cause things to timeout:
http://trac.webkit.org/log/trunk/Source/WebCore/editing
(In reply to comment #13)
> Are you sure they're really timing out? Aren't they just slow?

Well, by definition they're timing out, but it could be because they're slow and should just be marked as slow :). If you think we should try marking them as slow instead that's fine.

> I don't see any changes that can cause things to timeout:
> http://trac.webkit.org/log/trunk/Source/WebCore/editing

Yeah, I didn't either, but I tend not to mark tests as slow unless I'm familiar with them and would expect them to take a while to run.
(In reply to comment #14)
> (In reply to comment #13)
> > Are you sure they're really timing out? Aren't they just slow?
>
> Well, by definition they're timing out, but it could be because they're slow and should just be marked as slow :). If you think we should try marking them as slow instead that's fine.

I don't mind marking the entire "editing" directory as "slow", for that matter. Many of the editing tests are integration tests and take a long time to run.
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > Are you sure they're really timing out? Aren't they just slow?
> >
> > Well, by definition they're timing out, but it could be because they're slow and should just be marked as slow :). If you think we should try marking them as slow instead that's fine.
>
> I don't mind marking the entire "editing" directory as "slow", for that matter. Many of the editing tests are integration tests and take a long time to run.

Okay, I'll update the expectations for the editing tests. Thanks!
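For anyone following along, marking a whole directory as slow amounts to a single entry in the Chromium test_expectations file. The line below is only an approximate illustration: the bug number is a placeholder and the exact syntax depends on the expectations format in use at the time:

    // Illustrative only -- placeholder bug number, approximate syntax.
    BUGWKXXXXX SLOW MAC : editing = PASS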
Here's some more ... I'm not filled with confidence in this approach:

  fast/workers/storage/multiple-databases-garbage-collection.html = TIMEOUT
  fast/workers/storage/multiple-transactions-on-different-handles-sync.html = TIMEOUT
  http/tests/history/redirect-302.html = TIMEOUT
  http/tests/history/redirect-303.html = TIMEOUT
  http/tests/misc/object-embedding-svg-delayed-size-negotiation.xhtml = TIMEOUT
  platform/chromium/virtual/gpu/canvas/philip/tests/2d.text-custom-font-load-crash.html = TIMEOUT
  platform/chromium/virtual/gpu/fast/canvas/2d.text.draw.fill.maxWidth.gradient.html = TIMEOUT
Maybe something in webkitpy is affecting the timing?
(In reply to comment #18)
> Maybe something in webkitpy is affecting the timing?

It's possible, but I don't know what it would be. I will probably let this approach run for the afternoon or so to get more data on the flakiness, and if it doesn't clear up I will try going back to --test-shell mode.

As I've noted elsewhere, one aspect of using DRT mode is that NRWT itself enforces the timeout and kills DRT when a test times out; maybe this is leaving something in an unhappy state with the OS, or we're leaving things locked somewhere, and that's causing things to go downhill.
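To make the concern above concrete, here is a minimal sketch of the harness-side pattern being described: the runner (not DRT itself) waits on the child and kills it when the per-test timeout fires. This is not NRWT's actual code; the function name is hypothetical and it assumes a modern Python with subprocess timeout support:

    # Illustrative sketch of harness-enforced timeouts; not NRWT's real code.
    import subprocess

    def run_test_with_timeout(cmd, test_input, timeout_seconds):
        """Run one test in a child process; kill it if the timeout expires."""
        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE)
        try:
            out, _ = proc.communicate(input=test_input,
                                      timeout=timeout_seconds)
            return out, False  # (output, timed_out)
        except subprocess.TimeoutExpired:
            # Killing the child here is the step that may leave OS-level
            # resources (locks, temp files, databases) in a bad state.
            proc.kill()
            proc.wait()
            return b'', True

The suspicion in this comment is exactly that last branch: repeatedly killing a hung DRT may leave something behind that makes subsequent tests slower or flakier.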
I feel like we're playing whack-a-mole and even if we find the culprit, tests we mark as timeout/slow now will be forgotten. I would feel better about reverting changes until the bots improve. Once the bots improve, we can reland patches (maybe with speculative fixes) to isolate the cause. I.e., I would handle unknown flakiness the same way we handle perf regressions.
(In reply to comment #20)
> I feel like we're playing whack-a-mole and even if we find the culprit, tests we mark as timeout/slow now will be forgotten.

This is a valid concern.

> I would feel better about reverting changes until the bots improve. Once the bots improve, we can reland patches (maybe with speculative fixes) to isolate the cause. I.e., I would handle unknown flakiness the same way we handle perf regressions.

Apart from the one Python change -- which I'm already planning to revert to see if it helps -- any suggestions for which other changes to revert?
Okay, I've switched back to "test shell" mode on SL in http://trac.webkit.org/changeset/116161. Let's see what happens now.
Looking at the waterfall, it looks like the set of failing tests isn't at all consistent. I doubt adding suppressions will green the tree.

Here's the first set of timeouts I see. It's from the beginning of Wednesday:
http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Mac10.6/builds/15522/steps/webkit_tests/logs/stdio

But zmo said the flakiness started earlier, maybe last Friday? Here are the changes that touch NRWT code around that time:

  115377
  115452
  115490
  115729?

None of the changes look that suspect, but I don't know of any other way to determine the cause of the regression.
(In reply to comment #23)
> Looking at the waterfall, it looks like the set of failing tests isn't at all consistent. I doubt adding suppressions will green the tree.
>
> Here's the first set of timeouts I see. It's from the beginning of Wednesday:
> http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Mac10.6/builds/15522/steps/webkit_tests/logs/stdio

There are definitely timeouts earlier, e.g.:
http://build.chromium.org/p/chromium.webkit/waterfall?last_time=1335833009&show=Webkit%20Mac10.6

> But zmo said the flakiness started earlier, maybe last Friday? Here are the changes that touch NRWT code around that time:
>
> 115377
> 115452
> 115490
> 115729?
>
> None of the changes look that suspect, but I don't know of any other way to determine the cause of the regression.

Well, 115490 is definitely suspicious (and already disabled, so now we're just waiting). You can see a marked uptick in flakiness in the first build after that change:
http://build.chromium.org/p/chromium.webkit/waterfall?force=true&last_time=1335569069&show=Webkit%20Mac10.6
(see build 15326, in particular).
Closing this as WORKSFORME (the status is debatable; it probably could be WONTFIX or FIXED as well). For whatever reason, our old Xserves appear to be flaky in the release build. Since we haven't seen this issue anywhere else, and we've migrated off of the Xserves, we're going to ignore this.