WebKit Bugzilla
RESOLVED WORKSFORME
83076
[Chromium] Lots of timeouts causing Mac10.6 to exit early.
https://bugs.webkit.org/show_bug.cgi?id=83076
Ojan Vafai
Reported
2012-04-03 14:49:33 PDT
http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Mac10.6/builds/14439/steps/webkit_tests/logs/stdio
Exiting early after 0 crashes and 20 timeouts. 21784 tests run.
Regressions: Unexpected tests timed out : (20)
animations/cross-fade-background-image.html = TIMEOUT
compositing/geometry/empty-embed-rects.html = TIMEOUT
compositing/self-painting-layers.html = TIMEOUT
compositing/transitions/scale-transition-no-start.html = TIMEOUT
css1/basic/class_as_selector.html = TIMEOUT
css1/box_properties/acid_test.html = TIMEOUT
css1/cascade/cascade_order.html = TIMEOUT
css1/classification/display.html = TIMEOUT
css1/color_and_background/background.html = TIMEOUT
css1/conformance/forward_compatible_parsing.html = TIMEOUT
css1/font_properties/font.html = TIMEOUT
css1/pseudo/anchor.html = TIMEOUT
fast/forms/search-rtl.html = TIMEOUT
fast/images/embed-does-not-propagate-dimensions-to-object-ancestor.html = TIMEOUT
fast/loader/local-CSS-from-local.html = TIMEOUT
fast/table/invisible-cell-background.html = TIMEOUT
fast/text/international/plane2.html = TIMEOUT
fast/text/justify-ideograph-complex.html = TIMEOUT
fast/workers/storage/interrupt-database.html = TIMEOUT
http/tests/appcache/remove-cache.html = TIMEOUT
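The log above shows the run stopping once 20 timeouts had accumulated. The following is a minimal sketch of that kind of early-exit check, not the harness's actual code. The names `RunState` and `record_result` are hypothetical, and the threshold of 20 is an assumption inferred from the log line rather than a confirmed setting.

```python
from dataclasses import dataclass, field


@dataclass
class RunState:
    """Running tallies for a hypothetical test run."""
    crashes: int = 0
    timeouts: int = 0
    tests_run: int = 0
    timed_out_tests: list = field(default_factory=list)


def record_result(state, test_name, outcome, max_crashes_or_timeouts=20):
    """Record one test outcome and report whether the run should stop early.

    The default limit of 20 is an assumption inferred from the log above.
    """
    state.tests_run += 1
    if outcome == "CRASH":
        state.crashes += 1
    elif outcome == "TIMEOUT":
        state.timeouts += 1
        state.timed_out_tests.append(test_name)
    # Stop as soon as crashes and timeouts together reach the limit.
    return state.crashes + state.timeouts >= max_crashes_or_timeouts
```

With a check like this, a burst of unrelated timeouts ends the run before the remaining tests execute, which is why the bot reports only a partial result.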
Attachments
Patch (1.71 KB, patch), 2012-04-03 18:21 PDT, Dirk Pranke, no flags
add Changelog, port same logic to apple mac (3.41 KB, patch), 2012-04-03 18:29 PDT, Dirk Pranke, no flags
Dirk Pranke
Comment 1
2012-04-03 15:01:34 PDT
I'm on this one ...
Dirk Pranke
Comment 2
2012-04-03 18:21:47 PDT
Created attachment 135477: Patch
Eric Seidel (no email)
Comment 3
2012-04-03 18:23:43 PDT
Comment on attachment 135477: Patch
Bleh. Why not just up the amount of ram we expect child processes to take? That seems like a less gross hack.
Dirk Pranke
Comment 4
2012-04-03 18:29:52 PDT
Created attachment 135478: add Changelog, port same logic to apple mac
Dirk Pranke
Comment 5
2012-04-03 18:32:54 PDT
(In reply to comment #3)
> (From update of attachment 135477)
> Bleh. Why not just up the amount of ram we expect child processes to take? That seems like a less gross hack.
At the moment, at least on the Chromium SL bot, it doesn't look ram-related. It looks like we're thrashing on something else, but have plenty of RAM free.
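The suggestion in comment 3 amounts to budgeting more memory per child process when deciding how many to start. As a rough illustration only (this is not the attached patch, and `default_child_processes` and its parameters are hypothetical names), such a calculation might look like:

```python
def default_child_processes(cpu_count, free_memory_bytes, bytes_per_child):
    """Return how many test-running child processes to start.

    The count is capped both by the number of CPUs and by how many children
    fit in free memory. At least one child is always started.
    All parameter names here are illustrative assumptions.
    """
    if bytes_per_child <= 0:
        raise ValueError("bytes_per_child must be positive")
    fit_in_memory = free_memory_bytes // bytes_per_child
    return max(1, min(cpu_count, fit_in_memory))
```

Raising `bytes_per_child` lowers the result only when memory is the binding limit. If plenty of RAM is free, as reported for the SL bot, the CPU cap dominates and the memory estimate has no effect.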
Dirk Pranke
Comment 6
2012-04-03 18:35:01 PDT
I'm going to land this as-is, so that I can get the bot back online and we can get more data. Unfortunately, it's been flaky and aborting early for so long that I can't easily reproduce things or figure out how to debug it (I've tried rolling back builds on that bot and have run into a sordid list of issues that are stopping me, which I need to work through in parallel).
Dirk Pranke
Comment 7
2012-04-03 18:35:24 PDT
I'll be happy to roll this out if there are other issues or if we really think this is the wrong thing to do.
Dirk Pranke
Comment 8
2012-04-03 18:40:54 PDT
Committed r113122: <http://trac.webkit.org/changeset/113122>
Dirk Pranke
Comment 9
2012-04-03 18:41:27 PDT
re-opening, I don't consider this fixed yet.
Dirk Pranke
Comment 10
2012-05-03 18:21:12 PDT
Note that we're seeing this quite a bit lately, even after the patch (see, e.g., http://build.chromium.org/p/chromium.webkit/waterfall?builder=Webkit%20Mac10.6&last_time=1336069767 ) ... it's possible that r115490 has made things worse, but I don't know what else might be contributing.
Dirk Pranke
Comment 11
2012-05-04 11:58:19 PDT
It seems like we're frequently seeing many of the same tests timing out this week, so I'm going to start marking them as flaky timeouts here and we'll see if this contains the problem, or if we're seeing systemic flakiness. Here's the first batch:
compositing/geometry/outline-change.html
css3/selectors3/xml/css3-modsel-161.xml
css3/selectors3/xml/css3-modsel-166.xml
css3/selectors3/xml/css3-modsel-166a.xml
editing/deleting/delete-3857753-fix.html
editing/deleting/delete-3865854-fix.html
editing/deleting/delete-3928305-fix.html
editing/execCommand/4747450.html
editing/execCommand/4786404-1.html
editing/execCommand/4786404-2.html
editing/execCommand/4916235.html
editing/input/caret-at-the-edge-of-input.html
editing/execCommand/format-block-with-trailing-br.html
editing/execCommand/format-block-without-body-crash.html
editing/execCommand/format-block.html
editing/execCommand/forward-delete-no-scroll.html
editing/execCommand/hilitecolor.html
editing/input/emacs-ctrl-o.html
editing/input/div-first-child-rule-input.html
editing/input/div-first-child-rule-textarea.html
editing/input/ime-composition-clearpreedit.html
editing/input/insert-wrapping-space-in-textarea.html
editing/input/option-page-up-down.html
editing/input/page-up-down-scrolls.html
editing/inserting/12882.html
editing/inserting/4278698.html
http/tests/history/back-with-fragment-change.php
http/tests/history/cross-origin-replace-history-object.html
http/tests/history/history-navigations-set-referrer.html
http/tests/history/popstate-fires-with-pending-requests.html
http/tests/history/redirect-200-refresh-0-seconds.pl
http/tests/history/redirect-200-refresh-2-seconds.pl
http/tests/history/redirect-301.html
Dirk Pranke
Comment 12
2012-05-04 12:00:12 PDT
rniwa - it looks like maybe these editing tests started being flaky earlier this week. Can you take a look?
Ryosuke Niwa
Comment 13
2012-05-04 12:05:41 PDT
Are you sure they're really timing out? Aren't they just slow? I don't see any changes that can cause things to timeout:
http://trac.webkit.org/log/trunk/Source/WebCore/editing
Dirk Pranke
Comment 14
2012-05-04 12:11:09 PDT
(In reply to comment #13)
> Are you sure they're really timing out? Aren't they just slow?
Well, by definition they're timing out, but it could be because they're slow and should just be marked as slow :). If you think we should try marking them as slow instead that's fine.
> I don't see any changes that can cause things to timeout:
> http://trac.webkit.org/log/trunk/Source/WebCore/editing
Yeah, I didn't either, but I don't tend to like to mark tests as slow unless I'm familiar with them and would expect them to take a while to run.
Ryosuke Niwa
Comment 15
2012-05-04 12:12:18 PDT
(In reply to comment #14)
> (In reply to comment #13)
> > Are you sure they're really timing out? Aren't they just slow?
>
> Well, by definition they're timing out, but it could be because they're slow and should just be marked as slow :). If you think we should try marking them as slow instead that's fine.
I don't mind marking the entire "editing" directory as "slow" for that matter. Many of the editing tests are integration tests and take a long time to run.
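The distinction being discussed in comments 13 to 15 is that a slow test and a timed-out test can be the same test measured against different time limits. A minimal sketch, assuming that tests marked slow simply receive a longer limit; the multiplier of 5 and the function names are illustrative assumptions, not values taken from the harness:

```python
def time_limit_ms(base_timeout_ms, is_slow, slow_multiplier=5):
    """Return the time limit for a test; slow tests get a longer limit.

    The multiplier is a hypothetical value chosen for illustration.
    """
    return base_timeout_ms * slow_multiplier if is_slow else base_timeout_ms


def classify(elapsed_ms, base_timeout_ms, is_slow=False, slow_multiplier=5):
    """Classify a finished or abandoned test by comparing it to its limit."""
    limit = time_limit_ms(base_timeout_ms, is_slow, slow_multiplier)
    return "TIMEOUT" if elapsed_ms >= limit else "COMPLETED"
```

Under this model, marking a directory as slow hides tests that run long but finish, while a test that is genuinely hung still times out, only later.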
Dirk Pranke
Comment 16
2012-05-04 12:14:29 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > Are you sure they're really timing out? Aren't they just slow?
> >
> > Well, by definition they're timing out, but it could be because they're slow and should just be marked as slow :). If you think we should try marking them as slow instead that's fine.
>
> I don't mind marking the entire "editing" directory as "slow" for that matter. Many of editing tests are integration tests and take a long time to run.
Okay, I'll update the expectations for editing tests. Thanks!
Dirk Pranke
Comment 17
2012-05-04 12:45:11 PDT
Here's some more ... I'm not filled with confidence in this approach:
fast/workers/storage/multiple-databases-garbage-collection.html = TIMEOUT
fast/workers/storage/multiple-transactions-on-different-handles-sync.html = TIMEOUT
http/tests/history/redirect-302.html = TIMEOUT
http/tests/history/redirect-303.html = TIMEOUT
http/tests/misc/object-embedding-svg-delayed-size-negotiation.xhtml = TIMEOUT
platform/chromium/virtual/gpu/canvas/philip/tests/2d.text-custom-font-load-crash.html = TIMEOUT
platform/chromium/virtual/gpu/fast/canvas/2d.text.draw.fill.maxWidth.gradient.html = TIMEOUT
Ryosuke Niwa
Comment 18
2012-05-04 12:53:54 PDT
Maybe something in webkitpy is affecting the timing?
Dirk Pranke
Comment 19
2012-05-04 12:59:45 PDT
(In reply to comment #18)
> Maybe something in webkitpy is affecting the timing?
It's possible, but I don't know what it would be. I will probably let this approach go for the afternoon or so to get more data on the flakiness, and if it doesn't clear up I will try going back to --test-shell mode. As I've noted elsewhere, one aspect of using DRT mode is that NRWT itself enforces the timeout and kills DRT when the test times out; maybe this is leaving something in an unhappy state w/ the O/S, or we're leaving things locked somewhere, and that's causing things to go downhill.
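For readers unfamiliar with the pattern described in comment 19, here is a minimal sketch of a harness that enforces the timeout itself and kills the child when it expires. It uses only Python's standard `subprocess` module and is not NRWT's code; `run_with_timeout` and the PASS/FAIL/TIMEOUT labels are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv; kill it and report TIMEOUT if it exceeds timeout_seconds."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        stdout, _ = proc.communicate(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()         # forcibly stop the child from outside
        proc.communicate()  # reap it so no zombie or open pipe is left behind
        return "TIMEOUT", b""
    return ("PASS" if proc.returncode == 0 else "FAIL"), stdout


if __name__ == "__main__":
    # A child that finishes well inside its limit.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], 10))
```

Because the child is killed from outside rather than shutting down on its own, it gets no chance to release locks or clean up temporary state, which is the kind of lingering effect the comment speculates about.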
Tony Chang
Comment 20
2012-05-04 13:32:25 PDT
I feel like we're playing whack-a-mole and even if we find the culprit, tests we mark as timeout/slow now will be forgotten. I would feel better about reverting changes until the bots improve. Once the bots improve, we can reland patches (maybe with speculative fixes) to isolate the cause. I.e., I would handle unknown flakiness the same way we handle perf regressions.
Dirk Pranke
Comment 21
2012-05-04 13:35:58 PDT
(In reply to comment #20)
> I feel like we're playing whack-a-mole and even if we find the culprit, tests we mark as timeout/slow now will be forgotten.
This is a valid concern.
> I would feel better about reverting changes until the bots improve. Once the bots improve, we can reland patches (maybe with speculative fixes) to isolate the cause. I.e., I would handle unknown flakiness the same way we handle perf regressions.
Apart from the one python change -- which I'm already planning to revert to see if it helps -- any suggestions for what other changes to revert?
Dirk Pranke
Comment 22
2012-05-04 13:46:12 PDT
Okay, I've switched back to "test shell" mode on SL in http://trac.webkit.org/changeset/116161 . Let's see what happens now.
Tony Chang
Comment 23
2012-05-04 14:17:47 PDT
Looking at the waterfall, it looks like the set of failing tests isn't at all consistent. I doubt adding suppressions will green the tree. Here's the first set of timeouts I see. It's from the beginning of Wednesday.
http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Mac10.6/builds/15522/steps/webkit_tests/logs/stdio
But zmo said the flakiness started earlier, maybe last Friday? Here are NRWT changes that touch NRWT code around that time: 115377 115452 115490 115729? None of the changes look that suspect, but I don't know of any other way to determine the cause of the regression.
Dirk Pranke
Comment 24
2012-05-04 14:28:18 PDT
(In reply to comment #23)
> Looking at the waterfall, it looks like the set of failing tests isn't at all consistent. I doubt adding suppressions will green the tree.
>
> Here's the first set of timeouts I see. It's from the beginning of Wednesday.
> http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Mac10.6/builds/15522/steps/webkit_tests/logs/stdio
There are definitely timeouts earlier, e.g.: http://build.chromium.org/p/chromium.webkit/waterfall?last_time=1335833009&show=Webkit%20Mac10.6
> But zmo said the flakiness started earlier, maybe last Friday? Here are NRWT changes that touch NRWT code around that time:
>
> 115377
> 115452
> 115490
> 115729?
>
> None of the changes look that suspect, but I don't know of any other way to determine the cause of the regression.
Well, 115490 is definitely suspicious (and already disabled, so now we're just waiting). You can see a marked uptick in flakiness in the first build after that change: http://build.chromium.org/p/chromium.webkit/waterfall?force=true&last_time=1335569069&show=Webkit%20Mac10.6 (see build 15326, in particular).
Dirk Pranke
Comment 25
2012-05-18 15:10:43 PDT
Closing this as WORKSFORME (the status is debatable; it probably could be WONTFIX or FIXED as well). For whatever reason, our old Xserves appear to be flaky in the release build. Since we haven't seen this issue anywhere else, and we've migrated off of the Xserves, we're gonna ignore this.