http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Mac10.6/builds/14439/steps/webkit_tests/logs/stdio

Exiting early after 0 crashes and 20 timeouts. 21784 tests run.

Regressions: Unexpected tests timed out : (20)
  animations/cross-fade-background-image.html = TIMEOUT
  compositing/geometry/empty-embed-rects.html = TIMEOUT
  compositing/self-painting-layers.html = TIMEOUT
  compositing/transitions/scale-transition-no-start.html = TIMEOUT
  css1/basic/class_as_selector.html = TIMEOUT
  css1/box_properties/acid_test.html = TIMEOUT
  css1/cascade/cascade_order.html = TIMEOUT
  css1/classification/display.html = TIMEOUT
  css1/color_and_background/background.html = TIMEOUT
  css1/conformance/forward_compatible_parsing.html = TIMEOUT
  css1/font_properties/font.html = TIMEOUT
  css1/pseudo/anchor.html = TIMEOUT
  fast/forms/search-rtl.html = TIMEOUT
  fast/images/embed-does-not-propagate-dimensions-to-object-ancestor.html = TIMEOUT
  fast/loader/local-CSS-from-local.html = TIMEOUT
  fast/table/invisible-cell-background.html = TIMEOUT
  fast/text/international/plane2.html = TIMEOUT
  fast/text/justify-ideograph-complex.html = TIMEOUT
  fast/workers/storage/interrupt-database.html = TIMEOUT
  http/tests/appcache/remove-cache.html = TIMEOUT
I'm on this one ...
Created attachment 135477 [details]
Patch
Comment on attachment 135477 [details]
Patch

Bleh. Why not just up the amount of RAM we expect child processes to take? That seems like a less gross hack.
Created attachment 135478 [details]
Add ChangeLog, port same logic to Apple Mac
(In reply to comment #3)
> (From update of attachment 135477 [details])
> Bleh. Why not just up the amount of RAM we expect child processes to take? That seems like a less gross hack.

At the moment, at least on the Chromium SL bot, it doesn't look RAM-related. It looks like we're thrashing on something else, but we have plenty of RAM free.
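For context on the suggestion above, the knob being discussed is how a test harness caps parallelism based on memory rather than just CPU count. The sketch below is illustrative only and is not webkitpy's actual implementation; the function name, the default per-child figure, and the free-memory query are all assumptions:

    # Hypothetical sketch of a memory-aware worker-count heuristic.
    # Not webkitpy's real code; names and numbers are illustrative.
    import multiprocessing
    import os

    def default_child_process_count(expected_ram_per_child_mb=400):
        """Bound the worker count by both CPU count and free memory."""
        cpu_count = multiprocessing.cpu_count()
        try:
            # Rough POSIX free-memory estimate; a real harness would use a
            # platform-specific query (e.g. parsing vm_stat on Mac OS X).
            page_size = os.sysconf('SC_PAGE_SIZE')
            free_pages = os.sysconf('SC_AVPHYS_PAGES')
            free_mb = page_size * free_pages // (1024 * 1024)
        except (ValueError, OSError):
            return cpu_count
        # Raising expected_ram_per_child_mb (as suggested in comment #3)
        # lowers the worker count on memory-constrained bots.
        return max(1, min(cpu_count, free_mb // expected_ram_per_child_mb))

Raising the expected per-child footprint only helps if the bot is actually memory-bound, which is why the reply above questions whether RAM is the bottleneck here.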
I'm going to land this as-is, so that I can get the bot back online and we can get more data. Unfortunately, it's been flaky and aborting early for so long that I can't easily reproduce things or debug them (I've tried rolling back builds on that bot and have run into a sordid list of issues that I need to work through in parallel).
I'll be happy to roll this out if there are other issues or if we really think this is the wrong thing to do.
Committed r113122: <http://trac.webkit.org/changeset/113122>
Re-opening; I don't consider this fixed yet.
Note that we're seeing this quite a bit lately, even after the patch (see, e.g., http://build.chromium.org/p/chromium.webkit/waterfall?builder=Webkit%20Mac10.6&last_time=1336069767 ) ... it's possible that r115490 has made things worse, but I don't know what else might be contributing.
It seems like we're frequently seeing many of the same tests timing out this week, so I'm going to start marking them as flaky timeouts here and we'll see if this contains the problem, or if we're seeing systemic flakiness. Here's the first batch:

  compositing/geometry/outline-change.html
  css3/selectors3/xml/css3-modsel-161.xml
  css3/selectors3/xml/css3-modsel-166.xml
  css3/selectors3/xml/css3-modsel-166a.xml
  editing/deleting/delete-3857753-fix.html
  editing/deleting/delete-3865854-fix.html
  editing/deleting/delete-3928305-fix.html
  editing/execCommand/4747450.html
  editing/execCommand/4786404-1.html
  editing/execCommand/4786404-2.html
  editing/execCommand/4916235.html
  editing/input/caret-at-the-edge-of-input.html
  editing/execCommand/format-block-with-trailing-br.html
  editing/execCommand/format-block-without-body-crash.html
  editing/execCommand/format-block.html
  editing/execCommand/forward-delete-no-scroll.html
  editing/execCommand/hilitecolor.html
  editing/input/emacs-ctrl-o.html
  editing/input/div-first-child-rule-input.html
  editing/input/div-first-child-rule-textarea.html
  editing/input/ime-composition-clearpreedit.html
  editing/input/insert-wrapping-space-in-textarea.html
  editing/input/option-page-up-down.html
  editing/input/page-up-down-scrolls.html
  editing/inserting/12882.html
  editing/inserting/4278698.html
  http/tests/history/back-with-fragment-change.php
  http/tests/history/cross-origin-replace-history-object.html
  http/tests/history/history-navigations-set-referrer.html
  http/tests/history/popstate-fires-with-pending-requests.html
  http/tests/history/redirect-200-refresh-0-seconds.pl
  http/tests/history/redirect-200-refresh-2-seconds.pl
  http/tests/history/redirect-301.html
rniwa: it looks like these editing tests may have started being flaky earlier this week. Can you take a look?
Are you sure they're really timing out? Aren't they just slow?

I don't see any changes that can cause things to timeout:
http://trac.webkit.org/log/trunk/Source/WebCore/editing
(In reply to comment #13)
> Are you sure they're really timing out? Aren't they just slow?

Well, by definition they're timing out, but it could be because they're slow and should just be marked as slow :). If you think we should try marking them as slow instead that's fine.

> I don't see any changes that can cause things to timeout:
> http://trac.webkit.org/log/trunk/Source/WebCore/editing

Yeah, I didn't either, but I tend not to mark tests as slow unless I'm familiar with them and would expect them to take a while to run.
(In reply to comment #14)
> (In reply to comment #13)
> > Are you sure they're really timing out? Aren't they just slow?
>
> Well, by definition they're timing out, but it could be because they're slow and should just be marked as slow :). If you think we should try marking them as slow instead that's fine.

I don't mind marking the entire "editing" directory as "slow", for that matter. Many of the editing tests are integration tests and take a long time to run.
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > Are you sure they're really timing out? Aren't they just slow?
> >
> > Well, by definition they're timing out, but it could be because they're slow and should just be marked as slow :). If you think we should try marking them as slow instead that's fine.
>
> I don't mind marking the entire "editing" directory as "slow", for that matter. Many of the editing tests are integration tests and take a long time to run.

Okay, I'll update the expectations for the editing tests. Thanks!
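For anyone following along, marking a whole directory as slow amounts to a single entry in the Chromium test_expectations file. The line below is only an approximate illustration: the bug number is a placeholder and the exact syntax depends on the expectations format in use at the time:

    // Illustrative only -- placeholder bug number, approximate syntax.
    BUGWKXXXXX SLOW MAC : editing = PASS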
Here's some more ... I'm not filled with confidence in this approach:

  fast/workers/storage/multiple-databases-garbage-collection.html = TIMEOUT
  fast/workers/storage/multiple-transactions-on-different-handles-sync.html = TIMEOUT
  http/tests/history/redirect-302.html = TIMEOUT
  http/tests/history/redirect-303.html = TIMEOUT
  http/tests/misc/object-embedding-svg-delayed-size-negotiation.xhtml = TIMEOUT
  platform/chromium/virtual/gpu/canvas/philip/tests/2d.text-custom-font-load-crash.html = TIMEOUT
  platform/chromium/virtual/gpu/fast/canvas/2d.text.draw.fill.maxWidth.gradient.html = TIMEOUT
Maybe something in webkitpy is affecting the timing?
(In reply to comment #18)
> Maybe something in webkitpy is affecting the timing?

It's possible, but I don't know what it would be. I will probably let this approach run for the afternoon or so to get more data on the flakiness, and if it doesn't clear up I will try going back to --test-shell mode.

As I've noted elsewhere, one aspect of using DRT mode is that NRWT itself enforces the timeout and kills DRT when a test times out; maybe this is leaving something in an unhappy state with the OS, or we're leaving things locked somewhere, and that's causing things to go downhill.
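To make the concern above concrete, here is a minimal sketch of the harness-side pattern being described: the runner (not DRT itself) waits on the child and kills it when the per-test timeout fires. This is not NRWT's actual code; the function name is hypothetical and it assumes a modern Python with subprocess timeout support:

    # Illustrative sketch of harness-enforced timeouts; not NRWT's real code.
    import subprocess

    def run_test_with_timeout(cmd, test_input, timeout_seconds):
        """Run one test in a child process; kill it if the timeout expires."""
        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE)
        try:
            out, _ = proc.communicate(input=test_input,
                                      timeout=timeout_seconds)
            return out, False  # (output, timed_out)
        except subprocess.TimeoutExpired:
            # Killing the child here is the step that may leave OS-level
            # resources (locks, temp files, databases) in a bad state.
            proc.kill()
            proc.wait()
            return b'', True

The suspicion in this comment is exactly that last branch: repeatedly killing a hung DRT may leave something behind that makes subsequent tests slower or flakier.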
I feel like we're playing whack-a-mole and even if we find the culprit, tests we mark as timeout/slow now will be forgotten. I would feel better about reverting changes until the bots improve. Once the bots improve, we can reland patches (maybe with speculative fixes) to isolate the cause. I.e., I would handle unknown flakiness the same way we handle perf regressions.
(In reply to comment #20)
> I feel like we're playing whack-a-mole and even if we find the culprit, tests we mark as timeout/slow now will be forgotten.

This is a valid concern.

> I would feel better about reverting changes until the bots improve. Once the bots improve, we can reland patches (maybe with speculative fixes) to isolate the cause. I.e., I would handle unknown flakiness the same way we handle perf regressions.

Apart from the one Python change -- which I'm already planning to revert to see if it helps -- any suggestions for which other changes to revert?
Okay, I've switched back to "test shell" mode on SL in http://trac.webkit.org/changeset/116161. Let's see what happens now.
Looking at the waterfall, it looks like the set of failing tests isn't at all consistent. I doubt adding suppressions will green the tree.

Here's the first set of timeouts I see. It's from the beginning of Wednesday:
http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Mac10.6/builds/15522/steps/webkit_tests/logs/stdio

But zmo said the flakiness started earlier, maybe last Friday? Here are the changes that touch NRWT code around that time:

  115377
  115452
  115490
  115729?

None of the changes look that suspect, but I don't know of any other way to determine the cause of the regression.
(In reply to comment #23)
> Looking at the waterfall, it looks like the set of failing tests isn't at all consistent. I doubt adding suppressions will green the tree.
>
> Here's the first set of timeouts I see. It's from the beginning of Wednesday:
> http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Mac10.6/builds/15522/steps/webkit_tests/logs/stdio

There are definitely timeouts earlier, e.g.:
http://build.chromium.org/p/chromium.webkit/waterfall?last_time=1335833009&show=Webkit%20Mac10.6

> But zmo said the flakiness started earlier, maybe last Friday? Here are the changes that touch NRWT code around that time:
>
> 115377
> 115452
> 115490
> 115729?
>
> None of the changes look that suspect, but I don't know of any other way to determine the cause of the regression.

Well, 115490 is definitely suspicious (and already disabled, so now we're just waiting). You can see a marked uptick in flakiness in the first build after that change:
http://build.chromium.org/p/chromium.webkit/waterfall?force=true&last_time=1335569069&show=Webkit%20Mac10.6
(see build 15326, in particular).
Closing this as WORKSFORME (the status is debatable; it probably could be WONTFIX or FIXED as well). For whatever reason, our old Xserves appear to be flaky in the release build. Since we haven't seen this issue anywhere else, and we've migrated off of the Xserves, we're going to ignore this.