Bug 140819 - Fix the false positive build failures on the Windows buildbots
Summary: Fix the false positive build failures on the Windows buildbots
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Csaba Osztrogonác
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-01-23 00:43 PST by Csaba Osztrogonác
Modified: 2015-01-27 09:50 PST

See Also:


Attachments
Patch (1.29 KB, patch)
2015-01-23 00:44 PST, Csaba Osztrogonác
Patch (1.41 KB, patch)
2015-01-23 01:09 PST, Csaba Osztrogonác
Patch (1.82 KB, patch)
2015-01-23 09:17 PST, Csaba Osztrogonác

Description Csaba Osztrogonác 2015-01-23 00:43:00 PST
examples:
https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/66798
https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/66807
https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/66832
https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/66849

The problem is that buildbot kills the compile step when it has produced no output for 20 minutes.
If a change is too big, the build takes more than 20 minutes on the Windows bots,
and Visual Studio Express doesn't write anything to stdout while building.

The only fix here is to increase the timeout of this buildstep.
Comment 1 Csaba Osztrogonác 2015-01-23 00:44:48 PST
Created attachment 245215 [details]
Patch
Comment 2 Csaba Osztrogonác 2015-01-23 01:09:40 PST
Created attachment 245217 [details]
Patch
Comment 3 Brent Fulgham 2015-01-23 08:48:50 PST
Comment on attachment 245217 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=245217&action=review

I'm in favor of this change, but I think some of the tools people should have final say. It certainly takes longer than 10 minutes to build on Windows from a clean slate; as things currently stand I think we hit this timeout any time we have to rebuild most of WebKit.

> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:203
> +        kwargs['timeout'] = 60 * 60

It certainly can take an hour or so on Windows for a clean build, but I think most of our Mac bots do it in less time. Is there any way to configure this so that Windows has a long timeout and perhaps leave others alone?

Or do other bots have a similar timeout issue and we should just change it across the board?
Comment 4 David Kilzer (:ddkilzer) 2015-01-23 08:56:28 PST
Comment on attachment 245217 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=245217&action=review

>> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:203
>> +        kwargs['timeout'] = 60 * 60
> 
> It certainly can take an hour or so on Windows for a clean build, but I think most of our Mac bots do it in less time. Is there any way to configure this so that Windows has a long timeout and perhaps leave others alone?
> 
> Or do other bots have a similar timeout issue and we should just change it across the board?

If it's only Windows that has this issue, we should only set the timeout to be 2 hours on Windows bots.

Can we use self.getProperty('platform') here (or kwargs['platform']) to only set this for Windows bots?
Comment 5 Brent Fulgham 2015-01-23 08:58:30 PST
(In reply to comment #0)
> examples:
> https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/66798
> https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/66807
> https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/66832
> https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/66849
> 
> The problem is that buildbot kills the compile step when it has produced no
> output for 20 minutes.
> If a change is too big, the build takes more than 20 minutes on the
> Windows bots,
> and Visual Studio Express doesn't write anything to stdout while
> building.
> 
> The only fix here is to increase the timeout of this buildstep.

I wonder if there's any way to output to stdout from the build system as another way to avoid this.
Comment 6 Csaba Osztrogonác 2015-01-23 08:59:55 PST
(In reply to comment #3)
> It certainly can take an hour or so on Windows for a clean build, but I
> think most of our Mac bots do it in less time. Is there any way to configure
> this so that Windows has a long timeout and perhaps leave others alone?
> 
> Or do other bots have a similar timeout issue and we should just change it
> across the board?

The default 20-minute timeout doesn't mean that the build has to finish
in 20 minutes. The buildmaster kills the build only if it produces no
output for 20 minutes. That isn't a problem for the Linux and Mac builders, but
Visual Studio Express doesn't produce any output during the build.
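
The no-output timeout described above can be pictured with a minimal Python sketch. This is not buildbot's implementation; the function name `run_with_idle_timeout` and its return value are hypothetical and exist only for illustration. The point is that the clock restarts on every line of output, so a build that keeps printing may run for any length of time, while a silent one is killed once the idle limit passes.

```python
import queue
import subprocess
import threading


def run_with_idle_timeout(cmd, idle_timeout):
    """Run cmd and kill it if it prints nothing for idle_timeout seconds.

    Hypothetical helper, not buildbot code.
    Returns (exit_code, output_lines, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    pending = queue.Queue()

    def pump():
        # Forward each output line to the main thread as it arrives.
        for line in proc.stdout:
            pending.put(line)
        pending.put(None)  # sentinel: the child closed its output

    threading.Thread(target=pump, daemon=True).start()

    lines = []
    while True:
        try:
            # The wait starts over for every line, so this limits
            # silence, not total build time.
            line = pending.get(timeout=idle_timeout)
        except queue.Empty:
            proc.kill()
            proc.wait()
            return proc.returncode, lines, True
        if line is None:
            return proc.wait(), lines, False
        lines.append(line.rstrip("\n"))
```

A command that prints regularly finishes normally under this scheme; one that stays quiet for longer than `idle_timeout` is reported as timed out, which is what happens to the silent Visual Studio build.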
Comment 7 Csaba Osztrogonác 2015-01-23 09:04:02 PST
(In reply to comment #4)
> Comment on attachment 245217 [details]
> If it's only Windows that has this issue, we should only set the timeout to
> be 2 hours on Windows bots.
> 
> Can we use self.getProperty('platform') here (or kwargs['platform']) to only
> set this for Windows bots?

Unfortunately it isn't so easy, because properties aren't accessible when
the buildstep is instantiated in BuildFactory. But maybe we can pass the
platform to CompileWebKit() or instantiate it with a timeout if platform=="win".

Let me try it.
Comment 8 Csaba Osztrogonác 2015-01-23 09:17:23 PST
Created attachment 245230 [details]
Patch

I checked: CompileWebKit(timeout=xxxx) works and platform is passed to BuildFactory.__init__, so we can set the timeout for Windows only.
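
A rough sketch of the idea, kept free of buildbot imports so it runs on its own. The helper `compile_step_kwargs` and both constant names are hypothetical, and the Windows value is only illustrative: the thread mentions both 60 * 60 and two hours, and this sketch does not claim to match the landed patch. The returned dictionary is meant to be passed as keyword arguments when the compile step is created.

```python
# Default no-output limit discussed in this bug: 20 minutes.
DEFAULT_IDLE_TIMEOUT = 20 * 60

# Assumed value for illustration only (two hours, as suggested in comment 4).
WINDOWS_IDLE_TIMEOUT = 2 * 60 * 60


def compile_step_kwargs(platform):
    """Return extra keyword arguments for the compile step on `platform`.

    Hypothetical helper: only Windows gets an explicit timeout, every
    other platform keeps the default by passing nothing.
    """
    kwargs = {}
    if platform == "win":
        kwargs["timeout"] = WINDOWS_IDLE_TIMEOUT
    return kwargs
```

With this shape, a factory that already receives `platform` could create the step as `CompileWebKit(**compile_step_kwargs(platform))`, leaving the Mac and Linux builders untouched.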
Comment 9 Alexey Proskuryakov 2015-01-23 10:09:06 PST
That's an amazing find! Can't wait for these failures to be a thing of the past.

Two comments:

1. Why are Windows builds so slow? We have decent hardware as far as I know, is there some sort of misconfiguration by any chance?

This comment obviously doesn't block reviewing or landing the fix.

2. I think that a better way to fix this would be to actually pipe the logs to output as they come. This way, one could watch progress in real time on a web page, like we do for non-Windows builds. And the bot wouldn't remain stuck for a longer time when an actual freeze occurs. Is that possible to implement with Visual Studio?

I think that we should fix it this way if possible.
Comment 10 Brent Fulgham 2015-01-23 13:30:36 PST
(In reply to comment #9)
> That's an amazing find! Can't wait for these failures to be a thing of the
> past.
> 
> Two comments:
> 
> 1. Why are Windows builds so slow? We have decent hardware as far as I know,
> is there some sort of misconfiguration by any chance?
> 
> This comment obviously doesn't block reviewing or landing the fix.

The build itself is no slower than on Mac. The problem is that the way the VS build runs, it does not output any information to stdout; therefore the script thinks the build is hung.

I spent about 5 minutes looking for a setting in Visual Studio to output the "Output Window" text to stdout, but didn't find anything.

Another option might be to switch to using MSBuild directly, rather than driving it via Visual Studio. This would give us stdout logging like we have using xcodebuild on Mac.

The only downside would be having to write an MSBuild input file to control things, but that's not a huge problem.

> 2. I think that a better way to fix this would be to actually pipe the logs
> to output as they come. This way, one could watch progress in real time on a
> web page, like we do for non-Windows builds. And the bot wouldn't remain
> stuck for a longer time when an actual freeze occurs. Is that possible to
> implement with Visual Studio?
> 
> I think that we should fix it this way if possible.

I'll do a little more digging before giving up. Alternatively, we could land this patch now and revise things later if we figure out how to do it.

If we leave things as they stand, we get false build failures from time to time. With ossy's proposed patch, that wouldn't happen anymore.
Comment 11 Alexey Proskuryakov 2015-01-23 13:49:10 PST
> The build itself is no slower than on Mac

A clean OS X build takes 44 minutes on a Mac mini (see <https://build.webkit.org/builders/Apple%20Yosemite%20Release%20%28Build%29/builds/2387>).

I don't know how long each target takes, but even WebCore is likely under 20 minutes. Also, Mac mini is not all that fast.
Comment 12 Brent Fulgham 2015-01-23 13:51:50 PST
(In reply to comment #11)
> > The build itself is no slower than on Mac
> 
> A clean OS X build takes 44 minutes on a Mac mini (see
> <https://build.webkit.org/builders/Apple%20Yosemite%20Release%20%28Build%29/builds/2387>).
> 
> I don't know how long each target takes, but even WebCore is likely under 20
> minutes. Also, Mac mini is not all that fast.

Yes, but you are getting build output during that period. The Visual Studio process that build-webkit is watching produces no output until the entire build (consisting of all projects) has completed.
Comment 13 Alexey Proskuryakov 2015-01-23 13:57:49 PST
I understand how not getting the output breaks the build. What I'm saying is that it shouldn't be taking that long in the first place. And as previously mentioned, this question doesn't block the patch at all.
Comment 14 Brent Fulgham 2015-01-23 16:04:20 PST
Comment on attachment 245230 [details]
Patch

r=me. I'm filing a separate bug to deal with this more permanently.
Comment 15 WebKit Commit Bot 2015-01-23 16:49:08 PST
Comment on attachment 245230 [details]
Patch

Clearing flags on attachment: 245230

Committed r179043: <http://trac.webkit.org/changeset/179043>
Comment 16 WebKit Commit Bot 2015-01-23 16:49:14 PST
All reviewed patches have been landed.  Closing bug.
Comment 17 Csaba Osztrogonác 2015-01-27 05:38:45 PST
Just out of curiosity I checked the clean build time on the bots:
- release bot:  27 mins, 32 secs - https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/66907
- debug bot: 28 mins, 6 secs - https://build.webkit.org/builders/Apple%20Win%20Debug%20%28Build%29/builds/85014
Comment 18 Alexey Proskuryakov 2015-01-27 09:50:54 PST
This is not fast, but reasonable. Is there no stdout output until all the targets are built? We may not need to pump the output in real time if we can make each target dump the results once done.
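
One way to picture the per-target idea, as a sketch only: the helper name `build_targets` and the `(name, argv)` input shape are hypothetical, and the commands themselves are supplied by the caller, so nothing here assumes how Visual Studio or MSBuild is actually invoked. Each target runs to completion, then its captured log is printed in one piece, which resets a no-output timer as long as no single target stays silent longer than the limit.

```python
import subprocess
import sys


def build_targets(commands):
    """Run each (name, argv) pair in order and print its log when it ends.

    Hypothetical helper. Returns the name of the first failing target,
    or None if every target succeeded.
    """
    for name, argv in commands:
        result = subprocess.run(argv, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        # Dump the whole log at once, after the target has finished.
        print(f"=== {name}: exit code {result.returncode} ===")
        print(result.stdout, end="")
        sys.stdout.flush()
        if result.returncode != 0:
            return name
    return None
```

This trades real-time progress for simplicity: output appears in bursts, one per target, rather than line by line.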