Bug 186443 - Test262-Runner: Improve files queue to optimize CPU usage/balancing
Summary: Test262-Runner: Improve files queue to optimize CPU usage/balancing
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-06-08 13:44 PDT by Leo Balter
Modified: 2018-06-19 15:43 PDT
CC List: 10 users

See Also:


Attachments
Patch (6.16 KB, patch)
2018-06-08 13:50 PDT, Leo Balter
no flags
Patch (6.10 KB, patch)
2018-06-08 14:10 PDT, Leo Balter
no flags
Patch (6.23 KB, patch)
2018-06-12 15:15 PDT, Leo Balter
no flags
Patch (5.98 KB, patch)
2018-06-12 15:16 PDT, Leo Balter
no flags
Patch (5.92 KB, patch)
2018-06-12 16:10 PDT, Leo Balter
no flags
Archive of layout-test-results from ews202 for win-future (12.90 MB, application/zip)
2018-06-13 04:43 PDT, Build Bot
no flags
Patch (6.60 KB, patch)
2018-06-13 12:25 PDT, Leo Balter
no flags
Archive of layout-test-results from ews201 for win-future (12.76 MB, application/zip)
2018-06-13 15:36 PDT, Build Bot
no flags
Patch (6.62 KB, patch)
2018-06-14 10:58 PDT, Leo Balter
no flags
Patch (6.51 KB, patch)
2018-06-14 18:47 PDT, Leo Balter
no flags
Archive of layout-test-results from ltilve-gtk-wk2-ews for gtk-wk2 (2.88 MB, application/zip)
2018-06-18 07:30 PDT, Igalia-pontevedra EWS
no flags
Patch (6.57 KB, patch)
2018-06-18 10:57 PDT, Leo Balter
no flags

Description Leo Balter 2018-06-08 13:44:21 PDT
Test262-Runner: Improve files queue to optimize CPU usage/balancing
Comment 1 Leo Balter 2018-06-08 13:50:22 PDT
Created attachment 342318 [details]
Patch
Comment 2 Leo Balter 2018-06-08 14:10:18 PDT
Created attachment 342324 [details]
Patch
Comment 3 Filip Pizlo 2018-06-08 15:11:32 PDT
Comment on attachment 342324 [details]
Patch

Why is it necessary to only do for Kim when the number of files is large?

Why is there chunking?

Would it be possible to start nproc = ncpu + 1, and then have a balancer in proc0 that, if asked to do so via a request on a pipe, will bend a test index. All other processes ask proc0 for a test after each test they run. Running a test requires starting a VM, right? So that’s way more expensive than a packet of data on a pipe (or socketpair). 

That’s what I meant by load balancer. I’m not sure exactly what this patch achieves - perhaps an improvement in some specific case of load balancing. Maybe you could quote the speedup in the changelog?
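
A minimal sketch of the vending scheme proposed in this comment, under stated assumptions: it is not code from any attached patch. The names run_test and run_balanced, the message format, and the choice of one shared request pipe plus one reply pipe per worker are all hypothetical. Each worker asks the parent (proc0) for a test index after every test it runs, and the parent hands out the next index or DONE.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();

# Hypothetical stand-in for launching a VM and running test number $index.
sub run_test {
    my ($index) = @_;
    return $index * 2;
}

sub run_balanced {
    my ($num_tests, $num_workers) = @_;

    # One shared pipe carries the requests of every worker to the parent.
    pipe(my $req_r, my $req_w) or die "pipe: $!";

    my (@reply_w, @pids);
    for my $id (0 .. $num_workers - 1) {
        # One private pipe per worker carries the parent's replies.
        pipe(my $rep_r, my $rep_w) or die "pipe: $!";
        $rep_w->autoflush(1);

        my $pid = fork();
        die "fork: $!" unless defined $pid;

        if ($pid == 0) {
            # Worker: request, wait for an index, run it, repeat.
            close $req_r;
            close $rep_w;
            my $msg = "$id ready\n";
            while (1) {
                # Short writes to a pipe are atomic, so requests from
                # different workers do not interleave.
                syswrite($req_w, $msg);
                my $line = <$rep_r>;
                last if !defined $line || $line eq "DONE\n";
                chomp $line;
                $msg = "$id $line " . run_test($line) . "\n";
            }
            POSIX::_exit(0);
        }

        close $rep_r;
        push @reply_w, $rep_w;
        push @pids, $pid;
    }
    close $req_w;    # the parent only reads requests

    my $next = 0;
    my %results;
    # Blocks until some worker asks; ends at EOF once every worker has exited.
    while (my $line = <$req_r>) {
        chomp $line;
        my ($id, $index, $result) = split ' ', $line;
        $results{$index} = $result if $index ne 'ready';
        if ($next < $num_tests) {
            print { $reply_w[$id] } $next++, "\n";
        } else {
            print { $reply_w[$id] } "DONE\n";
        }
    }

    waitpid($_, 0) for @pids;
    return \%results;
}
```

Calling run_balanced(10, 3) would spread ten test indexes over three workers, with a fast worker simply asking more often than a slow one. The parent never polls: it sleeps in a blocking read until a request arrives.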
Comment 4 Filip Pizlo 2018-06-08 15:12:23 PDT
(In reply to Filip Pizlo from comment #3)
> Comment on attachment 342324 [details]
> Patch
> 
> Why is it necessary to only do for Kim when the number of files is large?

Lol I meant forking, not for Kim. 

> 
> Why is there chunking?
> 
> Would it be possible to start nproc = ncpu + 1, and then have a balancer in
> proc0 that, if asked to do so via a request on a pipe, will bend a test
> index. All other processes ask proc0 for a test after each test they run.
> Running a test requires starting a VM, right? So that’s way more expensive
> than a packet of data on a pipe (or socketpair). 
> 
> That’s what I meant by load balancer. I’m not sure exactly what this patch
> achieves - perhaps an improvement in some specific case of load balancing.
> Maybe you could quote the speedup in the changelog?
Comment 5 Filip Pizlo 2018-06-08 15:13:18 PDT
(In reply to Filip Pizlo from comment #3)
> Comment on attachment 342324 [details]
> Patch
> 
> Why is it necessary to only do for Kim when the number of files is large?
> 
> Why is there chunking?
> 
> Would it be possible to start nproc = ncpu + 1, and then have a balancer in
> proc0 that, if asked to do so via a request on a pipe, will bend a test

Vend a test. 

> index. All other processes ask proc0 for a test after each test they run.
> Running a test requires starting a VM, right? So that’s way more expensive
> than a packet of data on a pipe (or socketpair). 
> 
> That’s what I meant by load balancer. I’m not sure exactly what this patch
> achieves - perhaps an improvement in some specific case of load balancing.
> Maybe you could quote the speedup in the changelog?
Comment 6 Leo Balter 2018-06-12 09:55:47 PDT
I spent a very long time trying to get pipe/socketpair to work here. I also relied on help from friends such as Rick Waldron and even Perl committers.

As it turns out, I could not find a way to create a queue where a single parent communicates properly with many children. Perl blocks on the first child, which forces 1:1 fetching. I have a non-working example here: https://gist.github.com/leobalter/779cbaa8a4da1e7a0148c475ab822d65

I also tried parallel queues, waiting for the children to signal for more items, but I had no luck there either. With the tools I have, I end up either blocking the parent itself or closing the sockets earlier than I want. Another example here: https://gist.github.com/leobalter/d5c2817af13cfbe5e59b9ec958071352

I'm avoiding threads because I suspect the target machines won't have a Perl compiled with proper thread support.

I still want to fix this, but at this point I'm running out of hope. Let me know how I should proceed from here.
Comment 7 Leo Balter 2018-06-12 15:15:44 PDT
Created attachment 342601 [details]
Patch
Comment 8 Leo Balter 2018-06-12 15:16:55 PDT
Created attachment 342602 [details]
Patch
Comment 9 Leo Balter 2018-06-12 15:19:33 PDT
OK, the latest patch should handle the queue manager as desired.

There are some unusual loops in the processes to avoid blocking states in the parent and the child processes.

I have yet to check the final timing results.
Comment 10 Leo Balter 2018-06-12 15:38:35 PDT
I broke something else... I'll need to fix it in the next patch.
Comment 11 Leo Balter 2018-06-12 16:10:19 PDT
Created attachment 342607 [details]
Patch
Comment 12 Leo Balter 2018-06-12 16:11:17 PDT
Ok, final fix is done.
Comment 13 Build Bot 2018-06-13 04:43:34 PDT
Comment on attachment 342607 [details]
Patch

Attachment 342607 [details] did not pass win-ews (win):
Output: http://webkit-queues.webkit.org/results/8161866

New failing tests:
http/tests/security/canvas-remote-read-remote-video-localhost.html
Comment 14 Build Bot 2018-06-13 04:43:45 PDT
Created attachment 342646 [details]
Archive of layout-test-results from ews202 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews202  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 15 Michael Saboff 2018-06-13 07:51:50 PDT
Comment on attachment 342607 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=342607&action=review

r-

See my update to 1.pm which uses IO::Select at https://gist.github.com/msaboff/8540538562a411f3b2b6b9029aa23cb2.js

> Tools/Scripts/test262/Runner.pm:358
> +        foreach my $child (@children) {
> +            if ($child->getline) {

Instead of busy polling, this code should use IO::Select.
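
A minimal sketch of what replacing the polling loop with IO::Select could look like. It is neither the code in the gist nor the patch that landed. The function name run_with_select and the READY/DONE/EXIT line protocol are assumptions made for illustration. The parent sleeps in can_read until at least one worker has sent a line, instead of repeatedly calling getline on every child.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use IO::Select;
use POSIX ();

sub run_with_select {
    my ($files, $num_workers) = @_;
    my @queue  = @$files;
    my $select = IO::Select->new;
    my (@pids, @finished);

    for (1 .. $num_workers) {
        socketpair(my $child_end, my $parent_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
            or die "socketpair: $!";
        $child_end->autoflush(1);
        $parent_end->autoflush(1);

        my $pid = fork();
        die "fork: $!" unless defined $pid;

        if ($pid == 0) {
            # Worker: announce readiness, then handle one file per reply.
            close $parent_end;
            print {$child_end} "READY\n";
            while (defined(my $file = <$child_end>)) {
                chomp $file;
                last if $file eq 'EXIT';
                # A real worker would run the test file here.
                print {$child_end} "DONE $file\n";
            }
            POSIX::_exit(0);
        }

        close $child_end;
        $select->add($parent_end);
        push @pids, $pid;
    }

    while ($select->count) {
        # Blocks until some worker has written a line; no busy polling.
        for my $fh ($select->can_read) {
            # Buffered readline is safe here only because each worker
            # sends exactly one line and then waits for the reply.
            my $line = <$fh>;
            if (defined $line) {
                chomp $line;
                push @finished, $1 if $line =~ /^DONE (.+)$/;
            }
            if (defined $line && @queue) {
                print {$fh} shift(@queue), "\n";
            } else {
                print {$fh} "EXIT\n" if defined $line;
                $select->remove($fh);
                close $fh;
            }
        }
    }

    waitpid($_, 0) for @pids;
    return \@finished;
}
```

The explicit EXIT message is a deliberate choice in this sketch: later workers inherit copies of the earlier parent-side sockets across fork, so relying on end-of-file alone to stop a worker could hang.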
Comment 16 Michael Saboff 2018-06-13 07:52:48 PDT
(In reply to Michael Saboff from comment #15)
> Comment on attachment 342607 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=342607&action=review
> 
> r-
> 
> See my update to 1.pm which uses IO::Select at
> https://gist.github.com/msaboff/8540538562a411f3b2b6b9029aa23cb2.js
> 
> > Tools/Scripts/test262/Runner.pm:358
> > +        foreach my $child (@children) {
> > +            if ($child->getline) {
> 
> Instead of busy polling, this code should use IO::Select.

This link works better - https://gist.github.com/msaboff/8540538562a411f3b2b6b9029aa23cb2
Comment 17 Leo Balter 2018-06-13 12:25:55 PDT
Created attachment 342681 [details]
Patch
Comment 18 Leo Balter 2018-06-13 12:27:28 PDT
Oh, thanks! The new patch uses select as you suggested.
Comment 19 Build Bot 2018-06-13 15:36:47 PDT
Comment on attachment 342681 [details]
Patch

Attachment 342681 [details] did not pass win-ews (win):
Output: http://webkit-queues.webkit.org/results/8168574

New failing tests:
http/tests/preload/onload_event.html
Comment 20 Build Bot 2018-06-13 15:36:59 PDT
Created attachment 342699 [details]
Archive of layout-test-results from ews201 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews201  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 21 Leo Balter 2018-06-14 10:58:45 PDT
Created attachment 342745 [details]
Patch
Comment 22 Michael Saboff 2018-06-14 14:16:25 PDT
Comment on attachment 342745 [details]
Patch

r=me
Comment 23 WebKit Commit Bot 2018-06-14 14:44:19 PDT
Comment on attachment 342745 [details]
Patch

Rejecting attachment 342745 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-03', 'land-attachment', '--force-clean', '--non-interactive', '--parent-command=commit-queue', 342745, '--port=mac']" exit_code: 2 cwd: /Volumes/Data/EWS/WebKit

Logging in as commit-queue@webkit.org...
Fetching: https://bugs.webkit.org/attachment.cgi?id=342745&action=edit
Fetching: https://bugs.webkit.org/show_bug.cgi?id=186443&ctype=xml&excludefield=attachmentdata
Processing 1 patch from 1 bug.
Updating working directory
Processing patch 342745 from bug 186443.
Fetching: https://bugs.webkit.org/attachment.cgi?id=342745
Failed to run "[u'/Volumes/Data/EWS/WebKit/Tools/Scripts/svn-apply', '--force', '--reviewer', u'Michael Saboff']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Parsed 2 diffs from patch file(s).
patching file Tools/ChangeLog
Hunk #1 succeeded at 1 with fuzz 3.
patching file Tools/Scripts/test262/Runner.pm
Hunk #4 FAILED at 246.
Hunk #5 succeeded at 268 with fuzz 1.
Hunk #7 succeeded at 1061 (offset 34 lines).
1 out of 7 hunks FAILED -- saving rejects to file Tools/Scripts/test262/Runner.pm.rej

Failed to run "[u'/Volumes/Data/EWS/WebKit/Tools/Scripts/svn-apply', '--force', '--reviewer', u'Michael Saboff']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit
Updating OpenSource
From https://git.webkit.org/git/WebKit
   22e11a8051f..ab90f1dc70f  master     -> origin/master
Partial-rebuilding .git/svn/refs/remotes/origin/master/.rev_map.268f45cc-cd09-0410-ab3c-d52691b4dbfc ...
Currently at 232852 = 22e11a8051f314f141d406c5e996b1782140e67e
r232853 = af256119d9cbac2ff09685739e5208e4f9648933
r232854 = 6a3a1c2e9fda291c1dbd5df60847132e4c68241c
r232855 = ab90f1dc70f49c274e8de47947c29273c5a3381a
Done rebuilding .git/svn/refs/remotes/origin/master/.rev_map.268f45cc-cd09-0410-ab3c-d52691b4dbfc
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/origin/master.

Full output: http://webkit-queues.webkit.org/results/8185091
Comment 24 Leo Balter 2018-06-14 18:47:26 PDT
Created attachment 342782 [details]
Patch
Comment 25 Michael Saboff 2018-06-16 10:48:52 PDT
Comment on attachment 342782 [details]
Patch

r=me
Comment 26 WebKit Commit Bot 2018-06-16 11:15:06 PDT
Comment on attachment 342782 [details]
Patch

Rejecting attachment 342782 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-01', 'land-attachment', '--force-clean', '--non-interactive', '--parent-command=commit-queue', 342782, '--port=mac']" exit_code: 2 cwd: /Volumes/Data/EWS/WebKit

Logging in as commit-queue@webkit.org...
Fetching: https://bugs.webkit.org/attachment.cgi?id=342782&action=edit
Fetching: https://bugs.webkit.org/show_bug.cgi?id=186443&ctype=xml&excludefield=attachmentdata
Processing 1 patch from 1 bug.
Updating working directory
Processing patch 342782 from bug 186443.
Fetching: https://bugs.webkit.org/attachment.cgi?id=342782
Failed to run "['git', 'svn', 'dcommit', '--rmdir']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Committing to http://svn.webkit.org/repository/webkit/trunk ...
	M	Tools/ChangeLog

ERROR from SVN:
Item is out of date: File '/trunk/Tools/ChangeLog' is out of date
W: fb5beed2b61cc9bc0f5e14278ad5dd0c882ec74d and refs/remotes/origin/master differ, using rebase:
:040000 040000 bfac9e8560236c3df3aa663672e9a859a2fe6a6e 2a8c19cc9ab51c68fade1351d0ddd533f6ae630c M	Tools
Current branch master is up to date.
ERROR: Not all changes have been committed into SVN, however the committed
ones (if any) seem to be successfully integrated into the working tree.
Please see the above messages for details.


Failed to run "['git', 'svn', 'dcommit', '--rmdir']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit
Updating OpenSource
Current branch master is up to date.

Full output: http://webkit-queues.webkit.org/results/8212515
Comment 27 Igalia-pontevedra EWS 2018-06-18 07:30:43 PDT
Comment on attachment 342782 [details]
Patch

Attachment 342782 [details] did not pass gtk-wk2-ews (gtk-wk2):
Output: http://webkit-queues.webkit.org/results/8231543

New failing tests:
http/tests/misc/cached-scripts.html
Comment 28 Igalia-pontevedra EWS 2018-06-18 07:30:48 PDT
Created attachment 342934 [details]
Archive of layout-test-results from ltilve-gtk-wk2-ews for gtk-wk2

The attached test failures were seen while running run-webkit-tests on the gtk-wk2-ews.
Bot: ltilve-gtk-wk2-ews  Port: gtk-wk2  Platform: Linux-4.16.0-0.bpo.1-amd64-x86_64-with-debian-9.4
Comment 29 Michael Catanzaro 2018-06-18 10:47:27 PDT
(In reply to Igalia-pontevedra EWS from comment #27)
> Comment on attachment 342782 [details]
> Patch
> 
> Attachment 342782 [details] did not pass gtk-wk2-ews (gtk-wk2):
> Output: http://webkit-queues.webkit.org/results/8231543
> 
> New failing tests:
> http/tests/misc/cached-scripts.html

Reported bug #186778
Comment 30 Leo Balter 2018-06-18 10:57:47 PDT
Created attachment 342951 [details]
Patch
Comment 31 WebKit Commit Bot 2018-06-19 11:59:57 PDT
Comment on attachment 342951 [details]
Patch

Clearing flags on attachment: 342951

Committed r232972: <https://trac.webkit.org/changeset/232972>
Comment 32 WebKit Commit Bot 2018-06-19 11:59:59 PDT
All reviewed patches have been landed.  Closing bug.
Comment 33 Radar WebKit Bug Importer 2018-06-19 12:02:39 PDT
<rdar://problem/41258631>
Comment 34 Dawei Fenton (:realdawei) 2018-06-19 15:01:49 PDT
Seeing Perl test failures after this latest revision.

For instance: https://build.webkit.org/builders/Apple%20Sierra%20Release%20WK2%20(Tests)/builds/10036/steps/webkitperl-test/logs/stdio


Can't exec "/usr/bin/non-existent-command": No such file or directory at /Volumes/Data/slave/sierra-release-tests-wk2/build/Tools/Scripts/VCSUtils.pm line 2403.
Failed to exec(): No such file or directory at /Volumes/Data/slave/sierra-release-tests-wk2/build/Tools/Scripts/VCSUtils.pm line 2403.

and

#   Failed test 'expectations yaml file format'
#   at Tools/Scripts/webkitperl/test262_unittest/test262-runner-tests.pl line 157.
# Looks like you failed 2 tests of 13.
Comment 35 Leo Balter 2018-06-19 15:43:56 PDT
Thanks for the report, David. 

I filed a new patch here: https://bugs.webkit.org/show_bug.cgi?id=186824