227715 – [webkitcorepy] run-webkit-tests may hang with python2 after r271683

RESOLVED FIXED 227715

[webkitcorepy] run-webkit-tests may hang with python2 after r271683

https://bugs.webkit.org/show_bug.cgi?id=227715

Summary [webkitcorepy] run-webkit-tests may hang with python2 after r271683

Carlos Alberto Lopez Perez

Reported 2021-07-06 11:24:26 PDT

Checking the testing EWS for GTK WK2 layout test I noticed most of them were failing to complete the layout-test step and they were hanging at the end. Example: https://ews-build.webkit-uat.org/#/builders/34/builds/34722 09:54:19.733 3 worker/5 imported/w3c/web-platform-tests/service-workers/service-worker/fetch-cors-xhr.https.html passed 09:54:19.748 3 worker/15 imported/w3c/web-platform-tests/streams/piping/flow-control.any.worker.html passed 09:54:19.778 3 worker/6 imported/w3c/web-platform-tests/streams/transform-streams/lipfuzz.any.worker.html passed 09:54:19.780 3 worker/2 "ruby -I /app/webkit/Websites/bugs.webkit.org/PrettyPatch /app/webkit/Websites/bugs.webkit.org/PrettyPatch/prettify.rb /home/ews/worker/GTK-WK2-Tests-EWS/build/layout-test-results/imported/w3c/web-platform-tests/streams/readable-streams/general.any-diff.txt" took 0.18s 09:54:19.780 3 [42759/54367] imported/w3c/web-platform-tests/streams/readable-streams/general.any.html failed unexpectedly (text diff) 09:54:20.079 3 worker/2 imported/w3c/web-platform-tests/streams/readable-streams/general.any.html failed: 09:54:20.079 3 worker/2 text diff 09:54:20.128 3 Some workers failed to gracefully shut down, but in-flight exception taking precedence command timed out: 1200 seconds without output running ['python3', 'Tools/Scripts/run-webkit-tests', '--no-build', '--no-show-results', '--no-new-test-results', '--clobber-old-results', '--release', '--gtk', '--results-directory', 'layout-test-results', '--debug-rwt-logging', '--exit-after-n-failures', '30', '--skip-failing-tests'], attempting to kill process killed by signal 9 program finished with exit code -1 elapsedTime=1895.925172 The issue happens because a combination of two things: - The GTK workers are still using python2 for running the layout tests. This is because the shebang on the scripts points to "python" that defaults to python2. And even if you run the script with "python3" like in "python3 Tools/Scripts/run-webkit-tests" you actually end running it with python2 because of the flatpak-sdk and how the process are re-executed inside the container. We need to change the shebangs here. - A race condition happens when the layout tests are run with "--exit-after-n-failures N" and more than N unexpected failures happen when RWT is executed with python2. When "--exit-after-n-failures" triggers because more unexpected failures than N happen, then a TestRunInterruptedException() exception is raised from the worker and then on the exit handler for the task-queue the workers are terminated via os.kill(). Then the queue of workers is closed, but that causes a hang with python2 when waiting for the threads to be joined. This issue is not reproducible with python3. A way to reproduce this that worked for me reliable is: # Invalidate all the expected results for the css3 tests (truncate them to 10 chars) $ find LayoutTests/css3 -name \*expected.txt -exec truncate -s10 '{}' \; # Run the tests with python2 $ python2 Tools/Scripts/run-webkit-tests --no-build --release --gtk --debug-rwt-logging --exit-after-n-failures 100 fast css3

Attachments
Patch (1.86 KB, patch) 2021-07-06 11:38 PDT, Carlos Alberto Lopez Perez	no flags	Details Formatted Diff Diff
Patch (1.83 KB, patch) 2021-07-08 17:45 PDT, Carlos Alberto Lopez Perez	no flags	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Carlos Alberto Lopez Perez

Comment 1 2021-07-06 11:38:04 PDT

Created attachment 432954 [details] Patch

Carlos Alberto Lopez Perez

Comment 2 2021-07-06 11:47:18 PDT

(In reply to Carlos Alberto Lopez Perez from comment #0) > > The issue happens because a combination of two things: > > - The GTK workers are still using python2 for running the layout tests. > This is because the shebang on the scripts points to "python" that defaults > to python2. And even if you run the script with "python3" like in "python3 > Tools/Scripts/run-webkit-tests" you actually end running it with python2 > because of the flatpak-sdk and how the process are re-executed inside the > container. We need to change the shebangs here. Seems running with python2 is still the default everywhere but in the UAT? I got confused because layout tests are running with a python3 prefix on https://ews-build.webkit-uat.org/ but not on https://ews-build.webkit.org/

Aakash Jain

Comment 3 2021-07-06 12:12:16 PDT

(In reply to Carlos Alberto Lopez Perez from comment #2) > Seems running with python2 is still the default everywhere but in the UAT? That's right. Jonathan was using UAT for testing Python 3 change. It has not yet landed. See https://bugs.webkit.org/show_bug.cgi?id=226658

Carlos Alberto Lopez Perez

Comment 4 2021-07-06 16:43:30 PDT

(In reply to Aakash Jain from comment #3) > (In reply to Carlos Alberto Lopez Perez from comment #2) > > Seems running with python2 is still the default everywhere but in the UAT? > That's right. Jonathan was using UAT for testing Python 3 change. It has not > yet landed. See https://bugs.webkit.org/show_bug.cgi?id=226658 I see. Then this issue potentially affects all EWS runners because they run with '--exit-after-n-failures 30'. In the event that more than 30 unexpected failures happen then run-webkit-tests will hang (at least on Linux)

Michael Catanzaro

Comment 5 2021-07-08 04:24:14 PDT

(In reply to Carlos Alberto Lopez Perez from comment #0) > And even if you run the script with "python3" like in "python3 > Tools/Scripts/run-webkit-tests" you actually end running it with python2 > because of the flatpak-sdk and how the process are re-executed inside the > container. We need to change the shebangs here. Note: in freedesktop-sdk, /usr/bin/python is python3. It's only python2 in our runtime because we build our own python2. I believe that is done specifically for it to be used by run-webkit-tests, but maybe we have other developer scripts that still require python2? Phil would probably know for sure.

Philippe Normand

Comment 6 2021-07-08 04:50:17 PDT

https://bugs.webkit.org/show_bug.cgi?id=221043#c7

Carlos Alberto Lopez Perez

Comment 7 2021-07-08 17:22:20 PDT

Can someone please review this? Now this is causing issues on our Debug bot: See https://build.webkit.org/#/builders/63/builds/1638 15:27:52.105 3 [199/58328] accessibility/adjacent-continuations-cause-assertion-failure.html failed unexpectedly (WebKitWebProcess crashed [pid=11420]) 15:27:53.613 3 worker/0 killing driver 15:27:53.613 3 worker/0 accessibility/adjacent-continuations-cause-assertion-failure.html failed: 15:27:53.613 3 worker/0 WebKitWebProcess crashed [pid=11420] 15:27:53.664 3 Some workers failed to gracefully shut down, but in-flight exception taking precedence command timed out: 1200 seconds without output running ['python', './Tools/Scripts/run-webkit-tests', '--no-build', '--no-show-results', '--no-new-test-results', '--clobber-old-results', '--builder-name', 'GTK-Linux-64-bit-Debug-Tests', '--build-number', '1638', '--buildbot-worker', 'gtk-linux-bot-7', '--buildbot-master', 'build.webkit.org', '--report', 'https://results.webkit.org', '--exit-after-n-crashes-or-timeouts', '50', '--exit-after-n-failures', '500', '--debug', '--gtk', '--results-directory', 'layout-test-results', '--debug-rwt-logging'], attempting to kill process killed by signal 9 program finished with exit code -1 elapsedTime=1635.329821 It seems some commit caused more than 500 failures and the bot should be exiting early, but instead of exiting early is hanging due to this bug.

Carlos Alberto Lopez Perez

Comment 8 2021-07-08 17:41:55 PDT

(In reply to Carlos Alberto Lopez Perez from comment #7) > It seems some commit caused more than 500 failures and the bot should be > exiting early, but instead of exiting early is hanging due to this bug. Actually is crashing and exiting due to more than 50 crashes :\

Carlos Alberto Lopez Perez

Comment 9 2021-07-08 17:45:27 PDT

Created attachment 433190 [details] Patch Just an edit on the comment

EWS

Comment 10 2021-07-09 01:10:26 PDT

Committed r279775 (239544@main): <https://commits.webkit.org/239544@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 433190 [details].

Radar WebKit Bug Importer

Comment 11 2021-07-09 01:11:17 PDT

<rdar://problem/80364971>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Carlos Alberto Lopez Perez

Reported

2021-07-06 11:24 PDT

Modified

2021-07-09 01:11 PDT History

CC List

8 users Show

URL

Keywords InRadar

Depends on

Blocks