205291 – Python 3: Add support to run-webkit-tests

RESOLVED FIXED 205291

Python 3: Add support to run-webkit-tests

https://bugs.webkit.org/show_bug.cgi?id=205291

Summary Python 3: Add support to run-webkit-tests

Jonathan Bedard

Reported 2019-12-16 11:26:53 PST

We should be able to run run-webkit-tests with Python 3.

Attachments
Patch (62.63 KB, patch) 2019-12-16 13:30 PST, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (62.58 KB, patch) 2019-12-16 17:42 PST, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (63.04 KB, patch) 2020-01-09 08:37 PST, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch for landing (3.87 KB, patch) 2020-01-10 09:11 PST, Jonathan Bedard	no flags	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Jonathan Bedard

Comment 1 2019-12-16 13:30:59 PST

Created attachment 385803 [details] Patch

Stephanie Lewis

Comment 2 2019-12-16 17:08:58 PST

Comment on attachment 385803 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=385803&action=review > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:182 > + self.should_test_processes = not self._platform.is_win() and sys.version_info < (3, 0) This seems like a huge change. Do we know what the downsides are going to be? > Tools/Scripts/webkitpy/layout_tests/views/printing.py:242 > + if timing[1] > min_seconds_to_print: why the change? > Tools/Scripts/webkitpy/port/driver.py:652 > + if early_return: This doesn't seem any more readable to me than just using the if > Tools/Scripts/webkitpy/port/driver.py:695 > + if out_line[-1] != '\n' and out_line[-1] != 10: Why do we need the second check? > Tools/Scripts/webkitpy/port/image_diff.py:62 > + self._process.write(actual_contents) Have you compared the output? You're inserting a newline here where there wasn't one.

Jonathan Bedard

Comment 3 2019-12-16 17:28:52 PST

Comment on attachment 385803 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=385803&action=review >> Tools/Scripts/webkitpy/layout_tests/views/printing.py:242 >> + if timing[1] > min_seconds_to_print: > > why the change? The old code was just wrong, not sure how we lived so long with it. Line 236 is: timings.append((directory, round(stats[directory]['total_time'], 1), stats[directory]['num_tests'])) Clearly, timing[0] is the directory, timing[1] is the time the directory took. >> Tools/Scripts/webkitpy/port/driver.py:695 >> + if out_line[-1] != '\n' and out_line[-1] != 10: > > Why do we need the second check? Because Python 2 is silly. The second check is actually the correct one, but Python 2 doesn't like comparing characters against the integer representations of those same characters, so 'foo\n'[-1] != 10. On the other hand, Python 3 doesn't like comparing bytes with characters, so b'foo\n'[-1] != '\n', and even b'foo\n'[-1] != b'\n' won't work in Python 3 >> Tools/Scripts/webkitpy/port/image_diff.py:62 >> + self._process.write(actual_contents) > > Have you compared the output? You're inserting a newline here where there wasn't one. I think there was a newline before, just hiding between %d and %s: self._process.write('Content-Length: %d\n%s...

Jonathan Bedard

Comment 4 2019-12-16 17:42:17 PST

Created attachment 385839 [details] Patch

Alexey Proskuryakov

Comment 5 2019-12-17 09:22:34 PST

>>> + if timing[1] > min_seconds_to_print: >> >> why the change? > > > The old code was just wrong, not sure how we lived so long with it. How does this change the output? I never paid very close attention to the timing printed at the end of test runs, but what I did see appeared reasonable.

Jonathan Bedard

Comment 6 2019-12-17 09:38:24 PST

(In reply to Alexey Proskuryakov from comment #5) > >>> + if timing[1] > min_seconds_to_print: > >> > >> why the change? > > > > > > The old code was just wrong, not sure how we lived so long with it. > > How does this change the output? I never paid very close attention to the > timing printed at the end of test runs, but what I did see appeared > reasonable. I think the old behavior would have ended up being pretty non-deterministic. We were comparing a string against some limit, the string probably would have usually resolved to something "greater" than min_seconds_to_print. In effect, we probably were printing more than expected, but I'm not exactly sure, and it would depend on the directory name.

WebKit Commit Bot

Comment 7 2019-12-19 17:06:30 PST

Comment on attachment 385839 [details] Patch Clearing flags on attachment: 385839 Committed r253804: <https://trac.webkit.org/changeset/253804>

WebKit Commit Bot

Comment 8 2019-12-19 17:06:31 PST

All reviewed patches have been landed. Closing bug.

Radar WebKit Bug Importer

Comment 9 2019-12-19 17:07:25 PST

<rdar://problem/58097078>

WebKit Commit Bot

Comment 10 2019-12-28 19:35:33 PST

Re-opened since this is blocked by bug 205622

Alexey Proskuryakov

Comment 11 2019-12-28 19:37:07 PST

I'm seeing a lot of strange test failures that seem to have started after this patch. Will try rolling back, and will re-land if it doesn't help.

Alexey Proskuryakov

Comment 12 2019-12-29 12:31:17 PST

The rollback did eliminate the "diff (null%)" failures seen on many tests. These same tests continue to fail with larger diffs on other hardware though. I think that the regression from this patch is about dealing with reftest mismatches that are within noise level. Before: 04:02:09.776 30742 svg/gradients/spreadMethodDiagonal4.svg -> ref test hashes didn't match but diff passed After: 22:24:23.922 2955 svg/gradients/spreadMethodDiagonal4.svg : ImageDiff timed out 22:24:23.923 2955 ref test mismatch did not produce an image diff. 22:24:23.924 2911 [48488/52894] svg/gradients/spreadMethodDiagonal4.svg failed unexpectedly (reference mismatch)

Jonathan Bedard

Comment 13 2020-01-09 08:37:31 PST

Created attachment 387233 [details] Patch

Jonathan Bedard

Comment 14 2020-01-09 09:05:14 PST

(In reply to Alexey Proskuryakov from comment #12) > The rollback did eliminate the "diff (null%)" failures seen on many tests. > These same tests continue to fail with larger diffs on other hardware though. > > I think that the regression from this patch is about dealing with reftest > mismatches that are within noise level. > > Before: > > 04:02:09.776 30742 svg/gradients/spreadMethodDiagonal4.svg -> ref test > hashes didn't match but diff passed > > After: > > 22:24:23.922 2955 svg/gradients/spreadMethodDiagonal4.svg : ImageDiff > timed out > 22:24:23.923 2955 ref test mismatch did not produce an image diff. > 22:24:23.924 2911 [48488/52894] svg/gradients/spreadMethodDiagonal4.svg > failed unexpectedly (reference mismatch) I am pretty sure that this was caused by a failure to decode the content-length. I ran a test with the decode, with the previous patch and now I'm comparing those results against no patch at all.

Jonathan Bedard

Comment 15 2020-01-09 10:01:51 PST

(In reply to Jonathan Bedard from comment #14) > (In reply to Alexey Proskuryakov from comment #12) > > The rollback did eliminate the "diff (null%)" failures seen on many tests. > > These same tests continue to fail with larger diffs on other hardware though. > > > > I think that the regression from this patch is about dealing with reftest > > mismatches that are within noise level. > > > > Before: > > > > 04:02:09.776 30742 svg/gradients/spreadMethodDiagonal4.svg -> ref test > > hashes didn't match but diff passed > > > > After: > > > > 22:24:23.922 2955 svg/gradients/spreadMethodDiagonal4.svg : ImageDiff > > timed out > > 22:24:23.923 2955 ref test mismatch did not produce an image diff. > > 22:24:23.924 2911 [48488/52894] svg/gradients/spreadMethodDiagonal4.svg > > failed unexpectedly (reference mismatch) > > I am pretty sure that this was caused by a failure to decode the > content-length. I ran a test with the decode, with the previous patch and > now I'm comparing those results against no patch at all. Was able to confirm this is the problem by looking at test results without the change. This makes sense, bytes are likely using their ASCII representation when cast to integers rather than the number that the string actually represents. ImageDiff was timing out because it was waiting on data that would never come.

WebKit Commit Bot

Comment 16 2020-01-10 08:16:26 PST

Comment on attachment 387233 [details] Patch Clearing flags on attachment: 387233 Committed r254340: <https://trac.webkit.org/changeset/254340>

WebKit Commit Bot

Comment 17 2020-01-10 08:16:28 PST

All reviewed patches have been landed. Closing bug.

Jonathan Bedard

Comment 18 2020-01-10 09:11:09 PST

Reopening to attach new patch.

Jonathan Bedard

Comment 19 2020-01-10 09:11:10 PST

Created attachment 387344 [details] Patch for landing

Jonathan Bedard

Comment 20 2020-01-10 09:16:59 PST

(In reply to Jonathan Bedard from comment #19) > Created attachment 387344 [details] > Patch for landing Accidentally broke some Python 3 unit tests in this process, fixing those. We aren't running them in automation yet.

WebKit Commit Bot

Comment 21 2020-01-10 10:14:09 PST

Comment on attachment 387344 [details] Patch for landing Clearing flags on attachment: 387344 Committed r254346: <https://trac.webkit.org/changeset/254346>

WebKit Commit Bot

Comment 22 2020-01-10 10:14:11 PST

All reviewed patches have been landed. Closing bug.

Jonathan Bedard

Comment 23 2020-01-17 13:06:57 PST

Committed r254762: <https://trac.webkit.org/changeset/254762>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Jonathan Bedard

Reported

2019-12-16 11:26 PST

Modified

2020-01-17 13:06 PST History

CC List

7 users Show

URL

Keywords InRadar

Depends on

205622

Blocks

Dependencies

tree graph