Bug 205291

Summary: Python 3: Add support to run-webkit-tests
Product: WebKit Reporter: Jonathan Bedard <jbedard>
Component: Tools / TestsAssignee: Jonathan Bedard <jbedard>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, ap, commit-queue, ews-watchlist, glenn, slewis, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=205280
https://bugs.webkit.org/show_bug.cgi?id=184986
https://bugs.webkit.org/show_bug.cgi?id=205622
Bug Depends on: 205622    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch for landing none

Description Jonathan Bedard 2019-12-16 11:26:53 PST
We should be able to run run-webkit-tests with Python 3.
Comment 1 Jonathan Bedard 2019-12-16 13:30:59 PST
Created attachment 385803 [details]
Patch
Comment 2 Stephanie Lewis 2019-12-16 17:08:58 PST
Comment on attachment 385803 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=385803&action=review

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:182
> +        self.should_test_processes = not self._platform.is_win() and sys.version_info < (3, 0)

This seems like a huge change.  Do we know what the downsides are going to be?

> Tools/Scripts/webkitpy/layout_tests/views/printing.py:242
> +            if timing[1] > min_seconds_to_print:

why the change?

> Tools/Scripts/webkitpy/port/driver.py:652
> +        if early_return:

This doesn't seem any more readable to me than just using the if

> Tools/Scripts/webkitpy/port/driver.py:695
> +                if out_line[-1] != '\n' and out_line[-1] != 10:

Why do we need the second check?

> Tools/Scripts/webkitpy/port/image_diff.py:62
> +            self._process.write(actual_contents)

Have you compared the output?  You're inserting a newline here where there wasn't one.
Comment 3 Jonathan Bedard 2019-12-16 17:28:52 PST
Comment on attachment 385803 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=385803&action=review

>> Tools/Scripts/webkitpy/layout_tests/views/printing.py:242
>> +            if timing[1] > min_seconds_to_print:
> 
> why the change?

The old code was just wrong, not sure how we lived so long with it.

Line 236 is:

timings.append((directory, round(stats[directory]['total_time'], 1), stats[directory]['num_tests']))

Clearly, timing[0] is the directory, timing[1] is the time the directory took.

>> Tools/Scripts/webkitpy/port/driver.py:695
>> +                if out_line[-1] != '\n' and out_line[-1] != 10:
> 
> Why do we need the second check?

Because Python 2 is silly.

The second check is actually the correct one, but Python 2 doesn't like comparing characters against the integer representations of those same characters, so 'foo\n'[-1] != 10.

On the other hand, Python 3 doesn't like comparing bytes with characters, so b'foo\n'[-1] != '\n', and even b'foo\n'[-1] != b'\n' won't work in Python 3

>> Tools/Scripts/webkitpy/port/image_diff.py:62
>> +            self._process.write(actual_contents)
> 
> Have you compared the output?  You're inserting a newline here where there wasn't one.

I think there was a newline before, just hiding between %d and %s:

self._process.write('Content-Length: %d\n%s...
Comment 4 Jonathan Bedard 2019-12-16 17:42:17 PST
Created attachment 385839 [details]
Patch
Comment 5 Alexey Proskuryakov 2019-12-17 09:22:34 PST
>>> +            if timing[1] > min_seconds_to_print:
>> 
>> why the change?
>
>
> The old code was just wrong, not sure how we lived so long with it.

How does this change the output? I never paid very close attention to the timing printed at the end of test runs, but what I did see appeared reasonable.
Comment 6 Jonathan Bedard 2019-12-17 09:38:24 PST
(In reply to Alexey Proskuryakov from comment #5)
> >>> +            if timing[1] > min_seconds_to_print:
> >> 
> >> why the change?
> >
> >
> > The old code was just wrong, not sure how we lived so long with it.
> 
> How does this change the output? I never paid very close attention to the
> timing printed at the end of test runs, but what I did see appeared
> reasonable.

I think the old behavior would have ended up being pretty non-deterministic. We were comparing a string against some limit, the string probably would have usually resolved to something "greater" than min_seconds_to_print. In effect, we probably were printing more than expected, but I'm not exactly sure, and it would depend on the directory name.
Comment 7 WebKit Commit Bot 2019-12-19 17:06:30 PST
Comment on attachment 385839 [details]
Patch

Clearing flags on attachment: 385839

Committed r253804: <https://trac.webkit.org/changeset/253804>
Comment 8 WebKit Commit Bot 2019-12-19 17:06:31 PST
All reviewed patches have been landed.  Closing bug.
Comment 9 Radar WebKit Bug Importer 2019-12-19 17:07:25 PST
<rdar://problem/58097078>
Comment 10 WebKit Commit Bot 2019-12-28 19:35:33 PST
Re-opened since this is blocked by bug 205622
Comment 11 Alexey Proskuryakov 2019-12-28 19:37:07 PST
I'm seeing a lot of strange test failures that seem to have started after this patch. Will try rolling back, and will re-land if it doesn't help.
Comment 12 Alexey Proskuryakov 2019-12-29 12:31:17 PST
The rollback did eliminate the "diff (null%)" failures seen on many tests. These same tests continue to fail with larger diffs on other hardware though.

I think that the regression from this patch is about dealing with reftest mismatches that are within noise level.

Before:

04:02:09.776 30742   svg/gradients/spreadMethodDiagonal4.svg -> ref test hashes didn't match but diff passed

After:

22:24:23.922 2955   svg/gradients/spreadMethodDiagonal4.svg : ImageDiff timed out
22:24:23.923 2955 ref test mismatch did not produce an image diff.
22:24:23.924 2911 [48488/52894] svg/gradients/spreadMethodDiagonal4.svg failed unexpectedly (reference mismatch)
Comment 13 Jonathan Bedard 2020-01-09 08:37:31 PST
Created attachment 387233 [details]
Patch
Comment 14 Jonathan Bedard 2020-01-09 09:05:14 PST
(In reply to Alexey Proskuryakov from comment #12)
> The rollback did eliminate the "diff (null%)" failures seen on many tests.
> These same tests continue to fail with larger diffs on other hardware though.
> 
> I think that the regression from this patch is about dealing with reftest
> mismatches that are within noise level.
> 
> Before:
> 
> 04:02:09.776 30742   svg/gradients/spreadMethodDiagonal4.svg -> ref test
> hashes didn't match but diff passed
> 
> After:
> 
> 22:24:23.922 2955   svg/gradients/spreadMethodDiagonal4.svg : ImageDiff
> timed out
> 22:24:23.923 2955 ref test mismatch did not produce an image diff.
> 22:24:23.924 2911 [48488/52894] svg/gradients/spreadMethodDiagonal4.svg
> failed unexpectedly (reference mismatch)

I am pretty sure that this was caused by a failure to decode the content-length. I ran a test with the decode, with the previous patch and now I'm comparing those results against no patch at all.
Comment 15 Jonathan Bedard 2020-01-09 10:01:51 PST
(In reply to Jonathan Bedard from comment #14)
> (In reply to Alexey Proskuryakov from comment #12)
> > The rollback did eliminate the "diff (null%)" failures seen on many tests.
> > These same tests continue to fail with larger diffs on other hardware though.
> > 
> > I think that the regression from this patch is about dealing with reftest
> > mismatches that are within noise level.
> > 
> > Before:
> > 
> > 04:02:09.776 30742   svg/gradients/spreadMethodDiagonal4.svg -> ref test
> > hashes didn't match but diff passed
> > 
> > After:
> > 
> > 22:24:23.922 2955   svg/gradients/spreadMethodDiagonal4.svg : ImageDiff
> > timed out
> > 22:24:23.923 2955 ref test mismatch did not produce an image diff.
> > 22:24:23.924 2911 [48488/52894] svg/gradients/spreadMethodDiagonal4.svg
> > failed unexpectedly (reference mismatch)
> 
> I am pretty sure that this was caused by a failure to decode the
> content-length. I ran a test with the decode, with the previous patch and
> now I'm comparing those results against no patch at all.

Was able to confirm this is the problem by looking at test results without the change.

This makes sense, bytes are likely using their ASCII representation when cast to integers rather than the number that the string actually represents. ImageDiff was timing out because it was waiting on data that would never come.
Comment 16 WebKit Commit Bot 2020-01-10 08:16:26 PST
Comment on attachment 387233 [details]
Patch

Clearing flags on attachment: 387233

Committed r254340: <https://trac.webkit.org/changeset/254340>
Comment 17 WebKit Commit Bot 2020-01-10 08:16:28 PST
All reviewed patches have been landed.  Closing bug.
Comment 18 Jonathan Bedard 2020-01-10 09:11:09 PST
Reopening to attach new patch.
Comment 19 Jonathan Bedard 2020-01-10 09:11:10 PST
Created attachment 387344 [details]
Patch for landing
Comment 20 Jonathan Bedard 2020-01-10 09:16:59 PST
(In reply to Jonathan Bedard from comment #19)
> Created attachment 387344 [details]
> Patch for landing

Accidentally broke some Python 3 unit tests in this process, fixing those. We aren't running them in automation yet.
Comment 21 WebKit Commit Bot 2020-01-10 10:14:09 PST
Comment on attachment 387344 [details]
Patch for landing

Clearing flags on attachment: 387344

Committed r254346: <https://trac.webkit.org/changeset/254346>
Comment 22 WebKit Commit Bot 2020-01-10 10:14:11 PST
All reviewed patches have been landed.  Closing bug.
Comment 23 Jonathan Bedard 2020-01-17 13:06:57 PST
Committed r254762: <https://trac.webkit.org/changeset/254762>