213424 – compositing/video/video-border-radius-clipping.html was a flaky failure after r263223

RESOLVED FIXED 213424

compositing/video/video-border-radius-clipping.html was a flaky failure after r263223

https://bugs.webkit.org/show_bug.cgi?id=213424

Summary compositing/video/video-border-radius-clipping.html was a flaky failure after...

Geoffrey Garen

Reported 2020-06-19 22:09:08 PDT

compositing/video/video-border-radius-clipping.html was a flaky failure after r263223

Attachments
Patch (3.19 KB, patch) 2020-06-19 22:48 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Geoffrey Garen

Comment 1 2020-06-19 22:48:11 PDT

Created attachment 402377 [details] Patch

Alexey Proskuryakov

Comment 2 2020-06-20 11:22:06 PDT

Comment on attachment 402377 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402377&action=review > LayoutTests/ChangeLog:13 > + I took a screen recording of a few hundred loads of this test @ r263222, > + and it looks like it was always possible for canplaythrough and seeked > + to fire before the video had rendered its first frame. So, this is a > + test issue. canplaythrough is indeed obviously unrelated to rendering the first frame, but dispatching seeked without rendering sounds like an actual bug, not a test issue. It this really allowed?

Geoffrey Garen

Comment 3 2020-06-20 12:21:08 PDT

> > LayoutTests/ChangeLog:13 > > + I took a screen recording of a few hundred loads of this test @ r263222, > > + and it looks like it was always possible for canplaythrough and seeked > > + to fire before the video had rendered its first frame. So, this is a > > + test issue. > > canplaythrough is indeed obviously unrelated to rendering the first frame, > but dispatching seeked without rendering sounds like an actual bug, not a > test issue. It this really allowed? You're asking the wrong question. This is a test of border radius. When we analyze a change to this test, we should ask whether the test still correctly measures border radius. That said, my reading is that it is allowed for the seeked event to fire before a video frame has rendered. The specification never speaks of what's rendered onscreen. For seeking, it says, "Wait until the user agent has established whether or not the media data for the new playback position is available, and, if it is, until it has decoded enough data to play back that position." So, seeked is fired when the seeked position has decoded, not when it has rendered onscreen.

Alexey Proskuryakov

Comment 4 2020-06-20 13:03:36 PDT

Well, there are two questions here. One is the one you think is right, whether the test still correctly measures border radius. Another is whether the test discovered an actual bug, introduced or triggered by changes in r263223. I did note that you observed something similar with r263222, but that doesn't explain why the test regressed. Assuming that your goal is to re-land r263223, we need an answer to the second question. Sounds like you think that the test always relied on something unspecified, so the change is OK and won't break websites. Let's have media experts confirm your understanding.

Geoffrey Garen

Comment 5 2020-06-20 14:49:44 PDT

> I did note that you observed something similar with r263222, but > that doesn't explain why the test regressed. Actually, it does. The timing of a screenshot vs a media element's events is flaky. It was flaky in December (r253310), it was flaky in r263222, and it was flaky in r263223. Flaky tests sometimes cause bot failures; that does not indicate a regression in behavior. > Assuming that your goal is to re-land r263223, we need an answer to the > second question. Sounds like you think that the test always relied on > something unspecified, so the change is OK and won't break websites. Let's > have media experts confirm your understanding. Websites don't take screenshots.

Alexey Proskuryakov

Comment 6 2020-06-20 16:16:08 PDT

No, general flakiness does not explain the observed regression. You could say that it’s not worth investigating extra flakiness when the whole area is flaky, and that would be fairly convincing. But claiming that you have an explanation at this point is untrue. We’ve seen plenty of processing done by websites via capturing video frames. So not sure what you are talking about with regards to websites not taking screenshots. Literal screenshots, they do not, but rendering timing bugs are not limited to literal screenshots.

Darin Adler

Comment 7 2020-06-21 13:13:01 PDT

Comment on attachment 402377 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402377&action=review >> LayoutTests/ChangeLog:13 >> + test issue. > > canplaythrough is indeed obviously unrelated to rendering the first frame, but dispatching seeked without rendering sounds like an actual bug, not a test issue. It this really allowed? I suggest investigating that further with a purpose-built test, rather than testing it accidentally with this test.

EWS

Comment 8 2020-06-21 13:20:27 PDT

Committed r263332: <https://trac.webkit.org/changeset/263332> All reviewed patches have been landed. Closing bug and clearing flags on attachment 402377 [details].

Radar WebKit Bug Importer

Comment 9 2020-06-21 13:21:15 PDT

<rdar://problem/64578547>

Alexey Proskuryakov

Comment 10 2020-06-22 09:08:54 PDT

> I suggest investigating that further with a purpose-built test, rather than testing it accidentally with this test. I consider re-landing r263223 blocked until this investigation is complete. Rewriting a test to swipe a regression under the carpet does not resolve the regression.

Geoffrey Garen

Comment 11 2020-06-22 10:09:47 PDT

(In reply to Alexey Proskuryakov from comment #10) > > I suggest investigating that further with a purpose-built test, rather than testing it accidentally with this test. > > I consider re-landing r263223 blocked until this investigation is complete. My investigation is complete. It shows that the canplaythrough and seeked events, by pre-existing behavior and by specification, do not synchronize with on-screen rendering. If you see something more to investigate here, you're welcome to do so. I'm not going to withhold a patch that has been reviewed and that passes all tests. If you'd like to propose a new standard for patches, that says they must pass review, and all tests, and personal approval by Alexey Proskuryakov, and all tests that could possibly have been written even though they can't be specifically described and they may not relate to any specified or necessary behavior, please send email to webkit-dev and we can consider that policy change as a team.

Alexey Proskuryakov

Comment 12 2020-06-22 10:17:42 PDT

There is no new standard here. Your patch broke tests, you don't know why the change is OK, so you just changed the tests until they passed. Now you are just escalating to personal attacks instead of civil discussion.

Darin Adler

Comment 13 2020-06-22 12:25:54 PDT

I agree that the test detected a change in behavior. But I am thinking it did not detect a change in behavior that we should care about. Instead they detected a way to accidentally depend on race conditions. Certainly website developers could make the same kinds of mistakes and there’s a reason to want tests that check these sorts of things. I’m not sure how to write those tests, though, since the code wouldn’t and shouldn’t try to time things consistently. We could probe this further, but I’m not sure there’s reason to believe this is worthwhile.

Alexey Proskuryakov

Comment 14 2020-06-22 18:38:08 PDT

The test detected the race getting more likely to occur. My point is that no one checked whether there is a race here in other browsers. And Geoff not finding it in the previously unfamiliar spec is weak evidence that the racy behavior is OK. This is why I was suggesting to have media experts weigh in before it got personal. If something works 100% in other browsers, and mostly works in shipping Safari, saying that we are free to make it fail more in Safari makes no sense. As for this being worthwhile to probe further - I think that non-urgent refactoring that makes observed behavior worse is a questionable trade-off.

Geoffrey Garen

Comment 15 2020-06-22 19:34:15 PDT

Not sure what you mean by "personal attacks" and "got personal". I wish you would directly address my reasonable technical and project comments, rather than dismissing them as "personal" and not "civil".

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Geoffrey Garen

Reported

2020-06-19 22:09 PDT

Modified

2020-06-22 19:34 PDT History

CC List

10 users Show

URL

Keywords InRadar

Depends on

Blocks