Bug 213424 - compositing/video/video-border-radius-clipping.html was a flaky failure after r263223
Summary: compositing/video/video-border-radius-clipping.html was a flaky failure after...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Geoffrey Garen
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-06-19 22:09 PDT by Geoffrey Garen
Modified: 2020-06-22 19:34 PDT (History)
10 users (show)

See Also:


Attachments
Patch (3.19 KB, patch)
2020-06-19 22:48 PDT, Geoffrey Garen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Geoffrey Garen 2020-06-19 22:09:08 PDT
compositing/video/video-border-radius-clipping.html was a flaky failure after r263223
Comment 1 Geoffrey Garen 2020-06-19 22:48:11 PDT
Created attachment 402377 [details]
Patch
Comment 2 Alexey Proskuryakov 2020-06-20 11:22:06 PDT
Comment on attachment 402377 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402377&action=review

> LayoutTests/ChangeLog:13
> +        I took a screen recording of a few hundred loads of this test @ r263222,
> +        and it looks like it was always possible for canplaythrough and seeked
> +        to fire before the video had rendered its first frame. So, this is a
> +        test issue.

canplaythrough is indeed obviously unrelated to rendering the first frame, but dispatching seeked without rendering sounds like an actual bug, not a test issue. It this really allowed?
Comment 3 Geoffrey Garen 2020-06-20 12:21:08 PDT
> > LayoutTests/ChangeLog:13
> > +        I took a screen recording of a few hundred loads of this test @ r263222,
> > +        and it looks like it was always possible for canplaythrough and seeked
> > +        to fire before the video had rendered its first frame. So, this is a
> > +        test issue.
> 
> canplaythrough is indeed obviously unrelated to rendering the first frame,
> but dispatching seeked without rendering sounds like an actual bug, not a
> test issue. It this really allowed?

You're asking the wrong question. This is a test of border radius. When we analyze a change to this test, we should ask whether the test still correctly measures border radius.

That said, my reading is that it is allowed for the seeked event to fire before a video frame has rendered. The specification never speaks of what's rendered onscreen. For seeking, it says, "Wait until the user agent has established whether or not the media data for the new playback position is available, and, if it is, until it has decoded enough data to play back that position." So, seeked is fired when the seeked position has decoded, not when it has rendered onscreen.
Comment 4 Alexey Proskuryakov 2020-06-20 13:03:36 PDT
Well, there are two questions here. One is the one you think is right, whether the test still correctly measures border radius. Another is whether the test discovered an actual bug, introduced or triggered by changes in r263223. I did note that you observed something similar with r263222, but that doesn't explain why the test regressed.

Assuming that your goal is to re-land r263223, we need an answer to the second question. Sounds like you think that the test always relied on something unspecified, so the change is OK and won't break websites. Let's have media experts confirm your understanding.
Comment 5 Geoffrey Garen 2020-06-20 14:49:44 PDT
> I did note that you observed something similar with r263222, but
> that doesn't explain why the test regressed.

Actually, it does.

The timing of a screenshot vs a media element's events is flaky. It was flaky in December (r253310), it was flaky in r263222, and it was flaky in r263223. Flaky tests sometimes cause bot failures; that does not indicate a regression in behavior.

> Assuming that your goal is to re-land r263223, we need an answer to the
> second question. Sounds like you think that the test always relied on
> something unspecified, so the change is OK and won't break websites. Let's
> have media experts confirm your understanding.

Websites don't take screenshots.
Comment 6 Alexey Proskuryakov 2020-06-20 16:16:08 PDT
No, general flakiness does not explain the observed regression. You could say that it’s not worth investigating extra flakiness when the whole area is flaky, and that would be fairly convincing. But claiming that you have an explanation at this point is untrue. 

We’ve seen plenty of processing done by websites via capturing video frames. So not sure what you are talking about with regards to websites not taking screenshots. Literal screenshots, they do not, but rendering timing bugs are not limited to literal screenshots.
Comment 7 Darin Adler 2020-06-21 13:13:01 PDT
Comment on attachment 402377 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402377&action=review

>> LayoutTests/ChangeLog:13
>> +        test issue.
> 
> canplaythrough is indeed obviously unrelated to rendering the first frame, but dispatching seeked without rendering sounds like an actual bug, not a test issue. It this really allowed?

I suggest investigating that further with a purpose-built test, rather than testing it accidentally with this test.
Comment 8 EWS 2020-06-21 13:20:27 PDT
Committed r263332: <https://trac.webkit.org/changeset/263332>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 402377 [details].
Comment 9 Radar WebKit Bug Importer 2020-06-21 13:21:15 PDT
<rdar://problem/64578547>
Comment 10 Alexey Proskuryakov 2020-06-22 09:08:54 PDT
> I suggest investigating that further with a purpose-built test, rather than testing it accidentally with this test.

I consider re-landing r263223 blocked until this investigation is complete. Rewriting a test to swipe a regression under the carpet does not resolve the regression.
Comment 11 Geoffrey Garen 2020-06-22 10:09:47 PDT
(In reply to Alexey Proskuryakov from comment #10)
> > I suggest investigating that further with a purpose-built test, rather than testing it accidentally with this test.
> 
> I consider re-landing r263223 blocked until this investigation is complete.

My investigation is complete. It shows that the canplaythrough and seeked events, by pre-existing behavior and by specification, do not synchronize with on-screen rendering. If you see something more to investigate here, you're welcome to do so.

I'm not going to withhold a patch that has been reviewed and that passes all tests.

If you'd like to propose a new standard for patches, that says they must pass review, and all tests, and personal approval by Alexey Proskuryakov, and all tests that could possibly have been written even though they can't be specifically described and they may not relate to any specified or necessary behavior, please send email to webkit-dev and we can consider that policy change as a team.
Comment 12 Alexey Proskuryakov 2020-06-22 10:17:42 PDT
There is no new standard here. Your patch broke tests, you don't know why the change is OK, so you just changed the tests until they passed.

Now you are just escalating to personal attacks instead of civil discussion.
Comment 13 Darin Adler 2020-06-22 12:25:54 PDT
I agree that the test detected a change in behavior.

But I am thinking it did not detect a change in behavior that we should care about. Instead they detected a way to accidentally depend on race conditions. Certainly website developers could make the same kinds of mistakes and there’s a reason to want tests that check these sorts of things. I’m not sure how to write those tests, though, since the code wouldn’t and shouldn’t try to time things consistently.

We could probe this further, but I’m not sure there’s reason to believe this is worthwhile.
Comment 14 Alexey Proskuryakov 2020-06-22 18:38:08 PDT
The test detected the race getting more likely to occur.

My point is that no one checked whether there is a race here in other browsers. And Geoff not finding it in the previously unfamiliar spec is weak evidence that the racy behavior is OK. This is why I was suggesting to have media experts weigh in before it got personal.

If something works 100% in other browsers, and mostly works in shipping Safari, saying that we are free to make it fail more in Safari makes no sense.

As for this being worthwhile to probe further - I think that non-urgent refactoring that makes observed behavior worse is a questionable trade-off.
Comment 15 Geoffrey Garen 2020-06-22 19:34:15 PDT
Not sure what you mean by "personal attacks" and "got personal". I wish you would directly address my reasonable technical and project comments, rather than dismissing them as "personal" and not "civil".