Bug 19663

Summary: Image animation timers do not account for paint lag
Product: WebKit Reporter: Peter Kasting <pkasting>
Component: ImagesAssignee: Peter Kasting <pkasting>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, darin, hyatt, webkit
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: All   
Bug Depends on:    
Bug Blocks: 14976    
Attachments:
Description Flags
patch v1
none
patch v2
none
patch v3
darin: review-
patch v4
none
patch v5
none
WIP patch, solves (1) - (4)
none
WIP patch, solves (1) - (4), (6), (7)
none
patch for (1) - (7), v1 hyatt: review+

Description Peter Kasting 2008-06-18 17:08:21 PDT
BitmapImage.cpp handles image animation using a startAnimation() function, which sets a timer to call advanceAnimation() after the current frame has been visible for its desired delay time.  advanceAnimation() does NOT call startAnimation() again; instead, it expects the image drawing functions to call this back when the image actually draws.  This prevents images in background tabs from animating and eating CPU.

However, this produces a problem.  Consider an animation which tries to run at 16 FPS (= 62.5 ms per frame).  If the average latency between advanceAnimation() finishing and the drawing functions calling back to startAnimation() is 5 ms, then the actual time between successive frames becomes 67.5 ms, or 14.8 FPS.  While this is not a huge difference, for sites which sync sounds and images (e.g. ytmnd.com), this can lead to images getting slowly out of sync over time.

A bigger problem is what happens if for some reason the drawing lag spikes briefly (say, because the system load spikes).  If drawing takes 80 ms instead of 5 in one of these frames, users will see a noticeable hitch.

There are two possible solutions.  advanceAnimation() could call startAnimation() directly, which would be the most accurate method.  This would mean images would animate in all tabs, which means the aforementioned ytmnd.com tabs wouldn't get out of sync if users switched tabs and switched back, but also means the browser would potentially use more CPU.  I don't think this solution is a good idea without a lot of other motivation to change the current behavior.

The second solution is to simply subtract off the paint delay when setting the timer for the next call to advanceAnimation().  Except in cases where the paint delay exceeds the desired frame duration (rare), this makes the image's effective animation speed match its desired speed, modulo timer inaccuracy.  The actual onscreen periods for each frame would lag the "real" animation times by one paint delay interval, which is probably fine.

Patch to do this coming soon.
Comment 1 Peter Kasting 2008-06-18 17:25:40 PDT
Created attachment 21826 [details]
patch v1

Patch to fix.  This does not account for timer inaccuracy, just paint latency.  We could probably account for timer inaccuracy too by changing the local variable to hold the system time at which we expected to get our next startAnimation() callback, and then using the difference in expected and actual times as the offset to the delay.  It's not clear to me whether that's a big deal.  I'm happy to do it if it would be better.
Comment 2 Peter Kasting 2008-06-18 17:41:57 PDT
Comment on attachment 21826 [details]
patch v1

Meh, I'm liking the patch that takes timer lag into account too more and more.  Let me whip that up.
Comment 3 Darin Adler 2008-06-18 17:56:17 PDT
Comment on attachment 21826 [details]
patch v1

Do we want to skip frames too?
Comment 4 Peter Kasting 2008-06-18 18:04:41 PDT
Created attachment 21827 [details]
patch v2

This patch accounts for both paint lag and timer slop.
Comment 5 Peter Kasting 2008-06-18 22:10:37 PDT
(In reply to comment #3)
> (From update of attachment 21826 [details] [edit])
> Do we want to skip frames too?

Maybe.  I have an idea on how I can do everything "right" without using CPU while the image is in the background tab or something.  Let me try tomorrow.
Comment 6 Peter Kasting 2008-06-20 17:16:19 PDT
Created attachment 21864 [details]
patch v3

New patch.

This not only accounts for paint and timer lag, but skips frames as needed, and in general deals with the case of "hidden" animated images.  Now you can switch tabs and switch back and the image will sync to the proper place.

There are four caveats with this patch:

(1) Because we keep adding small values as doubles to calculate the desired time for something to happen, I'm concerned this can suffer from floating point roundoff error.  That said, the values are generally in the range of 0.05 or higher, and we should be able to add a lot of these for a long time before we find errors.  I think.  (My memory of how roundoff errors accumulate here is weak.)

(2) This may need a conditional for when we are _way_ late on an animation frame, in order to avoid doing excessive work when the user doesn't care.  If a user has a hidden animation for 14 hours, and switches back, we're going to iterate through 14 hours' worth of frames.  Even though we're not doing any drawing or decoding in this process, it's still a lot of iterations.  Perhaps we can say that once the frame is 5 minutes late, the user doesn't care about resyncing.  That should be conservative and still fast enough to have no noticeable impact.

(3) In the case where we want to change immediately, I do it synchronously instead of setting a 0-wait timer.  This means the observers can get notified of a change synchronously here.  I don't know if this is problematic.

(3) I moved startAnimation() calls in the various backends so that in the case where we're skipping frames and need to immediately change the current frame, we get the right frame before grabbing its bits to draw.  This reduces a weird jumpy effect where you switch back to a tab and it draws the old frame and then immediately a new frame.  However, as with (3), this means image observers can get notified of the image changing, synchronously, before we actually draw the image.  Again, I don't know how problematic this is.
Comment 7 Darin Adler 2008-06-23 10:55:22 PDT
Comment on attachment 21864 [details]
patch v3

The approach looks good.

Needs a ChangeLog; I'd have to say review- just because of that.

I'm concerned, as you are, that the loop in BitmapImage::startAnimation can run for a long time. Is there any way for us to prevent pathologically slow case there?

+            nextFrame = (++nextFrame) % frameCount();

Modifying nextFrame on the right side and the left is not legal C, and can have undefined behavior depending on the compiler. The right side must say "nextFrame + 1" rather than "++nextFrame".

review- just because of this ++nextFrame issue, but this looks about ready to go.
Comment 8 Peter Kasting 2008-06-23 11:43:23 PDT
Created attachment 21886 [details]
patch v4

Yep, I noticed the missing ChangeLog about an hour ago, oops.

Here's a new version that fixes the ++nextFrame issue (nice catch!), and resets the timings if the frame is more than five minutes behind.

There is one other consideration with this patch, which is what images should do which take longer to initially load than they do to actually loop.  The current behavior, which matches the old behavior, is that the image only "tries to keep up" once it's fully loaded; before then, the early return in startAnimation() makes us ignore the frame load times.

I could change this to make the reference point for animation start be the very first draw of the image.  The effect would be that pages which want to sync images and other resources would get "better" behavior, but users would see a strange effect where the image would load, and then jump partway into its next animation cycle.  I think this is probably too weird and the current behavior is better, but comments welcome.
Comment 9 Darin Adler 2008-06-23 13:26:41 PDT
Comment on attachment 21886 [details]
patch v4

+    double currentDuration = frameDurationAtIndex(m_currentFrame);
+    const double time = currentTime();

It's OK to use const to emphasize that a local variable won't be changed, but we rarely do that in WebKit code. It's a little strange to do it for "time" and not for "currentDuration" since neither is intended to change.

I suggest just leaving it out.

r=me as-is, although I'd take out that const if I was landing the patch
Comment 10 Darin Adler 2008-06-23 13:28:26 PDT
(In reply to comment #8)
> There is one other consideration with this patch, which is what images should
> do which take longer to initially load than they do to actually loop.  The
> current behavior, which matches the old behavior, is that the image only "tries
> to keep up" once it's fully loaded; before then, the early return in
> startAnimation() makes us ignore the frame load times.
> 
> I could change this to make the reference point for animation start be the very
> first draw of the image.  The effect would be that pages which want to sync
> images and other resources would get "better" behavior, but users would see a
> strange effect where the image would load, and then jump partway into its next
> animation cycle.  I think this is probably too weird and the current behavior
> is better, but comments welcome.

I'd like to know what Hyatt thinks.

For compatibility, one question is what other browsers do.

Since you can detect load time for an image due to the load event, I think the behavior in the patch is good.

For the future, we need to come up with a way to synchronize:

    - animated GIFs
    - SVG animation
    - CSS animation
    - <video>
    - <audio>

Some good default behavior perhaps, along with some new features to design synchronization points.
Comment 11 Dave Hyatt 2008-06-23 13:45:58 PDT
Comment on attachment 21886 [details]
patch v4

+        // If we're more than five minutes behind, the user probably doesn't
+        // care about resyncing and we could burn a lot of time looping through
+        // frames below.  Just reset the timings.
+        if ((time - m_desiredFrameStartTime) > (5 * 60))

Could you make the (5 * 60) into a defined constant (maybe put it up with some of the other constants), e.g., cAnimationResyncThreshold.

I like the move of startAnimation().  That makes draw() consistent with drawTiled().
Comment 12 Peter Kasting 2008-06-23 22:23:56 PDT
(In reply to comment #9)
> (From update of attachment 21886 [details] [edit])
> +    double currentDuration = frameDurationAtIndex(m_currentFrame);
> +    const double time = currentTime();
> 
> It's OK to use const to emphasize that a local variable won't be changed, but
> we rarely do that in WebKit code. It's a little strange to do it for "time" and
> not for "currentDuration" since neither is intended to change.

Whoops.  My personal style is normally to use it all the time, but I also missed it on nextFrameStartTime.  Both of these are artifacts of my rewriting this code about 6 different ways in trying to make this most recent patch.  I'll make them all consistent in some way.

(In reply to comment #10)
> (In reply to comment #8)
> > There is one other consideration with this patch, which is what images should
> > do which take longer to initially load than they do to actually loop.
>
> For compatibility, one question is what other browsers do.

Other browsers follow the current behavior: animations will always restart at the beginning once fully loaded, not jump randomly into the middle of the animation.

Another downside of the "sync to the original load start, not once loading finishes" plan is that for animations with a finite loop count, users will potentially no longer see the animation loop the specified number of times.

> Since you can detect load time for an image due to the load event, I think the
> behavior in the patch is good.

Yes, it's a little harder for a naive web coder to use for synchronizing, but actually _easier_ for a good web programmer.  I think.

(In reply to comment #11)
> Could you make the (5 * 60) into a defined constant (maybe put it up with some
> of the other constants), e.g., cAnimationResyncThreshold.

Sure, will do.

> I like the move of startAnimation().  That makes draw() consistent with
> drawTiled().

Sorry, but what does "the move of" mean in this sentence?
Comment 13 Mark Rowe (bdash) 2008-07-26 22:54:26 PDT
Peter, were you going to upload a new version of the patch to address the review comments?
Comment 14 Peter Kasting 2008-07-27 16:46:48 PDT
Yes, sorry; this got lost behind other stuff, I'll try and post a new version this week.  I think regarding the question of when the reference point for animation be, I should leave it like it is rather than making it be the very first draw of the image.  The effects of the other seem bad to me.

I never did figure out what "the move of startAnimation()" meant.
Comment 15 Peter Kasting 2008-08-05 17:16:17 PDT
Created attachment 22667 [details]
patch v5

This fixes the couple of nits above and un-rots the patch.

My checkout is having some strange problems right now and I haven't been able to compile + test this version.  The changes over the last version are very tiny, so I don't think there should be any problem, but I suggest anyone landing this smoketest the build first.
Comment 16 Eric Seidel (no email) 2008-08-22 02:05:45 PDT
Comment on attachment 22667 [details]
patch v5

This is really for hyatt or darin to review.
Comment 17 Eric Seidel (no email) 2008-08-27 18:09:06 PDT
Hyatt, Darin, ping?
Comment 18 Dave Hyatt 2008-08-27 18:29:21 PDT
Comment on attachment 22667 [details]
patch v5

r=me
Comment 19 Mark Rowe (bdash) 2008-09-03 16:34:49 PDT
Landed in r36069.
Comment 20 Peter Kasting 2008-09-22 15:01:08 PDT
This patch was reverted in r36781 after causing too many problems.

Here's the list Hyatt and I came up with on IRC of what we want to see:
(1) The original patch here
(2) Fix animations not starting on CG (the patch on bug 20745)
(3) Fix images falsely considered to be "animating" (the patch on bug 20945)
(4) Fix perf regression by making resetAnimation() also reset the frame start time, so we don't "catch up" after a reset
(5) Fix excess CPU usage during "catch up" by throwing away only framebuffers, not all frame metadata, when discarding frames for "large" images (was already something of a problem, this patch just made it more obvious)
(6) Figure out why, at least on CG, the "catch up" code can fail: image observers seem to be notified but draw() isn't called again.  Try reproducing by loading http://castle104.com/images/spinner.gif and resizing the window a bit.
(7) Don't advance animations until the next frame is complete (prevents flash/flicker; this was already an issue)
(8) Animations that are only referenced by the bfcache should be paused

Hyatt says he knows how to do (8) and that it doesn't have to be fixed before landing.  I don't know how to address (6) and hope for his help there.  For the other issues, I can file some separate followup bugs and write new patches on this bug as appropriate.
Comment 21 Peter Kasting 2008-09-22 15:01:42 PDT
Comment on attachment 22667 [details]
patch v5

Clearing r+ so no one lands this again
Comment 22 Peter Kasting 2008-09-24 15:30:47 PDT
Created attachment 23768 [details]
WIP patch, solves (1) - (4)

This patch rolls up the patches for problems (1) - (4) (from this bug, the other linked bugs, and my own comments on IRC).  I'll start tackling some of the other problems on the list.
Comment 23 Peter Kasting 2008-09-25 15:55:48 PDT
Created attachment 23826 [details]
WIP patch, solves (1) - (4), (6), (7)

This addresses (6) and (7).  The reason we weren't advancing when the catch-up code ran is because we'd clear the dirty rect we'd just set, and then no timer would be running and no rect would be dirty, so nothing would happen again.

While fixing this, I encountered another issue: on large images, such as http://img2.abload.de/img/almost_failedovs.gif , the decode time in the catch-up code can become significant, and cause us to continue to fall behind.  So my first attempt at a patch for (6) caused an infinite loop.  In this patch, we detect when our attempts to catch up have failed, and fall back to setting a zero-delay timeout, which is about as good as I can do.

This can't yet be checked in; (5) has to be solved, as with this patch as-is, CPU usage on big images like the one above is astronomical.  By the end of that GIF I'm pegging one of my cores and still can't keep up, as opposed to the old code, which animates while only using about 40% of one core, max.  In addition to implementing (5), I'd like to also not throw away the framebuffer for the current frame when we discard the other framebuffers; right now we throw it away and immediately redecode it.  Implementing both these ideas will hopefully result in minimal, if any, CPU hit from this patch.
Comment 24 Peter Kasting 2008-09-26 13:53:08 PDT
Created attachment 23857 [details]
patch for (1) - (7), v1

OK, this is ready for review.

The last WIP patch had some bugs that caused animations to erroneously skip frames and run too fast, oops.

With this patch, animated images use more CPU than on stock Safari 3.1 (though much less than in the previous WIP patch).  However, I think most of this extra CPU usage is due to animating at the correct speed, instead of noticeably slower.  For example, on http://img2.abload.de/img/almost_failedovs.gif on my machine, a loop now peaks (near the end) at about 33% CPU instead of 19%.  But the loop also runs in about 11.5 seconds versus 16.7 (not very precise hand timings, probably only accurate to about 0.3 seconds).  Thus we use about 3/2 the CPU to run in 2/3 the time.  So I don't think this patch is making things less efficient per frame than before, we're just running a much larger number of frames per second on images that take significant decoding time per-frame.

To check the accuracy versus IE (whose frame delay time limits we're imitating), try loading http://stevesfishingsupplies.com/AboutSteve/Turning_Lady.gif in IE and WebKit at roughly the same time, side-by-side, and watching over time for whether they stay in sync.  Without this patch, Safari falls behind fairly quickly, whereas with it the two stay pretty well locked together, including when we switch tabs for a while and switch back.

I think this can be checked in without solving (8), which I'm still hoping for hyatt to do separately.  Incidentally, I also think the CG GIF decoder could use dramatically less CPU-- Firefox 3 manages to animate http://img2.abload.de/img/almost_failedovs.gif using an order of magnitude less CPU than Safari.  But that's someone else's problem.
Comment 25 Eric Seidel (no email) 2008-09-30 17:31:07 PDT
Comment on attachment 23857 [details]
patch for (1) - (7), v1

Hyatt is the man you want.
Comment 26 Darin Adler 2008-10-12 16:48:21 PDT
Comment on attachment 21886 [details]
patch v4

Clear review on this old version of the patch so it won't show up in the "reviewed and needs commit" queue.
Comment 27 Stephanie Lewis 2008-10-13 18:49:50 PDT
Perf. results look fine
Comment 28 Dave Hyatt 2008-10-15 11:46:35 PDT
Comment on attachment 23857 [details]
patch for (1) - (7), v1

r=me
Comment 29 Peter Kasting 2008-10-15 13:40:34 PDT
Committed in r37612.
Comment 30 Simon Fraser (smfr) 2009-01-02 16:22:03 PST
Caused bug 23082?
Comment 31 Peter Kasting 2009-06-16 13:56:21 PDT
*** Bug 14976 has been marked as a duplicate of this bug. ***