Bug 109579

Summary: [CoordGfx] Speed up web process rendering with double buffering
Product: WebKit Reporter: Balazs Kelemen <kbalazs>
Component: Layout and RenderingAssignee: Balazs Kelemen <kbalazs>
Status: NEW ---    
Severity: Normal CC: allan.jensen, bruno.abinader, cmarcelo, dongseong.hwang, gtk-ews, jturcotte, kbalazs, laszlo.gombos, luiz, noam, ossy, ostap73, rafael.lobo, rgabor, webkit.review.bot, xan.lopez, zeno
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 108294, 113785    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
A benchmark based on http://media.24ways.org/2009/15/space.html
none
Benchmark based on the leaves demo
none
new WIP patch
none
Patch
none
Patch
none
Patch jturcotte: review-, gtk-ews: commit-queue-

Description Balazs Kelemen 2013-02-12 06:16:11 PST
The idea is simple: while the UI repaint, we can prerender the new frame and provide it immediately when it is requested.
It has the downside using twice as many memory so we should only do it in "slow" situations. Patch and benchmarks coming.
Comment 1 Balazs Kelemen 2013-02-12 06:47:22 PST
Created attachment 187859 [details]
Patch
Comment 2 Balazs Kelemen 2013-02-12 06:50:08 PST
Created attachment 187860 [details]
A benchmark based on http://media.24ways.org/2009/15/space.html
Comment 3 Balazs Kelemen 2013-02-12 06:50:56 PST
Created attachment 187861 [details]
Benchmark based on the leaves demo
Comment 4 Balazs Kelemen 2013-02-12 07:06:56 PST
In the benchmarks I measured the FPS with requestAnimationFrame. The first one is based http://media.24ways.org/2009/15/space.html, the second one is the leaves demo with 500 leaves.

Speedup with the patch (avg fps):
leaves: 12.73 -> 20.33
space: 19 -> 60

I could not measure speedup with canvas tests, neither with WebGL yet (I should investigate in WebGL a bit more).
Comment 5 Noam Rosenthal 2013-02-12 07:17:05 PST
Comment on attachment 187859 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=187859&action=review

This is a very promising optimization, but it needs some more work. Maybe we can do some of it in stages?
My main issue is that this relies too much on heuristics; Maybe we can create a more reliable algorithm that is capped by memory rather than triggered by slowness.

> Source/WebCore/ChangeLog:25
> +        suspendPainting on the client accordint to the active state.

according

> Source/WebCore/platform/graphics/texmap/coordinated/CoordinatedGraphicsScene.cpp:599
> +    for (size_t i = 0; i < m_committableRenderTasks.size(); ++i)

This is not thread safe. m_committable*** can be accessed from syncRemoteContent and didRenderFrame.

> Source/WebCore/platform/graphics/texmap/coordinated/CoordinatedGraphicsScene.cpp:676
> +    m_renderQueue.swap(m_committableRenderTasks);

If the web process happens to be quicker than the UI process, we'll lose messages here.

> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:372
> +    unsigned elapsed = currentTimeInMilliSeconds() - m_performFlushStartTime;
> +    bool isSlow = elapsed > SlowFrameThresholdMS;
> +    bool shouldUpdate = !((!m_numSlowFrames && !isSlow) || ((m_numSlowFrames == NumberOfSlowFramesThreshold) && isSlow));
> +    if (shouldUpdate) {
> +        m_numSlowFrames += isSlow ? 1 : -1;
> +        ASSERT(m_numSlowFrames <= NumberOfSlowFramesThreshold);
> +        if (!m_useDoubleBuffering && m_numSlowFrames == NumberOfSlowFramesThreshold) {
> +            m_useDoubleBuffering = true;
> +            if (!m_layerFlushTimer.isActive())
> +                m_layerFlushTimer.startOneShot(0);
> +        } else if (m_useDoubleBuffering && !m_numSlowFrames) {
> +            m_useDoubleBuffering = false;
> +            if (canPurgeFrontAtlases)
> +                m_frontUpdateAtlases.clear();
> +            else
> +                m_shouldPurgeFrontAtlases = true;
> +        }
> +    }

Not sure if this is the right heuristic.
Maybe we should always allow multiple frames, but limit the amount of memory that can be used by atlases. Think of it like a buffered pipe where we block if we run out of buffer space. I'm afraid of too much heuristics here.
Comment 6 Balazs Kelemen 2013-02-12 09:07:22 PST
> Not sure if this is the right heuristic.
> Maybe we should always allow multiple frames, but limit the amount of memory that can be used by atlases. Think of it like a buffered pipe where we block if we run out of buffer space. I'm afraid of too much heuristics here.

Detecting low memory is a hard topic on it's own. It would be very nice if WebKit would have a good cross-platform solution for that. I'm going see what can I do.
Comment 7 Noam Rosenthal 2013-02-12 09:09:02 PST
(In reply to comment #6)
> > Not sure if this is the right heuristic.
> > Maybe we should always allow multiple frames, but limit the amount of memory that can be used by atlases. Think of it like a buffered pipe where we block if we run out of buffer space. I'm afraid of too much heuristics here.
> 
> Detecting low memory is a hard topic on it's own. It would be very nice if WebKit would have a good cross-platform solution for that. I'm going see what can I do.

I don't mean detecting low memory - I mean explicitly capping the memory used by atlases.
Comment 8 Balazs Kelemen 2013-02-14 05:34:21 PST
I see another problem now. If we do flush in back buffers while the ui is painting, and than we invalidate it (set m_didRenderFrameInBackBuffer to false) because another source triggers scheduleLayerFlush, than we will have the operations of the invalidated flush in the queue (messages currently). If I'm correct neither throwing away them nor keeping them as is is good, instead there should be a way to merge them with the latest queue items (for example if there are 2 updateTile operation for the same tile those should be merged to 1). Does it sounds correct?

Anyway, this patch should be rebased upon bug 108294 so I think I suspend working on it now.
Comment 9 Dongseong Hwang 2013-02-20 18:37:03 PST
Parallelizing Web Process updating and UI Process commiting is very good.

I think it is time to separate UpdateAtlas code from CoordinatedLayerTreeHost :)
Comment 10 Balazs Kelemen 2013-03-19 06:52:57 PDT
Created attachment 193815 [details]
new WIP patch

This has some bug but I struggle with finding out what. Simple pages are ok but the animation in the first attachment shows bad updates (there are frames where some content appears on an area where it shouldn't). Seems to be somehow the mapping from surface areas to textures and/or textures to the screen is broken.
Comment 11 Balazs Kelemen 2013-03-22 09:21:21 PDT
Created attachment 194567 [details]
Patch
Comment 12 WebKit Review Bot 2013-03-22 09:23:56 PDT
Attachment 194567 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.cpp', u'Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.h', u'Source/WebCore/platform/graphics/texmap/coordinated/CoordinatedBackingStore.cpp', u'Source/WebCore/platform/graphics/texmap/coordinated/CoordinatedBackingStore.h', u'Source/WebCore/platform/graphics/texmap/coordinated/CoordinatedGraphicsLayer.cpp', u'Source/WebCore/platform/graphics/texmap/coordinated/CoordinatedGraphicsLayer.h', u'Source/WebCore/platform/graphics/texmap/coordinated/CoordinatedGraphicsScene.cpp', u'Source/WebCore/platform/graphics/texmap/coordinated/CoordinatedGraphicsState.h', u'Source/WebCore/platform/graphics/texmap/coordinated/UpdateAtlas.cpp', u'Source/WebCore/platform/graphics/texmap/coordinated/UpdateAtlas.h', u'Source/WebKit2/ChangeLog', u'Source/WebKit2/Platform/CoreIPC/ArgumentCoders.h', u'Source/WebKit2/Shared/CoordinatedGraphics/CoordinatedGraphicsArgumentCoders.cpp', u'Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp', u'Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.h']" exit_code: 1
Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.h:126:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Total errors found: 1 in 16 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 13 Balazs Kelemen 2013-03-22 09:26:41 PDT
> Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.h:126:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
> Total errors found: 1 in 16 files
> 
> 
> If any of these errors are false positives, please file a bug against check-webkit-style.

Seems like check-webkit-style makes it mandatory to give a name for the enum type (i.e. enum {...} m_myEnumMember is not valid). I think it's a bug so I will file a bug against it.
Comment 14 Balazs Kelemen 2013-03-22 09:30:11 PDT
Here are some performance measurement results with the patch. It still has a notable win on the tests attached here:
space animation: 19.6 to 25.5 fps
leaves: 8.75 to 14.75 fps

Additionally I found a WebGL test (just this one) where it is a win. It is http://airtightinteractive.com/demos/js/cubes/, and it moves the fps counter from 30 to 60 (the counter built in the page, it is also based on rAF).
Comment 15 Noam Rosenthal 2013-03-25 03:10:10 PDT
Comment on attachment 194567 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=194567&action=review

I dont like the changes from vector of pairs to HashMap. In general I don't think we should change CoreIPC to deal with hashsets/maps, those don't make sense as serialized values in this context, and it's generally an orthogonal issue to what you're trying to solve.
I think it would be good to clean this up a bit where we can see the actual double buffering change and not all the little peripheral changes this introduces...

> Source/WebCore/platform/graphics/texmap/coordinated/CoordinatedGraphicsLayer.cpp:942
> +    if (!addResult.isNewEntry) {
> +        // Remove old updates if imcompatible with or covered by the new one.

Early return
Comment 16 Balazs Kelemen 2013-03-25 04:24:57 PDT
(In reply to comment #15)
> (From update of attachment 194567 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=194567&action=review
> 
> I dont like the changes from vector of pairs to HashMap. In general I don't think we should change CoreIPC to deal with hashsets/maps, those don't make sense as serialized values in this context, and it's generally an orthogonal issue to what you're trying to solve.
> I think it would be good to clean this up a bit where we can see the actual double buffering change and not all the little peripheral changes this introduces...

Ok. The maps are only needed in the web process, so we can still serialize them as vectors.
Comment 17 Allan Sandfeld Jensen 2013-03-25 05:48:45 PDT
Comment on attachment 194567 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=194567&action=review

> Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.h:72
> +    // It makes possible to devide the area into back and front buffers.

divide

> Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.h:115
> +    NodeList m_markedNodes;

Wouldn't it be more efficient to actually mark the nodes instead of just putting them in a list?
Comment 18 Balazs Kelemen 2013-03-25 07:18:11 PDT
(In reply to comment #17)
> (From update of attachment 194567 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=194567&action=review
> 
> > Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.h:72
> > +    // It makes possible to devide the area into back and front buffers.
> 
> divide
> 
> > Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.h:115
> > +    NodeList m_markedNodes;
> 
> Wouldn't it be more efficient to actually mark the nodes instead of just putting them in a list?

The problem with that is that we cannot modify the tree while traversing it, so in clearMarkedAllocations we would still need to create a local list.
Comment 19 Jocelyn Turcotte 2013-03-25 07:33:38 PDT
Comment on attachment 194567 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=194567&action=review

Overall the direction is good, but this would need to be a proper pipeline instead of just a flag for the web process meaning "yo, I have some memory to waste, please go ahead and render stuff and maybe it will get displayed in a frame of its own, who knows!". In the end, if the UI process is double buffered, so should be the web process, and each update should make its way on the pipeline without being merged anywhere.

> Source/WebCore/ChangeLog:41
> +        Now we can have multiple tile updates for a given tile. Copy all of
> +        them to the texture in order.

If those updates are intersecting (which is the case most of the time when an element is animated), it feels to me that this will just waste memory and CPU power on rendering tile states that the user will never see. This can't happen.

> Source/WebCore/platform/graphics/texmap/coordinated/UpdateAtlas.cpp:56
> +    m_hasCommittedOnce = true;

This is useless unless you would set it to false somewhere, no?

> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:88
> +    , m_useDoubleBuffering(true) // FIXME: Currently it will stay enabled forerev. We should disable it on low memory situations.

forerev -> forever

I think that this is a very good improvement for 2D canvas, especially with requestAnimationFrame.
But why would I want this when loading a page or basic page layout, which is what >90% of the rendered frames will be about?
Why not trigger this only when the page requests an animation frame?

> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:155
> +    // Invalidate prerendered frame. requestAnimationFrame callbacks should be synchronised
> +    // to the UI frame rate so we won't call them until we hear from the UI process.
> +    if (m_didRenderFrameInBackBuffers)
> +        m_shouldServiceScriptedAnimations = false;

I'm not sure what this is intended to do, but this is wrong.
This breaks the frame rendering pipeline, serviceScriptedAnimations needs to be called before every update or else you will end up with skipped rAF frames when a timer or input events are also triggering updates on the page.

You also need to know the time at which the frame will make it to the screen instead of using currentTime(), this is where double buffering is really beneficial.
This logic is also very prone to async issues, where this flag will overwrite a renderNextFrame() that just arrived, either now or as the code evolves.

> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:780
> +    m_didRenderFrameInBackBuffers = false;

This "m_didRenderFrameInBackBuffers = false" logic is too brittle, there is already too many of them everywhere in the code, and more of them will probably come. This WILL break.

Maybe having an int with the number of pending frames would improve things, but it still sounds quite crappy to me. At least it couldn't be overwritten as easily, possibly losing a frame.

> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.h:202
> +    bool m_didRenderFrameInBackBuffers;

Flags should be named according to their effects (the code path that they will provoke), and not to what triggered them.

This is even more confusing since m_waitingForUIProcess is named according to what we wait for while m_didRenderFrameInBackBuffers is the other way around.
How about m_waitingToSwapPrerenderedFrame, or something like this?
Comment 20 Balazs Kelemen 2013-04-01 07:01:58 PDT
(In reply to comment #19)
> (From update of attachment 194567 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=194567&action=review

Thanks for the detailed review!

> 
> Overall the direction is good, but this would need to be a proper pipeline instead of just a flag for the web process meaning "yo, I have some memory to waste, please go ahead and render stuff and maybe it will get displayed in a frame of its own, who knows!". In the end, if the UI process is double buffered, so should be the web process, and each update should make its way on the pipeline without being merged anywhere.

I'm not sure what you mean here by "if the UI process is double buffered, so should the web process".

> 
> > Source/WebCore/ChangeLog:41
> > +        Now we can have multiple tile updates for a given tile. Copy all of
> > +        them to the texture in order.
> 
> If those updates are intersecting (which is the case most of the time when an element is animated), it feels to me that this will just waste memory and CPU power on rendering tile states that the user will never see. This can't happen.

Now I agree it was a bad decision to allow multiple flushes for a given frame.

> 
> > Source/WebCore/platform/graphics/texmap/coordinated/UpdateAtlas.cpp:56
> > +    m_hasCommittedOnce = true;
> 
> This is useless unless you would set it to false somewhere, no?
> 

Yep, I forgot to set it false in the constructor.
(Your other comments should not be appropriate for a follow-up patch, as I plan to do it, so I didn't repeat them).

In summary, the key question is that when do we allow to render a frame ahead of time. In the first patch I used to schedule it at the end of rendering, in the second I was thinking that allowing it on every explicit request (every scheduleLayerFlush call) is better because we will have the most up-to-date result at the end. Now I think that the best would be to allow prerendering a frame only once, and only if there is a request after flushPendingLayerChanges and before renderNextFrame. This way we automatically get a good heuristic about whether can it be useful.
Comment 21 Balazs Kelemen 2013-04-01 07:34:33 PDT
Sorry but your naming logic seems strange to me.

> > +    bool m_didRenderFrameInBackBuffers;
> 
> Flags should be named according to their effects (the code path that they will provoke), and not to what triggered them.

Well, it provokes the code path of calling commitFrame in renderNextFrame. I did  not suppose to name it after what triggered it, I think it explains well how it extends the state machine.

> 
> This is even more confusing since m_waitingForUIProcess is named according to what we wait for while m_didRenderFrameInBackBuffers is the other way around.
> How about m_waitingToSwapPrerenderedFrame, or something like this?

waitingToSwapPrerenderedFrame tells me that there are two sources of waiting, while it is still the ui process we are waiting for.
I can think about hasPrerenderedFrame or didRenderFrameAheadOfTime, does either sound better to you than the original?
Comment 22 Balazs Kelemen 2013-04-01 13:17:40 PDT
Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:780
> > +    m_didRenderFrameInBackBuffers = false;
> 
> This "m_didRenderFrameInBackBuffers = false" logic is too brittle, there is already too many of them everywhere in the code, and more of them will probably come. This WILL break.
> 
> Maybe having an int with the number of pending frames would improve things, but it still sounds quite crappy to me. At least it couldn't be overwritten as easily, possibly losing a frame.
 
Noam was also suggesting that but indeed I think it just makes it more confused. See: https://gist.github.com/balazs/5287393
Comment 23 Balazs Kelemen 2013-04-01 13:26:07 PDT
Created attachment 196015 [details]
Patch
Comment 24 WebKit Review Bot 2013-04-01 13:30:36 PDT
Attachment 196015 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.cpp', u'Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.h', u'Source/WebCore/platform/graphics/texmap/coordinated/UpdateAtlas.cpp', u'Source/WebCore/platform/graphics/texmap/coordinated/UpdateAtlas.h', u'Source/WebKit2/ChangeLog', u'Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp', u'Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.h']" exit_code: 1
Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.h:126:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Total errors found: 1 in 8 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 25 Noam Rosenthal 2013-04-01 14:34:51 PDT
Comment on attachment 196015 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=196015&action=review

> Source/WebCore/ChangeLog:9
> +        This is an optimization that has the most value for animations that are heavy in layout/repaint.

I would say for JS-driven animations.

> Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.cpp:356
> +        if (current->largestFree.isEmpty() && !left && !right)
> +            m_markedNodes.append(current);
> +        else {

continue;
(reduce indentations)

> Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.cpp:372
> +        Node* n = m_markedNodes[i];

n -> node

> Source/WebCore/platform/graphics/texmap/coordinated/AreaAllocator.h:72
> +    // It makes possible to devide the area into back and front buffers.

devide -> divide

> Source/WebCore/platform/graphics/texmap/coordinated/UpdateAtlas.cpp:65
> +    // Do not release allocations if this is a new atlas that has never been committed yet.
> +    if (!m_hasCommittedOnce)
> +        return;

Seems like this should be m_hasMarkedAllocations, and this boolean should be set to true in every cycle we mark anything in that atlas. Otherwise we'd end up going through the atlas unmarking allocations for every frame except the very first one.

> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:320
> +        m_hasPrerenderedFrame = m_shouldSyncFrame;

What's the difference between those 2 booleans? Seems like one of them is enough.

> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:338
> +    m_state.contentsSize = roundedIntSize(m_nonCompositedContentLayer->size());
> +    m_state.coveredRect = toCoordinatedGraphicsLayer(m_nonCompositedContentLayer.get())->coverRect();
> +    m_state.scrollPosition = m_visibleContentsRect.location();

You have to make sure here that the two rendered frames have the same geometry parameters. Otherwise this would lead to interesting bugs.

> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:510
> +    if (!m_waitingForUIProcess)
> +        lockAnimations();

lockAnimations() has to die... but that's a different issue.
Comment 26 Balazs Kelemen 2013-04-02 03:57:22 PDT
(In reply to comment #25)
> (From update of attachment 196015 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=196015&action=review

> 
> > Source/WebCore/platform/graphics/texmap/coordinated/UpdateAtlas.cpp:65
> > +    // Do not release allocations if this is a new atlas that has never been committed yet.
> > +    if (!m_hasCommittedOnce)
> > +        return;
> 
> Seems like this should be m_hasMarkedAllocations, and this boolean should be set to true in every cycle we mark anything in that atlas. Otherwise we'd end up going through the atlas unmarking allocations for every frame except the very first one.

Isn't that's what I did? See:

+void UpdateAtlas::didCommitBuffers()
 {
-    if (!m_areaAllocator) {
-        m_areaAllocator = adoptPtr(new GeneralAreaAllocator(size()));
-        m_areaAllocator->setMinimumAllocation(IntSize(32, 32));
-    }
+    m_hasCommittedOnce = true;
+    m_areaAllocator->markAllocations();
 }

The invariant is that didSwapBuffers will be called after every didCommitBuffers, and didCommitBuffers will be called first. This bool just needed to not make it necessary for the coordinator to track which atlases are new (have been created for the current frame).

> 
> > Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:320
> > +        m_hasPrerenderedFrame = m_shouldSyncFrame;
> 
> What's the difference between those 2 booleans? Seems like one of them is enough.

Yes, it seems like now they are equivalent.

> 
> > Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:338
> > +    m_state.contentsSize = roundedIntSize(m_nonCompositedContentLayer->size());
> > +    m_state.coveredRect = toCoordinatedGraphicsLayer(m_nonCompositedContentLayer.get())->coverRect();
> > +    m_state.scrollPosition = m_visibleContentsRect.location();
> 
> You have to make sure here that the two rendered frames have the same geometry parameters. Otherwise this would lead to interesting bugs.
> 

Why? We still hold the result of only one flush in m_state before sending it.

> > Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:510
> > +    if (!m_waitingForUIProcess)
> > +        lockAnimations();
> 
> lockAnimations() has to die... but that's a different issue.

Yep, I just wanted to alter it's behavior in a way that somewhat makes sense as long as we have it.

Let's talk about your concerns on IRC (in case my answers were not satisfying).
Comment 27 Balazs Kelemen 2013-04-02 08:51:49 PDT
Created attachment 196150 [details]
Patch
Comment 28 kov's GTK+ EWS bot 2013-04-02 09:08:42 PDT
Comment on attachment 196150 [details]
Patch

Attachment 196150 [details] did not pass gtk-ews (gtk):
Output: http://webkit-commit-queue.appspot.com/results/17394100
Comment 29 Jocelyn Turcotte 2013-04-02 09:49:14 PDT
Comment on attachment 196015 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=196015&action=review

> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:298
> +bool CoordinatedLayerTreeHost::shouldFlushLayerChanges() const
> +{
> +    // We allow one frame to be prerendered before the UI process ask for it.
> +    ASSERT(m_waitingForUIProcess || !m_hasPrerenderedFrame);
> +    return !m_hasPrerenderedFrame;
> +}
> +
>  bool CoordinatedLayerTreeHost::flushPendingLayerChanges()
>  {
> -    if (m_waitingForUIProcess)
> +    if (!shouldFlushLayerChanges())
>          return false;

I'm still worried that this will bring uneven timing since what you do is basically delaying all frames by 16/32ms. Our use of currentTime() in syncDisplayState() isn't correct anymore.
Make sure that you test smoothness and not FPS. If you have 60 unevenly timed frames it WILL look worse than 30 even frames per second.
Users don't care at all about FPS unless it's below 24-30 or if they see the number displayed somewhere :)

Test your changes with http://jsfiddle.net/XQpzU/4358/light/ and http://html5-benchmark.com/
If this changes doesn't make them smoother, it's a waste of resources. If you can test on non-desktop hardware it would also be very nice, this is where you'll need it most.

>>> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:320
>>> +        m_hasPrerenderedFrame = m_shouldSyncFrame;
>> 
>> What's the difference between those 2 booleans? Seems like one of them is enough.
> 
> Yes, it seems like now they are equivalent.

m_shouldSyncFrame knows if anything/tile has been modified during m_nonCompositedContentLayer->flushCompositingStateForThisLayerOnly() which "bool didSync" can't cover.
They aren't the same but I'm sure m_shouldSyncFrame could use a better name.
Comment 30 Jocelyn Turcotte 2013-04-02 10:12:22 PDT
(In reply to comment #20)
> I'm not sure what you mean here by "if the UI process is double buffered, so should the web process".

What I mean is that if you have a pipeline of rocks for frogs to cross the river like this:
o
~
o
~
o

You can have 1 frog crossing at a time. Now if you want two frogs, you can't just put two rocks on the last stage of the pipeline, or else you'll still have only one frog that can cross at a time (if like in our case, the bottle neck isn't just the last stage):
o o
~~
 o
~~
 o

You need double-buffered stages on all the pipeline:
o o
~~
o o
~~
o o

But after looking for a while, I'm realizing that what I'm saying doesn't make much sense. We use double buffering since the first frame is "locked" all the time while it's displayed to prevent tearing.
Like seen here: http://en.wikipedia.org/wiki/File:Comparison_double_triple_buffering.svg
This here would only make sense if we used tripple buffering in the UI process (two offscreen surfaces on all the pipeline, like in our river above).
So overall I'm still very skeptical about this change, especially realizing that.

> waitingToSwapPrerenderedFrame tells me that there are two sources of waiting, while it is still the ui process we are waiting for.
> I can think about hasPrerenderedFrame or didRenderFrameAheadOfTime, does either sound better to you than the original?

hasPrerenderedFrame sounds good, I think my logic was also flawed, we don't want to name the flag according to the code path either (like m_shouldSyncFrame does, it's still confusing) but probably name them more according to the state that they represent, and "hasPrerenderedFrame" fits well.
Comment 31 Jocelyn Turcotte 2013-04-02 10:20:52 PDT
Comment on attachment 196150 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=196150&action=review

> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:291
> +    ASSERT(m_waitingForUIProcess || !m_shouldSyncFrame);
> +    return !m_shouldSyncFrame;

r- following the m_shouldSyncFrame vs. m_hasPrerenderedFrame comment before. They aren't the same (they were set to false in the same conditions, but not to true).
Comment 32 Balazs Kelemen 2013-04-03 02:39:42 PDT
> Test your changes with http://jsfiddle.net/XQpzU/4358/light/ and http://html5-benchmark.com/
> If this changes doesn't make them smoother, it's a waste of resources. If you can test on non-desktop hardware it would also be very nice, this is where you'll need it most.

Thanks for those links, I will test the patch with them, and hopefully somebody will be kind enough to test it on device as well. Also http://airtightinteractive.com/demos/js/cubes/ can be a good test (it's a requestAnimationFrame driven WebGL animation).

> 
> >>> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedLayerTreeHost.cpp:320
> >>> +        m_hasPrerenderedFrame = m_shouldSyncFrame;
> >> 
> >> What's the difference between those 2 booleans? Seems like one of them is enough.
> > 
> > Yes, it seems like now they are equivalent.
> 
> m_shouldSyncFrame knows if anything/tile has been modified during m_nonCompositedContentLayer->flushCompositingStateForThisLayerOnly() which "bool didSync" can't cover.
> They aren't the same but I'm sure m_shouldSyncFrame could use a better name.

Could you point out the case when they are not the same? m_shouldSyncFrame should only be set under flushPendingLayerChanges. I know about one exception but I already provided a patch to fix that, see bug 113785.
Comment 33 Balazs Kelemen 2013-04-03 04:12:17 PDT
> >  bool CoordinatedLayerTreeHost::flushPendingLayerChanges()
> >  {
> > -    if (m_waitingForUIProcess)
> > +    if (!shouldFlushLayerChanges())
> >          return false;
> 
> I'm still worried that this will bring uneven timing since what you do is basically delaying all frames by 16/32ms. Our use of currentTime() in syncDisplayState() isn't correct anymore.
> Make sure that you test smoothness and not FPS. If you have 60 unevenly timed frames it WILL look worse than 30 even frames per second.
> Users don't care at all about FPS unless it's below 24-30 or if they see the number displayed somewhere :)

You are right, it can cause uneven timing, and it is very observable. Fox example with http://airtightinteractive.com/demos/js/cubes/ the rotation is gets slower from time to time, and in http://jsfiddle.net/XQpzU/4358/light/ the space between the circles is varying.

I will look into how could I avoid that but it is becoming more and more clear that this is a tricky kind of optimization and it's hard to do right.
Comment 34 Jocelyn Turcotte 2013-04-03 04:56:09 PDT
(In reply to comment #32)
> Could you point out the case when they are not the same? m_shouldSyncFrame should only be set under flushPendingLayerChanges. I know about one exception but I already provided a patch to fix that, see bug 113785.

At the moment m_shouldSyncFrame can be set between when JS marks a part of the page dirty (create/delete layer) while m_hasPrerenderedFrame should only be set after layouting and tile rendering has been done. If you really fix m_shouldSyncFrame to be set within flushPendingLayerChanges this would end up being equivalent since you wouldn't go back to the event loop meanwhile, but in any case they would still have different purposes and meaning. I believe it would injure the maintainability of this file to mix them.

(In reply to comment #33)
> You are right, it can cause uneven timing, and it is very observable. Fox example with http://airtightinteractive.com/demos/js/cubes/ the rotation is gets slower from time to time, and in http://jsfiddle.net/XQpzU/4358/light/ the space between the circles is varying.
> 
> I will look into how could I avoid that but it is becoming more and more clear that this is a tricky kind of optimization and it's hard to do right.

From my point of view, the only way to achieve this is to have a reliable way of knowing at which time (i.e. NOT currentTime() for sure) will the frame that we are currently rendering going to be shown to the user on the window's visible frame buffer. This what the DisplayRefreshMonitor class is used for by Mac, I think.

Let's chat on IRC if you have ideas on how to fix this, but this patch is harmful and shouldn't be enabled if it just compromises smoothness for slugish FPS. As I said before, smoothness is much more important to users eyes than FPS.