Bug 94721 - [Chromium] Scheduler is currently never processing a commit until it receives a vsync tick.
Summary: [Chromium] Scheduler is currently never processing a commit until it receives...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: David Reveman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-22 09:42 PDT by David Reveman
Modified: 2012-09-14 15:17 PDT (History)
9 users (show)

See Also:


Attachments
Patch (13.99 KB, patch)
2012-08-22 16:15 PDT, David Reveman
no flags Details | Formatted Diff | Diff
Patch (25.02 KB, patch)
2012-08-23 15:30 PDT, David Reveman
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from gce-cr-linux-08 (369.93 KB, application/zip)
2012-08-23 16:17 PDT, WebKit Review Bot
no flags Details
Patch (24.33 KB, patch)
2012-08-23 17:57 PDT, David Reveman
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from gce-cr-linux-03 (314.81 KB, application/zip)
2012-08-23 19:56 PDT, WebKit Review Bot
no flags Details
Patch (28.77 KB, patch)
2012-08-24 16:40 PDT, David Reveman
no flags Details | Formatted Diff | Diff
Patch (30.25 KB, patch)
2012-08-28 20:48 PDT, David Reveman
no flags Details | Formatted Diff | Diff
Patch (30.40 KB, patch)
2012-09-02 21:58 PDT, David Reveman
no flags Details | Formatted Diff | Diff
Patch (30.51 KB, patch)
2012-09-08 11:51 PDT, David Reveman
no flags Details | Formatted Diff | Diff
Patch (30.65 KB, patch)
2012-09-11 17:06 PDT, David Reveman
jamesr: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Reveman 2012-08-22 09:42:30 PDT
The scheduler will only get a resource update status from the scheduler client at vsync tick time. To avoid blocking the main thread for an unnecessary amount of time we might want to allow the client to notify the scheduler about resource update completion prior to vsync tick time.
Comment 1 Nat Duca 2012-08-22 11:35:07 PDT
Sounds interesting. Be careful to not allow more than one commit per frame. The waiting_for_First_draw part of the commit flow was done originally to avoid the case where a fast main thread would commit dozens of times per actual drawn frame.
Comment 2 Antoine Labour 2012-08-22 12:25:24 PDT
Should we cull the extra commit requests on the impl side instead? I.e. not post beginFrame more than once per frame? That way you don't need to block the main thread to achieve the result.
Comment 3 David Reveman 2012-08-22 16:11:05 PDT
(In reply to comment #2)
> Should we cull the extra commit requests on the impl side instead? I.e. not post beginFrame more than once per frame? That way you don't need to block the main thread to achieve the result.

I think that preventing the main thread from starting to produce a new frame until we get a vsync tick and commit the last frame might limit the throughput significantly. I'll investigate this possibility. For now I'll attach a patch that adds a updateMoreResourcesComplete callback to CCScheduler.
Comment 4 David Reveman 2012-08-22 16:15:09 PDT
Created attachment 160029 [details]
Patch

Work in progress
Comment 5 David Reveman 2012-08-23 15:30:40 PDT
Created attachment 160264 [details]
Patch
Comment 6 James Robinson 2012-08-23 15:53:36 PDT
Comment on attachment 160264 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=160264&action=review

> Source/WebCore/platform/graphics/chromium/cc/CCTextureUpdateController.cpp:147
> +    // Update is deferred: we post a 0-delay task.

Why is the update deferred (especially the first one)?
Comment 7 David Reveman 2012-08-23 16:01:25 PDT
(In reply to comment #6)
> (From update of attachment 160264 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=160264&action=review
> 
> > Source/WebCore/platform/graphics/chromium/cc/CCTextureUpdateController.cpp:147
> > +    // Update is deferred: we post a 0-delay task.
> 
> Why is the update deferred (especially the first one)?

This is to avoid re-entering CCScheduler::processScheduledActions as a result of calling client->updateMoreTextureComplete() when there's not enough time to perform any updates.
Comment 8 James Robinson 2012-08-23 16:03:58 PDT
Comment on attachment 160264 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=160264&action=review

>>> Source/WebCore/platform/graphics/chromium/cc/CCTextureUpdateController.cpp:147
>>> +    // Update is deferred: we post a 0-delay task.
>> 
>> Why is the update deferred (especially the first one)?
> 
> This is to avoid re-entering CCScheduler::processScheduledActions as a result of calling client->updateMoreTextureComplete() when there's not enough time to perform any updates.

Is that a problem?

If it is and it is the only problem, could you defer only the updateMoreTexturesCompleted() callback? I'd prefer not making all texture uploads sit and wait for another task on the runloop. Once the scheduler decides it's go time it should be go time
Comment 9 WebKit Review Bot 2012-08-23 16:17:46 PDT
Comment on attachment 160264 [details]
Patch

Attachment 160264 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13569796

New failing tests:
CCLayerTreeHostTestAtomicCommitWithPartialUpdate.runMultiThread
Comment 10 WebKit Review Bot 2012-08-23 16:17:49 PDT
Created attachment 160275 [details]
Archive of layout-test-results from gce-cr-linux-08

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: gce-cr-linux-08  Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>  Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Comment 11 David Reveman 2012-08-23 16:29:04 PDT
(In reply to comment #8)
> (From update of attachment 160264 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=160264&action=review
> 
> >>> Source/WebCore/platform/graphics/chromium/cc/CCTextureUpdateController.cpp:147
> >>> +    // Update is deferred: we post a 0-delay task.
> >> 
> >> Why is the update deferred (especially the first one)?
> > 
> > This is to avoid re-entering CCScheduler::processScheduledActions as a result of calling client->updateMoreTextureComplete() when there's not enough time to perform any updates.
> 
> Is that a problem?
> 

Yea, the way we use m_updateMoreResourcesPending in CCScheduler doesn't support this. The updateMoreTextureComplete() call would just be ignored right now. I could move things around to allow us to re-enter processScheduledActions but it feels fragile. I'd prefer to have all updateMoreTextureComplete() callbacks be deferred.

> 
> If it is and it is the only problem, could you defer only the updateMoreTexturesCompleted() callback? I'd prefer not making all texture uploads sit and wait for another task on the runloop. Once the scheduler decides it's go time it should be go time

Ok, deferring only the updateMoreTexturesCompleted() callback sounds good.
Comment 12 David Reveman 2012-08-23 17:57:17 PDT
Created attachment 160299 [details]
Patch

Only defer updateMoreTexturesCompleted() callback
Comment 13 James Robinson 2012-08-23 18:18:06 PDT
Comment on attachment 160299 [details]
Patch

R=me
Comment 14 WebKit Review Bot 2012-08-23 19:56:04 PDT
Comment on attachment 160299 [details]
Patch

Attachment 160299 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13560940

New failing tests:
CCLayerTreeHostTestAtomicCommitWithPartialUpdate.runMultiThread
Comment 15 WebKit Review Bot 2012-08-23 19:56:07 PDT
Created attachment 160312 [details]
Archive of layout-test-results from gce-cr-linux-03

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: gce-cr-linux-03  Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>  Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Comment 16 Nat Duca 2012-08-24 11:56:38 PDT
Comment on attachment 160299 [details]
Patch

LGTM ish :) Scared at complexity but I think this is a net improvement.
Comment 17 David Reveman 2012-08-24 16:40:58 PDT
Created attachment 160520 [details]
Patch

Remove CCSchedulerClient::hasMoreResourceUpdates and updateMoreResourcesComplete -> updateResourcesComplete. CCScheduler can now keep calling updateMoreResources(timeLimit) at whatever interval it likes. CCScheduler::updateResourcesComplete() is called when no resources are left.
Comment 18 James Robinson 2012-08-27 18:49:07 PDT
Comment on attachment 160520 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=160520&action=review

Could you please explain how the two bools on CCScheduler, the CCSchedulerClient::hasMoreResourcesUpdates() call, and the stateful bits of CCTextureUpdateController are supposed to interact?  I'm trying to follow all of the state changes and it's really difficult for me to grok who is supposed to be in charge of what.  From what I've got to so far it seems like this would be simpler if the scheduler just asked its client if it had some updates to process, but given all the indirection I'm not sure if that is what is supposed to go on here.

Ideally CCSchedulerStateMachine would know about all state that's relevant to scheduling and CCScheduler would drive whatever updates needed to happen based off of that.  I think the fact that CCTextureUpdateController (and CCThreadProxy) are somewhat stateful confuse this.

> Source/WebCore/platform/graphics/chromium/cc/CCScheduler.h:95
> +    void beginFrameComplete(bool hasResourceUpdates);

I'm not sure why this is a parameter instead of asking the client hasMoreResourcesUpdates() - what's the difference between the two?

> Source/WebCore/platform/graphics/chromium/cc/CCScheduler.h:124
> +    bool m_hasMoreResourceUpdates;
>      bool m_updateMoreResourcesPending;

How do these two bits of state interact?
Comment 19 David Reveman 2012-08-27 20:10:57 PDT
(In reply to comment #18)
> (From update of attachment 160520 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=160520&action=review
> 
> Could you please explain how the two bools on CCScheduler, the CCSchedulerClient::hasMoreResourcesUpdates() call, and the stateful bits of CCTextureUpdateController are supposed to interact?  I'm trying to follow all of the state changes and it's really difficult for me to grok who is supposed to be in charge of what.  From what I've got to so far it seems like this would be simpler if the scheduler just asked its client if it had some updates to process, but given all the indirection I'm not sure if that is what is supposed to go on here.

CCScheduler::updateResourcesComplete() is called when all resource updates are done. Having both the hasMoreResourcesUpdates() client function and updateResourcesComplete() is confusing and bad. Especially when hasMoreResourcesUpdates() can return false before updateResourcesComplete() is called.

> 
> Ideally CCSchedulerStateMachine would know about all state that's relevant to scheduling and CCScheduler would drive whatever updates needed to happen based off of that.  I think the fact that CCTextureUpdateController (and CCThreadProxy) are somewhat stateful confuse this.

I initially had the scheduler and CCSchedulerStateMachine aware of all the texture upload timing but we decided to keep all timing and logic related to texture uploads in CCTextureUpdateController, which I still think is a good.

> 
> > Source/WebCore/platform/graphics/chromium/cc/CCScheduler.h:95
> > +    void beginFrameComplete(bool hasResourceUpdates);
> 
> I'm not sure why this is a parameter instead of asking the client hasMoreResourcesUpdates() - what's the difference between the two?

hasMoreResourcesUpdates() is gone. hasResourceUpdates argument passed to beginFrameComplete determines if we will receive a updateResourcesComplete() call or not.

> 
> > Source/WebCore/platform/graphics/chromium/cc/CCScheduler.h:124
> > +    bool m_hasMoreResourceUpdates;
> >      bool m_updateMoreResourcesPending;
> 
> How do these two bits of state interact?

m_hasMoreResourceUpdates is initialized using the hasResourceUpdates argument passed to beginFrameComplete and will be set to false when updateResourcesComplete() is called. This will only change once per commit.

m_updateMoreResourcesPending is used to properly interact with CCSchedulerStateMachine which can transition in and out of COMMIT_STATE_UPDATING_RESOURCES state multiple times during a commit. I think we can make some simple adjustments to CCSchedulerStateMachine that allows us to eliminate the m_updateMoreResourcesPending state from CCScheduler but I don't want to do that as part of this patch.
Comment 20 James Robinson 2012-08-28 10:07:46 PDT
Comment on attachment 160520 [details]
Patch

Hmmm, OK.  Thanks for the explanation.  I'd be interested to see the bit on the scheduler moved into the state machine
Comment 21 David Reveman 2012-08-28 10:23:16 PDT
(In reply to comment #20)
> (From update of attachment 160520 [details])
> Hmmm, OK.  Thanks for the explanation.  I'd be interested to see the bit on the scheduler moved into the state machine

I'll take care of that asap.
Comment 22 WebKit Review Bot 2012-08-28 11:20:50 PDT
Comment on attachment 160520 [details]
Patch

Rejecting attachment 160520 [details] from commit-queue.

Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2

Last 500 characters of output:
.
1 out of 5 hunks FAILED -- saving rejects to file Source/WebCore/platform/graphics/chromium/cc/CCThreadProxy.cpp.rej
patching file Source/WebCore/platform/graphics/chromium/cc/CCThreadProxy.h
patching file Source/WebKit/chromium/tests/CCSchedulerTest.cpp
patching file Source/WebKit/chromium/tests/CCTextureUpdateControllerTest.cpp

Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', u'--force', u'--reviewer', u'James Robi..." exit_code: 1 cwd: /mnt/git/webkit-commit-queue/

Full output: http://queues.webkit.org/results/13659412
Comment 23 David Reveman 2012-08-28 20:48:53 PDT
Created attachment 161123 [details]
Patch

Rebase
Comment 24 WebKit Review Bot 2012-08-28 21:24:54 PDT
Comment on attachment 161123 [details]
Patch

Clearing flags on attachment: 161123

Committed r126956: <http://trac.webkit.org/changeset/126956>
Comment 25 WebKit Review Bot 2012-08-28 21:24:59 PDT
All reviewed patches have been landed.  Closing bug.
Comment 26 James Robinson 2012-08-29 18:58:29 PDT
Reverted r126956 for reason:

Breaks several unit tests

Committed r127079: <http://trac.webkit.org/changeset/127079>
Comment 27 David Reveman 2012-09-02 21:58:09 PDT
Created attachment 161859 [details]
Patch

Don't allow continuous invalidate to cause multiple commits per redraw
Comment 28 David Reveman 2012-09-02 22:04:56 PDT
Latest patch maintains the current commits/redraw behavior. However, that behavior might be broken and this is the bug tracking that issue: https://bugs.webkit.org/show_bug.cgi?id=95661
Comment 29 James Robinson 2012-09-04 19:59:40 PDT
Comment on attachment 161859 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=161859&action=review

I feel that we should probably fix the commit+draw ordering problem first and then make sure this patch doesn't break it.

At a more basic I don't really get what the vsync tick has to do with resource updates at all.  How are the two concepts related?  My mental model was that after getting a commit with some number of resource updates on the impl thread, we have to spend some time doing uploads until we're up to the last batch at which point we complete the commit + do the last batch of updates.  None of this should have anything to do with vsync, should it?  Vsync is just about when we draw.

> Source/WebCore/platform/graphics/chromium/cc/CCScheduler.cpp:146
> +        m_stateMachine.setCanBeginFrame(true);

This looks a bit odd - what's it for?
Comment 30 David Reveman 2012-09-08 11:51:09 PDT
Created attachment 162963 [details]
Patch

Rebase
Comment 31 Build Bot 2012-09-08 13:11:04 PDT
Comment on attachment 162963 [details]
Patch

Attachment 162963 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13788678
Comment 32 David Reveman 2012-09-08 15:14:59 PDT
(In reply to comment #29)
> (From update of attachment 161859 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=161859&action=review
> 
> I feel that we should probably fix the commit+draw ordering problem first and then make sure this patch doesn't break it.

Yes, I think that was the best thing to do.

> 
> At a more basic I don't really get what the vsync tick has to do with resource updates at all.  How are the two concepts related?  My mental model was that after getting a commit with some number of resource updates on the impl thread, we have to spend some time doing uploads until we're up to the last batch at which point we complete the commit + do the last batch of updates.  None of this should have anything to do with vsync, should it?  Vsync is just about when we draw.

We currently avoid starting a new batch of uploads just before vsync as that might prevent us from drawing at the right time. That's the only reason resource updates need to know about the vsync tick.

It might be worth checking what the results are from making resource updates completely unaware of vsync tick time as that would simplify things.

> 
> > Source/WebCore/platform/graphics/chromium/cc/CCScheduler.cpp:146
> > +        m_stateMachine.setCanBeginFrame(true);
> 
> This looks a bit odd - what's it for?

That was just to maintain the old broken commit+draw behavior which has now been fixed so this is no longer in the latest patch.
Comment 33 Nat Duca 2012-09-11 14:12:46 PDT
Comment on attachment 162963 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=162963&action=review

Yeah, I think its time to get rid of the updatingResources state in the state machine. In summary:
* I like the new controller client
* lets make the make the controller *always exist* and add setQueue/discardPending methods to it
* lets not tell the compositor that beginFrameComplete until the queue is drained

> Source/WebKit/chromium/ChangeLog:3
> +        [Chromium] Scheduler will never process a commit until it receives a vsync tick.

Can you clarify this bug title?

Its ambiguous whether you're describng the problem or the fix. E.g. "fix the compositor never processesing a commit until the vsync tick" versus "make the scheduler never process a commit until a vsync tick"

> Source/WebCore/platform/graphics/chromium/cc/CCThreadProxy.cpp:-329
> -    m_currentTextureUpdateControllerOnImplThread.clear();

why do you let the object live? you  seem to still be recreating it every time you beginFrameComplete

> Source/WebCore/platform/graphics/chromium/cc/CCThreadProxy.cpp:616
> +        m_currentTextureUpdateControllerOnImplThread = CCTextureUpdateController::create(this, CCProxy::implThread(), queue, m_layerTreeHostImpl->resourceProvider(), m_layerTreeHostImpl->renderer()->textureCopier(), m_layerTreeHostImpl->renderer()->textureUploader());

can we just have the controller always be alive, and then give it new queues or something? this "its sometimes recreated and sometimes just lying around" is super confusing

> Source/WebCore/platform/graphics/chromium/cc/CCThreadProxy.cpp:619
> +    m_schedulerOnImplThread->beginFrameComplete(hasResourceUpdates);

I'm wondering, would this patch be clearer if we just bite the bullet and lie to the scheduler about beginFrameComplete not happening UNTIL the resource updates are done?
Comment 34 James Robinson 2012-09-11 14:56:15 PDT
Comment on attachment 162963 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=162963&action=review

> Source/WebKit/chromium/tests/CCTextureUpdateControllerTest.cpp:579
> +static void runPendingTasks(FakeCCThread* thread, FakeCCTextureUpdateController* controller)

it's somewhat terrifying that you have to spin the event loop for this unit test. why? we don't support nesting of WebThread loops for a reason
Comment 35 David Reveman 2012-09-11 15:59:38 PDT
(In reply to comment #33)
> (From update of attachment 162963 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=162963&action=review
> 
> Yeah, I think its time to get rid of the updatingResources state in the state machine. In summary:
> * I like the new controller client
> * lets make the make the controller *always exist* and add setQueue/discardPending methods to it

what are the benefits of doing this? one of the initial reasons for having a limited lifetime for the controller was that it needs to hold pointers to TextureCopier/TextureUploader which will be invalid if the graphics context is recreated.

> * lets not tell the compositor that beginFrameComplete until the queue is drained

the scheduler currently drives texture uploads based on vsync tick. how do we avoid starting a large batch of texture uploads just before a vsync tick if we make the scheduler completely unaware of resource updates? adding a CCTextureUpdateControllerClient::nextVSyncTickTimeIfActivated function would solve part of the problem but we also need to make the controller start more uploads as soon as we've scheduled a draw.

> 
> > Source/WebKit/chromium/ChangeLog:3
> > +        [Chromium] Scheduler will never process a commit until it receives a vsync tick.
> 
> Can you clarify this bug title?
> 
> Its ambiguous whether you're describng the problem or the fix. E.g. "fix the compositor never processesing a commit until the vsync tick" versus "make the scheduler never process a commit until a vsync tick"

oh, I try to always describe a problem rather than a fix in the bug title. I'll change the title to make this clear.

> 
> > Source/WebCore/platform/graphics/chromium/cc/CCThreadProxy.cpp:-329
> > -    m_currentTextureUpdateControllerOnImplThread.clear();
> 
> why do you let the object live? you  seem to still be recreating it every time you beginFrameComplete

This is in CCThreadProxy::didLoseContextOnImplThread(). We're still always clearing the controller in CCThreadProxy::scheduledActionCommit(). Not sure if it's possible that didLoseContextOnImplThread() can be called as a result of a lost context callback triggered by performing texture updates. If that's possible, destroying the controller here would be really bad.

> 
> > Source/WebCore/platform/graphics/chromium/cc/CCThreadProxy.cpp:616
> > +        m_currentTextureUpdateControllerOnImplThread = CCTextureUpdateController::create(this, CCProxy::implThread(), queue, m_layerTreeHostImpl->resourceProvider(), m_layerTreeHostImpl->renderer()->textureCopier(), m_layerTreeHostImpl->renderer()->textureUploader());
> 
> can we just have the controller always be alive, and then give it new queues or something? this "its sometimes recreated and sometimes just lying around" is super confusing

we're creating the controller in CCThreadProxy::beginFrameCompleteOnImplThread if we have resource updates. the controller is ALWAYS destroyed in CCThreadProxy::scheduledActionCommit().

> 
> > Source/WebCore/platform/graphics/chromium/cc/CCThreadProxy.cpp:619
> > +    m_schedulerOnImplThread->beginFrameComplete(hasResourceUpdates);
> 
> I'm wondering, would this patch be clearer if we just bite the bullet and lie to the scheduler about beginFrameComplete not happening UNTIL the resource updates are done?

It might be a good idea to eventually move resource updates completely out of the scheduler but I think that's a very different patch that might not be as simple as it seems. I'd rather land this and continue to clean up the texture update system one small step at a time. The end might be that resource updates are completely detached from the scheduler but I'm worried about taking short cuts to get there.
Comment 36 James Robinson 2012-09-11 16:06:32 PDT
While I like incremental approaches generally, one problem with this code is that it's now so complicated and has so many bugs that it's really hard to tell if small changes in any particular direction are getting us somewhere or not.  We need to dramatically simplify this logic ASAPly.
Comment 37 James Robinson 2012-09-11 16:18:09 PDT
(In reply to comment #35)
> (In reply to comment #33)
> > (From update of attachment 162963 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=162963&action=review
> > 
> > Yeah, I think its time to get rid of the updatingResources state in the state machine. In summary:
> > * I like the new controller client
> > * lets make the make the controller *always exist* and add setQueue/discardPending methods to it
> 
> what are the benefits of doing this? one of the initial reasons for having a limited lifetime for the controller was that it needs to hold pointers to TextureCopier/TextureUploader which will be invalid if the graphics context is recreated.

You're deleting the controller on lost context explicitly, so how is that a problem?

> 
> > * lets not tell the compositor that beginFrameComplete until the queue is drained
> 
> the scheduler currently drives texture uploads based on vsync tick. how do we avoid starting a large batch of texture uploads just before a vsync tick if we make the scheduler completely unaware of resource updates? adding a CCTextureUpdateControllerClient::nextVSyncTickTimeIfActivated function would solve part of the problem but we also need to make the controller start more uploads as soon as we've scheduled a draw.

Texture uploads really shouldn't be driven off of vsync at all.  If the uploader is only putting sustainable batches of uploads in per "tick", then it shouldn't really matter when it does that relative to the vsync, should it?
Comment 38 David Reveman 2012-09-11 16:42:40 PDT
(In reply to comment #34)
> (From update of attachment 162963 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=162963&action=review
> 
> > Source/WebKit/chromium/tests/CCTextureUpdateControllerTest.cpp:579
> > +static void runPendingTasks(FakeCCThread* thread, FakeCCTextureUpdateController* controller)
> 
> it's somewhat terrifying that you have to spin the event loop for this unit test. why? we don't support nesting of WebThread loops for a reason

i did it this way to reduce the amount of code a bit and test the feature and the interface rather than the specific details of the implementation. I can remove the spin and still get the code reduction part. hard not to test the details of the implementation if I can't spin the event loop.
Comment 39 David Reveman 2012-09-11 17:06:33 PDT
Created attachment 163484 [details]
Patch

Avoid spinning the event loop in unit tests
Comment 40 David Reveman 2012-09-11 17:08:52 PDT
(In reply to comment #36)
> While I like incremental approaches generally, one problem with this code is that it's now so complicated and has so many bugs that it's really hard to tell if small changes in any particular direction are getting us somewhere or not.  We need to dramatically simplify this logic ASAPly.

I agree that it needs to be simplified asap. I think this patch together with https://bugs.webkit.org/show_bug.cgi?id=95290 makes some good progress.

Some of the bugs we're seeing are connected to the fact that resource updates were heavily controlled by the scheduler and we're removing some of that control which makes the scheduler behave differently. I'm not convinced that quickly removing all ties to resource updates from the scheduler is a better approach.
Comment 41 David Reveman 2012-09-11 17:42:56 PDT
(In reply to comment #37)
> (In reply to comment #35)
> > (In reply to comment #33)
> > > (From update of attachment 162963 [details] [details] [details])
> > > View in context: https://bugs.webkit.org/attachment.cgi?id=162963&action=review
> > > 
> > > Yeah, I think its time to get rid of the updatingResources state in the state machine. In summary:
> > > * I like the new controller client
> > > * lets make the make the controller *always exist* and add setQueue/discardPending methods to it
> > 
> > what are the benefits of doing this? one of the initial reasons for having a limited lifetime for the controller was that it needs to hold pointers to TextureCopier/TextureUploader which will be invalid if the graphics context is recreated.
> 
> You're deleting the controller on lost context explicitly, so how is that a problem?

It's not a problem. I just think that not having any instances that need to be recreated if the context is lost is better than having some. If there are benefits of keeping the controller alive than I'm not against doing so.

> 
> > 
> > > * lets not tell the compositor that beginFrameComplete until the queue is drained
> > 
> > the scheduler currently drives texture uploads based on vsync tick. how do we avoid starting a large batch of texture uploads just before a vsync tick if we make the scheduler completely unaware of resource updates? adding a CCTextureUpdateControllerClient::nextVSyncTickTimeIfActivated function would solve part of the problem but we also need to make the controller start more uploads as soon as we've scheduled a draw.
> 
> Texture uploads really shouldn't be driven off of vsync at all.  If the uploader is only putting sustainable batches of uploads in per "tick", then it shouldn't really matter when it does that relative to the vsync, should it?

I agree that it shouldn't be driven off of vsync like it is today but I think we still want these two things out of the texture update controller that relates to vsync ticks:

1. only perform a sustainable amount of uploads per tick just as you mentioned
2. start issuing more uploads as soon as we've passed the time of a possible vsync tick

does that make sense?
Comment 42 Nat Duca 2012-09-11 19:22:37 PDT
(In reply to comment #35)
> > * lets make the make the controller *always exist* and add setQueue/discardPending methods to it
> 
> what are the benefits of doing this? one of the initial reasons for having a limited lifetime for the controller was that it needs to hold pointers to TextureCopier/TextureUploader which will be invalid if the graphics context is recreated.

You added a discard method to the updater. And you pulled out the queue anyway. You have very little less that is specific to the instance. That is cleaner as a create-once idiom.

> > * lets not tell the compositor that beginFrameComplete until the queue is drained
> 
> the scheduler currently drives texture uploads based on vsync tick. how do we avoid starting a large batch of texture uploads just before a vsync tick if we make the scheduler completely unaware of resource updates?

Thats a fine solution.

Let the controller get at the update time via its client, and tell the controller that we just drew in scheduledActionDrawAndBlah()

> > why do you let the object live? you  seem to still be recreating it every time you beginFrameComplete
> 
> This is in CCThreadProxy::didLoseContextOnImplThread(). We're still always clearing the controller in CCThreadProxy::scheduledActionCommit(). Not sure if it's possible that didLoseContextOnImplThread() can be called as a result of a lost context callback triggered by performing texture updates. If that's possible, destroying the controller here would be really bad.

This underscores the value in having the controller always exist and just have mutators on it.

> we're creating the controller in CCThreadProxy::beginFrameCompleteOnImplThread if we have resource updates. the controller is ALWAYS destroyed in CCThreadProxy::scheduledActionCommit().


If you persist it, then you dont need to worry about the lifetime. You just set the queue in beginFrameComplete, and m_Scheduler->beginFrameComplete when the queue is empty. It becomes really simple and you never have to think about it being deleted or not.

> It might be a good idea to eventually move resource updates completely out of the scheduler but I think that's a very different patch that might not be as simple as it seems.

Why not? I really think its only a few minor edits from what you have? You could leave the updatingResources state in the ccschedulerstatemachine and just never tell it that hasMoreUPdates is false, for this patch!
Comment 43 David Reveman 2012-09-11 21:43:18 PDT
(In reply to comment #42)
> (In reply to comment #35)
> > > * lets make the make the controller *always exist* and add setQueue/discardPending methods to it
> > 
> > what are the benefits of doing this? one of the initial reasons for having a limited lifetime for the controller was that it needs to hold pointers to TextureCopier/TextureUploader which will be invalid if the graphics context is recreated.
> 
> You added a discard method to the updater. And you pulled out the queue anyway. You have very little less that is specific to the instance. That is cleaner as a create-once idiom.

I'd like to eventually remove the discard method. This was to avoid an issue with query objects not behaving correctly after lost context. It has been fixed since and it seems inconsistent to have special treatment of queries. However, I don't want include that change in this patch.

> 
> > > * lets not tell the compositor that beginFrameComplete until the queue is drained
> > 
> > the scheduler currently drives texture uploads based on vsync tick. how do we avoid starting a large batch of texture uploads just before a vsync tick if we make the scheduler completely unaware of resource updates?
> 
> Thats a fine solution.
> 
> Let the controller get at the update time via its client, and tell the controller that we just drew in scheduledActionDrawAndBlah()

What if we don't need to draw?

> 
> > > why do you let the object live? you  seem to still be recreating it every time you beginFrameComplete
> > 
> > This is in CCThreadProxy::didLoseContextOnImplThread(). We're still always clearing the controller in CCThreadProxy::scheduledActionCommit(). Not sure if it's possible that didLoseContextOnImplThread() can be called as a result of a lost context callback triggered by performing texture updates. If that's possible, destroying the controller here would be really bad.
> 
> This underscores the value in having the controller always exist and just have mutators on it.

I'm not sure this is worth much as I think discardUploads should be removed.

> 
> > we're creating the controller in CCThreadProxy::beginFrameCompleteOnImplThread if we have resource updates. the controller is ALWAYS destroyed in CCThreadProxy::scheduledActionCommit().
> 
> 
> If you persist it, then you dont need to worry about the lifetime. You just set the queue in beginFrameComplete, and m_Scheduler->beginFrameComplete when the queue is empty. It becomes really simple and you never have to think about it being deleted or not.

Except when the context is recreated, or how about shutdown? It might currently be safe to let the controller be destroyed when the proxy is destroyed but that's really a detail of the controller implementation and it would be inappropriate to assume it.

We already have state that has "beginFrameComplete to commit" lifetime. Including one more variable with this lifetime is really as simple as it gets.

> 
> > It might be a good idea to eventually move resource updates completely out of the scheduler but I think that's a very different patch that might not be as simple as it seems.
> 
> Why not? I really think its only a few minor edits from what you have? You could leave the updatingResources state in the ccschedulerstatemachine and just never tell it that hasMoreUPdates is false, for this patch!

Still not sure how to properly tell the controller that we've passed the time of a possible vsync tick if we don't turn the frame rate controller on.

Heh, don't try to trick me into thinking that I could get away with landing a change that removes resource updates from the scheduler but leaves all the unit tests and current scheduler and schedulerstatemachine logic in place. I'm not falling for that one :)
Comment 44 Nat Duca 2012-09-11 22:19:08 PDT
Hmm, you're right, of course. :) Your counterpoints are well made.

I would support landing this if we can resolve the scariness that jamesr saw in the unit test.
Comment 45 James Robinson 2012-09-11 22:42:29 PDT
I'm not sure I really understand why the texture upload logic needs to try to be careful around vsyncs, to be honest.  The idea of incremental texture uploading is that no single batch is heavy enough on the GPU side to cause us to miss an entire frame.  It should be perfectly fine to start a new batch right before a vsync if we know the previous upload has finished GPU-side.  It might delay the upcoming draw, but if the batch size is correct it won't make us miss an entire frame (and if the batch is so big to do that it doesn't matter when we do it).  If we do an upload then a draw and have a set of additional uploads to do the queued up GPU work will cause us to hold off on uploads in the next vsync interval since our queries won't be cleared yet but that's A-OK.  What am I missing?
Comment 46 Nat Duca 2012-09-11 23:00:22 PDT
Thats a good point, too. We're left-handed relative to vsync, meaning we actually draw at the start of the frame, rather than the end. That means that being late on the compositor-thread draw will actually not disrupt rendering.

The downside of this is that a lot of our perf instrumentation would read this as a performance regression. The fps meter, the beginFrame timestamp we pass to rAF, and the scrolling benchmark test all use the start time of the draw.

This latter stuff we could regress, I suppose, and start fixing later. But, OTOH, I also want to rewrite a large bit of the scheduler to rAF on vsync tick and then draw whenever bFAC+uploads complete. That'd change this bookeeping also, so I'm somewhat skeptical of doing too much in here when we know I'm going to overhaul it soon anyway (or annoy someone else until they do it)...

Net/net, I see good points on all sides and Ima stumped. Thanks @reveman for setting up a chat tomorrow morning. I think it'll help.
Comment 47 James Robinson 2012-09-12 17:09:27 PDT
Comment on attachment 163484 [details]
Patch

OK, well this is definitely an improvement and it sounds like we have a rough idea of the direction forward from here. R=me
Comment 48 Nat Duca 2012-09-12 17:14:22 PDT
Are we landing this here or cc-side?
Comment 49 James Robinson 2012-09-12 17:26:46 PDT
Whatever's easier for you, David. If you want to land here I can merge it down.
Comment 50 David Reveman 2012-09-13 08:18:15 PDT
Lets land on the cc side. Just need to land https://codereview.chromium.org/10914236/ first.
Comment 51 David Reveman 2012-09-14 15:17:53 PDT
Committed as: http://src.chromium.org/viewvc/chrome?view=rev&revision=156669