Bug 227059

Summary: rAF driven WebGL submits excessive amount of GPU work when frames are slow
Product: WebKit Reporter: Kimmo Kinnunen <kkinnunen>
Component: WebGLAssignee: Dean Jackson <dino>
Status: RESOLVED FIXED    
Severity: Normal CC: annulen, dino, ews-watchlist, gyuyoung.kim, jonlee, kbr, kkinnunen, kondapallykalyan, kpiddington, mmaxfield, nham, ryuan.choi, sabouhallawa, sergio, simon.fraser, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 227024    
Bug Blocks: 226841, 228012    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Debug via EWS
none
Debug via EWS
none
Debug via EWS force gl
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch for landing
none
Patch for landing none

Description Kimmo Kinnunen 2021-06-16 01:39:29 PDT
rAF driven WebGL submits excessive amount of GPU work when frames are slow

WebGL frame submit limit is defined by rAF.

When the frames are slow, this may end up submitting more GPU work than the system can handle.

The frame submission should be limited also by the amount of unfinished frames.
Comment 1 Radar WebKit Bug Importer 2021-06-16 01:41:06 PDT
<rdar://problem/79385858>
Comment 2 Kimmo Kinnunen 2021-06-16 02:17:38 PDT
Created attachment 431528 [details]
Patch
Comment 3 Kimmo Kinnunen 2021-06-16 03:43:31 PDT
Created attachment 431532 [details]
Patch
Comment 4 Kimmo Kinnunen 2021-06-16 05:01:37 PDT
Created attachment 431537 [details]
Patch
Comment 5 EWS Watchlist 2021-06-16 05:02:31 PDT
Note that there are important steps to take when updating ANGLE. See https://trac.webkit.org/wiki/UpdatingANGLE
Comment 6 Kenneth Russell 2021-06-16 22:31:44 PDT
Comment on attachment 431537 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431537&action=review

Were the most recent test failures here caused by Bug 227064?

> Source/WebCore/platform/graphics/angle/GraphicsContextGLANGLE.cpp:2851
> +    if (ScopedGLFence fence = WTFMove(m_frameCompletionFences[oldestFrameCompletionFence])) {

It's hard for me to trace the exact logic this uses. WTFMove calls the constructor taking ScopedGLFence&& for the left-hand side and uses std::exchange? And the test uses operator GLSync()? Would you consider defining operator bool() for ScopedGLFence?
Comment 7 Kimmo Kinnunen 2021-06-17 02:12:03 PDT
(In reply to Kenneth Russell from comment #6)
> Comment on attachment 431537 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=431537&action=review
> 
> Were the most recent test failures here caused by Bug 227064?

Unfortunately the iOS legacy test failure is surprising, I need to investigate that.

> > Source/WebCore/platform/graphics/angle/GraphicsContextGLANGLE.cpp:2851
> > +    if (ScopedGLFence fence = WTFMove(m_frameCompletionFences[oldestFrameCompletionFence])) {
> 
> It's hard for me to trace the exact logic this uses. WTFMove calls the
> constructor taking ScopedGLFence&& for the left-hand side and uses
> std::exchange? And the test uses operator GLSync()? 

Yeah, correct in spirit. 
As far as I understand:
1. WTFMove casts the lvalue ref m_frameCompletionFences[oldestFrameCompletionFence] to rvalue ref.
2. ScopedGLFence fence is created with default constructor
3. Assignment operator assigns ScopeGLFence&& over the default-constructed fence and this std::exchanges the contents.



> Would you consider
> defining operator bool() for ScopedGLFence?

Sure.
Comment 8 Kimmo Kinnunen 2021-06-17 07:01:00 PDT
Created attachment 431661 [details]
Debug via EWS
Comment 9 Kimmo Kinnunen 2021-06-17 12:07:45 PDT
Created attachment 431697 [details]
Debug via EWS
Comment 10 Kimmo Kinnunen 2021-06-17 12:12:40 PDT
Created attachment 431699 [details]
Debug via EWS force gl
Comment 11 Kimmo Kinnunen 2021-06-21 03:05:53 PDT
Created attachment 431842 [details]
Patch
Comment 12 Kimmo Kinnunen 2021-06-21 10:58:25 PDT
This would be ready for review.
Using the sync objects on OpenGL part appears a bit risky since the Catalina iOS-bots seemed to fail with the symptom that GL fences timeout with success. Probably that means we cannot use the fences in the implementation when using OpenGL.
Comment 13 Kenneth Russell 2021-06-21 15:54:55 PDT
Comment on attachment 431842 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431842&action=review

Looks great - unfortunate you had to debug the OpenGL driver issue but let's move forward. r+

> Source/WebCore/platform/graphics/angle/GraphicsContextGLANGLE.cpp:65
> +static Seconds maxFrameDuration = 5_s;

Can/should this be const, too?
Comment 14 EWS 2021-06-23 10:08:58 PDT
Committed r279172 (239069@main): <https://commits.webkit.org/239069@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 431842 [details].
Comment 15 Ben Nham 2021-07-01 17:47:44 PDT
Reopening to attach new patch.
Comment 16 Ben Nham 2021-07-01 17:47:46 PDT
Created attachment 432757 [details]
Patch
Comment 17 Ben Nham 2021-07-01 17:50:10 PDT
Created attachment 432758 [details]
Patch
Comment 18 Kenneth Russell 2021-07-01 17:52:25 PDT
The timing of the revert is unfortunate - can any more details be provided about where the failures are happening? The original patch author will be OOO for a while; is there any hope of getting this fixed and relanded in the short term?
Comment 19 Ben Nham 2021-07-01 17:57:26 PDT
It looks like some devices don't actually support GL_ARB_sync which causes crashes in context construction:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   com.apple.WebCore             	0x00007fff3a07a21c WebCore::GraphicsContextGLOpenGL::reshapeDisplayBufferBacking() + 188
1   com.apple.WebCore             	0x00007fff396892d3 WebCore::GraphicsContextGLOpenGL::reshapeFBOs(WebCore::IntSize const&) + 451
2   com.apple.WebCore             	0x00007fff3968b353 WebCore::GraphicsContextGLOpenGL::reshape(int, int) + 547
3   com.apple.WebCore             	0x00007fff3ab6c384 WebCore::WebGLRenderingContextBase::initializeNewContext() + 2324
4   com.apple.WebCore             	0x00007fff3ab776d6 WebCore::WebGLRenderingContextBase::create(WebCore::CanvasBase&, WebCore::GraphicsContextGLAttributes&, WebCore::GraphicsContextGLWebGLVersion) + 2934
5   com.apple.WebCore             	0x00007fff3aa283aa WebCore::HTMLCanvasElement::createContextWebGL(WebCore::GraphicsContextGLWebGLVersion, WebCore::GraphicsContextGLAttributes&&) + 330
6   com.apple.WebCore             	0x00007fff3aa27ea5 WebCore::HTMLCanvasElement::getContext(JSC::JSGlobalObject&, WTF::String const&, WTF::Vector<JSC::Strong<JSC::Unknown, (JSC::ShouldStrongDestructorGrabLock)0>, 0ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc>&&) + 1013
7   com.apple.WebCore             	0x00007fff39a6f80d WebCore::jsHTMLCanvasElementPrototypeFunction_getContext(JSC::JSGlobalObject*, JSC::CallFrame*) + 429
8   ???                           	0x00004a2582e011d8 0 + 81524969968088
9   com.apple.JavaScriptCore      	0x00007fff367f7157 llint_entry + 110528

The crashes are blocking several of our perf bots. The safest thing for now is probably to revert the patch to unblock our bots until Kimmo gets back to this.
Comment 20 Kenneth Russell 2021-07-01 18:27:56 PDT
Comment on attachment 432758 [details]
Patch

The GL_ARB_sync extension is emulated in ANGLE's Metal backend. Can you tell us what devices are affected, or what GPU families they have, so we can understand better why this emulation is failing?
Comment 21 Kyle Piddington 2021-07-01 18:32:52 PDT
(In reply to Kenneth Russell from comment #20)
> Comment on attachment 432758 [details]
> Patch
> 
> The GL_ARB_sync extension is emulated in ANGLE's Metal backend. Can you tell
> us what devices are affected, or what GPU families they have, so we can
> understand better why this emulation is failing?

From my understanding, the issue lies mostly on Nvidia Machines. There's a comment for events:
    // http://crbug.com/1136673
    // Fence sync is flaky on Nvidia
    ANGLE_FEATURE_CONDITION((&mFeatures), hasEvents, isMetal2_1 && !isNVIDIA());

as events drive ARB_Sync, this isn't too unexpected. 

  if (mFeatures.hasEvents.enabled)
    {
        // MTLSharedEvent is only available since Metal 2.1
        outExtensions->fenceSync = true;
        outExtensions->waitSync  = true;
    }

We might be able to emulate these to not use GPUDriven events, but rather split the command buffer on sync points for systems that don't support events.
Comment 22 Ben Nham 2021-07-01 18:34:04 PDT
Over to Dean.
Comment 23 Kenneth Russell 2021-07-01 18:45:54 PDT
(In reply to Kyle Piddington from comment #21)
> We might be able to emulate these to not use GPUDriven events, but rather
> split the command buffer on sync points for systems that don't support
> events.

Great analysis Kyle - this sounds good. These don't have to be extremely accurate. Would it be feasible for you to try that and test it on your system by force-enabling that code path?
Comment 24 EWS 2021-07-01 23:01:59 PDT
Committed r279497 (239349@main): <https://commits.webkit.org/239349@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 432758 [details].
Comment 25 Ben Nham 2021-07-01 23:03:17 PDT
Reopening for the actual fix.
Comment 26 Kyle Piddington 2021-07-09 19:00:14 PDT
Created attachment 433251 [details]
Patch
Comment 27 Radar WebKit Bug Importer 2021-07-12 13:43:17 PDT
<rdar://problem/80479483>
Comment 28 Kenneth Russell 2021-07-12 14:02:21 PDT
Comment on attachment 433251 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=433251&action=review

> Source/WebCore/ChangeLog:6
> +        Reviewed by NOBODY (OOPS!).

Can more description be copied from the reverted patch's ChangeLog, and also a description added of what changed in this version of the patch to fix the previous breakage?
Comment 29 Kenneth Russell 2021-07-12 14:02:46 PDT
Comment on attachment 433251 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=433251&action=review

> Source/WebCore/ChangeLog:8
> +        No new tests (OOPS!).

Note also: this line will have to be removed for the CQ to land this.
Comment 30 Kyle Piddington 2021-07-12 18:26:08 PDT
Created attachment 433383 [details]
Patch for landing
Comment 31 EWS 2021-07-12 19:24:58 PDT
commit-queue failed to commit attachment 433383 [details] to WebKit repository. To retry, please set cq+ flag again.
Comment 32 Kyle Piddington 2021-07-13 11:50:51 PDT
Created attachment 433437 [details]
Patch for landing
Comment 33 EWS 2021-07-13 12:28:26 PDT
Committed r279883 (239635@main): <https://commits.webkit.org/239635@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 433437 [details].