RESOLVED FIXED 227059
rAF driven WebGL submits excessive amount of GPU work when frames are slow
https://bugs.webkit.org/show_bug.cgi?id=227059
Summary rAF driven WebGL submits excessive amount of GPU work when frames are slow
Kimmo Kinnunen
Reported 2021-06-16 01:39:29 PDT
rAF driven WebGL submits excessive amount of GPU work when frames are slow WebGL frame submit limit is defined by rAF. When the frames are slow, this may end up submitting more GPU work than the system can handle. The frame submission should be limited also by the amount of unfinished frames.
Attachments
Patch (15.43 KB, patch)
2021-06-16 02:17 PDT, Kimmo Kinnunen
no flags
Patch (15.43 KB, patch)
2021-06-16 03:43 PDT, Kimmo Kinnunen
no flags
Patch (14.44 KB, patch)
2021-06-16 05:01 PDT, Kimmo Kinnunen
no flags
Debug via EWS (15.43 KB, patch)
2021-06-17 07:01 PDT, Kimmo Kinnunen
no flags
Debug via EWS (17.25 KB, patch)
2021-06-17 12:07 PDT, Kimmo Kinnunen
no flags
Debug via EWS force gl (17.53 KB, patch)
2021-06-17 12:12 PDT, Kimmo Kinnunen
no flags
Patch (16.67 KB, patch)
2021-06-21 03:05 PDT, Kimmo Kinnunen
no flags
Patch (15.61 KB, patch)
2021-07-01 17:47 PDT, Ben Nham
no flags
Patch (15.61 KB, patch)
2021-07-01 17:50 PDT, Ben Nham
no flags
Patch (18.07 KB, patch)
2021-07-09 19:00 PDT, Kyle Piddington
no flags
Patch for landing (18.91 KB, patch)
2021-07-12 18:26 PDT, Kyle Piddington
no flags
Patch for landing (18.91 KB, patch)
2021-07-13 11:50 PDT, Kyle Piddington
no flags
Radar WebKit Bug Importer
Comment 1 2021-06-16 01:41:06 PDT
Kimmo Kinnunen
Comment 2 2021-06-16 02:17:38 PDT
Kimmo Kinnunen
Comment 3 2021-06-16 03:43:31 PDT
Kimmo Kinnunen
Comment 4 2021-06-16 05:01:37 PDT
EWS Watchlist
Comment 5 2021-06-16 05:02:31 PDT
Note that there are important steps to take when updating ANGLE. See https://trac.webkit.org/wiki/UpdatingANGLE
Kenneth Russell
Comment 6 2021-06-16 22:31:44 PDT
Comment on attachment 431537 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431537&action=review Were the most recent test failures here caused by Bug 227064? > Source/WebCore/platform/graphics/angle/GraphicsContextGLANGLE.cpp:2851 > + if (ScopedGLFence fence = WTFMove(m_frameCompletionFences[oldestFrameCompletionFence])) { It's hard for me to trace the exact logic this uses. WTFMove calls the constructor taking ScopedGLFence&& for the left-hand side and uses std::exchange? And the test uses operator GLSync()? Would you consider defining operator bool() for ScopedGLFence?
Kimmo Kinnunen
Comment 7 2021-06-17 02:12:03 PDT
(In reply to Kenneth Russell from comment #6) > Comment on attachment 431537 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=431537&action=review > > Were the most recent test failures here caused by Bug 227064? Unfortunately the iOS legacy test failure is surprising, I need to investigate that. > > Source/WebCore/platform/graphics/angle/GraphicsContextGLANGLE.cpp:2851 > > + if (ScopedGLFence fence = WTFMove(m_frameCompletionFences[oldestFrameCompletionFence])) { > > It's hard for me to trace the exact logic this uses. WTFMove calls the > constructor taking ScopedGLFence&& for the left-hand side and uses > std::exchange? And the test uses operator GLSync()? Yeah, correct in spirit. As far as I understand: 1. WTFMove casts the lvalue ref m_frameCompletionFences[oldestFrameCompletionFence] to rvalue ref. 2. ScopedGLFence fence is created with default constructor 3. Assignment operator assigns ScopeGLFence&& over the default-constructed fence and this std::exchanges the contents. > Would you consider > defining operator bool() for ScopedGLFence? Sure.
Kimmo Kinnunen
Comment 8 2021-06-17 07:01:00 PDT
Created attachment 431661 [details] Debug via EWS
Kimmo Kinnunen
Comment 9 2021-06-17 12:07:45 PDT
Created attachment 431697 [details] Debug via EWS
Kimmo Kinnunen
Comment 10 2021-06-17 12:12:40 PDT
Created attachment 431699 [details] Debug via EWS force gl
Kimmo Kinnunen
Comment 11 2021-06-21 03:05:53 PDT
Kimmo Kinnunen
Comment 12 2021-06-21 10:58:25 PDT
This would be ready for review. Using the sync objects on OpenGL part appears a bit risky since the Catalina iOS-bots seemed to fail with the symptom that GL fences timeout with success. Probably that means we cannot use the fences in the implementation when using OpenGL.
Kenneth Russell
Comment 13 2021-06-21 15:54:55 PDT
Comment on attachment 431842 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431842&action=review Looks great - unfortunate you had to debug the OpenGL driver issue but let's move forward. r+ > Source/WebCore/platform/graphics/angle/GraphicsContextGLANGLE.cpp:65 > +static Seconds maxFrameDuration = 5_s; Can/should this be const, too?
EWS
Comment 14 2021-06-23 10:08:58 PDT
Committed r279172 (239069@main): <https://commits.webkit.org/239069@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 431842 [details].
Ben Nham
Comment 15 2021-07-01 17:47:44 PDT
Reopening to attach new patch.
Ben Nham
Comment 16 2021-07-01 17:47:46 PDT
Ben Nham
Comment 17 2021-07-01 17:50:10 PDT
Kenneth Russell
Comment 18 2021-07-01 17:52:25 PDT
The timing of the revert is unfortunate - can any more details be provided about where the failures are happening? The original patch author will be OOO for a while; is there any hope of getting this fixed and relanded in the short term?
Ben Nham
Comment 19 2021-07-01 17:57:26 PDT
It looks like some devices don't actually support GL_ARB_sync which causes crashes in context construction: Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 com.apple.WebCore 0x00007fff3a07a21c WebCore::GraphicsContextGLOpenGL::reshapeDisplayBufferBacking() + 188 1 com.apple.WebCore 0x00007fff396892d3 WebCore::GraphicsContextGLOpenGL::reshapeFBOs(WebCore::IntSize const&) + 451 2 com.apple.WebCore 0x00007fff3968b353 WebCore::GraphicsContextGLOpenGL::reshape(int, int) + 547 3 com.apple.WebCore 0x00007fff3ab6c384 WebCore::WebGLRenderingContextBase::initializeNewContext() + 2324 4 com.apple.WebCore 0x00007fff3ab776d6 WebCore::WebGLRenderingContextBase::create(WebCore::CanvasBase&, WebCore::GraphicsContextGLAttributes&, WebCore::GraphicsContextGLWebGLVersion) + 2934 5 com.apple.WebCore 0x00007fff3aa283aa WebCore::HTMLCanvasElement::createContextWebGL(WebCore::GraphicsContextGLWebGLVersion, WebCore::GraphicsContextGLAttributes&&) + 330 6 com.apple.WebCore 0x00007fff3aa27ea5 WebCore::HTMLCanvasElement::getContext(JSC::JSGlobalObject&, WTF::String const&, WTF::Vector<JSC::Strong<JSC::Unknown, (JSC::ShouldStrongDestructorGrabLock)0>, 0ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc>&&) + 1013 7 com.apple.WebCore 0x00007fff39a6f80d WebCore::jsHTMLCanvasElementPrototypeFunction_getContext(JSC::JSGlobalObject*, JSC::CallFrame*) + 429 8 ??? 0x00004a2582e011d8 0 + 81524969968088 9 com.apple.JavaScriptCore 0x00007fff367f7157 llint_entry + 110528 The crashes are blocking several of our perf bots. The safest thing for now is probably to revert the patch to unblock our bots until Kimmo gets back to this.
Kenneth Russell
Comment 20 2021-07-01 18:27:56 PDT
Comment on attachment 432758 [details] Patch The GL_ARB_sync extension is emulated in ANGLE's Metal backend. Can you tell us what devices are affected, or what GPU families they have, so we can understand better why this emulation is failing?
Kyle Piddington
Comment 21 2021-07-01 18:32:52 PDT
(In reply to Kenneth Russell from comment #20) > Comment on attachment 432758 [details] > Patch > > The GL_ARB_sync extension is emulated in ANGLE's Metal backend. Can you tell > us what devices are affected, or what GPU families they have, so we can > understand better why this emulation is failing? From my understanding, the issue lies mostly on Nvidia Machines. There's a comment for events: // http://crbug.com/1136673 // Fence sync is flaky on Nvidia ANGLE_FEATURE_CONDITION((&mFeatures), hasEvents, isMetal2_1 && !isNVIDIA()); as events drive ARB_Sync, this isn't too unexpected. if (mFeatures.hasEvents.enabled) { // MTLSharedEvent is only available since Metal 2.1 outExtensions->fenceSync = true; outExtensions->waitSync = true; } We might be able to emulate these to not use GPUDriven events, but rather split the command buffer on sync points for systems that don't support events.
Ben Nham
Comment 22 2021-07-01 18:34:04 PDT
Over to Dean.
Kenneth Russell
Comment 23 2021-07-01 18:45:54 PDT
(In reply to Kyle Piddington from comment #21) > We might be able to emulate these to not use GPUDriven events, but rather > split the command buffer on sync points for systems that don't support > events. Great analysis Kyle - this sounds good. These don't have to be extremely accurate. Would it be feasible for you to try that and test it on your system by force-enabling that code path?
EWS
Comment 24 2021-07-01 23:01:59 PDT
Committed r279497 (239349@main): <https://commits.webkit.org/239349@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 432758 [details].
Ben Nham
Comment 25 2021-07-01 23:03:17 PDT
Reopening for the actual fix.
Kyle Piddington
Comment 26 2021-07-09 19:00:14 PDT
Radar WebKit Bug Importer
Comment 27 2021-07-12 13:43:17 PDT
Kenneth Russell
Comment 28 2021-07-12 14:02:21 PDT
Comment on attachment 433251 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=433251&action=review > Source/WebCore/ChangeLog:6 > + Reviewed by NOBODY (OOPS!). Can more description be copied from the reverted patch's ChangeLog, and also a description added of what changed in this version of the patch to fix the previous breakage?
Kenneth Russell
Comment 29 2021-07-12 14:02:46 PDT
Comment on attachment 433251 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=433251&action=review > Source/WebCore/ChangeLog:8 > + No new tests (OOPS!). Note also: this line will have to be removed for the CQ to land this.
Kyle Piddington
Comment 30 2021-07-12 18:26:08 PDT
Created attachment 433383 [details] Patch for landing
EWS
Comment 31 2021-07-12 19:24:58 PDT
commit-queue failed to commit attachment 433383 [details] to WebKit repository. To retry, please set cq+ flag again.
Kyle Piddington
Comment 32 2021-07-13 11:50:51 PDT
Created attachment 433437 [details] Patch for landing
EWS
Comment 33 2021-07-13 12:28:26 PDT
Committed r279883 (239635@main): <https://commits.webkit.org/239635@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 433437 [details].
Note You need to log in before you can comment on or make changes to this bug.