| Summary: | rAF driven WebGL submits excessive amount of GPU work when frames are slow | ||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | WebKit | Reporter: | Kimmo Kinnunen <kkinnunen> | ||||||||||||||||||||||||||
| Component: | WebGL | Assignee: | Dean Jackson <dino> | ||||||||||||||||||||||||||
| Status: | RESOLVED FIXED | ||||||||||||||||||||||||||||
| Severity: | Normal | CC: | annulen, dino, ews-watchlist, gyuyoung.kim, jonlee, kbr, kkinnunen, kondapallykalyan, kpiddington, mmaxfield, nham, ryuan.choi, sabouhallawa, sergio, simon.fraser, webkit-bug-importer | ||||||||||||||||||||||||||
| Priority: | P2 | Keywords: | InRadar | ||||||||||||||||||||||||||
| Version: | WebKit Nightly Build | ||||||||||||||||||||||||||||
| Hardware: | Unspecified | ||||||||||||||||||||||||||||
| OS: | Unspecified | ||||||||||||||||||||||||||||
| Bug Depends on: | 227024 | ||||||||||||||||||||||||||||
| Bug Blocks: | 226841, 228012 | ||||||||||||||||||||||||||||
| Attachments: |
|
||||||||||||||||||||||||||||
|
Description
Kimmo Kinnunen
2021-06-16 01:39:29 PDT
Created attachment 431528 [details]
Patch
Created attachment 431532 [details]
Patch
Created attachment 431537 [details]
Patch
Note that there are important steps to take when updating ANGLE. See https://trac.webkit.org/wiki/UpdatingANGLE Comment on attachment 431537 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431537&action=review Were the most recent test failures here caused by Bug 227064? > Source/WebCore/platform/graphics/angle/GraphicsContextGLANGLE.cpp:2851 > + if (ScopedGLFence fence = WTFMove(m_frameCompletionFences[oldestFrameCompletionFence])) { It's hard for me to trace the exact logic this uses. WTFMove calls the constructor taking ScopedGLFence&& for the left-hand side and uses std::exchange? And the test uses operator GLSync()? Would you consider defining operator bool() for ScopedGLFence? (In reply to Kenneth Russell from comment #6) > Comment on attachment 431537 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=431537&action=review > > Were the most recent test failures here caused by Bug 227064? Unfortunately the iOS legacy test failure is surprising, I need to investigate that. > > Source/WebCore/platform/graphics/angle/GraphicsContextGLANGLE.cpp:2851 > > + if (ScopedGLFence fence = WTFMove(m_frameCompletionFences[oldestFrameCompletionFence])) { > > It's hard for me to trace the exact logic this uses. WTFMove calls the > constructor taking ScopedGLFence&& for the left-hand side and uses > std::exchange? And the test uses operator GLSync()? Yeah, correct in spirit. As far as I understand: 1. WTFMove casts the lvalue ref m_frameCompletionFences[oldestFrameCompletionFence] to rvalue ref. 2. ScopedGLFence fence is created with default constructor 3. Assignment operator assigns ScopeGLFence&& over the default-constructed fence and this std::exchanges the contents. > Would you consider > defining operator bool() for ScopedGLFence? Sure. Created attachment 431661 [details]
Debug via EWS
Created attachment 431697 [details]
Debug via EWS
Created attachment 431699 [details]
Debug via EWS force gl
Created attachment 431842 [details]
Patch
This would be ready for review. Using the sync objects on OpenGL part appears a bit risky since the Catalina iOS-bots seemed to fail with the symptom that GL fences timeout with success. Probably that means we cannot use the fences in the implementation when using OpenGL. Comment on attachment 431842 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431842&action=review Looks great - unfortunate you had to debug the OpenGL driver issue but let's move forward. r+ > Source/WebCore/platform/graphics/angle/GraphicsContextGLANGLE.cpp:65 > +static Seconds maxFrameDuration = 5_s; Can/should this be const, too? Committed r279172 (239069@main): <https://commits.webkit.org/239069@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 431842 [details]. Reopening to attach new patch. Created attachment 432757 [details]
Patch
Created attachment 432758 [details]
Patch
The timing of the revert is unfortunate - can any more details be provided about where the failures are happening? The original patch author will be OOO for a while; is there any hope of getting this fixed and relanded in the short term? It looks like some devices don't actually support GL_ARB_sync which causes crashes in context construction: Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 com.apple.WebCore 0x00007fff3a07a21c WebCore::GraphicsContextGLOpenGL::reshapeDisplayBufferBacking() + 188 1 com.apple.WebCore 0x00007fff396892d3 WebCore::GraphicsContextGLOpenGL::reshapeFBOs(WebCore::IntSize const&) + 451 2 com.apple.WebCore 0x00007fff3968b353 WebCore::GraphicsContextGLOpenGL::reshape(int, int) + 547 3 com.apple.WebCore 0x00007fff3ab6c384 WebCore::WebGLRenderingContextBase::initializeNewContext() + 2324 4 com.apple.WebCore 0x00007fff3ab776d6 WebCore::WebGLRenderingContextBase::create(WebCore::CanvasBase&, WebCore::GraphicsContextGLAttributes&, WebCore::GraphicsContextGLWebGLVersion) + 2934 5 com.apple.WebCore 0x00007fff3aa283aa WebCore::HTMLCanvasElement::createContextWebGL(WebCore::GraphicsContextGLWebGLVersion, WebCore::GraphicsContextGLAttributes&&) + 330 6 com.apple.WebCore 0x00007fff3aa27ea5 WebCore::HTMLCanvasElement::getContext(JSC::JSGlobalObject&, WTF::String const&, WTF::Vector<JSC::Strong<JSC::Unknown, (JSC::ShouldStrongDestructorGrabLock)0>, 0ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc>&&) + 1013 7 com.apple.WebCore 0x00007fff39a6f80d WebCore::jsHTMLCanvasElementPrototypeFunction_getContext(JSC::JSGlobalObject*, JSC::CallFrame*) + 429 8 ??? 0x00004a2582e011d8 0 + 81524969968088 9 com.apple.JavaScriptCore 0x00007fff367f7157 llint_entry + 110528 The crashes are blocking several of our perf bots. The safest thing for now is probably to revert the patch to unblock our bots until Kimmo gets back to this. Comment on attachment 432758 [details]
Patch
The GL_ARB_sync extension is emulated in ANGLE's Metal backend. Can you tell us what devices are affected, or what GPU families they have, so we can understand better why this emulation is failing?
(In reply to Kenneth Russell from comment #20) > Comment on attachment 432758 [details] > Patch > > The GL_ARB_sync extension is emulated in ANGLE's Metal backend. Can you tell > us what devices are affected, or what GPU families they have, so we can > understand better why this emulation is failing? From my understanding, the issue lies mostly on Nvidia Machines. There's a comment for events: // http://crbug.com/1136673 // Fence sync is flaky on Nvidia ANGLE_FEATURE_CONDITION((&mFeatures), hasEvents, isMetal2_1 && !isNVIDIA()); as events drive ARB_Sync, this isn't too unexpected. if (mFeatures.hasEvents.enabled) { // MTLSharedEvent is only available since Metal 2.1 outExtensions->fenceSync = true; outExtensions->waitSync = true; } We might be able to emulate these to not use GPUDriven events, but rather split the command buffer on sync points for systems that don't support events. Over to Dean. (In reply to Kyle Piddington from comment #21) > We might be able to emulate these to not use GPUDriven events, but rather > split the command buffer on sync points for systems that don't support > events. Great analysis Kyle - this sounds good. These don't have to be extremely accurate. Would it be feasible for you to try that and test it on your system by force-enabling that code path? Committed r279497 (239349@main): <https://commits.webkit.org/239349@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 432758 [details]. Reopening for the actual fix. Created attachment 433251 [details]
Patch
Comment on attachment 433251 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=433251&action=review > Source/WebCore/ChangeLog:6 > + Reviewed by NOBODY (OOPS!). Can more description be copied from the reverted patch's ChangeLog, and also a description added of what changed in this version of the patch to fix the previous breakage? Comment on attachment 433251 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=433251&action=review > Source/WebCore/ChangeLog:8 > + No new tests (OOPS!). Note also: this line will have to be removed for the CQ to land this. Created attachment 433383 [details]
Patch for landing
commit-queue failed to commit attachment 433383 [details] to WebKit repository. To retry, please set cq+ flag again.
Created attachment 433437 [details]
Patch for landing
Committed r279883 (239635@main): <https://commits.webkit.org/239635@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 433437 [details]. |