Bug 216106 - [iOS WK1] Crashes when using ANGLE WebGL from another thread
Summary: [iOS WK1] Crashes when using ANGLE WebGL from another thread
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebGL (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P1 Normal
Assignee: Kimmo Kinnunen
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-09-02 16:09 PDT by Dean Jackson
Modified: 2020-10-05 14:31 PDT (History)
14 users (show)

See Also:


Attachments
Test app (7.81 KB, application/x-gzip)
2020-09-08 17:41 PDT, Dean Jackson
no flags Details
ANGLE EAGL multithreading patch (2.57 KB, text/plain)
2020-09-16 13:23 PDT, Kenneth Russell
no flags Details
Patch (22.51 KB, patch)
2020-09-22 12:40 PDT, Kimmo Kinnunen
no flags Details | Formatted Diff | Diff
Patch (23.50 KB, patch)
2020-09-22 13:11 PDT, Kimmo Kinnunen
no flags Details | Formatted Diff | Diff
Patch (112.73 KB, patch)
2020-09-29 07:19 PDT, Kimmo Kinnunen
no flags Details | Formatted Diff | Diff
Patch (119.84 KB, patch)
2020-09-30 04:38 PDT, Kimmo Kinnunen
no flags Details | Formatted Diff | Diff
Patch (100.68 KB, patch)
2020-10-01 01:27 PDT, Kimmo Kinnunen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dean Jackson 2020-09-02 16:09:48 PDT
We are seeing a lot of incoming crash reports on the iOS 14 beta when WebGL is used with UIWebView (WK1). The issue is that ANGLE is not yet able to handle multithreaded use of a single context (even when WebKit is ensuring only one of them is actually using the context).

This is tracked by ANGLE bug: https://bugs.chromium.org/p/angleproject/issues/detail?id=5010

There are three cases:

1. Web thread calls EGL_Initialize first, and then the main thread tries to create another WebGL context. This will crash in `RendererGL::getRendererDescription()`, where the `reinterpret_cast<char*>` fails because it is passed a null pointer.

2. Main thread calls EGL_Initialize first, and then the Web thread tries to create another WebGL context. This appears to crash in gl::FramebufferManager::getFramebuffer.

3. Thread A creates the only WebGL context, and it is used from thread B. (I think this might be a similar stack to 2).

Again, this only applies to WebKit 1.

Stack Trace for case 1.
-----

Thread 0 name:  ScePS4LinkRPCtrl  Dispatch queue: com.apple.main-thread
Thread 0 Crashed ↩:
0   libsystem_platform.dylib      	0x00000001cf5c9bc4 _platform_strlen + 4
1   WebCore                       	0x0000000196afe468 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string<std::nullptr_t>(char const*) + 40 (__string:253)
2   WebCore                       	0x0000000196c82ac4 rx::RendererGL::getRendererDescription() const + 76 (RendererGL.cpp:263)
3   WebCore                       	0x0000000196b164d0 gl::Context::initRendererString() + 220 (Context.cpp:2890)
4   WebCore                       	0x0000000196b1621c gl::Context::makeCurrent(egl::Display*, egl::Surface*, egl::Surface*) + 92 (Context.cpp:650)
5   WebCore                       	0x0000000196b5d87c egl::Display::makeCurrent(egl::Thread const*, egl::Surface*, egl::Surface*, gl::Context*) + 204 (Display.cpp:1227)
6   WebCore                       	0x0000000196b680bc EGL_MakeCurrent + 212 (entry_points_egl.cpp:454)
7   WebCore                       	0x000000019555a224 WebCore::GraphicsContextGLOpenGL::GraphicsContextGLOpenGL(WebCore::GraphicsContextGLAttributes, WebCore::HostWindow*, WebCore::GraphicsContextGL::Destination, WebCore::GraphicsContextGLOpenGL*) + 1748 (GraphicsContextGLOpenGLCocoa.mm:382)
8   WebCore                       	0x0000000195559888 WebCore::GraphicsContextGLOpenGL::create(WebCore::GraphicsContextGLAttributes, WebCore::HostWindow*, WebCore::GraphicsContextGL::Destination) + 168 (GraphicsContextGLOpenGLCocoa.mm:188)
9   WebCore                       	0x0000000195f6fa30 WebCore::WebGLRenderingContextBase::create(WebCore::CanvasBase&, WebCore::GraphicsContextGLAttributes&, WTF::String const&) + 1164 (WebGLRenderingContextBase.cpp:705)
10  WebCore                       	0x0000000195e0a5e0 WebCore::HTMLCanvasElement::getContext(JSC::JSGlobalObject&, WTF::String const&, WTF::Vector<JSC::Strong<JSC::Unknown, (JSC::ShouldStrongDestructorGrabLock)0>, 0ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc>&&) + 896 (HTMLCanvasElement.cpp:442)
11  WebCore                       	0x0000000194f1ae74 WebCore::jsHTMLCanvasElementPrototypeFunctionGetContext(JSC::JSGlobalObject*, JSC::CallFrame*) + 500 (JSHTMLCanvasElement.cpp:312)
12  JavaScriptCore                	0x00000001916b940c llint_entry + 159388
13  JavaScriptCore                	0x00000001916b6880 llint_entry + 148240
14  JavaScriptCore                	0x00000001916b6880 llint_entry + 148240
15  JavaScriptCore                	0x00000001916b6880 llint_entry + 148240
16  JavaScriptCore                	0x00000001916b67cc llint_entry + 148060
17  JavaScriptCore                	0x00000001916922c4 vmEntryToJavaScript + 276
18  JavaScriptCore                	0x0000000191cbeacc JSC::Interpreter::executeCall(JSC::JSGlobalObject*, JSC::JSObject*, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&) + 488 (JITCodeInlines.h:42)
19  JavaScriptCore                	0x0000000191ee50b8 JSC::profiledCall(JSC::JSGlobalObject*, JSC::ProfilingReason, JSC::JSValue, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&, WTF::NakedPtr<JSC::Exception>&) + 176 (CallData.cpp:57)
20  WebCore                       	0x000000019592984c WebCore::JSEventListener::handleEvent(WebCore::ScriptExecutionContext&, WebCore::Event&) + 1636 (JSExecState.h:73)
21  WebCore                       	0x0000000195bf7354 WebCore::EventTarget::innerInvokeEventListeners(WebCore::Event&, WTF::Vector<WTF::RefPtr<WebCore::RegisteredEventListener, WTF::DumbPtrTraits<WebCore::RegisteredEventListener> >, 1ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc>, WebCore::EventTarget::EventInvokePhase) + 440 (EventTarget.cpp:341)
22  WebCore                       	0x0000000195bf4690 WebCore::EventTarget::fireEventListeners(WebCore::Event&, WebCore::EventTarget::EventInvokePhase) + 384 (EventTarget.cpp:273)
23  WebCore                       	0x0000000195befc90 WebCore::dispatchEventInDOM(WebCore::Event&, WebCore::EventPath const&) + 128 (EventDispatcher.cpp:85)
24  WebCore                       	0x0000000195bef684 WebCore::EventDispatcher::dispatchEvent(WebCore::Node&, WebCore::Event&) + 1224 (EventDispatcher.cpp:154)
25  WebCore                       	0x00000001952390f8 WebCore::EventHandler::dispatchTouchEvent(WebCore::PlatformTouchEvent const&, WTF::AtomString const&, WTF::HashMap<WTF::Ref<WebCore::EventTarget, WTF::DumbPtrTraits<WebCore::EventTarget> >, std::__1::unique_ptr<WTF::Vector<WTF::RefPtr<WebCore::Touch, WTF::DumbPtrTraits<WebCore::Touch> >, 0ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc>, std::__1::default_delete<WTF::Vector<WTF::RefPtr<WebCore::Touch, WTF::DumbPtrTraits<WebCore::Touch> >, 0ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc> > >, WTF::DefaultHash<WTF::Ref<WebCore::EventTarget, WTF::DumbPtrTraits<WebCore::EventTarget> > >, WTF::HashTraits<WTF::Ref<WebCore::EventTarget, WTF::DumbPtrTraits<WebCore::EventTarget> > >, WTF::HashTraits<std::__1::unique_ptr<WTF::Vector<WTF::RefPtr<WebCore::Touch, WTF::DumbPtrTraits<WebCore::Touch> >, 0ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc>, std::__1::default_delete<WTF::Vector<WTF::RefPtr<WebCore::Touch, WTF::DumbPtrTraits<WebCore::Touch> >, 0ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc> > > > > const&, float, float) + 3216 (EventHandlerIOSTouch.cpp:188)
26  WebCore                       	0x000000019523b694 WebCore::EventHandler::handleTouchEvent(WebCore::PlatformTouchEvent const&) + 7828 (EventHandlerIOSTouch.cpp:519)
27  WebCore                       	0x000000019523c724 WebCore::EventHandler::touchEvent(WebEvent*) + 128 (EventHandlerIOS.mm:128)

Stack trace for other cases
----------

     41 WebCore: gl::FramebufferManager::getFramebuffer(gl::FramebufferID) const <==
        41 WebCore: gl::Context::unsetDefaultFramebuffer()
          41 WebCore: gl::Context::unsetDefaultFramebuffer()
            41 WebCore: gl::Context::unMakeCurrent(egl::Display const*)
              41 WebCore: egl::Display::makeCurrent(egl::Thread const*, egl::Surface*, egl::Surface*, gl::Context*)
                41 WebCore: EGL_MakeCurrent
                  41 WebCore: WebCore::GraphicsContextGLOpenGL::GraphicsContextGLOpenGL(WebCore::GraphicsContextGLAttributes, WebCore::HostWindow*, WebCore::GraphicsContextGL::Destination, WebCore::GraphicsContextGLOpenGL*)
                    41 WebCore: WebCore::GraphicsContextGLOpenGL::create(WebCore::GraphicsContextGLAttributes, WebCore::HostWindow*, WebCore::GraphicsContextGL::Destination)
                      41 WebCore: WebCore::WebGLRenderingContextBase::create(WebCore::CanvasBase&, WebCore::GraphicsContextGLAttributes&, WTF::String const&)
                        41 WebCore: WebCore::HTMLCanvasElement::getContext(JSC::JSGlobalObject&, WTF::String const&, WTF::Vector<JSC::Strong<JSC::Unknown, (JSC::ShouldStrongDestructorGrabLock)0>, 0ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc>&&)
                          41 WebCore: WebCore::jsHTMLCanvasElementPrototypeFunctionGetContext(JSC::JSGlobalObject*, JSC::CallFrame*)
                            41 
                              41 JavaScriptCore: llint_entry
                                41 JavaScriptCore: llint_entry
                                  41 JavaScriptCore: llint_entry
                                    41 JavaScriptCore: llint_entry
                                      41 JavaScriptCore: llint_entry
                                        41 JavaScriptCore: llint_entry
                                          41 JavaScriptCore: vmEntryToJavaScript
                                            41 JavaScriptCore: JSC::Interpreter::executeProgram(JSC::SourceCode const&, JSC::JSGlobalObject*, JSC::JSObject*)
                                              41 JavaScriptCore: JSC::evaluate(JSC::JSGlobalObject*, JSC::SourceCode const&, JSC::JSValue, WTF::NakedPtr<JSC::Exception>&)
                                                41 WebCore: WebCore::JSExecState::profiledEvaluate(JSC::JSGlobalObject*, JSC::ProfilingReason, JSC::SourceCode const&, JSC::JSValue, WTF::NakedPtr<JSC::Exception>&)
                                                  41 WebCore: WebCore::ScriptController::evaluateInWorld(WebCore::ScriptSourceCode const&, WebCore::DOMWrapperWorld&)
                                                    41 WebCore: WebCore::ScriptController::evaluateIgnoringException(WebCore::ScriptSourceCode const&)
                                                      41 WebCore: WebCore::ScriptElement::executeClassicScript(WebCore::ScriptSourceCode const&)
                                                        41 WebCore: WebCore::ScriptElement::prepareScript(WTF::TextPosition const&, WebCore::ScriptElement::LegacyTypeSupport)
                                                          41 WebCore: WebCore::HTMLScriptElement::didFinishInsertingNode()
                                                            41 WebCore: WebCore::ContainerNode::appendChildWithoutPreInsertionValidityCheck(WebCore::Node&)
                                                              41 WebCore: WebCore::Node::appendChild(WebCore::Node&)
                                                                41 WebCore: WebCore::jsNodePrototypeFunctionAppendChild(JSC::JSGlobalObject*, JSC::CallFrame*)
Comment 1 Kenneth Russell 2020-09-02 16:23:12 PDT
As discussed with Dean, some of these cases were already subtly broken prior to the switch to ANGLE, because WebGLRenderingContextBase makes its EAGL context current on the first thread it's used on. However, there are new failure modes with ANGLE because of its internal context virtualization.

Two changes - one in ANGLE, one in WebKit - are necessary to fix this:

1) ANGLE must offer a way to disable its context virtualization, and that must be used in UIWebView (ideally, only there, since virtualization offers performance benefits). Tracked in https://bugs.chromium.org/p/angleproject/issues/detail?id=5010 and dependent bug.

2) WebKit must release the GraphicsContextGL when a thread's done with its WebGL work. A new "releaseContext()" must be added along "makeContextCurrent()" in GraphicsContextGLOpenGL.

To avoid releasing the context too often, it would be ideal to do it less often than after each WebGL API call.

dino@ mentioned that when touch events are dispatched (on a different thread than the web main thread), a lock is grabbed that prevents concurrent access to the DOM. Could anyone offer a pointer to that code?

If we could store the current GraphicsContextGLOpenGL somewhere, and release it when the touch event handler exits (just before releasing that lock), that would handle that thread's case.

For the main thread, it would also be ideal to release any current GraphicsContextGLOpenGL when that DOM lock is released. I'm not sure how often that lock is acquired and released - any pointers to the code which manages it on the main thread?
Comment 2 Kenneth Russell 2020-09-02 16:24:09 PDT
Since many crash reports are coming in regarding this, raising this to P1. We certainly don't want to regress UIWebView's stability with the switch to ANGLE.
Comment 3 Dean Jackson 2020-09-02 16:55:19 PDT
(In reply to Kenneth Russell from comment #1)
> As discussed with Dean, some of these cases were already subtly broken prior
> to the switch to ANGLE, because WebGLRenderingContextBase makes its EAGL
> context current on the first thread it's used on. However, there are new
> failure modes with ANGLE because of its internal context virtualization.
> 
> Two changes - one in ANGLE, one in WebKit - are necessary to fix this:
> 
> 1) ANGLE must offer a way to disable its context virtualization, and that
> must be used in UIWebView (ideally, only there, since virtualization offers
> performance benefits). Tracked in
> https://bugs.chromium.org/p/angleproject/issues/detail?id=5010 and dependent
> bug.
> 
> 2) WebKit must release the GraphicsContextGL when a thread's done with its
> WebGL work. A new "releaseContext()" must be added along
> "makeContextCurrent()" in GraphicsContextGLOpenGL.

According to the Apple documentation, this isn't true.

https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/ConcurrencyandOpenGLES/ConcurrencyandOpenGLES.html#//apple_ref/doc/uid/TP40008793-CH409-SW4

It's ok for the EAGLContext to be the current context in multiple threads. It's just not ok to use it from multiple threads at the same time, and we already guard against that by taking a Web Thread lock when we process events on the main thread.

However, we might have to call glFlush in this situation.

> If we could store the current GraphicsContextGLOpenGL somewhere, and release
> it when the touch event handler exits (just before releasing that lock),
> that would handle that thread's case.
> 
> For the main thread, it would also be ideal to release any current
> GraphicsContextGLOpenGL when that DOM lock is released. I'm not sure how
> often that lock is acquired and released - any pointers to the code which
> manages it on the main thread?

This happens inside UIKit, not WebKit, unfortunately.

The code that calls _webTouchEventsRecognized has taken the WebThreadLock.
Comment 4 Kenneth Russell 2020-09-02 22:53:33 PDT
Thanks dino@ for that documentation pointer.

https://developer.apple.com/documentation/opengles/eaglcontext/1624882-setcurrentcontext?language=objc

EAGL's setCurrentContext behaves differently from any other OpenGL or OpenGL ES window system binding, including eglMakeCurrent - but OK, in that case, perhaps we could give ANGLE a different multithreading mode (selected during EGL context creation time) where it can actually make its hardware context current on multiple threads simultaneously, and rely on external synchronization.

CC'ing geofflang@ and jonahr@.
Comment 5 Kimmo Kinnunen 2020-09-02 23:02:32 PDT
Not saying you didn't figure this out, and as far as I understand the problem, which may be wrong:

In general using ANGLE from multiple threads would need to disable the context virtualization.

I think in this this particular case, virtualization should still work:
- Backend is using EAGL (context can be current in many threads)
- Every ANGLE context is locked by the same lock

If web workers can render to canvases without WebThreadLock, then those contexts should have virtualisation disabled.

So wouldn't the solution be just to:
0) EGL_Initialize() once in main thread
1) EGL_MakeCurrent(nullptr) whenever releasing the WebThreadLock
2) Remember to save + retain the client EAGL context and set it back whenever main thread calls into WebKit (e.g. in the touch handler)


E.g. it's of no interest whether EAGLContext can be current on multiple threads -- you still need to (possibly) flush, which the EGL_MakeCurrent(null) would guarantee via however ANGLE guarantees it.
Comment 6 Kimmo Kinnunen 2020-09-02 23:14:46 PDT
And to be complete, sorry if I muddy the waters with the possibility of needing flush even though EAGL docs not explicitly saying it:

If the flush incurred by the EGL_MakeCurrent(nullptr) contract would be too much overhead, one would invent the ANGLE_EGL_no_flush where then EGL_MakeCurrentNoFlush(nullptr) would just call EAGLContext setContext:nullptr .
Comment 7 Kenneth Russell 2020-09-03 10:56:28 PDT
Good suggestions Kimmo. I haven't thought through all the implications of your suggestion but a few points:

1) There's no good point to call EGL_Initialize on the main thread if a touch event handler is the first one to call WebGL.

2) ANGLE would have to be changed to call [EAGLContext setCurrentContext:] during EGL_MakeCurrent. With its current assumptions about single-threadedness and context virtualization, it's called only once, during EGL_Initialize. There will surely need to be other changes to make this work properly.

3) Based on dino@'s point about UIKit being the one to grab the WebThreadLock during touch event dispatch, it's not clear there's a good point in WebKit to call EGL_MakeCurrent(null) on that thread. (Maybe there is.) Also not sure about when to do this on the main thread - who grabs / releases the WebThreadLock there?
Comment 8 Kenneth Russell 2020-09-03 12:07:15 PDT
Kimmo pointed out I probably misunderstood the naming of the various threads. If the "UI thread" (== "main thread") is the one dispatching touch events, and the "web main thread" is a different thread that ordinarily runs WebKit's JS / DOM / etc:

My point still stands - it might still be the case that the web main thread never did any WebGL work, and the touch event handler on the UI thread is the first one to touch WebGL.

It should still be fine, under the cover of the WebThreadLock, for either thread to EGL_Initialize and make the WebGL context.

Still need to release ANGLE's EGL context on both the web main thread and UI thread at appropriate points, in order to maintain correct EGL semantics.

ANGLE still needs to be updated to call [EAGLContext setCurrentContext:] during EGL_MakeCurrent. jonahr@ has already done this work for ANGLE's other backends; needs to be done for the iOS backend.
Comment 9 Dean Jackson 2020-09-08 17:41:46 PDT
Created attachment 408295 [details]
Test app

Attached is an Xcode project that will trigger these crashes.
Comment 10 Radar WebKit Bug Importer 2020-09-09 16:10:14 PDT
<rdar://problem/68602452>
Comment 11 Dean Jackson 2020-09-15 15:31:25 PDT
This is actually rdar://67827426
Comment 12 Kenneth Russell 2020-09-15 15:40:36 PDT
To have a central place for documentation, the point in the code dino@ pointed out dealing with the WebThreadLock, etc. is Source/WebCore/platform/ios/wak/WebCoreThread.mm :

https://trac.webkit.org/browser/webkit/trunk/Source/WebCore/platform/ios/wak/WebCoreThread.mm
Comment 13 Dean Jackson 2020-09-15 16:36:35 PDT
My understanding is that the remaining steps for a fix are:

- [ANGLE] patch to allow multithreading support for EAGL backend

- [WebKit] release the EGL (which is from ANGLE, not EAGL) context as we release the WebThreadLock, somewhere around MainRunLoopAutoUnlock

- [WebKit] have the WebThread release the EGL context if the main thread wants it. This would likely be just releasing it unconditionally in prepareCanvasForRendering if we're in WK1 and are holding the context.
Comment 14 Kenneth Russell 2020-09-15 17:25:49 PDT
Those sound like the right steps needed. Let's figure out how to best get the ANGLE side changes done. https://bugs.chromium.org/p/angleproject/issues/detail?id=5010 was filed about this, but probably we should test all three components together in WebKit before upstreaming any changes to ANGLE.
Comment 15 Kenneth Russell 2020-09-16 13:23:05 PDT
Created attachment 408950 [details]
ANGLE EAGL multithreading patch

Here's the (small) proposed ANGLE-side change adding multithreading support to its EAGL backend. Dean, could you please test this in conjunction with your WebKit-side changes? I've tested it locally in the MobileMiniBrowser on the iOS Simulator, without any other changes, and it seems to work.
Comment 16 Dean Jackson 2020-09-16 14:26:30 PDT
Thanks Ken. I'll use this with the WebKit parts.
Comment 17 Kimmo Kinnunen 2020-09-22 12:40:33 PDT
Created attachment 409390 [details]
Patch
Comment 18 Kimmo Kinnunen 2020-09-22 12:49:55 PDT
> [WebKit] release the EGL (which is from ANGLE, not EAGL) context as we release the WebThreadLock, somewhere around MainRunLoopAutoUnlock

This probably is not correct. If I understand correctly, MainRunLoopAutoUnlock happens at runloop stage. However, client might make 2, 3 or 77 calls that all cause WebGL on arbitrary thread.

> [WebKit] have the WebThread release the EGL context if the main thread wants it. This would likely be just releasing it unconditionally in prepareCanvasForRendering if we're in WK1 and are holding the context.


This is probably not correct. Most arbitrary WebKit callgraphs can invoke graphicscontext tasks.

I did a test but obviously it's not conclusive, attached as an example.

I'm still trying to understand what'd be the correct place to do the unset for the arbitrary client threads. My current thinking is that it's closing of the scopes where we call WebThreadLock()
Comment 19 Kimmo Kinnunen 2020-09-22 12:55:47 PDT
> MainRunLoopAutoUnlock happens at runloop stage. However, client might make 2, 3 or 77 calls that all cause WebGL on arbitrary thread.

What I mean is that clients might clobber EAGL context between each consecutive call.

Thus we need to unset ANGLE context after each call, so that we can also set it during each call and get ANGLE to forward the call to EAGL.
Comment 20 Kimmo Kinnunen 2020-09-22 13:11:19 PDT
Created attachment 409393 [details]
Patch
Comment 21 EWS Watchlist 2020-09-22 13:12:35 PDT
Note that there are important steps to take when updating ANGLE. See https://trac.webkit.org/wiki/UpdatingANGLE
Comment 22 Kenneth Russell 2020-09-23 18:41:16 PDT
Comment on attachment 409393 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=409393&action=review

> Source/ThirdParty/ANGLE/src/libANGLE/Display.cpp:1228
> +        ANGLE_TRY(previousContext->flush());

This is causing the build failure - have to remove the ANGLE_TRY.

Is this known to be necessary? Not making this change would be better.

> Source/WebCore/platform/ios/wak/WebCoreThread.mm:75
> +class ScopedClientThreadCrossThreadGlobalStateRestore {

Not used - hopefully this isn't needed.
Comment 23 Kenneth Russell 2020-09-23 18:44:20 PDT
To the best of my knowledge, no other code in WebKit is using EAGL directly. Therefore it shouldn't be necessary to release the GraphicsContextGL / ANGLE context upon each WebGL API call, but only upon exiting the scopes in UIWebView where the WebThreadLock is released, as it looks like your current patch does.

It would be much preferred to avoid releasing the context after every WebGL API call. Doing this might dramatically, negatively impact performance.
Comment 24 Kimmo Kinnunen 2020-09-29 07:19:21 PDT
Created attachment 409997 [details]
Patch
Comment 25 Kenneth Russell 2020-09-29 17:57:27 PDT
Comment on attachment 409997 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=409997&action=review

Fantastic work Kimmo fixing this! Several comments throughout.

> Source/ThirdParty/ANGLE/include/EGL/eglext_angle.h:121
> +#define EGL_PLATFORM_ANGLE_DEVICE_CONTEXT_VOLATILE_EAGL_ANGLE 0x3490

ANGLE maintains an internal list of unallocated enum constants. I've allocated 0x34A2 for this; please use it. Thanks.

Could you please document these two in a new extensions/EGL_ANGLE_platform_angle_device_context_volatile.txt? You can model it after EGL_ANGLE_platform_angle_context_virtualization.txt .

> Source/ThirdParty/ANGLE/include/EGL/eglext_angle.h:126
> +#define EGL_PLATFORM_ANGLE_DEVICE_CONTEXT_VOLATILE_CGL_ANGLE 0x3491

Have allocated 0x34A3 out of ANGLE's pre-reserved range for this; please use that. Thanks.

> Source/ThirdParty/ANGLE/src/libANGLE/Context.cpp:699
> +    ANGLE_TRY(angle::ResultToEGL(mImplementation->flush(this)));

Are you sure this change is needed here? Since you're implementing and calling eglReleaseThread, could the flush be done just there?

ANGLE virtualizes contexts which eliminates the need for many flushes. Concerned that this might impact performance. Wondering if you could test with WebGL content like the Aquarium with a large number of fish.

> Source/ThirdParty/ANGLE/src/libANGLE/renderer/DisplayImpl.h:71
> +    virtual egl::Error releaseThread() = 0;

Consider implementing these in the base class and returning egl::NoError(), like DisplayImpl::handleGPUSwitch.

> Source/ThirdParty/ANGLE/src/libANGLE/renderer/DisplayImpl.h:77
> +

There are a few unnecessary whitespace changes; not a big deal.

> Source/ThirdParty/ANGLE/src/libANGLE/renderer/gl/DisplayGL.cpp:38
> +    UNIMPLEMENTED();

These shouldn't call UNIMPLEMENTED. All of the entry points call through this, which will cause lots of warnings, trace events, and even assertion failures in debug builds in DisplayGLX, DisplayWGL, etc. See my comment in DisplayImpl.h.

> Source/ThirdParty/ANGLE/src/libANGLE/renderer/gl/DisplayGL.h:36
> +    egl::Error releaseThread() override;

These need to be overridden in more subclasses, such as DisplayD3D, DisplayVk, etc. See comment in the implementing file.

> Source/ThirdParty/ANGLE/src/libANGLE/renderer/gl/cgl/DisplayCGL.mm:250
> +    if (mDeviceContextIsVolatile || mThreadsWithCurrentContext.find(threadId) == mThreadsWithCurrentContext.end())

What happens if a second thread attempts to eglBindAPI while another's already bound? Does the CGLSetCurrentContext call below fail, or does any higher-level validation code in ANGLE prevent it (I'm guessing not)?

> Source/ThirdParty/ANGLE/src/libANGLE/renderer/gl/cgl/DisplayCGL.mm:267
> +        if (CGLSetCurrentContext(nullptr) != kCGLNoError)

Can this have unexpected side effects like:

 - App calls ANGLE's eglBindAPI, eglMakeCurrent and does work
 - App uses CGL to make their own context current and does work with it
 - App calls eglReleaseThread

This will cause the app's CGL context to be un-made current.

I don't want to suggest to call CGLGetCurrentContext here - at least, not necessarily - but do you have any thoughts about this?

> Source/ThirdParty/ANGLE/src/libANGLE/renderer/gl/eagl/DisplayEAGL.mm:141
> +        if (![getEAGLContextClass() setCurrentContext:nil])

Would flushing here, instead of in unmakeCurrent, work?

Also, similar question as that one in DisplayCGL::releaseThread.

> Source/ThirdParty/ANGLE/src/libGLESv2/entry_points_egl.cpp:1083
> +    ANGLE_EGL_TRY_RETURN(thread, display->bindThread(), "eglCreateWindowSurface",

Should this and above have referenced "eglCreatePlatformWindowSurface"?

> Source/ThirdParty/ANGLE/src/libGLESv2/entry_points_egl_ext.cpp:1667
> +    ANGLE_EGL_TRY(thread, display->bindThread(), "eglReacquireHighPowerGPUANGLE",

eglReacquireHighPowerGPUANGLE -> eglHandleGPUSwitchANGLE

> Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:149
> +        // After initializitaion, EGL_DEFAULT_DISPLAY will return the platform-customized display.

initializitaion -> initialization

> Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:774
> +#if PLATFORM(IOS_FAMILY) && USE(ANGLE)

dino@ should review these #ifdefs - not sure whether MacCatalyst-related #ifdefs might be needed.

> Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:777
> +    ASSERT(!WebCore::isInWebProcess());

WebKit2 isn't supported at all on iOS? Not even in-browser, ignoring UIWebView?

> Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:780
> +    if (!WebThreadIsEnabled())

Is it OK from a platform layering perspective for this code to access functions in platform/ios/wak? This call causes a circular dependency between code in that directory and this one.
Comment 26 Kimmo Kinnunen 2020-09-30 04:38:42 PDT
Created attachment 410107 [details]
Patch
Comment 27 Kimmo Kinnunen 2020-09-30 04:39:52 PDT
Comment on attachment 409997 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=409997&action=review

>> Source/ThirdParty/ANGLE/include/EGL/eglext_angle.h:121
>> +#define EGL_PLATFORM_ANGLE_DEVICE_CONTEXT_VOLATILE_EAGL_ANGLE 0x3490
> 
> ANGLE maintains an internal list of unallocated enum constants. I've allocated 0x34A2 for this; please use it. Thanks.
> 
> Could you please document these two in a new extensions/EGL_ANGLE_platform_angle_device_context_volatile.txt? You can model it after EGL_ANGLE_platform_angle_context_virtualization.txt .

done

>> Source/ThirdParty/ANGLE/include/EGL/eglext_angle.h:126
>> +#define EGL_PLATFORM_ANGLE_DEVICE_CONTEXT_VOLATILE_CGL_ANGLE 0x3491
> 
> Have allocated 0x34A3 out of ANGLE's pre-reserved range for this; please use that. Thanks.

done

>> Source/ThirdParty/ANGLE/src/libANGLE/Context.cpp:699
>> +    ANGLE_TRY(angle::ResultToEGL(mImplementation->flush(this)));
> 
> Are you sure this change is needed here? Since you're implementing and calling eglReleaseThread, could the flush be done just there?
> 
> ANGLE virtualizes contexts which eliminates the need for many flushes. Concerned that this might impact performance. Wondering if you could test with WebGL content like the Aquarium with a large number of fish.

So if you consider case

  eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, mMyContext);
  glClearColor(1, 0,0, 1);
  glClear();
  eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
  CRASH();

I think EGL spec guarantees that the clear hits wherever it needs to hit.
If we don't flush, there's no guarantee.
The red/no-red could be observed from, say, cross-process texture.

> Since you're implementing and calling eglReleaseThread, could the flush be done just there?

Well, it's a bit matter of viewpoint. eglReleaseThread does not flush because of "releasing the thread". It flushes because it changes the context. I think again it's the case where if you release in one thread and continue using in other thread, you must flush or implementations that have per-thread queues might reorder calls that happen before ReleaseThread with calls that happen after MakeCurrent in other thread.

Also, in this patch the the other case needs to flush too,  eg the case of running in WebCoreThread where we do not do ReleaseThread but MakeCurrent.



> Concerned that this might impact performance.

For current WebKit use-cases, the perf impact probably is not very big, since it doesn't change the contexts (so unless Aquarium uses multiple webgl contexts).

The impact would be bigger for Chromium, though, since the command buffer multiplexes all contexts in same submit thread, with MakeCurrent in between contexts (last time I looked, maybe it's different with passthrough?)


To unblock this I can of course do like: 
  // ANGLE does not flush on MakeCurrent as EGL says it should, so we flush here.
 glFlush();
 if (!IsInWebThread())
     eglReleaseThread();
 else
     eglMakeCurrent(nullptr);

>> Source/ThirdParty/ANGLE/src/libANGLE/renderer/DisplayImpl.h:71
>> +    virtual egl::Error releaseThread() = 0;
> 
> Consider implementing these in the base class and returning egl::NoError(), like DisplayImpl::handleGPUSwitch.

done

>> Source/ThirdParty/ANGLE/src/libANGLE/renderer/DisplayImpl.h:77
>> +
> 
> There are a few unnecessary whitespace changes; not a big deal.

done

>> Source/ThirdParty/ANGLE/src/libANGLE/renderer/gl/DisplayGL.cpp:38
>> +    UNIMPLEMENTED();
> 
> These shouldn't call UNIMPLEMENTED. All of the entry points call through this, which will cause lots of warnings, trace events, and even assertion failures in debug builds in DisplayGLX, DisplayWGL, etc. See my comment in DisplayImpl.h.

done

>> Source/ThirdParty/ANGLE/src/libANGLE/renderer/gl/DisplayGL.h:36
>> +    egl::Error releaseThread() override;
> 
> These need to be overridden in more subclasses, such as DisplayD3D, DisplayVk, etc. See comment in the implementing file.

done

>> Source/ThirdParty/ANGLE/src/libANGLE/renderer/gl/cgl/DisplayCGL.mm:250
>> +    if (mDeviceContextIsVolatile || mThreadsWithCurrentContext.find(threadId) == mThreadsWithCurrentContext.end())
> 
> What happens if a second thread attempts to eglBindAPI while another's already bound? Does the CGLSetCurrentContext call below fail, or does any higher-level validation code in ANGLE prevent it (I'm guessing not)?

eglBindAPI(type) just selects _EGL_ api type. It's equivalent of :
  thread_local apiType = type.

It does not affect any other thing. In fact, the "bindThread", which I renamed to "prepareForCall", is not even invoked on eglBindAPI.

I try to document the function better.

With regards to what I understand is the underlying question:

This part of calling CGLSetCurrentContext logic does not terribly change between what was before and what is after.
* CGL context will be current in multiple threads at the same time.
* CGL context needs to be called-into serially.
* EGL rules are the mechanism for this serialization: no EGL context is current in two threads simultaneously.

All these were effective before, and they should be effective after.

the new functionality is
1) update CGL context on all EGL functions, not just make current
 --> Fixes a bug where function like eglTerminate would be called in other thread than eglInitialize.
2) add concept of "device context volatility" --> Update CGL context even if ANGLE thinks it's already set.

>> Source/ThirdParty/ANGLE/src/libANGLE/renderer/gl/cgl/DisplayCGL.mm:267
>> +        if (CGLSetCurrentContext(nullptr) != kCGLNoError)
> 
> Can this have unexpected side effects like:
> 
>  - App calls ANGLE's eglBindAPI, eglMakeCurrent and does work
>  - App uses CGL to make their own context current and does work with it
>  - App calls eglReleaseThread
> 
> This will cause the app's CGL context to be un-made current.
> 
> I don't want to suggest to call CGLGetCurrentContext here - at least, not necessarily - but do you have any thoughts about this?

> Can this have unexpected side effects like:

There's no added unexpected side effects. eglInitialize already trashed the CGL context of the APP -- that's the way ANGLE did rendering: with CGL.

> This will cause the app's CGL context to be un-made current.

This is already the contract between APP and ANGLE:
 - App knows ANGLE will trash the CGL context on egl calls.

So in webkit scenario:

Contract between APP and WebKit WK1:
 - App knows WebKit WK1 will trash CGL context on WebKit calls.

Contract between WebKit WK1 and ANGLE:
 - WebKit knows ANGLE will trash CGL context on egl calls.

>> Source/ThirdParty/ANGLE/src/libANGLE/renderer/gl/eagl/DisplayEAGL.mm:141
>> +        if (![getEAGLContextClass() setCurrentContext:nil])
> 
> Would flushing here, instead of in unmakeCurrent, work?
> 
> Also, similar question as that one in DisplayCGL::releaseThread.

The releaseThread() is just an optimisation of not leaking memory.
The flush vs no flush is a MakeCurrent discussion.

>> Source/ThirdParty/ANGLE/src/libGLESv2/entry_points_egl.cpp:1083
>> +    ANGLE_EGL_TRY_RETURN(thread, display->bindThread(), "eglCreateWindowSurface",
> 
> Should this and above have referenced "eglCreatePlatformWindowSurface"?

done

>> Source/ThirdParty/ANGLE/src/libGLESv2/entry_points_egl_ext.cpp:1667
>> +    ANGLE_EGL_TRY(thread, display->bindThread(), "eglReacquireHighPowerGPUANGLE",
> 
> eglReacquireHighPowerGPUANGLE -> eglHandleGPUSwitchANGLE

done

>> Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:149
>> +        // After initializitaion, EGL_DEFAULT_DISPLAY will return the platform-customized display.
> 
> initializitaion -> initialization

done

>> Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:774
>> +#if PLATFORM(IOS_FAMILY) && USE(ANGLE)
> 
> dino@ should review these #ifdefs - not sure whether MacCatalyst-related #ifdefs might be needed.

This is intended to cover maccatalyst -- iOS_family should cover that.

>> Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:777
>> +    ASSERT(!WebCore::isInWebProcess());
> 
> WebKit2 isn't supported at all on iOS? Not even in-browser, ignoring UIWebView?

Added:
    // At the moment this function is relevant only when web thread lock owns the GraphicsContextGLOpenGL current context.

>> Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:780
>> +    if (!WebThreadIsEnabled())
> 
> Is it OK from a platform layering perspective for this code to access functions in platform/ios/wak? This call causes a circular dependency between code in that directory and this one.

Good point. Probably a matter of viewpoint: is web thread a property on which everything is built, or is web thread  aproperty that is on top of everything. I'm not familiar with the platform enough. 
This functionality is used in other similar levels, but the code is maybe simpler done at the call site.
I'll change the code to supply the information from the call site.
Comment 28 Kimmo Kinnunen 2020-09-30 05:13:47 PDT
>  eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, mMyContext);
>  glClearColor(1, 0,0, 1);
>  glClear();
>  eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
>  CRASH();

> I think EGL spec guarantees that the clear hits wherever it needs to hit.

Hmm, disregard this, this is obviously incorrect.

The idea would be something like
  eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, mMyContext);
  glClearColor(1, 0,0, 1);
  glClear();
  eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
  for (;;);

and then some handwaving about the per-thread buffers.

Anyhow, if we want to fix this inside angle and not have perf penalty, maybe some  "context is multithreaded" flag could be done. (or, to be pedantic maybe some "context is not multithread aware" flag)..
Comment 29 Kenneth Russell 2020-09-30 22:31:05 PDT
Comment on attachment 410107 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=410107&action=review

Great work Kimmo solving this problem! Great work on the test, too. This change looks nearly perfect to me - only one comment / question, but setting r+ now in order to save re-review time regardless of which way it's answered, once you have the build failures fixed.

> Source/ThirdParty/ANGLE/extensions/EGL_ANGLE_platform_angle_device_context_volatile_cgl.txt:3
> +    ANGLE_platform_angle_device_context_volatile_CGL

Lowercase _CGL here and below.

> Source/ThirdParty/ANGLE/extensions/EGL_ANGLE_platform_angle_device_context_volatile_cgl.txt:11
> +    Kenneth Russel, Google

Two "l's please :)

> Source/ThirdParty/ANGLE/extensions/EGL_ANGLE_platform_angle_device_context_volatile_cgl.txt:15
> +    Kenneth Russel, Google (kbr 'at' chromium 'dot' org)

Here too.

> Source/ThirdParty/ANGLE/extensions/EGL_ANGLE_platform_angle_device_context_volatile_eagl.txt:11
> +    Kenneth Russel, Google

Here too.

> Source/ThirdParty/ANGLE/extensions/EGL_ANGLE_platform_angle_device_context_volatile_eagl.txt:15
> +    Kenneth Russel, Google (kbr 'at' chromium 'dot' org)

Here too.

> Source/ThirdParty/ANGLE/src/libANGLE/Display.h:105
> +    // Called before all display state dependent EGL function. Backends can set up, for example,

function -> functions

> Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:765
> +    // potentially switching threads later, we should flush.

See below - if the code is removed then this comment should be updated.

> Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:771
> +            gl::Flush();

I don't understand the need for this flush on MakeCurrent. If we're coming in with a different WebGL context / GraphicsContextGL current than this one, it's still not necessary to do a glFlush at this point, because the same hardware context is used for both GraphicsContextGL instances. Commands are properly ordered between them. ANGLE handles the virtual context switch and sending down the user's GL context state to the hardware context in its EGL_MakeCurrent.

If we're coming in and the application's EGAL context (for example) is current, then this code won't have the desired effect - ANGLE's EGL_GetCurrentContext() will not know about that real native context and will return nullptr. gl::Flush() will not necessarily perform a real glFlush anyway.

Basically, it seems to me that the 3 lines of code above - the calls to EGL_GetCurrentContext through gl::Flush - should be removed.

> Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:801
> +        // Called when we do not know if this thread again.

...know if we will ever see another call from this thread again.
Comment 30 Kimmo Kinnunen 2020-10-01 00:23:02 PDT
> > Source/WebCore/platform/graphics/cocoa/GraphicsContextGLOpenGLCocoa.mm:771
> > +            gl::Flush();


> I don't understand the need for this flush on MakeCurrent.

So it's about what we assume about ANGLE implementation.

My suggestion was:
* Assume ANGLE has a bug where it does not flush on MakeCurrent

Your suggestion was:
* Assume ANGLE has one platform context
* Assume ANGLE has a bug where it does not flush on MakeCurrent

I can change it to your suggestion and add a comment.
Comment 31 Kimmo Kinnunen 2020-10-01 01:27:03 PDT
Created attachment 410207 [details]
Patch
Comment 32 Maciej Stachowiak 2020-10-01 13:10:06 PDT
<rdar://problem/67827426>
Comment 33 Dean Jackson 2020-10-01 14:58:15 PDT
Comment on attachment 410207 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=410207&action=review

Thanks so much Kimmo. This is awesome.

> Source/ThirdParty/ANGLE/ChangeLog:16
> +        gl function if the client knowns the platform state might

typo: knows

> Tools/TestWebKitAPI/Configurations/TestWebKitAPI.xcconfig:74
> +WK_OPENGL_LDFLAGS = $(WK_OPENGL_LDFLAGS_$(WK_PLATFORM_NAME));
> +WK_OPENGL_LDFLAGS_iphoneos = -framework OpenGLES;
> +WK_OPENGL_LDFLAGS_maccatalyst = -framework OpenGL;
> +WK_OPENGL_LDFLAGS_iphonesimulator = -framework OpenGLES;
> +WK_OPENGL_LDFLAGS_watchos = $(WK_OPENGL_LDFLAGS_iphoneos);
> +WK_OPENGL_LDFLAGS_watchsimulator = $(WK_OPENGL_LDFLAGS_iphonesimulator);
> +WK_OPENGL_LDFLAGS_appletvos = $(WK_OPENGL_LDFLAGS_iphoneos);
> +WK_OPENGL_LDFLAGS_appletvsimulator = $(WK_OPENGL_LDFLAGS_iphonesimulator);
> +WK_OPENGL_LDFLAGS_macosx = -framework OpenGL;
> +
> +OTHER_LDFLAGS = $(inherited) $(UNEXPORTED_SYMBOL_LDFLAGS) -lgtest -force_load $(BUILT_PRODUCTS_DIR)/libTestWebKitAPI.a -framework JavaScriptCore -framework WebKit -lWebCoreTestSupport $(WK_AUTHKIT_LDFLAGS) -framework Network $(WK_PDFKIT_LDFLAGS) $(WK_HID_LDFLAGS) $(WK_OPENGL_LDFLAGS) $(OTHER_LDFLAGS_PLATFORM_$(WK_COCOA_TOUCH));

This is ok for the moment, but I just made a change to NOT link to OpenGL in all our frameworks, and instead soft-link. The problem is with iOS apps on Apple Silicon, where we need to decide at runtime whether to use GLES or GL.

> Tools/TestWebKitAPI/Tests/WebKitLegacy/ios/WebGLNoCrashOnOtherThreadAccess.mm:180
> +    [uiWebView stringByEvaluatingJavaScriptFromString:@"runTestAsync();"];

It would also be possible to have a test that dispatches touch events rather than evaluating JS, which would exercise the main/UI thread entry.
Comment 34 Kimmo Kinnunen 2020-10-01 22:26:55 PDT
(In reply to Dean Jackson from comment #33)
> This is ok for the moment, but I just made a change to NOT link to OpenGL in
> all our frameworks, and instead soft-link. The problem is with iOS apps on
> Apple Silicon, where we need to decide at runtime whether to use GLES or GL.

I did a version of that too in the obsolete patches, but it failed since PAL needed libcompression.tbd and I didn't even know what that was :) It also didn't repro on my macOS and the patch was in a hurry..

Then I thought of the use-case: this is Application code, and AFAIU the applications are never compiled with either-or or dual mode. Applications are always either Mac apps or iOS apps, always CGL or EAGL. Applications never use both? Or are you saying when compiling the test case app as maccatalyst app, it should be compiled as using both and then actually use both? (That's what I had)

> 
> > Tools/TestWebKitAPI/Tests/WebKitLegacy/ios/WebGLNoCrashOnOtherThreadAccess.mm:180
> > +    [uiWebView stringByEvaluatingJavaScriptFromString:@"runTestAsync();"];
> 
> It would also be possible to have a test that dispatches touch events rather
> than evaluating JS, which would exercise the main/UI thread entry.

I tried to test that using the direct JS VM access API, JSContext invocation. I did verify this ran the call in the calling thread, eg the main/ui thread.
Comment 35 EWS 2020-10-02 00:05:47 PDT
Committed r267869: <https://trac.webkit.org/changeset/267869>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 410207 [details].
Comment 36 David Kilzer (:ddkilzer) 2020-10-02 08:09:11 PDT
(In reply to Kimmo Kinnunen from comment #34)
> (In reply to Dean Jackson from comment #33)
> > This is ok for the moment, but I just made a change to NOT link to OpenGL in
> > all our frameworks, and instead soft-link. The problem is with iOS apps on
> > Apple Silicon, where we need to decide at runtime whether to use GLES or GL.
> 
> I did a version of that too in the obsolete patches, but it failed since PAL
> needed libcompression.tbd and I didn't even know what that was :) It also
> didn't repro on my macOS and the patch was in a hurry..

The libcompression.tbd file is used by the linker when the actual library (libcompression.dylib) doesn't exist on disk.  It's a text file with a list of symbols that libcompression.dylib would export, and can be used by the linker in place of the actual dylib.

This file format was created to make it possible to parallelize builds so that a project that links to a dylib doesn't have to wait for the dylib to be built (since building the *.tbd file is quicker because it only requires parsing header files instead of compiling and linking all source files).

> Then I thought of the use-case: this is Application code, and AFAIU the
> applications are never compiled with either-or or dual mode. Applications
> are always either Mac apps or iOS apps, always CGL or EAGL. Applications
> never use both? Or are you saying when compiling the test case app as
> maccatalyst app, it should be compiled as using both and then actually use
> both? (That's what I had)

We announced that it's possible to run iOS apps on Apple silicon hardware (without the iOS Simulator), so that would be a case when the app would use EAGL on the iOS device and GL on macOS.  I think that's why we need to soft-link to the libraries so we can make the choice at runtime instead of compile time.
Comment 37 Dean Jackson 2020-10-05 14:31:17 PDT
(In reply to David Kilzer (:ddkilzer) from comment #36)

> > Then I thought of the use-case: this is Application code, and AFAIU the
> > applications are never compiled with either-or or dual mode. Applications
> > are always either Mac apps or iOS apps, always CGL or EAGL. Applications
> > never use both? Or are you saying when compiling the test case app as
> > maccatalyst app, it should be compiled as using both and then actually use
> > both? (That's what I had)
> 
> We announced that it's possible to run iOS apps on Apple silicon hardware
> (without the iOS Simulator), so that would be a case when the app would use
> EAGL on the iOS device and GL on macOS.  I think that's why we need to
> soft-link to the libraries so we can make the choice at runtime instead of
> compile time.

That's right. WebKit in MacCatalyst on Apple Silicon can use either OpenGL or OpenGLES, and decides at runtime based on the binary calling it. Even then it is a bit more complicated, because a WKWebView UIProcess can be Catalyst with an iOS binary, but the WebProcess would be a macOS binary. A UIWebView is whatever the app was.