Bug 239015

Summary: REGRESSION (Safari 15.4) Performance regression after uploading WebGL buffers
Product: WebKit Reporter: Philip Taylor <philip.taylor>
Component: WebGLAssignee: Kimmo Kinnunen <kkinnunen>
Status: RESOLVED FIXED    
Severity: Blocker CC: dino, ews-watchlist, geofflang, gman, jonahr, kbr, kkinnunen, kondapallykalyan, kpiddington, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: Safari Technology Preview   
Hardware: All   
OS: All   
See Also: https://bugs.webkit.org/show_bug.cgi?id=239518
Attachments:
Description Flags
Patch
none
Patch
none
Example workaround none

Philip Taylor
Reported 2022-04-08 13:53:04 PDT
Hello, I am responsible for a WebGL rendering engine for displaying large CAD models on the web. The recent 15.4 update of Safari broke our engine, and I have been unable to reliably track down and implement workaround to the issue. This url should crash any iOS device, but load correctly on desktop browsers. https://docs.zea.live/Safari-Crash-Test/ A few comments. 1. We already disabled out 'multi-draw' code paths. 2. We already disabled antialiasing for iOS devices. 3. We already tried pre-allocating large GPU buffers to avoid re-allocations. It seemed at first that it was caused by us allocating and freeing buffers, so we have tried to eliminate as many buffer free/allocations sequences as possible, but still cannot make the provided page load. Note: Prior to 15.4 our engine would handle these large data sets very well on iOS Safari.
Attachments
Patch (4.93 KB, patch)
2022-04-22 01:24 PDT, Kimmo Kinnunen
no flags
Patch (4.94 KB, patch)
2022-04-22 01:34 PDT, Kimmo Kinnunen
no flags
Example workaround (3.65 KB, text/html)
2022-04-25 02:26 PDT, Kimmo Kinnunen
no flags
Radar WebKit Bug Importer
Comment 1 2022-04-08 14:37:04 PDT
Kenneth Russell
Comment 2 2022-04-12 12:19:08 PDT
Hopefully these problems have already been resolved in top-of-tree WebKit and given sufficient testing on the EWS won't happen again.
Kenneth Russell
Comment 3 2022-04-12 12:22:25 PDT
Actually I'm not sure they have been - https://docs.zea.live/Safari-Crash-Test/ seems to hang while loading in Safari Technology Preview on an M1 MacBook Pro. The same model loads successfully, albeit very slowly, in Chrome Canary with ANGLE's Metal backend enabled.
Philip Taylor
Comment 4 2022-04-12 17:54:22 PDT
I have been experiencing a mix of extremely 1. poor performance (2fps), 2. crashes, or 3. excellent performance (60fps) on quite similar datasets. The pre-allocating large GPU buffers was a solution that caused some datasets to jump from 2fps up to 60 fps on the same machine. (normally I incrementally grow GPU buffers by powers of 2). I can provide other datasets that provide various behaviors if needed. I am making stabs in the dark, but it felt like my buffer allocations were corrupting something in the back end.
Kenneth Russell
Comment 5 2022-04-18 16:59:16 PDT
If you have an M1 MacBook Pro and can produce a test case which demonstrates more of these problems on Safari Technology Preview, more of us can help investigate. Only Apple engineers can build and test WebKit on iOS.
Philip Taylor
Comment 6 2022-04-20 10:56:43 PDT
Thanks Ken. I'd like to update this issue. The issues is not specific to iOS. What seems to be happening is a severe performance regression, that looks like a crash in Safari as our renderer gets stuck iterating through a lot of draw calls. We store Geometries in one large buffer, and we update this buffer as we stream in geometries. Depending on the order that our workers finish parsing geoms, the _number_ of buffer updates may change. It seems like if we touch the same geom buffer twice, subsequent calls to drawElements are incredibly slow. First we allocate our large buffers. (Resizing each time by powers of 2) Then upload our geometries into these buffers using.. ``` gl.bufferSubData(gl.ARRAY_BUFFER, dstByteOffsetInBytes, attrData.values) ``` ``` gl.bufferSubData(gl.ELEMENT_ARRAY_BUFFER, dstByteOffsetInBytes, offsettedIndices) ``` Then we draw in large batches. ``` if (gl.multiDrawElements) { gl.multiDrawElements(gl.TRIANGLES, counts, 0, gl.UNSIGNED_INT, offsets, 0, drawCount) } else { const { geomItemId } = renderstate.unifs for (let i = 0; i < drawCount; i++) { gl.uniform1i(geomItemId.location, drawIds[i]) gl.drawElements(gl.TRIANGLES, counts[i], gl.UNSIGNED_INT, offsets[i]) } } ``` Something must change after our calls to bufferSubData that then makes drawElements unusably slow. Note: We do have a code path for multi-draw, which is currently disabled on Safari due to another regression. This performance issue may have existed before, but we didn't notice it, because we didn't use this code path, as it was a fallback. We spent today debugging this issue in Safari on an M1 macbook pro. The issue is not only on iOS.
Philip Taylor
Comment 7 2022-04-20 18:41:33 PDT
1. We generate some buffers (larger than we need) using the following code. const attrBuffer = gl.createBuffer() gl.bindBuffer(gl.ARRAY_BUFFER, attrBuffer) const sizeInBytes = numValues * attrSpec.elementSize gl.bufferData(gl.ARRAY_BUFFER, sizeInBytes, gl.STATIC_DRAW) 2. We start loading our geometries, and populating the buffers using the following gl.bindBuffer(gl.ARRAY_BUFFER, glattrbuffer.buffer) gl.bufferSubData(gl.ARRAY_BUFFER, offsetInBytes, attrData.values) 3. We generate Vertex Array Objects to bind our buffers to the GPU. this.vao = gl.createVertexArray() gl.bindVertexArray(this.vao) ... 4. During rendering we bind these buffers and dra. gl.bindVertexArray(this.vao) ... gl.drawElements(gl.TRIANGLES, counts[i], gl.UNSIGNED_INT, offsets[i]) 5. Subsequently more data might be uploaded, which means we might got back to step 2 if the buffers were already big enough, else 1 if we need to allocate more space. - if we need to re-allocate, we copy the data from the previous buffers into the new ones, but I have ensured this does not happen. I used to think this bug was caused by us repeatedly increasing the size of the buffers, but that does not seem to be the case. I now pre-allocate rather big buffers so we don't need to resize them, and can still repro the problem on even small assets. Now... it seems that if we simply generate the buffers, populate them, generate the bindings, and then draw, the performance is abysmal. However, if we upload more data into the buffers, then performance is perfect again.... We don't regenerate our VAOs, as they should be valid until we need to resize. I now have simpler repros than the heavy scene I setup last week.
Philip Taylor
Comment 8 2022-04-21 05:39:48 PDT
Hi, just updating my test scene with a simple repro.(my last scene loaded a rather large asset). This page loads and renders incredibly slowly (~1fps) in Safari. https://docs.zea.live/Safari-Crash-Test/ This page loads and renders incredibly quickly (60fps) in Safari. https://docs.zea.live/Safari-Crash-Test/?grid The difference? I have 2 sets of geometries. The grid and the brake system. Both are used to populate the buffers in 2 different steps.
Philip Taylor
Comment 9 2022-04-21 12:20:48 PDT
I have tried re-enabling the multi-draw code path and allocating extra indices above what is used, and we no longer see the exception that was thrown previously. However, the current performance regression is still there.
Kimmo Kinnunen
Comment 10 2022-04-21 13:11:15 PDT
The slowdown seems to come from the primitive restart index cache update. WebKit's restart cache refactor was copied to ANGLE but not verbatim. The is cache truthiness flag meaning got inverted and the memory usage optimisation in clearConversionBuffers was lost. https://bugs.webkit.org/show_bug.cgi?id=227451 https://chromium-review.googlesource.com/c/angle/angle/+/3331675 It appears that the dummy workaround would be to make a dummy sub buffer update to the element buffer after the first draw. - First draw updates the indexType to be correct (invalid enum -> draw buffer element type) - dummy subbuffer update marks the cache dirty since the "cache is dirty" state currently means "don't update the cache", it will skip the cache update.
Kenneth Russell
Comment 11 2022-04-21 13:37:11 PDT
Kimmo, thank you for tracking that down. Can you file a bug against ANGLE pointing to this one and describing in detail what the bug is and what we need to do to fix it? I'd suggest we fix it upstream in ANGLE and then roll the fix into WebKit. How does that sound?
Kimmo Kinnunen
Comment 12 2022-04-22 01:24:34 PDT
EWS Watchlist
Comment 13 2022-04-22 01:26:14 PDT
Note that there are important steps to take when updating ANGLE. See https://trac.webkit.org/wiki/UpdatingANGLE
Kimmo Kinnunen
Comment 14 2022-04-22 01:34:49 PDT
Philip Taylor
Comment 15 2022-04-22 05:46:39 PDT
Hi Kimmo, I tried applying your suggested workaround, and its not having any effect. I believe it must be close, your workaround aligns with my observations that multiple buffer uploads avoided the regression. Can you take a quick look at this code to see what could be missing? ``` { // Hack to force the primitive restart index cache to be dirty... // https://bugs.webkit.org/show_bug.cgi?id=239015 // - First draw updates the indexType to be correct (invalid enum -> draw buffer element type) const gl = this.__gl gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, this.indexBuffer) gl.drawElements(gl.POINTS, 1, gl.UNSIGNED_INT, 0) // - dummy subbuffer update marks the cache dirty // bufferSubData an array of size 1 at the end of the current allocation. const dummyIndices = new Uint32Array(1) const elementSize = 4 // Uint32Array const offsetInBytes = this.indicesAllocator.allocatedSpace * elementSize gl.bufferSubData(gl.ELEMENT_ARRAY_BUFFER, offsetInBytes, dummyIndices) gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, null) } ```
Philip Taylor
Comment 16 2022-04-22 05:52:30 PDT
For good measure, I tried drawing each of the element types. { // Hack to force the primitive restart index cache to be dirty... // https://bugs.webkit.org/show_bug.cgi?id=239015 // - First draw updates the indexType to be correct (invalid enum -> draw buffer element type) const gl = this.__gl gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, this.indexBuffer) gl.drawElements(gl.POINTS, 1, gl.UNSIGNED_INT, 0) gl.drawElements(gl.LINES, 2, gl.UNSIGNED_INT, 0) gl.drawElements(gl.TRIANGLES, 3, gl.UNSIGNED_INT, 0) // - dummy subbuffer update marks the cache dirty // bufferSubData an array of size 1 at the end of the current allocation. const dummyIndices = new Uint32Array(1) const elementSize = 4 // Uint32Array const offsetInBytes = this.indicesAllocator.allocatedSpace * elementSize gl.bufferSubData(gl.ELEMENT_ARRAY_BUFFER, offsetInBytes, dummyIndices) gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, null) }
Kimmo Kinnunen
Comment 17 2022-04-22 08:29:56 PDT
I think you need to draw with TRIANGLE_STRIP or TRIANGLE_FAN Btw, another option for a workaround may be that if you can select your mesh format, draw just with TRIANGLES. This does not need to compute the primitive restart.
Kimmo Kinnunen
Comment 18 2022-04-22 08:33:07 PDT
(In reply to Kimmo Kinnunen from comment #17) > I think you need to draw with TRIANGLE_STRIP or TRIANGLE_FAN > > Btw, another option for a workaround may be that if you can select your mesh > format, draw just with TRIANGLES. This does not need to compute the > primitive restart. Ooops, disregard this. It's just the inverse: For first workaround, drawing with points, lines, triangles should be the workaround. For the second workaround to try, you can try to use trinagle_strip, triangle_fan if possible.
Kimmo Kinnunen
Comment 19 2022-04-22 08:49:17 PDT
(In reply to Kenneth Russell from comment #11) > Kimmo, thank you for tracking that down. Can you file a bug against ANGLE > pointing to this one and describing in detail what the bug is and what we > need to do to fix it? > > I'd suggest we fix it upstream in ANGLE and then roll the fix into WebKit. > How does that sound? I think for our release cherrypick purposes we need to merge it the other way.
Philip Taylor
Comment 20 2022-04-22 09:16:13 PDT
(In reply to Kimmo Kinnunen from comment #18) > (In reply to Kimmo Kinnunen from comment #17) > > I think you need to draw with TRIANGLE_STRIP or TRIANGLE_FAN > > > > Btw, another option for a workaround may be that if you can select your mesh > > format, draw just with TRIANGLES. This does not need to compute the > > primitive restart. > > > Ooops, disregard this. > > > It's just the inverse: > > For first workaround, drawing with points, lines, triangles should be the > workaround. > > For the second workaround to try, you can try to use trinagle_strip, > triangle_fan if possible. Thanks. but If I understand you correctly, we have applied your workaround.. From one set of GPU buffers we draw TRIANGLES, LINES, and POINTS. (not fans and strips). So I am not sure what next steps need to be taken to improve my workaround. I may just not understand what you are saying.
Kimmo Kinnunen
Comment 21 2022-04-22 13:13:54 PDT
(In reply to Philip Taylor from comment #20) > So I am not sure what next steps need to be taken to improve my workaround. > I may just not understand what you are saying. As far as I see, it would appear that your test case matches my mental model. I think next step I need to build a simple test case for the workaround that exercises the codepath and try to see why it doesn't repro. I get to that only on Monday..
Kenneth Russell
Comment 22 2022-04-22 13:20:34 PDT
Comment on attachment 458121 [details] Patch Thank you so much Kimmo for tracking this down. This was a major performance pitfall for some customers. Sounds good to apply this in WebKit and then upstream it to ANGLE.
EWS
Comment 23 2022-04-25 00:05:45 PDT
Committed r293317 (249940@main): <https://commits.webkit.org/249940@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 458121 [details].
Kimmo Kinnunen
Comment 24 2022-04-25 02:26:21 PDT
Created attachment 458254 [details] Example workaround
Philip Taylor
Comment 25 2022-04-26 07:21:36 PDT
I can confirm that the workaround does address the performance issue on our side. thanks Kimmo
Note You need to log in before you can comment on or make changes to this bug.