Bug 231607 - REGRESSION (iOS 15): Tab crashes when trying to render Projector stories
Summary: REGRESSION (iOS 15): Tab crashes when trying to render Projector stories
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebGL (show other bugs)
Version: Safari 15
Hardware: iPhone / iPad Other
: P2 Normal
Assignee: Kimmo Kinnunen
URL: https://staging.projector.com/story/b...
Keywords: InRadar
Depends on:
Blocks: anglemetalregr
  Show dependency treegraph
 
Reported: 2021-10-12 12:37 PDT by Sidhu Alluri
Modified: 2021-11-03 13:10 PDT (History)
9 users (show)

See Also:


Attachments
Patch (4.59 KB, patch)
2021-10-28 17:17 PDT, Kyle Piddington
no flags Details | Formatted Diff | Diff
Patch (8.39 KB, patch)
2021-10-29 20:03 PDT, Kyle Piddington
no flags Details | Formatted Diff | Diff
Patch for landing (8.33 KB, patch)
2021-11-03 12:31 PDT, Kyle Piddington
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sidhu Alluri 2021-10-12 12:37:08 PDT
The following link crashes the iOS 15 Safari tab on iPhone X, 12 Mini, and 13. Works fine in all devices with iOS 14 or earlier.
https://staging.projector.com/story/be1ca8c7-a46a-41fc-9ad6-de8b8f178a8b?scene=6f3e4207

After reloading once, the tab displays "A problem repeatedly occurred on https://staging.projector.com/story/be1ca8c7-a46a-41fc-9ad6-de8b8f178a8b?scene=6f3e4207"

No console errors when trying to debug.
Comment 1 Kenneth Russell 2021-10-12 14:16:43 PDT
I only have the hardware to test this on macOS but the context creation attributes appear to be:

alpha: false
antialias: false
powerPreference: "high-performance"
premultipliedAlpha: true
preserveDrawingBuffer: false
stencil: false

Not sure why this combination might cause the tab to crash. Kimmo, would it be possible for you to confirm this? It doesn't look like the xrCompatible context creation attribute is the issue here.
Comment 2 Sidhu Alluri 2021-10-12 18:41:41 PDT
Managed to get it to stop crashing after reducing the gl buffer size to 15MB (down from 96MB, we use a single large buffer). `createBuffer` call succeeds but leads to eventual tab crash after enough draw calls. I'll try to narrow it further, but wanted to give an update.
Comment 3 Sidhu Alluri 2021-10-13 08:21:51 PDT
(In reply to Sidhu Alluri from comment #0)
> The following link crashes the iOS 15 Safari tab on iPhone X, 12 Mini, and
> 13. Works fine in all devices with iOS 14 or earlier.
> https://staging.projector.com/story/be1ca8c7-a46a-41fc-9ad6-
> de8b8f178a8b?scene=6f3e4207
> 
> After reloading once, the tab displays "A problem repeatedly occurred on
> https://staging.projector.com/story/be1ca8c7-a46a-41fc-9ad6-
> de8b8f178a8b?scene=6f3e4207"
> 
> No console errors when trying to debug.

I'm deploying a lower sized gl buffer to work around the issue but I added a query param to manually set the buffer size. The issue can still be repro'd via this link https://staging.projector.com/story/be1ca8c7-a46a-41fc-9ad6-de8b8f178a8b?scene=6f3e4207&glBufferSize=100663296
Comment 4 Radar WebKit Bug Importer 2021-10-13 17:18:52 PDT
<rdar://problem/84224687>
Comment 5 Kimmo Kinnunen 2021-10-15 07:26:33 PDT
This appears to be due to WebContent process running over memory limit. It's unclear if it's Metal or GPU process difference to iOS 14.
Comment 6 Kimmo Kinnunen 2021-10-22 07:05:59 PDT
This is mtl::Buffer instances being leaked due to buffer pool creating them unbounded
Comment 7 Kyle Piddington 2021-10-22 12:47:09 PDT
Looking into this bug: It seems there’s some trouble at work with the per-frame data. I see the initial 96 MB allocation, but not any repetition of that initial allocation.  Each frame, however, I'm seeing about 38 more MB of allocated data. 

We seem to hold on to the first few frames worth of data, and don't get around to clearing them, however, which I think is leading to this bug.
Comment 8 Kyle Piddington 2021-10-22 12:54:15 PDT
From analysis: We're creating a lot more buffer pools than we are buffers. It seems the issue isn't one buffer pool allocating too much, but too many buffer pools being created in the first place.
Comment 9 Le Hoang Quyen 2021-10-22 18:12:54 PDT
Sorry, the vertex conversion's buffer pool mechanism in the Metal back-end was originally written by me. So I did attempt to trace the Metal commands while running this website in order to know what's going on.

Here what I observed:
- The website created a single large buffer (roughly 100MB) then seemingly used many small portions of it for different draw calls:
  - portion from 0-100 for draw call 1 (only needs 10 vertices, this is an approximate number).
  - portion from 100-200 for draw call 2 (only needs 10 vertices).
  - ...
  - portion from 90000000-100000000 for draw calls n (only needs 10 vertices).

- Due to irregular offsets, many of the portions above needed vertex attribute conversions. However the conversions were inefficient because they converted the whole buffer's worth of vertices starting from given offset to the end.
  - draw call 1 allocated a conversion buffer for 90000000 vertices' attributes even though it needs only 10 vertices.
  - draw call 2 allocated a conversion buffer for 80000000 vertices' attributes even though it needs only 10 vertices.
  - ...

The problem is that the conversion buffer pool will always allocate a new sub-buffer to store the converted attribute values for a number of vertices starting from given offset to the end regardless of how many vertices the actually draw call needs. Leading to a lot of converted portions unused by any draw calls.

Currently the conversion buffers are keyed by triplet {format, offset, stride}. It doesn't take into account how many vertices needed hence the issue above happened.
The number of vertices would be included in the key i.e. {format, offset, stride, numVertices}. However, it would still lead to another problem in another use case:
- users draw using a same offset but with different number of vertices many times => many conversion buffers would be created.
Comment 10 Kimmo Kinnunen 2021-10-25 06:34:11 PDT
(In reply to Le Hoang Quyen from comment #9)
> Currently the conversion buffers are keyed by triplet {format, offset,
> stride}. It doesn't take into account how many vertices needed hence the
> issue above happened.
> The number of vertices would be included in the key i.e. {format, offset,
> stride, numVertices}. However, it would still lead to another problem in
> another use case:
> - users draw using a same offset but with different number of vertices many
> times => many conversion buffers would be created.

Thanks for the explanation! I think the general logic should be better.

However, for this particular bug, it sounds like we need to just change the look up function from `offset == wantOffset` to `offset <= wantOffset` ?
Comment 11 Le Hoang Quyen 2021-10-25 07:30:32 PDT
(In reply to Kimmo Kinnunen from comment #10)
> (In reply to Le Hoang Quyen from comment #9)
> > Currently the conversion buffers are keyed by triplet {format, offset,
> > stride}. It doesn't take into account how many vertices needed hence the
> > issue above happened.
> > The number of vertices would be included in the key i.e. {format, offset,
> > stride, numVertices}. However, it would still lead to another problem in
> > another use case:
> > - users draw using a same offset but with different number of vertices many
> > times => many conversion buffers would be created.
> 
> Thanks for the explanation! I think the general logic should be better.
> 
> However, for this particular bug, it sounds like we need to just change the
> look up function from `offset == wantOffset` to `offset <= wantOffset` ?
The above offsets are just approximate numbers. The actual numbers might not be multiples of 100. And each drawing portions might use different alignments from each other.

Yes, the lookup could be changed to something better (perhaps using `offset mod alignment` as key instead of offset). However, `offset <= wantOffset` alone doesn't work because there are several limits of Metal I noticed:
- The final offset of the vertex attribute is calculated by:
  final offset = bufferOffset + attribOffset.
  - `bufferOffset` = offset set via MTLRenderCommandEncoder.setVertexBuffer
  - `attribOffset` = MTLVertexAttributeDescriptor's offset field.

- Metal requires:
  - `bufferOffset` to be multiples of the attribute's data type's alignment.
  - `attribOffset` must be multiples of 4.
  - Furthermore, `attribOffset` + attribute's size must be less than or equal to the fetching stride.
Comment 12 Kyle Piddington 2021-10-28 17:17:28 PDT
Created attachment 442768 [details]
Patch
Comment 13 EWS Watchlist 2021-10-28 17:18:32 PDT
Note that there are important steps to take when updating ANGLE. See https://trac.webkit.org/wiki/UpdatingANGLE
Comment 14 Darin Adler 2021-10-28 17:31:22 PDT
Comment on attachment 442768 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=442768&action=review

> Source/ThirdParty/ANGLE/src/libANGLE/renderer/metal/VertexArrayMtl.mm:1079
> +    VertexConversionBufferMtl * vertexConverison = (VertexConversionBufferMtl *)conversion;

Spelling error "conversion".
Comment 15 Kyle Piddington 2021-10-29 20:03:06 PDT
Created attachment 442895 [details]
Patch
Comment 16 Dean Jackson 2021-11-01 15:37:21 PDT
Comment on attachment 442895 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=442895&action=review

> Source/ThirdParty/ANGLE/src/libANGLE/renderer/metal/VertexArrayMtl.mm:1074
> +    VertexConversionBufferMtl * vertexConverison = (VertexConversionBufferMtl *)conversion;

Typo here: vertexConverison
Comment 17 Kyle Piddington 2021-11-03 12:31:09 PDT
Created attachment 443223 [details]
Patch for landing
Comment 18 EWS 2021-11-03 13:10:28 PDT
Committed r285220 (243845@main): <https://commits.webkit.org/243845@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 443223 [details].