GPUBuffers created with GPUDevice.createBufferMapped are not CPU-accessible unless they are created with MAP_WRITE or MAP_READ usage. For such buffers, the call to GPUBuffer.unmap must first copy the written data into a CPU/GPU shared MTLBuffer, then copy that staging buffer to the intended destination on the GPU. Since MTLBuffer creation is expensive, we can cache reasonably-sized (TBD size) staging buffers on the GPUDevice for re-use. A GPUBuffer can re-use the device's staging buffer for its unmap call, or create a new one if none exists or the cached one is too small. If the staging buffer is not used within a certain time (TBD, e.g. ~10 frames), it is released. Bonus points: implement a (sorted?) LRU-cache of different-sized staging buffers and use the smallest one that will suffice. This would release unusually large staging buffers earlier, but require more storage overall.
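A rough sketch of the "bonus points" variant, to make the idea concrete. This is not existing WebKit code: StagingBuffer is a hypothetical stand-in for a CPU/GPU shared MTLBuffer, and StagingBufferCache, acquire, release, advanceFrame and idleCount are made-up names. The 10-frame default is the placeholder from the description above, not a measured value, and the "reasonably-sized" upper bound on what gets cached is left out since that size is still TBD. Idle buffers are kept sorted by size so the smallest sufficient one is picked, and anything idle for more than the frame limit is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for a CPU/GPU shared MTLBuffer.
struct StagingBuffer {
    explicit StagingBuffer(std::size_t size) : bytes(size) { }
    std::size_t size() const { return bytes.size(); }
    std::vector<std::uint8_t> bytes;
};

// Hypothetical per-device cache of idle staging buffers, sorted by size.
class StagingBufferCache {
public:
    // maxIdleFrames is an assumed tuning knob (the "~10 frames" above).
    explicit StagingBufferCache(std::uint64_t maxIdleFrames = 10)
        : m_maxIdleFrames(maxIdleFrames) { }

    // Returns the smallest idle buffer that holds `size` bytes,
    // or creates a new one if no idle buffer is large enough.
    std::unique_ptr<StagingBuffer> acquire(std::size_t size)
    {
        assert(size > 0);
        auto it = m_idle.lower_bound(size);
        if (it == m_idle.end())
            return std::make_unique<StagingBuffer>(size);
        auto buffer = std::move(it->second.buffer);
        m_idle.erase(it);
        return buffer;
    }

    // Hands a buffer back once the unmap copy is done, stamping it
    // with the current frame so idle time can be measured.
    void release(std::unique_ptr<StagingBuffer> buffer)
    {
        assert(buffer);
        std::size_t size = buffer->size();
        m_idle.emplace(size, Entry { std::move(buffer), m_frame });
    }

    // Call once per frame: drops buffers idle for more than maxIdleFrames.
    void advanceFrame()
    {
        ++m_frame;
        for (auto it = m_idle.begin(); it != m_idle.end();) {
            if (m_frame - it->second.lastUsedFrame > m_maxIdleFrames)
                it = m_idle.erase(it);
            else
                ++it;
        }
    }

    std::size_t idleCount() const { return m_idle.size(); }

private:
    struct Entry {
        std::unique_ptr<StagingBuffer> buffer;
        std::uint64_t lastUsedFrame;
    };

    std::multimap<std::size_t, Entry> m_idle; // keyed by buffer size
    std::uint64_t m_maxIdleFrames;
    std::uint64_t m_frame { 0 };
};
```

In this sketch GPUBuffer.unmap would call acquire with the byte length to upload, copy into the returned buffer, schedule the GPU-side copy, and call release once the copy has completed. Keying on size means an oversized buffer is only picked when nothing smaller fits, so it tends to sit idle and get released by advanceFrame sooner, at the cost of holding several buffers at once.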
<rdar://problem/51539224>
<rdar://problem/51539225>
Small addition: Specify MTLResourceCPUCacheModeWriteCombined in the creation options for these staging buffers, as the CPU only writes to them and never reads from them.