Bug 223732 - [GPU Process]: Improve getImageData() perf part 2: Use shared memory and a semaphore
Summary: [GPU Process]: Improve getImageData() perf part 2: Use shared memory and a se...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Canvas (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Myles C. Maxfield
URL:
Keywords: InRadar
Depends on:
Blocks: 220649
  Show dependency treegraph
 
Reported: 2021-03-25 00:27 PDT by Myles C. Maxfield
Modified: 2021-03-30 00:54 PDT (History)
9 users (show)

See Also:


Attachments
Patch (17.76 KB, patch)
2021-03-25 00:30 PDT, Myles C. Maxfield
no flags Details | Formatted Diff | Diff
Patch (17.84 KB, patch)
2021-03-25 01:57 PDT, Myles C. Maxfield
no flags Details | Formatted Diff | Diff
Patch (17.89 KB, patch)
2021-03-25 14:00 PDT, Myles C. Maxfield
no flags Details | Formatted Diff | Diff
Patch (18.16 KB, patch)
2021-03-25 17:18 PDT, Myles C. Maxfield
no flags Details | Formatted Diff | Diff
Patch (18.16 KB, patch)
2021-03-25 17:19 PDT, Myles C. Maxfield
rniwa: review+
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Updated (23.68 KB, patch)
2021-03-26 17:50 PDT, Myles C. Maxfield
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Updated (24.26 KB, patch)
2021-03-26 21:00 PDT, Myles C. Maxfield
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Myles C. Maxfield 2021-03-25 00:27:32 PDT
[GPU Process]: Improve getImageData() perf part 2: Use shared memory and a semaphore
Comment 1 Myles C. Maxfield 2021-03-25 00:30:09 PDT
Created attachment 424220 [details]
Patch
Comment 2 Myles C. Maxfield 2021-03-25 01:57:16 PDT
Created attachment 424222 [details]
Patch
Comment 3 Myles C. Maxfield 2021-03-25 14:00:10 PDT
Created attachment 424277 [details]
Patch
Comment 4 Myles C. Maxfield 2021-03-25 16:59:13 PDT
42% progression on MotionMark Images on Mac
Comment 5 Myles C. Maxfield 2021-03-25 17:18:58 PDT
Created attachment 424304 [details]
Patch
Comment 6 Myles C. Maxfield 2021-03-25 17:19:59 PDT
Created attachment 424305 [details]
Patch
Comment 7 Simon Fraser (smfr) 2021-03-25 19:26:53 PDT
Comment on attachment 424305 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424305&action=review

> Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:84
> +                memcpy(getImageDataSharedMemory->data(), imageData->data()->data(), imageData->data()->byteLength());

I think this needs to RELEASE_ASSERT() on the size of getImageDataSharedMemory.

> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:162
> +    }

Does m_getImageDataSharedMemory ever go away? It's going to just get larger and larger.
Comment 8 Ryosuke Niwa 2021-03-25 19:36:05 PDT
Comment on attachment 424305 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424305&action=review

> Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:82
> +            auto getImageDataSharedMemory = m_remoteRenderingBackend.getImageDataSharedMemory();

Maybe just sharedMemory is sufficient since it's pretty obvious we're dealing with GetImageData here.

> Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:86
> +            else {
> +                if (getImageDataSharedMemory)

Why not else if?

> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:328
> +    ASSERT(!m_getImageDataSharedMemory || byteCount > m_getImageDataSharedMemory->size());
> +    m_getImageDataSharedMemory = SharedMemory::allocate(byteCount);

We should probably limit byteCount at some size, say, 32MB.

> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.h:82
> +    RefPtr<SharedMemory> getImageDataSharedMemory() { return m_getImageDataSharedMemory; }

Maybe sharedMemoryForGetImageData?

> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.h:121
> +    void sharedMemoryForGetImageData(size_t byteCount, CompletionHandler<void(const SharedMemory::IPCHandle&)>&&);

Maybe updateSharedMemoryForGetImageData?
The crucial thing here is that we want the new size, right?

> Source/WebKit/WebProcess/GPU/graphics/RemoteImageBufferProxy.h:201
> +        SharedMemory* sharedMemory = m_remoteRenderingBackendProxy->maybePrepareSharedMemoryForGetImageData(dataSize);

Why not just sharedMemoryForGetImageData?
Comment 9 Kimmo Kinnunen 2021-03-26 00:52:37 PDT
Comment on attachment 424305 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424305&action=review

It probably works very well, so just some free to ignore bikeshedding related to beautyness:

Your dialogue of operation is:

Client: -  Create me a semaphore.
Server: - Here is a semaphore 7.
Client: - Got it. Create me a buffer size of 8877.
Server: - Here is a buffer size of 8877.
Client: - Got it.


I find it a bit error prone and confusing, as it always prompts the question: if the caller wants a buffer, why doesn't it just.. alloc it? And there's some inefficiency like creating excess temp semaphores but that's even more minor.

The option would be to implement the dialogue like this:
Client: - Here is semaphore 77 and buffer the size of 8877.  Report to me when you got those.
Server: - Done.
Client: - Got it.

This also, in theory can improve the memory attribution, if we consider the allocator typically the entity wanting something to happen and as such the entity who is responsible for the allocation.

It's slightly unfortunate that the "display list" is disconnected from the normal messaging so you need that manual synchronousness. If the drawing commands were part of the pic, there'd be no need for that. Of course, there's probably zero parallelism lost in this case.

Other very small thing is that the mode is like this:
1) allocate buffer
2) replace old buffer

I don't think in this scenario you can do anything with the old buffer if your step 1) fails.

So on buffer resize you have two buffers in flight. The alternative could be
1) free old buffer
2) allocate new buffer

>> Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:84
>> +                memcpy(getImageDataSharedMemory->data(), imageData->data()->data(), imageData->data()->byteLength());
> 
> I think this needs to RELEASE_ASSERT() on the size of getImageDataSharedMemory.

Yeah, it's a buffer overflow if the caller lies about the size. I wouldn't call it RELEASE_ASSERT as it's normal that the caller lies, so it should just act like so. E.g. close the connection or ignore the message.

> Source/WebKit/WebProcess/GPU/graphics/RemoteImageBufferProxy.h:210
> +        memcpy(imageData->data()->data(), sharedMemory->data(), dataSize);

I guess you considered the failure case memset here, but then that was be blocked on lack of the API in Semaphore.. 
It's a bit unfortunate. (The IPC Stream implementation detects the timeout by other means by semi-accident)

> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:154
> +        sendSync(Messages::RemoteRenderingBackend::SemaphoreForGetImageData(), Messages::RemoteRenderingBackend::SemaphoreForGetImageData::Reply(semaphore), m_renderingBackendIdentifier, 1_s);

this sync message can fail, though the below assignment is correct in this case.

> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:160
> +        sendSync(Messages::RemoteRenderingBackend::SharedMemoryForGetImageData(dataSize), Messages::RemoteRenderingBackend::SharedMemoryForGetImageData::Reply(handle), m_renderingBackendIdentifier, 1_s);

This sync message can fail, below assignment probably is not correct in this case.
Comment 10 Kimmo Kinnunen 2021-03-26 00:59:23 PDT
Comment on attachment 424305 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424305&action=review

> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:171
> +    m_getImageDataSemaphore->waitFor(5_s);

Another bike shedding very small source of confusion is:
Some of the messages timeout in 1_s, some in 5_s prompting some questions.
There's the IPC::Timeout which could be part of all your blocking function signatures. This way you'd be able to establish a top-level timeout and then it'd be easy to understand that your top-level function completes roughly in this time, regardless of if error handling has been forgotten in some sync sends and stuff.
Comment 11 Ryosuke Niwa 2021-03-26 01:07:35 PDT
Comment on attachment 424305 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424305&action=review

>> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:162
>> +    }
> 
> Does m_getImageDataSharedMemory ever go away? It's going to just get larger and larger.

Pretty much. We need to clear with some timer or at least when we receive a memory pressure warning.
Comment 12 Ryosuke Niwa 2021-03-26 01:25:22 PDT
(In reply to Kimmo Kinnunen from comment #9)
> Comment on attachment 424305 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=424305&action=review
> 
> It probably works very well, so just some free to ignore bikeshedding
> related to beautyness:
> 
> Your dialogue of operation is:
> 
> Client: -  Create me a semaphore.
> Server: - Here is a semaphore 7.
> Client: - Got it. Create me a buffer size of 8877.
> Server: - Here is a buffer size of 8877.
> Client: - Got it.
> 
> I find it a bit error prone and confusing, as it always prompts the
> question: if the caller wants a buffer, why doesn't it just.. alloc it? And
> there's some inefficiency like creating excess temp semaphores but that's
> even more minor.
> 
> The option would be to implement the dialogue like this:
> Client: - Here is semaphore 77 and buffer the size of 8877.  Report to me
> when you got those.
> Server: - Done.
> Client: - Got it.

We don't want to allocate this buffer in WebContent process because then WebContent can dictate whether it's writable by WebContent or not. It's also important GPU Process can guarantee itself of a necessary buffer size. We don't want to be relying on WebContent to create & report the buffer of a right size.

> Other very small thing is that the mode is like this:
> 1) allocate buffer
> 2) replace old buffer
> 
> I don't think in this scenario you can do anything with the old buffer if
> your step 1) fails.
> 
> So on buffer resize you have two buffers in flight. The alternative could be
> 1) free old buffer
> 2) allocate new buffer

This is a good point. We should probably just clear the reference to the shared memory region before asking for the data.

> >> Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:84
> >> +                memcpy(getImageDataSharedMemory->data(), imageData->data()->data(), imageData->data()->byteLength());
> > 
> > I think this needs to RELEASE_ASSERT() on the size of getImageDataSharedMemory.
> 
> Yeah, it's a buffer overflow if the caller lies about the size. I wouldn't
> call it RELEASE_ASSERT as it's normal that the caller lies, so it should
> just act like so. E.g. close the connection or ignore the message.

Yeah, sorry, I missed this. This needs be a message check indeed.
Comment 13 Kimmo Kinnunen 2021-03-26 07:49:03 PDT
(In reply to Ryosuke Niwa from comment #12)
> We don't want to allocate this buffer in WebContent process because then
> WebContent can dictate whether it's writable by WebContent or not.

The GPU process should not care if the WebProcess overwrites the pixel data area that the web process asks GPU Process to use. If WebProcess wants to trash the pixel data, it should be free to do so.

>  It's also
> important GPU Process can guarantee itself of a necessary buffer size. We
> don't want to be relying on WebContent to create & report the buffer of a
> right size.

This does not make sense to me. IIRC plenty of already used cases of the pattern where the WebContent process allocates the memory. As far as I understand there is additional security gained by the GPU Process allocating the memory what so ever.

The OS will tell the GPU Process how big the SHM area *really* is. Again: WebKit already relies upon this -- in fact, the last time I looked the very implementation of RRB display lists does this, e.g. the same class we are now talking about adding stuff to.
Comment 14 Kimmo Kinnunen 2021-03-26 07:50:25 PDT
(In reply to Kimmo Kinnunen from comment #13)
> As far as I understand there is additional security gained by the GPU Process allocating
> the memory what so ever.

This should read "As far as I understand there is *no* additional  ..."
Comment 15 Kimmo Kinnunen 2021-03-26 08:12:18 PDT
.. where as when the web content process allocates the SHM area, there's a tiny security benefit that you don't need to have arbitrary sanity limit on the buffer size. If the process is able to allocate 1TB bitmap, then it's probably good to go. If that's a no-go, then the memory limits have expressed this by denying the allocation from the real culprit. (see the comment about limiting the allocation to 32mb)
Comment 16 Myles C. Maxfield 2021-03-26 13:36:45 PDT
> Your dialogue of operation is:
> 
> Client: -  Create me a semaphore.
> Server: - Here is a semaphore 7.
> Client: - Got it. Create me a buffer size of 8877.
> Server: - Here is a buffer size of 8877.
> Client: - Got it.
> 
> 
> I find it a bit error prone and confusing, as it always prompts the
> question: if the caller wants a buffer, why doesn't it just.. alloc it? And
> there's some inefficiency like creating excess temp semaphores but that's
> even more minor.
> 
> The option would be to implement the dialogue like this:
> Client: - Here is semaphore 77 and buffer the size of 8877.  Report to me
> when you got those.
> Server: - Done.
> Client: - Got it.

and
> We don't want to allocate this buffer in WebContent process because then
> WebContent can dictate whether it's writable by WebContent or not.

and
> If WebProcess wants to
> trash the pixel data, it should be free to do so.

The web content will be reading this shmem and the GPU Process will be writing it. I do understand the principle of "least access" where access is locked down to the minimum necessary to use the resource. On the other hand, I do agree with Kimmo's simplicity argument, and I think Kimmo's rebuttal ("trash the pixel data") is convincing. I think the benefits of Kimmo's design outweigh the potential security risk of having the web process able to write into the shmem.

> It's also
> important GPU Process can guarantee itself of a necessary buffer size. We
> don't want to be relying on WebContent to create & report the buffer of a
> right size.

and
> The OS will tell the GPU Process how big the SHM area *really* is.

In this patch, there is no "size" argument passed separately from the IPCHandle, so the web process can't lie about how bit the shmem is. Both processes are going to have to do OOB checks, so I don't think these kinds of OOB security problems exist with Kimmo's alternate design.
Comment 17 Myles C. Maxfield 2021-03-26 13:39:59 PDT
> Client: - Here is semaphore 77 and buffer the size of 8877.

There will have to be separate messages for these, because the buffer size is sensitive to image size, but the semaphore isn't sensitive to anything. So there will only be one time when the semaphore travels across the IPC boundary, but there might be many times that a shmem travels across the IPC boundary - if, for example, the GPUP calls getImageData() on progressively larger and larger images.

However, we might want a _third_ message which transfers both at the same time, just so we don't need two sync IPC calls in a row if both objects are necessary.
Comment 18 Myles C. Maxfield 2021-03-26 13:48:18 PDT
(In reply to Myles C. Maxfield from comment #16)
> In this patch, there is no "size" argument passed separately from the
> IPCHandle, so the web process can't lie about how bit the shmem is.

Wait a minute, I think this is false.

void SharedMemory::IPCHandle::encode(IPC::Encoder& encoder) const
{
    encoder << static_cast<uint64_t>(handle.m_size);
    encoder << dataSize;
    encoder << IPC::MachPort(handle.m_port, MACH_MSG_TYPE_MOVE_SEND);
    handle.m_port = MACH_PORT_NULL;
}

bool SharedMemory::IPCHandle::decode(IPC::Decoder& decoder, IPCHandle& ipcHandle)
{
    ASSERT(!ipcHandle.handle.m_port);
    ASSERT(!ipcHandle.handle.m_size);

    SharedMemory::Handle handle;

    uint64_t bufferSize;
    if (!decoder.decode(bufferSize))
        return false;

    uint64_t dataLength;
    if (!decoder.decode(dataLength))
        return false;

    // SharedMemory::Handle::size() is rounded up to the nearest page.
    if (dataLength > bufferSize)
        return false;

    IPC::MachPort machPort;
    if (!decoder.decode(machPort))
        return false;
    
    handle.m_size = bufferSize;
    handle.m_port = machPort.port();
    ipcHandle.handle = WTFMove(handle);
    ipcHandle.dataSize = dataLength;
    return true;
}

and

RefPtr<SharedMemory> SharedMemory::map(const Handle& handle, Protection protection)
{
    ...
    kern_return_t kr = mach_vm_map(mach_task_self(), &mappedAddress, handle.m_size, 0, VM_FLAGS_ANYWHERE, handle.m_port, 0, false, vmProtection, vmProtection, VM_INHERIT_NONE);
    ...
}

It looks like the size data *is* transferred out-of-band. So a compromised web process could tell the GPU Process that a shmem area is huge when it actually isn't, and then run a getImageData which causes the GPU Process to write into a "huge" shmem, thereby causing it to write off the end.

Because of this, I don't think the web process can send the shmem to the GPU Process.
Comment 19 Myles C. Maxfield 2021-03-26 15:22:28 PDT
Comment on attachment 424305 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424305&action=review

>> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:171
>> +    m_getImageDataSemaphore->waitFor(5_s);
> 
> Another bike shedding very small source of confusion is:
> Some of the messages timeout in 1_s, some in 5_s prompting some questions.
> There's the IPC::Timeout which could be part of all your blocking function signatures. This way you'd be able to establish a top-level timeout and then it'd be easy to understand that your top-level function completes roughly in this time, regardless of if error handling has been forgotten in some sync sends and stuff.

I'm having trouble understanding what you mean here. Do you think you could write what you're describing in code?
Comment 20 Myles C. Maxfield 2021-03-26 17:50:58 PDT
Created attachment 424427 [details]
Updated
Comment 21 Myles C. Maxfield 2021-03-26 21:00:07 PDT
Created attachment 424438 [details]
Updated
Comment 22 Ryosuke Niwa 2021-03-27 02:59:55 PDT
(In reply to Myles C. Maxfield from comment #18)
> (In reply to Myles C. Maxfield from comment #16)
> > In this patch, there is no "size" argument passed separately from the
> > IPCHandle, so the web process can't lie about how bit the shmem is.
> 
> Wait a minute, I think this is false.
...
> and
...
> It looks like the size data *is* transferred out-of-band. So a compromised
> web process could tell the GPU Process that a shmem area is huge when it
> actually isn't, and then run a getImageData which causes the GPU Process to
> write into a "huge" shmem, thereby causing it to write off the end.

Luckily, mach_vm_map will check the size for you and will fail if we give it a larger size than the actual mapping region :) However, we still don't want to let WebContent give us some memory region we write into in GetImageData.

(In reply to Kimmo Kinnunen from comment #13)
> (In reply to Ryosuke Niwa from comment #12)
> > We don't want to allocate this buffer in WebContent process because then
> > WebContent can dictate whether it's writable by WebContent or not.
> 
> The GPU process should not care if the WebProcess overwrites the pixel data
> area that the web process asks GPU Process to use. If WebProcess wants to
> trash the pixel data, it should be free to do so.

In this patch, yes because we're using memcpy. However, a logical next step is to avoid creating a temporary ImageData object by directly copying from IOSurface into this shared memory. When we do that, we'd be handing this piece of memory to vImage and that's gonna tell GPU to DMA into this memory region in some cases. We really don't want to risk any other process trying to write into this memory region meanwhile. It's possible we have some safe guard somewhere in kernel, GPU driver, etc... but it's simply not something we should be risking for a very marginal benefit of being able to allocate this memory region in WebContent process.

> The OS will tell the GPU Process how big the SHM area *really* is. Again:
> WebKit already relies upon this -- in fact, the last time I looked the very
> implementation of RRB display lists does this, e.g. the same class we are
> now talking about adding stuff to.

We should probably reexamine that. Most notably, it's dangerous to let WebContent allocate a shared memory of arbitrary size & map it in GPU process because that would make the address of such a memory location in GPU process more predictable, potentially bypassing ASLR. In fact, we probably want to limit both the size of each region as well as the number of shared memory each WebContent process is allowed to share with GPU Process.
Comment 23 Myles C. Maxfield 2021-03-28 22:33:51 PDT
I'll refrain from committing this until we resolve the discussion above.
Comment 24 Kimmo Kinnunen 2021-03-29 00:56:16 PDT
(In reply to Myles C. Maxfield from comment #19)
> Comment on attachment 424305 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=424305&action=review
> 
> >> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:171
> >> +    m_getImageDataSemaphore->waitFor(5_s);
> > 
> > Another bike shedding very small source of confusion is:
> > Some of the messages timeout in 1_s, some in 5_s prompting some questions.
> > There's the IPC::Timeout which could be part of all your blocking function signatures. This way you'd be able to establish a top-level timeout and then it'd be easy to understand that your top-level function completes roughly in this time, regardless of if error handling has been forgotten in some sync sends and stuff.
> 
> I'm having trouble understanding what you mean here. Do you think you could
> write what you're describing in code?

BufferProxy::getImageData() {
  IPC::Timeout timeout = 5_s;
  backendProxy->sharedMemoryForGetImageData(..., timeout);

  ... // Hopefully someday you'd pass the same timeout here, but the code is already there to do the other thingy.

  backendProxy->waitForGetImageDataToComplete(..., timeout);
}
  

BackendProxy::sharedMemoryForGetImageData(size_t dataSize, IPC::Timeout timeout) {
  if  (...)
     sendSync(..., timeout);
}

BackendProxy::waitForGetImageDataToComplete(IPC::Timeout timeout) {
  semaphore->wait(timeout);
}

IPC::Timeout is intended to support consistent timeout handling in logic that does compound IPC.


> It looks like the size data *is* transferred out-of-band. So a compromised web process could tell the GPU Process that a shmem area is huge when it actually isn't, and then run a getImageData which causes the GPU Process to write into a "huge" shmem, thereby causing it to write off the end.
> Because of this, I don't think the web process can send the shmem to the GPU Process.

It would appear with cursory reading, but this is not the correct reading.

SharedMemory::IPCHandle {
  Handle handle;  // "This is what I, the untrusted bandit, tell you to use as the shared memory contents"
  uint64_t dataSize { 0 }; // "This is the size of the data that I, the untrusted bandit, stored in that memory area."
};

So notably: WebContent process says: 
"This is the size of the data I put to the shared memory. But remember: I could be lying.". 
By extension, WebContent process says:
"I did not say anything about the size of the shared memory area. You should figure that out yourself."

And as you can read the code, this is _expected_ to be handled, by anybody who handles the IPC::SharedMemory::IPCHandle.

I do think talking about MachPorts and SharedMemory::IPCHandles is wrong in the IPC layer. I fixed similar problem already for IPC::Semaphore, but did not have time to fix it for the IPCHandle nor IOSurface. It is another discussion which this shouldn't divert to.
Comment 25 Ryosuke Niwa 2021-03-29 01:13:15 PDT
(In reply to Kimmo Kinnunen from comment #24)
>
> > It looks like the size data *is* transferred out-of-band. So a compromised web process could tell the GPU Process that a shmem area is huge when it actually isn't, and then run a getImageData which causes the GPU Process to write into a "huge" shmem, thereby causing it to write off the end.
> > Because of this, I don't think the web process can send the shmem to the GPU Process.
> 
> It would appear with cursory reading, but this is not the correct reading.
...
> And as you can read the code, this is _expected_ to be handled, by anybody
> who handles the IPC::SharedMemory::IPCHandle.
> 
> I do think talking about MachPorts and SharedMemory::IPCHandles is wrong in
> the IPC layer. I fixed similar problem already for IPC::Semaphore, but did
> not have time to fix it for the IPCHandle nor IOSurface. It is another
> discussion which this shouldn't divert to.

As I noted above, the kernel will verify that the size to be mapped is smaller than the size of the actual memory region being shared with a given process. Since Handle::m_size is what we call mach_vm_map, and SharedMemory::IPCHandle::decode will fail if Handle::m_size < IPCHandle::dataSize, IPCHandle::dataSize can never be bigger than the actual memory region being shared.
Comment 26 Kimmo Kinnunen 2021-03-29 01:19:43 PDT
(In reply to Ryosuke Niwa from comment #22)
> In this patch, yes because we're using memcpy. However, a logical next step
> is to avoid creating a temporary ImageData object by directly copying from
> IOSurface into this shared memory. When we do that, we'd be handing this
> piece of memory to vImage and that's gonna tell GPU to DMA into this memory
> region in some cases. We really don't want to risk any other process trying
> to write into this memory region meanwhile. It's possible we have some safe
> guard somewhere in kernel, GPU driver, etc... but it's simply not something
> we should be risking for a very marginal benefit of being able to allocate
> this memory region in WebContent process.

Sure.. It is a bit hard to give reviews though if the design must be based on:
- Something hypothetical threat which isn't really defined
- Something that might happen in the future
- Something that might or might not be a problem in any case (If there's a security bug in the kernel, does it make a difference the memory is mapped RO or RW in the web process?)

OTOH, nobody did ask me for a review, so in that sense it might be a non-issue that the review would be hard to give..

> > The OS will tell the GPU Process how big the SHM area *really* is. Again:
> > WebKit already relies upon this -- in fact, the last time I looked the very
> 
> We should probably reexamine that. Most notably, it's dangerous to let
> WebContent allocate a shared memory of arbitrary size & map it in GPU
> process because that would make the address of such a memory location in GPU
> process more predictable, potentially bypassing ASLR. In fact, we probably
> want to limit both the size of each region as well as the number of shared
> memory each WebContent process is allowed to share with GPU Process.

Sure.. I just don't understand how is this justified in this process. 
I think WebProcess has been able to do this for the existence of WebKit2. Just in this review it's now a problem?
I'm not sure in that above description there's very concrete steps of what is the actual, concrete problem.
What is the mechanism that makes mapping shm more predictable than mapping anonymous memory as shareable?

Anyhow, I don't question that these issues might exist -- I definitively don't know much about the system.

I just question a bit the process where sometimes there's rush to push code in one way, but then when the reviewer changes, suddenly some other "possiblys" and "not worth the risks" evaluation comes into play the other direction.

If WebProcess shouldn't pass shared memory to other processes, why is the feature there in the first place? Why is it used in many places?
Comment 27 Myles C. Maxfield 2021-03-29 14:15:49 PDT
I did discover while working on this patch that the uint64_t dataSize member of IPCHandle is necessary to use. If you map a region that is smaller than the page size and then you try to read/write a byte that is inside the page but past the region, the process crashes.
Comment 28 Myles C. Maxfield 2021-03-29 14:49:50 PDT
Committed r275180 (235881@main): <https://commits.webkit.org/235881@main>
Comment 29 Radar WebKit Bug Importer 2021-03-29 14:50:18 PDT
<rdar://problem/75974323>
Comment 30 Kimmo Kinnunen 2021-03-30 00:54:22 PDT
Myles, the longIPCTimeout/shortIPCTimeout wasn't at all what the IPC::Timeout was intended for..