Bug 231352

Summary: IPC should not have client side message hysteresis
Product: WebKit
Reporter: Kimmo Kinnunen <kkinnunen>
Component: WebKit2
Assignee: Nobody <webkit-unassigned>
Status: NEW
Severity: Normal
CC: ggaren, kkinnunen, simon.fraser, webkit-bug-importer, wenson_hsieh
Priority: P2
Keywords: InRadar
Version: WebKit Local Build
Hardware: Unspecified
OS: Unspecified
See Also: https://bugs.webkit.org/show_bug.cgi?id=231304
Bug Depends on: 239895
Bug Blocks:

Description Kimmo Kinnunen 2021-10-06 23:18:39 PDT
IPC should not have client side message hysteresis

Bug 231304 added hysteresis to the stream implementation.

Messages may have action at a distance that is detectable via out-of-band observations.
At best, hysteresis needs to be disabled for interfaces with such message behaviours.
At worst, this is not understood and results in hard-to-understand bugs.

Example:
- The current WebGL implementation would benefit from hysteresis.
- The current WebGL implementation has a prepareForDisplay() message.
- If prepareForDisplay() is buffered due to hysteresis, no frame gets displayed.

Leaving a message in the client side buffer is poor form.

The problem the hysteresis is solving is:
- the client side produces messages fast
- the server side processes messages faster
- the server side goes to sleep quickly
- the client side ends up spending significant time in semaphore notification


This should be solved, for example, with a server-side hysteresis wait:
When the server runs out of messages, it waits with a deadline on the stream semaphore but does not set the "server is waiting" flag.
To match the current hysteresis implementation, the wait time could be tuned to be roughly (current hysteresis item count * average time to produce 1 item).
When the server still has no new messages after the hysteresis wait, it sets the "server is waiting" flag and waits on the semaphore.
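
A minimal self-contained sketch of what that receive loop could look like, with std::counting_semaphore and a locked queue standing in for the real shared-memory stream buffer and IPC semaphore; all names and the 100us deadline are placeholders, not WebKit code:

// Sketch only: std::counting_semaphore and a locked deque stand in for the
// real shared-memory stream buffer and IPC semaphore. Names are hypothetical.
#include <atomic>
#include <chrono>
#include <deque>
#include <mutex>
#include <optional>
#include <semaphore>

struct Message { int id { 0 }; };

class StreamBuffer {
public:
    std::optional<Message> tryDequeue()
    {
        std::lock_guard lock { m_mutex };
        if (m_queue.empty())
            return std::nullopt;
        Message message = m_queue.front();
        m_queue.pop_front();
        return message;
    }
    bool isEmpty()
    {
        std::lock_guard lock { m_mutex };
        return m_queue.empty();
    }
private:
    std::mutex m_mutex;
    std::deque<Message> m_queue;
};

class StreamServer {
public:
    void receiveLoop()
    {
        using namespace std::chrono_literals;
        for (;;) {
            while (auto message = m_buffer.tryDequeue())
                dispatch(*message);

            // Hysteresis wait: block with a deadline, but leave the "server
            // is waiting" flag clear so the client keeps writing into the
            // buffer without paying for semaphore signals.
            if (m_wakeUp.try_acquire_for(100us) || !m_buffer.isEmpty())
                continue;

            // Still nothing after the hysteresis wait: set the flag so the
            // client signals us, re-check to avoid a lost wakeup, and block.
            m_serverIsWaiting.store(true);
            if (m_buffer.isEmpty())
                m_wakeUp.acquire();
            m_serverIsWaiting.store(false);
        }
    }
private:
    void dispatch(const Message&) { /* process one message */ }
    StreamBuffer m_buffer;
    std::counting_semaphore<> m_wakeUp { 0 };
    std::atomic<bool> m_serverIsWaiting { false };
};

The deadline replaces the client-side item count as the tuning knob, and the flag-then-recheck order is what prevents a wakeup from being lost between the emptiness check and the blocking wait.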
Comment 1 Kimmo Kinnunen 2021-10-06 23:28:52 PDT
Another example is, I believe, the future RemoteGraphicsContext::flushContext.
The sequence is:
-> RemoteGraphicsContext::FlushContext

<- RemoteRenderingBackend::didFlush

If we buffer FlushContext due to hysteresis, we never get the didFlush?
Comment 2 Wenson Hsieh 2021-10-07 10:14:27 PDT
(In reply to Kimmo Kinnunen from comment #1)
> Another example is, I believe, the future RemoteGraphicsContext::flushContext.
> The sequence is:
> -> RemoteGraphicsContext::FlushContext
> 
> <- RemoteRenderingBackend::didFlush
> 
> If we buffer FlushContext due to hysteresis, we never get the didFlush?

It should be the streamable IPC client's responsibility to send any pending wakeup message in this scenario, before blocking on a sync response.

(I made sync messages do this by default, but added a method for clients to use when blocking in other scenarios)
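
A minimal sketch of that rule (names assumed, not the actual WebKit API): the client remembers whether it still owes the server a wakeup, and every blocking send flushes it first, so a buffered message cannot be stranded behind the hysteresis.

struct StreamClient {
    // Sketch only: names and structure are assumptions, not the WebKit API.
    bool m_hasPendingWakeUp { false };

    void enqueue(/* a stream message */)
    {
        // Write the message into the shared buffer; under hysteresis we may
        // defer the semaphore signal and just remember that we owe one.
        m_hasPendingWakeUp = true;
    }

    void sendPendingWakeUpMessageIfNeeded()
    {
        if (!m_hasPendingWakeUp)
            return;
        m_hasPendingWakeUp = false;
        signalServer(); // ends the hysteresis; the server drains the buffer
    }

    void sendSync(/* a sync message */)
    {
        enqueue();
        // Done by default for sync sends; exposed as a method for clients
        // that block in other ways (for example, waiting for didFlush).
        sendPendingWakeUpMessageIfNeeded();
        waitForReply();
    }

    void signalServer() { /* signal the stream semaphore */ }
    void waitForReply() { /* block on the reply */ }
};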
Comment 3 Wenson Hsieh 2021-10-07 10:29:34 PDT
(In reply to Kimmo Kinnunen from comment #0)
> IPC should not have client side message hysteresis
> 
> Bug 231304 added hysteresis to the stream implementation.
> 
> Messages may have action at a distance that is detectable via out-of-band
> observations.
> At best, hysteresis needs to be disabled for interfaces with such message
> behaviours.
> At worst, this is not understood and results in hard-to-understand bugs.
> 
> Example:
> - The current WebGL implementation would benefit from hysteresis.
> - The current WebGL implementation has a prepareForDisplay() message.
> - If prepareForDisplay() is buffered due to hysteresis, no frame gets
> displayed.
> 
> Leaving a message in the client side buffer is poor form.
> 
> The problem the hysteresis is solving is:
> - the client side produces messages fast
> - the server side processes messages faster
> - the server side goes to sleep quickly
> - the client side ends up spending significant time in semaphore notification
> 
> 
> This should be solved, for example, with a server-side hysteresis wait:
> When the server runs out of messages, it waits with a deadline on the stream
> semaphore but does not set the "server is waiting" flag.
> To match the current hysteresis implementation, the wait time could be tuned
> to be roughly (current hysteresis item count * average time to produce 1
> item).
> When the server still has no new messages after the hysteresis wait, it sets
> the "server is waiting" flag and waits on the semaphore.

This server-side, time-based hysteresis idea sounds promising; I'll give it a try as soon as I get a chance!
Comment 4 Radar WebKit Bug Importer 2021-10-13 23:19:14 PDT
<rdar://problem/84234110>
Comment 5 Geoffrey Garen 2022-04-27 10:59:19 PDT
> When the server runs out of messages, it waits with a deadline on the stream semaphore but does not set the "server is waiting" flag.

Is there a point to waiting on the semaphore rather than just sleeping? If the "server is waiting" flag is not set, will anybody ever signal the semaphore?

How do we prevent the deadline from creating frame latency?

> To match the current hysteresis implementation, the wait time could be tuned to be roughly (current hysteresis item count * average time to produce 1 item).

How would this tuning account for hardware differences in average time to produce 1 item?

> - If prepareForDisplay() is buffered due to hysteresis, no frame gets displayed.
> If we buffer FlushContext due to hysteresis, we never get the didFlush?

Does client-side hysteresis become viable if we simply have items like prepareForDisplay() and FlushContext carry a "terminate hysteresis and send immediately" flag with them?

I think I see the theoretical benefit to server-side hysteresis, or at least to not doing count-based hysteresis (less error-prone, messages can't get lost forever); but I also see two potential downsides:

(1) The server will always incur either extra cost or extra latency in discovering that the client has produced sufficient display list items, since doing so requires some kind of synchronization on shared memory, whereas it is virtually free for the client to discover that it has added display list items.

I suppose it is an experimental question whether this cost / latency matters in practice.

(2) I believe the speed win of client-side hysteresis is already a known quantity.
Comment 6 Kimmo Kinnunen 2022-04-27 11:43:17 PDT
(In reply to Geoffrey Garen from comment #5)
> > When the server runs out of messages, it waits with a deadline on the stream semaphore but does not set the "server is waiting" flag.
> 
> Is there a point to waiting on the semaphore rather than just sleeping? If
> the "server is waiting" flag is not set, will anybody ever signal the
> semaphore?

Why would you just sleep instead of sleeping by waiting on the semaphore? If the code is basically structured around "and then we wait on the semaphore", it's natural to sleep by waiting.

If there are multiple producers on the same semaphore, they will signal it. This is the case in WebGL, where there is one work queue per N contexts.

If there is a sync message, we can signal the semaphore to immediately wake up the hysteresis-sleeping server. (Until the sync message is implemented via the event link.)

> How do we prevent the deadline from creating frame latency?

What do you mean by *frame* latency?

There's latency for buffering the messages client side.
There's latency for buffering the messages server side.

> > To match the current hysteresis implementation, the wait time could be tuned to be roughly (current hysteresis item count * average time to produce 1 item).
> 
> How would this tuning account for hardware differences in average time to
> produce 1 item?

Are you asking concretely, or are you pointing out philosophically the futility of the "average time"?

Concretely, I'd start with 100 ns and build an (exponentially?) increasing wait with a few steps, and see how that behaves.
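
For concreteness, a sketch of what such a stepped wait could look like; the constants are placeholders to measure against, not recommendations:

#include <algorithm>
#include <chrono>

// Hypothetical stepped backoff: 100 ns, 200 ns, 400 ns, ... capped after a
// few steps. The receiver would use step 0 right after processing messages
// and bump the step each time a hysteresis wait expires with no new messages.
std::chrono::nanoseconds hysteresisWaitForStep(unsigned step)
{
    using namespace std::chrono_literals;
    constexpr unsigned maxStep = 4; // "a few steps"
    return 100ns * (1u << std::min(step, maxStep)); // caps at 1.6 us
}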

The futility aspect is the same for choosing N client-side items as it is for choosing X ns of server-side wait. In other words, if 100 client-side items is selected, I can ask: how does the selection of 100 items account for hardware differences?
 
> > - If prepareForDisplay() is buffered due to hysteresis, no frame gets displayed.
> > If we buffer FlushContext due to hysteresis, we never get the didFlush?
> 
> Does client-side hysteresis become viable if we simply have items like
> prepareForDisplay() and FlushContext carry a "terminate hysteresis and send
> immediately" flag with them?

prepareForDisplay has a result, so it is already marked; it would be sent immediately.

FlushContext will have a result once the interface moves away from the display-list-style abstraction to a normal "IPC function call" abstraction.

FlushContext currently would need such a flag, as there is an ad hoc reply, didFlushContext, that is not part of the message system specification. A flag would indicate the intent; yes, in this case it would help.

We have a similar case for FinalizeRenderingUpdate / didFinalizeRenderingUpdate.

We can add flags on top of special cases. The point being: how would an average programmer who doesn't care about the IPC understand that "unless you put flag QQ on your message, it may never get delivered"? When you put it like that, it does not make much sense.

If you open any .messages.in interface, there are plenty of async messages that implicitly assume forward progress, as the intended side effect is naturally expected to happen. Peppering the codebase with "SendOptions::ReallySendItNow" doesn't sound elegant.

> (1) The server will always incur either extra cost or extra latency in
> discovering that the client has produced sufficient display list items,
> since doing so requires some kind of synchronization on shared memory,
> whereas it is virtually free for the client to discover that it has added
> display list items.

I don't understand the distinction between these:
A) The server sleeps 100 ns because the client doesn't signal. That's 100 ns of latency.
B) The server sleeps 100 ns because it didn't tell the client to signal. That's 100 ns of latency.
Comment 7 Kimmo Kinnunen 2022-04-27 12:17:44 PDT
A less error-prone form of client-side buffering could be to mark messages as "may be buffered", i.e. opt in to buffering instead of opting out of sending.
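
For example, the marking could take a shape like the following (hypothetical, not an existing WebKit option):

#include <cstdint>

// Hypothetical shape of an opt-in marking; not an existing WebKit option.
enum class StreamSendOption : uint8_t {
    None = 0,
    MayBeBuffered = 1 << 0, // explicitly safe to hold in the client-side buffer
};

// Unmarked messages would flush any pending wakeup and be delivered promptly;
// only messages explicitly marked MayBeBuffered would participate in hysteresis.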
Comment 8 Kimmo Kinnunen 2022-04-29 05:00:59 PDT
Implemented the backoff wait for the receiver in bug 239895.

I have not tuned the numbers for a phone yet, so I don't know if it improves anything.
I have not tuned the numbers for a phone yet, I don't know if it's progressing anything.