Bug 107633 - CoreIPC omits some messages when sending a lot of messages in a very short time.
Summary: CoreIPC omits some messages when sending a lot of messages in a very short t...
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2
Version: 528+ (Nightly build)
Hardware: Unspecified Linux
Importance: P2 Normal
Assignee: Rafael Brandao
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-22 23:05 PST by Seulgi Kim
Modified: 2013-06-20 22:19 PDT
CC List: 12 users

See Also:


Attachments
Patch (2.06 KB, patch)
2013-01-28 10:09 PST, Rafael Brandao
andersca: review-

Description Seulgi Kim 2013-01-22 23:05:19 PST
I observed that CoordinatedLayerTreeHostProxy::CreateCompositingLayer messages are sent but not received when a lot of messages are sent in a short time.
I tested http://black.company100.net/test/TC/leaves1000 (this site creates 1000 compositing layers) and confirmed that CoordinatedLayerTreeHost actually sends all the messages, but CoordinatedLayerTreeHostProxy does not receive some of them.
This test page sends 1000 messages almost simultaneously.
CoordinatedLayerTree on EFL and Qt gave the same results, so I assume the problem is in the Unix CoreIPC implementation.
Comment 1 Rafael Brandao 2013-01-22 23:11:28 PST
Could you try to make your messages be sent right away like I did on https://bugs.webkit.org/show_bug.cgi?id=105466 to see if you're still missing any message? I know it's non-optimal to send all messages right away, but that might help to identify where's the fault, like it helped there.
Comment 2 Rafael Brandao 2013-01-25 13:07:07 PST
I could reproduce this error in a stress test. I will investigate it further now.
Comment 3 Rafael Brandao 2013-01-28 10:09:27 PST
Created attachment 185000
Patch
Comment 4 Rafael Brandao 2013-01-28 10:16:27 PST
Adding more folks to CC. What do you think of this? As much as I dislike adding a sleep, I think its cost is very low compared to the cost of chasing mysterious, random bugs in the UIProcess caused by missing messages.

As designed, CoreIPC::sendOutgoingMessages silently drops a message if any error occurs while sending it. On Mac, send errors are not handled at all. On Unix, I think we fail to handle some socket-layer errors properly, such as EWOULDBLOCK and EAGAIN. They mean we temporarily lack the resources to send, but there is nothing wrong with the message itself or the socket. By falling back to a sleep, I can avoid losing important messages that the UIProcess has to handle later; losing them causes random crashes.
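
To illustrate the distinction between transient and fatal send errors, here is a minimal Python sketch. It is not the CoreIPC code: `flush_outgoing`, `TRANSIENT_ERRNOS`, and the `pending` queue are hypothetical names, and a plain non-blocking socket stands in for the connection. The point is only that a message which fails with EAGAIN or EWOULDBLOCK stays queued instead of being dropped.

```python
import errno
import socket
from collections import deque

# Errors that mean "try again later": the socket and the message are both fine.
TRANSIENT_ERRNOS = {errno.EAGAIN, errno.EWOULDBLOCK}


def flush_outgoing(sock, pending):
    """Send queued messages in order; return True if the queue was drained.

    `pending` is a deque of bytes objects (hypothetical outgoing queue).
    On a transient error the unsent data stays at the head of the queue,
    so a later call can retry it instead of losing it.
    """
    while pending:
        try:
            sent = sock.send(pending[0])
        except OSError as err:
            if err.errno in TRANSIENT_ERRNOS:
                return False  # keep the message; retry later
            raise  # real failure: let the caller tear down the connection
        if sent < len(pending[0]):
            # Partial write (possible on stream sockets): keep the remainder.
            pending[0] = pending[0][sent:]
        else:
            pending.popleft()
    return True
```

A caller would invoke `flush_outgoing` again once the socket becomes writable; how it learns that is a separate question from not dropping the message.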

In particular, the marblebox example (http://ariya.github.com/js/marblebox) crashes on Qt when there are around 100 balls bouncing around. The more, the worse it gets.
Comment 5 Rafael Brandao 2013-01-28 10:41:56 PST
The crash happens on Qt and on Nix when we lose an UpdateLayer message in Coordinated Graphics for a newly created layer (that is, after the CreateLayer message has arrived). Without the update, the layer keeps its default values, which makes it a 0x0 layer that cannot be drawn into. We can then receive an UpdateTile for that layer, which tries to access the backing store of this broken layer. The backing store does not exist in this case, so we crash.

Another possibility is to lose the "DidRenderFrame" message and then we're stuck on WebProcess, always waiting for "RenderNextFrame" from UIProcess which will never happen.

I believe that, beyond the Coordinated Graphics code, there are many other tricky situations we could hit if we cannot guarantee message delivery.
Comment 6 Anders Carlsson 2013-01-28 10:55:15 PST
Comment on attachment 185000
Patch

I don't think this is the right fix - it'll possibly just result in a busy-loop if you're unlucky. There's no explanation of the underlying cause either.
Comment 7 Rafael Brandao 2013-01-28 11:23:29 PST
(In reply to comment #6)
> (From update of attachment 185000)
> I don't think this is the right fix - it'll possibly just result in a busy-loop if you're unlucky. There's no explanation of the underlying cause either.

What I can tell you so far is:

"When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode. In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this case." (source: http://linux.die.net/man/2/sendmsg)

In this marble example, each painted frame sends many small messages carrying the state of each layer. There is likely one layer per bouncing ball, and we also send messages with new tiles whenever we create new balls. We might be filling the entire send buffer.

Is this information helpful?
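
The man page behaviour quoted above can be reproduced outside WebKit. The following Python sketch is an illustration under assumptions: a local `socket.socketpair()` stands in for the WebProcess/UIProcess connection, and `count_sends_until_full` is a hypothetical helper. It sends small messages on a non-blocking socket that nobody reads from and stops at the first EAGAIN/EWOULDBLOCK, which Python reports as `BlockingIOError`.

```python
import socket


def count_sends_until_full(message_size=64, limit=1_000_000):
    """Send small messages on a non-blocking socket pair nobody reads from.

    Returns how many send() calls succeeded before the kernel reported
    that the send buffer was full.
    """
    sender, receiver = socket.socketpair()
    sender.setblocking(False)
    payload = b"x" * message_size
    sent = 0
    try:
        while sent < limit:
            try:
                sender.send(payload)
            except BlockingIOError:
                # Raised for EAGAIN / EWOULDBLOCK: nothing is wrong with the
                # socket or the message, the buffer is just full right now.
                return sent
            sent += 1
        return sent
    finally:
        sender.close()
        receiver.close()
```

The exact count depends on the platform's socket buffer sizes, so no specific number is claimed here; the relevant observation is that it is finite and that the failing send is a transient condition.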
Comment 8 Rafael Brandao 2013-01-28 11:25:55 PST
There's ongoing effort to reduce the number of messages traded between UIProcess and WebProcess on Coordinated Graphics, like https://bugs.webkit.org/show_bug.cgi?id=107625 (which merges all layer creation messages into one).

Still, do you think it's relevant to address this problem in particular?
Comment 9 Anders Carlsson 2013-01-28 12:36:16 PST
(In reply to comment #7)
> (In reply to comment #6)
> > (From update of attachment 185000)
> > I don't think this is the right fix - it'll possibly just result in a busy-loop if you're unlucky. There's no explanation of the underlying cause either.
> 
> What I can tell you so far is:
> 
> "When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode. In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this case." (source: http://linux.die.net/man/2/sendmsg)

In that case there has got to be a way to get notified when messages can be sent to the socket again, without having to busy wait. The connection work queue should never ever block in this manner.
Comment 10 Noam Rosenthal 2013-01-28 12:38:31 PST
(In reply to comment #8)
> There's ongoing effort to reduce the number of messages traded between UIProcess and WebProcess on Coordinated Graphics, like https://bugs.webkit.org/show_bug.cgi?id=107625 (which merges all layer creation messages into one).
> 
> Still, do you think it's relevant to address this problem in particular?

Yes; the other bugs may delay the problem, but we still want this problem gone.
Comment 11 Noam Rosenthal 2013-01-28 13:10:20 PST
> > "When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode. In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this case." (source: http://linux.die.net/man/2/sendmsg)
> 
> In that case there has got to be a way to get notified when messages can be sent to the socket again, without having to busy wait. The connection work queue should never ever block in this manner.

http://homepages.cwi.nl/~aeb/linux/man2html/man7/socket.7.html
We probably want to listen to something like SIGIO with POLLOUT (but I'm a bit rusty with my unix socket programming skillz).
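
As a rough sketch of the notification idea, the snippet below uses poll() with POLLOUT from Python's `select` module to wait until the socket is writable again instead of sleeping for a fixed time. `send_when_writable` is a hypothetical helper, not a CoreIPC function. For brevity it blocks inside poll(); a work queue that must never block would presumably register the descriptor for POLLOUT with its own event loop and resume sending from the callback.

```python
import select
import socket


def send_when_writable(sock, data, timeout_ms=1000):
    """Send `data` on a non-blocking socket, waiting in poll() when it is full.

    Returns True once all of `data` was handed to the kernel, or False if
    the socket did not become writable within `timeout_ms`.
    """
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    view = memoryview(data)
    while view:
        try:
            sent = sock.send(view)
            view = view[sent:]  # drop the bytes the kernel accepted
        except BlockingIOError:
            # Buffer full: wait for POLLOUT instead of busy-looping.
            if not poller.poll(timeout_ms):
                return False
    return True
```

SIGIO, mentioned above, is an alternative signal-driven notification mechanism; it is not shown here.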
Comment 12 Caio Marcelo de Oliveira Filho 2013-01-28 13:40:50 PST
(In reply to comment #10)
> (In reply to comment #8)
> > There's ongoing effort to reduce the number of messages traded between UIProcess and WebProcess on Coordinated Graphics, like https://bugs.webkit.org/show_bug.cgi?id=107625 (which merges all layer creation messages into one).

I'm working on a patch to reduce the number of calls by compressing the update information for the layers into a single message. ConnectionUnix knows how to handle big messages with attachments, but a lot of small messages seem to be filling the receiver buffer before it can read them.
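
The general idea of merging many small messages into one can be sketched as follows. This is an illustrative length-prefixed encoding in Python, not the format used by Coordinated Graphics or CoreIPC; `pack_batch` and `unpack_batch` are hypothetical names.

```python
import struct

# 4-byte big-endian unsigned length prefix (an arbitrary choice for this sketch).
_LEN = struct.Struct("!I")


def pack_batch(messages):
    """Concatenate many small byte messages into one payload.

    Layout: message count, then (length, bytes) for each message.
    """
    parts = [_LEN.pack(len(messages))]
    for message in messages:
        parts.append(_LEN.pack(len(message)))
        parts.append(message)
    return b"".join(parts)


def unpack_batch(payload):
    """Split a payload produced by pack_batch back into its messages."""
    (count,) = _LEN.unpack_from(payload, 0)
    offset = _LEN.size
    messages = []
    for _ in range(count):
        (size,) = _LEN.unpack_from(payload, offset)
        offset += _LEN.size
        messages.append(payload[offset:offset + size])
        offset += size
    return messages
```

With this kind of batching, 1000 layer updates become a single send, which reduces pressure on the socket buffers but does not by itself fix the dropped-message behaviour.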


> > 
> > Still, do you think it's relevant to address this problem in particular?
> 
> Yes; The other bugs may delay the problem but we still want this problem gone.

Agreed.
Comment 13 Caio Marcelo de Oliveira Filho 2013-01-31 04:45:46 PST
(In reply to comment #12)
> (In reply to comment #10)
> > (In reply to comment #8)
> > > There's ongoing effort to reduce the number of messages traded between UIProcess and WebProcess on Coordinated Graphics, like https://bugs.webkit.org/show_bug.cgi?id=107625 (which merges all layer creation messages into one).
> 
> I'm working on a patch to reduce calls by compressing update information for the layers into a single message. ConnectionUnix knows how to handle big messages with attachments, but a lot of small messages seems to be filling the receiver buffer before it can read them.

Side note: the refactoring currently under way for Coordinated Graphics will give us the same effect. That work is tracked in bug 103854.
Comment 14 Dongseong Hwang 2013-02-11 16:45:45 PST
(In reply to comment #13)
> Side note: the current refactoring going on for Coordinated Graphics will give the same effect of this to us. Work is in the bug 103854.

Yes, my team works on this.

(In reply to comment #12)
> I'm working on a patch to reduce calls by compressing update information for the layers into a single message. ConnectionUnix knows how to handle big messages with attachments, but a lot of small messages seems to be filling the receiver buffer before it can read them.

Oh, what kind of work? At which level are you working: Coordinated Graphics or the low level?

(In reply to comment #12)
> > > Still, do you think it's relevant to address this problem in particular?
> > 
> > Yes; The other bugs may delay the problem but we still want this problem gone.
> 
> Agreed.

I agree too. It's a potential bug that could cost other developers, like me, a lot of time.