<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>107633</bug_id>
          
          <creation_ts>2013-01-22 23:05:19 -0800</creation_ts>
          <short_desc>CoreIPC omits some messages when sending a lot of messages in a very short time.</short_desc>
          <delta_ts>2013-06-20 22:19:50 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>WebKit2</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>NEW</bug_status>
          <resolution></resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Seulgi Kim">dev</reporter>
          <assigned_to name="Rafael Brandao">rafael.lobo</assigned_to>
          <cc>andersca</cc>
    
    <cc>cmarcelo</cc>
    
    <cc>dev</cc>
    
    <cc>dongseong.hwang</cc>
    
    <cc>jturcotte</cc>
    
    <cc>kbalazs</cc>
    
    <cc>kenneth</cc>
    
    <cc>kimmo.t.kinnunen</cc>
    
    <cc>mrobinson</cc>
    
    <cc>noam</cc>
    
    <cc>rafael.lobo</cc>
    
    <cc>sergio</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>813873</commentid>
    <comment_count>0</comment_count>
    <who name="Seulgi Kim">dev</who>
    <bug_when>2013-01-22 23:05:19 -0800</bug_when>
    <thetext>I observed that CoordinatedLayerTreeHostProxy::CreateCompositingLayer messages are sent but not received when a lot of messages are sent in a short time.
I tested http://black.company100.net/test/TC/leaves1000 (this site creates 1000 compositing layers) and checked that CoordinatedLayerTreeHost actually sends all of the messages but CoordinatedLayerTreeHostProxy doesn&apos;t receive some of them.
This test page sends 1000 messages almost simultaneously.
CoordinatedLayerTree on EFL and Qt gave the same results, so I assume the problem is in CoreIPC on Unix.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>813878</commentid>
    <comment_count>1</comment_count>
    <who name="Rafael Brandao">rafael.lobo</who>
    <bug_when>2013-01-22 23:11:28 -0800</bug_when>
    <thetext>Could you try sending your messages right away, as I did in https://bugs.webkit.org/show_bug.cgi?id=105466, to see whether you&apos;re still missing any? I know sending every message immediately is not optimal, but it might help identify where the fault is, as it did there.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>816860</commentid>
    <comment_count>2</comment_count>
    <who name="Rafael Brandao">rafael.lobo</who>
    <bug_when>2013-01-25 13:07:07 -0800</bug_when>
    <thetext>I could reproduce this error in a stress test. I will investigate it further now.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>817839</commentid>
    <comment_count>3</comment_count>
      <attachid>185000</attachid>
    <who name="Rafael Brandao">rafael.lobo</who>
    <bug_when>2013-01-28 10:09:27 -0800</bug_when>
    <thetext>Created attachment 185000
Patch</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>817850</commentid>
    <comment_count>4</comment_count>
    <who name="Rafael Brandao">rafael.lobo</who>
    <bug_when>2013-01-28 10:16:27 -0800</bug_when>
    <thetext>Adding more folks to CC. What do you think of this? As much as I dislike adding a sleep, I think its cost is very low compared to the cost of tracking down mysterious, random bugs in UIProcess caused by lost messages.

The way CoreIPC::sendOutgoingMessages is designed, a message is silently dropped if any error occurs while we try to send it. On Mac, this is not even handled. On Unix, I think we fail to properly handle some socket layer errors, such as EWOULDBLOCK and EAGAIN. They mean we temporarily lack the resources to send the message, but there&apos;s nothing wrong with the message itself or the socket. By falling back to a short sleep and retrying, I can avoid losing important messages that UIProcess should handle later; losing them causes random crashes.
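
To illustrate the retry-on-EAGAIN idea described above, here is a minimal Python sketch (not the actual ConnectionUnix.cpp change). The helper names send_with_retry and demo_retry, the 50 microsecond pause, and the stream socket pair used for the demo are all assumptions made for illustration. It only shows that a full send buffer is a transient condition and that retrying eventually delivers every byte.

```python
import socket
import threading
import time


def send_with_retry(sock, payload, pause=0.00005):
    """Send all of payload on a non-blocking socket, sleeping briefly on EAGAIN."""
    view = memoryview(payload)
    retries = 0
    while view:
        try:
            sent = sock.send(view)
            view = view[sent:]
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK: the send buffer is full, the socket is fine.
            retries += 1
            time.sleep(pause)
        except OSError:
            # Any other socket error is treated as fatal in this sketch.
            return False, retries
    return True, retries


def demo_retry(size=4 * 1024 * 1024):
    """Push size bytes through a socket pair while a reader thread drains it."""
    sender, receiver = socket.socketpair()
    sender.setblocking(False)
    totals = []

    def drain():
        total = 0
        while total != size:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            total += len(chunk)
        totals.append(total)

    reader = threading.Thread(target=drain)
    reader.start()
    ok, retries = send_with_retry(sender, b"x" * size)
    sender.close()  # lets the reader see end-of-stream if the send gave up early
    reader.join()
    receiver.close()
    return ok, retries, totals[0]


if __name__ == "__main__":
    print(demo_retry())
```

The number of retries varies from run to run. The downside of this approach is that the sending thread stalls for as long as it sleeps.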

In particular, the marblebox example (http://ariya.github.com/js/marblebox) crashes on Qt when there are around 100 balls bouncing around. The more, the worse it gets.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>817865</commentid>
    <comment_count>5</comment_count>
    <who name="Rafael Brandao">rafael.lobo</who>
    <bug_when>2013-01-28 10:41:56 -0800</bug_when>
    <thetext>The crash happens on Qt and on Nix when we lose an UpdateLayer message in Coordinated Graphics for a newly created layer (that is, we did get the CreateLayer message). Without the update, the layer keeps its default values, which makes it a 0x0 layer that cannot be drawn into. We can then receive an UpdateTile for that layer, which tries to access the backing store of this broken layer. The backing store does not exist in this case, so we crash.

Another possibility is to lose the &quot;DidRenderFrame&quot; message and then we&apos;re stuck on WebProcess, always waiting for &quot;RenderNextFrame&quot; from UIProcess which will never happen.

I believe that if we cannot guarantee message delivery, we can reproduce many other tricky situations that go beyond the Coordinated Graphics code.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>817873</commentid>
    <comment_count>6</comment_count>
      <attachid>185000</attachid>
    <who name="Anders Carlsson">andersca</who>
    <bug_when>2013-01-28 10:55:15 -0800</bug_when>
    <thetext>Comment on attachment 185000
Patch

I don&apos;t think this is the right fix - it&apos;ll possibly just result in a busy-loop if you&apos;re unlucky. There&apos;s no explanation of the underlying cause either.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>817907</commentid>
    <comment_count>7</comment_count>
    <who name="Rafael Brandao">rafael.lobo</who>
    <bug_when>2013-01-28 11:23:29 -0800</bug_when>
    <thetext>(In reply to comment #6)
&gt; (From update of attachment 185000)
&gt; I don&apos;t think this is the right fix - it&apos;ll possibly just result in a busy-loop if you&apos;re unlucky. There&apos;s no explanation of the underlying cause either.

What I can tell you so far is:

&quot;When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode. In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this case.&quot; (source: http://linux.die.net/man/2/sendmsg)

In this marble example, each painted frame sends many small messages carrying the state of each layer. There&apos;s likely one layer per bouncing ball, and we also send messages with new tiles whenever new balls are created. We might be filling the whole send buffer.
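
A quick way to see how few small messages it takes is to write to a non-blocking socket pair that nobody reads from, and count the sends that succeed before EAGAIN shows up. This is only a rough Python sketch: the function name, the 64-byte message size, and the use of a default stream socket pair are assumptions, and the count depends on the kernel buffer settings.

```python
import socket


def count_messages_until_full(message_size=64, limit=1000000):
    """Count small sends a non-blocking socket accepts while nobody reads."""
    sender, receiver = socket.socketpair()
    sender.setblocking(False)
    message = b"m" * message_size
    accepted = 0
    hit_eagain = False
    try:
        while accepted != limit:
            try:
                sender.send(message)
            except BlockingIOError:
                # EAGAIN / EWOULDBLOCK: the send buffer is full.
                hit_eagain = True
                break
            accepted += 1
    finally:
        sender.close()
        receiver.close()
    return accepted, hit_eagain


if __name__ == "__main__":
    print(count_messages_until_full())
```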

Is this information helpful?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>817910</commentid>
    <comment_count>8</comment_count>
    <who name="Rafael Brandao">rafael.lobo</who>
    <bug_when>2013-01-28 11:25:55 -0800</bug_when>
    <thetext>There&apos;s ongoing effort to reduce the number of messages traded between UIProcess and WebProcess on Coordinated Graphics, like https://bugs.webkit.org/show_bug.cgi?id=107625 (which merges all layer creation messages into one).

Still, do you think it&apos;s relevant to address this problem in particular?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>817994</commentid>
    <comment_count>9</comment_count>
    <who name="Anders Carlsson">andersca</who>
    <bug_when>2013-01-28 12:36:16 -0800</bug_when>
    <thetext>(In reply to comment #7)
&gt; (In reply to comment #6)
&gt; &gt; (From update of attachment 185000)
&gt; &gt; I don&apos;t think this is the right fix - it&apos;ll possibly just result in a busy-loop if you&apos;re unlucky. There&apos;s no explanation of the underlying cause either.
&gt; 
&gt; What I can tell you so far is:
&gt; 
&gt; &quot;When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode. In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this case.&quot; (source: http://linux.die.net/man/2/sendmsg)

In that case there has got to be a way to get notified when messages can be sent to the socket again, without having to busy wait. The connection work queue should never ever block in this manner.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>817999</commentid>
    <comment_count>10</comment_count>
    <who name="Noam Rosenthal">noam</who>
    <bug_when>2013-01-28 12:38:31 -0800</bug_when>
    <thetext>(In reply to comment #8)
&gt; There&apos;s ongoing effort to reduce the number of messages traded between UIProcess and WebProcess on Coordinated Graphics, like https://bugs.webkit.org/show_bug.cgi?id=107625 (which merges all layer creation messages into one).
&gt; 
&gt; Still, do you think it&apos;s relevant to address this problem in particular?

Yes; The other bugs may delay the problem but we still want this problem gone.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>818022</commentid>
    <comment_count>11</comment_count>
    <who name="Noam Rosenthal">noam</who>
    <bug_when>2013-01-28 13:10:20 -0800</bug_when>
    <thetext>&gt; &gt; &quot;When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode. In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this case.&quot; (source: http://linux.die.net/man/2/sendmsg)
&gt; 
&gt; In that case there has got to be a way to get notified when messages can be sent to the socket again, without having to busy wait. The connection work queue should never ever block in this manner.
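
As a rough sketch of that kind of notification (in Python rather than the CoreIPC code, with hypothetical names such as send_when_writable and demo_poll, and an arbitrary one second timeout), poll() with POLLOUT reports when a non-blocking socket can accept data again, so no sleeping or spinning is needed:

```python
import select
import socket
import threading


def send_when_writable(sock, payload, timeout_ms=1000):
    """Send payload on a non-blocking socket, waiting in poll() when it is full."""
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    view = memoryview(payload)
    waits = 0
    while view:
        try:
            sent = sock.send(view)
            view = view[sent:]
        except BlockingIOError:
            # Buffer full: wait for POLLOUT instead of sleeping or spinning.
            waits += 1
            if not poller.poll(timeout_ms):
                return False, waits  # timed out, the peer is not draining
    return True, waits


def demo_poll(size=4 * 1024 * 1024):
    """Push size bytes through a socket pair while a reader thread drains it."""
    sender, receiver = socket.socketpair()
    sender.setblocking(False)
    totals = []

    def drain():
        total = 0
        while total != size:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            total += len(chunk)
        totals.append(total)

    reader = threading.Thread(target=drain)
    reader.start()
    ok, waits = send_when_writable(sender, b"x" * size)
    sender.close()  # lets the reader see end-of-stream if the send gave up early
    reader.join()
    receiver.close()
    return ok, waits, totals[0]


if __name__ == "__main__":
    print(demo_poll())
```

This sketch still blocks the calling thread inside poll(). An actual fix would presumably register the descriptor with the work queue and resume sending from a callback instead.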

http://homepages.cwi.nl/~aeb/linux/man2html/man7/socket.7.html
We probably want to listen to something like SIGIO with POLLOUT (but I&apos;m a bit rusty with my unix socket programming skillz).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>818055</commentid>
    <comment_count>12</comment_count>
    <who name="Caio Marcelo de Oliveira Filho">cmarcelo</who>
    <bug_when>2013-01-28 13:40:50 -0800</bug_when>
    <thetext>(In reply to comment #10)
&gt; (In reply to comment #8)
&gt; &gt; There&apos;s ongoing effort to reduce the number of messages traded between UIProcess and WebProcess on Coordinated Graphics, like https://bugs.webkit.org/show_bug.cgi?id=107625 (which merges all layer creation messages into one).

I&apos;m working on a patch to reduce calls by compressing update information for the layers into a single message. ConnectionUnix knows how to handle big messages with attachments, but a lot of small messages seem to be filling the receiver buffer before it can read them.


&gt; &gt; 
&gt; &gt; Still, do you think it&apos;s relevant to address this problem in particular?
&gt; 
&gt; Yes; The other bugs may delay the problem but we still want this problem gone.

Agreed.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>821391</commentid>
    <comment_count>13</comment_count>
    <who name="Caio Marcelo de Oliveira Filho">cmarcelo</who>
    <bug_when>2013-01-31 04:45:46 -0800</bug_when>
    <thetext>(In reply to comment #12)
&gt; (In reply to comment #10)
&gt; &gt; (In reply to comment #8)
&gt; &gt; &gt; There&apos;s ongoing effort to reduce the number of messages traded between UIProcess and WebProcess on Coordinated Graphics, like https://bugs.webkit.org/show_bug.cgi?id=107625 (which merges all layer creation messages into one).
&gt; 
&gt; I&apos;m working on a patch to reduce calls by compressing update information for the layers into a single message. ConnectionUnix knows how to handle big messages with attachments, but a lot of small messages seems to be filling the receiver buffer before it can read them.

Side note: the refactoring currently going on for Coordinated Graphics will give us the same effect. The work is in bug 103854.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>830329</commentid>
    <comment_count>14</comment_count>
    <who name="Dongseong Hwang">dongseong.hwang</who>
    <bug_when>2013-02-11 16:45:45 -0800</bug_when>
    <thetext>(In reply to comment #13)
&gt; Side note: the current refactoring going on for Coordinated Graphics will give the same effect of this to us. Work is in the bug 103854.

Yes, my team works on this.

(In reply to comment #12)
&gt; I&apos;m working on a patch to reduce calls by compressing update information for the layers into a single message. ConnectionUnix knows how to handle big messages with attachments, but a lot of small messages seems to be filling the receiver buffer before it can read them.

Oh, what kind of work? At which level are you working: Coordinated Graphics or lower level?

(In reply to comment #12)
&gt; &gt; &gt; Still, do you think it&apos;s relevant to address this problem in particular?
&gt; &gt; 
&gt; &gt; Yes; The other bugs may delay the problem but we still want this problem gone.
&gt; 
&gt; Agreed.

I agree too. It&apos;s a potential bug that can cost other developers, like me, a lot of time.</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="1"
              isprivate="0"
          >
            <attachid>185000</attachid>
            <date>2013-01-28 10:09:27 -0800</date>
            <delta_ts>2013-01-28 10:55:15 -0800</delta_ts>
            <desc>Patch</desc>
            <filename>bug-107633-20130128150614.patch</filename>
            <type>text/plain</type>
            <size>2107</size>
            <attacher name="Rafael Brandao">rafael.lobo</attacher>
            
              <data encoding="base64">U3VidmVyc2lvbiBSZXZpc2lvbjogMTQwOTcxCmRpZmYgLS1naXQgYS9Tb3VyY2UvV2ViS2l0Mi9D
aGFuZ2VMb2cgYi9Tb3VyY2UvV2ViS2l0Mi9DaGFuZ2VMb2cKaW5kZXggNjI5MzM2MjNjODVkZDNh
NzZkNWIyMGE4N2YwNzc5YTVhMGFmYzMwYS4uNzNjYmYxZWYyZWZlYjQxMmYzODRiZThhYjc5MzZk
ZDBmZjZmNjg1MCAxMDA2NDQKLS0tIGEvU291cmNlL1dlYktpdDIvQ2hhbmdlTG9nCisrKyBiL1Nv
dXJjZS9XZWJLaXQyL0NoYW5nZUxvZwpAQCAtMSwzICsxLDE1IEBACisyMDEzLTAxLTI4ICBSYWZh
ZWwgQnJhbmRhbyAgPHJhZmFlbC5sb2JvQG9wZW5ib3NzYS5vcmc+CisKKyAgICAgICAgQ29yZUlQ
QyBvbW1pdHMgc29tZSBtZXNzYWdlcyB3aGVuIHNlbmRpbmcgYSBsb3Qgb2YgbWVzc2FnZXMgaW4g
YSB2ZXJ5IHNob3J0IHRpbWUuCisgICAgICAgIGh0dHBzOi8vYnVncy53ZWJraXQub3JnL3Nob3df
YnVnLmNnaT9pZD0xMDc2MzMKKworICAgICAgICBSZXZpZXdlZCBieSBOT0JPRFkgKE9PUFMhKS4K
KworICAgICAgICAqIFBsYXRmb3JtL0NvcmVJUEMvdW5peC9Db25uZWN0aW9uVW5peC5jcHA6Cisg
ICAgICAgIChDb3JlSVBDOjpDb25uZWN0aW9uOjpzZW5kT3V0Z29pbmdNZXNzYWdlKTogSGFuZGxl
IHRoZSBzb2NrZXQgbGF5ZXIncyBlcnJvciB0aGF0IGluZGljYXRlcworICAgICAgICB3aGV0aGVy
IHdlIGFyZSB0ZW1wb3JhcmlseSBsYWNraW5nIGF2YWlsYWJsZSByZXNvdXJjZXMuIEFsc28gYWRk
cyBhbiBBU1NFUlRfTk9UX1JFQUNIIHRvIG1ha2UgaXQKKyAgICAgICAgZWFzaWVyIHRvIGRldGVj
dCBvbiBkZWJ1ZyBtb2RlIHdoZW4gd2UgcmVhY2ggb3RoZXIgZXJyb3JzLgorCiAyMDEzLTAxLTI4
ICBSZW5hdGEgSG9kb3ZhbiAgPHJlbmlAd2Via2l0Lm9yZz4KIAogICAgICAgICBbUXRdW1dpbl1b
V0syXSBCdWlsZCBmaXggYWZ0ZXIgcjE0MDk1Ny4KZGlmZiAtLWdpdCBhL1NvdXJjZS9XZWJLaXQy
L1BsYXRmb3JtL0NvcmVJUEMvdW5peC9Db25uZWN0aW9uVW5peC5jcHAgYi9Tb3VyY2UvV2ViS2l0
Mi9QbGF0Zm9ybS9Db3JlSVBDL3VuaXgvQ29ubmVjdGlvblVuaXguY3BwCmluZGV4IDJkZjYxNjQx
Zjc3MDlmYTY3YjI0N2VmNDQxNzM3MDQ3OWE1ZjJjYjQuLjg0YmYxNWRhYTA5Yzk3MWYzZDMyZWIz
YzM2NzM5NzJkOWU4MzI2MWUgMTAwNjQ0Ci0tLSBhL1NvdXJjZS9XZWJLaXQyL1BsYXRmb3JtL0Nv
cmVJUEMvdW5peC9Db25uZWN0aW9uVW5peC5jcHAKKysrIGIvU291cmNlL1dlYktpdDIvUGxhdGZv
cm0vQ29yZUlQQy91bml4L0Nvbm5lY3Rpb25Vbml4LmNwcApAQCAtNTUwLDggKzU1MCwyMyBAQCBi
b29sIENvbm5lY3Rpb246OnNlbmRPdXRnb2luZ01lc3NhZ2UoTWVzc2FnZUlEIG1lc3NhZ2VJRCwg
UGFzc093blB0cjxNZXNzYWdlRW5jbwogCiAgICAgaW50IGJ5dGVzU2VudCA9IDA7CiAgICAgd2hp
bGUgKChieXRlc1NlbnQgPSBzZW5kbXNnKG1fc29ja2V0RGVzY3JpcHRvciwgJm1lc3NhZ2UsIDAp
KSA9PSAtMSkgewotICAgICAgICBpZiAoZXJybm8gIT0gRUlOVFIpCisgICAgICAgIGlmIChlcnJu
byA9PSBFQUdBSU4gfHwgZXJybm8gPT0gRVdPVUxEQkxPQ0spIHsKKyAgICAgICAgICAgIC8vIEVy
cm9yIG9uIHNvY2tldCBsYXllciB0aGF0IGluZGljYXRlcyB3aGV0aGVyIHJlc291cmNlcyBhcmUg
dGVtcG9yYXJpbHkgdW5hdmFpbGFibGUuCisgICAgICAgICAgICAvLyBXZSBmYWxsYmFjayB0byBz
bGVlcCBmb3IgYSBzaG9ydCB0aW1lIGhvcGluZyB0aGUgbmV4dCBpdGVyYXRpb24gd2Ugd2lsbCBi
ZSBhYmxlIHRvIHNlbmQgaXQuCisgICAgICAgICAgICB1c2xlZXAoNTApOworICAgICAgICB9CisK
KyAgICAgICAgc3dpdGNoIChlcnJubykgeworICAgICAgICBjYXNlIEVCQURGOgorICAgICAgICBj
YXNlIEVOT1RTT0NLOgorICAgICAgICBjYXNlIEVGQVVMVDoKKyAgICAgICAgY2FzZSBFTVNHU0la
RToKKyAgICAgICAgY2FzZSBFTk9NRU06CisgICAgICAgIGNhc2UgRUlOVkFMOgorICAgICAgICBj
YXNlIEVQSVBFOgorICAgICAgICAgICAgQVNTRVJUX05PVF9SRUFDSEVEKCk7CiAgICAgICAgICAg
ICByZXR1cm4gZmFsc2U7CisgICAgICAgIH0KICAgICB9CiAgICAgcmV0dXJuIHRydWU7CiB9Cg==
</data>
<flag name="review"
          id="203916"
          type_id="1"
          status="-"
          setter="andersca"
    />
          </attachment>
      

    </bug>

</bugzilla>