ASSIGNED 89153
http/tests/websocket/tests/hybi/workers/close.html is flaky
https://bugs.webkit.org/show_bug.cgi?id=89153
Summary http/tests/websocket/tests/hybi/workers/close.html is flaky
Yuta Kitamura
Reported 2012-06-14 18:48:54 PDT
Chromium bug: http://code.google.com/p/chromium/issues/detail?id=132898 Layout test http/tests/websocket/tests/hybi/workers/close.html is failing flakily on Chromium bots on all platforms: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Fwebsocket%2Ftests%2Fhybi%2Fworkers%2Fclose.html Li, apparently this flakiness started from your patch at r120291. Could you take a look?
Attachments
Patch (3.00 KB, patch)
2012-06-21 06:31 PDT, Li Yin
no flags
Li Yin
Comment 1 2012-06-14 19:26:16 PDT
Okay, I will investigate it ASAP. Sorry for that.
Li Yin
Comment 2 2012-06-14 20:49:51 PDT
I can't reproduce this issue. I check the newest code from trunk, and http/tests/websocket/tests/hybi/workers/close.html works on chromium-gtk and WebKit-gtk+ port.
Yuta Kitamura
Comment 3 2012-06-14 22:10:56 PDT
* You might have a better luck with Debug builds. * You might want to repeat to run the test as this failure is flaky. At least I can reproduce the failure pretty reliably on Chromium ToT with Linux+Debug. I can observe the similar flakiness in a browser as well as DumpRenderTree. Diff always looks like the following: --- /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-expected.txt +++ /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-actual.txt @@ -45,7 +45,6 @@ Invalid code test: 12 Code NaN must cause INVALID_ACCESS_ERR. PASS PASS: worker: exceptionName is invalidAccessErr -PASS PASS: onerror() was called. runCodeTest: onclose(). PASS PASS: worker: closeEvent.code is abnormalClosure Skip invalid string test.
Li Yin
Comment 4 2012-06-14 22:43:32 PDT
(In reply to comment #3) > * You might have a better luck with Debug builds. > * You might want to repeat to run the test as this failure is flaky. > > At least I can reproduce the failure pretty reliably on Chromium ToT with Linux+Debug. I can observe the similar flakiness in a browser as well as DumpRenderTree. > > Diff always looks like the following: > > --- /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-expected.txt > +++ /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-actual.txt > @@ -45,7 +45,6 @@ > Invalid code test: 12 > Code NaN must cause INVALID_ACCESS_ERR. > PASS PASS: worker: exceptionName is invalidAccessErr > -PASS PASS: onerror() was called. > runCodeTest: onclose(). > PASS PASS: worker: closeEvent.code is abnormalClosure > Skip invalid string test. Oh, I tested it in Release mode of chromium-gtk and WebKit-gtk+ port. According to your words and my test, that is to say, the test is failed just in debug mode.
Li Yin
Comment 5 2012-06-14 23:32:23 PDT
I reproduce it on Chromium-gtk Debug mode. I will follow up. Thanks for your reporting.
Li Yin
Comment 6 2012-06-21 06:31:55 PDT
Li Yin
Comment 7 2012-06-21 06:53:18 PDT
This is a timing related issue. Between ThreadableWebSocketChannelClientWrapper::didReceiveMessageError and ThreadableWebSocketChannelClientWrapper::didReceiveMessageErrorCallback, suppose WorkerThreadableWebSocketChannel::Peer::didConnect() is invoked, then WebSocket::didConnect will be invoked, beacuse the m_state is changed to be CLOSING by WebSocket::close (http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/WebSocket.cpp#L373). It will call didClose directly, WorkerThreadableWebSocketChannel::disconnect will be invoked. It will clearClientWrapper. So wrapper->m_client->didReceiveMessageError() can't be invoked beacuse wrapper->m_client is clear. (http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/ThreadableWebSocketChannelClientWrapper.cpp#L293)
Csaba Osztrogonác
Comment 8 2012-06-29 01:20:48 PDT
It is valid bug on Qt too.
Li Yin
Comment 9 2012-07-01 19:20:42 PDT
(In reply to comment #8) > It is valid bug on Qt too. Is the result same with chromium? If so, this patch can fix the issue.
Li Yin
Comment 10 2012-07-10 17:53:11 PDT
Hi Yuta, Would you please review this patch? Thanks in advance.
Yuta Kitamura
Comment 11 2012-07-13 00:57:11 PDT
Comment on attachment 148783 [details] Patch I'm not yet sure that this fix is the right fix; my current feeling is that this patch just hides the problem and doesn't fix the root cause. Could you tell me the detailed scenario that causes didClose() callback to be called earlier than didReceiveMessageError() callback? There should be some race inside communication between main thread and worker thread, and my gut feeling is that WebSocket.cpp doesn't need to be changed to fix this issue (but I might be wrong).
Li Yin
Comment 12 2012-07-15 06:36:46 PDT
(In reply to comment #11) > (From update of attachment 148783 [details]) > I'm not yet sure that this fix is the right fix; my current feeling is that this patch just hides the problem and doesn't fix the root cause. > > Could you tell me the detailed scenario that causes didClose() callback to be called earlier than didReceiveMessageError() callback? There should be some race inside communication between main thread and worker thread, and my gut feeling is that WebSocket.cpp doesn't need to be changed to fix this issue (but I might be wrong). When the WebSocket state is connecting, calling close will invoke the didReceiveMessageError function, If the context is workerContext, the didReceiveMessageErrorCallback will be added into event loop, can't be executed immediately. Suppose in that time, the socket is connected, it will trigger WebSocket::didConnect. Here (http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/WebSocket.cpp#L451), it will cause error. In WorkerContext, the task should be added into event loop, so it can work in Doeument, not WorkerContext.
Balazs Ankes
Comment 13 2012-07-25 02:45:27 PDT
Added to Qt skiplist in r123595.
Li Yin
Comment 14 2012-07-25 22:31:30 PDT
The root cause is that we should add the task into EventLoop queue in WorkerContext, but WebSocket::didConnect (http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/WebSocket.cpp#L451) can't follow it. Hi Yuta, I wander if my clarification is more clearer? Any question, please feel free to let me know.
Anders Carlsson
Comment 15 2014-02-05 10:57:25 PST
Comment on attachment 148783 [details] Patch Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.
Alexey Proskuryakov
Comment 16 2014-04-15 16:20:22 PDT
*** Bug 131716 has been marked as a duplicate of this bug. ***
Alexey Proskuryakov
Comment 17 2014-04-15 16:20:46 PDT
Marked as flaky in <http://trac.webkit.org/r167332>
Ahmad Saleem
Comment 18 2024-03-29 22:16:15 PDT
https://chromium.googlesource.com/chromium/blink/+/977662eb6360bab2875aafddfedd2b753aaf9fc7 We have 'http/tests/websocket/tests/hybi/workers/close.html [ Pass Failure ]' expectations for 'iOS' and 'mac'.
Note You need to log in before you can comment on or make changes to this bug.