Chromium bug: http://code.google.com/p/chromium/issues/detail?id=132898 Layout test http/tests/websocket/tests/hybi/workers/close.html is failing flakily on Chromium bots on all platforms: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Fwebsocket%2Ftests%2Fhybi%2Fworkers%2Fclose.html Li, apparently this flakiness started from your patch at r120291. Could you take a look?
Okay, I will investigate it ASAP. Sorry for that.
I can't reproduce this issue. I check the newest code from trunk, and http/tests/websocket/tests/hybi/workers/close.html works on chromium-gtk and WebKit-gtk+ port.
* You might have a better luck with Debug builds. * You might want to repeat to run the test as this failure is flaky. At least I can reproduce the failure pretty reliably on Chromium ToT with Linux+Debug. I can observe the similar flakiness in a browser as well as DumpRenderTree. Diff always looks like the following: --- /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-expected.txt +++ /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-actual.txt @@ -45,7 +45,6 @@ Invalid code test: 12 Code NaN must cause INVALID_ACCESS_ERR. PASS PASS: worker: exceptionName is invalidAccessErr -PASS PASS: onerror() was called. runCodeTest: onclose(). PASS PASS: worker: closeEvent.code is abnormalClosure Skip invalid string test.
(In reply to comment #3) > * You might have a better luck with Debug builds. > * You might want to repeat to run the test as this failure is flaky. > > At least I can reproduce the failure pretty reliably on Chromium ToT with Linux+Debug. I can observe the similar flakiness in a browser as well as DumpRenderTree. > > Diff always looks like the following: > > --- /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-expected.txt > +++ /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-actual.txt > @@ -45,7 +45,6 @@ > Invalid code test: 12 > Code NaN must cause INVALID_ACCESS_ERR. > PASS PASS: worker: exceptionName is invalidAccessErr > -PASS PASS: onerror() was called. > runCodeTest: onclose(). > PASS PASS: worker: closeEvent.code is abnormalClosure > Skip invalid string test. Oh, I tested it in Release mode of chromium-gtk and WebKit-gtk+ port. According to your words and my test, that is to say, the test is failed just in debug mode.
I reproduce it on Chromium-gtk Debug mode. I will follow up. Thanks for your reporting.
Created attachment 148783 [details] Patch
This is a timing related issue. Between ThreadableWebSocketChannelClientWrapper::didReceiveMessageError and ThreadableWebSocketChannelClientWrapper::didReceiveMessageErrorCallback, suppose WorkerThreadableWebSocketChannel::Peer::didConnect() is invoked, then WebSocket::didConnect will be invoked, beacuse the m_state is changed to be CLOSING by WebSocket::close (http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/WebSocket.cpp#L373). It will call didClose directly, WorkerThreadableWebSocketChannel::disconnect will be invoked. It will clearClientWrapper. So wrapper->m_client->didReceiveMessageError() can't be invoked beacuse wrapper->m_client is clear. (http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/ThreadableWebSocketChannelClientWrapper.cpp#L293)
It is valid bug on Qt too.
(In reply to comment #8) > It is valid bug on Qt too. Is the result same with chromium? If so, this patch can fix the issue.
Hi Yuta, Would you please review this patch? Thanks in advance.
Comment on attachment 148783 [details] Patch I'm not yet sure that this fix is the right fix; my current feeling is that this patch just hides the problem and doesn't fix the root cause. Could you tell me the detailed scenario that causes didClose() callback to be called earlier than didReceiveMessageError() callback? There should be some race inside communication between main thread and worker thread, and my gut feeling is that WebSocket.cpp doesn't need to be changed to fix this issue (but I might be wrong).
(In reply to comment #11) > (From update of attachment 148783 [details]) > I'm not yet sure that this fix is the right fix; my current feeling is that this patch just hides the problem and doesn't fix the root cause. > > Could you tell me the detailed scenario that causes didClose() callback to be called earlier than didReceiveMessageError() callback? There should be some race inside communication between main thread and worker thread, and my gut feeling is that WebSocket.cpp doesn't need to be changed to fix this issue (but I might be wrong). When the WebSocket state is connecting, calling close will invoke the didReceiveMessageError function, If the context is workerContext, the didReceiveMessageErrorCallback will be added into event loop, can't be executed immediately. Suppose in that time, the socket is connected, it will trigger WebSocket::didConnect. Here (http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/WebSocket.cpp#L451), it will cause error. In WorkerContext, the task should be added into event loop, so it can work in Doeument, not WorkerContext.
Added to Qt skiplist in r123595.
The root cause is that we should add the task into EventLoop queue in WorkerContext, but WebSocket::didConnect (http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/WebSocket.cpp#L451) can't follow it. Hi Yuta, I wander if my clarification is more clearer? Any question, please feel free to let me know.
Comment on attachment 148783 [details] Patch Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.
*** Bug 131716 has been marked as a duplicate of this bug. ***
Marked as flaky in <http://trac.webkit.org/r167332>
https://chromium.googlesource.com/chromium/blink/+/977662eb6360bab2875aafddfedd2b753aaf9fc7 We have 'http/tests/websocket/tests/hybi/workers/close.html [ Pass Failure ]' expectations for 'iOS' and 'mac'.