Bug 89153 - http/tests/websocket/tests/hybi/workers/close.html is flaky
Summary: http/tests/websocket/tests/hybi/workers/close.html is flaky
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Li Yin
URL:
Keywords:
: 131716 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-06-14 18:48 PDT by Yuta Kitamura
Modified: 2017-05-08 14:42 PDT (History)
7 users (show)

See Also:


Attachments
Patch (3.00 KB, patch)
2012-06-21 06:31 PDT, Li Yin
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yuta Kitamura 2012-06-14 18:48:54 PDT
Chromium bug: http://code.google.com/p/chromium/issues/detail?id=132898

Layout test http/tests/websocket/tests/hybi/workers/close.html is failing flakily on Chromium bots on all platforms:

http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Fwebsocket%2Ftests%2Fhybi%2Fworkers%2Fclose.html

Li, apparently this flakiness started from your patch at r120291. Could you take a look?
Comment 1 Li Yin 2012-06-14 19:26:16 PDT
Okay, I will investigate it ASAP.
Sorry for that.
Comment 2 Li Yin 2012-06-14 20:49:51 PDT
I can't reproduce this issue.
I check the newest code from trunk, and http/tests/websocket/tests/hybi/workers/close.html works on chromium-gtk and WebKit-gtk+ port.
Comment 3 Yuta Kitamura 2012-06-14 22:10:56 PDT
* You might have a better luck with Debug builds.
* You might want to repeat to run the test as this failure is flaky.

At least I can reproduce the failure pretty reliably on Chromium ToT with Linux+Debug. I can observe the similar flakiness in a browser as well as DumpRenderTree.

Diff always looks like the following:

--- /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-expected.txt 
+++ /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-actual.txt 
@@ -45,7 +45,6 @@
 Invalid code test: 12
 Code NaN must cause INVALID_ACCESS_ERR.
 PASS PASS: worker: exceptionName is invalidAccessErr
-PASS PASS: onerror() was called.
 runCodeTest: onclose().
 PASS PASS: worker: closeEvent.code is abnormalClosure
 Skip invalid string test.
Comment 4 Li Yin 2012-06-14 22:43:32 PDT
(In reply to comment #3)
> * You might have a better luck with Debug builds.
> * You might want to repeat to run the test as this failure is flaky.
> 
> At least I can reproduce the failure pretty reliably on Chromium ToT with Linux+Debug. I can observe the similar flakiness in a browser as well as DumpRenderTree.
> 
> Diff always looks like the following:
> 
> --- /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-expected.txt 
> +++ /b/build/slave/Webkit_Linux__dbg_/build/layout-test-results/http/tests/websocket/tests/hybi/workers/close-actual.txt 
> @@ -45,7 +45,6 @@
>  Invalid code test: 12
>  Code NaN must cause INVALID_ACCESS_ERR.
>  PASS PASS: worker: exceptionName is invalidAccessErr
> -PASS PASS: onerror() was called.
>  runCodeTest: onclose().
>  PASS PASS: worker: closeEvent.code is abnormalClosure
>  Skip invalid string test.

Oh, I tested it in Release mode of chromium-gtk and WebKit-gtk+ port.

According to your words and my test, that is to say, the test is failed just in debug mode.
Comment 5 Li Yin 2012-06-14 23:32:23 PDT
I reproduce it on Chromium-gtk Debug mode.

I will follow up. Thanks for your reporting.
Comment 6 Li Yin 2012-06-21 06:31:55 PDT
Created attachment 148783 [details]
Patch
Comment 7 Li Yin 2012-06-21 06:53:18 PDT
This is a timing related issue.

Between ThreadableWebSocketChannelClientWrapper::didReceiveMessageError and ThreadableWebSocketChannelClientWrapper::didReceiveMessageErrorCallback,
suppose WorkerThreadableWebSocketChannel::Peer::didConnect() is invoked, then WebSocket::didConnect will be invoked, beacuse the m_state is changed to be CLOSING by WebSocket::close (http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/WebSocket.cpp#L373).
It will call didClose directly, WorkerThreadableWebSocketChannel::disconnect will be invoked. It will clearClientWrapper.

So wrapper->m_client->didReceiveMessageError() can't be invoked beacuse wrapper->m_client is clear.
(http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/ThreadableWebSocketChannelClientWrapper.cpp#L293)
Comment 8 Csaba Osztrogonác 2012-06-29 01:20:48 PDT
It is valid bug on Qt too.
Comment 9 Li Yin 2012-07-01 19:20:42 PDT
(In reply to comment #8)
> It is valid bug on Qt too.

Is the result same with chromium? If so, this patch can fix the issue.
Comment 10 Li Yin 2012-07-10 17:53:11 PDT
Hi Yuta,
   Would you please review this patch? Thanks in advance.
Comment 11 Yuta Kitamura 2012-07-13 00:57:11 PDT
Comment on attachment 148783 [details]
Patch

I'm not yet sure that this fix is the right fix; my current feeling is that this patch just hides the problem and doesn't fix the root cause.

Could you tell me the detailed scenario that causes didClose() callback to be called earlier than didReceiveMessageError() callback? There should be some race inside communication between main thread and worker thread, and my gut feeling is that WebSocket.cpp doesn't need to be changed to fix this issue (but I might be wrong).
Comment 12 Li Yin 2012-07-15 06:36:46 PDT
(In reply to comment #11)
> (From update of attachment 148783 [details])
> I'm not yet sure that this fix is the right fix; my current feeling is that this patch just hides the problem and doesn't fix the root cause.
> 
> Could you tell me the detailed scenario that causes didClose() callback to be called earlier than didReceiveMessageError() callback? There should be some race inside communication between main thread and worker thread, and my gut feeling is that WebSocket.cpp doesn't need to be changed to fix this issue (but I might be wrong).

When the WebSocket state is connecting, calling close will invoke the didReceiveMessageError function,
If the context is workerContext, the didReceiveMessageErrorCallback will be added into event loop, can't be executed immediately.
Suppose in that time, the socket is connected, it will trigger WebSocket::didConnect. 
Here (http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/WebSocket.cpp#L451), it will cause error. In WorkerContext, the task should be added into event loop, so it can work in Doeument, not WorkerContext.
Comment 13 Balazs Ankes 2012-07-25 02:45:27 PDT
Added to Qt skiplist in r123595.
Comment 14 Li Yin 2012-07-25 22:31:30 PDT
The root cause is that we should add the task into EventLoop queue in WorkerContext, but WebSocket::didConnect (http://trac.webkit.org/browser/trunk/Source/WebCore/Modules/websockets/WebSocket.cpp#L451) can't follow it.

Hi Yuta,
I wander if my clarification is more clearer?
Any question, please feel free to let me know.
Comment 15 Anders Carlsson 2014-02-05 10:57:25 PST
Comment on attachment 148783 [details]
Patch

Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.
Comment 16 Alexey Proskuryakov 2014-04-15 16:20:22 PDT
*** Bug 131716 has been marked as a duplicate of this bug. ***
Comment 17 Alexey Proskuryakov 2014-04-15 16:20:46 PDT
Marked as flaky in <http://trac.webkit.org/r167332>