Bug 60881

Summary: Inspector may close at the start of the next inspector test in DRT
Product: WebKit Reporter: Csaba Osztrogonác <ossy>
Component: Tools / TestsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Blocker CC: apavlov, cmarcelo, kent.hansen, lars.knoll, loislo, loki, noam, ossy, pfeldman, robert, zherczeg
Priority: P1 Keywords: Qt, QtTriaged
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 60802, 62278    
Attachments:
Description Flags
patch pfeldman: review+

Description Csaba Osztrogonác 2011-05-16 05:23:21 PDT
After r86499 inspector/styles/styles-url-linkify.html fails on 32 bit debug mode:

--- /ramdisk/qt-linux-32-debug/build/layout-test-results/inspector/styles/styles-url-linkify-expected.txt	2011-05-14 16:06:23.691172055 -0700
+++ /ramdisk/qt-linux-32-debug/build/layout-test-results/inspector/styles/styles-url-linkify-actual.txt	2011-05-14 16:06:23.691172055 -0700
@@ -1,19 +1,4 @@
+FAIL: Timed out waiting for notifyDone to be called
 Tests that URLs are linked to and completed correctly. Bugs 51663, 53171
 
 
-Partial URLs completed:
-http://example.com/
-http://example.com/moo
-https://secure.com/moo
-https://secure.com/moo
-http://example.com/moo
-http://example.com/foo/zoo/moo
-http://example.com/foo/boo/moo
-http://example.com/moo
-http://example.com/foo?a=b
-http://example.com/foo?a=b
-Link for a URI from CSS document:
-inspector/styles/resources/fromcss.png
-Link for a URI from iframe inline stylesheet:
-inspector/styles/resources/iframed.png
-
Comment 1 Csaba Osztrogonác 2011-05-16 05:48:11 PDT
Unfortunately inspector/styles/styles-url-linkify.html doesn't fail on its own, but if you run all tests.
Comment 2 Csaba Osztrogonác 2011-05-16 06:44:27 PDT
I can reproduce the fail with executing the 21 tests run after the last DumpRenderTree restart: 

inspector/profiler/heap-snapshot.html
inspector/protocol/console-agent.html
inspector/protocol/runtime-agent.html
inspector/styles/get-set-stylesheet-text.html
inspector/styles/metrics-box-sizing.html
inspector/styles/parse-utf8-bom.html
inspector/styles/styles-add-invalid-property.html
inspector/styles/styles-cancel-editing.html
inspector/styles/styles-commit-editing.html
inspector/styles/styles-computed-trace.html
inspector/styles/styles-disable-inherited.html
inspector/styles/styles-disable-then-change.html
inspector/styles/styles-disable-then-delete.html
inspector/styles/styles-history.html
inspector/styles/styles-iframe.html
inspector/styles/styles-new-API.html
inspector/styles/styles-source-lines-inline.html
inspector/styles/styles-source-lines.html
inspector/styles/styles-source-offsets.html
inspector/styles/styles-update-from-js.html
inspector/styles/styles-url-linkify.html --> FAIL
Comment 3 Csaba Osztrogonác 2011-05-16 07:18:01 PDT
After r86499 fast/js/array-sort-modifying-tostring.html fails on Qt WK2 bot.
It doesn't fail on its own, but fail if you run all fast/js tests.

--- /tmp/layout-test-results/fast/js/array-sort-modifying-tostring-expected.txt	2011-05-16 14:17:02.487090658 +0000
+++ /tmp/layout-test-results/fast/js/array-sort-modifying-tostring-actual.txt	2011-05-16 14:17:02.487090658 +0000
@@ -1,11 +1 @@
-Test of array with toString() override that truncates array.
-
-On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
-
-
-PASS Array length is unchanged.
-PASS Array values are correct.
-PASS successfullyParsed is true
-
-TEST COMPLETE
-
+Timed out waiting for final message from web process
Comment 4 Csaba Osztrogonác 2011-05-18 00:05:11 PDT
Are Nokia interested to fix this regression? Or you specifically want to keep this regression in trunk forever? :(
Comment 5 Zoltan Herczeg 2011-05-18 00:13:19 PDT
I have investigated this bug. When JIT is enabled, two tests fails in the inspector in the same way: the testing itself does not start (function test() does never called, but the timeout mechanism is installed, which yields the error message). When JIT is disabled, there is a crash in inspector/styles/metrics-box-sizing.html. I think these issues are connected (likely a GC related issue).
Comment 6 Zoltan Herczeg 2011-05-18 00:36:03 PDT
The crash happens in ASSERT_GC_OBJECT_LOOKS_VALID(cell); JSCell.cpp:226
Comment 7 Zoltan Herczeg 2011-05-20 06:36:44 PDT
Here is what I found so far (after 3 days of debugging):

In inspector-test.js, layoutTestController.evaluateInWebInspector starts some kind of asyncron execution of function runTestInFrontend(), which does not happen for styles-disable-then-delete.html and styles-url-linkify.html in some unknown circumstances. The question now: why does it fail to start the script execution?
Comment 8 Zoltan Herczeg 2011-05-20 07:11:27 PDT
Source/WebKit/qt/WebCoreSupport/InspectorClientQt.cpp

bool InspectorClientQt::sendMessageToFrontend(const String& message)
{
#if ENABLE(INSPECTOR)
    if (m_inspectedWebPage->d->inspector->d->remoteFrontend) {
        // Code not enters here.
    }
    if (!m_frontendWebPage) // <--- EARLY RETURN HERE!
        return false;

    Page* frontendPage = QWebPagePrivate::core(m_frontendWebPage);
    return doDispatchMessageOnFrontendPage(frontendPage, message);
#else
    return false;
#endif
}

If I run it alone, the m_frontendWebPage does exists. When I run it with run-layout-tests, it does not exists, and there is an early return.
Comment 9 Zoltan Herczeg 2011-05-23 04:25:26 PDT
Now I have a theory what cause this error, and why it is a flakey test:

After every tests:
 1) LayoutTestControllerQt.cpp:382 calls closeWebInspector()
 2) which calls close() in InspectorController.cpp
 3) which calls disconnectFromBackend() in WebCore/generated/InspectorFrontend.cpp
 4) which calls WebInspector.dispatchMessageFromBackend in WebCore/inspector/front-end/inspector.js (this is JavaScript now!)
 5) which calls setTimeout() with "Inspector.disconnectFromBackend" argument

The issue here is the setTimeout() behaviour, which may happen AFTER the next test is started by DumpRenderTree. Hence, the next test starts with closing the WebInspector. From that point, the test will fail with timeout.

WebInspector experts, please help me to solve this issue.
Comment 10 Zoltan Herczeg 2011-05-23 06:26:36 PDT
inspector-test.js :

function closeInspectorAndNotifyDone()
{
    if (window._watchDogTimer)
        clearTimeout(window._watchDogTimer);

    layoutTestController.closeWebInspector();

    // A helper function should be called here which would check
    // whether the inspector exists, and call notifyDone only if not.
    setTimeout(function() {
        layoutTestController.notifyDone();
    }, 0);
}
Comment 11 Ilya Tikhonovsky 2011-05-27 03:17:04 PDT
(In reply to comment #10)
> inspector-test.js :
> 
> function closeInspectorAndNotifyDone()
> {
>     if (window._watchDogTimer)
>         clearTimeout(window._watchDogTimer);
> 
>     layoutTestController.closeWebInspector();
> 
>     // A helper function should be called here which would check
>     // whether the inspector exists, and call notifyDone only if not.
>     setTimeout(function() {
>         layoutTestController.notifyDone();
>     }, 0);
> }

Seems to me it just slow. The same test on chromium-debug usually requires 15 seconds. We are trying to speed up our tests, but it is very tricky process because we are loading our front-end which has 50kloc of js.

http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=styles-url-linkify.html&group=%40ToT%20-%20chromium.org
Comment 12 Zoltan Herczeg 2011-05-27 04:08:18 PDT
> Seems to me it just slow. The same test on chromium-debug usually requires 15 seconds. We are trying to speed up our tests, but it is very tricky process because we are loading our front-end which has 50kloc of js.
> 
> http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=styles-url-linkify.html&group=%40ToT%20-%20chromium.org

I feel the story is not that simple. Both layoutTestController.closeWebInspector() and and layoutTestController.notifyDone() start an async operation, which adds a new event to an event queue. The event queue is allowed to execute the events in any order, hence the layoutTestController.notifyDone() can be performed earlier than layoutTestController.closeWebInspector(). When layoutTestController.notifyDone() is executed, the next test is started by DRT. There is a substring comparison in the begining of the main loop of the DRT, which checks whether the test is an inspector test, and if so, it starts the inspector (even though the previous one was not even closed! -> leaking an inspector here). Right after the test starts, events in the queue are processed again. The next event closes the WebInspector. Thus, the test fail with timeout. Does this make sense?
Comment 13 Zoltan Herczeg 2011-06-08 09:36:55 PDT
I think it is very important to fix this bug. Today it caused new flakey tests! I tried to raise the visibility of the bug, and contacted the inspector developers.

Basically the issue is:
setTimeout(A, 0); setTimeout(B, 0) -> in this case, nothing guarantees, that A runs first, B later. It is likely, but not necessary. The setTimeout only guarantees, that the function will eventually be called.

Both "layoutTestController.closeWebInspector()" and "setTimeout(function() {
layoutTestController.notifyDone(); }, 0)" starts a setTimeout async call. If layoutTestController.notifyDone() comes first, the next test is starting to run. It opens, an inspector (without closing the previous one, etc.) However, the first processed message will be closeInspector(). Later the runInInspector or something fails, since there is no inspector. Since it has no return value, you don't even know that the call is failed. After that, the test stops, and wait for a watchdog timeout event.

Short, but stupid fix:

   setTimeout(function() {
      layoutTestController.notifyDone();
   }, 100 /* much bigger than 0 */);

It will slow down the inspector tests, but will likely works.

More clever idea:
Add a new bool function to DRT, which returns true, if the inspector exists. Before layoutTestController.notifyDone(), check this value, and if it is true, start another setTimeout() call. This is difficult solution, since affects all inspectors, but I feel this is the correct one. However, I don't want to start working on it until someone says this is the best solution. Any better idea is welcome.
Comment 14 Robert Hogan 2011-06-09 11:37:02 PDT
If I'm reading your comments correctly Zoltan, the fix may be to ensure that EventSender sends events in the order they were received. This is a problem with EventSender that affects other tests too, particularly Drag n Drop, and cases where EventSender.skipForward() (or whatever it's called) is used.
Comment 15 Zoltan Herczeg 2011-06-09 11:41:36 PDT
(In reply to comment #14)
> If I'm reading your comments correctly Zoltan, the fix may be to ensure that EventSender sends events in the order they were received. This is a problem with EventSender that affects other tests too, particularly Drag n Drop, and cases where EventSender.skipForward() (or whatever it's called) is used.

http://www.whatwg.org/specs/web-apps/current-work/multipage/timers.html#dom-windowtimers-settimeout

says:

8. Wait until any invocations of this algorithm started before this one
whose timeout is equal to or less than this one's have completed.

It seems this behaviour might be expected by the standard as well, although I am not sure what "Living Standard" means, and how trustworthy the link above.
Comment 16 Csaba Osztrogonác 2011-06-09 19:35:08 PDT
I skipped all inspector/profiler tests on Qt: http://trac.webkit.org/changeset/88514
Comment 17 Csaba Osztrogonác 2011-06-09 19:52:52 PDT
Nowadays Qt buildbot is absolutely useless because of this crazy bug. :-(((

I tried to find a subset of inspector tests and disable them to 
make buildbot green again, but I didn't manage after a day digging. 
I had to disable all inspector tests until fix: http://trac.webkit.org/changeset/88516
Comment 18 Zoltan Herczeg 2011-06-14 06:58:42 PDT
I spent another day for debugging this. It seems the issue is little more complex than I thought. It seems there is another way of calling closeWebInspector() in styles-disable-then-change.html , which affects the next test: styles-disable-then-delete.html . This different closeWebInspector() hits the m_nestingLevel >= maxTimerNestingLevel condition in WebCore/page/DOMTimer.cpp, which increase the minimum 1 ms timeout to 10 ms. However, the notifyDone does not called from here, and the normal code path does not reach this treshold, so the timeout is still 1 ms. 10 ms is way bigger than 1 ms, and the timer fires after the next test is started.

Debugging the inspector is really difficult because it is half JavaScript, half C++, and full of setTimeout async calls.
Comment 19 Ilya Tikhonovsky 2011-06-14 07:52:42 PDT
got a pair of tests where second is constantly failing
inspector/profiler/detailed-heapshots-dominators-shown-node-count-preserved-when-sorting.html
inspector/styles/styles-commit-editing.html
Comment 20 Zoltan Herczeg 2011-06-21 07:13:50 PDT
InspectorBackendStub.js : dispatcher[functionName].apply(dispatcher, params);
calls jsInspectorFrontendHostPrototypeFunctionDisconnectFromBackend

However, it seems DOM.hideNodeHighlight starts the timer. Would it be possible, that the message content changes when the new test starts?
Comment 21 Zoltan Herczeg 2011-06-22 05:28:56 PDT
Finally, I solved this stupid bug. Heh, it nearly turned out to be the first bug that I cannot solve, but it isn't.

Here is the culprit:

inspector.js

var messagesToDispatch = [];

WebInspector.dispatch = function(message) {
    messagesToDispatch.push(message);
    setTimeout(function() {
        InspectorBackend.dispatch(messagesToDispatch.shift());
    }, 0);
}

"messagesToDispatch" is basically a FIFO queue, which means the messages are processed in order, regardless of how the unnamed sub-function is called. Naturally, the lastly appended message in this queue is the closeWebInspector for each test case. However, depending on the test case the setTimeout depth can be increased to 5 to some messages, which increases the minimum timeout to 10 ms to that particular call. This call may be processed after(!) the next test is started, which means the last item in the queue is processed there.

Possible fix: check messagesToDispatch.length() == 0 before reporting the test case finish.
Comment 22 Zoltan Herczeg 2011-06-22 06:19:09 PDT
Created attachment 98160 [details]
patch

Finally! A fix!
Comment 23 Pavel Feldman 2011-06-22 06:41:58 PDT
Comment on attachment 98160 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98160&action=review

> LayoutTests/http/tests/inspector/inspector-test.js:18
> +        if (!WebInspector.dispatchQueueIsEmpty()) {

This looks sane. Before we land this, could you try calling InspectorTest.runAfterPendingDispatches(<didEvaluateForTestInFrontend>) instead? We maintain dispatch counter and can tell at any moment whether all of them are served.
Comment 24 Zoltan Herczeg 2011-06-23 02:21:03 PDT
> This looks sane. Before we land this, could you try calling InspectorTest.runAfterPendingDispatches(<didEvaluateForTestInFrontend>) instead? We maintain dispatch counter and can tell at any moment whether all of them are served.

No, I still get "FAIL: Timed out waiting for notifyDone to be called" for styles-url-linkify.html
Comment 25 Zoltan Herczeg 2011-06-23 05:04:10 PDT
Landed in http://trac.webkit.org/changeset/89556