Bug 138468 - [GTK] WebProcess crashes when quickly attempting many DnD operations
Summary: [GTK] WebProcess crashes when quickly attempting many DnD operations
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-06 10:24 PST by Mario Sanchez Prada
Modified: 2016-01-29 08:26 PST (History)
11 users (show)

See Also:


Attachments
Patch proposal (2.35 KB, patch)
2014-11-06 10:42 PST, Mario Sanchez Prada
no flags Details | Formatted Diff | Diff
Proof of concept: don't allow concurrent drag operations for the same WebPage (3.87 KB, patch)
2015-12-24 09:31 PST, Mario Sanchez Prada
no flags Details | Formatted Diff | Diff
Patch proposal + new layout test (13.30 KB, patch)
2016-01-22 08:55 PST, Mario Sanchez Prada
mcatanzaro: review+
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews105 for mac-yosemite-wk2 (935.53 KB, application/zip)
2016-01-22 09:49 PST, Build Bot
no flags Details
Patch proposal + new layout test (2 secs timeout) (13.30 KB, patch)
2016-01-22 10:27 PST, Mario Sanchez Prada
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews105 for mac-yosemite-wk2 (1.02 MB, application/zip)
2016-01-22 12:06 PST, Build Bot
no flags Details
Patch proposal + new layout test (no timeout) (12.79 KB, patch)
2016-01-25 03:11 PST, Mario Sanchez Prada
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews100 for mac-yosemite (767.59 KB, application/zip)
2016-01-25 04:05 PST, Build Bot
no flags Details
Archive of layout-test-results from ews116 for mac-yosemite (795.27 KB, application/zip)
2016-01-25 04:08 PST, Build Bot
no flags Details
Archive of layout-test-results from ews105 for mac-yosemite-wk2 (784.59 KB, application/zip)
2016-01-25 04:10 PST, Build Bot
no flags Details
Patch proposal + new layout test (no timeout) (12.85 KB, patch)
2016-01-25 06:32 PST, Mario Sanchez Prada
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews100 for mac-yosemite (745.04 KB, application/zip)
2016-01-25 07:20 PST, Build Bot
no flags Details
Archive of layout-test-results from ews105 for mac-yosemite-wk2 (790.23 KB, application/zip)
2016-01-25 07:36 PST, Build Bot
no flags Details
Archive of layout-test-results from ews114 for mac-yosemite (793.20 KB, application/zip)
2016-01-25 07:38 PST, Build Bot
no flags Details
Patch proposal + new layout test (no timeout) (14.00 KB, patch)
2016-01-25 13:13 PST, Mario Sanchez Prada
mcatanzaro: review+
Details | Formatted Diff | Diff
Patch to fix the build breakage with GTK+ < 3.16 (6.86 KB, patch)
2016-01-29 07:01 PST, Mario Sanchez Prada
no flags Details | Formatted Diff | Diff
Patch to fix the build breakage with GTK+ < 3.16 (7.02 KB, patch)
2016-01-29 07:36 PST, Mario Sanchez Prada
mcatanzaro: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mario Sanchez Prada 2014-11-06 10:24:07 PST
We recently hit a crash in WebCore::Clipboard::setDestinationOperation() when trying to move around (dragging and dropping) links very quickly inside a WebKitGTK+ (2.4.6) based application using WebKit 1.

At a first glance I thought it might be an issue on how dragSourceEndedAt() is being called from the WebKitGTK+ API, perhaps it does not make sense to do it when the GdkDragAction returned by gdk_drag_context_get_selected_action() is GDK_ACTION_DEFAULT. However, looking at how the same function is being called from other ports I haven't found such an idiom anywhere, and looking at the implementation of EventHandler::dragSourceEndedAt() I see that dragState().clipboard is the only pointer being dereferenced there, which is the source of the crash in this case, as it's null in this specific scenario.

So, I wonder if a null-check is missing there, or whether it might be a deeper issue which we are only seeing the tip of the iceberg of.

See the backtrace below (replace Clipboard with DataTransfer to match the current code in WebCore, as this stack trace is from WebKitGTK+ 2.4.6):

#0  WebCore::Clipboard::setDestinationOperation (this=0x0, 
    operation=operation@entry=WebCore::DragOperationNone)
    at ../Source/WebCore/dom/Clipboard.cpp:382
#1  0xb2d7c797 in WebCore::EventHandler::dragSourceEndedAt (this=0x82ae030, 
    event=..., operation=WebCore::DragOperationNone)
    at ../Source/WebCore/page/EventHandler.cpp:3148
#2  0xb26851b9 in webkit_web_view_drag_end (widget=0x82851a0, 
    context=0x80c2de0) at ../Source/WebKit/gtk/webkit/webkitwebview.cpp:1545
#3  0xb7f75f7c in g_cclosure_marshal_VOID__OBJECTv ()
   from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#4  0xb7f71807 in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#5  0xb7f73161 in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#6  0xb7f8d4ab in g_signal_emit_valist ()
   from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#7  0xb7f8e4d5 in g_signal_emit_by_name ()
   from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#8  0xb799d64b in ?? () from /usr/lib/i386-linux-gnu/libgtk-3.so.0
#9  0xb799d8e1 in ?? () from /usr/lib/i386-linux-gnu/libgtk-3.so.0
#10 0xb6c3ed4d in ?? () from /usr/lib/i386-linux-gnu/libgdk-3.so.0
#11 0xb7e9fd4e in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#12 0xb7ea03ef in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#13 0xb7ea3192 in g_main_context_dispatch ()
   from /lib/i386-linux-gnu/libglib-2.0.so.0
#14 0xb7ea34b0 in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#15 0xb7ea3591 in g_main_context_iteration ()
   from /lib/i386-linux-gnu/libglib-2.0.so.0
#16 0xb73ead64 in g_application_run ()
   from /usr/lib/i386-linux-gnu/libgio-2.0.so.0
#17 0xb7c1e48e in ffi_call_SYSV () from /usr/lib/i386-linux-gnu/libffi.so.6
#18 0xb7c1e1ef in ffi_call () from /usr/lib/i386-linux-gnu/libffi.so.6
#19 0xb7dfdce8 in ?? () from /usr/lib/libgjs.so.0
#20 0xb7dff5ff in ?? () from /usr/lib/libgjs.so.0
#21 0xb6e7a440 in ?? () from /usr/lib/i386-linux-gnu/libmozjs-24.so.0
#22 0xb6e869ea in ?? () from /usr/lib/i386-linux-gnu/libmozjs-24.so.0
#23 0xb6e876b6 in ?? () from /usr/lib/i386-linux-gnu/libmozjs-24.so.0
#24 0xb6e88865 in ?? () from /usr/lib/i386-linux-gnu/libmozjs-24.so.0
#25 0xb6f29d33 in JS::Evaluate(JSContext*, JS::Handle<JSObject*>, JS::CompileOptions, unsigned short const*, unsigned int, JS::Value*) ()
   from /usr/lib/i386-linux-gnu/libmozjs-24.so.0
#26 0xb6f2a348 in JS::Evaluate(JSContext*, JS::Handle<JSObject*>, JS::CompileOptions, char const*, unsigned int, JS::Value*) ()
   from /usr/lib/i386-linux-gnu/libmozjs-24.so.0
#27 0xb7def9eb in gjs_eval_with_scope () from /usr/lib/libgjs.so.0
#28 0xb7de91d2 in gjs_context_eval () from /usr/lib/libgjs.so.0
#29 0x08048d65 in main ()


For what is worth, I added the null check locally and re-run the layout tests, and I could not spot any regression after that.

PS: Martin, I'm adding you to CC as I believe you have experience in the drag'n'drop stuff (specifically in the GTK+ port), but please feel free to ignore (or add someone else) if my assumption is wrong. Thanks!
Comment 1 Mario Sanchez Prada 2014-11-06 10:42:58 PST
Created attachment 241117 [details]
Patch proposal

I forgot to mention this before, but after checking it with gdb and 'threads apply all bt' it does not look to me as a threading issue, but more related to the fact that we are calling dragSourceEndedAt() with DragOperationNone as the 'operation' parameter, so I wonder if that's what's causing that we dataTransfer is null in this case (thus the need for the check)

I'm also currently attaching the patch that I've been testing today, to make it easier to discuss.
Comment 2 Mario Sanchez Prada 2014-11-07 03:46:45 PST
Adding to CC people suggested by webkit-patch
Comment 3 Mario Sanchez Prada 2015-12-24 09:28:44 PST
I'm renaming this issue after the investigation carried out during the Web Engines Hackfest and a few more experiments I run today, since it seems clear now to me that the problem is not in EventHandler itself, but in how the GTK port is using it.

More specifically, it seems that WebCore is only able to handle one DnD for the main frame of the current page at the same time, since the flow looks something like this, for a single DnD operation:
 
  1. Once you start dragging and the "drag hysteresis" is exceeded, a DataTransfer object is created and the UI Process is notified 
  2. The startDrag() event is handled in the UI Process by WebKitGTK's DragAndDropHandler, which calls into gtk_drag_begin(webView)
  3. Once you stop dragging (button released), "drag-end" signal is emitted for the WebView, calling DragAndDropHandler::finishDrag()
  4. Retrieved all the necessary info about the drag event, a "DragEnded" message is sent to the WebPage in the web process
  5. From WebPage::dragEnded(), m_page->mainFrame().eventHandler().dragSourceEndedAt(event, (DragOperation)operation) is called
  6. Back in EventHandler, dragSourceEndedAt() is executed, which will call dragState().dataTransfer->setDestinationOperation(operation)
  7. That will work as dragState().dataTransfer (created in step 1) will be still valid for the Frame

What I found is that, when doing DnD very fast over the same element (e.g. a link), WebKit2GTK+ would mix things up as it's very likely that more than one call to PageClientImpl::startDrag() happen before the "drag-end" signal is emitted for the first DnD operation, resulting in EventHandler::dragSourceEndedAt() being called over the wrong DataTransfer element for the first time that the DragEnded message is sent:

As a new DnD operation has been generated by the time the first DragEnded message is received back in the WebProcess, the DataTransfer element in there will be associated to the 2nd DnD operation, not the original 1st one. And this, in the worst case scenario, can lead to the Web process to crash if we are "unlucky enough" to have that DataTransfer object already cleared/freed by the time that second DragEnded message is received, which is not an easy to reproduce event, but can happen (and I became pretty good at crashing it).

So, I guess one way to fix this would be to allow the EventHandler handle different DnD operations in a better way, but not sure if it's worth the effort, since it's an unlikely scenario anyway. The other way I can see to fix it is to handle this situation in a better way in the DragAndDropHandler class of the GTK port, so that (perhaps) an "obsolete" DnD operation for a given WebPage is cancelled if a new one arrives before having dealt with that one first.

Opinions?
Comment 4 Mario Sanchez Prada 2015-12-24 09:31:53 PST
Created attachment 267900 [details]
Proof of concept: don't allow concurrent drag operations for the same WebPage

See attached a patch that implements the second approach proposed, for discussion purposes only for now (not sure if the approach is correct, lacking tests)
Comment 5 Michael Catanzaro 2015-12-26 13:07:09 PST
Comment on attachment 267900 [details]
Proof of concept: don't allow concurrent drag operations for the same WebPage

View in context: https://bugs.webkit.org/attachment.cgi?id=267900&action=review

Seems like a good approach to me.

> Source/WebKit2/UIProcess/gtk/DragAndDropHandler.cpp:122
> +    // WebCore::EventHandler does not support more than one DnD operation ata the same time for

"ata" -> "at"
Comment 6 Carlos Garcia Campos 2015-12-28 01:27:24 PST
Comment on attachment 267900 [details]
Proof of concept: don't allow concurrent drag operations for the same WebPage

LGTM
Comment 7 Mario Sanchez Prada 2015-12-28 15:00:47 PST
Thanks for the feedback. I'm now on holidays (coming back on Jan the 6th) but replying now just to let you know I'll be working on this once I'm back based on the positive comments, trying to incorporate a test to the patch before formally asking for review.

Thanks, and Merry Christmas
Comment 8 Mario Sanchez Prada 2016-01-22 08:55:42 PST
Created attachment 269573 [details]
Patch proposal + new layout test

Sorry for the delay, got busy with other things after holidays and could not find the right time to work on this before.

Now attaching a cleaned up patch together with a new layout test to make sure that we don't crash. Also, even though this issue has been detected on GTK, I think it makes sense to have the patch as a cross platform one, which is why I haven't placed it under the platform/gtk/ directory.

Please review, thanks!
Comment 9 Michael Catanzaro 2016-01-22 09:26:10 PST
Comment on attachment 269573 [details]
Patch proposal + new layout test

Great test! I presume it reliably crashes the web process on your machine?

Please wait one day before committing, just in case someone else has comments.
Comment 10 Mario Sanchez Prada 2016-01-22 09:32:02 PST
(In reply to comment #9)
> Comment on attachment 269573 [details]
> Patch proposal + new layout test
> 
> Great test!

Thanks

> I presume it reliably crashes the web process on your machine?

Not reliably, but "most of the times only". Think the nature of this issue is not really easy to reproduce so, after quite some time trying a 100% reliable test, in the end I decided to go for this one that crashes most of the times instead, which is much better than nothing IMHO.
 
> Please wait one day before committing, just in case someone else has
> comments.

No problem, I was going to wait until Monday anyway
Comment 11 Build Bot 2016-01-22 09:49:26 PST
Comment on attachment 269573 [details]
Patch proposal + new layout test

Attachment 269573 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/725988

New failing tests:
fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html
Comment 12 Build Bot 2016-01-22 09:49:31 PST
Created attachment 269575 [details]
Archive of layout-test-results from ews105 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 13 Mario Sanchez Prada 2016-01-22 10:25:33 PST
--- /Volumes/Data/EWS/WebKit/WebKitBuild/Release/layout-test-results/fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash-expected.txt
+++ /Volumes/Data/EWS/WebKit/WebKitBuild/Release/layout-test-results/fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash-actual.txt
@@ -28,7 +28,7 @@
 
 
 PASS dragHasStarted is true
-PASS dragHasEnded is true
+FAIL dragHasEnded should be true. Was false.
 PASS successfullyParsed is true
 
 TEST COMPLETE


This is weird... maybe the 1sec timeout that I added to ensure that the ondragend callback has been called is not enough on the Mac? I'll upload a new version of the patch with a 2sec timeout instead to try then...
Comment 14 Mario Sanchez Prada 2016-01-22 10:27:14 PST
Created attachment 269578 [details]
Patch proposal + new layout test (2 secs timeout)

Let's see what the EWS thinks of this one...
Comment 15 Build Bot 2016-01-22 12:06:51 PST
Comment on attachment 269578 [details]
Patch proposal + new layout test (2 secs timeout)

Attachment 269578 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/726393

New failing tests:
fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html
Comment 16 Build Bot 2016-01-22 12:06:56 PST
Created attachment 269598 [details]
Archive of layout-test-results from ews105 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 17 Mario Sanchez Prada 2016-01-22 18:33:48 PST
It failed again, which does not make many sense IMHO, as it means the ondragend event is never executed, on the mac... maybe it's a bug in that platform?

I'll check this again on Monday but, unless I can find a reasonable explanation, I'm thinking I'll perhaps be pushing the previos patch (only 1sec timeout) adding this new test to the TestExpectations on the Mac.
Comment 18 Michael Catanzaro 2016-01-23 07:27:36 PST
Comment on attachment 269573 [details]
Patch proposal + new layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=269573&action=review

> LayoutTests/fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html:51
> +    window.setTimeout(finishTest, 1000);

Could you [please] get rid of this nasty delay by counting the number of executions of dragEnd(), and calling finish() once it hits maxNumberOfRuns?
Comment 19 Mario Sanchez Prada 2016-01-25 02:59:13 PST
(In reply to comment #18)
> Comment on attachment 269573 [details]
> Patch proposal + new layout test
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=269573&action=review
> 
> > LayoutTests/fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html:51
> > +    window.setTimeout(finishTest, 1000);
> 
> Could you [please] get rid of this nasty delay by counting the number of
> executions of dragEnd(), and calling finish() once it hits maxNumberOfRuns?

Yes, that's what I thought over the weekend I would be doing. Patch coming soon...
Comment 20 Mario Sanchez Prada 2016-01-25 03:11:29 PST
Created attachment 269735 [details]
Patch proposal + new layout test (no timeout)

Modified layout test not to use a timeout.
Comment 21 Build Bot 2016-01-25 04:05:08 PST
Comment on attachment 269735 [details]
Patch proposal + new layout test (no timeout)

Attachment 269735 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/736436

New failing tests:
fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html
Comment 22 Build Bot 2016-01-25 04:05:13 PST
Created attachment 269737 [details]
Archive of layout-test-results from ews100 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 23 Build Bot 2016-01-25 04:08:09 PST
Comment on attachment 269735 [details]
Patch proposal + new layout test (no timeout)

Attachment 269735 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/736423

New failing tests:
fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html
Comment 24 Build Bot 2016-01-25 04:08:15 PST
Created attachment 269738 [details]
Archive of layout-test-results from ews116 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews116  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 25 Build Bot 2016-01-25 04:10:07 PST
Comment on attachment 269735 [details]
Patch proposal + new layout test (no timeout)

Attachment 269735 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/736439

New failing tests:
fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html
Comment 26 Build Bot 2016-01-25 04:10:13 PST
Created attachment 269739 [details]
Archive of layout-test-results from ews105 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 27 Mario Sanchez Prada 2016-01-25 06:32:05 PST
Created attachment 269744 [details]
Patch proposal + new layout test (no timeout)
Comment 28 Build Bot 2016-01-25 07:20:47 PST
Comment on attachment 269744 [details]
Patch proposal + new layout test (no timeout)

Attachment 269744 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/736948

New failing tests:
fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html
Comment 29 Build Bot 2016-01-25 07:20:53 PST
Created attachment 269746 [details]
Archive of layout-test-results from ews100 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 30 Build Bot 2016-01-25 07:35:57 PST
Comment on attachment 269744 [details]
Patch proposal + new layout test (no timeout)

Attachment 269744 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/736972

New failing tests:
fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html
Comment 31 Build Bot 2016-01-25 07:36:04 PST
Created attachment 269748 [details]
Archive of layout-test-results from ews105 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 32 Build Bot 2016-01-25 07:38:44 PST
Comment on attachment 269744 [details]
Patch proposal + new layout test (no timeout)

Attachment 269744 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/736940

New failing tests:
fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html
Comment 33 Build Bot 2016-01-25 07:38:50 PST
Created attachment 269749 [details]
Archive of layout-test-results from ews114 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 34 Michael Catanzaro 2016-01-25 07:59:53 PST
Thanks for removing the timeout from the test. We should only use timeouts when absolutely necessary since it makes the test vulnerable to flakiness, and significantly slows down the tests.

At this point, I would stop fighting with the bots and just land this. I would file a Mac bug to make this test work, and land it with an expected fail in the Mac test expectations, so it might get fixed if someone with a Mac is interested. I agree, it's a test that ought to pass on every platform.
Comment 35 Mario Sanchez Prada 2016-01-25 08:24:24 PST
(In reply to comment #34)
> Thanks for removing the timeout from the test. We should only use timeouts
> when absolutely necessary since it makes the test vulnerable to flakiness,
> and significantly slows down the tests.

Agree. Shame on me I put it in the first place, happy to see it go away...

> At this point, I would stop fighting with the bots and just land this. I
> would file a Mac bug to make this test work, and land it with an expected
> fail in the Mac test expectations, so it might get fixed if someone with a
> Mac is interested. I agree, it's a test that ought to pass on every platform.

It's kind of weird, though, that the ondragend callback is not the only thing failing on the Mac now: when I had the timeout I could also see the prints from every call to runNextStep(), which are gone now as well.

Anyway, I agree it's probably not the most efficient use of time to keep fighting the EWS, but still I'd like to give it one more shot before adding it to the TestExpectations Mac file and calling it a day: tonight I'm hoping to do some investigation on my mac mini, see if I can figure out what's wrong. If after that I still have no clue, then I'll do what you just said.
Comment 36 Mario Sanchez Prada 2016-01-25 13:08:42 PST
(In reply to comment #35)
> [...]
> It's kind of weird, though, that the ondragend callback is not the only
> thing failing on the Mac now: when I had the timeout I could also see the
> prints from every call to runNextStep(), which are gone now as well.

Looks like the issue in the mac-wk2 is expected after all, as I can see a whole lot of tests being skipped already in the TestExpectations file for that platform, with a reference to bug 42194 ("WebKitTestRunner needs a more-complete implementation of eventSender"), which explains the timeout.

As for the failures still reported for the other mac bots, they seem to happen due to the last print happening out of order, which likely means we will still a timeout in there, even if an small one. So, I'm going to try a new patch introducing zero-seconds timeouts to simulate an on_idle() callback, as it's done in other drag-and-drop tests. Let's see if that helps...

I'll also add the test to the mac-wk2 TestExpectations file now, with a reference to bug 42194.

> Anyway, I agree it's probably not the most efficient use of time to keep
> fighting the EWS, but still I'd like to give it one more shot before adding
> it to the TestExpectations Mac file and calling it a day: tonight I'm hoping
> to do some investigation on my mac mini, see if I can figure out what's
> wrong. If after that I still have no clue, then I'll do what you just said.

JFTR, I've reduced the new layout test to remove the WKT/DRT specific stuff and run it manually on my Mac mini, and everything worked as expected, which kind of confirms this is an issue with the helper tools for testing, and not with WK or the test itself.
Comment 37 Mario Sanchez Prada 2016-01-25 13:13:48 PST
Created attachment 269782 [details]
Patch proposal + new layout test (no timeout)

Added new patch adding zero-seconds timeout and failure expectation for the mac-wk2 bot, as mentioned before.

Fingers crossed...
Comment 38 Mario Sanchez Prada 2016-01-25 15:47:33 PST
Comment on attachment 269782 [details]
Patch proposal + new layout test (no timeout)

Finally! Setting the r? flag now, please review. Thanks!
Comment 39 Michael Catanzaro 2016-01-25 17:35:02 PST
Comment on attachment 269782 [details]
Patch proposal + new layout test (no timeout)

View in context: https://bugs.webkit.org/attachment.cgi?id=269782&action=review

> LayoutTests/fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html:83
> +<div id="console"></div>

Is this div doing anything?
Comment 40 Mario Sanchez Prada 2016-01-26 01:42:45 PST
(In reply to comment #39)
> Comment on attachment 269782 [details]
> Patch proposal + new layout test (no timeout)
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=269782&action=review
> 
> > LayoutTests/fast/events/drag-and-drop-link-fast-multiple-times-does-not-crash.html:83
> > +<div id="console"></div>
> 
> Is this div doing anything?

Yes. It's where javascript functions such as debug() or shouldBeTrue() will "print" their output into. See LayoutTests/resources/js-test-pre.js
Comment 41 Mario Sanchez Prada 2016-01-26 01:46:06 PST
Committed r195586: <http://trac.webkit.org/changeset/195586>
Comment 42 Alexey Proskuryakov 2016-01-26 09:20:09 PST
FWIW, js-test-pre.js will create the console element if there isn't one in markup. It is sometimes useful to have an explicit one in test source to guarantee its position in the DOM tree, which may be the case here.
Comment 43 Darin Adler 2016-01-26 09:25:14 PST
(In reply to comment #42)
> FWIW, js-test-pre.js will create the console element if there isn't one in
> markup. It is sometimes useful to have an explicit one in test source to
> guarantee its position in the DOM tree, which may be the case here.

To put this in a larger context, tests using js-test-pre.js are often written in ways that are needlessly complex, for example with the test JavaScript put in a separate file (valuable for pure JavaScript tests that can be run without involving HTML, but not a good practice for the other tests), or with explicit elements like the console that are only needed in certain special cases, not for all tests.

When writing new tests we want to avoid repeating these old patterns unnecessarily.
Comment 44 Mario Sanchez Prada 2016-01-29 07:01:47 PST
Created attachment 270206 [details]
Patch to fix the build breakage with GTK+ < 3.16

It seems we overlooked the fact that gtk_drag_cancel() is available from GTK+ 3.16, meaning we broke the build for older versions of GTK from 3.6 onwards.

Unfortunately, there's no clear alternative to gtk_drag_cancel() in GTK to reliably cancel DnD operations so we are proposing this patch that basically restores the old behaviour for builds using old versions of GTK+, while keeping it fixed for newer ones. While this is non-optimal, it seems like the best way to fix this problem (which is not easy to reproduce anyway) at least in the short term.

Please review, thanks!
Comment 45 Mario Sanchez Prada 2016-01-29 07:02:45 PST
Reopening this for now
Comment 46 Mario Sanchez Prada 2016-01-29 07:36:47 PST
Created attachment 270212 [details]
Patch to fix the build breakage with GTK+ < 3.16

Now uploading a patch that actually builds!
Comment 47 Michael Catanzaro 2016-01-29 08:14:25 PST
Comment on attachment 270212 [details]
Patch to fix the build breakage with GTK+ < 3.16

:/
Comment 48 Mario Sanchez Prada 2016-01-29 08:26:39 PST
Committed r195811: <http://trac.webkit.org/changeset/195811>