Bug 42785

Summary: Crash in WebKit2WebProcess in WaitForMultipleObjects beneath WorkQueue::workQueueThreadBody when running tests that produce a lot of output
Product: WebKit Reporter: Adam Roben (:aroben) <aroben>
Component: WebKit2Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: andersca, mjs, zoltan
Priority: P2 Keywords: InRadar, PlatformOnly
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Windows XP   
Attachments:
Description Flags
Patch andersca: review+

Description Adam Roben (:aroben) 2010-07-21 14:01:09 PDT
To reproduce:

1. run-webkit-tests --webkit-test-runner

WebKit2WebProcess.exe crashes when WebKitTestRunner exits. It's crashing due to a null dereference inside WaitForMultipleObjects. Here's the backtrace:

 	00000000()	
 	0236ff10()	
 	kernel32.dll!_WaitForMultipleObjects@16()  + 0x18 bytes	
>	WebKit.dll!WorkQueue::workQueueThreadBody()  Line 63 + 0x1e bytes	C++
 	WebKit.dll!WorkQueue::workQueueThreadBody(void * context=0x01d90a8c)  Line 46	C++
 	JavaScriptCore.dll!WTF::threadEntryPoint(void * contextData=0x01d90c18)  Line 65 + 0x9 bytes	C++
 	JavaScriptCore.dll!WTF::wtfThreadEntryPoint(void * param=0x01d90c48)  Line 202 + 0x9 bytes	C++
 	msvcr80.dll!_callthreadstartex()  Line 348 + 0x6 bytes	C
 	msvcr80.dll!_threadstartex(void * ptd=0x01d90c58)  Line 326 + 0x5 bytes	C
 	kernel32.dll!_BaseThreadStart@8()  + 0x37 bytes
Comment 1 Adam Roben (:aroben) 2010-07-21 14:02:13 PDT
MSDN says:

If one of these handles is closed while the wait is still pending, the function's behavior is undefined.
<http://msdn.microsoft.com/en-us/library/ms687025(VS.85).aspx>

My guess is this is the situation we're running into.
Comment 2 Adam Roben (:aroben) 2010-07-21 14:02:39 PDT
<rdar://problem/8218522>
Comment 3 Anders Carlsson 2010-07-21 14:12:25 PDT
(In reply to comment #1)
> MSDN says:
> 
> If one of these handles is closed while the wait is still pending, the function's behavior is undefined.
> <http://msdn.microsoft.com/en-us/library/ms687025(VS.85).aspx>
> 
> My guess is this is the situation we're running into.

Yeah probably. We need to add a way to unregister handles on a Work Queue.
Comment 4 Adam Roben (:aroben) 2010-07-21 14:52:36 PDT
I thought about using a WorkItem to close the handle. Doing it this way guarantees that we're not waiting on the handle at the time we close it, because WorkItems are only processed when WaitForMultipleObjects is not being called.

However, if the WorkQueue is invalidated before we process one of these close-the-handle WorkItems, the handle will be leaked. And this doesn't help us close the WorkQueue's m_performWorkEvent itself.
Comment 5 Adam Roben (:aroben) 2010-07-21 14:53:48 PDT
If we changed WorkQueue to always call performWork(), even when m_performWorkEvent is not the handle that was signaled, we could still use the WorkItem approach described in comment 4, as we'd be guaranteed that all WorkItems would be processed before the queue stops.
Comment 6 Adam Roben (:aroben) 2010-07-21 14:59:08 PDT
It looks like Mac doesn't process any more WorkItems once a WorkQueue is invalidated. On Windows, we might still process some more WorkItems once a WorkQueue is invalidated. We should fix Windows to work like Mac. This means that my idea in comment 5 won't work.
Comment 7 Anders Carlsson 2010-07-21 15:04:16 PDT
(In reply to comment #6)
> It looks like Mac doesn't process any more WorkItems once a WorkQueue is invalidated. On Windows, we might still process some more WorkItems once a WorkQueue is invalidated. We should fix Windows to work like Mac. This means that my idea in comment 5 won't work.

Agreed. We should also free all work items when a work queue is invalidated.
Comment 8 Adam Roben (:aroben) 2010-07-22 06:33:59 PDT
I think using a pair of events will work. The first event will be used to tell the work queue thread to stop waiting. The second event will be used to tell the main thread that waiting has stopped.
Comment 9 Adam Roben (:aroben) 2010-07-22 06:45:59 PDT
It looks like Connection::invalidate is never being called in the UI process. I don't see any code that would do this. It looks like the right place to do it would be in WebProcessProxy.

Connection::invalidate is also not called in the web process before the crash happens.
Comment 10 Adam Roben (:aroben) 2010-07-22 08:29:13 PDT
I've done some more investigation and identified 3 issues that are probably contributing to this bug. The first is the low-level issue identified in comment 1:

Bug 42826: Crash in WaitForMultipleObjects in WorkQueue because WorkQueue keeps waiting on closed HANDLEs

However, due to what I discovered in comment 9, it's not entirely clear that this is the immediate cause of the bug, since it looks like HANDLEs aren't getting closed at all.

HANDLEs aren't getting closed because of the combination of these two bugs:

Bug 42825: WebKitTestRunner never destroys its main PlatformWebView (and thus its WKView[Ref])
Bug 42828: WebKit2 fails to shut down the web process if some WKPages still exist when the UI process exits

If we fix one or both of those, it's possible the crash will go away. Or maybe then we'll *really* start running into bug 42826.

I'm going to mark this bug as blocked by the above three bugs, even though we might be able to fix the crash by fixing just one of the above bugs, and we aren't totally sure that bug 42826 is even an issue in this case. I'll work on the above three bugs and then see if this crash still happens.
Comment 11 Adam Roben (:aroben) 2010-07-26 12:44:21 PDT
(In reply to comment #6)
> It looks like Mac doesn't process any more WorkItems once a WorkQueue is invalidated. On Windows, we might still process some more WorkItems once a WorkQueue is invalidated.

Specifically, WorkItems added by registerHandle might still be processed. Those added by scheduleWork will not be processed, due to checking done in performWork.
Comment 12 Adam Roben (:aroben) 2010-07-26 12:58:33 PDT
Strangely, turning on full page heap makes the crash go away.
Comment 13 Adam Roben (:aroben) 2010-07-26 13:58:41 PDT
Fixing both bug 42826 and bug 42825 does not fix this crash.

At the time the crash happens, no Connection or WorkQueue has been destroyed or invalidated.

The crash seems to happen soon after the InjectedBundle sends the render tree dump to the UI process.
Comment 14 Adam Roben (:aroben) 2010-07-26 14:30:38 PDT
This command does not crash:

run-webkit-tests --webkit-test-runner fast/js/parseFloat.html

Maybe the crash has to do with the amount of data being sent in the dump?
Comment 15 Maciej Stachowiak 2010-07-26 17:34:37 PDT
(In reply to comment #14)
> This command does not crash:
> 
> run-webkit-tests --webkit-test-runner fast/js/parseFloat.html
> 
> Maybe the crash has to do with the amount of data being sent in the dump?

Maybe large messages get split in some way? (Or need to be but aren't?)
Comment 16 Adam Roben (:aroben) 2010-07-27 07:13:45 PDT
> (In reply to comment #14)
> > This command does not crash:
> > 
> > run-webkit-tests --webkit-test-runner fast/js/parseFloat.html
> > 
> > Maybe the crash has to do with the amount of data being sent in the dump?
> 
> Maybe large messages get split in some way? (Or need to be but aren't?)

<http://trac.webkit.org/changeset/63852> supposedly made CoreIPC able to read large messages. But maybe there's a bug in there somewhere, or something else we have to do.(In reply to comment #15)
Comment 17 Adam Roben (:aroben) 2010-07-28 08:14:43 PDT
As an experiment, I modified InjectedBundle to send strings of a certain length in InjectedBundle::done, rather than sending the test output. Sending a string of 2023 characters crashes almost every time. This turns into a 4074-byte message. It looks like there are usually two 20-byte messages that get sent with it, for a total of 4114 bytes. This is just a little bit over the 4096-byte buffer size we specify to ::CreateNamedPipe. Increasing the buffer size we specify to ::CreateNamedPipe to 8192 bytes fixes the crash in this experiment.

So it looks like we have a problem with sending messages larger than the pipe's buffer size.
Comment 18 Adam Roben (:aroben) 2010-07-28 08:20:04 PDT
MSDN says:

The input and output buffer sizes are advisory. The actual buffer size reserved for each end of the named pipe is either the system default, the system minimum or maximum, or the specified size rounded up to the next allocation boundary. The buffer size specified should be small enough that your process will not run out of nonpaged pool, but large enough to accommodate typical requests.

Whenever a pipe write operation occurs, the system first tries to charge the memory against the pipe write quota. If the remaining pipe write quota is enough to fulfill the request, the write operation completes immediately. If the remaining pipe write quota is too small to fulfill the request, the system will try to expand the buffers to accommodate the data using nonpaged pool reserved for the process. The write operation will block until the data is read from the pipe so that the additional buffer quota can be released. Therefore, if your specified buffer size is too small, the system will grow the buffer as needed, but the downside is that the operation will block. If the operation is overlapped, a system thread is blocked; otherwise, the application thread is blocked.
http://msdn.microsoft.com/en-us/library/aa365150(VS.85).aspx
Comment 19 Adam Roben (:aroben) 2010-07-28 08:21:43 PDT
Maybe the ArgumentEncoder is being destroyed before ::WriteFile has finished accessing its data?
Comment 20 Adam Roben (:aroben) 2010-07-28 08:22:45 PDT
(In reply to comment #19)
> Maybe the ArgumentEncoder is being destroyed before ::WriteFile has finished accessing its data?

The docs for ::WriteFile say:


Accessing the output buffer while a write operation is using the buffer may lead to corruption of the data written from that buffer. Applications must not write to, reallocate, or free the output buffer that a write operation is using until the write operation completes. This can be particularly problematic when using an asynchronous file handle. Additional information regarding synchronous versus asynchronous file handles can be found later in the Synchronization and File Position section and Synchronous and Asynchronous I/O.
http://msdn.microsoft.com/en-us/library/aa365747(VS.85).aspx

This seems to make my theory more plausible.
Comment 21 Adam Roben (:aroben) 2010-07-28 08:24:29 PDT
The docs also say:

Because the write operation starts at the offset that is specified in the OVERLAPPED structure, and WriteFile may return before the system-level write operation is complete (write pending), neither the offset nor any other part of the structure should be modified, freed, or reused by the application until the event is signaled (that is, the write completes).
http://msdn.microsoft.com/en-us/library/aa365747(VS.85).aspx

So, I think <http://trac.webkit.org/changeset/63852> was buggy. We need to wait for ::WriteFile to complete before destroying the ArgumentEncoder.
Comment 22 Adam Roben (:aroben) 2010-07-28 08:31:06 PDT
One way I think we could fix this is:

1) Allocate an event and set it as the overlapped.hEvent member before passing the overlapped structure to ::WriteFile
2) If ::WriteFile succeeds, destroy the event
3) If ::WriteFile fails with ERROR_IO_PENDING, call ::RegisterWaitForSingleObject, passing overlapped.hEvent as the object to wait on and a newly-heap-allocated structure that holds the event and the ArgumentEncoder as the context pointer
4) When the wait callback function is called, destroy the ArgumentEncoder and the event
Comment 23 Adam Roben (:aroben) 2010-07-28 08:38:01 PDT
(In reply to comment #22)
> One way I think we could fix this is:
> 
> 1) Allocate an event and set it as the overlapped.hEvent member before passing the overlapped structure to ::WriteFile
> 2) If ::WriteFile succeeds, destroy the event
> 3) If ::WriteFile fails with ERROR_IO_PENDING, call ::RegisterWaitForSingleObject, passing overlapped.hEvent as the object to wait on and a newly-heap-allocated structure that holds the event and the ArgumentEncoder as the context pointer
> 4) When the wait callback function is called, destroy the ArgumentEncoder and the event

Anders pointed out that this strategy will likely create too many events if many large messages are sent. He suggested having a single write event (like we have a single read event), and changing sendOutgoingMessages to be able to send only some messages, and then resume later.
Comment 24 Adam Roben (:aroben) 2010-07-28 11:33:43 PDT
Created attachment 62850 [details]
Patch
Comment 25 Adam Roben (:aroben) 2010-07-28 12:59:07 PDT
Committed r64223: <http://trac.webkit.org/changeset/64223>