Bug 165082

Summary: [SOUP] Network process crash in WebKit::CustomProtocolManagerImpl::didFailWithError
Product: WebKit Reporter: Michael Catanzaro <mcatanzaro>
Component: WebKitGTKAssignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED    
Severity: Major CC: achristensen, aestes, bugs-noreply, cgarcia, commit-queue, loic.yhuel, mcatanzaro, mkwst
Priority: P2    
Version: WebKit Nightly Build   
Hardware: PC   
OS: Linux   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1398951
https://bugzilla.redhat.com/show_bug.cgi?id=1402191
https://bugzilla.redhat.com/show_bug.cgi?id=1419362
https://bugzilla.redhat.com/show_bug.cgi?id=1426449
Attachments:
Description Flags
Patch
achristensen: review-
New patch
none
Patch none

Description Michael Catanzaro 2016-11-27 07:26:34 PST
This might be code that was recently removed from master, but anyway:

Truncated backtrace:
Thread no. 1 (10 frames)
 #0 WTF::GRefPtr<_GInputStream>::operator! at /usr/src/debug/webkitgtk-2.14.1/Source/WTF/wtf/glib/GRefPtr.h:109
 #1 WebKit::CustomProtocolManagerImpl::didFailWithError at /usr/src/debug/webkitgtk-2.14.1/Source/WebKit2/NetworkProcess/CustomProtocols/soup/CustomProtocolManagerImpl.cpp:123
 #2 IPC::callMemberFunctionImpl<WebKit::CustomProtocolManager, void (WebKit::CustomProtocolManager::*)(unsigned long, WebCore::ResourceError const&), std::tuple<unsigned long, WebCore::ResourceError>, 0ul, 1ul>(WebKit::CustomProtocolManager*, void (WebKit::CustomProtocolManager::*)(unsigned long, WebCore::ResourceError const&), std::tuple<unsigned long, WebCore::ResourceError>&&, std::integer_sequence<unsigned long, 0ul, 1ul>) at /usr/src/debug/webkitgtk-2.14.1/Source/WebKit2/Platform/IPC/HandleMessage.h:13
 #3 IPC::callMemberFunction<WebKit::CustomProtocolManager, void (WebKit::CustomProtocolManager::*)(unsigned long, WebCore::ResourceError const&), std::tuple<unsigned long, WebCore::ResourceError>, std::integer_sequence<unsigned long, 0ul, 1ul> >(std::tuple<unsigned long, WebCore::ResourceError>&&, WebKit::CustomProtocolManager*, void (WebKit::CustomProtocolManager::*)(unsigned long, WebCore::ResourceError const&)) at /usr/src/debug/webkitgtk-2.14.1/Source/WebKit2/Platform/IPC/HandleMessage.h:19
 #4 IPC::handleMessage<Messages::CustomProtocolManager::DidFailWithError, WebKit::CustomProtocolManager, void (WebKit::CustomProtocolManager::*)(unsigned long, WebCore::ResourceError const&)> at /usr/src/debug/webkitgtk-2.14.1/Source/WebKit2/Platform/IPC/HandleMessage.h:99
 #5 WebKit::CustomProtocolManager::didReceiveMessage at /usr/src/debug/webkitgtk-2.14.1/x86_64-redhat-linux-gnu/DerivedSources/WebKit2/CustomProtocolManagerMessageReceiver.cpp:45
 #6 IPC::Connection::dispatchWorkQueueMessageReceiverMessage at /usr/src/debug/webkitgtk-2.14.1/Source/WebKit2/Platform/IPC/Connection.cpp:285
 #7 WTF::Function<void ()>::operator()() const at /usr/src/debug/webkitgtk-2.14.1/Source/WTF/wtf/Function.h:50
 #8 WTF::RunLoop::performWork at /usr/src/debug/webkitgtk-2.14.1/Source/WTF/wtf/RunLoop.cpp:105
 #9 WTF::RunLoop::<lambda(gpointer)>::operator() at /usr/src/debug/webkitgtk-2.14.1/Source/WTF/wtf/glib/RunLoopGLib.cpp:66

Full backtrace downstream.
Comment 1 Michael Catanzaro 2016-11-27 07:26:58 PST
116 reports of this.
Comment 2 Michael Catanzaro 2016-11-27 07:28:34 PST
Oh and the user report: "Simply opened an email within Evolution and the crash occured.  I think the email contained HTML elements."
Comment 3 Michael Catanzaro 2016-12-07 06:12:21 PST
(In reply to comment #1)
> 116 reports of this.

We're now up to 285 reports of this crash.
Comment 4 Michael Catanzaro 2017-01-02 07:52:05 PST
We're now up to 850 reports of this crash (in Fedora). Bug #165655 is potentially-related.
Comment 5 Carlos Garcia Campos 2017-01-03 02:16:15 PST
There is one thing I don't understand from the backtrace, the main thread is a WorkQueue thread. How can this happen? 

So what happen seems to be that CustomProtocolManager messages are handled in the work queue thread instead of the real main thread. Other IPC messages are handled in the real main thread (Thread 11) and we have two threads using CustomProtocolManager at the same time. Another thing is why we ping custom uri schemes, though. I need a way to reproduce this.
Comment 6 Michael Catanzaro 2017-01-03 04:57:13 PST
*** Bug 165655 has been marked as a duplicate of this bug. ***
Comment 7 Michael Catanzaro 2017-01-03 05:16:55 PST
(In reply to comment #5)
> There is one thing I don't understand from the backtrace, the main thread is
> a WorkQueue thread. How can this happen? 

It can't happen. Thread 1 is the crashing thread, not the main thread.

In bug 1398951 the main thread is thread 11 (LWP 8497, lower number than any of the other threads) which is using CustomProtocolManagerImpl at the time of the crash, same as the crashing thread.

In bug 1402191, which I just realized was not public (fixed), the main thread is thread 5 (LWP 5528) which is calling WebKit::NetworkLoad::NetworkLoad. In this backtrace, the only the crashing thread's backtrace contains CustomProtocolManagerImpl.
Comment 8 Carlos Garcia Campos 2017-01-03 05:27:12 PST
(In reply to comment #7)
> (In reply to comment #5)
> > There is one thing I don't understand from the backtrace, the main thread is
> > a WorkQueue thread. How can this happen? 
> 
> It can't happen. Thread 1 is the crashing thread, not the main thread.

Isn't Thread 1 always the main thread?

> In bug 1398951 the main thread is thread 11 (LWP 8497, lower number than any
> of the other threads) which is using CustomProtocolManagerImpl at the time
> of the crash, same as the crashing thread.
> 
> In bug 1402191, which I just realized was not public (fixed), the main
> thread is thread 5 (LWP 5528) which is calling
> WebKit::NetworkLoad::NetworkLoad. In this backtrace, the only the crashing
> thread's backtrace contains CustomProtocolManagerImpl.
Comment 9 Michael Catanzaro 2017-01-03 08:01:15 PST
No, thread 1 in gdb is the thread that crashed.
Comment 10 Michael Catanzaro 2017-01-03 08:03:33 PST
(In reply to comment #9)
> No, thread 1 in gdb is the thread that crashed.

At least in backtraces taken from core dumps it is. Maybe that's different if running a program via gdb.
Comment 11 Carlos Garcia Campos 2017-01-04 01:25:14 PST
Created attachment 297998 [details]
Patch

Ok, I finally understood the problem, see the changelog for the details. Alex, if this can be done in Mac too, I can do it in a follow up patch, or maybe even as part of the refactoring proposed in bug #165028
Comment 12 Alex Christensen 2017-01-04 07:17:27 PST
I just verified that the API test added in r149194 passes on Mac with CustomProtocolManager being a MessageReceiver instead of a WorkQueueMessageReceiver.  I did this with and without NetworkSession, but with NetworkSession also requires changing globalCustomProtocolManager() to be a std::unique_ptr and setCustomProtocolManager to call the std::unique_ptr constructor explicitly.
Let's give it a shot and see what Andy says.  We don't have many clients using WebKit2's CustomProtocolManager because it's not public API, so it's possible we have a similar thread safety issue on Mac and just don't get a lot of stack traces.
Comment 13 Carlos Garcia Campos 2017-01-04 08:22:14 PST
(In reply to comment #12)
> I just verified that the API test added in r149194 passes on Mac with
> CustomProtocolManager being a MessageReceiver instead of a
> WorkQueueMessageReceiver.  I did this with and without NetworkSession, but
> with NetworkSession also requires changing globalCustomProtocolManager() to
> be a std::unique_ptr and setCustomProtocolManager to call the
> std::unique_ptr constructor explicitly.
> Let's give it a shot and see what Andy says.  We don't have many clients
> using WebKit2's CustomProtocolManager because it's not public API, so it's
> possible we have a similar thread safety issue on Mac and just don't get a
> lot of stack traces.

In mac the manager is thread safe, the registered schemes were protected in r149198, in r149520 methods were dispatched to the main thread and in r149121 mutex were added. I think we should revert all those things if we change to use message receiver again. There are a lot of changes in mac code that I can't test myself, but I can try.
Comment 14 Alex Christensen 2017-01-04 11:38:33 PST
(In reply to comment #12)
> ...and setCustomProtocolManager to call the
> std::unique_ptr constructor explicitly.
Woah, definitely don't do this.  Use unique_ptr properly if we make this change.

Why not just make the soup custom protocol handler thread safe?  I think it would be strange to have one port be a work queue message receiver and one port not.
Comment 15 Andy Estes 2017-01-04 12:00:50 PST
(In reply to comment #12)
> I just verified that the API test added in r149194 passes on Mac with
> CustomProtocolManager being a MessageReceiver instead of a
> WorkQueueMessageReceiver.  I did this with and without NetworkSession, but
> with NetworkSession also requires changing globalCustomProtocolManager() to
> be a std::unique_ptr and setCustomProtocolManager to call the
> std::unique_ptr constructor explicitly.
> Let's give it a shot and see what Andy says.  We don't have many clients
> using WebKit2's CustomProtocolManager because it's not public API, so it's
> possible we have a similar thread safety issue on Mac and just don't get a
> lot of stack traces.

It looks like synchronous XHR is now handled by sending a delayed synchronous CoreIPC message from the Web process to the Networking process. The networking process handles the load asynchronously and replies to the Web process when finished.

Since there is no longer a nested, non-common-mode run loop, there's no longer a need for a WorkQueue in CustomProtocolManager.

At least on Mac, there is still a need for some synchronization between the main thread and the thread that NSURLProtocol methods are called on, but that shouldn't affect the Soup implementation.
Comment 16 Andy Estes 2017-01-04 12:09:43 PST
Carlos, if you want to remove the WorkQueue and related synchronization on all platforms, I'd review that patch.
Comment 17 Alex Christensen 2017-01-04 16:32:54 PST
Comment on attachment 297998 [details]
Patch

I agree.  Let's keep this the same for all platforms.
Comment 18 Carlos Garcia Campos 2017-01-04 22:48:46 PST
(In reply to comment #14)
> (In reply to comment #12)
> > ...and setCustomProtocolManager to call the
> > std::unique_ptr constructor explicitly.
> Woah, definitely don't do this.  Use unique_ptr properly if we make this
> change.
> 
> Why not just make the soup custom protocol handler thread safe?  I think it
> would be strange to have one port be a work queue message receiver and one
> port not.

Because we don't want jumps from one thread to the other and mutexes if we can just use the main thread.
Comment 19 Carlos Garcia Campos 2017-01-04 22:49:09 PST
(In reply to comment #16)
> Carlos, if you want to remove the WorkQueue and related synchronization on
> all platforms, I'd review that patch.

Okidoki
Comment 20 Carlos Garcia Campos 2017-01-05 02:50:24 PST
Created attachment 298083 [details]
New patch

Made it common to cocoa and soup. I left the mutexes and initiating run loop thing because I don't know if they are still needed in mac. I'm not sure the Xcode changes are correct.
Comment 21 Carlos Garcia Campos 2017-01-05 02:50:58 PST
Michael, in the stable branch I'll land the other patch
Comment 22 Carlos Garcia Campos 2017-01-05 03:37:08 PST
The new cpp file is not buili in mac, so the Xcode changes are definitely not correct, *sigh*.
Comment 23 Alex Christensen 2017-01-05 12:00:13 PST
Created attachment 298119 [details]
Patch
Comment 24 WebKit Commit Bot 2017-01-05 13:28:48 PST
Comment on attachment 298119 [details]
Patch

Clearing flags on attachment: 298119

Committed r210374: <http://trac.webkit.org/changeset/210374>
Comment 25 WebKit Commit Bot 2017-01-05 13:28:53 PST
All reviewed patches have been landed.  Closing bug.
Comment 26 Carlos Garcia Campos 2017-01-07 00:28:49 PST
Thank you very much, Alex.
Comment 27 Michael Catanzaro 2017-02-02 13:50:41 PST
(In reply to comment #21)
> Michael, in the stable branch I'll land the other patch

It looks like you forgot? We have 190 reports of this crash in Fedora from users who have upgraded to 2.14.3 alone!
Comment 28 Carlos Garcia Campos 2017-02-02 22:28:25 PST
(In reply to comment #27)
> (In reply to comment #21)
> > Michael, in the stable branch I'll land the other patch
> 
> It looks like you forgot? We have 190 reports of this crash in Fedora from
> users who have upgraded to 2.14.3 alone!

I guess I did. I'll add it to the wiki to ensure I won't forget again, it's the best way to make sure.
Comment 29 Loïc Yhuel 2017-04-05 11:07:04 PDT
This commit reverted https://bugs.webkit.org/show_bug.cgi?id=146014 so reintroduced the warning.