RESOLVED FIXED 165082
[SOUP] Network process crash in WebKit::CustomProtocolManagerImpl::didFailWithError
https://bugs.webkit.org/show_bug.cgi?id=165082
Summary [SOUP] Network process crash in WebKit::CustomProtocolManagerImpl::didFailWit...
Michael Catanzaro
Reported 2016-11-27 07:26:34 PST
This might be code that was recently removed from master, but anyway: Truncated backtrace: Thread no. 1 (10 frames) #0 WTF::GRefPtr<_GInputStream>::operator! at /usr/src/debug/webkitgtk-2.14.1/Source/WTF/wtf/glib/GRefPtr.h:109 #1 WebKit::CustomProtocolManagerImpl::didFailWithError at /usr/src/debug/webkitgtk-2.14.1/Source/WebKit2/NetworkProcess/CustomProtocols/soup/CustomProtocolManagerImpl.cpp:123 #2 IPC::callMemberFunctionImpl<WebKit::CustomProtocolManager, void (WebKit::CustomProtocolManager::*)(unsigned long, WebCore::ResourceError const&), std::tuple<unsigned long, WebCore::ResourceError>, 0ul, 1ul>(WebKit::CustomProtocolManager*, void (WebKit::CustomProtocolManager::*)(unsigned long, WebCore::ResourceError const&), std::tuple<unsigned long, WebCore::ResourceError>&&, std::integer_sequence<unsigned long, 0ul, 1ul>) at /usr/src/debug/webkitgtk-2.14.1/Source/WebKit2/Platform/IPC/HandleMessage.h:13 #3 IPC::callMemberFunction<WebKit::CustomProtocolManager, void (WebKit::CustomProtocolManager::*)(unsigned long, WebCore::ResourceError const&), std::tuple<unsigned long, WebCore::ResourceError>, std::integer_sequence<unsigned long, 0ul, 1ul> >(std::tuple<unsigned long, WebCore::ResourceError>&&, WebKit::CustomProtocolManager*, void (WebKit::CustomProtocolManager::*)(unsigned long, WebCore::ResourceError const&)) at /usr/src/debug/webkitgtk-2.14.1/Source/WebKit2/Platform/IPC/HandleMessage.h:19 #4 IPC::handleMessage<Messages::CustomProtocolManager::DidFailWithError, WebKit::CustomProtocolManager, void (WebKit::CustomProtocolManager::*)(unsigned long, WebCore::ResourceError const&)> at /usr/src/debug/webkitgtk-2.14.1/Source/WebKit2/Platform/IPC/HandleMessage.h:99 #5 WebKit::CustomProtocolManager::didReceiveMessage at /usr/src/debug/webkitgtk-2.14.1/x86_64-redhat-linux-gnu/DerivedSources/WebKit2/CustomProtocolManagerMessageReceiver.cpp:45 #6 IPC::Connection::dispatchWorkQueueMessageReceiverMessage at /usr/src/debug/webkitgtk-2.14.1/Source/WebKit2/Platform/IPC/Connection.cpp:285 #7 WTF::Function<void ()>::operator()() const at /usr/src/debug/webkitgtk-2.14.1/Source/WTF/wtf/Function.h:50 #8 WTF::RunLoop::performWork at /usr/src/debug/webkitgtk-2.14.1/Source/WTF/wtf/RunLoop.cpp:105 #9 WTF::RunLoop::<lambda(gpointer)>::operator() at /usr/src/debug/webkitgtk-2.14.1/Source/WTF/wtf/glib/RunLoopGLib.cpp:66 Full backtrace downstream.
Attachments
Patch (5.59 KB, patch)
2017-01-04 01:25 PST, Carlos Garcia Campos
achristensen: review-
New patch (48.83 KB, patch)
2017-01-05 02:50 PST, Carlos Garcia Campos
no flags
Patch (50.36 KB, patch)
2017-01-05 12:00 PST, Alex Christensen
no flags
Michael Catanzaro
Comment 1 2016-11-27 07:26:58 PST
116 reports of this.
Michael Catanzaro
Comment 2 2016-11-27 07:28:34 PST
Oh and the user report: "Simply opened an email within Evolution and the crash occured. I think the email contained HTML elements."
Michael Catanzaro
Comment 3 2016-12-07 06:12:21 PST
(In reply to comment #1) > 116 reports of this. We're now up to 285 reports of this crash.
Michael Catanzaro
Comment 4 2017-01-02 07:52:05 PST
We're now up to 850 reports of this crash (in Fedora). Bug #165655 is potentially-related.
Carlos Garcia Campos
Comment 5 2017-01-03 02:16:15 PST
There is one thing I don't understand from the backtrace, the main thread is a WorkQueue thread. How can this happen? So what happen seems to be that CustomProtocolManager messages are handled in the work queue thread instead of the real main thread. Other IPC messages are handled in the real main thread (Thread 11) and we have two threads using CustomProtocolManager at the same time. Another thing is why we ping custom uri schemes, though. I need a way to reproduce this.
Michael Catanzaro
Comment 6 2017-01-03 04:57:13 PST
*** Bug 165655 has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 7 2017-01-03 05:16:55 PST
(In reply to comment #5) > There is one thing I don't understand from the backtrace, the main thread is > a WorkQueue thread. How can this happen? It can't happen. Thread 1 is the crashing thread, not the main thread. In bug 1398951 the main thread is thread 11 (LWP 8497, lower number than any of the other threads) which is using CustomProtocolManagerImpl at the time of the crash, same as the crashing thread. In bug 1402191, which I just realized was not public (fixed), the main thread is thread 5 (LWP 5528) which is calling WebKit::NetworkLoad::NetworkLoad. In this backtrace, the only the crashing thread's backtrace contains CustomProtocolManagerImpl.
Carlos Garcia Campos
Comment 8 2017-01-03 05:27:12 PST
(In reply to comment #7) > (In reply to comment #5) > > There is one thing I don't understand from the backtrace, the main thread is > > a WorkQueue thread. How can this happen? > > It can't happen. Thread 1 is the crashing thread, not the main thread. Isn't Thread 1 always the main thread? > In bug 1398951 the main thread is thread 11 (LWP 8497, lower number than any > of the other threads) which is using CustomProtocolManagerImpl at the time > of the crash, same as the crashing thread. > > In bug 1402191, which I just realized was not public (fixed), the main > thread is thread 5 (LWP 5528) which is calling > WebKit::NetworkLoad::NetworkLoad. In this backtrace, the only the crashing > thread's backtrace contains CustomProtocolManagerImpl.
Michael Catanzaro
Comment 9 2017-01-03 08:01:15 PST
No, thread 1 in gdb is the thread that crashed.
Michael Catanzaro
Comment 10 2017-01-03 08:03:33 PST
(In reply to comment #9) > No, thread 1 in gdb is the thread that crashed. At least in backtraces taken from core dumps it is. Maybe that's different if running a program via gdb.
Carlos Garcia Campos
Comment 11 2017-01-04 01:25:14 PST
Created attachment 297998 [details] Patch Ok, I finally understood the problem, see the changelog for the details. Alex, if this can be done in Mac too, I can do it in a follow up patch, or maybe even as part of the refactoring proposed in bug #165028
Alex Christensen
Comment 12 2017-01-04 07:17:27 PST
I just verified that the API test added in r149194 passes on Mac with CustomProtocolManager being a MessageReceiver instead of a WorkQueueMessageReceiver. I did this with and without NetworkSession, but with NetworkSession also requires changing globalCustomProtocolManager() to be a std::unique_ptr and setCustomProtocolManager to call the std::unique_ptr constructor explicitly. Let's give it a shot and see what Andy says. We don't have many clients using WebKit2's CustomProtocolManager because it's not public API, so it's possible we have a similar thread safety issue on Mac and just don't get a lot of stack traces.
Carlos Garcia Campos
Comment 13 2017-01-04 08:22:14 PST
(In reply to comment #12) > I just verified that the API test added in r149194 passes on Mac with > CustomProtocolManager being a MessageReceiver instead of a > WorkQueueMessageReceiver. I did this with and without NetworkSession, but > with NetworkSession also requires changing globalCustomProtocolManager() to > be a std::unique_ptr and setCustomProtocolManager to call the > std::unique_ptr constructor explicitly. > Let's give it a shot and see what Andy says. We don't have many clients > using WebKit2's CustomProtocolManager because it's not public API, so it's > possible we have a similar thread safety issue on Mac and just don't get a > lot of stack traces. In mac the manager is thread safe, the registered schemes were protected in r149198, in r149520 methods were dispatched to the main thread and in r149121 mutex were added. I think we should revert all those things if we change to use message receiver again. There are a lot of changes in mac code that I can't test myself, but I can try.
Alex Christensen
Comment 14 2017-01-04 11:38:33 PST
(In reply to comment #12) > ...and setCustomProtocolManager to call the > std::unique_ptr constructor explicitly. Woah, definitely don't do this. Use unique_ptr properly if we make this change. Why not just make the soup custom protocol handler thread safe? I think it would be strange to have one port be a work queue message receiver and one port not.
Andy Estes
Comment 15 2017-01-04 12:00:50 PST
(In reply to comment #12) > I just verified that the API test added in r149194 passes on Mac with > CustomProtocolManager being a MessageReceiver instead of a > WorkQueueMessageReceiver. I did this with and without NetworkSession, but > with NetworkSession also requires changing globalCustomProtocolManager() to > be a std::unique_ptr and setCustomProtocolManager to call the > std::unique_ptr constructor explicitly. > Let's give it a shot and see what Andy says. We don't have many clients > using WebKit2's CustomProtocolManager because it's not public API, so it's > possible we have a similar thread safety issue on Mac and just don't get a > lot of stack traces. It looks like synchronous XHR is now handled by sending a delayed synchronous CoreIPC message from the Web process to the Networking process. The networking process handles the load asynchronously and replies to the Web process when finished. Since there is no longer a nested, non-common-mode run loop, there's no longer a need for a WorkQueue in CustomProtocolManager. At least on Mac, there is still a need for some synchronization between the main thread and the thread that NSURLProtocol methods are called on, but that shouldn't affect the Soup implementation.
Andy Estes
Comment 16 2017-01-04 12:09:43 PST
Carlos, if you want to remove the WorkQueue and related synchronization on all platforms, I'd review that patch.
Alex Christensen
Comment 17 2017-01-04 16:32:54 PST
Comment on attachment 297998 [details] Patch I agree. Let's keep this the same for all platforms.
Carlos Garcia Campos
Comment 18 2017-01-04 22:48:46 PST
(In reply to comment #14) > (In reply to comment #12) > > ...and setCustomProtocolManager to call the > > std::unique_ptr constructor explicitly. > Woah, definitely don't do this. Use unique_ptr properly if we make this > change. > > Why not just make the soup custom protocol handler thread safe? I think it > would be strange to have one port be a work queue message receiver and one > port not. Because we don't want jumps from one thread to the other and mutexes if we can just use the main thread.
Carlos Garcia Campos
Comment 19 2017-01-04 22:49:09 PST
(In reply to comment #16) > Carlos, if you want to remove the WorkQueue and related synchronization on > all platforms, I'd review that patch. Okidoki
Carlos Garcia Campos
Comment 20 2017-01-05 02:50:24 PST
Created attachment 298083 [details] New patch Made it common to cocoa and soup. I left the mutexes and initiating run loop thing because I don't know if they are still needed in mac. I'm not sure the Xcode changes are correct.
Carlos Garcia Campos
Comment 21 2017-01-05 02:50:58 PST
Michael, in the stable branch I'll land the other patch
Carlos Garcia Campos
Comment 22 2017-01-05 03:37:08 PST
The new cpp file is not buili in mac, so the Xcode changes are definitely not correct, *sigh*.
Alex Christensen
Comment 23 2017-01-05 12:00:13 PST
WebKit Commit Bot
Comment 24 2017-01-05 13:28:48 PST
Comment on attachment 298119 [details] Patch Clearing flags on attachment: 298119 Committed r210374: <http://trac.webkit.org/changeset/210374>
WebKit Commit Bot
Comment 25 2017-01-05 13:28:53 PST
All reviewed patches have been landed. Closing bug.
Carlos Garcia Campos
Comment 26 2017-01-07 00:28:49 PST
Thank you very much, Alex.
Michael Catanzaro
Comment 27 2017-02-02 13:50:41 PST
(In reply to comment #21) > Michael, in the stable branch I'll land the other patch It looks like you forgot? We have 190 reports of this crash in Fedora from users who have upgraded to 2.14.3 alone!
Carlos Garcia Campos
Comment 28 2017-02-02 22:28:25 PST
(In reply to comment #27) > (In reply to comment #21) > > Michael, in the stable branch I'll land the other patch > > It looks like you forgot? We have 190 reports of this crash in Fedora from > users who have upgraded to 2.14.3 alone! I guess I did. I'll add it to the wiki to ensure I won't forget again, it's the best way to make sure.
Loïc Yhuel
Comment 29 2017-04-05 11:07:04 PDT
This commit reverted https://bugs.webkit.org/show_bug.cgi?id=146014 so reintroduced the warning.
Note You need to log in before you can comment on or make changes to this bug.