Bug 142375

Summary: [GTK] UI process crashes if webkit_web_view_get_tls_info is called before internal load-committed event
Product: WebKit Reporter: Michael Catanzaro <mcatanzaro>
Component: WebKitGTKAssignee: Michael Catanzaro <mcatanzaro>
Status: RESOLVED FIXED    
Severity: Normal CC: agomez, berto, cgarcia, commit-queue, gustavo, mcatanzaro, mrobinson, ysuzuki, zan
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Linux   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1198891
Bug Depends on: 150927    
Bug Blocks:    
Attachments:
Description Flags
Fix crash in webkit_web_view_get_tls_info
cgarcia: review-, cgarcia: commit-queue-
BT from gdb
none
Patch
mcatanzaro: review+
Another similar BT from gdb none

Description Michael Catanzaro 2015-03-05 18:05:35 PST
webkit_web_view_get_tls_info does not check if mainFrame->certificateInfo() is null before using it. It will always be null before WebFrameProxy::didCommitLoad is called, so we always crash in that case. We never noticed because Epiphany is probably the only user of this function, and it only calls it after receiving the WebKitGTK+ LOAD_COMMITTED event, since there is no point in calling this function before that. Now, I think it is user error to call this function before LOAD_COMMITTED, but we should return FALSE and maybe use g_warning() rather than crash.

So that is easy to fix. But, here is the fun part: it is possible that, if we have delayed the emission of LOAD_COMMITTED, WebFrameProxy::didStartProvisionalLoad gets called for the *next* load before we emit LOAD_COMMITTED for the previous load. In that case, the backtrace (crash reported downstream) looks like this:

 #0 webkit_web_view_get_tls_info at /lib64/libwebkit2gtk-4.0.so.37
 #1 load_changed_cb
 #6 webkitWebViewEmitLoadChanged(_WebKitWebView*, WebKitLoadEvent, bool) at /lib64/libwebkit2gtk-4.0.so.37
 #7 webkitWebViewEmitDelayedLoadEvents(_WebKitWebView*) at /lib64/libwebkit2gtk-4.0.so.37
 #8 webkitWebViewLoadChanged(_WebKitWebView*, WebKitLoadEvent) at /lib64/libwebkit2gtk-4.0.so.37
 #9 WebKit::WebPageProxy::didStartProvisionalLoadForFrame(unsigned long, unsigned long, WTF::String const&, WTF::String const&, WebKit::UserData const&) at /lib64/libwebkit2gtk-4.0.so.37
 #10 void IPC::handleMessage<Messages::WebPageProxy::DidStartProvisionalLoadForFrame, WebKit::WebPageProxy, void (WebKit::WebPageProxy::*)(unsigned long, unsigned long, WTF::String const&, WTF::String const&, WebKit::UserData const&)>(IPC::MessageDecoder&, WebKit::WebPageProxy*, void (WebKit::WebPageProxy::*)(unsigned long, unsigned long, WTF::String const&, WTF::String const&, WebKit::UserData const&)) at /lib64/libwebkit2gtk-4.0.so.37

So LOAD_COMMITTED gets emitted in response to the internal load started event, and Ephy crashes because it has used webkit_web_view_get_tls_info before the internal load committed event! Now, the crash is easy to fix, but is it really wise to emit LOAD_COMMITTED in this case? It is being done intentionally, judging by the comment "Finish a possible previous load waiting for main resource" in webkitWebViewLoadChanged, but I bet this will cause other unexpected, impossible to reproduce bugs. For instance, a random example: with this crash fixed, Epiphany would next call webkit_web_view_get_uri() and get the URI for the new page that's loading, not the page that was committed. Then it will get the next load started event without ever returning to the main loop (which I suppose a reasonable application might expect to happen). On the other hand, a reasonable application is more likely to expect to always receive a LOAD_COMMITTED for each LOAD_STARTED, so I guess we can't just skip it.

Eh, I will just fix the crash, but this is something to think about.
Comment 1 Michael Catanzaro 2015-03-05 18:23:01 PST
Mmmm, no, my explanation does not quite hold up, because the WebFrameProxy should always have a valid certificateInfo if any load has ever been committed in that frame before. What am I missing....
Comment 2 Michael Catanzaro 2015-03-05 18:27:22 PST
Created attachment 248028 [details]
Fix crash in webkit_web_view_get_tls_info
Comment 3 WebKit Commit Bot 2015-03-05 18:29:59 PST
Thanks for the patch. If this patch contains new public API please make sure it follows the guidelines for new WebKit2 GTK+ API. See http://trac.webkit.org/wiki/WebKitGTK/AddingNewWebKit2API
Comment 4 Martin Robinson 2015-03-05 18:33:31 PST
Comment on attachment 248028 [details]
Fix crash in webkit_web_view_get_tls_info

View in context: https://bugs.webkit.org/attachment.cgi?id=248028&action=review

> Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestSSL.cpp:47
> +        webkit_web_view_get_tls_info(m_webView, nullptr, nullptr);
> +        // Success: didn't crash
> +        LoadTrackingTest::provisionalLoadStarted();

Wouldn't it make sense to test that the return value is FALSE here as well?
Comment 5 Michael Catanzaro 2015-03-05 18:46:36 PST
(In reply to comment #4)
> Wouldn't it make sense to test that the return value is FALSE here as well?

That's what I tired at first, but see comment #1 for why it will sometimes return TRUE!

Actually, you could make a strong argument that that, too, is a bug we should fix.
Comment 6 Michael Catanzaro 2015-03-05 18:47:36 PST
(In reply to comment #5)
> That's what I tired at first

Um, Freudian slip? Good night!
Comment 7 Carlos Garcia Campos 2015-03-05 22:49:25 PST
(In reply to comment #0)
> webkit_web_view_get_tls_info does not check if mainFrame->certificateInfo()
> is null before using it. It will always be null before
> WebFrameProxy::didCommitLoad is called, so we always crash in that case.

Yes, it's documented:

"This function should be called after a response has been received from the server, so you can connect to #WebKitWebView::load-changed and call this function when it's emitted with %WEBKIT_LOAD_COMMITTED event."

It's true that we shouldn't crash, but it's a programmer error in any case.

> We
> never noticed because Epiphany is probably the only user of this function,
> and it only calls it after receiving the WebKitGTK+ LOAD_COMMITTED event,
> since there is no point in calling this function before that. Now, I think
> it is user error to call this function before LOAD_COMMITTED, but we should
> return FALSE and maybe use g_warning() rather than crash.

We have actually noticed several times, see bugs #91482 and #95949 for example. And it's the reason why we have all the delayed events mess in WebKitWebView, something that I think should be fixed in WebCore, but that I never found the time to do.

> So that is easy to fix. But, here is the fun part: it is possible that, if
> we have delayed the emission of LOAD_COMMITTED,
> WebFrameProxy::didStartProvisionalLoad gets called for the *next* load
> before we emit LOAD_COMMITTED for the previous load. In that case, the
> backtrace (crash reported downstream) looks like this:
> 
>  #0 webkit_web_view_get_tls_info at /lib64/libwebkit2gtk-4.0.so.37
>  #1 load_changed_cb
>  #6 webkitWebViewEmitLoadChanged(_WebKitWebView*, WebKitLoadEvent, bool) at
> /lib64/libwebkit2gtk-4.0.so.37
>  #7 webkitWebViewEmitDelayedLoadEvents(_WebKitWebView*) at
> /lib64/libwebkit2gtk-4.0.so.37
>  #8 webkitWebViewLoadChanged(_WebKitWebView*, WebKitLoadEvent) at
> /lib64/libwebkit2gtk-4.0.so.37
>  #9 WebKit::WebPageProxy::didStartProvisionalLoadForFrame(unsigned long,
> unsigned long, WTF::String const&, WTF::String const&, WebKit::UserData
> const&) at /lib64/libwebkit2gtk-4.0.so.37
>  #10 void
> IPC::handleMessage<Messages::WebPageProxy::DidStartProvisionalLoadForFrame,
> WebKit::WebPageProxy, void (WebKit::WebPageProxy::*)(unsigned long, unsigned
> long, WTF::String const&, WTF::String const&, WebKit::UserData
> const&)>(IPC::MessageDecoder&, WebKit::WebPageProxy*, void
> (WebKit::WebPageProxy::*)(unsigned long, unsigned long, WTF::String const&,
> WTF::String const&, WebKit::UserData const&)) at
> /lib64/libwebkit2gtk-4.0.so.37
> 
> So LOAD_COMMITTED gets emitted in response to the internal load started
> event, and Ephy crashes because it has used webkit_web_view_get_tls_info
> before the internal load committed event! Now, the crash is easy to fix, but
> is it really wise to emit LOAD_COMMITTED in this case? It is being done
> intentionally, judging by the comment "Finish a possible previous load
> waiting for main resource" in webkitWebViewLoadChanged, but I bet this will
> cause other unexpected, impossible to reproduce bugs. For instance, a random
> example: with this crash fixed, Epiphany would next call
> webkit_web_view_get_uri() and get the URI for the new page that's loading,
> not the page that was committed. Then it will get the next load started
> event without ever returning to the main loop (which I suppose a reasonable
> application might expect to happen). On the other hand, a reasonable
> application is more likely to expect to always receive a LOAD_COMMITTED for
> each LOAD_STARTED, so I guess we can't just skip it.
> 
> Eh, I will just fix the crash, but this is something to think about.

Yes, the delayed signals thing is problematic, we need to finish previous loads, so if committed was fired, but we were waiting for the main resource to emit it, if a new load starts we emit all delayed events, because the client always expects at least a finished signal. I see how this could be problematic, though. This has caused my a lot of headaches indeed, and I think the only solution is to make WebCore report load events consistently when loading from the history cache.
Comment 8 Carlos Garcia Campos 2015-03-05 22:53:37 PST
Comment on attachment 248028 [details]
Fix crash in webkit_web_view_get_tls_info

View in context: https://bugs.webkit.org/attachment.cgi?id=248028&action=review

> Source/WebKit2/UIProcess/API/gtk/WebKitWebView.cpp:3424
> -    const WebCore::CertificateInfo& certificateInfo = mainFrame->certificateInfo()->certificateInfo();
> +    WebCertificateInfo* webkitCertificateInfo = mainFrame->certificateInfo();
> +    if (!webkitCertificateInfo)
> +        return FALSE;

I have always tried to avoid this, because it hides any possible problem, like the one you discovered. We should only return FALSE if the load hasn't been committed, as you say in the documentation, but that's not what you are doing here.
Comment 9 Michael Catanzaro 2015-03-06 07:49:27 PST
Hopefully we'll be allowed to change when the internal load committed event is emitted when loading from the page cache, but if not I see another possible way out: we could add a WEBKIT_LOAD_ABANDONED event and drop all delayed load events if the internal provisional load started event is ever emitted before we have emitted our delayed WEBKIT_LOAD_COMMITTED.

(In reply to comment #8)
> I have always tried to avoid this, because it hides any possible problem,
> like the one you discovered. We should only return FALSE if the load hasn't
> been committed, as you say in the documentation, but that's not what you are
> doing here.

OK, I agree.
Comment 10 Michael Catanzaro 2015-03-07 16:01:26 PST
(In reply to comment #8) 
> I have always tried to avoid this, because it hides any possible problem,
> like the one you discovered. We should only return FALSE if the load hasn't
> been committed, as you say in the documentation, but that's not what you are
> doing here.

I think the PageLoadState class was designed explicitly to prevent me from figuring out what the page load state is (including whether or not the load has been committed). I'm confused why. The easiest way to get the state I see would be to save the last WebKitLoadEvent in the WebKitWebViewPrivate struct; would that be OK? We could even make it a public property (if we add one new state to indicate no load has ever occurred).
Comment 11 Carlos Garcia Campos 2015-03-08 00:07:39 PST
(In reply to comment #10)
> (In reply to comment #8) 
> > I have always tried to avoid this, because it hides any possible problem,
> > like the one you discovered. We should only return FALSE if the load hasn't
> > been committed, as you say in the documentation, but that's not what you are
> > doing here.
> 
> I think the PageLoadState class was designed explicitly to prevent me from
> figuring out what the page load state is (including whether or not the load
> has been committed). I'm confused why. The easiest way to get the state I
> see would be to save the last WebKitLoadEvent in the WebKitWebViewPrivate
> struct; would that be OK? We could even make it a public property (if we add
> one new state to indicate no load has ever occurred).

I think we should fix it in WebCore and remove all the delayed events mess, this workaround has gone too far already. We discarded the idea of having a state property (like the wk1 one) in favor of the load signal with enum parameter. application can use that to track the load state if needed.
Comment 12 Michael Catanzaro 2015-03-08 08:52:08 PDT
(In reply to comment #11)
> I think we should fix it in WebCore and remove all the delayed events mess,
> this workaround has gone too far already. We discarded the idea of having a
> state property (like the wk1 one) in favor of the load signal with enum
> parameter. application can use that to track the load state if needed.

But once that is fixed (this is really two separate issues), if we still don't track the load state, what change shall we make in webkit_web_view_get_tls_info? Your request is to "only return FALSE if the load hasn't been committed", but that function cannot know if the load has been committed.

The function has to return FALSE if the certificateInfo is invalid, because the only other option is to return TRUE and that would be crazy, but I presume you want it to g_return_val_if_fail to get a critical warning if the load has not been committed? How about we use g_return_val_if_fail if certificateInfo is invalid, since if it's invalid and the programmer is calling the function there is a bug in either the application or in WebKit.
Comment 13 Carlos Garcia Campos 2015-03-09 00:06:40 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > I think we should fix it in WebCore and remove all the delayed events mess,
> > this workaround has gone too far already. We discarded the idea of having a
> > state property (like the wk1 one) in favor of the load signal with enum
> > parameter. application can use that to track the load state if needed.
> 
> But once that is fixed (this is really two separate issues), if we still
> don't track the load state, what change shall we make in
> webkit_web_view_get_tls_info? Your request is to "only return FALSE if the
> load hasn't been committed", but that function cannot know if the load has
> been committed.

It's not *my* request, it's what the documentation is saying in your patch :-) Anyway, if WebCore ensures that we don't receive committed until there's a main resource with a response, we can assume in webkit_web_view_get_tls_info that the user hasn't called it before committed and simply return FALSE when the certificate ois null, like your patch does. Because in that case, there's no risk of hiding other issues.

> The function has to return FALSE if the certificateInfo is invalid, because
> the only other option is to return TRUE and that would be crazy, but I
> presume you want it to g_return_val_if_fail to get a critical warning if the
> load has not been committed? How about we use g_return_val_if_fail if
> certificateInfo is invalid, since if it's invalid and the programmer is
> calling the function there is a bug in either the application or in WebKit.

My point is that we can unconditionally return FALSE if WebCore is updated yo we don't need all the delayed events mess. But as long as we have the delayed event code, I prefer to crash here to catch other issues, we are already saying the user not to call this before committed.
Comment 14 Michael Catanzaro 2015-07-20 09:02:10 PDT
*** Bug 147111 has been marked as a duplicate of this bug. ***
Comment 15 Michael Catanzaro 2015-10-29 08:39:46 PDT
*** Bug 150671 has been marked as a duplicate of this bug. ***
Comment 16 Andres Gomez Garcia 2015-10-29 14:37:35 PDT
I'm hitting this almost every time the WebProcess crashes first.
Comment 17 Andres Gomez Garcia 2015-10-30 02:39:50 PDT
Created attachment 264388 [details]
BT from gdb
Comment 18 Carlos Garcia Campos 2015-11-05 05:32:01 PST
With the patch attached to bug #150927 we could get rid of all the delayed events mess and other hacks we have.
Comment 19 Carlos Garcia Campos 2015-12-07 02:19:24 PST
Created attachment 266758 [details]
Patch

This is the patch we could land if bug #150927 is fixed.
Comment 20 Andres Gomez Garcia 2015-12-19 08:57:33 PST
Created attachment 267689 [details]
Another similar BT from gdb

BT obtained from 2.10.4 with CMake args:

'-DPORT=GTK -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_FLAGS_RELEASE="-O0 -g1 -DNDEBUG -DG_DISABLE_CAST_CHECKS" -DCMAKE_CXX_FLAGS_RELEASE="-O0 -g1 -DNDEBUG -DG_DISABLE_CAST_CHECKS"'
Comment 21 Michael Catanzaro 2015-12-23 12:49:10 PST
*** Bug 152532 has been marked as a duplicate of this bug. ***
Comment 22 Michael Catanzaro 2015-12-23 12:50:05 PST
Comment on attachment 266758 [details]
Patch

Still blocked on bug #150927.
Comment 23 Carlos Garcia Campos 2016-01-12 03:41:44 PST
Committed r194890: <http://trac.webkit.org/changeset/194890>
Comment 24 Michael Catanzaro 2016-01-18 13:07:18 PST
*** Bug 142381 has been marked as a duplicate of this bug. ***