Bug 76399 - [Qt][WK2] QtWebIconDatabaseClient leaves a dangling pointer on WebIconDatabase after its destruction
Summary: [Qt][WK2] QtWebIconDatabaseClient leaves a dangling pointer on WebIconDatabas...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Rafael Brandao
URL:
Keywords:
: 76906 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-01-16 12:01 PST by Rafael Brandao
Modified: 2012-01-24 05:31 PST (History)
9 users (show)

See Also:


Attachments
Patch (3.18 KB, patch)
2012-01-16 12:04 PST, Rafael Brandao
no flags Details | Formatted Diff | Diff
Patch (7.01 KB, patch)
2012-01-18 14:22 PST, Rafael Brandao
no flags Details | Formatted Diff | Diff
Patch (8.35 KB, patch)
2012-01-19 04:35 PST, Rafael Brandao
no flags Details | Formatted Diff | Diff
Patch (3.74 KB, patch)
2012-01-19 15:20 PST, Rafael Brandao
no flags Details | Formatted Diff | Diff
Patch (3.02 KB, patch)
2012-01-24 04:17 PST, Rafael Brandao
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Rafael Brandao 2012-01-16 12:01:26 PST
[Qt][WK2] QtWebIconDatabaseClient may be deleted before running its slot
Comment 1 Rafael Brandao 2012-01-16 12:04:43 PST
Created attachment 122671 [details]
Patch
Comment 2 Rafael Brandao 2012-01-16 12:24:15 PST
(In reply to comment #1)
> Created an attachment (id=122671) [details]
> Patch

This bug has been a pain to reproduce on release mode, but for some reason it was easily caught on debug mode.

When you run qmltests, you may get the assert inside WTF::Mutex::lock used by QtWebIconDatabaseClient::iconImageForPageURL. The call to pthread_mutex_lock was returning EINVAL which means that that mutex was not an initialized object. On other runs, I could get a seg fault inside WebIconDatabase::imageForPageURL, on the line with contents "if (!m_webContext || !m_iconDatabaseImpl || !m_iconDatabaseImpl->isOpen() || pageURL.isEmpty())". So it become clear that m_iconDatabaseImpl has been deleted, but someone still kept a reference to it.

It happens between tests when we destroy all WebViews and then the QtWebContext is destroyed, taking QtWebIconDatabaseClient with him. So at this point no one should have a reference to it. Once we start a new test, new web views are created, a new QtWebContext and a new icon client are created. Each web view was pointing to the new icon client reference... so still, no one seemed to keep a reference to the old client. And then suddenly something just called the slot requestIconForPageURL from this old client that was deleted already. The only thing I can imagine is that this event was pending on the event queue when we deleted the last WebView.

As I've said it's hard to reproduce, I've tried to insert more prints on the code but suddenly all started working again without this change. I've ran this change on Caio and Jesus machines on debug mode and the asserts that I've described stopped happening (but still there is an unrelated assert triggering there on a later test). I'd like to hear your thoughts about this change, it seems weird that we need to use deleteLater for this one.

This log (https://gist.github.com/bc8dc6da13d8d98c0a8a) can show the deleted client being called after all the new views are created. These two (https://gist.github.com/1620867 and https://gist.github.com/1620896) show some of the issues that we had before.
Comment 3 Simon Hausmann 2012-01-17 11:11:34 PST
Do you have a backtrace from _where_ iconImageForPageURL was called from?

I don't like this "solution", so I'm curious if there's another way of fixing it.
Comment 4 Rafael Brandao 2012-01-17 12:58:50 PST
(In reply to comment #3)
> Do you have a backtrace from _where_ iconImageForPageURL was called from?
> 
> I don't like this "solution", so I'm curious if there's another way of fixing it.

(In reply to comment #3)
> Do you have a backtrace from _where_ iconImageForPageURL was called from?
> 
> I don't like this "solution", so I'm curious if there's another way of fixing it.(In reply to comment #3)
> Do you have a backtrace from _where_ iconImageForPageURL was called from?
> 
> I don't like this "solution", so I'm curious if there's another way of fixing it.

This "solution" may not solve the real problem. I was checking again those backtraces that I've posted here again and the function is called after we receive messages from WebProcess with new icon data. IconDatabase and WebIconDatabase are never destroyed and the last one keeps a reference to the icon client. After we destroy the client, a message from WebProcess may get into UIProcess before we have the chance to create a new one. Then WebIconDatabase handles the message with the old reference... and then fail. So I might have to cleanup that reference, but I'm not sure if this alone is enough (maybe we should take care of such pending slot call as I've thought initially, or maybe not).
Comment 5 Simon Hausmann 2012-01-18 01:36:53 PST
From the backtraces it looks like the QtWebIconDatabaseClient is indeed perhaps the stale object. It's life time should be the same as the WebIconDatabase the WebContext holds, in other words the intended destruction cycle should be:

QtWebContext dies -> has the last ref on the WebContext -> WebContext gets destroyed -> has the last ref on the WebIconDatabase -> WebIconDatabase gets destroyed and the QtWebIconDatabaseClient died in ~QtWebContext. That should all happen without any intermediate returns to the event loop.

UNLESS the "has the last ref" conditions don't hold, in which case WebIconDatabase quickly ends up with a dangling "clientInfo" pointer in WebIconDatabase::m_iconDatabaseClient::m_client.

If that is the case - and it seems somewhat likely - then we need to make sure QtWebIconDatabaseClient destructs correctly, i.e. using


QtWebIconDatabaseClient::~QtWebIconDatabaseClient()
{
+    WKIconDatabaseSetIconDatabaseClient(toAPI(m_iconDatabase), 0);
}

QtWebContext::~QtWebContext()
{
+    m_iconDatabase.clear();
    if (s_defaultContext == this)
        s_defaultContext = 0;
    contextMap.remove(m_contextID);
}

(Note that m_iconDatabase has the _wrong_ name, it should be m_iconDatabaseClient)
Comment 6 Alexis Menard (darktears) 2012-01-18 05:57:12 PST
(In reply to comment #5)
> From the backtraces it looks like the QtWebIconDatabaseClient is indeed perhaps the stale object. It's life time should be the same as the WebIconDatabase the WebContext holds, in other words the intended destruction cycle should be:
> 
> QtWebContext dies -> has the last ref on the WebContext -> WebContext gets destroyed -> has the last ref on the WebIconDatabase -> WebIconDatabase gets destroyed and the QtWebIconDatabaseClient died in ~QtWebContext. That should all happen without any intermediate returns to the event loop.

Except that WebContext is never deleted today. It leaks as there is circular dependency that is not going to be fixed before WebKit2 supports multiple WebProcess (my talk with andersca). WebContext has a ref to WebPageProxy but WebPageProxy has a ref to WebProcessProxy which has a ref to WebContext.

> 
> UNLESS the "has the last ref" conditions don't hold, in which case WebIconDatabase quickly ends up with a dangling "clientInfo" pointer in WebIconDatabase::m_iconDatabaseClient::m_client.
> 
> If that is the case - and it seems somewhat likely - then we need to make sure QtWebIconDatabaseClient destructs correctly, i.e. using
> 
> 
> QtWebIconDatabaseClient::~QtWebIconDatabaseClient()
> {
> +    WKIconDatabaseSetIconDatabaseClient(toAPI(m_iconDatabase), 0);
> }
> 
> QtWebContext::~QtWebContext()
> {
> +    m_iconDatabase.clear();
>     if (s_defaultContext == this)
>         s_defaultContext = 0;
>     contextMap.remove(m_contextID);
> }
> 
> (Note that m_iconDatabase has the _wrong_ name, it should be m_iconDatabaseClient)
Comment 7 Simon Hausmann 2012-01-18 06:14:45 PST
(In reply to comment #6)
> (In reply to comment #5)
> > From the backtraces it looks like the QtWebIconDatabaseClient is indeed perhaps the stale object. It's life time should be the same as the WebIconDatabase the WebContext holds, in other words the intended destruction cycle should be:
> > 
> > QtWebContext dies -> has the last ref on the WebContext -> WebContext gets destroyed -> has the last ref on the WebIconDatabase -> WebIconDatabase gets destroyed and the QtWebIconDatabaseClient died in ~QtWebContext. That should all happen without any intermediate returns to the event loop.

Aha!

Actually it seems a whole lot simpler: WebContext has RefPtr<WebProcessProxy> m_process and WebProcessProxy has RefPtr<WebContext> m_context.

Strange, that looks like a straight-forward dependency to break.

Anyway, if it can't be broken then what I suggested in the previous comment should work and with this circular reference present the back-trace makes much more sense now.
Comment 8 Alexis Menard (darktears) 2012-01-18 06:22:50 PST
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > From the backtraces it looks like the QtWebIconDatabaseClient is indeed perhaps the stale object. It's life time should be the same as the WebIconDatabase the WebContext holds, in other words the intended destruction cycle should be:
> > > 
> > > QtWebContext dies -> has the last ref on the WebContext -> WebContext gets destroyed -> has the last ref on the WebIconDatabase -> WebIconDatabase gets destroyed and the QtWebIconDatabaseClient died in ~QtWebContext. That should all happen without any intermediate returns to the event loop.
> 
> Aha!
> 
> Actually it seems a whole lot simpler: WebContext has RefPtr<WebProcessProxy> m_process and WebProcessProxy has RefPtr<WebContext> m_context.
> 
> Strange, that looks like a straight-forward dependency to break.
>

I tried to break it :
http://trac.webkit.org/changeset/93784
with disconnectProcess() it happens to be a bad idea :(
 
> Anyway, if it can't be broken then what I suggested in the previous comment should work and with this circular reference present the back-trace makes much more sense now.
Comment 9 Rafael Brandao 2012-01-18 09:07:34 PST
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > (In reply to comment #5)
> > > > From the backtraces it looks like the QtWebIconDatabaseClient is indeed perhaps the stale object. It's life time should be the same as the WebIconDatabase the WebContext holds, in other words the intended destruction cycle should be:
> > > > 
> > > > QtWebContext dies -> has the last ref on the WebContext -> WebContext gets destroyed -> has the last ref on the WebIconDatabase -> WebIconDatabase gets destroyed and the QtWebIconDatabaseClient died in ~QtWebContext. That should all happen without any intermediate returns to the event loop.
> > 
> > Aha!
> > 
> > Actually it seems a whole lot simpler: WebContext has RefPtr<WebProcessProxy> m_process and WebProcessProxy has RefPtr<WebContext> m_context.
> > 
> > Strange, that looks like a straight-forward dependency to break.
> >
> 
> I tried to break it :
> http://trac.webkit.org/changeset/93784
> with disconnectProcess() it happens to be a bad idea :(
> 
> > Anyway, if it can't be broken then what I suggested in the previous comment should work and with this circular reference present the back-trace makes much more sense now.

(In reply to comment #5)
> From the backtraces it looks like the QtWebIconDatabaseClient is indeed perhaps the stale object. It's life time should be the same as the WebIconDatabase the WebContext holds, in other words the intended destruction cycle should be:
> 
> QtWebContext dies -> has the last ref on the WebContext -> WebContext gets destroyed -> has the last ref on the WebIconDatabase -> WebIconDatabase gets destroyed and the QtWebIconDatabaseClient died in ~QtWebContext. That should all happen without any intermediate returns to the event loop.
> 
> UNLESS the "has the last ref" conditions don't hold, in which case WebIconDatabase quickly ends up with a dangling "clientInfo" pointer in WebIconDatabase::m_iconDatabaseClient::m_client.
> 
> If that is the case - and it seems somewhat likely - then we need to make sure QtWebIconDatabaseClient destructs correctly, i.e. using
> 
> 
> QtWebIconDatabaseClient::~QtWebIconDatabaseClient()
> {
> +    WKIconDatabaseSetIconDatabaseClient(toAPI(m_iconDatabase), 0);
> }
> 
> QtWebContext::~QtWebContext()
> {
> +    m_iconDatabase.clear();
>     if (s_defaultContext == this)
>         s_defaultContext = 0;
>     contextMap.remove(m_contextID);
> }
> 
> (Note that m_iconDatabase has the _wrong_ name, it should be m_iconDatabaseClient)

I've left building in debug mode yesterday using a similar approach and by running the tests today I assume the problem is not there anymore. As Alexis pointed out, we don't destroy WebContext, and consequently we don't destroy WebIconDatabase neither IconDatabase. I'll send a new patch with the proposed fix and will also change the name as you've suggested.
Comment 10 Rafael Brandao 2012-01-18 13:54:05 PST
Changing the bug title to reflect the real problem.
Comment 11 Rafael Brandao 2012-01-18 14:22:54 PST
Created attachment 122994 [details]
Patch
Comment 12 Rafael Brandao 2012-01-18 14:26:20 PST
Comment on attachment 122994 [details]
Patch

Simon, I don't think we need to explicitly call m_iconDatabaseClient.clear() on QtWebContext destructor because that OwnPtr is already being destroyed anyway, that's why I didn't put it there.
Comment 13 Simon Hausmann 2012-01-19 01:16:26 PST
(In reply to comment #12)
> (From update of attachment 122994 [details])
> Simon, I don't think we need to explicitly call m_iconDatabaseClient.clear() on QtWebContext destructor because that OwnPtr is already being destroyed anyway, that's why I didn't put it there.

You're right that it's not _strictly_ needed.

It's the order however that matters. We must make sure that the WebIconDatabase that the WebIconDatabaseClient destructor uses in its call to WKSetIconDatabaseClient is still valid. We can be sneaky and silently rely on the reverse destruction order of the QtWebContext members, or we can play it safe and do it explicitly.

We could also change QtWebIconDatabaseClient to hold a _ref_ on the database, instead of storing a naked pointer.
Comment 14 Rafael Brandao 2012-01-19 04:35:37 PST
Created attachment 123103 [details]
Patch
Comment 15 Rafael Brandao 2012-01-19 06:12:32 PST
Comment on attachment 123103 [details]
Patch

I just skipped application scheme qmltests and then I've got an assert at some other point in the favicon test. I'll look into this to see what is wrong, so I'm cleaning cq for now.
Comment 16 Rafael Brandao 2012-01-19 06:26:22 PST
Comment on attachment 123103 [details]
Patch

I just looked into this assert and it has nothing to do with this patch. The broken tests are the ones which webview doesn't have its size previously set. I've confirmed with Jesus, he got the same results as me.
Comment 17 Simon Hausmann 2012-01-19 13:20:57 PST
Comment on attachment 123103 [details]
Patch

I don't like the fact that this patch does two things in one:

    1) A stylistic rename

    2) A rather tricky bug fix.

I think it's better to do those as separate patches. What do you think?
Comment 18 Rafael Brandao 2012-01-19 15:20:42 PST
Created attachment 123207 [details]
Patch
Comment 19 Rafael Brandao 2012-01-19 15:24:59 PST
Comment on attachment 123207 [details]
Patch

Here's the patch without the stylistic changes. Now I was just thinking that as we're keeping a reference to WebIconDatabase on our client, we should always have a valid pointer to it upon its destruction, right? If this is true, then maybe that explicit call to clear our client might not be needed anymore, but there's no warm on doing it anyway. ;-)
Comment 20 Simon Hausmann 2012-01-23 12:49:12 PST
(In reply to comment #19)
> (From update of attachment 123207 [details])
> Here's the patch without the stylistic changes. Now I was just thinking that as we're keeping a reference to WebIconDatabase on our client, we should always have a valid pointer to it upon its destruction, right? If this is true, then maybe that explicit call to clear our client might not be needed anymore, but there's no warm on doing it anyway. ;-)

That is actually correct, good point :)
Comment 21 Simon Hausmann 2012-01-23 12:50:05 PST
Comment on attachment 123207 [details]
Patch

r=me. I'll leave it to you if you want to land the patch as-is or remove the clear() before landing.
Comment 22 Rafael Brandao 2012-01-24 04:17:05 PST
Created attachment 123713 [details]
Patch
Comment 23 Rafael Brandao 2012-01-24 05:01:13 PST
*** Bug 76906 has been marked as a duplicate of this bug. ***
Comment 24 Csaba Osztrogonác 2012-01-24 05:31:04 PST
Comment on attachment 123713 [details]
Patch

Clearing flags on attachment: 123713

Committed r105732: <http://trac.webkit.org/changeset/105732>
Comment 25 Csaba Osztrogonác 2012-01-24 05:31:15 PST
All reviewed patches have been landed.  Closing bug.