Bug 23116 - [GTK] Fix crash due a callback called from GIO after the destruction of the ResourceHandle
Summary: [GTK] Fix crash due a callback called from GIO after the destruction of the R...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Platform (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P2 Critical
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-05 06:40 PST by Benjamin Otte
Modified: 2009-02-09 12:42 PST (History)
3 users (show)

See Also:


Attachments
suggested fix (2.56 KB, patch)
2009-01-05 06:41 PST, Benjamin Otte
eric: review-
Details | Formatted Diff | Diff
patch 1 (2.94 KB, patch)
2009-01-07 02:42 PST, Benjamin Otte
no flags Details | Formatted Diff | Diff
patch 2 (1.71 KB, patch)
2009-01-07 08:39 PST, Benjamin Otte
no flags Details | Formatted Diff | Diff
patch 2, try 2 (14.21 KB, patch)
2009-01-08 13:33 PST, Benjamin Otte
zecke: review-
Details | Formatted Diff | Diff
patch 2, try 3 (14.48 KB, patch)
2009-01-09 14:41 PST, Benjamin Otte
no flags Details | Formatted Diff | Diff
Cleanup before calling didFail (4.34 KB, patch)
2009-01-12 12:09 PST, Holger Freyther
no flags Details | Formatted Diff | Diff
Be able to detach from an async GIO operation (7.67 KB, patch)
2009-01-12 12:12 PST, Holger Freyther
no flags Details | Formatted Diff | Diff
Make sure to unref the m_gfile and such (5.58 KB, patch)
2009-01-12 12:14 PST, Holger Freyther
no flags Details | Formatted Diff | Diff
Be able to detach from an async GIO operation (7.67 KB, patch)
2009-01-12 12:31 PST, Holger Freyther
no flags Details | Formatted Diff | Diff
one more crash fix extracted from Company patch (1.46 KB, patch)
2009-01-13 11:43 PST, Alexander Butenko
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benjamin Otte 2009-01-05 06:40:03 PST
libsoup causes random crashes when async callbacks fire after a ResourceHandle has already been removed. The attached patch fixes this by ref/deref'ing the resource.
Comment 1 Benjamin Otte 2009-01-05 06:41:08 PST
Created attachment 26429 [details]
suggested fix
Comment 2 Eric Seidel (no email) 2009-01-06 13:59:55 PST
Comment on attachment 26429 [details]
suggested fix

This needs a ChangeLog and should use spaces instead of tabs.  Please review http://webkit.org/coding/coding-style.html

Is there no way to use a RefPtr here instead of manual ref/deref?  We try hard to avoid manual ref/deref because it's very error prone.
Comment 3 Benjamin Otte 2009-01-07 02:42:39 PST
Created attachment 26488 [details]
patch 1

Yay, looks like I stepped right into all newcomer traps at once.

Anyway, I'm cleaning up the patch, and here's the first and uncontroversial fix: keeping track of the idle source and removing it in the destructor.

For the second part the problem is more complex: It is not possible to remove the async callback (yes, this is a bug in the GIO API), so the data passed to the GIO callbacks must stay valid until this callback is called. The easiest method is to just ref the ResourceHandle, but that means Webkit has no control about the lifetime of the ResourceHandle. I'm going to prepare a patch for that.
Comment 4 Holger Freyther 2009-01-07 08:37:57 PST
(In reply to comment #3)
> Created an attachment (id=26488) [review]
> patch 1

We always use four (4) spaces. Your patch still has a style issue...


> Anyway, I'm cleaning up the patch, and here's the first and uncontroversial
> fix: keeping track of the idle source and removing it in the destructor.
> 
> For the second part the problem is more complex: It is not possible to remove
> the async callback (yes, this is a bug in the GIO API), so the data passed to
> the GIO callbacks must stay valid until this callback is called. The easiest
> method is to just ref the ResourceHandle, but that means Webkit has no control
> about the lifetime of the ResourceHandle. I'm going to prepare a patch for
> that.

This is backwards and will be ugly. You would need to know when the WebCore side is detached from the ResourceHandle... and then never ever call into the ResourceClient callbacks... It might be best to not use GIO for file/data at all...

Comment 5 Benjamin Otte 2009-01-07 08:39:33 PST
Created attachment 26492 [details]
patch 2

And this is the only way I could come up with a patch for the GIO crash that wouldn't involve a separate GioOperation object.
Comment 6 Benjamin Otte 2009-01-07 09:17:16 PST
(In reply to comment #4)
> We always use four (4) spaces. Your patch still has a style issue...
>
Yeah, sorry, I missed 2 lines. On the plus side, I fixed indentation in ::startData, so my patch doesn't make the situation worse. Go me!
 
> This is backwards and will be ugly. You would need to know when the WebCore
> side is detached from the ResourceHandle... and then never ever call into the
> ResourceClient callbacks... It might be best to not use GIO for file/data at
> all...
> 
Not using GIO is not an option as far as I can see unless we want to manually code up all formats - and that's not just file, but at least ftp, too.
And I'm sure I don't want to write another ftp backend, the one for GIO was enough. So at least long term GIO is the way to go.

I'm fine with someone just ripping it out if he doesn't like my patch. But I'd prefer to have a solution that does not make ~30 tests crash in every LayoutTests run.
Comment 7 Holger Freyther 2009-01-07 11:45:23 PST
(In reply to comment #5)
> Created an attachment (id=26492) [review]
> patch 2
> 
> And this is the only way I could come up with a patch for the GIO crash that
> wouldn't involve a separate GioOperation object.
> 

And another coding style violation and the rest sounded too unfriendly, I will try to catch you on irc.
Comment 8 Holger Freyther 2009-01-07 13:46:51 PST
Comment on attachment 26488 [details]
patch 1


> +    if (m_idleHandler) {
> +      g_source_remove (m_idleHandler);
> +      m_idleHandler = 0;
> +    }

style issue... please change when landing.
Comment 9 Holger Freyther 2009-01-07 15:45:00 PST
Comment on attachment 26488 [details]
patch 1

Landed in r39690. Tabs in the ChangeLog are forbidden as well... at least the commit hook rejected it...
Comment 10 Holger Freyther 2009-01-07 15:45:11 PST
Comment on attachment 26488 [details]
patch 1

Landed in r39690. Tabs in the ChangeLog are forbidden as well... at least the commit hook rejected it...

Thanks.
Comment 11 Benjamin Otte 2009-01-08 13:33:26 PST
Created attachment 26530 [details]
patch 2, try 2

This got a bit bigger than I had expected, but it's working perfectly fine now.
The code is using Holger's suggested method of using g_object_set_data() for GIO operations, so it can unset the ResourceHandle when it gets cancelled or deleted.
Comment 12 Holger Freyther 2009-01-08 15:51:33 PST
Comment on attachment 26530 [details]
patch 2, try 2

Sorry, there are way too many style issues to fix them when landing. Please indent with spaces. Please try to avoid using NULL in C++ code. E.g. when you shuffle code around also remove the NULL and use 0? thanks
Comment 13 Benjamin Otte 2009-01-09 14:41:49 PST
Created attachment 26577 [details]
patch 2, try 3

Same patch without style issues.
I hope.
Comment 14 Holger Freyther 2009-01-10 16:33:52 PST
This is no crash in libsoup, and not a crash with the usage of Soup, but coming from GIO... which is used by the Soup backend...
Comment 15 Holger Freyther 2009-01-11 08:27:11 PST
Comment on attachment 26577 [details]
patch 2, try 3

> commit 675d7e669ec7147a4f6c1d0972574b4ffe441f6b
> Author: Benjamin Otte <otte@gnome.org>
> Date:   Thu Jan 8 22:26:33 2009 +0100
> 
>     [SOUP] bug 23116, part 2

like in the previous bug, extra points if you could create a proper description here. Now I would have to do git commit --amend after taking your patch, or go though svn-apply..



> +        Also included is a crash fix for when a file:// url is a directory.


Hmm, we talked about this didn't we? One patch per issue... So far I see a GIO crash from the callback and you see a second crash, split it up.


>  ResourceHandle::~ResourceHandle()
>  {
> +    cancel();
>  }

This is creating a interesting question. If cancel was already called and we call cancel again nothing will happen. If it was not called before we will invoke call the ResourceHandleClient. This is something that this backend was not doing before and at least Qt is not doing that. I would prefer to not change the behavior without a good reason. 


>  static void fillResponseFromMessage(SoupMessage* msg, ResourceResponse* response)
>  {
>      SoupMessageHeadersIter iter;
> -    const char* name = NULL;
> -    const char* value = NULL;
> +    const char* name = 0;
> +    const char* value = 0;

Okay, unrelated cleanup :)


>  static void restartedCallback(SoupMessage* msg, gpointer data)
>  {
>      ResourceHandle* handle = static_cast<ResourceHandle*>(data);
> -    if (!handle)
> -        return;

unrelated cleanup...

Handle* handle = static_cast<ResourceHandle*>(data);
> -    if (!handle)
> -        return;

unrelated cleanup


>      if (SOUP_STATUS_IS_TRANSPORT_ERROR(msg->status_code)) {
>          char* uri = soup_uri_to_string(soup_message_get_uri(msg), FALSE);
>          ResourceError error("webkit-network-error", ERROR_TRANSPORT, uri, String::fromUTF8(msg->reason_phrase));
>          g_free(uri);
> +        g_object_unref(d->m_msg);
> +        d->m_msg = 0;

unrelated...



> +        SoupMessage *msg = d->m_msg;

style

....

smaller patches (one patch per issue) would be so much easier to review. The only reason for a big patch doing more than one thing is a rewrite because something is inherently wrong with the current implementation...
Comment 16 Holger Freyther 2009-01-11 10:42:50 PST
Proposed order of changes (each in a separate patch):
   1.) Call into the client after giving up references (like you do with ResourceError and cleanupGioOperation)
   2.) Use g_object_set_data/get_data
   3.) Kill m_cancelled
   4.) Your file as directory fix (which I didn't find in the patch)
   5.) Failing immediately and giving up the references to the SoupMessage and such.

So 1.) and 2.) will fix the GIO crashes we are seeing.

Comment 17 Benjamin Otte 2009-01-12 01:43:46 PST
I'm very unlikely to split this up into small separate patches anytime soon, because:
- I'm at LCA/holidays now
- The svn/ChangeLog workflow is incredibly depressing for small-patch workflow
- every time I compile Webkit it takes at least 5 minutes on my box, because the linking is swapping like mad, so this will cost me some hours just for compiling
- I consider it a rewrite at this point, because it quite fundamentally changes the behavior of the SOUP ResourceHandle.

In retrospect it would have been easier to split this patch up, had I known about the multiple crashers and leaks this patch now fixes in advance, but most of those were hidden bugs that just showed up while I refactored.
Comment 18 Benjamin Otte 2009-01-12 01:54:27 PST
On to the patch in detail:

> like in the previous bug, extra points if you could create a proper description
> here. Now I would have to do git commit --amend after taking your patch, or go
> though svn-apply..
> 
Yup, next patch will have the commit message as the ChangeLog. Bonus points if I can attach patches with a git log, but without a ChangeLog entry.

> >  ResourceHandle::~ResourceHandle()
> >  {
> > +    cancel();
> >  }
> 
> This is creating a interesting question. If cancel was already called and we
> call cancel again nothing will happen. If it was not called before we will
> invoke call the ResourceHandleClient. This is something that this backend was
> not doing before and at least Qt is not doing that. I would prefer to not
> change the behavior without a good reason. 
> 
I actually looked into various backends and CURL calls cancel() unconditionally in the destructor, so I guess it is fine.

> Okay, unrelated cleanup :)
> 
As said above, I considered it a rewrite, so I just made it a clean one :)

> > +        SoupMessage *msg = d->m_msg;
> 
> style
> 
Care toelaborate what I did wrong here?
Comment 19 Holger Freyther 2009-01-12 05:43:31 PST
(In reply to comment #17)
> I'm very unlikely to split this up into small separate patches anytime soon,
> because:

> - I consider it a rewrite at this point, because it quite fundamentally changes
> the behavior of the SOUP ResourceHandle.

And this is where it is getting an attitude problem (CADT). You have done something, you have done more, then even more and suddenly it is working and then you declare you are done. But in reality fixing a GIO crash and changing when you unref a SoupMessage are two different things. The only thing they have in common is the file they are in.

Anyway I will post my patch set then.
Comment 20 Holger Freyther 2009-01-12 12:09:57 PST
Created attachment 26640 [details]
Cleanup before calling didFail

Cleanup before calling didFail because after didFail the ResourceHandle might be gone....
Comment 21 Holger Freyther 2009-01-12 12:12:52 PST
Created attachment 26641 [details]
Be able to detach from an async GIO operation

the GIO callback might be called even after we did cancel the operation. Place the ResourceHandle* in the GObject* and set it to null when the ResourceHandle forgot about the m_gfile/input_stream.
Comment 22 Holger Freyther 2009-01-12 12:14:13 PST
Created attachment 26642 [details]
Make sure to unref the m_gfile and such

On destruction make sure to cleanup the GIO objects.
Comment 23 Holger Freyther 2009-01-12 12:31:44 PST
Created attachment 26644 [details]
Be able to detach from an async GIO operation

0 -> NULL, spotted by Company.
Comment 24 Alexander Butenko 2009-01-13 11:43:53 PST
Created attachment 26678 [details]
one more crash fix extracted from Company patch
Comment 25 Holger Freyther 2009-01-14 14:36:17 PST
Comment on attachment 26678 [details]
one more crash fix extracted from Company patch


> +        if (client())
> +	    client()->didFinishLoading(this);

Coding Style issue... probably tabs... the one who lands need to fix that. and to the one who lands please add a reference to this bug here so people see from which patchset this is coming.
Comment 26 Christian Dywan 2009-01-19 16:17:36 PST
2009-01-20  Alexander V. Butenko  <alex@digiqube.com>

        Reviewed by Holger Freyther.

        http://bugs.webkit.org/show_bug.cgi?id=23116
        [GTK] Fix crash due a callback called from GIO after the
        destruction of the ResourceHandle

        * platform/network/soup/ResourceHandleSoup.cpp:
        (WebCore::ResourceHandle::cancel): Only call didFinishLoading if
        'client' is set, otherwise cancelling a load could lead to a crash.
Comment 27 Holger Freyther 2009-02-09 12:42:47 PST
Closing this for now.