Summary: | Calling nativeImageForCurrentFrame() causes assertion failure: m_verifier.isSafeToUse() | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Huajun.Li <huajun.li.lee> | ||||||||||||||||||||||||
Component: | WebCore Misc. | Assignee: | Sergio Villar Senin <svillar> | ||||||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||||||
Severity: | Normal | CC: | allan.jensen, aroben, beidson, cmarcelo, darin, dongseong.hwang, gyuyoung.kim, hyuki.kim, japhet, jochen, leandro, levin, levin+threading, menard, naginenis, rakuco, s.choi, svillar, tonikitoo, webkit.review.bot, yong.li.webkit, zoltan | ||||||||||||||||||||||||
Priority: | P2 | ||||||||||||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||||||||||||
Hardware: | PC | ||||||||||||||||||||||||||
OS: | Linux | ||||||||||||||||||||||||||
Attachments: |
|
Description
Huajun.Li
2011-09-04 19:41:26 PDT
So the problem is her: http://www.google.com/codesearch#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/platform/image-decoders/ImageDecoder.h&type=cs&l=246 In ImageDecoder.h virtual void setData(SharedBuffer* data, bool allDataReceived) { if (m_failed) return; m_data = data; On this last line, which ref counts a shared buffer. That shared buffer is part of the icon which is shared across threads. At the moment, I don't know if something really tricky is going on or this is not thread safe -- at the moment, it doesn't look threadsafe. I think the right question for us is what is the design. What is supposed to make this threadsafe? (In reply to comment #2) > I think the right question for us is what is the design. What is supposed to make this threadsafe? There are two comments in the code to indicate the design: 1. From IconDatabase.h, "Holding m_urlAndIconLock is required when accessing any of the following data structures or the objects they contain". This includes the data that is refcounted in this callstack. The lock is not held so the assert is correct in that sense. 2. In IconDatabase::synchronousIconForPageURL, there is a comment which starts with "PARANOID DISCUSSION" (added here http://trac.webkit.org/changeset/25439/trunk/WebCore/loader/icon/IconDatabase.cpp). It has a long explanation about why returning Image* is ok. I don't think it considered platforms which did ref counting of the guts of these structures. In short, I believe this is real a bug. At a high level the appropriate fix would be to lock m_urlAndIconLock when accessing the SharedBuffer or for synchronousIconForPageURL to return a PassRefPtr which isn't shared across threads.. Adding some people who seem to have been involved in the ewk api since it looks like the change needs to happen there. Sorry, I still fail to see what could be done in ewk. ewk_history_item_icon_surface_get's behavior follows quite closely what the PARANOID DISCUSSION comment describes, in that it just calls synchronousIconForPageURL and creates a platform-specific representation of the raw image (in ewk's case, a cairo_surface_t). We can't hold the m_urlAndIconLock mutex ourselves, and the rest of the call stack is inside WebCore. Can you elaborate a little on your thoughts? (In reply to comment #5) > Sorry, I still fail to see what could be done in ewk. > > ewk_history_item_icon_surface_get's behavior follows quite closely what the PARANOID DISCUSSION comment describes, The PARANOID DISCUSSION comment seems to make some assumptions that aren't always true. I don't think this method is safe for you to call like this since thing you do right after the call isn't safe. imo, it would be hard to see this issue when that comment was written. It seems to assume a model like what windows does which calls "BitmapImage::getHBITMAPOfSize" This method doesn't ref count any internal structures. > in that it just calls synchronousIconForPageURL and creates a platform-specific representation of the raw image (in ewk's case, a cairo_surface_t). We can't hold the m_urlAndIconLock mutex ourselves, and the rest of the call stack is inside WebCore. > > Can you elaborate a little on your thoughts? Your comments point in the right direction. Remove the call to IconDatabase ::synchronousIconForPageURL and add a new method to converts to your platform specific representation while holding the lock. Then call this method instead. Does that make sense to you? (The current state of things appears to leave you open to race conditions which make cause misc crashes at random points in your code.) PS imo, this method seems flawed and perhaps we should just remove it in every platform, but for this bug ewk is the target :). I suspect the OS X platform may have similar issues because it calls webGetNSImage which does some complicated things. I'll look at this for other platforms and file some bugs. (In reply to comment #6) > > in that it just calls synchronousIconForPageURL and creates a platform-specific representation of the raw image (in ewk's case, a cairo_surface_t). We can't hold the m_urlAndIconLock mutex ourselves, and the rest of the call stack is inside WebCore. > > > > Can you elaborate a little on your thoughts? > > Your comments point in the right direction. Remove the call to IconDatabase ::synchronousIconForPageURL and add a new method to converts to your platform specific representation while holding the lock. Then call this method instead. > > Does that make sense to you? (The current state of things appears to leave you open to race conditions which make cause misc crashes at random points in your code.) > > PS imo, this method seems flawed and perhaps we should just remove it in every platform, but for this bug ewk is the target :). I suspect the OS X platform may have similar issues because it calls webGetNSImage which does some complicated things. I'll look at this for other platforms and file some bugs. I ever checked QT port, and found it may have same issue, so it is not specific to ewk. :) (In reply to comment #7) > (In reply to comment #6) > > > in that it just calls synchronousIconForPageURL and creates a platform-specific representation of the raw image (in ewk's case, a cairo_surface_t). We can't hold the m_urlAndIconLock mutex ourselves, and the rest of the call stack is inside WebCore. > > > > > > Can you elaborate a little on your thoughts? > > > > Your comments point in the right direction. Remove the call to IconDatabase ::synchronousIconForPageURL and add a new method to converts to your platform specific representation while holding the lock. Then call this method instead. > > > > Does that make sense to you? (The current state of things appears to leave you open to race conditions which make cause misc crashes at random points in your code.) > > > > PS imo, this method seems flawed and perhaps we should just remove it in every platform, but for this bug ewk is the target :). I suspect the OS X platform may have similar issues because it calls webGetNSImage which does some complicated things. I'll look at this for other platforms and file some bugs. > > I ever checked QT port, and found it may have same issue, so it is not specific to ewk. :) And I also hit this bug in Gtk port when implementing some unit tests for our icon database client (In reply to comment #8) > (In reply to comment #7) > > (In reply to comment #6) > > > > in that it just calls synchronousIconForPageURL and creates a platform-specific representation of the raw image (in ewk's case, a cairo_surface_t). We can't hold the m_urlAndIconLock mutex ourselves, and the rest of the call stack is inside WebCore. > > > > > > > > Can you elaborate a little on your thoughts? > > > > > > Your comments point in the right direction. Remove the call to IconDatabase ::synchronousIconForPageURL and add a new method to converts to your platform specific representation while holding the lock. Then call this method instead. > > > > > > Does that make sense to you? (The current state of things appears to leave you open to race conditions which make cause misc crashes at random points in your code.) > > > > > > PS imo, this method seems flawed and perhaps we should just remove it in every platform, but for this bug ewk is the target :). I suspect the OS X platform may have similar issues because it calls webGetNSImage which does some complicated things. I'll look at this for other platforms and file some bugs. > > > > I ever checked QT port, and found it may have same issue, so it is not specific to ewk. :) > > And I also hit this bug in Gtk port when implementing some unit tests for our icon database client It indicates that there is likely a problem in the code (maybe for multiple platforms), and it needs a volunteer to fix it. (In reply to comment #9) > > And I also hit this bug in Gtk port when implementing some unit tests for our icon database client > > It indicates that there is likely a problem in the code (maybe for multiple platforms), and it needs a volunteer to fix it. Well, unless I'm missing something it's pretty clear to me that IconDatabase ::synchronousIconForPageURL cannot be used by any platform as it requires a lock in the mutex used to verify refcounting. This issue was likely never addressed as 1- it does not obviously affect release builds and 2- the code is kind of correct, because even if you don't get the lock, nothing wrong will happen as long as you call it from the same thread that created the image data. One possible fix off the top of my head would be replacing the last return by a call to some method in the IconDatabaseClient that will get the Image and will return the platform specific image. Does it sound good? (In reply to comment #10) > (In reply to comment #9) > > > And I also hit this bug in Gtk port when implementing some unit tests for our icon database client > > > > It indicates that there is likely a problem in the code (maybe for multiple platforms), and it needs a volunteer to fix it. > > Well, unless I'm missing something it's pretty clear to me that IconDatabase ::synchronousIconForPageURL cannot be used by any platform as it requires a lock in the mutex used to verify refcounting. This issue was likely never addressed as 1- it does not obviously affect release builds and 2- the code is kind of correct, because even if you don't get the lock, nothing wrong will happen as long as you call it from the same thread that created the image data. I agree with you mostly. (I don't think 2 is true but I haven't reviewed the code recently. iirc, there is a reason for the lock and verifying that ref counting only happens within the lock. When you do ref counting outside of the lock, there is another thread that may be doing it -- within the lock -- and then you have problems.) > > One possible fix off the top of my head would be replacing the last return by a call to some method in the IconDatabaseClient that will get the Image and will return the platform specific image. Does it sound good? From my comments above " Remove the call to IconDatabase ::synchronousIconForPageURL and add a new method to converts to your platform specific representation while holding the lock. Then call this method instead." I think we are saying the same thing. (In reply to comment #10) > (In reply to comment #9) > > > And I also hit this bug in Gtk port when implementing some unit tests for our icon database client > > > > It indicates that there is likely a problem in the code (maybe for multiple platforms), and it needs a volunteer to fix it. > > Well, unless I'm missing something it's pretty clear to me that IconDatabase ::synchronousIconForPageURL cannot be used by any platform as it requires a lock in the mutex used to verify refcounting. This issue was likely never addressed as 1- it does not obviously affect release builds and 2- the code is kind of correct, because even if you don't get the lock, nothing wrong will happen as long as you call it from the same thread that created the image data. Mac uses it without asserting, and your suggestions as to why it might okay (nothing wrong happening, etc etc) are accurate AFAIK (In reply to comment #12) > (In reply to comment #10) > > (In reply to comment #9) > > > > And I also hit this bug in Gtk port when implementing some unit tests for our icon database client > > > > > > It indicates that there is likely a problem in the code (maybe for multiple platforms), and it needs a volunteer to fix it. > > > > Well, unless I'm missing something it's pretty clear to me that IconDatabase ::synchronousIconForPageURL cannot be used by any platform as it requires a lock in the mutex used to verify refcounting. This issue was likely never addressed as 1- it does not obviously affect release builds and 2- the code is kind of correct, because even if you don't get the lock, nothing wrong will happen as long as you call it from the same thread that created the image data. > > Mac uses it without asserting, and your suggestions as to why it might okay (nothing wrong happening, etc etc) are accurate AFAIK I think Mac actually does the right thing to my memory (because it carefully avoid ref counting) but the api as opposed very easily leads to bad behavior. Created attachment 134786 [details]
Patch
Comment on attachment 134786 [details] Patch Attachment 134786 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/12266265 Comment on attachment 134786 [details] Patch Attachment 134786 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/12275012 Comment on attachment 134786 [details] Patch Attachment 134786 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12266272 Created attachment 134822 [details]
Patch
Comment on attachment 134822 [details] Patch Attachment 134822 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12267284 Comment on attachment 134822 [details]
Patch
The reviewers seem to be suggesting that EFL need to change their use of the original function to be threadsafe. It's not clear to me why adding this extra accessor is correct.
Actually I think this may the right fix. The problem was that ref counting was happening outside of the lock and now this access function takes the appropriate lock. Previously when I looked at this recent patch I though it needed a platform specific reviewer but now I think I may be able to do it. It needs the window build issue fixed though so I'll leave it as r- so another patch may be done to fix that. (In reply to comment #20) > (From update of attachment 134822 [details]) > The reviewers seem to be suggesting that EFL need to change their use of the original function to be threadsafe. It's not clear to me why adding this extra accessor is correct. Eric the problem is that the code inside this new function does some refcounting that must be done inside a lock. It is not something specific to EFL, all the platforms are potential targets. We have reported the same problem in Gtk and Qt guys said they also suffer it. It seems that the only "safe" ports are Win and the Mac because they prevent refcounting in their port code. I'll provide a patch soon with the fix for the Win port. Created attachment 138353 [details]
Patch
Let's see if this fixes the Win EWS
Comment on attachment 138353 [details] Patch Attachment 138353 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12470903 Created attachment 138370 [details]
Patch
Another try (crossing fingers)
Comment on attachment 138370 [details] Patch Attachment 138370 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12470916 Comment on attachment 138370 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=138370&action=review > Source/WebKit/gtk/webkit/webkitfavicondatabase.cpp:402 > + GRefPtr<GdkPixbuf> pixbuf = adoptGRef(cairoImageSurfaceToGdkPixbuf(icon)); You have to update at least this part of the patch, this is not going to work anymore because now we have NativeImageCairo since r115385, you need to call surface function to get the cairo_surface_t out of it. http://trac.webkit.org/changeset/115385 Any update on this patch? (In reply to comment #27) > http://trac.webkit.org/changeset/115385 > > Any update on this patch? It looks like it can't build on Windows for some reason that has yet to be determined, which makes landing it not possible :(. (In reply to comment #28) > (In reply to comment #27) > > http://trac.webkit.org/changeset/115385 > > > > Any update on this patch? > > It looks like it can't build on Windows for some reason that has yet to be determined, which makes landing it not possible :(. Yeah I have been trying to do that with aroben without much success. Looks like we'd have to give it another try. I guess that only people with Windows builds can help here... Created attachment 141424 [details]
Patch
Comment on attachment 141424 [details] Patch Attachment 141424 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12666548 Comment on attachment 141424 [details] Patch Attachment 141424 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/12674221 Looking at the error on windows, I would guess that Source/WebKit/cf/WebCoreSupport/WebInspectorClientCF.cpp needs to #include "config.h" (In reply to comment #33) > Looking at the error on windows, I would guess that Source/WebKit/cf/WebCoreSupport/WebInspectorClientCF.cpp needs to > #include "config.h" Thx for the tip! I did some mistake on the EFL code after updating the patch so I'll try it in the next patch Created attachment 141460 [details]
Patch
Comment on attachment 141460 [details] Patch Attachment 141460 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12675285 why unconfirmed? changed to new *** Bug 85473 has been marked as a duplicate of this bug. *** (In reply to comment #37) > why unconfirmed? changed to new I'm guessing because unconfirmed is the default and nobody had bothered to change it yet. Created attachment 145380 [details]
addition to the proposed patch
It looks like webkit2 Qt port needs this additional code to avoid similar crash.
There are two alternative solutions to this. The first is to let SharedBuffer inherit from ThreadSafeRefCounted instead of RefCounted . For all platforms with atomic integers this comes completely free, but for other it will be a lot more expensive. The second is to copy the sharedbuffer before reusing it the BMPDecoder in the ICODecoder. Both have been tested and works, but have different performance downsides. (In reply to comment #41) > There are two alternative solutions to this. > > The first is to let SharedBuffer inherit from ThreadSafeRefCounted instead of RefCounted . For all platforms with atomic integers this comes completely free, but for other it will be a lot more expensive. > > The second is to copy the sharedbuffer before reusing it the BMPDecoder in the ICODecoder. > > Both have been tested and works, but have different performance downsides. The client just needs the image but not SharedBuffer. Sergio's patch looks good to me. (In reply to comment #42) > (In reply to comment #41) > > There are two alternative solutions to this. > > > > The first is to let SharedBuffer inherit from ThreadSafeRefCounted instead of RefCounted . For all platforms with atomic integers this comes completely free, but for other it will be a lot more expensive. > > > > The second is to copy the sharedbuffer before reusing it the BMPDecoder in the ICODecoder. > > > > Both have been tested and works, but have different performance downsides. > > The client just needs the image but not SharedBuffer. Sergio's patch looks good to me. True, the patch actually looks good. I thought there might have been a problem since the patch stalled. (In reply to comment #43) > (In reply to comment #42) > > (In reply to comment #41) > > > There are two alternative solutions to this. > > > > > > The first is to let SharedBuffer inherit from ThreadSafeRefCounted instead of RefCounted . For all platforms with atomic integers this comes completely free, but for other it will be a lot more expensive. > > > > > > The second is to copy the sharedbuffer before reusing it the BMPDecoder in the ICODecoder. > > > > > > Both have been tested and works, but have different performance downsides. > > > > The client just needs the image but not SharedBuffer. Sergio's patch looks good to me. > > True, the patch actually looks good. I thought there might have been a problem since the patch stalled. It breaks the windows build. If that were solved, this could moved forward. (In reply to comment #43) > (In reply to comment #42) > > (In reply to comment #41) > > > Both have been tested and works, but have different performance downsides. > > > > The client just needs the image but not SharedBuffer. Sergio's patch looks good to me. > > True, the patch actually looks good. I thought there might have been a problem since the patch stalled. As some others said I tried several approaches to fix the windows build but I'm a bit stalled now (specially because I don't have a windows machine to test). I spent some hours with aroben a couple of weeks ago but we weren't able to fix it. I suspect that the problem is a hack in the windows build in this file Source/WebKit/cf/WebCoreSupport/WebInspectorClientCF.cpp where I found this comment: // FIXME: On Windows, we require all WebKit source files to include config.h // before including any other files. Failing to include config.h will leave // WTF_USE_CF undefined, causing build failures in this // file. But Mac doesn't have a config.h for WebKit, so we can't include the // Windows one here. For now we can just define WTF_USE_CF and // WTF_USE_CFNETWORK manually, but we need a better long-term solution. #ifndef WTF_USE_CF #define WTF_USE_CF 1 #endif Rings any bell for any Windows hacker? (In reply to comment #45) > #ifndef WTF_USE_CF > #define WTF_USE_CF 1 > #endif > > Rings any bell for any Windows hacker? Have you tried git blame / svn blame to see who wrote it? (In reply to comment #45) > (In reply to comment #43) > > (In reply to comment #42) > > > (In reply to comment #41) > > > > Both have been tested and works, but have different performance downsides. > > > > > > The client just needs the image but not SharedBuffer. Sergio's patch looks good to me. > > > > True, the patch actually looks good. I thought there might have been a problem since the patch stalled. > > As some others said I tried several approaches to fix the windows build but I'm a bit stalled now (specially because I don't have a windows machine to test). I spent some hours with aroben a couple of weeks ago but we weren't able to fix it. I suspect that the problem is a hack in the windows build in this file Source/WebKit/cf/WebCoreSupport/WebInspectorClientCF.cpp where I found this comment: fwiw, the lastest error on Windows was different: "5>..\..\cf\WebCoreSupport\WebInspectorClientCF.cpp(36) : warning C4067: unexpected tokens following preprocessor directive - expected a newline" so perhaps the change to WebInspectorClientCF.cpp wasn't done in a way that Windows likes? (It looks ok in the diff but perhaps these lines are different from the other lines in the file?) Is anyone still working on this? (In reply to comment #48) > Is anyone still working on this? My impression is that it is stuck due to not figuring out the Windows build issue. If you'd like to take it on, I'm sure folks would appreciate it. (In reply to comment #49) > (In reply to comment #48) > > Is anyone still working on this? > > My impression is that it is stuck due to not figuring out the Windows build issue. If you'd like to take it on, I'm sure folks would appreciate it. I think the current patch seems like it is a good fix except for the Windows build break. Created attachment 147876 [details]
Merged together and try to fix windows build
I don't have a Win Safari build environment. I'll use challenge the bot.
I think the build failure is due to missing definition for PLATFORM() and OS() macros.
Created attachment 147883 [details]
Try again.
I knew it...
BTW, Sergio, any reason why we want WTF_USE_CG in this file?
Comment on attachment 147883 [details] Try again. Attachment 147883 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/12969088 Comment on attachment 147883 [details] Try again. Attachment 147883 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12960676 Created attachment 148097 [details]
Try fixing the build error (also removed change in webkit/cf)
cannot see why we need the WTF_USE_CG hack in webkit/cf
Comment on attachment 148097 [details] Try fixing the build error (also removed change in webkit/cf) Attachment 148097 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12979170 Created attachment 148104 [details]
put WTF_USE_CG back, this should build now
It builds now. Ping reviewers. Comment on attachment 148104 [details] put WTF_USE_CG back, this should build now View in context: https://bugs.webkit.org/attachment.cgi?id=148104&action=review This should do it. Congrats to all! :) > Source/WebKit/blackberry/WebCoreSupport/FrameLoaderClientBlackBerry.cpp:-1159 > - if (!img || !img->data()) !img->data() got dropped but that seems ok. > Source/WebKit/efl/ewk/ewk_history.cpp:341 > ERR("icon is NULL."); This error got a little bit broader. Before if icon->nativeImageForCurrentFrame(); returned 0 this error wasn't printed. But I think this is ok (in all instances). > Source/WebKit/efl/ewk/ewk_history.cpp:353 > ERR("icon is NULL."); Same comment as above. > Source/WebKit/efl/ewk/ewk_settings.cpp:218 > ERR("no icon for url %s", url); Same comment as before. > Source/WebKit/efl/ewk/ewk_settings.cpp:232 > ERR("no icon for url %s", url); Same comment as before. Committed r120694: <http://trac.webkit.org/changeset/120694> |