UNCONFIRMED 112121
Keep a weak reference of decoded image in lazyDecodingPixelRef
https://bugs.webkit.org/show_bug.cgi?id=112121
Summary Keep a weak reference of decoded image in lazyDecodingPixelRef
KangYuan
Reported 2013-03-12 02:40:55 PDT
LazyDecodingPixelRef need to do a map lookup every time to read image pixels. In some canvas2D cases, this can take 1/6 to 4/1 time of total canvas flush. To avoid the map lookup, I suggest: - make a weak reference of decoded image in lazyDecodingPixelRef, set it when the first time of onLockPixels, and don't to lookup into the map each time after that - add a imagePruned() interface in lazyDecodingPixelRef - ImageDecodingStore own the image memory and inform lazyDecodingPixelRef when the image memory get pruned I attached two traces. One is a clean build, the other is I hacked code according to above plan.
Attachments
a clean build, only add trace event on lazyDecodingPixelRef::onLockPixel() (17.63 MB, text/plain)
2013-03-12 02:43 PDT, KangYuan
no flags
hack the code, not to lookup map everytime (13.61 MB, text/plain)
2013-03-12 02:46 PDT, KangYuan
no flags
Patch (10.91 KB, patch)
2013-03-17 22:10 PDT, KangYuan
no flags
Patch (10.97 KB, patch)
2013-03-18 03:10 PDT, KangYuan
no flags
KangYuan
Comment 1 2013-03-12 02:43:50 PDT
Created attachment 192683 [details] a clean build, only add trace event on lazyDecodingPixelRef::onLockPixel()
KangYuan
Comment 2 2013-03-12 02:46:24 PDT
Created attachment 192684 [details] hack the code, not to lookup map everytime
KangYuan
Comment 3 2013-03-17 22:10:57 PDT
KangYuan
Comment 4 2013-03-17 22:18:11 PDT
I uploaded a patch, please have a review, thanks!
Hin-Chung Lam
Comment 5 2013-03-17 22:35:32 PDT
Do you have a test page that demonstrates this performance provlem?
KangYuan
Comment 6 2013-03-17 22:59:49 PDT
The case is FishIETank with 200 fish. you can try here. http://ie.microsoft.com/testdrive/Performance/FishIETank/Default.html The uploaded trace files is collected from a slighly modified version of fishtank, which only changed setTimeOut to requestAnimationFrame.
WebKit Review Bot
Comment 7 2013-03-17 23:08:25 PDT
Comment on attachment 193490 [details] Patch Attachment 193490 [details] did not pass chromium-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17180389 New failing tests: platform/chromium/virtual/deferred/fast/images/icon-decoding.html
KangYuan
Comment 8 2013-03-18 00:03:41 PDT
Sorry, why i can't find platform/chromium/virtual/deferred/fast/images/icon-decoding.html...
Hin-Chung Lam
Comment 9 2013-03-18 00:50:21 PDT
platform/chromium/virtual/deferred/fast/images/icon-decoding.html is a virtual test. You can run DRT with --enable-deferred-image-decoding and on fast/images/icon-decoding.html. I'm not sure a weak pointer is the right approach here. A cache entry can mutate while a reference is held by LazyDecodingPixelRef. So it is totally possible one thread is reading while another thread is writing. There's why the synchronization is done through ImageFrameGenerator, such that no two threads can read and write at the same time. There's also a problem with ImageDecodingStore pointing to LazyDecodingPixelRef. I can only imagine this is not thread safe, one thread is evicting the cache while the other thread is reading. My guess (I haven't tried it on the test page yet) is that the problem is we're generating too many LazyDecodingPixelRef in DeferredImageDecoder. The code now generates a new LazyDecodingPixelRef for each draw, but this can be cached and reused. If that's the case multiple draws sharing the same bitmap will go through the same LazyDecodingPixel and onLockPixels() will be called once. We should see if this can cut down cache lookup.
KangYuan
Comment 10 2013-03-18 03:06:42 PDT
Thanks, i can ran test case now. I realize the threading issue. So I add willRemoveFromCache() to ensure the image is immutable during locked by lazyDecodingPixelRef. Is that not enough? Maybe I miss some paths that reader-writer co-exists. I will double check, and also appreciate if someone point me how or where it can happen. i think LazyDecodingPixelRef is generated for each image resources on the web page. It's the source bitmap for each drawImage (most times). We don't generate a new lazyDecodingPixelRef for each draw. Instead, we need to lock/unlock pixels for each draw, from the lazy-decoded image source. Also, i fixed the crash for test cases and will upload again. I am ok to further discuss the solution. The patch is just my proposal... Thanks for your comments. -Kang Yuan
KangYuan
Comment 11 2013-03-18 03:10:28 PDT
Hin-Chung Lam
Comment 12 2013-03-18 19:36:13 PDT
I'm investigating this problem now. I also plan to add more tracing events to deferred image decoding path: https://bugs.webkit.org/show_bug.cgi?id=112648.
Hin-Chung Lam
Comment 13 2013-03-18 20:58:05 PDT
I investigated this locally but I don't have get the same conclusion as you do. Here's the data I captured with IE Fish Tank: Trace 1 (1134 counts) LazyDecodingPixelRef::onLockPixels 7.093ms LazyDecodingPixelRef::onUnlockPixels 3.51ms ImageDecodingStore::lockCache 1.216ms ImageDecodingStore::unlockCache 1.142ms Trace 2 (1328 counts) LazyDecodingPixelRef::onLockPixels 5.388ms LazyDecodingPixelRef::onUnlockPixels 4.574ms ImageDecodingStore::lockCache 1.605ms ImageDecodingStore::unlockCache 1.555ms In both cases there were no image decodes and images were fully cached. Because there were so few cached images the hashmap lookup were in the micro seconds range. There's one problem shown in the trace however: the same LazyDecodingPixelRef is locked and unlocked over and over again within each tile and also the picture. This test case is particularly bad because there's only one image ever. We plan to fix this in Chromium such that high priority images are locked once for the entire picture. I imagine it will solve your problem as well as there'll be no more than 1 lock per image per picture. See the progress here: http://code.google.com/p/chromium/issues/detail?id=196306
KangYuan
Comment 14 2013-03-18 22:47:35 PDT
Thanks for the efforts! Seems you saw the same thing with I do, expect you don't think it's performance issue. For the performance data, I ran the case on Android where impl-side-paint and threaded compositing is enabled. Do you run on the desktop? You can share me the trace file if possible. You can check from my uploaded trace that it takes about 8 ms for one frame to lock/unlock (i didn't add trace for unlock), considering it only took ~27ms on main thread for one frame. That's why i think it's performance issue. Another reason is the lock/unlock almost don't spend time if we disable deferral decoding. Here is call stack for lock/unlock. I will check http://code.google.com/p/chromium/issues/detail?id=196306 first. -Kang Yuan
Andreas Kling
Comment 15 2014-02-05 11:25:07 PST
Comment on attachment 193513 [details] Patch Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.
Note You need to log in before you can comment on or make changes to this bug.