155546 – Add the asynchronous image decoding mode

RESOLVED FIXED 155546

Add the asynchronous image decoding mode

https://bugs.webkit.org/show_bug.cgi?id=155546

Summary Add the asynchronous image decoding mode

Said Abou-Hallawa

Reported 2016-03-16 10:13:39 PDT

In case of large images, the image decoding will start immediately after having the size of the image. A new class named ImageFrameDecoder which inherits form ImageDecoder will be added to manage the fork. In ImageFrameDecoder:: setData() and after knowing the size of the image is available, a heretic will be checked to decide whether to an asynchronous image decoding is needed or not. If it is needed, a separate thread will be created to do the decoding such that the main thread is not blocked. In the main thread side, ImageSource::frameAtIndex() will check whether the ImageDecoder is in asynchronous decoding mode or not. If it is, no caching will be carried out on the main thread. ImageSource::frameAtIndex(), will return whatever is stored in the ImageFraeDataBuffer[index] regardless whether it is complete or not.

Attachments
Patch (182.96 KB, patch) 2016-03-16 10:18 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (183.05 KB, patch) 2016-03-16 10:28 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (174.78 KB, patch) 2016-03-16 10:46 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (178.26 KB, patch) 2016-03-16 11:56 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch for review (30.72 KB, patch) 2016-03-16 12:31 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (372.16 KB, patch) 2016-09-01 12:58 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (432.80 KB, patch) 2016-09-01 15:35 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (433.95 KB, patch) 2016-09-01 15:53 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews117 for mac-yosemite (1.50 MB, application/zip) 2016-09-01 17:02 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews125 for ios-simulator-elcapitan-wk2 (5.68 MB, application/zip) 2016-09-01 17:08 PDT, Build Bot	no flags	Details
Patch (443.29 KB, patch) 2016-09-01 17:39 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (443.39 KB, patch) 2016-09-01 18:36 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (441.33 KB, patch) 2016-09-02 09:13 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (441.23 KB, patch) 2016-09-02 09:57 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (441.73 KB, patch) 2016-09-02 10:18 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (441.73 KB, patch) 2016-09-02 10:36 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews117 for mac-yosemite (1.64 MB, application/zip) 2016-09-02 11:36 PDT, Build Bot	no flags	Details
Patch (441.57 KB, patch) 2016-09-02 12:12 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (441.35 KB, patch) 2016-09-02 13:14 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews114 for mac-yosemite (1.56 MB, application/zip) 2016-09-02 14:35 PDT, Build Bot	no flags	Details
Patch for review (107.28 KB, patch) 2016-09-02 15:09 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (47.26 KB, patch) 2016-10-05 12:21 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (48.70 KB, patch) 2016-10-05 12:33 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (54.01 KB, patch) 2016-10-05 12:44 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch for EWS. Includes the patch of wk162478 (54.50 KB, patch) 2016-10-05 14:29 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch for review (29.26 KB, patch) 2016-10-05 14:32 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (15.56 KB, patch) 2016-10-11 19:49 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (16.33 KB, patch) 2016-10-15 21:15 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (16.40 KB, patch) 2016-10-15 22:04 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (18.19 KB, patch) 2016-10-15 22:33 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (18.54 KB, patch) 2016-10-16 00:42 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (18.54 KB, patch) 2016-10-16 01:12 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (22.00 KB, patch) 2016-10-18 16:16 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (31.67 KB, patch) 2016-10-19 17:11 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Patch (31.56 KB, patch) 2016-11-03 18:08 PDT, Said Abou-Hallawa	no flags	Details Formatted Diff Diff
Show Obsolete (34) View All Add attachment proposed patch, testcase, etc.

Said Abou-Hallawa

Comment 1 2016-03-16 10:18:28 PDT

Created attachment 274193 [details] Patch

Said Abou-Hallawa

Comment 2 2016-03-16 10:28:22 PDT

Created attachment 274194 [details] Patch

Said Abou-Hallawa

Comment 3 2016-03-16 10:46:40 PDT

Created attachment 274195 [details] Patch

Said Abou-Hallawa

Comment 4 2016-03-16 11:56:44 PDT

Created attachment 274205 [details] Patch

Said Abou-Hallawa

Comment 5 2016-03-16 12:31:11 PDT

Created attachment 274209 [details] Patch for review

Said Abou-Hallawa

Comment 6 2016-09-01 12:58:19 PDT

Created attachment 287662 [details] Patch

Said Abou-Hallawa

Comment 7 2016-09-01 15:35:45 PDT

Created attachment 287693 [details] Patch

Said Abou-Hallawa

Comment 8 2016-09-01 15:53:55 PDT

Created attachment 287695 [details] Patch

Build Bot

Comment 9 2016-09-01 17:02:42 PDT

Comment on attachment 287695 [details] Patch Attachment 287695 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1990073 New failing tests: http/tests/misc/slow-loading-animated-image.html

Build Bot

Comment 10 2016-09-01 17:02:48 PDT

Created attachment 287703 [details] Archive of layout-test-results from ews117 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews117 Port: mac-yosemite Platform: Mac OS X 10.10.5

Build Bot

Comment 11 2016-09-01 17:08:45 PDT

Comment on attachment 287695 [details] Patch Attachment 287695 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/1990089 New failing tests: fast/images/animated-gif-restored-from-bfcache.html

Build Bot

Comment 12 2016-09-01 17:08:50 PDT

Created attachment 287704 [details] Archive of layout-test-results from ews125 for ios-simulator-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews125 Port: ios-simulator-elcapitan-wk2 Platform: Mac OS X 10.11.5

Said Abou-Hallawa

Comment 13 2016-09-01 17:39:12 PDT

Created attachment 287707 [details] Patch

Said Abou-Hallawa

Comment 14 2016-09-01 18:36:37 PDT

Created attachment 287712 [details] Patch

Said Abou-Hallawa

Comment 15 2016-09-02 09:13:20 PDT

Created attachment 287763 [details] Patch

Said Abou-Hallawa

Comment 16 2016-09-02 09:57:51 PDT

Created attachment 287768 [details] Patch

Said Abou-Hallawa

Comment 17 2016-09-02 10:18:12 PDT

Created attachment 287771 [details] Patch

Said Abou-Hallawa

Comment 18 2016-09-02 10:36:06 PDT

Created attachment 287775 [details] Patch

Build Bot

Comment 19 2016-09-02 11:36:00 PDT

Comment on attachment 287775 [details] Patch Attachment 287775 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1995098 New failing tests: fast/dom/HTMLFormElement/document-deactivation-callback-crash.html http/tests/misc/slow-loading-animated-image.html

Build Bot

Comment 20 2016-09-02 11:36:05 PDT

Created attachment 287789 [details] Archive of layout-test-results from ews117 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews117 Port: mac-yosemite Platform: Mac OS X 10.10.5

Said Abou-Hallawa

Comment 21 2016-09-02 12:12:03 PDT

Created attachment 287797 [details] Patch

Said Abou-Hallawa

Comment 22 2016-09-02 13:14:27 PDT

Created attachment 287807 [details] Patch

Build Bot

Comment 23 2016-09-02 14:35:14 PDT

Comment on attachment 287807 [details] Patch Attachment 287807 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1996213 New failing tests: fast/dom/HTMLElement/class-list-gc.html

Build Bot

Comment 24 2016-09-02 14:35:18 PDT

Created attachment 287816 [details] Archive of layout-test-results from ews114 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews114 Port: mac-yosemite Platform: Mac OS X 10.10.5

Said Abou-Hallawa

Comment 25 2016-09-02 15:08:54 PDT

The test fast/dom/HTMLElement/class-list-gc.html crashes on mac-debug. This test crashes once before on 9/1/2016: https://build.webkit.org/builders/Apple%20iOS%209%20Simulator%20Debug%20WK2%20(Tests)/builds/4910. Most likely the latest patch has nothing to do with this crash.

Said Abou-Hallawa

Comment 26 2016-09-02 15:09:57 PDT

Created attachment 287822 [details] Patch for review

Geoffrey Garen

Comment 27 2016-09-12 14:45:30 PDT

Comment on attachment 287822 [details] Patch for review View in context: https://bugs.webkit.org/attachment.cgi?id=287822&action=review I think the low-level details deserve another go-round here. Two other questions I asked Said: (1) Why is this patch a memory use regression on a benchmark that paints lots of animated GIFs? (2) On that same benchmark, what is the minimum number of concurrent animations for which this patch still shows a speedup? > Source/WTF/wtf/AtomicType.h:35 > +template<typename T> > +class AtomicType { std::atomic<T> already exists, and this class is way too expensive for clients that only want atomic types. So, I think we need a different class name here. I would call this Broadcast<T> or Watchable<T>. > Source/WTF/wtf/AtomicType.h:58 > + m_condition.notifyOne(); This needs to be notifyAll(). Otherwise, If two threads call waitUntilEqual passing in the same value, one will deadlock. > Source/WTF/wtf/CircularQueue.h:35 > +class CircularQueue { I would call this FixedQueue. > Source/WTF/wtf/CircularQueue.h:39 > + CircularQueue() > + { > + } This function should assert or static_assert that BufferSize is a power of two. Non-powers-of-two will be needlessly slow. > Source/WTF/wtf/CircularQueue.h:44 > + void enQueue(const T& value) Should be enqueue. > Source/WTF/wtf/CircularQueue.h:49 > + else > + m_rear = (m_rear + 1) % BufferSize; If m_rear wraps around and catches m_front, the behavior we want is to delete the oldest item and add a new item, right? But this code seems to delete the whole list and add a new item. That doesn't seem right. > Source/WTF/wtf/CircularQueue.h:54 > + T& deQueue() Should be dequeue. It's not safe for removal to return a reference; future operations may invalidate the reference. This function should return by value. > Source/WTF/wtf/CircularQueue.h:66 > + void empty() Let's call this clear(). > Source/WTF/wtf/CircularQueue.h:73 > + int m_front { -1 }; > + int m_rear { -1 }; I think we usually call these m_head and m_tail. m_head is the first item inserted, m_tail the last. It's not good to accept any size_t as a buffer size and then use int as an index. That invites numeric overflow / underflow. One way you could solve this would be to accept only int as BufferSize, and continue to use negative values as control markers. That might be good enough -- though first class data structures prefer to accept size_t for buffer size. Another way to solve this would be to adopt a separate state value indicating empty vs somewhat full vs full, or just indicating full and inferring the other cases as necessary. > Source/WTF/wtf/CircularQueue.h:78 > +class SafeCircularQueue : public CircularQueue<T, BufferSize> { This class should go in its own file, named SafeCircularQueue. I would call this ThreadSafeFixedQueue. > Source/WTF/wtf/CircularQueue.h:98 > + CircularQueue<T, BufferSize>::empty(); No need to namespace this call. If you need template name resolution, you can use "this->empty()". > Source/WTF/wtf/CircularQueue.h:109 > + CircularQueue<T, BufferSize>::enQueue(value); No need to namespace this call. If you need template name resolution, you can use "this->empty()". > Source/WTF/wtf/CircularQueue.h:125 > + T& value = CircularQueue<T, BufferSize>::deQueue(); > + if (CircularQueue<T, BufferSize>::isEmpty()) No need to namespace this call. If you need template name resolution, you can use "this->empty()". > Source/WTF/wtf/CircularQueue.h:134 > + Semaphore m_full; Does m_full signal being totally full, or does it just indicate being non-empty? It seems to indicate being non-empty. > Source/WTF/wtf/Semaphore.h:34 > +class Semaphore { It looks like AtomicType<int> could provide the same functionality as this class if you added a "waitUntilGreaterThan" function -- or, better, if you added a general-purpose "waitUntil" function that accepted an arbitrary Functor predicate, to which it passed its current value. I think that would be better than adding a distinct type with specialized operations for increment and decrement.

Said Abou-Hallawa

Comment 28 2016-10-05 12:21:19 PDT

Created attachment 290736 [details] Patch

Said Abou-Hallawa

Comment 29 2016-10-05 12:33:12 PDT

Created attachment 290737 [details] Patch

Said Abou-Hallawa

Comment 30 2016-10-05 12:44:00 PDT

Created attachment 290742 [details] Patch

Said Abou-Hallawa

Comment 31 2016-10-05 14:29:20 PDT

Created attachment 290749 [details] Patch for EWS. Includes the patch of wk162478

Said Abou-Hallawa

Comment 32 2016-10-05 14:32:05 PDT

Created attachment 290750 [details] Patch for review

Said Abou-Hallawa

Comment 33 2016-10-06 17:36:42 PDT

(In reply to comment #27) > Comment on attachment 287822 [details] > Patch for review > > View in context: > https://bugs.webkit.org/attachment.cgi?id=287822&action=review > > I think the low-level details deserve another go-round here. > > Two other questions I asked Said: > > (1) Why is this patch a memory use regression on a benchmark that paints > lots of animated GIFs? > I did some logging in the image decoding code. 1. I created a stress test page which has ten different large animated images; each of them is repeated four times. 2. I logged a few metrics about the decodedFrames, drawnFrames and memoryFrames and the decodedSize. 3. I ran the test page with synchronous and asynchronous mode each for 200 seconds. 4. I watched the WebKit process in the ActivityMonitor on Mac. And here is what the logging gave me: decodedFrames drawnFrames memoryFrames decodedSize. Sync 21886 103748 55 36.61 MB Async 48407 342028 74 54.38 MB And here is what the ActivityMonitor gave me: CPU Max Memory Sync 100% 281 MB Async 220% 323 MB -- Asynchronous image decoding drew 300% more than what the synchronous image decoding did. -- Caching the frames are responsible of increasing the memory set by 18MB. -- The ActivityMonitor shows an increase of 42MB. The remaining 24MB should be an extra overhead in WebKit and CoreGraphics. -- A jetsam can happen earlier in the case of asynchronous image decoding. > (2) On that same benchmark, what is the minimum number of concurrent > animations for which this patch still shows a speedup? > I think this question should be answered after figuring out what we should do for the memory regression. Suppose the test case gives 20 fps with the synchronous mode and it gives 60 fps with asynchronous image decoding. Having the memory grows differently with the two modes will not give an accurate answer. If we manage to have the same memory level in both modes, then with the a synchronous mode I can keep adding images till I get the same fps I get with synchronous mode.

Said Abou-Hallawa

Comment 34 2016-10-11 19:49:40 PDT

Created attachment 291329 [details] Patch

Said Abou-Hallawa

Comment 35 2016-10-15 21:15:38 PDT

Created attachment 291734 [details] Patch

Said Abou-Hallawa

Comment 36 2016-10-15 22:04:44 PDT

Created attachment 291735 [details] Patch

Said Abou-Hallawa

Comment 37 2016-10-15 22:33:42 PDT

Created attachment 291736 [details] Patch

Said Abou-Hallawa

Comment 38 2016-10-16 00:42:32 PDT

Created attachment 291737 [details] Patch

Said Abou-Hallawa

Comment 39 2016-10-16 01:12:51 PDT

Created attachment 291738 [details] Patch

Geoffrey Garen

Comment 40 2016-10-17 11:12:13 PDT

> > (2) On that same benchmark, what is the minimum number of concurrent > > animations for which this patch still shows a speedup? > > > > I think this question should be answered after figuring out what we should > do for the memory regression. Suppose the test case gives 20 fps with the > synchronous mode and it gives 60 fps with asynchronous image decoding. > Having the memory grows differently with the two modes will not give an > accurate answer. If we manage to have the same memory level in both modes, > then with the a synchronous mode I can keep adding images till I get the > same fps I get with synchronous mode. Can we answer this question now?

Said Abou-Hallawa

Comment 41 2016-10-18 10:49:57 PDT

(In reply to comment #40) > > > (2) On that same benchmark, what is the minimum number of concurrent > > > animations for which this patch still shows a speedup? > > > > > > > I think this question should be answered after figuring out what we should > > do for the memory regression. Suppose the test case gives 20 fps with the > > synchronous mode and it gives 60 fps with asynchronous image decoding. > > Having the memory grows differently with the two modes will not give an > > accurate answer. If we manage to have the same memory level in both modes, > > then with the a synchronous mode I can keep adding images till I get the > > same fps I get with synchronous mode. > > Can we answer this question now? Are we okay with the memory regression which is could be encountered when asynchrnoulsy decoding an animated image frames? As I mentioned above both CG and WebKit are responsible of this memory regression.

Simon Fraser (smfr)

Comment 42 2016-10-18 12:31:56 PDT

Comment on attachment 291738 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=291738&action=review > Source/WebCore/ChangeLog:17 > + The goal is have the decoded image ready before it has to be drawn. So > + the image does not have to wait the current frame to be decoded. The This should be one sentence. > Source/WebCore/ChangeLog:21 > + trickiest part is when should decoding the frame start? If it is too early > + the decoded frames might be destroyed even before they are used by the > + MemoryCache which means decoding them was a waste of resources. And if it > + is too late, the image decoding will end up being synchronous. This describes the question, but not how you solved it. > Source/WebCore/platform/graphics/ImageFrame.h:85 > + enum class Decoding { Empty, BeingDecoded, Partial, Complete }; Can a frame be Partial and also BeingDecoded? I assume Partial means that we haven't received all the data from the network yet. > Source/WebCore/platform/graphics/ImageFrameCache.cpp:221 > + if (!m_decodingQueue) > + m_decodingQueue = WorkQueue::create("org.webkit.ImageDecoder", WorkQueue::Type::Serial, WorkQueue::QOS::UserInteractive); I think one work queue per image might be too heavyweight, but you need the behavior of a serial queue so I'm not sure how to get that with a shared queue. Does creating this queue show up on profiles? > Source/WebCore/platform/graphics/ImageFrameCache.cpp:223 > + ASSERT(m_decodingQueue); Why? > Source/WebCore/platform/graphics/ImageFrameCache.cpp:229 > + if (isAsyncDecodingStarted()) Maybe hasAsyncDecodingStarted(). However, this naming is ambiguous since it suggests it could go back to false. I'd just say hasDecodingQueue() > Source/WebCore/platform/graphics/ImageFrameCache.cpp:240 > + decodingQueue()->dispatch([this] { > + ImageFrameRequest frameRequest; > + > + while (m_frameRequestQueue.dequeue(frameRequest)) > + cacheFrameAtIndex(frameRequest.index, frameRequest.subsamplingLevel); > + > + // Signal the main thread that the decoding loop has been terminated. > + m_closeSemaphore.signal(); > + }); What keeps |this| alive while this queue is running? I'm not clear on whether, for a single ImageFrameCache, this loop is ever started more than once. Can the queue go empty, and then can someone call requestFrameAsyncDecodingAtIndex() and put more things in the queue? > Source/WebCore/platform/graphics/ImageFrameCache.cpp:254 > + if (frame.hasNativeImage() && !frame.hasInvalidNativeImage(subsamplingLevel)) Confusing logic. Can frame have hasValidNativeImage(subsamplingLevel) ? > Source/WebCore/platform/graphics/ImageFrameCache.cpp:269 > + // Block execution till m_closeSemaphore is signaled by the decoding WorkQueue > + m_closeSemaphore.wait(std::numeric_limits<double>::max()); This seems bad. I don't think we ever want to block the main thread when tearing down images; we should just allow the dispatch to get orphaned. > Source/WebCore/platform/graphics/ImageFrameCache.cpp:273 > +void ImageFrameCache::cacheFrameAtIndex(size_t index, SubsamplingLevel subsamplingLevel) This needs a comment saying that it can be called on a non-main thread. > Source/WebCore/platform/graphics/ImageFrameCache.cpp:279 > + if (!isAsyncDecodingStarted()) > + return; How is that possible? > Source/WebCore/platform/graphics/ImageFrameCache.cpp:282 > + ImageFrame& frame = m_frames[index]; Is there any chance that the main thread has changed m_frames.size() behind our back? > Source/WebCore/platform/graphics/ImageFrameCache.cpp:293 > + // Clear the current image frame and update the observer with this clearance. > + unsigned decodedSize = frame.clear(); > + decodedSizeDecreased(decodedSize); > + > + // Copy the new image to the cache. > + setFrameNativeImageAtIndex(nativeImage, index, subsamplingLevel); > + > + // Update the observer with the new image frame > + decodedSize = frame.frameBytes(); > + decodedSizeIncreased(decodedSize); I think this could live in a new function called replaceFrameAtIndex() or something. > Source/WebCore/platform/graphics/ImageFrameCache.h:142 > + BinarySemaphore m_closeSemaphore; What does "close" mean in this context? Isn't it more like m_decodingCompleteSemaphore? > Source/WebCore/platform/graphics/ImageSource.cpp:184 > + if (!m_frameCache.isAsyncDecodingStarted()) > + m_frameCache.startAsyncDecodingLoop(); startAsyncDecodingLoop() checks isAsyncDecodingStarted() so don't do it here. You could rename startAsyncDecodingLoop() to prepareForAsyncDecoding(), but why not just call that from requestFrameAsyncDecodingAtIndex()?

Geoffrey Garen

Comment 43 2016-10-18 13:49:37 PDT

> Are we okay with the memory regression which is could be encountered when > asynchrnoulsy decoding an animated image frames? As I mentioned above both > CG and WebKit are responsible of this memory regression. I don't know. Some information that can help us decide if we're OK with the increase in memory use: (1) What is the minimum number of concurrent animations for which this patch still shows a speedup? (2) In cases where this patch does not show a speedup, does it still use more memory? If so, how much more? If this patch is a big speedup in a lot of animated cases, and the only reason for increased memory use is that we do more decoding, which the slow case artificially skips by virtue of skipping frames, then we're probably OK with the increase in memory use, or able to work around it. On the other hand, if this patch is only a speedup on a contrived test case, or it increases memory use in all cases, even when it's not a speedup, then we probably need to make a change.

Said Abou-Hallawa

Comment 44 2016-10-18 16:10:57 PDT

Comment on attachment 291738 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=291738&action=review >> Source/WebCore/ChangeLog:17 >> + the image does not have to wait the current frame to be decoded. The > > This should be one sentence. Fixed. >> Source/WebCore/ChangeLog:21 >> + is too late, the image decoding will end up being synchronous. > > This describes the question, but not how you solved it. I did not solve it in this patch. This patch does not change the behavior in any way. All it does is just adding a new mode for painting an image. This question should be answered in the following bugs: https://bugs.webkit.org/show_bug.cgi?id=161566 https://bugs.webkit.org/show_bug.cgi?id=161705 >> Source/WebCore/platform/graphics/ImageFrame.h:85 >> + enum class Decoding { Empty, BeingDecoded, Partial, Complete }; > > Can a frame be Partial and also BeingDecoded? I assume Partial means that we haven't received all the data from the network yet. BeingDecoded means we request this frame to be decoded. After finishing decoding the frame, its decoding status might be Partial or Complete. The meaning of Partial is as you described: we haven't received all the data from the network yet. >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:221 >> + m_decodingQueue = WorkQueue::create("org.webkit.ImageDecoder", WorkQueue::Type::Serial, WorkQueue::QOS::UserInteractive); > > I think one work queue per image might be too heavyweight, but you need the behavior of a serial queue so I'm not sure how to get that with a shared queue. > > Does creating this queue show up on profiles? Creating a WorkQueue is very cheap and it happens only once per image. Although the queue itself is of type serial, having one queue per image allows parallelism. We have four possible combinations here: 1. One shared queue with Type::Serial: This will not allow parallelism and it will not scale well with many animated images in the page. 2. One shared queue with Type::Concurrent: This will allow parallelism but sharing the queue adds unwanted code complexity and the benefit is minimal. 3. Multiple queues each with Type::Serial: This is what is implemented here. It provides simpler interface and allows parallelism. 4. Multiple queues each with Type:: Concurrent: This should not be used for image decoding since decoding the image frames should happen in order. >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:223 >> + ASSERT(m_decodingQueue); > > Why? I wanted to make sure m_decodingQueue returns a valid pointer. Deleted. >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:229 >> + if (isAsyncDecodingStarted()) > > Maybe hasAsyncDecodingStarted(). However, this naming is ambiguous since it suggests it could go back to false. I'd just say hasDecodingQueue() Done. >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:240 >> + }); > > What keeps |this| alive while this queue is running? > > I'm not clear on whether, for a single ImageFrameCache, this loop is ever started more than once. Can the queue go empty, and then can someone call requestFrameAsyncDecodingAtIndex() and put more things in the queue? Once we create the queue and start the decoding loop, the decoding loop will stay alive till stopAsyncDecodingLoop() is called. stopAsyncDecodingLoop() itself will block execution till the decoding loop is terminated and then it deletes m_decodingQueue. This workflow has to be guaranteed by the caller. I added a destructor and in it I added an assertion that the decoding loop is not active. >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:254 >> + if (frame.hasNativeImage() && !frame.hasInvalidNativeImage(subsamplingLevel)) > > Confusing logic. Can frame have hasValidNativeImage(subsamplingLevel) ? hasInvalidNativeImage() returns true if the frame as image but this image was sampled with a different subsamplingLevel. In this case we need to redecode the frame but with the new subsamplingLevel. The condition is ugly because hasInvalidNativeImage() returns false if the frame does not have an image, I changed ImageFrame::hasInvalidNativeImage() to be ImageFrame::hasValidNativeImage() with reversing the logic. I also replaced all hasInvalidNativeImage() to be !hasValidNativeImage(). The above statement became if (frame.hasValidNativeImage(subsamplingLevel)) return; >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:269 >> + m_closeSemaphore.wait(std::numeric_limits<double>::max()); > > This seems bad. I don't think we ever want to block the main thread when tearing down images; we should just allow the dispatch to get orphaned. I do not think it is bad. This is the only way to prevent the image decoding loop from accessing a deleted pointer. Once cacheFrameAtIndex() returns, this semaphore will be signaled. Decoding an image should not be a lengthy operation. It should happens in few milli-seconds. Calling stopAsyncDecodingLoop() should only happen when the image renderers are deleted or the animation is finished (number of repetition is reached). >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:273 >> +void ImageFrameCache::cacheFrameAtIndex(size_t index, SubsamplingLevel subsamplingLevel) > > This needs a comment saying that it can be called on a non-main thread. I added ASSERT(!isMainThread()) at the top of this function to emphasize that this function should not be called on the main thread. >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:279 >> + return; > > How is that possible? Main thread: stopAsyncDecodingLoop() is called. Main thread: stopAsyncDecodingLoop() closes m_frameRequestQueue. Main thread: stopAsyncDecodingLoop() is blocked because of m_closeSemaphore. Decoding thread: cacheFrameAtIndex() is running until CG finishes decoding the frame. It returns to the decoding loop. Decoding thread: the decoding loop termites and signals m_closeSemaphore. Main thread: stopAsyncDecodingLoop() deletes the queue and returns. Main thread: The code below gets invoked to insert the frame in the cache. >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:282 >> + ImageFrame& frame = m_frames[index]; > > Is there any chance that the main thread has changed m_frames.size() behind our back? Changing the frames.size() can happen because the decoder changes the frameCount when it receives more data. But we always increase the size. See ImageFrameCache::growFrames(). >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:293 >> + decodedSizeIncreased(decodedSize); > > I think this could live in a new function called replaceFrameAtIndex() or something. Done. >> Source/WebCore/platform/graphics/ImageFrameCache.h:142 >> + BinarySemaphore m_closeSemaphore; > > What does "close" mean in this context? Isn't it more like m_decodingCompleteSemaphore? Done. >> Source/WebCore/platform/graphics/ImageSource.cpp:184 >> + m_frameCache.startAsyncDecodingLoop(); > > startAsyncDecodingLoop() checks isAsyncDecodingStarted() so don't do it here. You could rename startAsyncDecodingLoop() to prepareForAsyncDecoding(), but why not just call that from requestFrameAsyncDecodingAtIndex()? Done. This function is moved to the header file and it just calls m_frameCache.requestFrameAsyncDecodingAtIndex(index, subsamplingLevel).

Said Abou-Hallawa

Comment 45 2016-10-18 16:16:07 PDT

Created attachment 291997 [details] Patch

Simon Fraser (smfr)

Comment 46 2016-10-18 16:25:31 PDT

> >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:269 > >> + m_closeSemaphore.wait(std::numeric_limits<double>::max()); > > > > This seems bad. I don't think we ever want to block the main thread when tearing down images; we should just allow the dispatch to get orphaned. > > I do not think it is bad. This is the only way to prevent the image decoding > loop from accessing a deleted pointer. Once cacheFrameAtIndex() returns, > this semaphore will be signaled. > > Decoding an image should not be a lengthy operation. It should happens in > few milli-seconds. Calling stopAsyncDecodingLoop() should only happen when > the image renderers are deleted or the animation is finished (number of > repetition is reached). But the whole point of this patch is to prevent the UI thread from being blocked on image decoding. We will have failed if still have a code path that can potentially block on image decoding. We've just move the problem around. For example, we may now hit this wait() under GC code for an <img> that is being collected.

Said Abou-Hallawa

Comment 47 2016-10-19 17:11:35 PDT

Created attachment 292128 [details] Patch

Said Abou-Hallawa

Comment 48 2016-10-19 17:25:20 PDT

(In reply to comment #46) > > >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:269 > > >> + m_closeSemaphore.wait(std::numeric_limits<double>::max()); > > > > > > This seems bad. I don't think we ever want to block the main thread when tearing down images; we should just allow the dispatch to get orphaned. > > > > I do not think it is bad. This is the only way to prevent the image decoding > > loop from accessing a deleted pointer. Once cacheFrameAtIndex() returns, > > this semaphore will be signaled. > > > > Decoding an image should not be a lengthy operation. It should happens in > > few milli-seconds. Calling stopAsyncDecodingLoop() should only happen when > > the image renderers are deleted or the animation is finished (number of > > repetition is reached). > > But the whole point of this patch is to prevent the UI thread from being > blocked on image decoding. We will have failed if still have a code path > that can potentially block on image decoding. We've just move the problem > around. For example, we may now hit this wait() under GC code for an <img> > that is being collected. Okay this is what I did in the last patch: -- ImageFrameCache is now a RefCounted class. -- ImageSource has a member of type Ref<ImageFrameCache> to allow ref-counting it -- The decoding thread protects the ImageFrameCache and WorkQueue by add-ref 'this' and decodeQueue(). They will be deref-ed when exiting the decoding thread. -- Because the ImageSource::m_decoder is shared between ImageSource and ImageFrameCache, it is now protected from being set to nullptr if we started a decoding thread. This is because when doing asynchronous image decoding we own all the frames memory and no need to delete the ImageDecoder to get rid of all the frames.

Said Abou-Hallawa

Comment 49 2016-10-24 13:46:25 PDT

Regarding whether we should use a single concurrent queue or use multiple serial queues, I did the following experiment: I created a page with 11 different animated images with different sizes and different number of frames. I repeated each image 4 times. I measured the missing frames in both cases. By the missing frame I mean the following: the frame that is not ready for painting when its time fires. In other words I measured how many times, the decoding time was more than the frame duration time. I ran the experiment for each case three times; each time for five minutes. I got the total number of the missing frames for all the images. Run 1 Run 2 Run 3 Average Average (frames/11 images/4 repetition/5 minutes) frames/image/second Single concurrent queue: 13062 13957 13308 13442 1.02 Multiple serial queues: 15150 14457 15076 14894 1.13 I also traced the WebKit in the debugger and I found there are 11 threads created by the system in both cases. Since the missing frames are less in the case of the multiple serial queues than in the case of the single concurrent queue and since the threading overhead is the same in both cases, I do not think there is a need to use the single concurrent queue. Also the code is cleaner in the case of multiple serial queues.

Said Abou-Hallawa

Comment 50 2016-10-25 11:44:12 PDT

Regarding the memory regression of the asynchronous image decoding, I measured the peak memory allocated by WebKit in a period of time. The result was, on average, the peak memory of the asynchronous decoding is 3% more the peak of the synchronous decoding. Here are the details: 1. 11 animated images each repeated 4 times. The size of the images ranges from 2MB to 75MB. 2. I did five runs, each run lasts 5 minutes. During these 5 minutes I measured the average FPS and the maximum peak memory. Average Run 1 Run 2 Run 3 Run 4 Run 5 Synchronous Average FPS 30.7 31.1 30.77 30.66 30.57 30.40 Memory peak 508.09 513.13 512.64 506.26 496.56 511.86 Asynchronous Average FPS 53.288 54.45 52.80 52.47 53.07 53.65 Memory peak 523.444 545.25 519.27 507.21 515.27 530.22 The memory regression = (523.444 - 508.09) / 508.09 * 100 = 3.02%

Said Abou-Hallawa

Comment 51 2016-11-03 11:39:12 PDT

This patch is ready for review. -- For the memory regression in CG, the issue is in a system component that is being investigated separately. But we might have a simple workaround in Webkit to overcome this regression. -- For the single concurrent work queue versus the multiple serial queues, I think the cost, the overhead and the benefit in both cases are almost the same. I can to change the patch to use a single concurrent work queue if this will look better.

Geoffrey Garen

Comment 52 2016-11-03 11:56:44 PDT

I would still like an answer to this question: (1) What is the minimum number of concurrent animations for which this patch still shows a speedup?

Said Abou-Hallawa

Comment 53 2016-11-03 13:35:23 PDT

Here is the WK2 comparison between the frame rate of animated image using async versus sync image decoding with different number of images: # images Async Sync_no_catchup* improvement% Sync_catchup** improvement% 1 58.84 58.44 0.68 58.55 0.50 2 58.80 56.02 4.96 57.30 2.62 3 58.67 56.14 4.51 57 2.93 4 58.19 54.50 6.77 54.96 5.88 5 55.80 52.03 7.25 51.45 8.45 6 53.70 51.75 3.77 52.17 2.93 7 53.35 43.26 23.32 25.35 110.45 8 54.19 34.10 58.91 20.72 161.53 9 53.21 34.91 52.42 19.14 178.00 10 51.97 26.75 94.28 14.62 255.47 11 51.74 25.11 106.05 13.66 278.77 [*] A webkit build after removing the animation catchup code: <http://trac.webkit.org/changeset/207386>. [**] A webkit build before removing the animation catchup code: <http://trac.webkit.org/changeset/207386>. So it looks like with number of images greater than 6, the frame rate drops significantly. For WK1, the frame rate is not the only improvement but the responsiveness as well like editing and scrolling.

Simon Fraser (smfr)

Comment 54 2016-11-03 16:57:48 PDT

Comment on attachment 292128 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=292128&action=review > Source/WebCore/ChangeLog:18 > + Because we have to keep the allocated memory for an image under control, > + the BitmapImage may delete all or some of its decoded frames. MemoryCache > + may request it to do so. Or it will delete its frames if the total size > + exceeds some threshold (2MB for iOS, 5MB for other ports). This means large > + images or animated images with many frames may need to decode its frames > + all the times it's drawn. This is describing current behavior, so I don't think it needs to go into the changelog. > Source/WebCore/platform/graphics/ImageFrameCache.cpp:267 > + // We need to protect this and m_decodingQueue from being deleted while we are in the decoding loop. > + decodingQueue()->dispatch([this, protectedThis = Ref<ImageFrameCache>(*this), protectedQueue = decodingQueue()] { Normally we would make Ref<ImageFrameCache>(*this) a local variable, named "protectedThis". "this" also owns the queue, so I don't think you need to protect the queue too. > Source/WebCore/platform/graphics/ImageSource.h:68 > + bool isAsyncDecodingRequired() { return size().area() * sizeof(RGBA32) >= 100 * KB; } It's pretty subtle that you're hiding this important policy decision in an inline function. Make it not-inline. Does this enable async decoding right now? I don't see a setting being consulted.

Said Abou-Hallawa

Comment 55 2016-11-03 18:08:34 PDT

Created attachment 293839 [details] Patch

Said Abou-Hallawa

Comment 56 2016-11-03 18:12:51 PDT

Comment on attachment 292128 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=292128&action=review >> Source/WebCore/ChangeLog:18 >> + all the times it's drawn. > > This is describing current behavior, so I don't think it needs to go into the changelog. Deleted. >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:267 >> + decodingQueue()->dispatch([this, protectedThis = Ref<ImageFrameCache>(*this), protectedQueue = decodingQueue()] { > > Normally we would make Ref<ImageFrameCache>(*this) a local variable, named "protectedThis". "this" also owns the queue, so I don't think you need to protect the queue too. protectedThis is defined as a local variable. But I had to keep protectedQueue also because it can be deleted by stopAsyncDecodingQueue() which sets m_decodingQueue to nullptr. >> Source/WebCore/platform/graphics/ImageSource.h:68 >> + bool isAsyncDecodingRequired() { return size().area() * sizeof(RGBA32) >= 100 * KB; } > > It's pretty subtle that you're hiding this important policy decision in an inline function. Make it not-inline. Does this enable async decoding right now? I don't see a setting being consulted. This function is moved to the source file. No async decoding is enabled by this patch. The next patch https://bugs.webkit.org/show_bug.cgi?id=161566 will enable it for animated images.

WebKit Commit Bot

Comment 57 2016-11-03 20:00:48 PDT

Comment on attachment 293839 [details] Patch Clearing flags on attachment: 293839 Committed r208365: <http://trac.webkit.org/changeset/208365>

WebKit Commit Bot

Comment 58 2016-11-03 20:00:56 PDT

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Images

Assignee

Said Abou-Hallawa

Reported

2016-03-16 10:13 PDT

Modified

2016-11-03 20:00 PDT History

CC List

13 users Show

URL

Keywords

Depends on

155498 162478 163298

Blocks

155322 155566 161566

Dependencies

tree graph