Bug 155546 - Add the asynchronous image decoding mode
Summary: Add the asynchronous image decoding mode
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Images (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Said Abou-Hallawa
URL:
Keywords:
Depends on: 155498 162478 163298
Blocks: 155322 155566 161566
  Show dependency treegraph
 
Reported: 2016-03-16 10:13 PDT by Said Abou-Hallawa
Modified: 2016-11-03 20:00 PDT (History)
13 users (show)

See Also:


Attachments
Patch (182.96 KB, patch)
2016-03-16 10:18 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (183.05 KB, patch)
2016-03-16 10:28 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (174.78 KB, patch)
2016-03-16 10:46 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (178.26 KB, patch)
2016-03-16 11:56 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch for review (30.72 KB, patch)
2016-03-16 12:31 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (372.16 KB, patch)
2016-09-01 12:58 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (432.80 KB, patch)
2016-09-01 15:35 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (433.95 KB, patch)
2016-09-01 15:53 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews117 for mac-yosemite (1.50 MB, application/zip)
2016-09-01 17:02 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews125 for ios-simulator-elcapitan-wk2 (5.68 MB, application/zip)
2016-09-01 17:08 PDT, Build Bot
no flags Details
Patch (443.29 KB, patch)
2016-09-01 17:39 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (443.39 KB, patch)
2016-09-01 18:36 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (441.33 KB, patch)
2016-09-02 09:13 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (441.23 KB, patch)
2016-09-02 09:57 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (441.73 KB, patch)
2016-09-02 10:18 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (441.73 KB, patch)
2016-09-02 10:36 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews117 for mac-yosemite (1.64 MB, application/zip)
2016-09-02 11:36 PDT, Build Bot
no flags Details
Patch (441.57 KB, patch)
2016-09-02 12:12 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (441.35 KB, patch)
2016-09-02 13:14 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews114 for mac-yosemite (1.56 MB, application/zip)
2016-09-02 14:35 PDT, Build Bot
no flags Details
Patch for review (107.28 KB, patch)
2016-09-02 15:09 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (47.26 KB, patch)
2016-10-05 12:21 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (48.70 KB, patch)
2016-10-05 12:33 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (54.01 KB, patch)
2016-10-05 12:44 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch for EWS. Includes the patch of wk162478 (54.50 KB, patch)
2016-10-05 14:29 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch for review (29.26 KB, patch)
2016-10-05 14:32 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (15.56 KB, patch)
2016-10-11 19:49 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (16.33 KB, patch)
2016-10-15 21:15 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (16.40 KB, patch)
2016-10-15 22:04 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (18.19 KB, patch)
2016-10-15 22:33 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (18.54 KB, patch)
2016-10-16 00:42 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (18.54 KB, patch)
2016-10-16 01:12 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (22.00 KB, patch)
2016-10-18 16:16 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (31.67 KB, patch)
2016-10-19 17:11 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (31.56 KB, patch)
2016-11-03 18:08 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Said Abou-Hallawa 2016-03-16 10:13:39 PDT
In case of large images, the image decoding will start immediately after having the size of the image. A new class named ImageFrameDecoder which inherits form ImageDecoder will be added to manage the fork. In ImageFrameDecoder:: setData() and after knowing the size of the image is available, a heretic will be checked to decide whether to an asynchronous image decoding is needed or not.  If it is needed, a separate thread will be created to do the decoding such that the main thread is not blocked. In the main thread side, ImageSource::frameAtIndex() will check whether the ImageDecoder is in asynchronous decoding mode or not. If it is, no caching will be carried out on the main thread. ImageSource::frameAtIndex(), will return whatever is stored in the ImageFraeDataBuffer[index] regardless whether it is complete or not.
Comment 1 Said Abou-Hallawa 2016-03-16 10:18:28 PDT
Created attachment 274193 [details]
Patch
Comment 2 Said Abou-Hallawa 2016-03-16 10:28:22 PDT
Created attachment 274194 [details]
Patch
Comment 3 Said Abou-Hallawa 2016-03-16 10:46:40 PDT
Created attachment 274195 [details]
Patch
Comment 4 Said Abou-Hallawa 2016-03-16 11:56:44 PDT
Created attachment 274205 [details]
Patch
Comment 5 Said Abou-Hallawa 2016-03-16 12:31:11 PDT
Created attachment 274209 [details]
Patch for review
Comment 6 Said Abou-Hallawa 2016-09-01 12:58:19 PDT
Created attachment 287662 [details]
Patch
Comment 7 Said Abou-Hallawa 2016-09-01 15:35:45 PDT
Created attachment 287693 [details]
Patch
Comment 8 Said Abou-Hallawa 2016-09-01 15:53:55 PDT
Created attachment 287695 [details]
Patch
Comment 9 Build Bot 2016-09-01 17:02:42 PDT
Comment on attachment 287695 [details]
Patch

Attachment 287695 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/1990073

New failing tests:
http/tests/misc/slow-loading-animated-image.html
Comment 10 Build Bot 2016-09-01 17:02:48 PDT
Created attachment 287703 [details]
Archive of layout-test-results from ews117 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 11 Build Bot 2016-09-01 17:08:45 PDT
Comment on attachment 287695 [details]
Patch

Attachment 287695 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/1990089

New failing tests:
fast/images/animated-gif-restored-from-bfcache.html
Comment 12 Build Bot 2016-09-01 17:08:50 PDT
Created attachment 287704 [details]
Archive of layout-test-results from ews125 for ios-simulator-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-elcapitan-wk2  Platform: Mac OS X 10.11.5
Comment 13 Said Abou-Hallawa 2016-09-01 17:39:12 PDT
Created attachment 287707 [details]
Patch
Comment 14 Said Abou-Hallawa 2016-09-01 18:36:37 PDT
Created attachment 287712 [details]
Patch
Comment 15 Said Abou-Hallawa 2016-09-02 09:13:20 PDT
Created attachment 287763 [details]
Patch
Comment 16 Said Abou-Hallawa 2016-09-02 09:57:51 PDT
Created attachment 287768 [details]
Patch
Comment 17 Said Abou-Hallawa 2016-09-02 10:18:12 PDT
Created attachment 287771 [details]
Patch
Comment 18 Said Abou-Hallawa 2016-09-02 10:36:06 PDT
Created attachment 287775 [details]
Patch
Comment 19 Build Bot 2016-09-02 11:36:00 PDT
Comment on attachment 287775 [details]
Patch

Attachment 287775 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/1995098

New failing tests:
fast/dom/HTMLFormElement/document-deactivation-callback-crash.html
http/tests/misc/slow-loading-animated-image.html
Comment 20 Build Bot 2016-09-02 11:36:05 PDT
Created attachment 287789 [details]
Archive of layout-test-results from ews117 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 21 Said Abou-Hallawa 2016-09-02 12:12:03 PDT
Created attachment 287797 [details]
Patch
Comment 22 Said Abou-Hallawa 2016-09-02 13:14:27 PDT
Created attachment 287807 [details]
Patch
Comment 23 Build Bot 2016-09-02 14:35:14 PDT
Comment on attachment 287807 [details]
Patch

Attachment 287807 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/1996213

New failing tests:
fast/dom/HTMLElement/class-list-gc.html
Comment 24 Build Bot 2016-09-02 14:35:18 PDT
Created attachment 287816 [details]
Archive of layout-test-results from ews114 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 25 Said Abou-Hallawa 2016-09-02 15:08:54 PDT
The test fast/dom/HTMLElement/class-list-gc.html crashes on mac-debug. This test crashes once before on 9/1/2016: https://build.webkit.org/builders/Apple%20iOS%209%20Simulator%20Debug%20WK2%20(Tests)/builds/4910. Most likely the latest patch has nothing to do with this crash.
Comment 26 Said Abou-Hallawa 2016-09-02 15:09:57 PDT
Created attachment 287822 [details]
Patch for review
Comment 27 Geoffrey Garen 2016-09-12 14:45:30 PDT
Comment on attachment 287822 [details]
Patch for review

View in context: https://bugs.webkit.org/attachment.cgi?id=287822&action=review

I think the low-level details deserve another go-round here.

Two other questions I asked Said:

(1) Why is this patch a memory use regression on a benchmark that paints lots of animated GIFs?

(2) On that same benchmark, what is the minimum number of concurrent animations for which this patch still shows a speedup?

> Source/WTF/wtf/AtomicType.h:35
> +template<typename T>
> +class AtomicType {

std::atomic<T> already exists, and this class is way too expensive for clients that only want atomic types. So, I think we need a different class name here.

I would call this Broadcast<T> or Watchable<T>.

> Source/WTF/wtf/AtomicType.h:58
> +        m_condition.notifyOne();

This needs to be notifyAll(). Otherwise, If two threads call waitUntilEqual passing in the same value, one will deadlock.

> Source/WTF/wtf/CircularQueue.h:35
> +class CircularQueue {

I would call this FixedQueue.

> Source/WTF/wtf/CircularQueue.h:39
> +    CircularQueue()
> +    {
> +    }

This function should assert or static_assert that BufferSize is a power of two. Non-powers-of-two will be needlessly slow.

> Source/WTF/wtf/CircularQueue.h:44
> +    void enQueue(const T& value)

Should be enqueue.

> Source/WTF/wtf/CircularQueue.h:49
> +        else
> +            m_rear = (m_rear + 1) % BufferSize;

If m_rear wraps around and catches m_front, the behavior we want is to delete the oldest item and add a new item, right? But this code seems to delete the whole list and add a new item. That doesn't seem right.

> Source/WTF/wtf/CircularQueue.h:54
> +    T& deQueue()

Should be dequeue.

It's not safe for removal to return a reference; future operations may invalidate the reference. This function should return by value.

> Source/WTF/wtf/CircularQueue.h:66
> +    void empty()

Let's call this clear().

> Source/WTF/wtf/CircularQueue.h:73
> +    int m_front { -1 };
> +    int m_rear { -1 };

I think we usually call these m_head and m_tail. m_head is the first item inserted, m_tail the last.

It's not good to accept any size_t as a buffer size and then use int as an index. That invites numeric overflow / underflow.

One way you could solve this would be to accept only int as BufferSize, and continue to use negative values as control markers. That might be good enough -- though first class data structures prefer to accept size_t for buffer size.

Another way to solve this would be to adopt a separate state value indicating empty vs somewhat full vs full, or just indicating full and inferring the other cases as necessary.

> Source/WTF/wtf/CircularQueue.h:78
> +class SafeCircularQueue : public CircularQueue<T, BufferSize> {

This class should go in its own file, named SafeCircularQueue.

I would call this ThreadSafeFixedQueue.

> Source/WTF/wtf/CircularQueue.h:98
> +        CircularQueue<T, BufferSize>::empty();

No need to namespace this call. If you need template name resolution, you can use "this->empty()".

> Source/WTF/wtf/CircularQueue.h:109
> +            CircularQueue<T, BufferSize>::enQueue(value);

No need to namespace this call. If you need template name resolution, you can use "this->empty()".

> Source/WTF/wtf/CircularQueue.h:125
> +            T& value = CircularQueue<T, BufferSize>::deQueue();
> +            if (CircularQueue<T, BufferSize>::isEmpty())

No need to namespace this call. If you need template name resolution, you can use "this->empty()".

> Source/WTF/wtf/CircularQueue.h:134
> +    Semaphore m_full;

Does m_full signal being totally full, or does it just indicate being non-empty? It seems to indicate being non-empty.

> Source/WTF/wtf/Semaphore.h:34
> +class Semaphore {

It looks like AtomicType<int> could provide the same functionality as this class if you added a "waitUntilGreaterThan" function -- or, better, if you added a general-purpose "waitUntil" function that accepted an arbitrary Functor predicate, to which it passed its current value. I think that would be better than adding a distinct type with specialized operations for increment and decrement.
Comment 28 Said Abou-Hallawa 2016-10-05 12:21:19 PDT
Created attachment 290736 [details]
Patch
Comment 29 Said Abou-Hallawa 2016-10-05 12:33:12 PDT
Created attachment 290737 [details]
Patch
Comment 30 Said Abou-Hallawa 2016-10-05 12:44:00 PDT
Created attachment 290742 [details]
Patch
Comment 31 Said Abou-Hallawa 2016-10-05 14:29:20 PDT
Created attachment 290749 [details]
Patch for EWS. Includes the patch of wk162478
Comment 32 Said Abou-Hallawa 2016-10-05 14:32:05 PDT
Created attachment 290750 [details]
Patch for review
Comment 33 Said Abou-Hallawa 2016-10-06 17:36:42 PDT
(In reply to comment #27)
> Comment on attachment 287822 [details]
> Patch for review
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=287822&action=review
> 
> I think the low-level details deserve another go-round here.
> 
> Two other questions I asked Said:
> 
> (1) Why is this patch a memory use regression on a benchmark that paints
> lots of animated GIFs?
> 

I did some logging in the image decoding code.

1. I created a stress test page which has ten different large animated images; each of them is repeated four times.
2. I logged a few metrics about the decodedFrames, drawnFrames and memoryFrames and the decodedSize.
3. I ran the test page with synchronous and asynchronous mode each for 200 seconds.
4. I watched the WebKit process in the ActivityMonitor on Mac.

And here is what the logging gave me:

       decodedFrames     drawnFrames      memoryFrames       decodedSize.            
Sync     21886            103748             55               36.61 MB
Async    48407            342028             74               54.38 MB

And here is what the ActivityMonitor gave me:

         CPU               Max Memory
Sync     100%               281 MB
Async    220%               323 MB

-- Asynchronous image decoding drew 300% more than what the synchronous image decoding did.
-- Caching the frames are responsible of increasing the memory set by 18MB.
-- The ActivityMonitor shows an increase of 42MB. The remaining 24MB should be an extra overhead in WebKit and CoreGraphics.
-- A jetsam can happen earlier in the case of asynchronous image decoding.

> (2) On that same benchmark, what is the minimum number of concurrent
> animations for which this patch still shows a speedup?
> 

I think this question should be answered after figuring out what we should do for the memory regression. Suppose the test case gives 20 fps with the  synchronous mode and it gives 60 fps with asynchronous image decoding. Having the memory grows differently with the two modes will not give an accurate answer. If we manage to have the same memory level in both modes, then with the a synchronous  mode I can keep adding images till I get the same fps I get with synchronous mode.
Comment 34 Said Abou-Hallawa 2016-10-11 19:49:40 PDT
Created attachment 291329 [details]
Patch
Comment 35 Said Abou-Hallawa 2016-10-15 21:15:38 PDT
Created attachment 291734 [details]
Patch
Comment 36 Said Abou-Hallawa 2016-10-15 22:04:44 PDT
Created attachment 291735 [details]
Patch
Comment 37 Said Abou-Hallawa 2016-10-15 22:33:42 PDT
Created attachment 291736 [details]
Patch
Comment 38 Said Abou-Hallawa 2016-10-16 00:42:32 PDT
Created attachment 291737 [details]
Patch
Comment 39 Said Abou-Hallawa 2016-10-16 01:12:51 PDT
Created attachment 291738 [details]
Patch
Comment 40 Geoffrey Garen 2016-10-17 11:12:13 PDT
> > (2) On that same benchmark, what is the minimum number of concurrent
> > animations for which this patch still shows a speedup?
> > 
> 
> I think this question should be answered after figuring out what we should
> do for the memory regression. Suppose the test case gives 20 fps with the 
> synchronous mode and it gives 60 fps with asynchronous image decoding.
> Having the memory grows differently with the two modes will not give an
> accurate answer. If we manage to have the same memory level in both modes,
> then with the a synchronous  mode I can keep adding images till I get the
> same fps I get with synchronous mode.

Can we answer this question now?
Comment 41 Said Abou-Hallawa 2016-10-18 10:49:57 PDT
(In reply to comment #40)
> > > (2) On that same benchmark, what is the minimum number of concurrent
> > > animations for which this patch still shows a speedup?
> > > 
> > 
> > I think this question should be answered after figuring out what we should
> > do for the memory regression. Suppose the test case gives 20 fps with the 
> > synchronous mode and it gives 60 fps with asynchronous image decoding.
> > Having the memory grows differently with the two modes will not give an
> > accurate answer. If we manage to have the same memory level in both modes,
> > then with the a synchronous  mode I can keep adding images till I get the
> > same fps I get with synchronous mode.
> 
> Can we answer this question now?

Are we okay with the memory regression which is could be encountered when asynchrnoulsy decoding an animated image frames? As I mentioned above both CG and WebKit are responsible of this memory regression.
Comment 42 Simon Fraser (smfr) 2016-10-18 12:31:56 PDT
Comment on attachment 291738 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=291738&action=review

> Source/WebCore/ChangeLog:17
> +        The goal is have the decoded image ready before it has to be drawn. So 
> +        the image does not have to wait the current frame to be decoded. The

This should be one sentence.

> Source/WebCore/ChangeLog:21
> +        trickiest part is when should decoding the frame start? If it is too early
> +        the decoded frames might be destroyed even before they are used by the
> +        MemoryCache which means decoding them was a waste of resources. And if it
> +        is too late, the image decoding will end up being synchronous.

This describes the question, but not how you solved it.

> Source/WebCore/platform/graphics/ImageFrame.h:85
> +    enum class Decoding { Empty, BeingDecoded, Partial, Complete };

Can a frame be Partial and also BeingDecoded? I assume Partial means that we haven't received all the data from the network yet.

> Source/WebCore/platform/graphics/ImageFrameCache.cpp:221
> +    if (!m_decodingQueue)
> +        m_decodingQueue = WorkQueue::create("org.webkit.ImageDecoder", WorkQueue::Type::Serial, WorkQueue::QOS::UserInteractive);

I think one work queue per image might be too heavyweight, but you need the behavior of a serial queue so I'm not sure how to get that with a shared queue.

Does creating this queue show up on profiles?

> Source/WebCore/platform/graphics/ImageFrameCache.cpp:223
> +    ASSERT(m_decodingQueue);

Why?

> Source/WebCore/platform/graphics/ImageFrameCache.cpp:229
> +    if (isAsyncDecodingStarted())

Maybe hasAsyncDecodingStarted(). However, this naming is ambiguous since it suggests it could go back to false. I'd just say hasDecodingQueue()

> Source/WebCore/platform/graphics/ImageFrameCache.cpp:240
> +    decodingQueue()->dispatch([this] {
> +        ImageFrameRequest frameRequest;
> +        
> +        while (m_frameRequestQueue.dequeue(frameRequest))
> +            cacheFrameAtIndex(frameRequest.index, frameRequest.subsamplingLevel);
> +        
> +        // Signal the main thread that the decoding loop has been terminated.
> +        m_closeSemaphore.signal();
> +    });

What keeps |this| alive while this queue is running?

I'm not clear on whether, for a single ImageFrameCache, this loop is ever started more than once. Can the queue go empty, and then can someone call requestFrameAsyncDecodingAtIndex() and put more things in the queue?

> Source/WebCore/platform/graphics/ImageFrameCache.cpp:254
> +    if (frame.hasNativeImage() && !frame.hasInvalidNativeImage(subsamplingLevel))

Confusing logic. Can frame have hasValidNativeImage(subsamplingLevel) ?

> Source/WebCore/platform/graphics/ImageFrameCache.cpp:269
> +    // Block execution till m_closeSemaphore is signaled by the decoding WorkQueue
> +    m_closeSemaphore.wait(std::numeric_limits<double>::max());

This seems bad. I don't think we ever want to block the main thread when tearing down images; we should just allow the dispatch to get orphaned.

> Source/WebCore/platform/graphics/ImageFrameCache.cpp:273
> +void ImageFrameCache::cacheFrameAtIndex(size_t index, SubsamplingLevel subsamplingLevel)

This needs a comment saying that it can be called on a non-main thread.

> Source/WebCore/platform/graphics/ImageFrameCache.cpp:279
> +        if (!isAsyncDecodingStarted())
> +            return;

How is that possible?

> Source/WebCore/platform/graphics/ImageFrameCache.cpp:282
> +        ImageFrame& frame = m_frames[index];

Is there any chance that the main thread has changed m_frames.size() behind our back?

> Source/WebCore/platform/graphics/ImageFrameCache.cpp:293
> +        // Clear the current image frame and update the observer with this clearance.
> +        unsigned decodedSize = frame.clear();
> +        decodedSizeDecreased(decodedSize);
> +        
> +        // Copy the new image to the cache.
> +        setFrameNativeImageAtIndex(nativeImage, index, subsamplingLevel);
> +        
> +        // Update the observer with the new image frame
> +        decodedSize = frame.frameBytes();
> +        decodedSizeIncreased(decodedSize);

I think this could live in a new function called replaceFrameAtIndex() or something.

> Source/WebCore/platform/graphics/ImageFrameCache.h:142
> +    BinarySemaphore m_closeSemaphore;

What does "close" mean in this context? Isn't it more like m_decodingCompleteSemaphore?

> Source/WebCore/platform/graphics/ImageSource.cpp:184
> +    if (!m_frameCache.isAsyncDecodingStarted())
> +        m_frameCache.startAsyncDecodingLoop();

startAsyncDecodingLoop() checks isAsyncDecodingStarted() so don't do it here. You could rename startAsyncDecodingLoop() to prepareForAsyncDecoding(), but why not just call that from requestFrameAsyncDecodingAtIndex()?
Comment 43 Geoffrey Garen 2016-10-18 13:49:37 PDT
> Are we okay with the memory regression which is could be encountered when
> asynchrnoulsy decoding an animated image frames? As I mentioned above both
> CG and WebKit are responsible of this memory regression.

I don't know.

Some information that can help us decide if we're OK with the increase in memory use:

(1) What is the minimum number of concurrent animations for which this patch still shows a speedup?

(2) In cases where this patch does not show a speedup, does it still use more memory? If so, how much more?

If this patch is a big speedup in a lot of animated cases, and the only reason for increased memory use is that we do more decoding, which the slow case artificially skips by virtue of skipping frames, then we're probably OK with the increase in memory use, or able to work around it.

On the other hand, if this patch is only a speedup on a contrived test case, or it increases memory use in all cases, even when it's not a speedup, then we probably need to make a change.
Comment 44 Said Abou-Hallawa 2016-10-18 16:10:57 PDT
Comment on attachment 291738 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=291738&action=review

>> Source/WebCore/ChangeLog:17
>> +        the image does not have to wait the current frame to be decoded. The
> 
> This should be one sentence.

Fixed.

>> Source/WebCore/ChangeLog:21
>> +        is too late, the image decoding will end up being synchronous.
> 
> This describes the question, but not how you solved it.

I did not solve it in this patch. This patch does not change the behavior in any way. All it does is just adding a new mode for painting an image. This question should be answered in the following bugs:

https://bugs.webkit.org/show_bug.cgi?id=161566
https://bugs.webkit.org/show_bug.cgi?id=161705

>> Source/WebCore/platform/graphics/ImageFrame.h:85
>> +    enum class Decoding { Empty, BeingDecoded, Partial, Complete };
> 
> Can a frame be Partial and also BeingDecoded? I assume Partial means that we haven't received all the data from the network yet.

BeingDecoded means we request this frame to be decoded. After finishing decoding the frame, its decoding status might be Partial or Complete. The meaning of Partial is as you described: we haven't received all the data from the network yet.

>> Source/WebCore/platform/graphics/ImageFrameCache.cpp:221
>> +        m_decodingQueue = WorkQueue::create("org.webkit.ImageDecoder", WorkQueue::Type::Serial, WorkQueue::QOS::UserInteractive);
> 
> I think one work queue per image might be too heavyweight, but you need the behavior of a serial queue so I'm not sure how to get that with a shared queue.
> 
> Does creating this queue show up on profiles?

Creating a WorkQueue is very cheap and it happens only once per image. Although the queue itself is of type serial, having one queue per image allows parallelism. We have four possible combinations here:

1. One shared queue with Type::Serial: This will not allow parallelism and it will not scale well with many animated images in the page.
2. One shared queue with Type::Concurrent: This will allow parallelism but sharing the queue adds unwanted code complexity and the benefit is minimal.
3. Multiple queues each with Type::Serial: This is what is implemented here. It provides simpler interface and allows parallelism.
4. Multiple queues each with Type:: Concurrent: This should not be used for image decoding since decoding the image frames should happen in order.

>> Source/WebCore/platform/graphics/ImageFrameCache.cpp:223
>> +    ASSERT(m_decodingQueue);
> 
> Why?

I wanted to make sure m_decodingQueue returns a valid pointer.

Deleted.

>> Source/WebCore/platform/graphics/ImageFrameCache.cpp:229
>> +    if (isAsyncDecodingStarted())
> 
> Maybe hasAsyncDecodingStarted(). However, this naming is ambiguous since it suggests it could go back to false. I'd just say hasDecodingQueue()

Done.

>> Source/WebCore/platform/graphics/ImageFrameCache.cpp:240
>> +    });
> 
> What keeps |this| alive while this queue is running?
> 
> I'm not clear on whether, for a single ImageFrameCache, this loop is ever started more than once. Can the queue go empty, and then can someone call requestFrameAsyncDecodingAtIndex() and put more things in the queue?

Once we create the queue and start the decoding loop, the decoding loop will stay alive till stopAsyncDecodingLoop() is called. stopAsyncDecodingLoop() itself will block execution till the decoding loop is terminated and then it deletes m_decodingQueue. This workflow has to be guaranteed by the caller.

I added a destructor and in it I added an assertion that the decoding loop is not active.

>> Source/WebCore/platform/graphics/ImageFrameCache.cpp:254
>> +    if (frame.hasNativeImage() && !frame.hasInvalidNativeImage(subsamplingLevel))
> 
> Confusing logic. Can frame have hasValidNativeImage(subsamplingLevel) ?

hasInvalidNativeImage() returns true if the frame as image but this image was sampled with a different subsamplingLevel. In this case we need to redecode the frame but with the new subsamplingLevel. The condition is ugly because hasInvalidNativeImage() returns false if the frame does not have an image,

I changed ImageFrame::hasInvalidNativeImage() to be ImageFrame::hasValidNativeImage() with reversing the logic. I also replaced all hasInvalidNativeImage()  to be !hasValidNativeImage(). The above statement became 

    if (frame.hasValidNativeImage(subsamplingLevel))
        return;

>> Source/WebCore/platform/graphics/ImageFrameCache.cpp:269
>> +    m_closeSemaphore.wait(std::numeric_limits<double>::max());
> 
> This seems bad. I don't think we ever want to block the main thread when tearing down images; we should just allow the dispatch to get orphaned.

I do not think it is bad. This is the only way to prevent the image decoding loop from accessing a deleted pointer. Once cacheFrameAtIndex() returns, this semaphore will be signaled.

Decoding an image should not be a lengthy operation. It should happens in few milli-seconds. Calling stopAsyncDecodingLoop() should only happen when the image renderers are deleted or the animation is finished (number of repetition is reached).

>> Source/WebCore/platform/graphics/ImageFrameCache.cpp:273
>> +void ImageFrameCache::cacheFrameAtIndex(size_t index, SubsamplingLevel subsamplingLevel)
> 
> This needs a comment saying that it can be called on a non-main thread.

I added ASSERT(!isMainThread()) at the top of this function to emphasize that this function should not be called on the main thread.

>> Source/WebCore/platform/graphics/ImageFrameCache.cpp:279
>> +            return;
> 
> How is that possible?

Main thread: stopAsyncDecodingLoop() is called.
Main thread: stopAsyncDecodingLoop() closes m_frameRequestQueue.
Main thread: stopAsyncDecodingLoop() is blocked because of m_closeSemaphore. 

Decoding thread: cacheFrameAtIndex() is running until CG finishes decoding the frame. It returns to the decoding loop.
Decoding thread: the decoding loop termites and signals m_closeSemaphore.

Main thread: stopAsyncDecodingLoop() deletes the queue and returns.
Main thread: The code below gets invoked to insert the frame in the cache.

>> Source/WebCore/platform/graphics/ImageFrameCache.cpp:282
>> +        ImageFrame& frame = m_frames[index];
> 
> Is there any chance that the main thread has changed m_frames.size() behind our back?

Changing the frames.size() can happen because the decoder changes the frameCount when it receives more data. But we always increase the size. See ImageFrameCache::growFrames().

>> Source/WebCore/platform/graphics/ImageFrameCache.cpp:293
>> +        decodedSizeIncreased(decodedSize);
> 
> I think this could live in a new function called replaceFrameAtIndex() or something.

Done.

>> Source/WebCore/platform/graphics/ImageFrameCache.h:142
>> +    BinarySemaphore m_closeSemaphore;
> 
> What does "close" mean in this context? Isn't it more like m_decodingCompleteSemaphore?

Done.

>> Source/WebCore/platform/graphics/ImageSource.cpp:184
>> +        m_frameCache.startAsyncDecodingLoop();
> 
> startAsyncDecodingLoop() checks isAsyncDecodingStarted() so don't do it here. You could rename startAsyncDecodingLoop() to prepareForAsyncDecoding(), but why not just call that from requestFrameAsyncDecodingAtIndex()?

Done. This function is moved to the header file and it just calls m_frameCache.requestFrameAsyncDecodingAtIndex(index, subsamplingLevel).
Comment 45 Said Abou-Hallawa 2016-10-18 16:16:07 PDT
Created attachment 291997 [details]
Patch
Comment 46 Simon Fraser (smfr) 2016-10-18 16:25:31 PDT
> >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:269
> >> +    m_closeSemaphore.wait(std::numeric_limits<double>::max());
> > 
> > This seems bad. I don't think we ever want to block the main thread when tearing down images; we should just allow the dispatch to get orphaned.
> 
> I do not think it is bad. This is the only way to prevent the image decoding
> loop from accessing a deleted pointer. Once cacheFrameAtIndex() returns,
> this semaphore will be signaled.
> 
> Decoding an image should not be a lengthy operation. It should happens in
> few milli-seconds. Calling stopAsyncDecodingLoop() should only happen when
> the image renderers are deleted or the animation is finished (number of
> repetition is reached).

But the whole point of this patch is to prevent the UI thread from being blocked on image decoding. We will have failed if still have a code path that can potentially block on image decoding. We've just move the problem around. For example, we may now hit this wait() under GC code for an <img> that is being collected.
Comment 47 Said Abou-Hallawa 2016-10-19 17:11:35 PDT
Created attachment 292128 [details]
Patch
Comment 48 Said Abou-Hallawa 2016-10-19 17:25:20 PDT
(In reply to comment #46)
> > >> Source/WebCore/platform/graphics/ImageFrameCache.cpp:269
> > >> +    m_closeSemaphore.wait(std::numeric_limits<double>::max());
> > > 
> > > This seems bad. I don't think we ever want to block the main thread when tearing down images; we should just allow the dispatch to get orphaned.
> > 
> > I do not think it is bad. This is the only way to prevent the image decoding
> > loop from accessing a deleted pointer. Once cacheFrameAtIndex() returns,
> > this semaphore will be signaled.
> > 
> > Decoding an image should not be a lengthy operation. It should happens in
> > few milli-seconds. Calling stopAsyncDecodingLoop() should only happen when
> > the image renderers are deleted or the animation is finished (number of
> > repetition is reached).
> 
> But the whole point of this patch is to prevent the UI thread from being
> blocked on image decoding. We will have failed if still have a code path
> that can potentially block on image decoding. We've just move the problem
> around. For example, we may now hit this wait() under GC code for an <img>
> that is being collected.


Okay this is what I did in the last patch:

-- ImageFrameCache is now a RefCounted class.
-- ImageSource has a member of type Ref<ImageFrameCache> to allow ref-counting it
-- The decoding thread protects the ImageFrameCache and WorkQueue by add-ref 'this' and decodeQueue(). They will be deref-ed when exiting the decoding thread.
-- Because the ImageSource::m_decoder is shared between ImageSource and ImageFrameCache, it is now protected from being set to nullptr if we started a decoding thread. This is because when doing asynchronous image decoding we own all the frames memory and no need to delete the ImageDecoder to get rid of all the frames.
Comment 49 Said Abou-Hallawa 2016-10-24 13:46:25 PDT
Regarding whether we should use a single concurrent queue or use multiple serial queues, I did the following experiment:

I created a page with 11 different animated images with different sizes and different number of frames. I repeated each image 4 times. I measured the missing frames in both cases. By the missing frame I mean the following: the frame that is not ready for painting when its time fires. In other words I measured how many times, the decoding time was more than the frame duration time. I ran the experiment for each case three times; each time for five minutes. I got the total number of the missing frames for all the images.


				Run 1	Run 2	Run 3	Average						Average 
							(frames/11 images/4 repetition/5 minutes) 	frames/image/second

Single concurrent queue:	13062	13957	13308	13442						1.02
Multiple serial queues:		15150	14457	15076	14894						1.13

I also traced the WebKit in the debugger and I found there are 11 threads created by the system in both cases.

Since the missing frames are less in the case of the multiple serial queues than in the case of the single concurrent queue and since the threading overhead is the same in both cases, I do not think there is a need to use the single concurrent queue. Also the code is cleaner in the case of multiple serial queues.
Comment 50 Said Abou-Hallawa 2016-10-25 11:44:12 PDT
Regarding the memory regression of the asynchronous image decoding, I measured the peak memory allocated by WebKit in a period of time. The result was, on average, the peak memory of the asynchronous decoding is 3% more the peak of the synchronous decoding. 

Here are the details:

1. 11 animated images each repeated 4 times. The size of the images ranges from 2MB to 75MB.

2. I did five runs, each run lasts 5 minutes. During these 5 minutes I measured the average FPS and the maximum peak memory.

	        Average		Run 1	Run 2	Run 3	Run 4	Run 5
Synchronous							
Average FPS	30.7		31.1	30.77	30.66	30.57	30.40
Memory peak	508.09		513.13	512.64	506.26	496.56	511.86
							
Asynchronous							
Average FPS	53.288		54.45	52.80	52.47	53.07	53.65
Memory peak	523.444		545.25	519.27	507.21	515.27	530.22

The memory regression = (523.444 - 508.09) / 508.09 * 100 = 3.02%
Comment 51 Said Abou-Hallawa 2016-11-03 11:39:12 PDT
This patch is ready for review.

-- For the memory regression in CG, the issue is in a system component that is being investigated separately. But we might have a simple workaround in Webkit to overcome this regression.

-- For the single concurrent work queue versus the multiple serial queues, I think the cost, the overhead and the benefit in both cases are almost the same. I can to change the patch to use a single concurrent work queue if this will look better.
Comment 52 Geoffrey Garen 2016-11-03 11:56:44 PDT
I would still like an answer to this question:

(1) What is the minimum number of concurrent animations for which this patch still shows a speedup?
Comment 53 Said Abou-Hallawa 2016-11-03 13:35:23 PDT
Here is the WK2 comparison between the frame rate of animated image using async versus sync image decoding with different number of images:

# images	Async		Sync_no_catchup*	improvement%	Sync_catchup**	improvement%
1		58.84		58.44			0.68		58.55		0.50
2		58.80		56.02			4.96		57.30		2.62
3		58.67		56.14			4.51		57		2.93
4		58.19		54.50			6.77		54.96		5.88
5		55.80		52.03			7.25		51.45		8.45
6		53.70		51.75			3.77		52.17		2.93
7		53.35		43.26			23.32		25.35		110.45
8		54.19		34.10			58.91		20.72		161.53
9		53.21		34.91			52.42		19.14		178.00
10		51.97		26.75			94.28		14.62		255.47
11		51.74		25.11			106.05		13.66		278.77

[*] A webkit build after removing the animation catchup code: <http://trac.webkit.org/changeset/207386>.
[**] A webkit build before removing the animation catchup code: <http://trac.webkit.org/changeset/207386>.

So it looks like with number of images greater than 6, the frame rate drops significantly. For WK1, the frame rate is not the only improvement but the responsiveness as well like editing and scrolling.
Comment 54 Simon Fraser (smfr) 2016-11-03 16:57:48 PDT
Comment on attachment 292128 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=292128&action=review

> Source/WebCore/ChangeLog:18
> +        Because we have to keep the allocated memory for an image under control,
> +        the BitmapImage may delete all or some of its decoded frames. MemoryCache
> +        may request it to do so. Or it will delete its frames if the total size
> +        exceeds some threshold (2MB for iOS, 5MB for other ports). This means large
> +        images or animated images with many frames may need to decode its frames
> +        all the times it's drawn.

This is describing current behavior, so I don't think it needs to go into the changelog.

> Source/WebCore/platform/graphics/ImageFrameCache.cpp:267
> +    // We need to protect this and m_decodingQueue from being deleted while we are in the decoding loop.
> +    decodingQueue()->dispatch([this, protectedThis = Ref<ImageFrameCache>(*this), protectedQueue = decodingQueue()] {

Normally we would make Ref<ImageFrameCache>(*this) a local variable, named "protectedThis". "this" also owns the queue, so I don't think you need to protect the queue too.

> Source/WebCore/platform/graphics/ImageSource.h:68
> +    bool isAsyncDecodingRequired() { return size().area() * sizeof(RGBA32) >= 100 * KB; }

It's pretty subtle that you're hiding this important policy decision in an inline function. Make it not-inline. Does this enable async decoding right now? I don't see a setting being consulted.
Comment 55 Said Abou-Hallawa 2016-11-03 18:08:34 PDT
Created attachment 293839 [details]
Patch
Comment 56 Said Abou-Hallawa 2016-11-03 18:12:51 PDT
Comment on attachment 292128 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=292128&action=review

>> Source/WebCore/ChangeLog:18
>> +        all the times it's drawn.
> 
> This is describing current behavior, so I don't think it needs to go into the changelog.

Deleted.

>> Source/WebCore/platform/graphics/ImageFrameCache.cpp:267
>> +    decodingQueue()->dispatch([this, protectedThis = Ref<ImageFrameCache>(*this), protectedQueue = decodingQueue()] {
> 
> Normally we would make Ref<ImageFrameCache>(*this) a local variable, named "protectedThis". "this" also owns the queue, so I don't think you need to protect the queue too.

protectedThis is defined as a local variable. But I had to keep protectedQueue also because it can be deleted by stopAsyncDecodingQueue() which sets m_decodingQueue to nullptr.

>> Source/WebCore/platform/graphics/ImageSource.h:68
>> +    bool isAsyncDecodingRequired() { return size().area() * sizeof(RGBA32) >= 100 * KB; }
> 
> It's pretty subtle that you're hiding this important policy decision in an inline function. Make it not-inline. Does this enable async decoding right now? I don't see a setting being consulted.

This function is moved to the source file. No async decoding is enabled by this patch. The next patch https://bugs.webkit.org/show_bug.cgi?id=161566 will enable it for animated images.
Comment 57 WebKit Commit Bot 2016-11-03 20:00:48 PDT
Comment on attachment 293839 [details]
Patch

Clearing flags on attachment: 293839

Committed r208365: <http://trac.webkit.org/changeset/208365>
Comment 58 WebKit Commit Bot 2016-11-03 20:00:56 PDT
All reviewed patches have been landed.  Closing bug.