Bug 175890

Summary: Image data should be coalesced if it comes in small chunks before updating the ImageSource
Product: WebKit
Reporter: Said Abou-Hallawa <sabouhallawa>
Component: Images
Assignee: Said Abou-Hallawa <sabouhallawa>
Status: RESOLVED FIXED
Severity: Normal
Priority: P2
Keywords: InRadar
CC: buildbot, cdumez, commit-queue, darin, dbates, dino, esprehn+autocc, gyuyoung.kim, japhet, simon.fraser, thorton, webkit-bug-importer, youennf
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
See Also: https://bugs.webkit.org/show_bug.cgi?id=178020
Attachments: Patch (5 revisions, flags: none)

Description Said Abou-Hallawa 2017-08-23 10:40:23 PDT
1. Open https://www.nasa.gov/sites/default/files/thumbnails/image/pia21778.png

Results: On a fast machine with a fast network, the page finishes loading after seven seconds. On a slow network, the page can take five minutes to finish loading.

NOTE: The size of this image is 45MB. When loading it, the server sends it in small chunks, each less than 1MB. The networking process receives all the data within a normal duration; for example, on a fast network the whole image data is received in less than a second. The problem is having to repeat the following steps for each chunk of data:

-- The new data chunk is added to a SharedBuffer.
-- CachedImage sets its BitmapImage with the new data, which asks the SharedBuffer to create a CFData. Creating a CFData requires combining all the segments of the SharedBuffer into one single buffer.
-- BitmapImage sets the new data to its ImageSource which passes it to its ImageDecoder.
-- CachedImage notifies its clients that a new data chunk has arrived. The image renderer, which is one of the CachedImage's clients, will repaint itself. This will cause the image to be drawn, which will request the ImageDecoder to decode the image frame.

It turns out running these steps is extremely expensive when they have to be repeated many times because a large image receives its data in small chunks. The solution is to coalesce the updates based on time, i.e. to update the ImageSource only after a certain period of time has elapsed. This period increases as long as the image has not finished loading.

This strategy has been working on iOS; see ImageSource::dataChanged(). We need to enable it for all platforms. In the current iOS solution, ImageSource coalesces its updates of the ImageDecoder with the new data, but CachedImage still causes the image renderer to repaint itself even though the Image itself has not changed. We need to fix this for all platforms, including iOS.
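
In outline, the coalescing works like this (a minimal sketch; shouldCoalesceDataUpdate, m_updateCount, and m_lastUpdateTime are illustrative names, not taken from any patch):

// Minimal sketch of time-based coalescing; member names are illustrative only.
bool shouldCoalesceDataUpdate() const
{
    // Back off: allow longer and longer gaps between ImageSource updates
    // while the image is still loading.
    static const Seconds backoffIntervals[] = { 0_s, 1_s, 3_s, 6_s, 15_s };
    size_t index = std::min<size_t>(m_updateCount, WTF_ARRAY_LENGTH(backoffIntervals) - 1);
    return MonotonicTime::now() - m_lastUpdateTime < backoffIntervals[index];
}

The first chunk always goes through (the first interval is 0_s); subsequent chunks skip the update path until the current interval has elapsed, after which the index advances to the next, longer interval.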
Comment 1 Said Abou-Hallawa 2017-08-23 10:52:14 PDT
Created attachment 318886 [details]
Patch
Comment 2 Said Abou-Hallawa 2017-08-23 10:53:21 PDT
<rdar://problem/33963003>
Comment 3 Tim Horton 2017-08-23 16:32:01 PDT
Comment on attachment 318886 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=318886&action=review

> Source/WebCore/loader/cache/CachedImage.cpp:450
> +    ASSERT(m_coalesceDataCount < coalesceDataSeconds.get().size());

This assert seems super unnecessary especially now that it's a Vector instead of a C array.

> Source/WebCore/loader/cache/CachedImage.cpp:455
> +    m_coalesceDataCount = std::min<unsigned>(m_coalesceDataCount + 1, coalesceDataSeconds.get().size() - 1);

Isn't this just a silly way of writing:

if (m_coalesceDataCount < coalesceDataSeconds.get().size() - 1)
    m_coalesceDataCount++;

Also, is that the best name? Since we cap it based on the number of items in the durations array, it's more of an index into that array than a count of times we coalesced.

> Source/WebCore/loader/cache/CachedImage.cpp:468
> +    if (shouldCoalesceDataBuffer())
> +        return;

I'm confused. How does this not result in the data getting dropped on the floor?
Comment 4 Said Abou-Hallawa 2017-08-24 12:48:23 PDT
Created attachment 319013 [details]
Patch
Comment 5 Said Abou-Hallawa 2017-08-24 14:24:53 PDT
Comment on attachment 318886 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=318886&action=review

>> Source/WebCore/loader/cache/CachedImage.cpp:450
>> +    ASSERT(m_coalesceDataCount < coalesceDataSeconds.get().size());
> 
> This assert seems super unnecessary especially now that it's a Vector instead of a C array.

I am not sure I agree with that. What is the difference between accessing a Vector and a C array with an out-of-bounds index? I wanted to ensure that the index does not change outside the scope of this function such that it becomes larger than the last index in the Vector. But anyway, I removed it.

>> Source/WebCore/loader/cache/CachedImage.cpp:455
>> +    m_coalesceDataCount = std::min<unsigned>(m_coalesceDataCount + 1, coalesceDataSeconds.get().size() - 1);
> 
> Isn't this just a silly way of writing:
> 
> if (m_coalesceDataCount < coalesceDataSeconds.get().size() - 1)
>     m_coalesceDataCount++;
> 
> Also, is that the best name? Since we cap it based on the number of items in the durations array, it's more of an index into that array than a count of times we coalesced.

The std::min() is replaced by the if-statement as suggested. The Vector's name is changed to coalesceDataDurations, and the index's name is changed to m_currentCoalesceDataDuration.

>> Source/WebCore/loader/cache/CachedImage.cpp:468
>> +        return;
> 
> I'm confused. How does this not result in the data getting dropped on the floor?

It does not, because the CachedRawResource passes its SharedBuffer to the CachedImage, which just takes a ref of it (makeRef()). The CachedRawResource keeps accumulating all the incoming data in the same SharedBuffer. It is guaranteed that the SharedBuffer will be set in the ImageSource at least once, in CachedImage::finishLoading().

My purpose with this change was to ensure the data is consistent between the CachedImage and the ImageSource. I thought that as long as the ImageSource is not updated with the new data, it is better not to update the CachedImage with the new data either. But after realizing that the SharedBuffer of the CachedRawResource is already referenced by the CachedImage, I think it is better to keep the old behavior, which can be done by moving the check if (shouldCoalesceDataBuffer()) to addIncrementalDataBuffer().
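
To make that concrete, a sketch of the deferral flow described above (simplified; the actual patch's code and signatures differ):

// Sketch only, simplified from CachedImage. Taking a ref of the shared buffer
// keeps CachedImage's data consistent even while ImageSource updates are
// deferred, because CachedRawResource accumulates all received bytes in this
// same buffer.
void CachedImage::updateBuffer(SharedBuffer& data)
{
    m_data = &data;

    if (shouldCoalesceDataBuffer())
        return; // Deferred; finishLoading() guarantees one final update with the full buffer.

    // Have the image update its data from its internal buffer.
    m_image->setData(&data, false /* allDataReceived */);
}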
Comment 6 Simon Fraser (smfr) 2017-09-01 12:38:26 PDT
Comment on attachment 319013 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=319013&action=review

> Source/WebCore/loader/cache/CachedImage.cpp:449
> +    static const auto coalesceDataDurations = makeNeverDestroyed(Vector<double> { 0, 1, 3, 6, 15 });

Vector<Seconds>.

makeNeverDestroyed is overkill, but I guess you might need it if static Seconds[] doesn't work.

> Source/WebCore/loader/cache/CachedImage.cpp:456
> +    m_coalesceDataTime = time;

A function called shouldCoalesceDataBuffer() should not change state, I think. m_coalesceDataTime should be changed when you actually do the coalescing.

> Source/WebCore/loader/cache/CachedImage.h:163
> +    size_t m_currentCoalesceDataDuration { 0 };

This isn't a duration, right? So the name is misleading. It's an index into some array.
Comment 7 Darin Adler 2017-09-03 17:09:09 PDT
Comment on attachment 319013 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=319013&action=review

>> Source/WebCore/loader/cache/CachedImage.cpp:449
>> +    static const auto coalesceDataDurations = makeNeverDestroyed(Vector<double> { 0, 1, 3, 6, 15 });
> 
> Vector<Seconds>.
> 
> makeNeverDestroyed is overkill, but I guess you might need it if static Seconds[] doesn't work.

It’s not overkill because this is a Vector and it won’t compile otherwise. If this was an array, and it should be an array instead of a vector, then we would not need makeNeverDestroyed.

    static const double coalesceDataDurations[] = { 0, 1, 3, 6, 15 };
Comment 8 Said Abou-Hallawa 2017-09-05 15:24:34 PDT
Created attachment 319942 [details]
Patch
Comment 9 Tim Horton 2017-09-29 12:40:36 PDT
Comment on attachment 319942 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=319942&action=review

> Source/WebCore/loader/cache/CachedImage.cpp:450
> +    static const double dataBufferAddingIntervals[] = { 0, 1, 3, 6, 15 };

I think this one should make it clearer that it's about backoff. Maybe incrementalLoadingBackoffIntervals?

Maybe propagate that naming through all of the function/member/variable names? Right now it's kind of mysterious.

e.g.


bool CachedImage::shouldDeferIncrementalLoad() const
{
    static const double incrementalLoadingBackoffIntervals[] = { 0, 1, 3, 6, 15 };
    unsigned interval = std::min<unsigned>(m_incrementalLoadCount, 4);

    // The first time through, the chunk time will be 0 and the image will get an update.
    return (MonotonicTime::now() - m_lastIncrementalLoadingUpdateTime).seconds() < incrementalLoadingBackoffIntervals[interval];
}

void CachedImage::didIncrementalLoad()
{
    m_lastIncrementalLoadingUpdateTime = MonotonicTime::now();
    ASSERT(m_incrementalLoadCount < std::numeric_limits<unsigned>::max());
    ++m_incrementalLoadCount;
}


I'm not totally sold on the names tho. Maybe someone else has an idea.
Comment 10 Said Abou-Hallawa 2017-09-29 19:06:08 PDT
Created attachment 322264 [details]
Patch
Comment 11 youenn fablet 2017-10-02 14:21:45 PDT
Would it be useful to have that kind of behavior for other types of resources? Or other related behavior like processing only when receiving the last byte?
Instead of doing it in the WebProcess, could the NetworkProcess be told to adapt its pace of sending data chunks on a per-load basis?
Comment 12 Said Abou-Hallawa 2017-10-02 15:26:00 PDT
(In reply to youenn fablet from comment #11)
> Would it be useful to have that kind of behavior for other types of
> resources?

There is no problem in receiving a resource's bytes in small chunks. The problem with images is updating the decoder with small chunks of data. This is costly because the ImageDecoder has to re-decode the new data. Also, repainting and drawing the RenderImage every time image data is received is expensive.

> Or other related behavior like processing only when receiving the
> last byte?

No, we can't do that. We would like to show partial images instead of showing nothing till the whole image data is received.

> Instead of doing it in the WebProcess, could NetworkProcess be told to adapt
> its pace of sending data chunks on a per load basis?

I don't know if this solution is feasible or not. But I think dealing with this issue has to be done based on the resource type. Each resource can decide whether it needs to process any incoming data immediately or whether it can coalesce the processing of the incoming data. So this decision has to be in the WebProcess.
Comment 13 youenn fablet 2017-10-03 01:23:13 PDT
(In reply to Said Abou-Hallawa from comment #12)
> (In reply to youenn fablet from comment #11)
> > Would it be useful to have that kind of behavior for other types of
> > resources?
> 
> There is no problem in receiving the resource's bytes in small chunks. The
> problem with the image is updating the decoder with small chunks of data.
> This is costly because the ImageDecoder has to re-decode the new data. Also
> repainting and drawing the RenderImage every time an image data is received
> is expensive. 

The current patch seems to fix the issue for CachedImage.
If a similar optimization can be envisioned for other types of resources, it may be better to do it at a lower layer, with the CachedResource layer being responsible for deciding what kind of data delivery it expects.

> 
> Or other related behavior like processing only when receiving the
> > last byte?
> 
> No, we can't do that. We would like to show partial images instead of
> showing nothing till the whole image data is received.

Sure, for images; my question is more about other types of resources.

> > Instead of doing it in the WebProcess, could NetworkProcess be told to adapt
> > its pace of sending data chunks on a per load basis?
> 
> I don't know if this solution is feasible or not. But I think dealing with
> this issue has to be done based on the resource types. Each resource can
> decide whether it needs to process any incoming data immediately or it can
> coalesce the number of processing the incoming data. So this decision has to
> be in WebProcess.

If this issue spans several types of resources, it might be simpler to have CachedResource pass a parameter expressing what it is expecting.

For instance, ResourceLoader already does buffering according to its dataBufferingPolicy option. Maybe extending this option would be a good way to see whether that approach would work for your use case.

If so, I would then suggest to move the buffering to NetworkProcess by passing an additional parameter to NetworkConnectionToWebProcess::ScheduleResourceLoad.
That would reduce the number of IPC calls.
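
For context, the policy youenn refers to, plus a hypothetical value sketching the extension (the declaration is paraphrased from ResourceLoaderOptions.h; the third enumerator does not exist):

// Paraphrased from WebCore/loader/ResourceLoaderOptions.h, with one hypothetical addition.
enum DataBufferingPolicy {
    BufferData,
    DoNotBufferData,
    BufferDataWithBackoff // Hypothetical: coalesce didReceiveData() calls per a back-off schedule.
};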
Comment 14 Said Abou-Hallawa 2017-10-03 17:44:02 PDT
After looking at the code of NetworkConnectionToWebProcess, I realized that the network process implements linear back-off buffering for the image resource. WebLoaderStrategy::scheduleLoad() sets the maximumBufferingTime member of the NetworkResourceLoadParameters to 500_ms if the resource type is CachedResource::ImageResource. NetworkConnectionToWebProcess creates a timer with 500_ms, and when it fires, it sends the received data to the WebProcess.

This patch implements exponential back-off buffering. The timer interval keeps increasing until it reaches the last interval, and then it uses that interval until the image finishes loading.

I agree that having two buffering mechanisms for the image resource is bad. I could change NetworkResourceLoadParameters::maximumBufferingTime to be a Vector<Seconds>. And I could change the NetworkResourceLoader to create its timers with the intervals from the Vector<Seconds>, one after the other, until the last one.
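
Sketched out, that would look roughly like this (hypothetical code; neither the Vector-valued parameter nor m_bufferingTimeIndex exists today):

// Hypothetical: generalize the single maximumBufferingTime into a back-off schedule.
struct NetworkResourceLoadParameters {
    // ...
    Vector<Seconds> maximumBufferingTimes; // e.g. { 0_s, 1_s, 3_s, 6_s, 15_s }
};

// Each time the one-shot buffering timer is armed, advance to the next (longer)
// interval, staying on the last one once it is reached.
void NetworkResourceLoader::startBufferingTimerIfNeeded()
{
    if (m_bufferingTimer.isActive())
        return;
    m_bufferingTimer.startOneShot(m_parameters.maximumBufferingTimes[m_bufferingTimeIndex]);
    m_bufferingTimeIndex = std::min(m_bufferingTimeIndex + 1, m_parameters.maximumBufferingTimes.size() - 1);
}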

But we will have two problems:

1. This solution does not work for WK1. A similar buffering mechanism is needed for WK1, at least for iOS, since the exponential buffering was implemented for iOS. We may have two solutions for this:
	a) WebCoreResourceHandleAsDelegate.didReceiveData() needs to implement the same buffering that NetworkResourceLoader implements.
	b) Keep the same buffering mechanism in ImageSource::dataChanged() but enable it for WK1. And this is really ugly because we will have the buffering for the image at two different levels for WK1 and WK2.

2. This solution is mainly needed for the ImageDocument in WK2. The bug is not noticeable when the image is included in an <img> element. For example, open these images in Safari and notice how bad the performance is:

http://www.914world.com/bbs2/uploads_offsite/s6.postimg.org-1623-1505909118.1.gif
https://www.nasa.gov/sites/default/files/thumbnails/image/pia21778.png

Now open the same images as sub-resources, i.e. create HTML files with the following <img> elements:

<img src = "http://www.914world.com/bbs2/uploads_offsite/s6.postimg.org-1623-1505909118.1.gif">
and
<img src = "https://www.nasa.gov/sites/default/files/thumbnails/image/pia21778.png">

The reason for the difference is that static Seconds maximumBufferingTime(CachedResource*) returns 500_ms for CachedResource::ImageResource but 0_s for CachedResource::MainResource. When the image is the main resource, which is the ImageDocument case, no buffering is done. But when the image is a sub-resource, the web process is updated with the image data only every 500_ms. And this buffering is enough, at least for non-iOS platforms.
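
For reference, the dispatch in question (a condensed sketch; unrelated resource types elided):

// Condensed sketch of WebLoaderStrategy.cpp's buffering-time dispatch.
static Seconds maximumBufferingTime(CachedResource* resource)
{
    if (!resource)
        return 0_s;

    switch (resource->type()) {
    case CachedResource::ImageResource:
        return 500_ms; // Subresource images: the NetworkProcess coalesces data sends.
    case CachedResource::MainResource:
        return 0_s;    // An ImageDocument's image loads as the main resource: no buffering.
    default:           // Other resource types elided from this sketch.
        return 0_s;
    }
}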

I do not think we can know for sure that the main resource is an image before we start loading and parsing it. In fact, we create the ImageDocument and the ImageDocumentParser after receiving the first chunk of data.

So the buffering is not implemented for WK1 or the ImageDocument. And both are currently not handled by the NetworkProcess. And I think implementing the single-image buffering on the network side for WK1 and the ImageDocument would be expensive.

Having said that, I think having a single image-buffering mechanism for all platforms is not in the scope of this bug.
Comment 15 youenn fablet 2017-10-03 17:50:57 PDT
It is late for me so I may have other feedback tomorrow.

Regarding WK1, I agree that the NetworkProcess idea is not fully portable to WK1.
At this point, not optimizing WK1 might be ok if we think this is the right solution for WK2.

I would check how this idea would fly with the various ways images can be loaded (main document, sub-resources) and whether doing it in the NetworkProcess will or will not solve potential problems for other resources.
Comment 16 Said Abou-Hallawa 2017-10-03 18:20:11 PDT
(In reply to youenn fablet from comment #15)
> It is late for me so I may have other feedback tomorrow.
> 
> Regarding WK1, I agree that the NetworkProcess idea is not fully portable to
> WK1.
> At this point, not optimizing WK1 might be ok if we think this is the right
> solution for WK2.

I disagree, because the image buffering is currently implemented for WK1 on iOS in the function ImageSource::dataChanged(), and we should not remove it unless we have a better way of doing it.

> 
> I would check how this idea would fly with the various way images can be
> loaded (main document, sub resources) and whether doing it in NetworkProcess
> will or not solve potential problems for other resources.
Comment 17 Said Abou-Hallawa 2017-10-06 11:51:22 PDT
I filed https://bugs.webkit.org/show_bug.cgi?id=178020 to address youenn's comments in a separate patch.

The attached patch improves the performance on WK1 for all images, and on WK2 for the ImageDocument, by enabling the existing back-off mechanism on all platforms. And I think we are not making anything worse. So this patch is ready for review.
Comment 18 Simon Fraser (smfr) 2017-10-09 10:04:42 PDT
Comment on attachment 322264 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=322264&action=review

> Source/WebCore/loader/cache/CachedImage.cpp:411
> +void CachedImage::doUpdateBuffer(SharedBuffer& data)

This name is awkward. Maybe updateBufferInternal?

> Source/WebCore/loader/cache/CachedImage.cpp:421
> +    if (shouldDeferUpdateImageData())
> +        return;
> +    
>      // Have the image update its data from its internal buffer.
>      // It will not do anything now, but will delay decoding until

The early return after shouldDeferUpdateImageData() and then the comment which says "It will not do anything now" are confusing. If updateImageData doesn't do anything, why did we early return?
Comment 19 Said Abou-Hallawa 2017-10-09 14:26:35 PDT
Created attachment 323217 [details]
Patch
Comment 20 WebKit Commit Bot 2017-10-09 17:07:51 PDT
Comment on attachment 323217 [details]
Patch

Clearing flags on attachment: 323217

Committed r223091: <http://trac.webkit.org/changeset/223091>
Comment 21 WebKit Commit Bot 2017-10-09 17:07:53 PDT
All reviewed patches have been landed.  Closing bug.