Asynchronous image decoding can accelerate the image rendering and allow the main thread to process other activities. But if the number of images that are asynchronously decoded increased, such that the number of decoding thread exceeds the number of cores on the system, the overall performance of the system will be affected. Even the rendering in WebKit will be choppy. What we need do is we need to use the system cores instead of leaving them idle. But at the same time, do not be too greedy and don't use all the available cores and don't consume too much power for decoding large or animated images. I think a good strategy for this purpose is to allow asynchronous image decoding to take up to half the available cores.
Created attachment 331446 [details] Patch
<rdar://problem/35017173>
Created attachment 331678 [details] Patch
Comment on attachment 331678 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331678&action=review > Source/WebCore/platform/graphics/ImageSource.cpp:317 > + if (!DecodingQueue::canCreateDecodingQueue()) > + return false; This looks racy. Not sure if it matters. But, I think the race is: count() is 1 away from returning false from this function. Two threads then ask this question at the same time, then both threads end up creating a queue.
Comment on attachment 331678 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331678&action=review >> Source/WebCore/platform/graphics/ImageSource.cpp:317 >> + return false; > > This looks racy. Not sure if it matters. But, I think the race is: > > count() is 1 away from returning false from this function. Two threads then ask this question at the same time, then both threads end up creating a queue. I think Saam is right. Presumably the fix is to do the check and increment together under a lock?
Comment on attachment 331678 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331678&action=review >>> Source/WebCore/platform/graphics/ImageSource.cpp:317 >>> + return false; >> >> This looks racy. Not sure if it matters. But, I think the race is: >> >> count() is 1 away from returning false from this function. Two threads then ask this question at the same time, then both threads end up creating a queue. > > I think Saam is right. Presumably the fix is to do the check and increment together under a lock? So if we can't make a queue we bail and decode on the main thread? Why not enqueue on some existing queue?
Comment on attachment 331678 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331678&action=review >>>> Source/WebCore/platform/graphics/ImageSource.cpp:317 >>>> + return false; >>> >>> This looks racy. Not sure if it matters. But, I think the race is: >>> >>> count() is 1 away from returning false from this function. Two threads then ask this question at the same time, then both threads end up creating a queue. >> >> I think Saam is right. Presumably the fix is to do the check and increment together under a lock? > > So if we can't make a queue we bail and decode on the main thread? Why not enqueue on some existing queue? Actually, I think the increment and check are both on the main thread? Also, smfr’s point seems sensible.
(In reply to Tim Horton from comment #7) > Comment on attachment 331678 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331678&action=review > > >>>> Source/WebCore/platform/graphics/ImageSource.cpp:317 > >>>> + return false; > >>> > >>> This looks racy. Not sure if it matters. But, I think the race is: > >>> > >>> count() is 1 away from returning false from this function. Two threads then ask this question at the same time, then both threads end up creating a queue. > >> > >> I think Saam is right. Presumably the fix is to do the check and increment together under a lock? > > > > So if we can't make a queue we bail and decode on the main thread? Why not enqueue on some existing queue? > > Actually, I think the increment and check are both on the main thread? If so, I think we should remove the atomic then. I was just assuming it was off the main thread because of the atomic. > > Also, smfr’s point seems sensible.
Comment on attachment 331678 [details] Patch r- given the feedback.
Created attachment 331765 [details] Patch
Comment on attachment 331678 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331678&action=review >>>>>> Source/WebCore/platform/graphics/ImageSource.cpp:317 >>>>>> + return false; >>>>> >>>>> This looks racy. Not sure if it matters. But, I think the race is: >>>>> >>>>> count() is 1 away from returning false from this function. Two threads then ask this question at the same time, then both threads end up creating a queue. >>>> >>>> I think Saam is right. Presumably the fix is to do the check and increment together under a lock? >>> >>> So if we can't make a queue we bail and decode on the main thread? Why not enqueue on some existing queue? >> >> Actually, I think the increment and check are both on the main thread? >> >> Also, smfr’s point seems sensible. > > If so, I think we should remove the atomic then. I was just assuming it was off the main thread because of the atomic. Re: This looks racy. Not sure if it matters. But, I think the race is: count() is 1 away from returning false from this function. Two threads then ask this question at the same time, then both threads end up creating a queue. No. This is not racy. The WorkQueues are created only from the main thread. Most of the time they are also destroyed on the main thread. The only case where a WorkQueue is destroyed on the decoding thread is when the ImageSource is deleted while the decoding is still running. In this case the only reference to the WorkQueue is kept by protectedDecodingQueue; see ImageSource::startAsyncDecodingQueue(). The condition above is okay. It is not ultimate but it will always be safe. The number of active WorkQueues can only decreased by the decoding threads. But no two threads will be increasing the number of active WorkQueues at the same time.
Comment on attachment 331678 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331678&action=review >>>>>>> Source/WebCore/platform/graphics/ImageSource.cpp:317 >>>>>>> + return false; >>>>>> >>>>>> This looks racy. Not sure if it matters. But, I think the race is: >>>>>> >>>>>> count() is 1 away from returning false from this function. Two threads then ask this question at the same time, then both threads end up creating a queue. >>>>> >>>>> I think Saam is right. Presumably the fix is to do the check and increment together under a lock? >>>> >>>> So if we can't make a queue we bail and decode on the main thread? Why not enqueue on some existing queue? >>> >>> Actually, I think the increment and check are both on the main thread? >>> >>> Also, smfr’s point seems sensible. >> >> If so, I think we should remove the atomic then. I was just assuming it was off the main thread because of the atomic. > > Re: This looks racy. Not sure if it matters. But, I think the race is: > count() is 1 away from returning false from this function. Two threads then ask this question at the same time, then both threads end up creating a queue. > > No. This is not racy. The WorkQueues are created only from the main thread. Most of the time they are also destroyed on the main thread. The only case where a WorkQueue is destroyed on the decoding thread is when the ImageSource is deleted while the decoding is still running. In this case the only reference to the WorkQueue is kept by protectedDecodingQueue; see ImageSource::startAsyncDecodingQueue(). > > The condition above is okay. It is not ultimate but it will always be safe. The number of active WorkQueues can only decreased by the decoding threads. But no two threads will be increasing the number of active WorkQueues at the same time. Re: So if we can't make a queue we bail and decode on the main thread? Actually the image frame will not be decoded on the main thread. It will be decoded on a separate thread by the system when the drawing is initiated. Re: Why not enqueue on some existing queue? We mix sync and async image in the same page: small images and large images, non first tile paint and first tile paint. But we never tried before to block decoding/displaying an image because another one is being decoded. I think the goal of the async image decoding is to utilize the system resources wisely. It should not stress the system such that it affects the overall system performance. If the system resources are being affected by the async image decoding, we should fallback to the sync path. And I think this patch its trying to fix that. I think also delaying drawing too many expensive items should be in a higher level than the image decoding.
Comment on attachment 331678 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331678&action=review >>>>>>>> Source/WebCore/platform/graphics/ImageSource.cpp:317 >>>>>>>> + return false; >>>>>>> >>>>>>> This looks racy. Not sure if it matters. But, I think the race is: >>>>>>> >>>>>>> count() is 1 away from returning false from this function. Two threads then ask this question at the same time, then both threads end up creating a queue. >>>>>> >>>>>> I think Saam is right. Presumably the fix is to do the check and increment together under a lock? >>>>> >>>>> So if we can't make a queue we bail and decode on the main thread? Why not enqueue on some existing queue? >>>> >>>> Actually, I think the increment and check are both on the main thread? >>>> >>>> Also, smfr’s point seems sensible. >>> >>> If so, I think we should remove the atomic then. I was just assuming it was off the main thread because of the atomic. >> >> Re: This looks racy. Not sure if it matters. But, I think the race is: >> count() is 1 away from returning false from this function. Two threads then ask this question at the same time, then both threads end up creating a queue. >> >> No. This is not racy. The WorkQueues are created only from the main thread. Most of the time they are also destroyed on the main thread. The only case where a WorkQueue is destroyed on the decoding thread is when the ImageSource is deleted while the decoding is still running. In this case the only reference to the WorkQueue is kept by protectedDecodingQueue; see ImageSource::startAsyncDecodingQueue(). >> >> The condition above is okay. It is not ultimate but it will always be safe. The number of active WorkQueues can only decreased by the decoding threads. But no two threads will be increasing the number of active WorkQueues at the same time. > > Re: So if we can't make a queue we bail and decode on the main thread? > > Actually the image frame will not be decoded on the main thread. It will be decoded on a separate thread by the system when the drawing is initiated. > > Re: Why not enqueue on some existing queue? > > We mix sync and async image in the same page: small images and large images, non first tile paint and first tile paint. But we never tried before to block decoding/displaying an image because another one is being decoded. I think the goal of the async image decoding is to utilize the system resources wisely. It should not stress the system such that it affects the overall system performance. If the system resources are being affected by the async image decoding, we should fallback to the sync path. And I think this patch its trying to fix that. I think also delaying drawing too many expensive items should be in a higher level than the image decoding. Re: If so, I think we should remove the atomic then. I was just assuming it was off the main thread because of the atomic. The increment can happen only from the main thread. But the decrement can happen from the main thread or from the decoding thread. This depends on which gets destroyed first: ImageSource or the decoding WorkQueue. So the increment and the decrement have to be synchronized.
(In reply to Said Abou-Hallawa from comment #10) > Created attachment 331765 [details] > Patch This is a more involving patch. Yes it creates the WorkQueue and increments the count in one place. But it requires more changes since the drawing depends only on our heuristics. The <img "decoding=async"> attribute is only a web developer's recommendation which can be ignored. No events are firing when decoding an image finishes. But HTMLImageElement::decode() method has to force async image decoding and a promise resolving. Also with the drawing code has to fall back to the sync code path.
Does this patch still fall back to sync decoding if we have too many images decoding right now? If so, I think that's still wrong; I think we should have a pool of queues and post decode requests onto those queues.
Comment on attachment 331765 [details] Patch Attachment 331765 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/6139329 New failing tests: http/tests/misc/resource-timing-resolution.html
Created attachment 331783 [details] Archive of layout-test-results from ews123 for ios-simulator-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews123 Port: ios-simulator-wk2 Platform: Mac OS X 10.12.6
Created attachment 331891 [details] Patch
Created attachment 331892 [details] Patch
<rdar://problem/36757735>
Comment on attachment 331892 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331892&action=review > Source/WebCore/ChangeLog:12 > + We need to control the number of active threads which are created for image > + decoding. This has been an issue for power consumption and overall system > + degradation. This can be serious once the number of decoding thread exceeds > + the number of system available cores. I don't think we have evidence that image decoding ever exceeds the number of available cores. Do we? Can you supply that evidence? It would be surprising if image decoding exceeded the number of available cores, since WorkQueue uses libdispatch, whose primary job is to avoid creating a number of threads that exceeds the number of available cores. Also, if it were our goal to avoid using too many cores for image decoding, we would need to coordinate that limitation across web processes and non-web CPU activity. This patch doesn't do so, but again, luckily, libdispatch does. Note that libdispatch offers a priority called "user initiated" which is lower than "user interactive". If we want to guarantee that image decoding never beats the main thread, we can test deprioritizing it to "user interactive". I think I have to say r- because we can't accept an architectural change to improve performance without a test case that demonstrates that performance improved -- especially when we have reason to believe that the architecture change duplicates existing functionality in a potentially worse way.
> I don't think we have evidence that image decoding ever exceeds the number of available cores. Do we? Can you supply that evidence? In HaikuWebKit I end up with hundreds of ImageDecoder threads. > It would be surprising if image decoding exceeded the number of available cores, since WorkQueue uses libdispatch, whose primary job is to avoid creating a number of threads that exceeds the number of available cores. That's the case on Apple builds, but not for other platforms which use WorkQueueGeneric, there one thread is created for each ImageSource. Maybe the solution is to rewrite WorkQueueGeneric to do its own thread pooling, then?