Bug 65622

Summary: [Chromium] Canvas 2D getImageData method is slow with GPU acceleration enabled
Product: WebKit Reporter: Justin Novosad <junov>
Component: Layout and Rendering Assignee: Justin Novosad <junov>
Status: RESOLVED WONTFIX    
Severity: Normal CC: alokp, jamesr, schenney, senorblanco, vangelis
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
URL: http://oxy.fi/back-to-visuals/

Justin Novosad
Reported 2011-08-03 09:02:13 PDT
Steps:
1. Run the "Back to Visuals" Chrome Experiment in Chrome.
2. Repeat with accelerated 2D canvas disabled.

Notice that most parts of the animation run much faster with acceleration disabled :-(

Diagnosis:
- The main problem is read-back performance: executing getImageData on a GPU-backed canvas triggers a synchronous read-back operation, which severely degrades performance.
- A secondary performance issue is un-pre-multiplication: each pixel's color values are divided by alpha on the CPU.

Resolution strategies:
1. Reduce read-back frequency by demoting canvases to CPU-based rendering when read-backs are predicted. A simple predictor is past read-backs (a canvas that has been the object of a read-back is likely to be read back repeatedly), so we can start with that. Further improvements to read-back prediction could include logging, heuristics, learning algorithms, or JavaScript code analysis (look-ahead). Implementing this strategy requires a means of switching between GPU and CPU drawing devices after a canvas is created. Primitives that draw differently on GPU vs. CPU must always be drawn using the same device to avoid popping artifacts.
2. Add a gpu hint property (WebKit-specific?) to the canvas element to guide the choice between GPU and CPU rendering devices. Who better than a savvy web developer to predict the likelihood and performance impact of read-backs?
3. Optimize the component swizzling and alpha divide code. When reading back from the GPU, these operations could be performed on the GPU; when reading from the CPU, the pixel loop could use SIMD and/or multi-threading. (A sketch of the current CPU-side loop follows below.)
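A minimal sketch of the per-pixel alpha divide mentioned in the diagnosis and in strategy 3, assuming 8-bit RGBA pixels stored premultiplied by alpha; the function name and buffer layout are illustrative, not WebKit's actual ImageBuffer code:

    #include <cstdint>
    #include <cstddef>

    // Hypothetical helper: convert premultiplied RGBA8 pixels to
    // unpremultiplied form, as getImageData() must return them.
    // 'pixels' holds pixelCount * 4 bytes in R, G, B, A order.
    void unpremultiplyRGBA(uint8_t* pixels, size_t pixelCount)
    {
        for (size_t i = 0; i < pixelCount; ++i) {
            uint8_t* p = pixels + 4 * i;
            uint8_t alpha = p[3];
            if (!alpha || alpha == 255)
                continue; // fully transparent or opaque: nothing to divide
            // Round-to-nearest division by alpha, scaled back to 0..255.
            p[0] = static_cast<uint8_t>((p[0] * 255 + alpha / 2) / alpha);
            p[1] = static_cast<uint8_t>((p[1] * 255 + alpha / 2) / alpha);
            p[2] = static_cast<uint8_t>((p[2] * 255 + alpha / 2) / alpha);
        }
    }

This scalar loop runs once per pixel on every getImageData call, which is why strategy 3 proposes moving it to the GPU during read-back or vectorizing it with SIMD on the CPU path.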
Attachments
Patch (13.25 KB, patch)
2011-08-03 10:17 PDT, Justin Novosad
jamesr: review-
Justin Novosad
Comment 1 2011-08-03 10:17:29 PDT
Created attachment 102792 [details]
Patch
Justin Novosad
Comment 2 2011-08-03 10:23:15 PDT
(In reply to comment #1)
> Created an attachment (id=102792) [details]
> Patch

This first patch implements part of strategy 1. I am hiding it behind a build flag for now so that I can deliver the fix in small (easy to review) increments. The following increments will take care of: finding all places that need calls to useSkiaGpuHint; tuning performance by implementing better heuristics; and fixing rendering glitches.
Stephen White
Comment 3 2011-08-03 13:03:16 PDT
Comment on attachment 102792 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=102792&action=review

Most of my comments have to do with naming and/or API, but I do have one overall comment: while this may be a big performance win, it comes at a cost of code complexity. I'm not going to reject it on that basis, but perhaps some others should weigh in on whether we think this is a cost we're willing to bear.

> Source/WebCore/platform/graphics/skia/ImageBufferSkia.cpp:257
> + platformContext->makeGrContextCurrent();

As discussed, I think it would make more sense to make an overload of makeGrContextCurrent() which takes the hint, rather than having a separate call and having it be order-dependent. Also, the hint should be an enum (or flag) rather than a bool.

> Source/WebCore/platform/graphics/skia/ImageSkia.cpp:258
> + platformContext->useSkiaGpuHint(bitmap);

This should also probably be some variant of makeGrContextCurrent(), although a straight overload might be a bit confusing (takes a bitmap?). If you don't mind moving a little bit of the logic here (testing the bitmap for texture-backing), we might be able to just use the overloaded version of makeGrContextCurrent() above and pass in an enum value here.

> Source/WebCore/platform/graphics/skia/PlatformContextSkia.cpp:72
> +class PlatformContextSkia::GpuPublishCallback : public DrawingBuffer::WillPublishCallback {

jamesr knows these publish callbacks best, so I'll leave that to him.

> Source/WebCore/platform/graphics/skia/PlatformContextSkia.cpp:224
> + , m_useSkiaGpu(false)

This would be better just named m_useGPU at this point. The function should also be renamed useGpu(), but you can do that in another patch.

> Source/WebCore/platform/graphics/skia/PlatformContextSkia.cpp:226
> + , m_alternateDevice(0)

Alternate device is a little ambiguous to me. Maybe m_secondaryDevice? Hmm, not much better. m_inactiveDevice?
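A rough sketch of the API shape being suggested for the hint, using a stripped-down stand-in for PlatformContextSkia; the enum and its values are invented for illustration and are not from the actual patch:

    // Hypothetical enum-based hint folded into an overload of
    // makeGrContextCurrent(), following the review suggestion that the hint
    // be an enum rather than a bool and be passed with the call itself
    // instead of via a separate, order-dependent call.
    class PlatformContextSkia {
    public:
        enum DeviceHint {
            PreferGpuDevice,      // caller expects GPU-friendly work
            PreferSoftwareDevice  // caller expects a read-back (e.g. getImageData)
        };

        void makeGrContextCurrent();                 // existing call, unchanged
        void makeGrContextCurrent(DeviceHint hint);  // suggested overload carrying the hint
    };

Callers such as ImageBufferSkia.cpp would then pass PreferSoftwareDevice at the read-back site rather than making a separate useSkiaGpuHint() call first.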
Justin Novosad
Comment 4 2011-08-03 13:26:18 PDT
> Alternate device is a little ambiguous to me. Maybe m_secondaryDevice? Hmm, not much better. m_inactiveDevice?

I think I am going to go with m_noncurrentDevice. OK?
James Robinson
Comment 5 2011-08-04 16:01:48 PDT
Comment on attachment 102792 [details]
Patch

If we're going to switch from h/w to software, we should switch back all the way to software rendering (including removing compositing layers). With this patch, a switch back to s/w will make us software-render, upload to a texture, and GPU-composite for not much benefit. With that in mind, I think it makes a lot more sense to do this sort of manipulation at or closer to the CanvasRenderingContext2D level.

I also wonder if we can't gather a bit more data. We have the ability to do histograms in WebKit code; maybe that would be useful? I think it'd be really useful to know how good a proxy having done one readback is for throwing the context into s/w permanently. A few things that might be useful to measure:
- for contexts that do at least one readback, how many more readbacks do they do?
- what does the distribution look like for the ratio of readbacks to canvas->canvas draws or canvas composites?

R- because this will lead to lots of unnecessary compositing layers being left around for software-rendered canvases, but please consider the other feedback as well.
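A minimal sketch of the per-context bookkeeping those measurements would need, using plain counters rather than any particular WebKit histogram API; the struct and member names are invented for illustration:

    #include <cstdint>

    // Hypothetical per-canvas-context counters for the ratios discussed above:
    // how many read-backs happen after the first one, and how read-backs
    // compare to canvas-to-canvas draws and canvas composites.
    struct CanvasReadbackStats {
        uint64_t readbacks = 0;
        uint64_t canvasToCanvasDraws = 0;
        uint64_t composites = 0;

        void recordReadback() { ++readbacks; }
        void recordCanvasToCanvasDraw() { ++canvasToCanvasDraws; }
        void recordComposite() { ++composites; }

        // Ratio of read-backs to GPU-friendly work; a high value suggests the
        // context is a good candidate for permanent software rendering.
        double readbackRatio() const
        {
            uint64_t gpuWork = canvasToCanvasDraws + composites;
            return gpuWork ? static_cast<double>(readbacks) / gpuWork
                           : static_cast<double>(readbacks);
        }
    };

These counts could then be reported through whatever histogram facility is available to answer the two questions above.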
Justin Novosad
Comment 6 2011-08-09 08:08:14 PDT
(In reply to comment #5)
> (From update of attachment 102792 [details])
> If we're going to switch from h/w to software, we should switch back all the way to software rendering (including removing compositing layers).

The idea behind keeping the GPU device alive, to allow switching back and forth, is that some primitives don't render exactly the same in s/w, so we may want to switch back to GPU to avoid popping artifacts. Where I wanted to go with this (in later code changes) is to add bookkeeping code to track rendering statistics and to make the switching smarter based on heuristics that look at the canvas's rendering stats. I would discard the unused GPU device and compositing layer only once there is high confidence that switching back will not be necessary.
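A sketch of the kind of bookkeeping-driven switching described here, with invented names and an arbitrary confidence threshold; the real heuristic and thresholds were left to later patches that never landed:

    #include <cstdint>

    // Hypothetical decision helper: prefer the software device after the
    // first read-back, keep the GPU device alive so we can switch back
    // without popping artifacts, and only discard the GPU device (and its
    // compositing layer) once read-backs have dominated for long enough
    // that switching back to GPU rendering looks unlikely.
    class CanvasDeviceSwitcher {
    public:
        void recordReadback() { ++m_readbacksSinceGpuDraw; }
        void recordGpuOnlyDraw() { m_readbacksSinceGpuDraw = 0; }

        // Prefer the software device as soon as one read-back has happened.
        bool shouldUseSoftwareDevice() const { return m_readbacksSinceGpuDraw > 0; }

        // Drop the GPU device only with high confidence (the threshold here
        // is arbitrary) that the canvas will not need GPU rendering again.
        bool shouldDiscardGpuDevice() const { return m_readbacksSinceGpuDraw > 100; }

    private:
        uint64_t m_readbacksSinceGpuDraw = 0;
    };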
Stephen Chenney
Comment 7 2013-04-09 16:12:26 PDT
*** Bug 65207 has been marked as a duplicate of this bug. ***