Bug 105821

Summary: Optimize the texture packing for texImage2D() and texSubImage2D() in WebGL
Product: WebKit Reporter: Jun Jiang <jun.a.jiang>
Component: WebGLAssignee: Jun Jiang <jun.a.jiang>
Status: RESOLVED FIXED    
Severity: Normal CC: dino, d-r, junov, kbr, noam, ojan.autocc, senorblanco, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Description Jun Jiang 2012-12-28 00:29:10 PST
In texture packing for texImage2D() and texSubImage2D() in WebGL, there are hundreds of unpack and pack operations and instances for different combinations of source format, destination format and Alpha operations. Current Implementation uses function pointers to denote each corresponding unpack and pack operation. The code logic is a little complex and could be made cleaner and simpler. Mozilla has done some optimizations based on this implementation, we could merge it back. There are basically two things: 1. use template functions and avoid code bloat. 2. merge the flipY operation into the packing operation.
Comment 1 Jun Jiang 2012-12-28 00:49:54 PST
Created attachment 180850 [details]
Patch
Comment 2 Jun Jiang 2012-12-31 00:44:43 PST
After applying the patch, the binary size of GraphicsContext3D.o for Non-CG ports are smaller than before. Taking chromium on Linux for example, the size of  GraphicsContext3D.o decreases from 60976 bytes to 33488 bytes (release build). 
While for CG port, the binary size of GraphicsContext3D.o is bigger than before, increasing from 405600 bytes to 1194444 bytes(release build, symbols not striped). The difference between CG port and other ports is that the number of possible source formats for Image from DOM elements are bigger. The detail is in FormatConverter::convert(). This difference introduces much more instances for template functions and results a binary size increase for CG port.
Comment 3 Jun Jiang 2013-01-07 05:49:18 PST
find two ways to reduce the binary size of GraphicsContext3D.o for CG port, one is to optimize  the convert() template function, another is to reduce possible formats from HTMLImageElement by considering the little/big endian of the machine.  Simulated CG setting by using these two methods in SKIA port and the size is smaller than original. Will upload a successive  patch.
Comment 4 Jun Jiang 2013-01-08 13:03:53 PST
Created attachment 181738 [details]
Patch
Comment 5 Kenneth Russell 2013-01-14 19:13:18 PST
Thanks for your patch and sorry for the delay reviewing it. I'm reviewing it now. Other reviewers, please hold off r+'ing this patch for now.
Comment 6 Kenneth Russell 2013-01-15 10:41:36 PST
Comment on attachment 181738 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=181738&action=review

This is nice work that gets closer to unifying the pixel packing/unpacking code again between Firefox and WebKit. Some relatively small issues, plus one higher-level one. Per offline discussion it sounds like a bug was caught during testing of this patch, so r-'ing this one, but please post a revision when ready.

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:320
> +// Some code are merged back from Mozilla code in content/canvas/src/WebGLTexelConversions.h

A high-level point: it would be better if the code that is expected to be shared between Firefox and WebKit were clearly marked at its beginning and end, as it is in Firefox's http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/WebGLTexelConversions.h .

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1237
> +        || format == GraphicsContext3D::DataFormatA16Little

Since the 16-bit per channel big and little endian formats are only used by the Core Graphics port, and their use is rare, I wonder whether they could be handled differently to reduce the code size on all platforms.

Specifically, perhaps data of this format could be converted to one of the 8-bit formats inside GraphicsContext3DCG.cpp, and the cross-platform code wouldn't need to know about these formats.

If you agree, it isn't necessary to make that change in this patch -- it's large enough as it is -- but please file another bug and make it depend on this one to track that work.

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1350
> +ALWAYS_INLINE unsigned TexelBytesForFormat(int format)

Can this take "GraphicsContext3D::DataFormat" rather than int?

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1408
> +        unpackedIntermediateSrcData = malloc(mWidth * MaxNumberOfComponents *MaxBytesPerComponent);

Use unpackedIntermediateSrcData = adoptArrayPtr(new uint8_t[mWidth * MaxNumberOfComponents * MaxBytesPerComponent]).

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1414
> +            free(unpackedIntermediateSrcData);

With OwnArrayPtr the call to free() is unnecessary.

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1430
> +    const unsigned mWidth, mHeight;

Use WebKit naming convention here and throughout: m_ prefix.

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1435
> +    void* unpackedIntermediateSrcData;

Avoid use of raw pointers -- use OwnArrayPtr<uint8_t> instead. Also, fix naming convention.

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1549
> +    const bool CanSrcFormatComeFromDOMElementOrImageData = GraphicsContext3D::srcFormatComeFromDOMElementOrImageData(SrcFormat);

Capitalization: canSrcFormat...

Grammar: srcFormatComesFrom...

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1571
> +    const bool trivialPack = (DstFormat == GraphicsContext3D::DataFormatRGBA8 || DstFormat == GraphicsContext3D::DataFormatRGBA32F) && alphaOp == GraphicsContext3D::AlphaDoNothing && mDstStride > 0;

This method looks similar to Firefox's WebGLImageConverter::run. Why does this version need to compute these two flags?

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1577
> +        if (!trivialUnpack && trivialPack) {

Have you measured the cost of having these three if/else branches inside the loop? The alternative would be to clone the loop.

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1591
> +bool webGLPackPixels(const uint8_t* sourceData, GraphicsContext3D::DataFormat sourceFormat, unsigned width, unsigned height, unsigned sourceUnpackAlignment, unsigned destinationFormat, unsigned destinationType, GraphicsContext3D::AlphaOp alphaOp, void* destinationData, bool flipY)

Avoid referring to "WebGL" inside GraphicsContext3D. This separate function doesn't appear to be needed at all; just put its body into GraphicsContext3D::packPixels.

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1594
> +    int reminder = sourceUnpackAlignment ? (validSrc % sourceUnpackAlignment) : 0;

Typo: reminder -> remainder

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1600
> +        destinationData = (uint8_t *)destinationData + dstStride*(height - 1);

Use C++-style rather than C-style casts in WebKit code: static_cast<uint8_t*>(...). Also, spaces around "*" operator.

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1610
> +        uint8_t* dst = (uint8_t*)destinationData;

Fix C-style cast.

> Source/WebCore/platform/graphics/GraphicsContext3D.h:645
> +    // The formats from DOM elements vary with Graphics ports. It can only be RGBA8 or BGRA8 for non-CG port while much more for CGi port.

Typo: CGi -> CG
Comment 7 Jun Jiang 2013-01-15 11:34:42 PST
Kenneth, thanks for your useful comments. That Mozilla did not use the trivialUnpack and trivialPack flags has some intrinsic reasons. Mozilla depends on compiler to do optimization instead of algorithm level. The unpack, convert, pack operations are done in a per-pixel basis and all these functions are inlined. During compilation, compiler will optimize off the cost for unnecessary unpack, convert or pack operations. It indeed brings performance gain. But it still has several drawbacks: first, in some cases, per-pixel assignment in a loop is slower than a single memcpy() for a row; Second, some ARCHs, like arm, could not use instruction acceleration(NEON) for the unpack and pack on a per-row basis to transfer more data in a single trip.
Comment 8 Jun Jiang 2013-01-16 03:30:54 PST
Created attachment 182953 [details]
Patch
Comment 9 Jun Jiang 2013-01-16 17:50:09 PST
Make changes according to Kenneth's suggestions and fix some issue with uploading textures to floating-point formats.
Comment 10 Kenneth Russell 2013-01-22 18:37:38 PST
Comment on attachment 182953 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=182953&action=review

Thanks for persisting with this patch. It gets much closer to unifying this code again with Mozilla's, which will make it easier to share code in the future. Two tiny nits, but neither is worth continuing to delay your patch. r=me

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1239
> +        || format == GraphicsContext3D::DataFormatA16Little

Please see the question in the previous review regarding these 16-bit per channel formats. Since they are exposed here only for the Core Graphics port, and they don't seem to be useful to represent any existing high dynamic range file format, I think we should hide their use inside CG-specific code and normalize incoming images to 8 bits per channel if necessary. Do you agree? If so please file another bug.

> Source/WebCore/platform/graphics/GraphicsContext3D.cpp:1434
> +    OwnArrayPtr<uint8_t> unpackedIntermediateSrcData;

Should have been m_unpackedIntermediateSrcData. Not a huge deal.
Comment 11 WebKit Review Bot 2013-01-22 19:04:20 PST
Comment on attachment 182953 [details]
Patch

Clearing flags on attachment: 182953

Committed r140497: <http://trac.webkit.org/changeset/140497>
Comment 12 WebKit Review Bot 2013-01-22 19:04:26 PST
All reviewed patches have been landed.  Closing bug.