Bug 35233

Summary: Optimize Latin-1 decoding in TextCodecLatin1::decode()
Product: WebKit Reporter: Andreas Kling <akling@apple.com>
Component: TextAssignee: Nobody <webkit-unassigned@lists.webkit.org>
Status: RESOLVED FIXED    
Severity: Normal CC: ap@webkit.org, benjamin@webkit.org, commit-queue@webkit.org, darin@apple.com, laszlo.gombos@webkit.org, skyul@company100.com, webkit.review.bot@gmail.com
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: All   
Attachments:
Description Flags
Patch
sam: review-
Same patch with nits addressed
darin: review-
Same patch, templates instead of ifdefs
darin: review-, commit-queue: commit‑queue-
Same patch with 32-bit compile fix
darin: review+, commit-queue: commit‑queue-
Patch with bugfix for test failure
darin: review-
Same patch again, style fixed none

Description From 2010-02-22 03:40:54 PST
The vast majority of Latin-1 text will be (English) ASCII-only data.

We can exploit this fact to process the text one CPU-word at a time instead of byte-by-byte.
------- Comment #1 From 2010-02-22 03:46:24 PST -------
Created an attachment (id=49197) [details]
Patch

Cuts text decoding time in half on my 64-bit Linux machine (tested with data from slashdot.org and reddit.com)
------- Comment #2 From 2010-02-22 07:12:14 PST -------
(From update of attachment 49197 [details])
> +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64)
> +#define NON_ASCII_MASK 0x8080808080808080
> +typedef uint64_t CpuWord;
> +#else
> +#define NON_ASCII_MASK 0x80808080
> +typedef uint32_t CpuWord;
> +#endif

Nice optimization. Two nits.
NON_ASCII_MASK should be a constant instead of a #define.
Can you use intptr_t instead of CpuWord?

r- to address these.
------- Comment #3 From 2010-02-22 08:30:26 PST -------
Created an attachment (id=49217) [details]
Same patch with nits addressed

Thanks for the speedy review!
------- Comment #4 From 2010-02-22 10:00:56 PST -------
(From update of attachment 49217 [details])
We capitalize acronyms in WebKit code, so the constant should be called nonASCIIMask rather than nonAsciiMask.

> +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64)
> +const uintptr_t nonAsciiMask = 0x8080808080808080;
> +#else
> +const uintptr_t nonAsciiMask = 0x80808080;
> +#endif

This can be done without ifdefs as is done in the WTF header file HashFunctions.h.

    template<size_t size> struct NonASCIIMask;
    template<> struct NonASCIIMask<4> { static unsigned value() { return 0x80808080U; } }
    template<> struct NonASCIIMask<8> { static unsigned long long value() { return 0x8080808080808080UL; }

Then say:

    if (chunk & NonASCIIMask<sizeof(uintptr_t)>::value())

While this is a bit harder to read, it's safer. The code in the patch as is would fail in a subtle way if you had a 64-bit platform not listed in the ifdefs, only checking half of the bits.

The copying can be done the same way:

> +                    dest[0] = src[0];
> +                    dest[1] = src[1];
> +                    dest[2] = src[2];
> +                    dest[3] = src[3];
> +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64)
> +                    dest[4] = src[4];
> +                    dest[5] = src[5];
> +                    dest[6] = src[6];
> +                    dest[7] = src[7];
> +#endif

This copying can also be done with the template technique above, with an inline template member function that copies the correct number of bytes.

Possibly better for platforms that allow unaligned access, this can be done with a single line:

    *reinterpret_cast<uintptr_t*>(dest) = chunk;

You can see in StringHash.h that many platforms do allow unaligned access. I predict it would be a speedup to use that when possible.
------- Comment #5 From 2010-02-22 10:01:07 PST -------
Great idea.
------- Comment #6 From 2010-02-22 10:01:18 PST -------
I think the UTF-8 decoder could also be sped up the same way.
------- Comment #7 From 2010-02-23 04:16:50 PST -------
(In reply to comment #4)
> Possibly better for platforms that allow unaligned access, this can be done
> with a single line:
> 
>     *reinterpret_cast<uintptr_t*>(dest) = chunk;

`dest` is UChar* (16-bit), so that won't work.
------- Comment #8 From 2010-02-23 04:18:55 PST -------
Created an attachment (id=49275) [details]
Same patch, templates instead of ifdefs
------- Comment #9 From 2010-02-23 07:39:05 PST -------
(From update of attachment 49275 [details])
> +        * platform/text/TextCodecLatin1.cpp:
> +        (WebCore::):
> +        (WebCore::TextCodecLatin1::decode):

You should delete partial lines like that "WebCore::" one there. They're created due to a bug in the script.

> +template<> struct NonASCIIMask<8> {
> +    static uintptr_t value() { return 0x8080808080808080UL; }
> +};

I was concerned that compiling this might yield warnings on 32-bit platforms. That's why I used the type "unsigned long long" in my suggested code. I guess we're OK because the "UL" suffix makes it an unsigned long, not unsigned long long. I meant to use "ULL".

Please investigate writing full words as well and applying this same optimization to the increasingly-often-used UTF-8 decoder as well.
------- Comment #10 From 2010-02-23 07:39:28 PST -------
(In reply to comment #7)
> `dest` is UChar* (16-bit), so that won't work.

Oops, I see.
------- Comment #11 From 2010-02-23 08:16:03 PST -------
(From update of attachment 49275 [details])
Rejecting patch 49275 from commit-queue.

Failed to run "['WebKitTools/Scripts/build-webkit', '--debug']" exit_code: 1
Last 500 characters of output:
bit value into a 32-bit value
distcc[35647] ERROR: compile /Users/eseidel/Projects/CommitQueue/WebCore/platform/text/TextCodecLatin1.cpp on localhost failed
** BUILD FAILED **

The following build commands failed:
WebCore:
    Distributed-CompileC /Users/eseidel/Projects/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Objects-normal/i386/TextCodecLatin1.o /Users/eseidel/Projects/CommitQueue/WebCore/platform/text/TextCodecLatin1.cpp normal i386 c++ com.apple.compilers.gcc.4_2
(1 failure)


Full output: http://webkit-commit-queue.appspot.com/results/300612
------- Comment #12 From 2010-02-23 10:17:14 PST -------
(From update of attachment 49275 [details])
> +template<> struct NonASCIIMask<8> {
> +    static uintptr_t value() { return 0x8080808080808080UL; }
> +};

As I feared, this does not compile in 32-bit Mac.

I believe the right way to do this is this:

    static unsigned long long value() { return 0x8080808080808080ULL; }

Should work on all platforms.
------- Comment #13 From 2010-02-24 01:59:04 PST -------
Created an attachment (id=49364) [details]
Same patch with 32-bit compile fix
------- Comment #14 From 2010-02-24 02:01:23 PST -------
(In reply to comment #9)
> Please investigate writing full words as well and applying this same
> optimization to the increasingly-often-used UTF-8 decoder as well.

Sounds like a good idea, but what code are you talking about? AFAICT the UTF-8 decoding is done by each port's own backend.
------- Comment #15 From 2010-02-24 08:12:49 PST -------
(In reply to comment #14)
> Sounds like a good idea, but what code are you talking about? AFAICT the UTF-8
> decoding is done by each port's own backend.

We had discussed using the UTF8 conversion code from wtf/unicode/UTF8.cpp to create a TextCodecUTF8 that could be more efficient than the platform-specific encoders, but I guess we never did it!
------- Comment #16 From 2010-02-24 20:03:21 PST -------
(From update of attachment 49364 [details])
Rejecting patch 49364 from commit-queue.

Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--exit-after-n-failures=1', '--quiet']" exit_code: 1
Running build-dumprendertree
Compiling Java tests
make: Nothing to be done for `default'.
Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests
Testing 12223 test cases.
fast/table/007.html -> failed

Exiting early after 1 failures. 8371 tests run.
136.90s total testing time

8370 test cases (99%) succeeded
1 test case (<1%) had incorrect layout
5 test cases (<1%) had stderr output

Full output: http://webkit-commit-queue.appspot.com/results/309294
------- Comment #17 From 2010-02-25 08:04:32 PST -------
Created an attachment (id=49487) [details]
Patch with bugfix for test failure
------- Comment #18 From 2010-02-25 08:08:29 PST -------
Attachment 49487 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/platform/text/TextCodecLatin1.cpp:171:  use_lookup_table is incorrectly named. Don't use underscores in your identifier names.  [readability/naming] [4]
Total errors found: 1 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
------- Comment #19 From 2010-02-25 08:17:20 PST -------
(From update of attachment 49487 [details])
The style bot is correct.

Instead of use_lookup_table the label should be useLookupTable.
------- Comment #20 From 2010-02-25 08:17:43 PST -------
I’m glad the test case caught the logic mistake!
------- Comment #21 From 2010-02-25 08:38:19 PST -------
Created an attachment (id=49492) [details]
Same patch again, style fixed
------- Comment #22 From 2010-02-25 10:13:52 PST -------
(From update of attachment 49492 [details])
Clearing flags on attachment: 49492

Committed r55242: <http://trac.webkit.org/changeset/55242>
------- Comment #23 From 2010-02-25 10:13:58 PST -------
All reviewed patches have been landed.  Closing bug.