Bug 35233 - Optimize Latin-1 decoding in TextCodecLatin1::decode()
Summary: Optimize Latin-1 decoding in TextCodecLatin1::decode()
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-22 03:40 PST by Andreas Kling
Modified: 2010-02-25 10:13 PST (History)
7 users (show)

See Also:


Attachments
Patch (3.39 KB, patch)
2010-02-22 03:46 PST, Andreas Kling
sam: review-
Details | Formatted Diff | Diff
Same patch with nits addressed (3.34 KB, patch)
2010-02-22 08:30 PST, Andreas Kling
darin: review-
Details | Formatted Diff | Diff
Same patch, templates instead of ifdefs (3.76 KB, patch)
2010-02-23 04:18 PST, Andreas Kling
darin: review-
commit-queue: commit-queue-
Details | Formatted Diff | Diff
Same patch with 32-bit compile fix (3.74 KB, patch)
2010-02-24 01:59 PST, Andreas Kling
darin: review+
commit-queue: commit-queue-
Details | Formatted Diff | Diff
Patch with bugfix for test failure (3.79 KB, patch)
2010-02-25 08:04 PST, Andreas Kling
darin: review-
Details | Formatted Diff | Diff
Same patch again, style fixed (3.79 KB, patch)
2010-02-25 08:38 PST, Andreas Kling
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Kling 2010-02-22 03:40:54 PST
The vast majority of Latin-1 text will be (English) ASCII-only data.

We can exploit this fact to process the text one CPU-word at a time instead of byte-by-byte.
Comment 1 Andreas Kling 2010-02-22 03:46:24 PST
Created attachment 49197 [details]
Patch

Cuts text decoding time in half on my 64-bit Linux machine (tested with data from slashdot.org and reddit.com)
Comment 2 Sam Weinig 2010-02-22 07:12:14 PST
Comment on attachment 49197 [details]
Patch

> +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64)
> +#define NON_ASCII_MASK 0x8080808080808080
> +typedef uint64_t CpuWord;
> +#else
> +#define NON_ASCII_MASK 0x80808080
> +typedef uint32_t CpuWord;
> +#endif

Nice optimization. Two nits.
NON_ASCII_MASK should be a constant instead of a #define.
Can you use intptr_t instead of CpuWord?

r- to address these.
Comment 3 Andreas Kling 2010-02-22 08:30:26 PST
Created attachment 49217 [details]
Same patch with nits addressed

Thanks for the speedy review!
Comment 4 Darin Adler 2010-02-22 10:00:56 PST
Comment on attachment 49217 [details]
Same patch with nits addressed

We capitalize acronyms in WebKit code, so the constant should be called nonASCIIMask rather than nonAsciiMask.

> +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64)
> +const uintptr_t nonAsciiMask = 0x8080808080808080;
> +#else
> +const uintptr_t nonAsciiMask = 0x80808080;
> +#endif

This can be done without ifdefs as is done in the WTF header file HashFunctions.h.

    template<size_t size> struct NonASCIIMask;
    template<> struct NonASCIIMask<4> { static unsigned value() { return 0x80808080U; } }
    template<> struct NonASCIIMask<8> { static unsigned long long value() { return 0x8080808080808080UL; }

Then say:

    if (chunk & NonASCIIMask<sizeof(uintptr_t)>::value())

While this is a bit harder to read, it's safer. The code in the patch as is would fail in a subtle way if you had a 64-bit platform not listed in the ifdefs, only checking half of the bits.

The copying can be done the same way:

> +                    dest[0] = src[0];
> +                    dest[1] = src[1];
> +                    dest[2] = src[2];
> +                    dest[3] = src[3];
> +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64)
> +                    dest[4] = src[4];
> +                    dest[5] = src[5];
> +                    dest[6] = src[6];
> +                    dest[7] = src[7];
> +#endif

This copying can also be done with the template technique above, with an inline template member function that copies the correct number of bytes.

Possibly better for platforms that allow unaligned access, this can be done with a single line:

    *reinterpret_cast<uintptr_t*>(dest) = chunk;

You can see in StringHash.h that many platforms do allow unaligned access. I predict it would be a speedup to use that when possible.
Comment 5 Darin Adler 2010-02-22 10:01:07 PST
Great idea.
Comment 6 Darin Adler 2010-02-22 10:01:18 PST
I think the UTF-8 decoder could also be sped up the same way.
Comment 7 Andreas Kling 2010-02-23 04:16:50 PST
(In reply to comment #4)
> Possibly better for platforms that allow unaligned access, this can be done
> with a single line:
> 
>     *reinterpret_cast<uintptr_t*>(dest) = chunk;

`dest` is UChar* (16-bit), so that won't work.
Comment 8 Andreas Kling 2010-02-23 04:18:55 PST
Created attachment 49275 [details]
Same patch, templates instead of ifdefs
Comment 9 Darin Adler 2010-02-23 07:39:05 PST
Comment on attachment 49275 [details]
Same patch, templates instead of ifdefs

> +        * platform/text/TextCodecLatin1.cpp:
> +        (WebCore::):
> +        (WebCore::TextCodecLatin1::decode):

You should delete partial lines like that "WebCore::" one there. They're created due to a bug in the script.

> +template<> struct NonASCIIMask<8> {
> +    static uintptr_t value() { return 0x8080808080808080UL; }
> +};

I was concerned that compiling this might yield warnings on 32-bit platforms. That's why I used the type "unsigned long long" in my suggested code. I guess we're OK because the "UL" suffix makes it an unsigned long, not unsigned long long. I meant to use "ULL".

Please investigate writing full words as well and applying this same optimization to the increasingly-often-used UTF-8 decoder as well.
Comment 10 Darin Adler 2010-02-23 07:39:28 PST
(In reply to comment #7)
> `dest` is UChar* (16-bit), so that won't work.

Oops, I see.
Comment 11 WebKit Commit Bot 2010-02-23 08:16:03 PST
Comment on attachment 49275 [details]
Same patch, templates instead of ifdefs

Rejecting patch 49275 from commit-queue.

Failed to run "['WebKitTools/Scripts/build-webkit', '--debug']" exit_code: 1
Last 500 characters of output:
bit value into a 32-bit value
distcc[35647] ERROR: compile /Users/eseidel/Projects/CommitQueue/WebCore/platform/text/TextCodecLatin1.cpp on localhost failed
** BUILD FAILED **

The following build commands failed:
WebCore:
	Distributed-CompileC /Users/eseidel/Projects/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Objects-normal/i386/TextCodecLatin1.o /Users/eseidel/Projects/CommitQueue/WebCore/platform/text/TextCodecLatin1.cpp normal i386 c++ com.apple.compilers.gcc.4_2
(1 failure)


Full output: http://webkit-commit-queue.appspot.com/results/300612
Comment 12 Darin Adler 2010-02-23 10:17:14 PST
Comment on attachment 49275 [details]
Same patch, templates instead of ifdefs

> +template<> struct NonASCIIMask<8> {
> +    static uintptr_t value() { return 0x8080808080808080UL; }
> +};

As I feared, this does not compile in 32-bit Mac.

I believe the right way to do this is this:

    static unsigned long long value() { return 0x8080808080808080ULL; }

Should work on all platforms.
Comment 13 Andreas Kling 2010-02-24 01:59:04 PST
Created attachment 49364 [details]
Same patch with 32-bit compile fix
Comment 14 Andreas Kling 2010-02-24 02:01:23 PST
(In reply to comment #9)
> Please investigate writing full words as well and applying this same
> optimization to the increasingly-often-used UTF-8 decoder as well.

Sounds like a good idea, but what code are you talking about? AFAICT the UTF-8 decoding is done by each port's own backend.
Comment 15 Darin Adler 2010-02-24 08:12:49 PST
(In reply to comment #14)
> Sounds like a good idea, but what code are you talking about? AFAICT the UTF-8
> decoding is done by each port's own backend.

We had discussed using the UTF8 conversion code from wtf/unicode/UTF8.cpp to create a TextCodecUTF8 that could be more efficient than the platform-specific encoders, but I guess we never did it!
Comment 16 WebKit Commit Bot 2010-02-24 20:03:21 PST
Comment on attachment 49364 [details]
Same patch with 32-bit compile fix

Rejecting patch 49364 from commit-queue.

Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--exit-after-n-failures=1', '--quiet']" exit_code: 1
Running build-dumprendertree
Compiling Java tests
make: Nothing to be done for `default'.
Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests
Testing 12223 test cases.
fast/table/007.html -> failed

Exiting early after 1 failures. 8371 tests run.
136.90s total testing time

8370 test cases (99%) succeeded
1 test case (<1%) had incorrect layout
5 test cases (<1%) had stderr output

Full output: http://webkit-commit-queue.appspot.com/results/309294
Comment 17 Andreas Kling 2010-02-25 08:04:32 PST
Created attachment 49487 [details]
Patch with bugfix for test failure
Comment 18 WebKit Review Bot 2010-02-25 08:08:29 PST
Attachment 49487 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/platform/text/TextCodecLatin1.cpp:171:  use_lookup_table is incorrectly named. Don't use underscores in your identifier names.  [readability/naming] [4]
Total errors found: 1 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 19 Darin Adler 2010-02-25 08:17:20 PST
Comment on attachment 49487 [details]
Patch with bugfix for test failure

The style bot is correct.

Instead of use_lookup_table the label should be useLookupTable.
Comment 20 Darin Adler 2010-02-25 08:17:43 PST
Iā€™m glad the test case caught the logic mistake!
Comment 21 Andreas Kling 2010-02-25 08:38:19 PST
Created attachment 49492 [details]
Same patch again, style fixed
Comment 22 WebKit Commit Bot 2010-02-25 10:13:52 PST
Comment on attachment 49492 [details]
Same patch again, style fixed

Clearing flags on attachment: 49492

Committed r55242: <http://trac.webkit.org/changeset/55242>
Comment 23 WebKit Commit Bot 2010-02-25 10:13:58 PST
All reviewed patches have been landed.  Closing bug.