Bug 35233

Summary:

Optimize Latin-1 decoding in TextCodecLatin1::decode()

Product:

WebKit

Reporter:

Andreas Kling <kling>

Component:

Text

Assignee:

Nobody <webkit-unassigned>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

ap, benjamin, commit-queue, darin, laszlo.gombos, skyul, webkit.review.bot

Priority:

Version:

528+ (Nightly build)

Hardware:

OS:

All

Attachments:

Description	Flags
Patch	sam: review-
Same patch with nits addressed	darin: review-
Same patch, templates instead of ifdefs	darin: review-, commit-queue: commit-queue-
Same patch with 32-bit compile fix	darin: review+, commit-queue: commit-queue-
Patch with bugfix for test failure	darin: review-
Same patch again, style fixed	none

Andreas Kling

Reported 2010-02-22 03:40:54 PST

The vast majority of Latin-1 text will be (English) ASCII-only data. We can exploit this fact to process the text one CPU-word at a time instead of byte-by-byte.

Attachments
Patch (3.39 KB, patch) 2010-02-22 03:46 PST, Andreas Kling	sam: review-	Details Formatted Diff Diff
Same patch with nits addressed (3.34 KB, patch) 2010-02-22 08:30 PST, Andreas Kling	darin: review-	Details Formatted Diff Diff
Same patch, templates instead of ifdefs (3.76 KB, patch) 2010-02-23 04:18 PST, Andreas Kling	darin: review- commit-queue: commit-queue-	Details Formatted Diff Diff
Same patch with 32-bit compile fix (3.74 KB, patch) 2010-02-24 01:59 PST, Andreas Kling	darin: review+ commit-queue: commit-queue-	Details Formatted Diff Diff
Patch with bugfix for test failure (3.79 KB, patch) 2010-02-25 08:04 PST, Andreas Kling	darin: review-	Details Formatted Diff Diff
Same patch again, style fixed (3.79 KB, patch) 2010-02-25 08:38 PST, Andreas Kling	no flags	Details Formatted Diff Diff
Show Obsolete (5) View All Add attachment proposed patch, testcase, etc.

Andreas Kling

Comment 1 2010-02-22 03:46:24 PST

Created attachment 49197 [details] Patch Cuts text decoding time in half on my 64-bit Linux machine (tested with data from slashdot.org and reddit.com)

Sam Weinig

Comment 2 2010-02-22 07:12:14 PST

Comment on attachment 49197 [details] Patch > +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64) > +#define NON_ASCII_MASK 0x8080808080808080 > +typedef uint64_t CpuWord; > +#else > +#define NON_ASCII_MASK 0x80808080 > +typedef uint32_t CpuWord; > +#endif Nice optimization. Two nits. NON_ASCII_MASK should be a constant instead of a #define. Can you use intptr_t instead of CpuWord? r- to address these.

Andreas Kling

Comment 3 2010-02-22 08:30:26 PST

Created attachment 49217 [details] Same patch with nits addressed Thanks for the speedy review!

Darin Adler

Comment 4 2010-02-22 10:00:56 PST

Comment on attachment 49217 [details] Same patch with nits addressed We capitalize acronyms in WebKit code, so the constant should be called nonASCIIMask rather than nonAsciiMask. > +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64) > +const uintptr_t nonAsciiMask = 0x8080808080808080; > +#else > +const uintptr_t nonAsciiMask = 0x80808080; > +#endif This can be done without ifdefs as is done in the WTF header file HashFunctions.h. template<size_t size> struct NonASCIIMask; template<> struct NonASCIIMask<4> { static unsigned value() { return 0x80808080U; } } template<> struct NonASCIIMask<8> { static unsigned long long value() { return 0x8080808080808080UL; } Then say: if (chunk & NonASCIIMask<sizeof(uintptr_t)>::value()) While this is a bit harder to read, it's safer. The code in the patch as is would fail in a subtle way if you had a 64-bit platform not listed in the ifdefs, only checking half of the bits. The copying can be done the same way: > + dest[0] = src[0]; > + dest[1] = src[1]; > + dest[2] = src[2]; > + dest[3] = src[3]; > +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64) > + dest[4] = src[4]; > + dest[5] = src[5]; > + dest[6] = src[6]; > + dest[7] = src[7]; > +#endif This copying can also be done with the template technique above, with an inline template member function that copies the correct number of bytes. Possibly better for platforms that allow unaligned access, this can be done with a single line: *reinterpret_cast<uintptr_t*>(dest) = chunk; You can see in StringHash.h that many platforms do allow unaligned access. I predict it would be a speedup to use that when possible.

Darin Adler

Comment 5 2010-02-22 10:01:07 PST

Great idea.

Darin Adler

Comment 6 2010-02-22 10:01:18 PST

I think the UTF-8 decoder could also be sped up the same way.

Andreas Kling

Comment 7 2010-02-23 04:16:50 PST

(In reply to comment #4) > Possibly better for platforms that allow unaligned access, this can be done > with a single line: > > *reinterpret_cast<uintptr_t*>(dest) = chunk; `dest` is UChar* (16-bit), so that won't work.

Andreas Kling

Comment 8 2010-02-23 04:18:55 PST

Created attachment 49275 [details] Same patch, templates instead of ifdefs

Darin Adler

Comment 9 2010-02-23 07:39:05 PST

Comment on attachment 49275 [details] Same patch, templates instead of ifdefs > + * platform/text/TextCodecLatin1.cpp: > + (WebCore::): > + (WebCore::TextCodecLatin1::decode): You should delete partial lines like that "WebCore::" one there. They're created due to a bug in the script. > +template<> struct NonASCIIMask<8> { > + static uintptr_t value() { return 0x8080808080808080UL; } > +}; I was concerned that compiling this might yield warnings on 32-bit platforms. That's why I used the type "unsigned long long" in my suggested code. I guess we're OK because the "UL" suffix makes it an unsigned long, not unsigned long long. I meant to use "ULL". Please investigate writing full words as well and applying this same optimization to the increasingly-often-used UTF-8 decoder as well.

Darin Adler

Comment 10 2010-02-23 07:39:28 PST

(In reply to comment #7) > `dest` is UChar* (16-bit), so that won't work. Oops, I see.

WebKit Commit Bot

Comment 11 2010-02-23 08:16:03 PST

Comment on attachment 49275 [details] Same patch, templates instead of ifdefs Rejecting patch 49275 from commit-queue. Failed to run "['WebKitTools/Scripts/build-webkit', '--debug']" exit_code: 1 Last 500 characters of output: bit value into a 32-bit value distcc[35647] ERROR: compile /Users/eseidel/Projects/CommitQueue/WebCore/platform/text/TextCodecLatin1.cpp on localhost failed ** BUILD FAILED ** The following build commands failed: WebCore: Distributed-CompileC /Users/eseidel/Projects/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Objects-normal/i386/TextCodecLatin1.o /Users/eseidel/Projects/CommitQueue/WebCore/platform/text/TextCodecLatin1.cpp normal i386 c++ com.apple.compilers.gcc.4_2 (1 failure) Full output: http://webkit-commit-queue.appspot.com/results/300612

Darin Adler

Comment 12 2010-02-23 10:17:14 PST

Comment on attachment 49275 [details] Same patch, templates instead of ifdefs > +template<> struct NonASCIIMask<8> { > + static uintptr_t value() { return 0x8080808080808080UL; } > +}; As I feared, this does not compile in 32-bit Mac. I believe the right way to do this is this: static unsigned long long value() { return 0x8080808080808080ULL; } Should work on all platforms.

Andreas Kling

Comment 13 2010-02-24 01:59:04 PST

Created attachment 49364 [details] Same patch with 32-bit compile fix

Andreas Kling

Comment 14 2010-02-24 02:01:23 PST

(In reply to comment #9) > Please investigate writing full words as well and applying this same > optimization to the increasingly-often-used UTF-8 decoder as well. Sounds like a good idea, but what code are you talking about? AFAICT the UTF-8 decoding is done by each port's own backend.

Darin Adler

Comment 15 2010-02-24 08:12:49 PST

(In reply to comment #14) > Sounds like a good idea, but what code are you talking about? AFAICT the UTF-8 > decoding is done by each port's own backend. We had discussed using the UTF8 conversion code from wtf/unicode/UTF8.cpp to create a TextCodecUTF8 that could be more efficient than the platform-specific encoders, but I guess we never did it!

WebKit Commit Bot

Comment 16 2010-02-24 20:03:21 PST

Comment on attachment 49364 [details] Same patch with 32-bit compile fix Rejecting patch 49364 from commit-queue. Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--exit-after-n-failures=1', '--quiet']" exit_code: 1 Running build-dumprendertree Compiling Java tests make: Nothing to be done for `default'. Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests Testing 12223 test cases. fast/table/007.html -> failed Exiting early after 1 failures. 8371 tests run. 136.90s total testing time 8370 test cases (99%) succeeded 1 test case (<1%) had incorrect layout 5 test cases (<1%) had stderr output Full output: http://webkit-commit-queue.appspot.com/results/309294

Andreas Kling

Comment 17 2010-02-25 08:04:32 PST

Created attachment 49487 [details] Patch with bugfix for test failure

WebKit Review Bot

Comment 18 2010-02-25 08:08:29 PST

Attachment 49487 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/platform/text/TextCodecLatin1.cpp:171: use_lookup_table is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Total errors found: 1 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.

Darin Adler

Comment 19 2010-02-25 08:17:20 PST

Comment on attachment 49487 [details] Patch with bugfix for test failure The style bot is correct. Instead of use_lookup_table the label should be useLookupTable.

Darin Adler

Comment 20 2010-02-25 08:17:43 PST

I’m glad the test case caught the logic mistake!

Andreas Kling

Comment 21 2010-02-25 08:38:19 PST

Created attachment 49492 [details] Same patch again, style fixed

WebKit Commit Bot

Comment 22 2010-02-25 10:13:52 PST

Comment on attachment 49492 [details] Same patch again, style fixed Clearing flags on attachment: 49492 Committed r55242: <http://trac.webkit.org/changeset/55242>

WebKit Commit Bot

Comment 23 2010-02-25 10:13:58 PST

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.