Bug 35233

Summary: Optimize Latin-1 decoding in TextCodecLatin1::decode()
Product: WebKit Reporter: Andreas Kling <kling>
Component: TextAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, benjamin, commit-queue, darin, laszlo.gombos, skyul, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: All   
Attachments:
Description Flags
Patch
sam: review-
Same patch with nits addressed
darin: review-
Same patch, templates instead of ifdefs
darin: review-, commit-queue: commit-queue-
Same patch with 32-bit compile fix
darin: review+, commit-queue: commit-queue-
Patch with bugfix for test failure
darin: review-
Same patch again, style fixed none

Andreas Kling
Reported 2010-02-22 03:40:54 PST
The vast majority of Latin-1 text will be (English) ASCII-only data. We can exploit this fact to process the text one CPU-word at a time instead of byte-by-byte.
Attachments
Patch (3.39 KB, patch)
2010-02-22 03:46 PST, Andreas Kling
sam: review-
Same patch with nits addressed (3.34 KB, patch)
2010-02-22 08:30 PST, Andreas Kling
darin: review-
Same patch, templates instead of ifdefs (3.76 KB, patch)
2010-02-23 04:18 PST, Andreas Kling
darin: review-
commit-queue: commit-queue-
Same patch with 32-bit compile fix (3.74 KB, patch)
2010-02-24 01:59 PST, Andreas Kling
darin: review+
commit-queue: commit-queue-
Patch with bugfix for test failure (3.79 KB, patch)
2010-02-25 08:04 PST, Andreas Kling
darin: review-
Same patch again, style fixed (3.79 KB, patch)
2010-02-25 08:38 PST, Andreas Kling
no flags
Andreas Kling
Comment 1 2010-02-22 03:46:24 PST
Created attachment 49197 [details] Patch Cuts text decoding time in half on my 64-bit Linux machine (tested with data from slashdot.org and reddit.com)
Sam Weinig
Comment 2 2010-02-22 07:12:14 PST
Comment on attachment 49197 [details] Patch > +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64) > +#define NON_ASCII_MASK 0x8080808080808080 > +typedef uint64_t CpuWord; > +#else > +#define NON_ASCII_MASK 0x80808080 > +typedef uint32_t CpuWord; > +#endif Nice optimization. Two nits. NON_ASCII_MASK should be a constant instead of a #define. Can you use intptr_t instead of CpuWord? r- to address these.
Andreas Kling
Comment 3 2010-02-22 08:30:26 PST
Created attachment 49217 [details] Same patch with nits addressed Thanks for the speedy review!
Darin Adler
Comment 4 2010-02-22 10:00:56 PST
Comment on attachment 49217 [details] Same patch with nits addressed We capitalize acronyms in WebKit code, so the constant should be called nonASCIIMask rather than nonAsciiMask. > +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64) > +const uintptr_t nonAsciiMask = 0x8080808080808080; > +#else > +const uintptr_t nonAsciiMask = 0x80808080; > +#endif This can be done without ifdefs as is done in the WTF header file HashFunctions.h. template<size_t size> struct NonASCIIMask; template<> struct NonASCIIMask<4> { static unsigned value() { return 0x80808080U; } } template<> struct NonASCIIMask<8> { static unsigned long long value() { return 0x8080808080808080UL; } Then say: if (chunk & NonASCIIMask<sizeof(uintptr_t)>::value()) While this is a bit harder to read, it's safer. The code in the patch as is would fail in a subtle way if you had a 64-bit platform not listed in the ifdefs, only checking half of the bits. The copying can be done the same way: > + dest[0] = src[0]; > + dest[1] = src[1]; > + dest[2] = src[2]; > + dest[3] = src[3]; > +#if CPU(X86_64) || CPU(SPARC64) || CPU(IA64) || CPU(PPC64) > + dest[4] = src[4]; > + dest[5] = src[5]; > + dest[6] = src[6]; > + dest[7] = src[7]; > +#endif This copying can also be done with the template technique above, with an inline template member function that copies the correct number of bytes. Possibly better for platforms that allow unaligned access, this can be done with a single line: *reinterpret_cast<uintptr_t*>(dest) = chunk; You can see in StringHash.h that many platforms do allow unaligned access. I predict it would be a speedup to use that when possible.
Darin Adler
Comment 5 2010-02-22 10:01:07 PST
Great idea.
Darin Adler
Comment 6 2010-02-22 10:01:18 PST
I think the UTF-8 decoder could also be sped up the same way.
Andreas Kling
Comment 7 2010-02-23 04:16:50 PST
(In reply to comment #4) > Possibly better for platforms that allow unaligned access, this can be done > with a single line: > > *reinterpret_cast<uintptr_t*>(dest) = chunk; `dest` is UChar* (16-bit), so that won't work.
Andreas Kling
Comment 8 2010-02-23 04:18:55 PST
Created attachment 49275 [details] Same patch, templates instead of ifdefs
Darin Adler
Comment 9 2010-02-23 07:39:05 PST
Comment on attachment 49275 [details] Same patch, templates instead of ifdefs > + * platform/text/TextCodecLatin1.cpp: > + (WebCore::): > + (WebCore::TextCodecLatin1::decode): You should delete partial lines like that "WebCore::" one there. They're created due to a bug in the script. > +template<> struct NonASCIIMask<8> { > + static uintptr_t value() { return 0x8080808080808080UL; } > +}; I was concerned that compiling this might yield warnings on 32-bit platforms. That's why I used the type "unsigned long long" in my suggested code. I guess we're OK because the "UL" suffix makes it an unsigned long, not unsigned long long. I meant to use "ULL". Please investigate writing full words as well and applying this same optimization to the increasingly-often-used UTF-8 decoder as well.
Darin Adler
Comment 10 2010-02-23 07:39:28 PST
(In reply to comment #7) > `dest` is UChar* (16-bit), so that won't work. Oops, I see.
WebKit Commit Bot
Comment 11 2010-02-23 08:16:03 PST
Comment on attachment 49275 [details] Same patch, templates instead of ifdefs Rejecting patch 49275 from commit-queue. Failed to run "['WebKitTools/Scripts/build-webkit', '--debug']" exit_code: 1 Last 500 characters of output: bit value into a 32-bit value distcc[35647] ERROR: compile /Users/eseidel/Projects/CommitQueue/WebCore/platform/text/TextCodecLatin1.cpp on localhost failed ** BUILD FAILED ** The following build commands failed: WebCore: Distributed-CompileC /Users/eseidel/Projects/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Objects-normal/i386/TextCodecLatin1.o /Users/eseidel/Projects/CommitQueue/WebCore/platform/text/TextCodecLatin1.cpp normal i386 c++ com.apple.compilers.gcc.4_2 (1 failure) Full output: http://webkit-commit-queue.appspot.com/results/300612
Darin Adler
Comment 12 2010-02-23 10:17:14 PST
Comment on attachment 49275 [details] Same patch, templates instead of ifdefs > +template<> struct NonASCIIMask<8> { > + static uintptr_t value() { return 0x8080808080808080UL; } > +}; As I feared, this does not compile in 32-bit Mac. I believe the right way to do this is this: static unsigned long long value() { return 0x8080808080808080ULL; } Should work on all platforms.
Andreas Kling
Comment 13 2010-02-24 01:59:04 PST
Created attachment 49364 [details] Same patch with 32-bit compile fix
Andreas Kling
Comment 14 2010-02-24 02:01:23 PST
(In reply to comment #9) > Please investigate writing full words as well and applying this same > optimization to the increasingly-often-used UTF-8 decoder as well. Sounds like a good idea, but what code are you talking about? AFAICT the UTF-8 decoding is done by each port's own backend.
Darin Adler
Comment 15 2010-02-24 08:12:49 PST
(In reply to comment #14) > Sounds like a good idea, but what code are you talking about? AFAICT the UTF-8 > decoding is done by each port's own backend. We had discussed using the UTF8 conversion code from wtf/unicode/UTF8.cpp to create a TextCodecUTF8 that could be more efficient than the platform-specific encoders, but I guess we never did it!
WebKit Commit Bot
Comment 16 2010-02-24 20:03:21 PST
Comment on attachment 49364 [details] Same patch with 32-bit compile fix Rejecting patch 49364 from commit-queue. Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--exit-after-n-failures=1', '--quiet']" exit_code: 1 Running build-dumprendertree Compiling Java tests make: Nothing to be done for `default'. Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests Testing 12223 test cases. fast/table/007.html -> failed Exiting early after 1 failures. 8371 tests run. 136.90s total testing time 8370 test cases (99%) succeeded 1 test case (<1%) had incorrect layout 5 test cases (<1%) had stderr output Full output: http://webkit-commit-queue.appspot.com/results/309294
Andreas Kling
Comment 17 2010-02-25 08:04:32 PST
Created attachment 49487 [details] Patch with bugfix for test failure
WebKit Review Bot
Comment 18 2010-02-25 08:08:29 PST
Attachment 49487 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/platform/text/TextCodecLatin1.cpp:171: use_lookup_table is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Total errors found: 1 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.
Darin Adler
Comment 19 2010-02-25 08:17:20 PST
Comment on attachment 49487 [details] Patch with bugfix for test failure The style bot is correct. Instead of use_lookup_table the label should be useLookupTable.
Darin Adler
Comment 20 2010-02-25 08:17:43 PST
I’m glad the test case caught the logic mistake!
Andreas Kling
Comment 21 2010-02-25 08:38:19 PST
Created attachment 49492 [details] Same patch again, style fixed
WebKit Commit Bot
Comment 22 2010-02-25 10:13:52 PST
Comment on attachment 49492 [details] Same patch again, style fixed Clearing flags on attachment: 49492 Committed r55242: <http://trac.webkit.org/changeset/55242>
WebKit Commit Bot
Comment 23 2010-02-25 10:13:58 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.