Bug 93742

Summary: HTML Parser should produce 8bit substrings for inline style and script elements
Product: WebKit Reporter: Michael Saboff <msaboff>
Component: DOMAssignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, aestes, benjamin, dglazkov, eric, gustavo, jturcotte, menard, noam, ossy, philn, tkent, tobias.netzel, vestbo, webkit.review.bot, xan.lopez
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 94295    
Bug Blocks:    
Attachments:
Description Flags
Patch for review
gyuyoung.kim: commit-queue-
Updated patch without default template parameter
abarth: review-
Performance Comparison of Patch including char loop vs Asm loop
none
Updated Patch using OS(DARWIN) and intrinsics
benjamin: review-, buildbot: commit-queue-
Trivial bench
none
Patch updated with suggested changes including using PACKUSWB
benjamin: review+, webkit.review.bot: commit-queue-
Patch with Reviewer's Changes and Speculative Chromium Linux Build Fix none

Michael Saboff
Reported Friday, August 10, 2012 10:12:54 PM UTC
The HTML parser produces 16 bit strings for all elements. The JavaScript parser can accept 8-bit strings and work on the CSS parser is under way. This change should not create 8-bit strings for actual text as it would slow down the render code which expects 16-bit strings.
Attachments
Patch for review (13.87 KB, patch)
2012-08-10 16:03 PDT, Michael Saboff
gyuyoung.kim: commit-queue-
Updated patch without default template parameter (14.30 KB, patch)
2012-08-10 17:12 PDT, Michael Saboff
abarth: review-
Performance Comparison of Patch including char loop vs Asm loop (22.53 KB, application/pdf)
2012-08-13 13:58 PDT, Michael Saboff
no flags
Updated Patch using OS(DARWIN) and intrinsics (14.56 KB, patch)
2012-08-15 11:13 PDT, Michael Saboff
benjamin: review-
buildbot: commit-queue-
Trivial bench (4.87 KB, text/plain)
2012-08-15 23:49 PDT, Benjamin Poulain
no flags
Patch updated with suggested changes including using PACKUSWB (14.29 KB, patch)
2012-08-16 11:50 PDT, Michael Saboff
benjamin: review+
webkit.review.bot: commit-queue-
Patch with Reviewer's Changes and Speculative Chromium Linux Build Fix (14.53 KB, patch)
2012-08-16 14:24 PDT, Michael Saboff
no flags
Michael Saboff
Comment 1 Friday, August 10, 2012 10:13:24 PM UTC
Michael Saboff
Comment 2 Saturday, August 11, 2012 12:03:20 AM UTC
Created attachment 157827 [details] Patch for review
Gyuyoung Kim
Comment 3 Saturday, August 11, 2012 12:39:00 AM UTC
Comment on attachment 157827 [details] Patch for review Attachment 157827 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13477347
Build Bot
Comment 4 Saturday, August 11, 2012 1:01:13 AM UTC
Comment on attachment 157827 [details] Patch for review Attachment 157827 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13475383
Michael Saboff
Comment 5 Saturday, August 11, 2012 1:12:50 AM UTC
Created attachment 157840 [details] Updated patch without default template parameter
Eric Seidel (no email)
Comment 6 Saturday, August 11, 2012 10:38:11 AM UTC
Do we have any perf numbers for this work?
Adam Barth
Comment 7 Saturday, August 11, 2012 10:48:26 PM UTC
Comment on attachment 157840 [details] Updated patch without default template parameter View in context: https://bugs.webkit.org/attachment.cgi?id=157840&action=review > Source/WTF/wtf/text/ASCIIFastPath.h:106 > +#if CPU(X86_64) && COMPILER(GCC) && PLATFORM(MAC) Do you mean OS(DARWIN)?
Michael Saboff
Comment 8 Sunday, August 12, 2012 7:17:03 PM UTC
(In reply to comment #6) > Do we have any perf numbers for this work? Here they are as measured across three runs. Basically the CSS tests are slightly slower (we make 8 bit strings and then up convert to 16 bit before parsing). Most of the HTML tests improve, except tiny-inner. The work is a prerequisite for the CSS parsing work I'm doing and I expect the combination of this change and the 8 bit CSS parser work will be a progression. Baseline With 8 bit r124894 inline elems css-parser-yui 367.40 353.73 runs/s css-parser-yui 365.62 364.73 runs/s css-parser-yui 366.07 366.65 runs/s Mean 366.36 361.70 -1.27% html-parser 1862.65 1830.80 ms html-parser 1847.40 1816.15 ms html-parser 1848.10 1828.20 ms Mean 1852.72 1825.05 1.49% html5-full-render 21604.50 21405.50 ms html5-full-render 21702.50 21459.50 ms html5-full-render 21532.50 21286.50 ms Mean 21613.17 21383.83 1.06% innerHTML-setter 300.55 296.74 runs/s innerHTML-setter 289.15 303.01 runs/s innerHTML-setter 301.40 297.82 runs/s Mean 297.03 299.19 0.73% query-selector-deep 389.34 385.73 runs/s query-selector-deep 397.66 395.53 runs/s query-selector-deep 397.74 396.02 runs/s Mean 394.92 392.42 -0.63% query-selector-first 2084.10 2015.10 runs/s query-selector-first 2092.48 2036.31 runs/s query-selector-first 2045.50 2006.66 runs/s Mean 2074.03 2019.35 -2.64% query-selector-last 385.78 385.75 runs/s query-selector-last 386.03 379.85 runs/s query-selector-last 385.45 385.52 runs/s Mean 385.75 383.71 -0.53% simple-url 49.85 48.64 runs/s simple-url 47.66 48.65 runs/s simple-url 47.40 48.42 runs/s Mean 48.30 48.57 0.55% textarea-parsing 56.77 55.45 runs/s textarea-parsing 56.77 56.05 runs/s textarea-parsing 55.68 56.06 runs/s Mean 56.40 55.85 -0.98% tiny-innerHTML 5.53 5.29 runs/s tiny-innerHTML 5.53 5.17 runs/s tiny-innerHTML 5.40 4.99 runs/s Mean 5.48 5.15 -6.11% url-parser 111.89 112.25 runs/s url-parser 112.15 113.04 runs/s url-parser 110.36 111.82 runs/s Mean 111.47 112.37 0.81% xml-parser 7.55 7.59 runs/s xml-parser 7.62 7.60 runs/s xml-parser 7.48 7.46 runs/s Mean 7.55 7.55 0.04%
Michael Saboff
Comment 9 Sunday, August 12, 2012 7:21:39 PM UTC
(In reply to comment #7) > (From update of attachment 157840 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=157840&action=review > > > Source/WTF/wtf/text/ASCIIFastPath.h:106 > > +#if CPU(X86_64) && COMPILER(GCC) && PLATFORM(MAC) > > Do you mean OS(DARWIN)? Actually, if someone on the chrome side could test this on Linux, the last term could be changed to (PLATFORM(MAC) || OS(LINUX)).
Eric Seidel (no email)
Comment 10 Sunday, August 12, 2012 7:32:44 PM UTC
Do we understand what the perf regressions come from? Presumably from converting 8 bit strings back to 16 bit. Just curious if you've looked at the sample for the tests which got slower (in case avoiding such is an easy fix).
Eric Seidel (no email)
Comment 11 Sunday, August 12, 2012 7:35:30 PM UTC
(In reply to comment #9) > (In reply to comment #7) > > (From update of attachment 157840 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=157840&action=review > > > > > Source/WTF/wtf/text/ASCIIFastPath.h:106 > > > +#if CPU(X86_64) && COMPILER(GCC) && PLATFORM(MAC) > > > > Do you mean OS(DARWIN)? > > Actually, if someone on the chrome side could test this on Linux, the last term could be changed to (PLATFORM(MAC) || OS(LINUX)). My understanding is that PLATFORM(MAC) means only the Apple Mac port. If you wan to include Chromium Mac, and Qt on Mac, Gtk on Mac, etc. you'll need to use OS(DARWIN). But perhaps you intend only the Apple mac port?
Benjamin Poulain
Comment 12 Sunday, August 12, 2012 11:45:21 PM UTC
I can look at this tomorrow if you want. > Actually, if someone on the chrome side could test this on Linux, the last term could be changed to (PLATFORM(MAC) || OS(LINUX)). I think it shouldn't be changed. SSSE3 is not a mandatory extension of x86_64 (only SSE and SSE2 are). I think this patch is safe for Mac because the supported hardware start with the Intel Core Architecture. On Linux, WebKit runs on plenty of CPUs without SSSE3. To support this in a cross platform way, one should use the CPU extensions detection at runtime.
Adam Barth
Comment 13 Monday, August 13, 2012 6:18:05 PM UTC
> I think it shouldn't be changed. It sounds like it should be OS(DARWIN) rather than PLAFORM(MAC). PLATFORM(MAC) is just the apple-mac port, and excludes Qt, Chromium, and other ports that run on Mac OS X. Can you also post benchmark numbers for how the non-ASM code path performs? If the ASM is only a small win, it might not be worth the maintenance burden. If the non-ASM path is a regression for Windows and Linux ports of WebKit, that's also a concern.
Adam Barth
Comment 14 Monday, August 13, 2012 6:18:29 PM UTC
Comment on attachment 157840 [details] Updated patch without default template parameter r- for incorrect ifdef.
Michael Saboff
Comment 15 Monday, August 13, 2012 7:58:29 PM UTC
(In reply to comment #10) > Do we understand what the perf regressions come from? Presumably from converting 8 bit strings back to 16 bit. Just curious if you've looked at the sample for the tests which got slower (in case avoiding such is an easy fix). From using instruments, 1.6% of css-parser-yui is spent up converting the string back to 16 bits. The rest of the tests with regressions don't not show such an obvious source of the slowdown. The most curious slowdown is tiny-innerHTML.
Benjamin Poulain
Comment 16 Monday, August 13, 2012 8:33:57 PM UTC
(In reply to comment #13) > It sounds like it should be OS(DARWIN) rather than PLAFORM(MAC). PLATFORM(MAC) is just the apple-mac port, and excludes Qt, Chromium, and other ports that run on Mac OS X. I completely disagree. Read my previous comment. You cannot blindly assume SSSE3 is available on x86_64.
Adam Barth
Comment 17 Monday, August 13, 2012 8:42:14 PM UTC
I think there's some miscommunication. Let's be concrete. 1) Can you describe a system that runs chromium-mac or qt-mac port that doesn't have SSSE3? 2) Can you describe a system that uses WebKit with OS(DARWIN) that doesn't have SSSE3?
Benjamin Poulain
Comment 18 Monday, August 13, 2012 8:58:54 PM UTC
> 1) Can you describe a system that runs chromium-mac or qt-mac port that doesn't have SSSE3? All the AMD (and most of the Atom) hackintosh. I think it is ok for OS(MAC) to not support that, but pushing this on all ports seems unfair.
Adam Barth
Comment 19 Monday, August 13, 2012 9:25:04 PM UTC
(In reply to comment #18) > > 1) Can you describe a system that runs chromium-mac or qt-mac port that doesn't have SSSE3? > > All the AMD (and most of the Atom) hackintosh. I don't think chromium-mac has any desire to run on non-Apple Mac OS X hardware. We can ask the qt-mac folks for there opinion on that topic. > I think it is ok for OS(MAC) to not support that, but pushing this on all ports seems unfair. I wasn't aware that there was an OS(MAC). Do you mean PLATFORM(MAC)?
Michael Saboff
Comment 20 Monday, August 13, 2012 9:58:49 PM UTC
Created attachment 158097 [details] Performance Comparison of Patch including char loop vs Asm loop The attached data shows that the character at a time loop version is a progression for the HTML tests except tiny-innrHTML. The inline asm version improves on most tests. For example html-parser's performance goes from +.75% to +1.5% with the inline asm.
Adam Barth
Comment 21 Monday, August 13, 2012 10:01:49 PM UTC
Cool. Thanks.
Michael Saboff
Comment 22 Tuesday, August 14, 2012 10:42:33 PM UTC
(In reply to comment #19) > (In reply to comment #18) > > > 1) Can you describe a system that runs chromium-mac or qt-mac port that doesn't have SSSE3? > > > > All the AMD (and most of the Atom) hackintosh. > > I don't think chromium-mac has any desire to run on non-Apple Mac OS X hardware. We can ask the qt-mac folks for there opinion on that topic. > > > I think it is ok for OS(MAC) to not support that, but pushing this on all ports seems unfair. > > I wasn't aware that there was an OS(MAC). Do you mean PLATFORM(MAC)? I think the choice is between OS(DARWIN) and PLATFORM(MAC) (with the CPU(X86_64) && COMPILER(GCC) qualifiers). Who do we need to contact in qt land to rule out OS(DARWIN)? What other holdups are there for this patch?
Benjamin Poulain
Comment 23 Tuesday, August 14, 2012 10:47:22 PM UTC
> I think the choice is between OS(DARWIN) and PLATFORM(MAC) (with the CPU(X86_64) && COMPILER(GCC) qualifiers). Who do we need to contact in qt land to rule out OS(DARWIN)? > > What other holdups are there for this patch? Actually... I would much prefer intrinsics over inline assembly. I would also remove CPU(X86_64) && COMPILER(GCC) if you use intrinsics.
Eric Seidel (no email)
Comment 24 Tuesday, August 14, 2012 10:49:22 PM UTC
I would not hold up this patch based on theoretical concerns for QtMac. :) Hackintosh is not a supported platform of the WebKit project. We can safely assume that OS(DARWIN) means "Apple hardware" IMO.
Csaba Osztrogonác
Comment 25 Wednesday, August 15, 2012 9:38:39 AM UTC
I cc-ed Qt Mac guys. The question is simple: Do we have any x86_64 hardware without SSE3 support? ( I don't think so, but everything is possible. :) )
Michael Saboff
Comment 26 Wednesday, August 15, 2012 7:13:23 PM UTC
Created attachment 158598 [details] Updated Patch using OS(DARWIN) and intrinsics The performance of this version appears to be identical to the prior inline asm version. Also made a minor alignment fix, relaxing the alignment that can use the SSE loop to 16 byte, instead of 16 word.
Build Bot
Comment 27 Wednesday, August 15, 2012 7:49:04 PM UTC
Comment on attachment 158598 [details] Updated Patch using OS(DARWIN) and intrinsics Attachment 158598 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13501729
Benjamin Poulain
Comment 28 Wednesday, August 15, 2012 9:55:05 PM UTC
Comment on attachment 158598 [details] Updated Patch using OS(DARWIN) and intrinsics View in context: https://bugs.webkit.org/attachment.cgi?id=158598&action=review Quick comments, I'll review later. > Source/WTF/wtf/text/ASCIIFastPath.h:25 > +#if OS(DARWIN) && CPU(X86_64) || CPU(X86) Or defined(__SSSE3__) > Source/WTF/wtf/text/ASCIIFastPath.h:106 > +template <uintptr_t mask> > +inline bool isAlignedTo(const void* pointer) > +{ > + return !(reinterpret_cast<uintptr_t>(pointer) & mask); > +} Maybe move this to wtf/Alignment.h? > Source/WTF/wtf/text/ASCIIFastPath.h:110 > +#if OS(DARWIN) Ditto for the #define. > Source/WTF/wtf/text/ASCIIFastPath.h:121 > + const UChar* alignedEnd = reinterpret_cast<const UChar*>((reinterpret_cast<const uintptr_t>(source + length))&~sourceLoadMask); Any way to get rid of the reinterpret_cast? Coding style: spaces around the "&" operator. > Source/WTF/wtf/text/ASCIIFastPath.h:126 > + for (unsigned i = 0; i < length;) { > + if (isAlignedTo<memoryAccessMask>(&source[i])) { This is a little unusually construction. Usually, we have a prefix aligning the input, the simd code, the a suffix finishing the input. Nothing wrong with this code, just for info. > Source/WTF/wtf/text/ASCIIFastPath.h:130 > + ASSERT(!(source[i+checkIndex]&0xff00)); Coding style: space for the "&" operator. > Source/WTF/wtf/text/ASCIIFastPath.h:146 > + ASSERT(!(source[i]&0xff00)); !(source[i] & 0xff00) > Source/WTF/wtf/text/ASCIIFastPath.h:152 > + ASSERT(!(*source&0xff00)); Space.
Benjamin Poulain
Comment 29 Wednesday, August 15, 2012 11:34:35 PM UTC
Comment on attachment 158598 [details] Updated Patch using OS(DARWIN) and intrinsics View in context: https://bugs.webkit.org/attachment.cgi?id=158598&action=review > Source/WTF/wtf/text/ASCIIFastPath.h:114 > + // See Intel documentation for how the shuffle control mask works. I don't think that comment adds much :) > Source/WTF/wtf/text/ASCIIFastPath.h:120 > + const __m128i xmmShuf1 = {0x0e0c0a0806040200ll, 0x8080808080808080ll}; > + const __m128i xmmShuf2 = {0x8080808080808080ll, 0x0e0c0a0806040200ll}; I think mm_setr_pi16 would be clearer for the masks. > Source/WTF/wtf/text/ASCIIFastPath.h:137 > + __m128i first8LChars = _mm_shuffle_epi8(first8UChars, xmmShuf1); > + __m128i second8LChars = _mm_shuffle_epi8(second8UChars, xmmShuf2); > + __m128i all16LChars = _mm_or_si128(first8LChars, second8LChars); Could mm_packus_epi16 do the trick? I clear the flag while you try regular packing.
Benjamin Poulain
Comment 30 Wednesday, August 15, 2012 11:58:54 PM UTC
Comment on attachment 158598 [details] Updated Patch using OS(DARWIN) and intrinsics View in context: https://bugs.webkit.org/attachment.cgi?id=158598&action=review > Source/WTF/wtf/text/WTFString.cpp:775 > + if (!source || !length) !length should be enough. > Source/WTF/wtf/text/WTFString.cpp:779 > + if (length > numeric_limits<unsigned>::max()) > + CRASH(); StringImpl does that for you, you should not have to do it.
Benjamin Poulain
Comment 31 Thursday, August 16, 2012 7:45:50 AM UTC
Comment on attachment 158598 [details] Updated Patch using OS(DARWIN) and intrinsics View in context: https://bugs.webkit.org/attachment.cgi?id=158598&action=review >> Source/WTF/wtf/text/ASCIIFastPath.h:121 >> + const UChar* alignedEnd = reinterpret_cast<const UChar*>((reinterpret_cast<const uintptr_t>(source + length))&~sourceLoadMask); > > Any way to get rid of the reinterpret_cast? > Coding style: spaces around the "&" operator. If length so small that you do not pass the next 16bytes alignment, alignedEnd is invalid (< source).
Benjamin Poulain
Comment 32 Thursday, August 16, 2012 7:49:34 AM UTC
Created attachment 158723 [details] Trivial bench Here is a trivial benchmark of the 3 implementations. On my Mac Pro, the PACKUSWB gives me better results than PSHUFB. Both are a lot faster than the trivial implementation.
Michael Saboff
Comment 33 Thursday, August 16, 2012 7:50:01 PM UTC
Created attachment 158867 [details] Patch updated with suggested changes including using PACKUSWB
Benjamin Poulain
Comment 34 Thursday, August 16, 2012 8:40:20 PM UTC
Comment on attachment 158867 [details] Patch updated with suggested changes including using PACKUSWB View in context: https://bugs.webkit.org/attachment.cgi?id=158867&action=review I have a few comments but everything seems good to me. I wonder if this patch will also help with the cascading StringImp::getData16SlowCase() we get later in the execution. > Source/WTF/wtf/Alignment.h:60 > - > + Whilespaces. > Source/WTF/wtf/text/ASCIIFastPath.h:25 > +#if OS(DARWIN) Because of iOS, I think you also need "&& (CPU(X86) || CPU(X86_64))". > Source/WTF/wtf/text/ASCIIFastPath.h:104 > +#if OS(DARWIN) Ditto for iOS. > Source/WTF/wtf/text/ASCIIFastPath.h:108 > + unsigned i = 0; This should probably be size_t since length is also size_t. > Source/WTF/wtf/text/ASCIIFastPath.h:131 > + for (; i < length; ++i) { > + ASSERT(!(source[i] & 0xff00)); > + destination[i] = source[i]; > + } If you want this assertion, you should also have it in the prefix code. Maybe you should just add an ALWAYS_INLINE function doing the loop, and use it in the prefix, postfix and !OS(darwin) version. > Source/WTF/wtf/text/WTFString.h:104 > +template<bool isSpecialCharacter(UChar), typename CharType> > +bool isAllSpecialCharacters(const CharType*, size_t); More and more we tend to use CharacterType instead of CharType. > Source/WTF/wtf/text/WTFString.h:573 > -template<bool isSpecialCharacter(UChar)> inline bool isAllSpecialCharacters(const UChar* characters, size_t length) > +template<bool isSpecialCharacter(UChar), typename CharType> > +inline bool isAllSpecialCharacters(const CharType* characters, size_t length) CharType -> CharacterType would be nice. > Source/WebCore/ChangeLog:13 > + Currently all data associated with a token is stored and processed as UChars. > + Added code to determine that the contents of token data is all 8 bit by keeping > + the logical OR value of all prior characters. Also added a flag that the parser > + can set to indicate when the token data is converted to a String that we want > + to make an 8 bit string if possible. Enabled this handling for script, style, > + iframe, noembed, noframes, noscript and xmp tags. Double space after each period. > Source/WebCore/xml/parser/MarkupTokenBase.h:307 > + void setConvertTo8Bit() setConvertTo8Bit() -> setConvertTo8BitIfPossible()? The method name setConvertTo8Bit() seems to imply the token is always gonna be converted to 8bits.
WebKit Review Bot
Comment 35 Thursday, August 16, 2012 8:54:42 PM UTC
Comment on attachment 158867 [details] Patch updated with suggested changes including using PACKUSWB Attachment 158867 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13508724
Michael Saboff
Comment 36 Thursday, August 16, 2012 10:24:10 PM UTC
Created attachment 158903 [details] Patch with Reviewer's Changes and Speculative Chromium Linux Build Fix
Michael Saboff
Comment 37 Friday, August 17, 2012 3:54:02 AM UTC
Eric Seidel (no email)
Comment 38 Friday, August 17, 2012 3:59:20 AM UTC
Yay!
Kent Tamura
Comment 39 Friday, August 17, 2012 6:26:12 AM UTC
(In reply to comment #37) > Committed r125846: <http://trac.webkit.org/changeset/125846> It broke Chromium-mac build, and I landed a fix: http://trac.webkit.org/changeset/125855
WebKit Review Bot
Comment 40 Friday, August 17, 2012 6:27:41 AM UTC
Re-opened since this is blocked by 94295
Michael Saboff
Comment 41 Friday, August 17, 2012 4:52:24 PM UTC
(In reply to comment #39) > (In reply to comment #37) > > Committed r125846: <http://trac.webkit.org/changeset/125846> > > It broke Chromium-mac build, and I landed a fix: http://trac.webkit.org/changeset/125855 Thanks for fixing this. Looks like I left two includes in WTFString.cpp from when I was debugging.
Tobias Netzel
Comment 42 Friday, August 17, 2012 9:13:35 PM UTC
Mozilla also has SIMD code for string conversions, for example this one: https://hg.mozilla.org/mozilla-central/file/aecd4db48e8e/xpcom/string/src/nsUTF8UtilsSSE2.cpp They do the same thing using the same intrinsics but seem to have found out that unrolling the loop to process 32 UChars in one loop iteration improves performance. I know that it does speed up things on PowerPC (TenFourFox has an Altivec version of that code). Other parts where mozilla uses SIMD (SSE in this case) code for string processing: https://hg.mozilla.org/mozilla-central/file/aecd4db48e8e/content/base/src/nsTextFragmentSSE2.cpp : Find out if there are any 16 bit characters in a UChar buffer. https://hg.mozilla.org/mozilla-central/file/aecd4db48e8e/intl/uconv/src/nsUTF8ToUnicodeSSE2.cpp : Copy from an LChar buffer to a UChar buffer (somewhat different to the one in nsUTF8UtilsSSE2) Maybe that can inspire some more SIMD code in WebKit?
Michael Saboff
Comment 43 Friday, August 17, 2012 9:33:32 PM UTC
(In reply to comment #42) > Mozilla also has SIMD code for string conversions, for example this one: > https://hg.mozilla.org/mozilla-central/file/aecd4db48e8e/xpcom/string/src/nsUTF8UtilsSSE2.cpp > > They do the same thing using the same intrinsics but seem to have found out that unrolling the loop to process 32 UChars in one loop iteration improves performance. I know that it does speed up things on PowerPC (TenFourFox has an Altivec version of that code). > > Other parts where mozilla uses SIMD (SSE in this case) code for string processing: > https://hg.mozilla.org/mozilla-central/file/aecd4db48e8e/content/base/src/nsTextFragmentSSE2.cpp : Find out if there are any 16 bit characters in a UChar buffer. > https://hg.mozilla.org/mozilla-central/file/aecd4db48e8e/intl/uconv/src/nsUTF8ToUnicodeSSE2.cpp : Copy from an LChar buffer to a UChar buffer (somewhat different to the one in nsUTF8UtilsSSE2) > > Maybe that can inspire some more SIMD code in WebKit? Tobias, thanks for the info. After working on this patch, I'm already thinking of how we can use SIMD more.
Note You need to log in before you can comment on or make changes to this bug.