95207 – CSS Parser should directly parse 8 bit source strings

RESOLVED FIXED 95207

CSS Parser should directly parse 8 bit source strings

https://bugs.webkit.org/show_bug.cgi?id=95207

Summary CSS Parser should directly parse 8 bit source strings

Michael Saboff

Reported 2012-08-28 08:39:44 PDT

The CSS parser currently uses the characters() method to access source string data. It should be modified to use characters8() for 8 bit strings.

Attachments
Patch (90.41 KB, patch) 2012-08-28 12:37 PDT, Michael Saboff	no flags	Details Formatted Diff Diff
Performance results of patch (22.05 KB, application/pdf) 2012-08-28 12:41 PDT, Michael Saboff	no flags	Details
Patch with style nit fixed (90.40 KB, patch) 2012-08-28 12:44 PDT, Michael Saboff	webkit.review.bot: commit-queue-	Details Formatted Diff Diff
Archive of layout-test-results from gce-cr-linux-01 (563.97 KB, application/zip) 2012-08-28 14:29 PDT, WebKit Review Bot	no flags	Details
Patch with suggested changes and speculative cr-linux test fix (90.33 KB, patch) 2012-08-28 18:54 PDT, Michael Saboff	webkit.review.bot: commit-queue-	Details Formatted Diff Diff
Patch with unsigned length compare fix plus other updates (103.46 KB, patch) 2012-08-29 18:30 PDT, Michael Saboff	no flags	Details Formatted Diff Diff
Prior patch with style nit fixed (103.63 KB, patch) 2012-08-29 18:59 PDT, Michael Saboff	webkit.review.bot: commit-queue-	Details Formatted Diff Diff
Performance numbers of last patch (22.33 KB, application/pdf) 2012-08-30 10:00 PDT, Michael Saboff	no flags	Details
Patch with test changed to a reference test (103.12 KB, patch) 2012-08-30 14:11 PDT, Michael Saboff	ggaren: review+	Details Formatted Diff Diff
Show Obsolete (6) View All Add attachment proposed patch, testcase, etc.

Michael Saboff

Comment 1 2012-08-28 12:37:47 PDT

Created attachment 161035 [details] Patch

WebKit Review Bot

Comment 2 2012-08-28 12:41:07 PDT

Attachment 161035 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1 Source/WebCore/css/CSSParser.cpp:8884: This { should be at the end of the previous line [whitespace/braces] [4] Total errors found: 1 in 15 files If any of these errors are false positives, please file a bug against check-webkit-style.

Michael Saboff

Comment 3 2012-08-28 12:41:29 PDT

Created attachment 161038 [details] Performance results of patch Improves performance of css-parser-yui as expected. Most other tests are neutral. Have looked into regressions in simple-url, tiny-innerHTML, url-parser and xml-parser, but don't have any leads yet. Will continue investigating why this patch made a change.

Michael Saboff

Comment 4 2012-08-28 12:44:01 PDT

Created attachment 161039 [details] Patch with style nit fixed

WebKit Review Bot

Comment 5 2012-08-28 14:29:43 PDT

Comment on attachment 161039 [details] Patch with style nit fixed Attachment 161039 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13657467 New failing tests: animations/suspend-resume-animation-events.html fast/css/unicode-escape-in-8bit-string.html

WebKit Review Bot

Comment 6 2012-08-28 14:29:46 PDT

Created attachment 161061 [details] Archive of layout-test-results from gce-cr-linux-01 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: gce-cr-linux-01 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid

Benjamin Poulain

Comment 7 2012-08-28 15:47:38 PDT

Comment on attachment 161039 [details] Patch with style nit fixed View in context: https://bugs.webkit.org/attachment.cgi?id=161039&action=review I had a first look, I have early comments about some details. Can you please measure how much does the code grow with the patch? > Source/WebCore/css/CSSParser.cpp:168 > +template <typename CharacterType> > +ALWAYS_INLINE static bool equal(const CharacterType* a, const char* b, int length) > { > - for (int i = 0; i < a.length; ++i) { > + for (int i = 0; i < length; ++i) { > if (!b[i]) > return false; > - if (a.characters[i] != b[i]) > + if (a[i] != b[i]) > return false; > } > - return !b[a.length]; > + return !b[length]; > } We have optimized versions of this in StringImpl.h, you should call them directly. > Source/WebCore/css/CSSParser.cpp:186 > +template <typename CharacterType> > +ALWAYS_INLINE static bool equalIgnoringCase(const CharacterType* a, const char* b, int length) > { > - for (int i = 0; i < a.length; ++i) { > + for (int i = 0; i < length; ++i) { > if (!b[i]) > return false; > ASSERT(!isASCIIUpper(b[i])); > - if (toASCIILower(a.characters[i]) != b[i]) > + if (toASCIILower(a[i]) != b[i]) > return false; > } > - return !b[a.length]; > + return !b[length]; > +} We have optimized versions of this in StringImpl.h, you should call them directly. > Source/WebCore/css/CSSParser.cpp:335 > + int stringLength = string.length(); > + int length = stringLength + m_parsedTextPrefixLength + strlen(suffix) + 1; String length are typically unsigned. > Source/WebCore/css/CSSParser.cpp:340 > + for (unsigned i = 0; i < m_parsedTextPrefixLength; i++) > + m_dataStart8[i] = prefix[i]; memcpy? > Source/WebCore/css/CSSParser.cpp:348 > + for (unsigned i = start; i < end; i++) > + m_dataStart8[i] = suffix[i - start]; memcpy? Or just a while loop incrementing the pointers. I think the "i - start" is a little less readable. > Source/WebCore/css/CSSParser.cpp:350 > + m_dataStart8[length - 1] = 0; If you do: unsigned length = stringLength + m_parsedTextPrefixLength + strlen(suffix) + 1; ... m_dataStart8 = adoptArrayPtr(new LChar[length + 1]); ... This becomes: m_dataStart8[length] = 0; I personally prefer this way of handling zero termination, but that is up to you. > Source/WebCore/css/CSSParser.cpp:3122 > - AtomicString variableName = String(name.characters + 12, name.length - 12); > + AtomicString variableName = name.is8Bit() ? String(name.characters8() + 12, name.length() - 12) : String(name.characters16() + 12, name.length() - 12); Creating an AtomicString from a String can be inefficient. You should use the AtomicString constructor here instead. > Source/WebCore/css/CSSParser.cpp:8871 > + UnicodeToChars(result, unicode); } Lost bracket :) > Source/WebCore/css/CSSParser.h:489 > + LChar* m_currentCharacter8; > + UChar* m_currentCharacter16; Should this be an union? Are they mutually exclusive?

Michael Saboff

Comment 8 2012-08-28 18:54:55 PDT

Created attachment 161115 [details] Patch with suggested changes and speculative cr-linux test fix (In reply to comment #7) > (From update of attachment 161039 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=161039&action=review > > I had a first look, I have early comments about some details. > Can you please measure how much does the code grow with the patch? Comparing the size output from before and after X86-64 release builds of Mac WebCore.framework: __TEXT __DATA __OBJC others dec hex before- 17391616 1871872 0 18460672 37724160 23fa000 after- 17408000 1871872 0 18399232 37679104 23ef000 delta- 16384 0 0 This growth is less than I would have thought! > > Source/WebCore/css/CSSParser.cpp:168 > > +template <typename CharacterType> > > +ALWAYS_INLINE static bool equal(const CharacterType* a, const char* b, int length) > > { > > - for (int i = 0; i < a.length; ++i) { > > + for (int i = 0; i < length; ++i) { > > if (!b[i]) > > return false; > > - if (a.characters[i] != b[i]) > > + if (a[i] != b[i]) > > return false; > > } > > - return !b[a.length]; > > + return !b[length]; > > } > > We have optimized versions of this in StringImpl.h, you should call them directly. Done. > > Source/WebCore/css/CSSParser.cpp:186 > > +template <typename CharacterType> > > +ALWAYS_INLINE static bool equalIgnoringCase(const CharacterType* a, const char* b, int length) > > { > > - for (int i = 0; i < a.length; ++i) { > > + for (int i = 0; i < length; ++i) { > > if (!b[i]) > > return false; > > ASSERT(!isASCIIUpper(b[i])); > > - if (toASCIILower(a.characters[i]) != b[i]) > > + if (toASCIILower(a[i]) != b[i]) > > return false; > > } > > - return !b[a.length]; > > + return !b[length]; > > +} > > We have optimized versions of this in StringImpl.h, you should call them directly. Done. > > Source/WebCore/css/CSSParser.cpp:335 > > + int stringLength = string.length(); > > + int length = stringLength + m_parsedTextPrefixLength + strlen(suffix) + 1; > > String length are typically unsigned. Fixed this. > > Source/WebCore/css/CSSParser.cpp:340 > > + for (unsigned i = 0; i < m_parsedTextPrefixLength; i++) > > + m_dataStart8[i] = prefix[i]; > > memcpy? The prefixes and suffices are so small, 0 to a 33 bytes with most of them 0 that I think the call overhead would outweigh the benefit. > > Source/WebCore/css/CSSParser.cpp:348 > > + for (unsigned i = start; i < end; i++) > > + m_dataStart8[i] = suffix[i - start]; > > memcpy? > Or just a while loop incrementing the pointers. I think the "i - start" is a little less readable. Clang on X86 makes slightly better code using scaled index instructions instead of needing to increment a pointer (an ALU operation). I assume that the same is true for gcc. I can clean up the suffix index calculation, but it will probably look best with another local. > > Source/WebCore/css/CSSParser.cpp:350 > > + m_dataStart8[length - 1] = 0; > > If you do: > unsigned length = stringLength + m_parsedTextPrefixLength + strlen(suffix) + 1; > ... > m_dataStart8 = adoptArrayPtr(new LChar[length + 1]); > ... > This becomes: > m_dataStart8[length] = 0; > > I personally prefer this way of handling zero termination, but that is up to you. I want m_length to be the buffer length. It is used to allocate a 16 bit buffer when there is 8 bit source with a 16 bit escape. > > Source/WebCore/css/CSSParser.cpp:3122 > > - AtomicString variableName = String(name.characters + 12, name.length - 12); > > + AtomicString variableName = name.is8Bit() ? String(name.characters8() + 12, name.length() - 12) : String(name.characters16() + 12, name.length() - 12); > > Creating an AtomicString from a String can be inefficient. You should use the AtomicString constructor here instead. Done. > > Source/WebCore/css/CSSParser.cpp:8871 > > + UnicodeToChars(result, unicode); } > > Lost bracket :) Saw and fixed after original post. > > Source/WebCore/css/CSSParser.h:489 > > + LChar* m_currentCharacter8; > > + UChar* m_currentCharacter16; > > Should this be an union? Are they mutually exclusive? These are both needed. For the case that we have an 8 bit source and a Unicode escape that resolves to 16 bit, we use the 16 bit data buffer and m_currentCharacter16 to track where in that buffer.

Benjamin Poulain

Comment 9 2012-08-28 19:10:46 PDT

> Comparing the size output from before and after X86-64 release builds of Mac WebCore.framework: > __TEXT __DATA __OBJC others dec hex > before- 17391616 1871872 0 18460672 37724160 23fa000 > after- 17408000 1871872 0 18399232 37679104 23ef000 > delta- 16384 0 0 > This growth is less than I would have thought! I work my tail off to remove instructions in the order of 4kb :( But not 16kb is not too bad. :) > > > Source/WebCore/css/CSSParser.cpp:348 > > > + for (unsigned i = start; i < end; i++) > > > + m_dataStart8[i] = suffix[i - start]; > > > > memcpy? > > Or just a while loop incrementing the pointers. I think the "i - start" is a little less readable. > > Clang on X86 makes slightly better code using scaled index instructions instead of needing to increment a pointer (an ALU operation). I assume that the same is true for gcc. I can clean up the suffix index calculation, but it will probably look best with another local. Out of curiosity, does the compiler actually generate the "- start" in the loop? Or is it smart enough to just do (suffix + start) before the loop?

WebKit Review Bot

Comment 10 2012-08-28 20:50:25 PDT

Comment on attachment 161115 [details] Patch with suggested changes and speculative cr-linux test fix Attachment 161115 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13690034

Peter Beverloo (cr-android ews)

Comment 11 2012-08-28 21:02:15 PDT

Comment on attachment 161115 [details] Patch with suggested changes and speculative cr-linux test fix Attachment 161115 [details] did not pass cr-android-ews (chromium-android): Output: http://queues.webkit.org/results/13689025

Build Bot

Comment 12 2012-08-28 21:42:36 PDT

Comment on attachment 161115 [details] Patch with suggested changes and speculative cr-linux test fix Attachment 161115 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13686050

Michael Saboff

Comment 13 2012-08-29 18:30:21 PDT

Created attachment 161383 [details] Patch with unsigned length compare fix plus other updates Made some other template changes in isCSSTokenizer*, color parsing and double parsing code.

WebKit Review Bot

Comment 14 2012-08-29 18:32:25 PDT

Attachment 161383 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1 Source/WebCore/platform/graphics/Color.h:160: The parameter name "rgb" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 1 in 17 files If any of these errors are false positives, please file a bug against check-webkit-style.

Michael Saboff

Comment 15 2012-08-29 18:55:24 PDT

> Out of curiosity, does the compiler actually generate the "- start" in the loop? Or is it smart enough to just do (suffix + start) before the loop? It appears that it does. For this code, unsigned start = m_parsedTextPrefixLength + stringLength; unsigned end = start + strlen(suffix); for (unsigned i = start; i < end; i++) m_dataStart8[i] = suffix[i - start]; m_dataStart8[length - 1] = 0; clang (release build) generated: I believe on entry start is in ebx. end gets put into ecx at +240. During the loop, it looks like i ends up in rcx and r15 contains suffix. The compiler does some math to end up with a count down loop with strlen(suffix) in eax. Don't know why it didn't hoist up the loading of m_dataStart8 into rdx. 0x0102f892d9 <CSSParser::setUpParser+233>: callq 0x103b45590 <dyld_stub_strlen> 0x0102f892de <CSSParser::setUpParser+238>: mov %eax,%ecx 0x0102f892e0 <CSSParser::setUpParser+240>: add %ebx,%ecx 0x0102f892e2 <CSSParser::setUpParser+242>: cmp %ecx,%ebx 0x0102f892e4 <CSSParser::setUpParser+244>: jae 0x102f89317 <OwnArrayPtr> 0x0102f892e6 <CSSParser::setUpParser+246>: lea 0x1(%r12,%r13,1),%ecx 0x0102f892eb <CSSParser::setUpParser+251>: add %r13d,%r12d 0x0102f892ee <CSSParser::setUpParser+254>: lea 0x1(%rax,%r12,1),%eax 0x0102f892f3 <CSSParser::setUpParser+259>: sub %ecx,%eax 0x0102f892f5 <CSSParser::setUpParser+261>: mov %ebx,%ecx 0x0102f892f7 <CSSParser::setUpParser+263>: nopw 0x0(%rax,%rax,1) 0x0102f89300 <CSSParser::setUpParser+272>: mov 0xd0(%r14),%rdx 0x0102f89307 <CSSParser::setUpParser+279>: mov (%r15),%bl 0x0102f8930a <CSSParser::setUpParser+282>: mov %bl,(%rdx,%rcx,1) 0x0102f8930d <CSSParser::setUpParser+285>: inc %r15 0x0102f89310 <CSSParser::setUpParser+288>: inc %rcx 0x0102f89313 <CSSParser::setUpParser+291>: dec %eax 0x0102f89315 <CSSParser::setUpParser+293>: jne 0x102f89300 <CSSParser::setUpParser+272> 0x0102f89317 <_ZNK3WTF11OwnArrayPtrIhEixEl+0>: mov 0xd0(%r14),%rax 0x0102f8931e <_ZNK3WTF11OwnArrayPtrIhEixEl+7>: mov -0x48(%rbp),%rdx 0x0102f89322 <CSSParser::setUpParser+306>: lea -0x1(%rdx),%ecx 0x0102f89325 <CSSParser::setUpParser+309>: movb $0x0,(%rax,%rcx,1)

Michael Saboff

Comment 16 2012-08-29 18:59:31 PDT

Created attachment 161387 [details] Prior patch with style nit fixed Cleaned up other existing flavors of parseHexColor() at the same time.

Benjamin Poulain

Comment 17 2012-08-29 19:22:50 PDT

> clang (release build) generated: > I believe on entry start is in ebx. end gets put into ecx at +240. During the loop, it looks like i ends up in rcx and r15 contains suffix. The compiler does some math to end up with a count down loop with strlen(suffix) in eax. Don't know why it didn't hoist up the loading of m_dataStart8 into rdx. > [...] That's pretty cool!

WebKit Review Bot

Comment 18 2012-08-30 02:44:05 PDT

Comment on attachment 161387 [details] Prior patch with style nit fixed Attachment 161387 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13690622 New failing tests: fast/css/unicode-escape-in-8bit-string.html

WebKit Review Bot

Comment 19 2012-08-30 03:38:10 PDT

Comment on attachment 161387 [details] Prior patch with style nit fixed Attachment 161387 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13689640 New failing tests: fast/css/unicode-escape-in-8bit-string.html

Michael Saboff

Comment 20 2012-08-30 10:00:23 PDT

Created attachment 161502 [details] Performance numbers of last patch

Michael Saboff

Comment 21 2012-08-30 14:11:04 PDT

Created attachment 161552 [details] Patch with test changed to a reference test

Geoffrey Garen

Comment 22 2012-08-30 18:43:31 PDT

Comment on attachment 161552 [details] Patch with test changed to a reference test View in context: https://bugs.webkit.org/attachment.cgi?id=161552&action=review r=me > Source/WebCore/css/CSSParser.cpp:8603 > +UChar*& > +CSSParser::currentCharacter16() Should be on the same line. > Source/WebCore/css/CSSParser.cpp:8891 > +inline void CSSParser::parseURI(CSSParserString& string) This could be a function template, since the logic in the 8 vs 16 bit cases is equal.

Michael Saboff

Comment 23 2012-08-31 10:26:07 PDT

Committed r127277: <http://trac.webkit.org/changeset/127277>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component CSS

Assignee

Michael Saboff

Reported

2012-08-28 08:39 PDT

Modified

2013-03-15 05:39 PDT History

CC List

7 users Show

URL

Keywords

Depends on

Blocks

112436

Dependencies

tree graph