WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
95207
CSS Parser should directly parse 8 bit source strings
https://bugs.webkit.org/show_bug.cgi?id=95207
Summary
CSS Parser should directly parse 8 bit source strings
Michael Saboff
Reported
2012-08-28 08:39:44 PDT
The CSS parser currently uses the characters() method to access source string data. It should be modified to use characters8() for 8 bit strings.
Attachments
Patch
(90.41 KB, patch)
2012-08-28 12:37 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Performance results of patch
(22.05 KB, application/pdf)
2012-08-28 12:41 PDT
,
Michael Saboff
no flags
Details
Patch with style nit fixed
(90.40 KB, patch)
2012-08-28 12:44 PDT
,
Michael Saboff
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Archive of layout-test-results from gce-cr-linux-01
(563.97 KB, application/zip)
2012-08-28 14:29 PDT
,
WebKit Review Bot
no flags
Details
Patch with suggested changes and speculative cr-linux test fix
(90.33 KB, patch)
2012-08-28 18:54 PDT
,
Michael Saboff
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Patch with unsigned length compare fix plus other updates
(103.46 KB, patch)
2012-08-29 18:30 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Prior patch with style nit fixed
(103.63 KB, patch)
2012-08-29 18:59 PDT
,
Michael Saboff
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Performance numbers of last patch
(22.33 KB, application/pdf)
2012-08-30 10:00 PDT
,
Michael Saboff
no flags
Details
Patch with test changed to a reference test
(103.12 KB, patch)
2012-08-30 14:11 PDT
,
Michael Saboff
ggaren
: review+
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Michael Saboff
Comment 1
2012-08-28 12:37:47 PDT
Created
attachment 161035
[details]
Patch
WebKit Review Bot
Comment 2
2012-08-28 12:41:07 PDT
Attachment 161035
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1 Source/WebCore/css/CSSParser.cpp:8884: This { should be at the end of the previous line [whitespace/braces] [4] Total errors found: 1 in 15 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 3
2012-08-28 12:41:29 PDT
Created
attachment 161038
[details]
Performance results of patch Improves performance of css-parser-yui as expected. Most other tests are neutral. Have looked into regressions in simple-url, tiny-innerHTML, url-parser and xml-parser, but don't have any leads yet. Will continue investigating why this patch made a change.
Michael Saboff
Comment 4
2012-08-28 12:44:01 PDT
Created
attachment 161039
[details]
Patch with style nit fixed
WebKit Review Bot
Comment 5
2012-08-28 14:29:43 PDT
Comment on
attachment 161039
[details]
Patch with style nit fixed
Attachment 161039
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13657467
New failing tests: animations/suspend-resume-animation-events.html fast/css/unicode-escape-in-8bit-string.html
WebKit Review Bot
Comment 6
2012-08-28 14:29:46 PDT
Created
attachment 161061
[details]
Archive of layout-test-results from gce-cr-linux-01 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: gce-cr-linux-01 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Benjamin Poulain
Comment 7
2012-08-28 15:47:38 PDT
Comment on
attachment 161039
[details]
Patch with style nit fixed View in context:
https://bugs.webkit.org/attachment.cgi?id=161039&action=review
I had a first look, I have early comments about some details. Can you please measure how much does the code grow with the patch?
> Source/WebCore/css/CSSParser.cpp:168 > +template <typename CharacterType> > +ALWAYS_INLINE static bool equal(const CharacterType* a, const char* b, int length) > { > - for (int i = 0; i < a.length; ++i) { > + for (int i = 0; i < length; ++i) { > if (!b[i]) > return false; > - if (a.characters[i] != b[i]) > + if (a[i] != b[i]) > return false; > } > - return !b[a.length]; > + return !b[length]; > }
We have optimized versions of this in StringImpl.h, you should call them directly.
> Source/WebCore/css/CSSParser.cpp:186 > +template <typename CharacterType> > +ALWAYS_INLINE static bool equalIgnoringCase(const CharacterType* a, const char* b, int length) > { > - for (int i = 0; i < a.length; ++i) { > + for (int i = 0; i < length; ++i) { > if (!b[i]) > return false; > ASSERT(!isASCIIUpper(b[i])); > - if (toASCIILower(a.characters[i]) != b[i]) > + if (toASCIILower(a[i]) != b[i]) > return false; > } > - return !b[a.length]; > + return !b[length]; > +}
We have optimized versions of this in StringImpl.h, you should call them directly.
> Source/WebCore/css/CSSParser.cpp:335 > + int stringLength = string.length(); > + int length = stringLength + m_parsedTextPrefixLength + strlen(suffix) + 1;
String length are typically unsigned.
> Source/WebCore/css/CSSParser.cpp:340 > + for (unsigned i = 0; i < m_parsedTextPrefixLength; i++) > + m_dataStart8[i] = prefix[i];
memcpy?
> Source/WebCore/css/CSSParser.cpp:348 > + for (unsigned i = start; i < end; i++) > + m_dataStart8[i] = suffix[i - start];
memcpy? Or just a while loop incrementing the pointers. I think the "i - start" is a little less readable.
> Source/WebCore/css/CSSParser.cpp:350 > + m_dataStart8[length - 1] = 0;
If you do: unsigned length = stringLength + m_parsedTextPrefixLength + strlen(suffix) + 1; ... m_dataStart8 = adoptArrayPtr(new LChar[length + 1]); ... This becomes: m_dataStart8[length] = 0; I personally prefer this way of handling zero termination, but that is up to you.
> Source/WebCore/css/CSSParser.cpp:3122 > - AtomicString variableName = String(name.characters + 12, name.length - 12); > + AtomicString variableName = name.is8Bit() ? String(name.characters8() + 12, name.length() - 12) : String(name.characters16() + 12, name.length() - 12);
Creating an AtomicString from a String can be inefficient. You should use the AtomicString constructor here instead.
> Source/WebCore/css/CSSParser.cpp:8871 > + UnicodeToChars(result, unicode); }
Lost bracket :)
> Source/WebCore/css/CSSParser.h:489 > + LChar* m_currentCharacter8; > + UChar* m_currentCharacter16;
Should this be an union? Are they mutually exclusive?
Michael Saboff
Comment 8
2012-08-28 18:54:55 PDT
Created
attachment 161115
[details]
Patch with suggested changes and speculative cr-linux test fix (In reply to
comment #7
)
> (From update of
attachment 161039
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=161039&action=review
> > I had a first look, I have early comments about some details. > Can you please measure how much does the code grow with the patch?
Comparing the size output from before and after X86-64 release builds of Mac WebCore.framework: __TEXT __DATA __OBJC others dec hex before- 17391616 1871872 0 18460672 37724160 23fa000 after- 17408000 1871872 0 18399232 37679104 23ef000 delta- 16384 0 0 This growth is less than I would have thought!
> > Source/WebCore/css/CSSParser.cpp:168 > > +template <typename CharacterType> > > +ALWAYS_INLINE static bool equal(const CharacterType* a, const char* b, int length) > > { > > - for (int i = 0; i < a.length; ++i) { > > + for (int i = 0; i < length; ++i) { > > if (!b[i]) > > return false; > > - if (a.characters[i] != b[i]) > > + if (a[i] != b[i]) > > return false; > > } > > - return !b[a.length]; > > + return !b[length]; > > } > > We have optimized versions of this in StringImpl.h, you should call them directly.
Done.
> > Source/WebCore/css/CSSParser.cpp:186 > > +template <typename CharacterType> > > +ALWAYS_INLINE static bool equalIgnoringCase(const CharacterType* a, const char* b, int length) > > { > > - for (int i = 0; i < a.length; ++i) { > > + for (int i = 0; i < length; ++i) { > > if (!b[i]) > > return false; > > ASSERT(!isASCIIUpper(b[i])); > > - if (toASCIILower(a.characters[i]) != b[i]) > > + if (toASCIILower(a[i]) != b[i]) > > return false; > > } > > - return !b[a.length]; > > + return !b[length]; > > +} > > We have optimized versions of this in StringImpl.h, you should call them directly.
Done.
> > Source/WebCore/css/CSSParser.cpp:335 > > + int stringLength = string.length(); > > + int length = stringLength + m_parsedTextPrefixLength + strlen(suffix) + 1; > > String length are typically unsigned.
Fixed this.
> > Source/WebCore/css/CSSParser.cpp:340 > > + for (unsigned i = 0; i < m_parsedTextPrefixLength; i++) > > + m_dataStart8[i] = prefix[i]; > > memcpy?
The prefixes and suffices are so small, 0 to a 33 bytes with most of them 0 that I think the call overhead would outweigh the benefit.
> > Source/WebCore/css/CSSParser.cpp:348 > > + for (unsigned i = start; i < end; i++) > > + m_dataStart8[i] = suffix[i - start]; > > memcpy? > Or just a while loop incrementing the pointers. I think the "i - start" is a little less readable.
Clang on X86 makes slightly better code using scaled index instructions instead of needing to increment a pointer (an ALU operation). I assume that the same is true for gcc. I can clean up the suffix index calculation, but it will probably look best with another local.
> > Source/WebCore/css/CSSParser.cpp:350 > > + m_dataStart8[length - 1] = 0; > > If you do: > unsigned length = stringLength + m_parsedTextPrefixLength + strlen(suffix) + 1; > ... > m_dataStart8 = adoptArrayPtr(new LChar[length + 1]); > ... > This becomes: > m_dataStart8[length] = 0; > > I personally prefer this way of handling zero termination, but that is up to you.
I want m_length to be the buffer length. It is used to allocate a 16 bit buffer when there is 8 bit source with a 16 bit escape.
> > Source/WebCore/css/CSSParser.cpp:3122 > > - AtomicString variableName = String(name.characters + 12, name.length - 12); > > + AtomicString variableName = name.is8Bit() ? String(name.characters8() + 12, name.length() - 12) : String(name.characters16() + 12, name.length() - 12); > > Creating an AtomicString from a String can be inefficient. You should use the AtomicString constructor here instead.
Done.
> > Source/WebCore/css/CSSParser.cpp:8871 > > + UnicodeToChars(result, unicode); } > > Lost bracket :)
Saw and fixed after original post.
> > Source/WebCore/css/CSSParser.h:489 > > + LChar* m_currentCharacter8; > > + UChar* m_currentCharacter16; > > Should this be an union? Are they mutually exclusive?
These are both needed. For the case that we have an 8 bit source and a Unicode escape that resolves to 16 bit, we use the 16 bit data buffer and m_currentCharacter16 to track where in that buffer.
Benjamin Poulain
Comment 9
2012-08-28 19:10:46 PDT
> Comparing the size output from before and after X86-64 release builds of Mac WebCore.framework: > __TEXT __DATA __OBJC others dec hex > before- 17391616 1871872 0 18460672 37724160 23fa000 > after- 17408000 1871872 0 18399232 37679104 23ef000 > delta- 16384 0 0 > This growth is less than I would have thought!
I work my tail off to remove instructions in the order of 4kb :( But not 16kb is not too bad. :)
> > > Source/WebCore/css/CSSParser.cpp:348 > > > + for (unsigned i = start; i < end; i++) > > > + m_dataStart8[i] = suffix[i - start]; > > > > memcpy? > > Or just a while loop incrementing the pointers. I think the "i - start" is a little less readable. > > Clang on X86 makes slightly better code using scaled index instructions instead of needing to increment a pointer (an ALU operation). I assume that the same is true for gcc. I can clean up the suffix index calculation, but it will probably look best with another local.
Out of curiosity, does the compiler actually generate the "- start" in the loop? Or is it smart enough to just do (suffix + start) before the loop?
WebKit Review Bot
Comment 10
2012-08-28 20:50:25 PDT
Comment on
attachment 161115
[details]
Patch with suggested changes and speculative cr-linux test fix
Attachment 161115
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13690034
Peter Beverloo (cr-android ews)
Comment 11
2012-08-28 21:02:15 PDT
Comment on
attachment 161115
[details]
Patch with suggested changes and speculative cr-linux test fix
Attachment 161115
[details]
did not pass cr-android-ews (chromium-android): Output:
http://queues.webkit.org/results/13689025
Build Bot
Comment 12
2012-08-28 21:42:36 PDT
Comment on
attachment 161115
[details]
Patch with suggested changes and speculative cr-linux test fix
Attachment 161115
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13686050
Michael Saboff
Comment 13
2012-08-29 18:30:21 PDT
Created
attachment 161383
[details]
Patch with unsigned length compare fix plus other updates Made some other template changes in isCSSTokenizer*, color parsing and double parsing code.
WebKit Review Bot
Comment 14
2012-08-29 18:32:25 PDT
Attachment 161383
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1 Source/WebCore/platform/graphics/Color.h:160: The parameter name "rgb" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 1 in 17 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 15
2012-08-29 18:55:24 PDT
> Out of curiosity, does the compiler actually generate the "- start" in the loop? Or is it smart enough to just do (suffix + start) before the loop?
It appears that it does. For this code, unsigned start = m_parsedTextPrefixLength + stringLength; unsigned end = start + strlen(suffix); for (unsigned i = start; i < end; i++) m_dataStart8[i] = suffix[i - start]; m_dataStart8[length - 1] = 0; clang (release build) generated: I believe on entry start is in ebx. end gets put into ecx at +240. During the loop, it looks like i ends up in rcx and r15 contains suffix. The compiler does some math to end up with a count down loop with strlen(suffix) in eax. Don't know why it didn't hoist up the loading of m_dataStart8 into rdx. 0x0102f892d9 <CSSParser::setUpParser+233>: callq 0x103b45590 <dyld_stub_strlen> 0x0102f892de <CSSParser::setUpParser+238>: mov %eax,%ecx 0x0102f892e0 <CSSParser::setUpParser+240>: add %ebx,%ecx 0x0102f892e2 <CSSParser::setUpParser+242>: cmp %ecx,%ebx 0x0102f892e4 <CSSParser::setUpParser+244>: jae 0x102f89317 <OwnArrayPtr> 0x0102f892e6 <CSSParser::setUpParser+246>: lea 0x1(%r12,%r13,1),%ecx 0x0102f892eb <CSSParser::setUpParser+251>: add %r13d,%r12d 0x0102f892ee <CSSParser::setUpParser+254>: lea 0x1(%rax,%r12,1),%eax 0x0102f892f3 <CSSParser::setUpParser+259>: sub %ecx,%eax 0x0102f892f5 <CSSParser::setUpParser+261>: mov %ebx,%ecx 0x0102f892f7 <CSSParser::setUpParser+263>: nopw 0x0(%rax,%rax,1) 0x0102f89300 <CSSParser::setUpParser+272>: mov 0xd0(%r14),%rdx 0x0102f89307 <CSSParser::setUpParser+279>: mov (%r15),%bl 0x0102f8930a <CSSParser::setUpParser+282>: mov %bl,(%rdx,%rcx,1) 0x0102f8930d <CSSParser::setUpParser+285>: inc %r15 0x0102f89310 <CSSParser::setUpParser+288>: inc %rcx 0x0102f89313 <CSSParser::setUpParser+291>: dec %eax 0x0102f89315 <CSSParser::setUpParser+293>: jne 0x102f89300 <CSSParser::setUpParser+272> 0x0102f89317 <_ZNK3WTF11OwnArrayPtrIhEixEl+0>: mov 0xd0(%r14),%rax 0x0102f8931e <_ZNK3WTF11OwnArrayPtrIhEixEl+7>: mov -0x48(%rbp),%rdx 0x0102f89322 <CSSParser::setUpParser+306>: lea -0x1(%rdx),%ecx 0x0102f89325 <CSSParser::setUpParser+309>: movb $0x0,(%rax,%rcx,1)
Michael Saboff
Comment 16
2012-08-29 18:59:31 PDT
Created
attachment 161387
[details]
Prior patch with style nit fixed Cleaned up other existing flavors of parseHexColor() at the same time.
Benjamin Poulain
Comment 17
2012-08-29 19:22:50 PDT
> clang (release build) generated: > I believe on entry start is in ebx. end gets put into ecx at +240. During the loop, it looks like i ends up in rcx and r15 contains suffix. The compiler does some math to end up with a count down loop with strlen(suffix) in eax. Don't know why it didn't hoist up the loading of m_dataStart8 into rdx. > [...]
That's pretty cool!
WebKit Review Bot
Comment 18
2012-08-30 02:44:05 PDT
Comment on
attachment 161387
[details]
Prior patch with style nit fixed
Attachment 161387
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13690622
New failing tests: fast/css/unicode-escape-in-8bit-string.html
WebKit Review Bot
Comment 19
2012-08-30 03:38:10 PDT
Comment on
attachment 161387
[details]
Prior patch with style nit fixed
Attachment 161387
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13689640
New failing tests: fast/css/unicode-escape-in-8bit-string.html
Michael Saboff
Comment 20
2012-08-30 10:00:23 PDT
Created
attachment 161502
[details]
Performance numbers of last patch
Michael Saboff
Comment 21
2012-08-30 14:11:04 PDT
Created
attachment 161552
[details]
Patch with test changed to a reference test
Geoffrey Garen
Comment 22
2012-08-30 18:43:31 PDT
Comment on
attachment 161552
[details]
Patch with test changed to a reference test View in context:
https://bugs.webkit.org/attachment.cgi?id=161552&action=review
r=me
> Source/WebCore/css/CSSParser.cpp:8603 > +UChar*& > +CSSParser::currentCharacter16()
Should be on the same line.
> Source/WebCore/css/CSSParser.cpp:8891 > +inline void CSSParser::parseURI(CSSParserString& string)
This could be a function template, since the logic in the 8 vs 16 bit cases is equal.
Michael Saboff
Comment 23
2012-08-31 10:26:07 PDT
Committed
r127277
: <
http://trac.webkit.org/changeset/127277
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug