70107 – Custom written CSS lexer

RESOLVED FIXED 70107

Custom written CSS lexer

https://bugs.webkit.org/show_bug.cgi?id=70107

Summary Custom written CSS lexer

Zoltan Herczeg

Reported 2011-10-14 06:35:56 PDT

In the current state it is far from ready, I am just curious about EWS. On methanol it is more than twice (~2.18x) as fast as the original. The result of 3x2 runs in CPU cycles Original: 1354030040 1377989672 1345539012 Optimized: 622771368 614838444 635476680

Attachments
draft patch (52.40 KB, patch) 2011-10-14 06:36 PDT, Zoltan Herczeg	webkit.review.bot: commit-queue-	Details Formatted Diff Diff
next draft (52.45 KB, patch) 2011-10-17 03:56 PDT, Zoltan Herczeg	no flags	Details Formatted Diff Diff
core patch (52.72 KB, patch) 2011-12-12 00:57 PST, Zoltan Herczeg	oliver: review- webkit.review.bot: commit-queue-	Details Formatted Diff Diff
CPU cycle measure (2.64 KB, patch) 2011-12-12 23:32 PST, Zoltan Herczeg	no flags	Details Formatted Diff Diff
core patch2 (55.29 KB, patch) 2011-12-13 01:35 PST, Zoltan Herczeg	webkit.review.bot: commit-queue-	Details Formatted Diff Diff
full patch, build testing (76.65 KB, patch) 2011-12-20 03:20 PST, Zoltan Herczeg	webkit.review.bot: commit-queue-	Details Formatted Diff Diff
updated draft patch (81.20 KB, patch) 2012-01-03 05:20 PST, Zoltan Herczeg	webkit.review.bot: commit-queue-	Details Formatted Diff Diff
release patch (84.87 KB, patch) 2012-01-05 01:27 PST, Zoltan Herczeg	oliver: review-	Details Formatted Diff Diff
next patch (78.82 KB, patch) 2012-01-11 04:47 PST, Zoltan Herczeg	no flags	Details Formatted Diff Diff
patch update (78.61 KB, patch) 2012-01-24 01:21 PST, Zoltan Herczeg	koivisto: review+ ap: commit-queue-	Details Formatted Diff Diff
patch with renamings (83.83 KB, patch) 2012-01-25 06:01 PST, Zoltan Herczeg	no flags	Details Formatted Diff Diff
parsing test (58.42 KB, application/octet-stream) 2012-01-30 05:12 PST, Zoltan Herczeg	no flags	Details
Show Obsolete (8) View All Add attachment proposed patch, testcase, etc.

Zoltan Herczeg

Comment 1 2011-10-14 06:36:37 PDT

Created attachment 111009 [details] draft patch

Zoltan Herczeg

Comment 2 2011-10-14 06:38:20 PDT

> On methanol it is more than twice (~2.18x) as fast as the original. I mean the sum of CSSParser::lex(...) function calls

WebKit Review Bot

Comment 3 2011-10-14 06:38:38 PDT

Attachment 111009 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/css/CSSParser.cpp', u'Sourc..." exit_code: 1 Source/WebCore/css/CSSParser.cpp:6993: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/css/CSSParser.cpp:7686: An else if statement should be written as an if statement when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4] Source/WebCore/css/CSSParser.cpp:7950: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/css/CSSParser.cpp:7952: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/css/CSSParser.cpp:7953: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/css/CSSParser.cpp:7953: Semicolon defining empty statement for this loop. Use { } instead. [whitespace/semicolon] [5] Total errors found: 6 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.

WebKit Review Bot

Comment 4 2011-10-14 08:05:18 PDT

Comment on attachment 111009 [details] draft patch Attachment 111009 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/10054557 New failing tests: inspector/styles/styles-source-offsets.html inspector/elements/elements-panel-styles.html inspector/styles/styles-new-API.html inspector/styles/styles-source-lines.html

Zoltan Herczeg

Comment 5 2011-10-17 03:56:46 PDT

Created attachment 111241 [details] next draft

Zoltan Herczeg

Comment 6 2011-10-17 03:59:40 PDT

Layout and style (except debug) errors are fixed. The question now: what is the correct token style?

WebKit Review Bot

Comment 7 2011-10-17 03:59:57 PDT

Attachment 111241 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/css/CSSParser.cpp', u'Sourc..." exit_code: 1 Source/WebCore/css/CSSParser.cpp:7950: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/css/CSSParser.cpp:7952: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/css/CSSParser.cpp:7953: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/css/CSSParser.cpp:7953: Semicolon defining empty statement for this loop. Use { } instead. [whitespace/semicolon] [5] Total errors found: 4 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.

Zoltan Herczeg

Comment 8 2011-12-12 00:57:53 PST

Created attachment 118748 [details] core patch I think the tokens supposedly parsed by CSS are finally covered, and it is time for reviewing the custom written css tokenizer. The patch is not ready, it has no ChangeLog, and tokenizer.flex is not removed form the trunk. But I would like to hear your opinion since I am sure you have several ideas to improve the patch.

WebKit Review Bot

Comment 9 2011-12-12 01:32:11 PST

Comment on attachment 118748 [details] core patch Attachment 118748 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/10841018 New failing tests: fast/regions/parsing-region-style-rule.html

Antti Koivisto

Comment 10 2011-12-12 09:36:44 PST

Someone who knows about parsers should take a look. Oliver perhaps?

Oliver Hunt

Comment 11 2011-12-12 13:09:50 PST

Comment on attachment 118748 [details] core patch View in context: https://bugs.webkit.org/attachment.cgi?id=118748&action=review There seems to be a lot of code in this that assumes null-termination of all strings passed to the lexer -- is this guaranteed? For instance if i do var foo = "some big string"; someElement.style=foo.substring(2, 8); (not sure if this exact statement is valid), we may return a StringImpl that is referencing the middle of another string so null termination won't be guaranteed. At the very least i'd like assertions that m_data never exceeds the buffer length. > Source/WebCore/css/CSSParser.cpp:7388 > + return chr | 0x20; Can you add an assertion that chr is in fact an ascii letter? > Source/WebCore/css/CSSParser.cpp:7394 > + || (data[0] == '\\' && isCSSEscape(data[1])); How do we guarantee data is at least 2 characters long? > Source/WebCore/css/CSSParser.cpp:7430 > + } while (isASCIIHexDigit(*data) && --length); I'd like assertions that data doesn't extend beyond the end of the buffer. > Source/WebCore/css/CSSParser.cpp:7533 > + if (UNLIKELY(*m_data == quote)) { > + // String parsing is done. > + ++m_data; > + return; > + } What happens if we have an unterminated string? afaict we just walk off the end of the buffer.

Zoltan Herczeg

Comment 12 2011-12-12 13:25:49 PST

Hey Oliver, actually the CSS parser working in an internal buffer, which is terminated by two '\0'-s, see the following code: void CSSParser::setupParser(const char* prefix, const String& string, const char* suffix) { int length = string.length() + strlen(prefix) + strlen(suffix) + 2; fastFree(m_data); m_data = static_cast<UChar*>(fastMalloc(length * sizeof(UChar))); for (unsigned i = 0; i < strlen(prefix); i++) m_data[i] = prefix[i]; memcpy(m_data + strlen(prefix), string.characters(), string.length() * sizeof(UChar)); unsigned start = strlen(prefix) + string.length(); unsigned end = start + strlen(suffix); for (unsigned i = start; i < end; i++) m_data[i] = suffix[i - start]; m_data[length - 1] = 0; m_data[length - 2] = 0; yy_hold_char = 0; yyleng = 0; yytext = yy_c_buf_p = m_data; yy_hold_char = *yy_c_buf_p; resetRuleBodyMarks(); } Flex needs this, and I actually liked this approach, since it requires less buffer range checks (you know that any non \0 character must be followed at least two \0-s). CSS actually designed that way that '\0' is not part of any token.

Oliver Hunt

Comment 13 2011-12-12 13:28:17 PST

(In reply to comment #12) > Hey Oliver, > > actually the CSS parser working in an internal buffer, which is terminated by two '\0'-s, see the following code: > > void CSSParser::setupParser(const char* prefix, const String& string, const char* suffix) > { > int length = string.length() + strlen(prefix) + strlen(suffix) + 2; > > fastFree(m_data); > m_data = static_cast<UChar*>(fastMalloc(length * sizeof(UChar))); > for (unsigned i = 0; i < strlen(prefix); i++) > m_data[i] = prefix[i]; > > memcpy(m_data + strlen(prefix), string.characters(), string.length() * sizeof(UChar)); > > unsigned start = strlen(prefix) + string.length(); > unsigned end = start + strlen(suffix); > for (unsigned i = start; i < end; i++) > m_data[i] = suffix[i - start]; > > m_data[length - 1] = 0; > m_data[length - 2] = 0; > > yy_hold_char = 0; > yyleng = 0; > yytext = yy_c_buf_p = m_data; > yy_hold_char = *yy_c_buf_p; > resetRuleBodyMarks(); > } > > Flex needs this, and I actually liked this approach, since it requires less buffer range checks (you know that any non \0 character must be followed at least two \0-s). CSS actually designed that way that '\0' is not part of any token. On the downside it requires a copy of the css source -- is a well predicted branch really more expensive than a copy?

Oliver Hunt

Comment 14 2011-12-12 13:33:55 PST

Oh, and i still don't see how the null termination will stop the string parsing from running over the end of the buffer (In reply to comment #12) > Hey Oliver, > > actually the CSS parser working in an internal buffer, which is terminated by two '\0'-s, see the following code: > > void CSSParser::setupParser(const char* prefix, const String& string, const char* suffix) > { > int length = string.length() + strlen(prefix) + strlen(suffix) + 2; > > fastFree(m_data); > m_data = static_cast<UChar*>(fastMalloc(length * sizeof(UChar))); > for (unsigned i = 0; i < strlen(prefix); i++) > m_data[i] = prefix[i]; > > memcpy(m_data + strlen(prefix), string.characters(), string.length() * sizeof(UChar)); > > unsigned start = strlen(prefix) + string.length(); > unsigned end = start + strlen(suffix); > for (unsigned i = start; i < end; i++) > m_data[i] = suffix[i - start]; > > m_data[length - 1] = 0; > m_data[length - 2] = 0; > > yy_hold_char = 0; > yyleng = 0; > yytext = yy_c_buf_p = m_data; > yy_hold_char = *yy_c_buf_p; > resetRuleBodyMarks(); > } > > Flex needs this, and I actually liked this approach, since it requires less buffer range checks (you know that any non \0 character must be followed at least two \0-s). CSS actually designed that way that '\0' is not part of any token.

Andreas Kling

Comment 15 2011-12-12 14:49:58 PST

(In reply to comment #0) > In the current state it is far from ready, I am just curious about EWS. > > On methanol it is more than twice (~2.18x) as fast as the original. > > The result of 3x2 runs in CPU cycles > Original: 1354030040 1377989672 1345539012 > Optimized: 622771368 614838444 635476680 Are these callgrind CPU cycle counts, or measured using a hardware CPU profiler? What's the improvement in wall-time? I'm curious because I've seen some rather unbelievable improvement numbers in association with the "methanol" benchmark more than once before and I'm not convinced it can be trusted. (Examples: bug 66851, bug 64262, bug 63909)

Benjamin Poulain

Comment 16 2011-12-12 14:58:51 PST

> On methanol it is more than twice (~2.18x) as fast as the original. I have seen multiple false or totally erroneous claims based on this "methanol" benchmark. Could you please provide a cross-platform stress test (a webpage or just C++ calling the lexer) so others can verify the results?

Benjamin Poulain

Comment 17 2011-12-12 15:12:33 PST

By the way, I think you should announce the intent of changing the CSS lexer on WebKit-dev. Given the scope of the change, it would be polite to give visibility for every port.

Zoltan Herczeg

Comment 18 2011-12-12 23:32:21 PST

Created attachment 118964 [details] CPU cycle measure The attached patch works with Qt, gcc and x86, but it can be extended to other ports / CPUs if anyone needs it. I measured the values on an Intel Xeon server, since the cycle counters of all cores are synchronized there. Therefore you don't need to worry about CPU affinity. On other machines you may experince "jumpings" as the kernel migrates the process, that is why I like the Xeon. I saw speedup between 2x - 2.5x on any pages, so you can use any page to measure it. The reason of using cycle counters is based on that observation that the tokenizer called thousands of times, but a single run itself takes little time. Thus statistic based measurements are useless in our case.

Oliver Hunt

Comment 19 2011-12-12 23:42:38 PST

Based on other patches i've seen go through it might be worth checking against html5 spec load time (although i suspect that's mostly style calc/resolution rather than rule parsing). Antti can probably give some info on how he's measuring. Can you explain to me how the quoted string lexing doesn't run off the end of the buffer though? I still can't see it.

Zoltan Herczeg

Comment 20 2011-12-12 23:49:47 PST

> Can you explain to me how the quoted string lexing doesn't run off the end of the buffer though? I still can't see it. I am on #webkit now, we can discuss there if you have a little time.

Zoltan Herczeg

Comment 21 2011-12-13 00:04:36 PST

> > Source/WebCore/css/CSSParser.cpp:7388 > > + return chr | 0x20; > > Can you add an assertion that chr is in fact an ascii letter? It doesn't need to be one. I saw this trick in JavaScriptCore/wtf/ASCIICType.h (chr | 0x20) == 'a' if and only if chr is 'A' or 'a'. Nothing else can satisfy this expression, and I use it for character comparison many times. The only exception is the second isCaselessEqual, where I describe the rules applies to that function in the comment, and assertions were added to check the rules. Basically if constantString contains minus sign '-', the cssString must contains only cssLetter-s. > > > Source/WebCore/css/CSSParser.cpp:7394 > > + || (data[0] == '\\' && isCSSEscape(data[1])); > > How do we guarantee data is at least 2 characters long? C++ has a short circuit expression evaluation method, so data[0] == '\\' must be true before it checks data[1]. > Can you explain to me how the quoted string lexing doesn't run off the end of the buffer though? I still can't see it. You are right here, checkString must return with fail if it encounters a \0.

Oliver Hunt

Comment 22 2011-12-13 00:05:18 PST

Comment on attachment 118748 [details] core patch View in context: https://bugs.webkit.org/attachment.cgi?id=118748&action=review > Source/WebCore/css/CSSParser.cpp:7460 > + if (UNLIKELY(*data == quote)) { We don't take this path if we've reached the end of the buffer as *data == 0 > Source/WebCore/css/CSSParser.cpp:7464 > + if (UNLIKELY(*data < 14 && *data > 9 && *data != 11)) { This is checking for *data == 10, 12 or 13 so we don't match 0 here. > Source/WebCore/css/CSSParser.cpp:7469 > + if (LIKELY(data[0] != '\\')) data[0] is 0 (because we're at the end of the buffer) so we take this path, increment data, and go back to the head of the loop

Zoltan Herczeg

Comment 23 2011-12-13 00:08:24 PST

(In reply to comment #22) > (From update of attachment 118748 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=118748&action=review > > > Source/WebCore/css/CSSParser.cpp:7460 > > + if (UNLIKELY(*data == quote)) { > > We don't take this path if we've reached the end of the buffer as *data == 0 > > > Source/WebCore/css/CSSParser.cpp:7464 > > + if (UNLIKELY(*data < 14 && *data > 9 && *data != 11)) { > > This is checking for *data == 10, 12 or 13 so we don't match 0 here. > > > Source/WebCore/css/CSSParser.cpp:7469 > > + if (LIKELY(data[0] != '\\')) > > data[0] is 0 (because we're at the end of the buffer) so we take this path, increment data, and go back to the head of the loop I totally agree this one as I said above :) I fix it of course.

Oliver Hunt

Comment 24 2011-12-13 00:14:01 PST

> > > Source/WebCore/css/CSSParser.cpp:7394 > > > + || (data[0] == '\\' && isCSSEscape(data[1])); > > > > How do we guarantee data is at least 2 characters long? > > C++ has a short circuit expression evaluation method, so data[0] == '\\' must be true before it checks data[1]. Oh right, and we guarantee null termination so even if i have a \ at the end of input this will work. I don't really like all these dependencies on having 1+ null terminations on the input as it will make it harder to safely move to a non-copying lexer in future, but i recognize this is essentially existing behavior so you don't need to change it in this patch.

Zoltan Herczeg

Comment 25 2011-12-13 00:24:38 PST

> On the downside it requires a copy of the css source -- is a well predicted branch really more expensive than a copy? This cannot be avoided, since the CSS tokenizer overwrites escape sequences to real characters: Identifier "x\61y" is changed to "xay" before further processing. So we can only avoid buffer dumplication if the input does not contain any escape sequences. We need to take care of prefix and postfix strings as well somehow. And the worst thing is we don't get the ownership of the input string (not even a RefPtr), so it can be freed during tokenizing. What do you think?

Zoltan Herczeg

Comment 26 2011-12-13 01:35:17 PST

Created attachment 118978 [details] core patch2 Fixed the string parsing bug and added new comments.

WebKit Review Bot

Comment 27 2011-12-13 02:25:20 PST

Comment on attachment 118978 [details] core patch2 Attachment 118978 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/10844274 New failing tests: fast/regions/parsing-region-style-rule.html

Oliver Hunt

Comment 28 2011-12-13 08:40:17 PST

(In reply to comment #25) > > On the downside it requires a copy of the css source -- is a well predicted branch really more expensive than a copy? > > This cannot be avoided, since the CSS tokenizer overwrites escape sequences to real characters: Identifier "x\61y" is changed to "xay" before further processing. So we can only avoid buffer dumplication if the input does not contain any escape sequences. We need to take care of prefix and postfix strings as well somehow. And the worst thing is we don't get the ownership of the input string (not even a RefPtr), so it can be freed during tokenizing. What do you think? The same problem exists when parsing JS -- but copying all strings is a pessimisation, you should only need to copy when there are escape sequences, and they are very rare. Once again: a change to make it non-copying doesn't need to be in this patch :D

Zoltan Herczeg

Comment 29 2011-12-13 13:50:07 PST

Thanks for the review Oliver. Is there anything else I should do with the core part?

Zoltan Herczeg

Comment 30 2011-12-20 03:20:05 PST

Created attachment 120002 [details] full patch, build testing

WebKit Review Bot

Comment 31 2011-12-20 15:00:02 PST

Comment on attachment 120002 [details] full patch, build testing Attachment 120002 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/10992332 New failing tests: fast/regions/parsing-region-style-rule.html

Benjamin Poulain

Comment 32 2011-12-27 03:39:49 PST

Comment on attachment 120002 [details] full patch, build testing View in context: https://bugs.webkit.org/attachment.cgi?id=120002&action=review I just quickly skimmed over the path. Some comments bellow. A general concern is lot of code have the assumption the input is ASCII. Is there no need to handle unicode (identifiers, space, digits, etc?). > Source/WebCore/css/CSSParser.cpp:7446 > + while (UNLIKELY(*data <= ' ' && (typesOfASCIICharacters[*data] == CharacterWhiteSpace))) I am not sure the UNLIKELY() is a good idea here. A bunch of spaces grouped together seems very likely :) > Source/WebCore/css/CSSParser.cpp:8032 > + // This is not a real loop, just allows us to avoid gotos. > + while (true) { Where would the Goto be? I only see one break at the end. The body is a switch(). > Source/WebCore/css/CSSParser.cpp:8098 > + if (!isASCIIDigit(m_data[0])) > + break; Isn't this redundant with the following code? The code of CharacterNumber also handle the dot case. > Source/WebCore/css/CSSParser.cpp:8102 > + bool dotSeen = (m_token == '.'); I haven't read the purpose of this yet but that but I have only seen m_token used with named contants everywhere else.

Zoltan Herczeg

Comment 33 2012-01-02 02:19:36 PST

Hey Benjamin, thank you for the review and Happy New Year! > A general concern is lot of code have the assumption the input is ASCII. Is there no need to handle unicode (identifiers, space, digits, etc?). Yes. In CSS everything > 127 is called "nonascii" and treated as part of an identifier. > > Source/WebCore/css/CSSParser.cpp:8032 > > + // This is not a real loop, just allows us to avoid gotos. > > + while (true) { > > Where would the Goto be? I only see one break at the end. The body is a switch(). It is for /* */ comments. See "case CharacterSlash". Although that is the only place, so we could change that to a goto if you prefer that. > > > Source/WebCore/css/CSSParser.cpp:8098 > > + if (!isASCIIDigit(m_data[0])) > > + break; > > Isn't this redundant with the following code? > The code of CharacterNumber also handle the dot case. A little more difficult. A single '.' is a dot token, while '.0' is a number (zero in this example). > > > Source/WebCore/css/CSSParser.cpp:8102 > > + bool dotSeen = (m_token == '.'); > > I haven't read the purpose of this yet but that but I have only seen m_token used with named contants everywhere else. All m_token < 256 are single character tokens. CSSGrammar.y does not define named constant for dot but I can define one if you think it is better. I'll fix the unlikely issue.

Antti Koivisto

Comment 34 2012-01-02 03:32:16 PST

(In reply to comment #33) > > > Source/WebCore/css/CSSParser.cpp:8032 > > > + // This is not a real loop, just allows us to avoid gotos. > > > + while (true) { > > > > Where would the Goto be? I only see one break at the end. The body is a switch(). > > It is for /* */ comments. See "case CharacterSlash". Although that is the only place, so we could change that to a goto if you prefer that. I don't think WebKit has any sort of strict anti-goto policy. If goto is the most natural way to write this then you should just use it.

Zoltan Herczeg

Comment 35 2012-01-03 05:20:44 PST

Created attachment 120935 [details] updated draft patch All requestes are fixed

WebKit Review Bot

Comment 36 2012-01-03 05:22:40 PST

Attachment 120935 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/CMakeLists.txt', u'Source/W..." exit_code: 1 Source/WebCore/css/CSSParser.cpp:1424: One space before end of line comments [whitespace/comments] [5] Source/WebCore/css/CSSParser.cpp:5185: One space before end of line comments [whitespace/comments] [5] Total errors found: 2 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.

WebKit Review Bot

Comment 37 2012-01-03 06:10:48 PST

Comment on attachment 120935 [details] updated draft patch Attachment 120935 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/11074262 New failing tests: fast/regions/parsing-region-style-rule.html

Zoltan Herczeg

Comment 38 2012-01-04 05:47:12 PST

> New failing tests: > fast/regions/parsing-region-style-rule.html The bot always complain about this, but I couldn't find it in its output (and it works with Qt). Can I check this difference somehow?

Adam Barth

Comment 39 2012-01-04 07:42:59 PST

> The bot always complain about this, but I couldn't find it in its output (and it works with Qt). Can I check this difference somehow? There's no way to read this information back from the bot currently. (That's our number 1 feature request.) If you like, you can try building the Chromium port locally and try running the test: $ ./Tools/Scripts/update-webkit --chromium $ ./Tools/Scripts/build-webkit --chromium Otherwise, you can move forward with your patch, but it would be nice to keep an eye on build.webkit.org when landing to see if this problem manifests itself on the buildbots.

Zoltan Herczeg

Comment 40 2012-01-05 01:27:36 PST

Created attachment 121239 [details] release patch I think I fixed everything. Changelog added. Ready for review.

WebKit Review Bot

Comment 41 2012-01-05 01:29:02 PST

Attachment 121239 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'ChangeLog', u'Source/WebCore/CMakeLists.tx..." exit_code: 1 Source/WebCore/css/CSSParser.cpp:1424: One space before end of line comments [whitespace/comments] [5] Total errors found: 1 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.

Zoltan Herczeg

Comment 42 2012-01-05 23:51:31 PST

The patch now passes all tests, and I think it is ready for a review. Oliver, Benjamin what do you think of this patch?

Oliver Hunt

Comment 43 2012-01-06 13:36:21 PST

Comment on attachment 121239 [details] release patch View in context: https://bugs.webkit.org/attachment.cgi?id=121239&action=review Okay, i'm doing a preliminary r- due to the large number of unnecessary formatting changes sorry. A patch of this scale and complexity shouldn't have unnecessary no-op changes in it :-/ I'll continue with actual content review, but this should let you start working on the formatting stuff without waiting for me to finish this review > Source/WebCore/css/CSSParser.cpp:263 > - int length = string.length() + strlen(prefix) + strlen(suffix) + 2; > + int length = string.length() + strlen(prefix) + strlen(suffix) + 1; Why have we dropped to only a single extra null terminator? I recall (though haven't got that far in the patch) that we rely on two nulls at the end of the string > Source/WebCore/css/CSSParser.cpp:925 > - if (id == CSSValueStatic || > - id == CSSValueRelative || > - id == CSSValueAbsolute || > - id == CSSValueFixed) > + if (id == CSSValueStatic > + || id == CSSValueRelative > + || id == CSSValueAbsolute > + || id == CSSValueFixed) This change seems unnecessary formatting which i'd rather avoid in a patch of this size and complexity > Source/WebCore/css/CSSParser.cpp:939 > - if (id == CSSValueAuto || > - id == CSSValueAlways || > - id == CSSValueAvoid || > - id == CSSValueLeft || > - id == CSSValueRight) > + if (id == CSSValueAuto > + || id == CSSValueAlways > + || id == CSSValueAvoid > + || id == CSSValueLeft > + || id == CSSValueRight) ditto > Source/WebCore/css/CSSParser.cpp:952 > - if (id == CSSValueShow || > - id == CSSValueHide) > + if (id == CSSValueShow > + || id == CSSValueHide) ditto > Source/WebCore/css/CSSParser.cpp:965 > - if (id == CSSValueNormal || > - id == CSSValuePre || > - id == CSSValuePreWrap || > - id == CSSValuePreLine || > - id == CSSValueNowrap) > + if (id == CSSValueNormal > + || id == CSSValuePre > + || id == CSSValuePreWrap > + || id == CSSValuePreLine > + || id == CSSValueNowrap) ditto >> Source/WebCore/css/CSSParser.cpp:1424 >> - case CSSPropertyCounterReset: // [ <identifier> <integer>? ]+ | none | inherit >> + case CSSPropertyCounterReset: // [ <identifier> <integer>? ]+ | none | inherit > > One space before end of line comments [whitespace/comments] [5] ditto > Source/WebCore/css/CSSParser.cpp:2025 > - validPrimitive = true; > + validPrimitive = true; ditto > Source/WebCore/css/CSSParser.cpp:2894 > - return cssValuePool()->createIdentifierValue(id); > + return cssValuePool()->createIdentifierValue(id); ditto > Source/WebCore/css/CSSParser.cpp:5186 > - // it is a color, but a color isn't allowed at this point. > + // it is a color, but a color isn't allowed at this point. ditto > Source/WebCore/css/CSSParser.cpp:5474 > - return id == CSSValueStretch || id == CSSValueRepeat || id == CSSValueSpace || id == CSSValueRound; > + return id == CSSValueStretch || id == CSSValueRepeat || id == CSSValueSpace || id == CSSValueRound; ditto > Source/WebCore/css/CSSParser.cpp:5559 > - m_left = m_cssValuePool->createValue(m_right->getDoubleValue(), (CSSPrimitiveValue::UnitTypes)m_right->primitiveType()); > + m_left = m_cssValuePool->createValue(m_right->getDoubleValue(), (CSSPrimitiveValue::UnitTypes)m_right->primitiveType()); ditto

Oliver Hunt

Comment 44 2012-01-06 13:50:17 PST

Comment on attachment 121239 [details] release patch View in context: https://bugs.webkit.org/attachment.cgi?id=121239&action=review > Source/WebCore/css/CSSParser.cpp:7494 > + if (UNLIKELY(*data < 14 && (!*data || *data == 10 || (*data | 0x1) == 13))) { > + // String parsing is failed for character 0, 10, 12 or 13. > + return 0; > + } yucky constants -- could you use '\n' etc? e.g. *data <= '\r' (i think?), etc > Source/WebCore/css/CSSParser.cpp:7561 > + ASSERT(*m_data >= 14 || (*m_data <= 9 && *m_data) || *m_data == 11); more nasty constants

Zoltan Herczeg

Comment 45 2012-01-11 04:47:45 PST

Created attachment 122006 [details] next patch Style changes are moved to other patches and they are landed. Constants are fixed. I think it is ready for the next review round.

Zoltan Herczeg

Comment 46 2012-01-11 04:50:39 PST

> > Source/WebCore/css/CSSParser.cpp:263 > > - int length = string.length() + strlen(prefix) + strlen(suffix) + 2; > > + int length = string.length() + strlen(prefix) + strlen(suffix) + 1; > > Why have we dropped to only a single extra null terminator? I recall (though haven't got that far in the patch) that we rely on two nulls at the end of the string The double zero terminator is only required by Flex. The attached code does not depend on the double terminator (thanks to C short circuit evaluation model).

Antti Koivisto

Comment 47 2012-01-11 05:05:22 PST

(In reply to comment #18) > The reason of using cycle counters is based on that observation that the tokenizer called thousands of times, but a single run itself takes little time. Thus statistic based measurements are useless in our case. This is not true. Sampling profiler works just fine in cases like this. If you are not seeing progression in something like Instruments or Shark then you are very likely not improving anything.

Zoltan Herczeg

Comment 48 2012-01-17 05:15:45 PST

> This is not true. Sampling profiler works just fine in cases like this. If you are not seeing progression in something like Instruments or Shark then you are very likely not improving anything. I made some profiling on arm. I collected all CSS lexing contributor functions of "opreport -l libQtWebKit.so.4" Was: 505 1.7080 WebCore::CSSParser::lex() 131 0.4431 WebCore::CSSParser::text(int*) 54 0.1826 WebCore::CSSParser::lex(void*) Sum: ~700 sample New: 190 0.6663 WebCore::CSSParser::lex(void*) 12 0.0421 WebCore::CSSParser::detectAtToken(int, bool) 6 0.0210 WebCore::CSSParser::detectNumberToken(unsigned short*, int) 6 0.0210 WebCore::CSSParser::detectNumberToken(unsigned short*, int) 6 0.0210 WebCore::CSSParser::parseIdentifier(unsigned short*&) 6 0.0210 WebCore::CSSParser::parseURI(unsigned short*&, unsigned short*&) 1 0.0035 WebCore::CSSParser::detectFunctionToken(int) Sum: ~240 sample So you are right, you can see that with any profiling tool. Oliver could you review the patch?

Zoltan Herczeg

Comment 49 2012-01-19 14:30:27 PST

Any thoughts?

Zoltan Herczeg

Comment 50 2012-01-23 12:13:13 PST

(In reply to comment #49) > Any thoughts? There is quite a silence here. Can I interpret that as the patch is ready and no more changes are needed?

Oliver Hunt

Comment 51 2012-01-23 12:57:18 PST

(In reply to comment #50) > (In reply to comment #49) > > Any thoughts? > > There is quite a silence here. Can I interpret that as the patch is ready and no more changes are needed? Erk, sorry, i've had plague for the last few days, i'll have a quick look through again, but given i've done the last few rounds of review i'd like at least one other set of eyes to go over it as well -- maybe antti?

Oliver Hunt

Comment 52 2012-01-23 13:09:11 PST

Comment on attachment 122006 [details] next patch Okay I'm convinced this patch is sound, and would give it an r+, I'd like someone else to go through just to double check as I think i've been the only reviewer for the last few revs. Antti art thou up for it? I'll r+ tomorrow if no one objects (anyone knowing the area can r+ if they feel confident in it as well, in the knowledge that i think the patch is sane). Would be nice to know what the raw perf improvement is if you throw a giant pile of css at this though. eg. collect a few hundred meg of css files, concatenate them, and see what overall parsing time becomes.

Antti Koivisto

Comment 53 2012-01-23 14:34:59 PST

Sure, I'll check it out tomorrow.

Zoltan Herczeg

Comment 54 2012-01-24 01:21:45 PST

Created attachment 123705 [details] patch update The previous patch was quite old so I have updated to the latest revision and tested it on Qt. I also made a better ChangeLog. No other changes.

Zoltan Herczeg

Comment 55 2012-01-24 01:22:35 PST

Hey Oliver, thanks for the review!

Antti Koivisto

Comment 56 2012-01-24 06:42:26 PST

Comment on attachment 123705 [details] patch update View in context: https://bugs.webkit.org/attachment.cgi?id=123705&action=review Looks good to me. It would have been nice to see a performance test that shows the improvement (CSS parser is currently big enough CPU user that it should be very possible to construct a synthetic test that shows the progression). Setting to r+ based on Oliver's comment. > Source/WebCore/css/CSSParser.cpp:7953 > + if (hasEscape) > + return; > + > + switch (length) { > + case 12: > + if (isCaselessEqual(name + 2, "ottom-left")) > + m_token = BOTTOMLEFT_SYM; > + return; > + > + case 13: I would prefer less empty lines, here and other places.

Alexey Proskuryakov

Comment 57 2012-01-24 13:04:10 PST

Comment on attachment 123705 [details] patch update View in context: https://bugs.webkit.org/attachment.cgi?id=123705&action=review > Source/WebCore/css/CSSParser.cpp:181 > +static inline UChar lowercaseCharacter(UChar chr) "chr"? Please don't abbreviate like this. I suggest putting this in ASCIICType.h as toASCIILowerUnchecked(). > Source/WebCore/css/CSSParser.cpp:190 > +static inline bool isCaselessEqual(UChar cssCharacter, UChar character) Ditto. > Source/WebCore/css/CSSParser.cpp:7489 > +static inline bool isCSSLetter(UChar chr) Please don't abbreviate "character". > Source/WebCore/css/CSSParser.cpp:7491 > + return chr >= 128 || typesOfASCIICharacters[chr] <= CharacterDash; Even U+FFFD and unassigned code points are CSS letters? > Source/WebCore/css/CSSParser.cpp:7494 > +static inline bool isCSSEscape(UChar chr) Repeated several more times in the patch. > Source/WebCore/css/CSSParser.cpp:7510 > +static inline bool isCaselessEqual(UChar* cssString, const char* constantString) What's important to say in function name is that it's ASCII case insensitive comparison (which is different from Unicode one even for ASCII characters!), and that this depends on CSS syntax. Otherwise, call sites get really confusing. > Source/WebCore/css/CSSParser.cpp:7527 > +static UChar* checkEscape(UChar* data) A check function can return false or true. What does this function do? > Source/WebCore/css/CSSParser.cpp:7543 > + if (isHTMLSpace(*data)) Are HTML and CSS whitespace definitions the same? > Source/WebCore/css/CSSParser.cpp:7550 > +static inline UChar* skipWhiteSpaces(UChar* data) Should be named "skipWhiteSpace" (there is even a CSS keyword skip-white-space, not to mention that this is how the function is named elsewhere in WebCore). > Source/WebCore/css/CSSParser.cpp:7552 > + while (*data <= ' ' && (typesOfASCIICharacters[*data] == CharacterWhiteSpace)) This is a yet another definition of whitespace, with all lower ASCII code being treated as such. If this is right, please explain in a comment. > Source/WebCore/css/CSSParser.cpp:7565 > +inline UChar* CSSParser::checkString(UChar* data, UChar quote) Please use a name that implies returning a UChar*, not a boolean. > Source/WebCore/css/CSSParser.cpp:7619 > +inline bool CSSParser::parseIdentifier(UChar*& result) Some added "parse" functions are void, but this returns a boolean. This is super confusing. Is there a way to have uniform interface for these? Or names that explain the difference? > Source/WebCore/css/CSSParser.cpp:7775 > +inline void CSSParser::detectFunctionToken(int length) Does this actually detect the token? Looks like this function is called after a function has been detected, not to detect it. > Source/WebCore/css/CSSParser.h:383 > + OwnArrayPtr<UChar> m_dataStart; > + UChar* m_data; What is m_data, is it actually m_nextCharacter?

Zoltan Herczeg

Comment 58 2012-01-25 05:59:25 PST

AP ythanks for the in-depth review. I am agree all your name change suggestions. > > Source/WebCore/css/CSSParser.cpp:7491 > > + return chr >= 128 || typesOfASCIICharacters[chr] <= CharacterDash; > > Even U+FFFD and unassigned code points are CSS letters? No they are not letters. Characters > 127 are called non-ASCII in CSS, and they all treated as part of literals regardless what they are. > > Source/WebCore/css/CSSParser.cpp:7543 > > + if (isHTMLSpace(*data)) > > Are HTML and CSS whitespace definitions the same? Yes. It is used several other places in the CSS parser. > > > Source/WebCore/css/CSSParser.cpp:7552 > > + while (*data <= ' ' && (typesOfASCIICharacters[*data] == CharacterWhiteSpace)) > > This is a yet another definition of whitespace, with all lower ASCII code being treated as such. If this is right, please explain in a comment. I changed it to isHTMLSpace. > > Source/WebCore/css/CSSParser.cpp:7619 > > +inline bool CSSParser::parseIdentifier(UChar*& result) > > Some added "parse" functions are void, but this returns a boolean. This is super confusing. Is there a way to have uniform interface for these? Or names that explain the difference? The return value is moved to bool& argument > What is m_data, is it actually m_nextCharacter? It is the currentCharacter more precisely. I renamed to that.

Zoltan Herczeg

Comment 59 2012-01-25 06:01:18 PST

Created attachment 123928 [details] patch with renamings AP, is this patch ok?

WebKit Review Bot

Comment 60 2012-01-25 06:03:16 PST

Attachment 123928 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'ChangeLog', u'Source/JavaScriptCore/Change..." exit_code: 1 Source/WebCore/css/CSSParser.cpp:7510: Use 0 or null instead of NULL (even in *comments*). [readability/null] [4] Source/WebCore/css/CSSParser.cpp:7550: Use 0 or null instead of NULL (even in *comments*). [readability/null] [4] Total errors found: 2 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.

Antti Koivisto

Comment 61 2012-01-25 07:13:59 PST

(In reply to comment #57) > > Source/WebCore/css/CSSParser.cpp:181 > > +static inline UChar lowercaseCharacter(UChar chr) > > "chr"? Please don't abbreviate like this. I disagree with this comment. In type of code where a few variable names are repeated a lot (like parsers) using shorter names improves readability. We use 'i' instead of 'index' as the standard loop variable name for the same reason.

Zoltan Herczeg

Comment 62 2012-01-28 04:55:21 PST

Alexey, do you have any more comments to the patch?

Oliver Hunt

Comment 63 2012-01-29 11:17:03 PST

Comment on attachment 123928 [details] patch with renamings Fix the style bot complaints and r=me, and apparently antti (from the prior one), and given the absence of response from ap i'll assume he's also okay with the current names :D

Zoltan Herczeg

Comment 64 2012-01-30 00:03:03 PST

Thank you for the reviews! Fixed the style issues and landed as http://trac.webkit.org/changeset/106217

Simon Fraser (smfr)

Comment 65 2012-01-30 01:22:06 PST

Are there any pages where we think this is a measurable win, that we could use for perf testing?

Csaba Osztrogonác

Comment 66 2012-01-30 01:25:00 PST

Reopen, because it made css2.1/20110323/eof-002.htm flakey. Sometimes pass, sometimes crash, sometimes fail ... You can easily reproduce the bug with the following command: $ Tools/Scripts/run-webkit-tests css2.1/20110323/eof-002.htm --iterations 100 --run-singly

Zoltan Herczeg

Comment 67 2012-01-30 03:20:10 PST

(In reply to comment #66) > Reopen, because it made css2.1/20110323/eof-002.htm flakey. Sometimes pass, sometimes crash, sometimes fail ... You can easily reproduce the bug with the following command: Thanks Csaba, valgrind also got it. Buildfix landed as http://trac.webkit.org/changeset/106227

Zoltan Herczeg

Comment 68 2012-01-30 05:12:06 PST

Created attachment 124528 [details] parsing test The patch contains the <style> rules from the most popular 25 web-sites (~270k). The test parses it 100 times, and display the time.

Zoltan Herczeg

Comment 69 2012-01-30 05:13:23 PST

> The patch contains the <style> rules from the most popular 25 web-sites (~270k). The test parses it 100 times, and display the time. Oh this is not a patch just a simple page in a zipped file.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Enhancement

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component CSS

Assignee

Nobody

Reported

2011-10-14 06:35 PDT

Modified

2012-05-10 22:35 PDT History

CC List

21 users Show

URL

Keywords

Depends on

69083

Blocks

Dependencies

tree graph

abarth
benjamin
dglazkov
donggwan.kim
eric.carlson
hausmann
kennyluck
kling
koivisto
macpherson
mathias
ojan
oliver
ossy
peter
psolanki
rakuco
simon.fraser
tonikitoo
vestbo
webkit.review.bot