Summary: | querySelector("#\u0000") should match an element with ID U+FFFD | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Domenic Denicola <d> | ||||||||||
Component: | CSS | Assignee: | Chris Dumez <cdumez> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Normal | CC: | benjamin, cdumez, cmarcelo, darin, eric.carlson, esprehn+autocc, ews-watchlist, ggaren, glenn, gyuyoung.kim, jer.noble, koivisto, macpherson, menard, philipj, sergio, simon.fraser, webkit-bug-importer, zalan | ||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||
Version: | Safari 13 | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
URL: | https://drafts.csswg.org/css-syntax/#input-preprocessing | ||||||||||||
Attachments: |
|
Description
Domenic Denicola
2020-04-07 09:24:53 PDT
Looking at our code it looks like we indeed have no preprocessing stage and our CSSParser implementation tries to deal with this fact as a result. E.g.: static bool isNewLine(UChar cc) { // We check \r and \f here, since we have no preprocessing stage return (cc == '\r' || cc == '\n' || cc == '\f'); } CSSTokenizer::CSSTokenizer(const String& string) : m_input(string) { // According to the spec, we should perform preprocessing here. // See: http://dev.w3.org/csswg/css-syntax/#input-preprocessing // // However, we can skip this step since: // * We're using HTML spaces (which accept \r and \f as a valid white space) // * Do not count white spaces // * CSSTokenizerInputStream::nextInputChar() replaces NULLs for replacement characters Clearly it is not 100% working though. Created attachment 395820 [details]
Patch
Created attachment 395844 [details]
Patch
Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review > Source/WebCore/css/parser/CSSTokenizer.cpp:44 > +static String& preprocessString(String& string) Doesn’t seem like a great pattern to take a String& and also return it. I suggest we take either String&& or const String& and return a String. > Source/WebCore/css/parser/CSSTokenizer.cpp:48 > + return string.replace('\0', 0xFFFD); We have a named constant for this, replacementCharacter, in <wtf/unicode/CharacterNames.h>. Let's use that. > Source/WebCore/css/parser/CSSTokenizer.cpp:208 > + if (m_input.peek(0) == '!' > + && m_input.peek(1) == '-' > + && m_input.peek(2) == '-') { We could make this a one-liner now, easier to read. > Source/WebCore/css/parser/CSSTokenizer.cpp:227 > + if (m_input.peek(0) == '-' > + && m_input.peek(1) == '>') { We could make this a one-liner now, easier to read. > Source/WebCore/css/parser/CSSTokenizer.cpp:330 > + if (m_input.peek(0) == '+' > + && (isASCIIHexDigit(m_input.peek(1)) || m_input.peek(1) == '?')) { We could make this a one-liner now, easier to read. > Source/WebCore/css/parser/CSSTokenizer.h:49 > - explicit CSSTokenizer(const String&); > - CSSTokenizer(const String&, CSSParserObserverWrapper&); // For the inspector > + explicit CSSTokenizer(String); > + CSSTokenizer(String, CSSParserObserverWrapper&); // For the inspector We could have made this change without changing the argument type; it would simple cost a little reference count churn. If we want to change the interface to avoid the churn, then I suggest a String&& argument instead of a String argument. Or maybe you just like the simplicity of just String? > Source/WebCore/css/parser/CSSTokenizerInputStream.h:50 > return '\0'; Messy that we have kEndOfFileMarker but don’t use it much. > Source/WebCore/css/parser/CSSTokenizerInputStream.h:56 > + UChar peek(unsigned lookaheadOffset) const I almost wish this was just operator[] instead a named function. Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >> +static String& preprocessString(String& string) > > Doesn’t seem like a great pattern to take a String& and also return it. > > I suggest we take either String&& or const String& and return a String. I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. >> Source/WebCore/css/parser/CSSTokenizer.cpp:48 >> + return string.replace('\0', 0xFFFD); > > We have a named constant for this, replacementCharacter, in <wtf/unicode/CharacterNames.h>. Let's use that. Ok. >> Source/WebCore/css/parser/CSSTokenizer.cpp:208 >> + && m_input.peek(2) == '-') { > > We could make this a one-liner now, easier to read. Will fix those. Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>> +static String& preprocessString(String& string) >> >> Doesn’t seem like a great pattern to take a String& and also return it. >> >> I suggest we take either String&& or const String& and return a String. > > I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. Sure, lets just use m_input instead of string inside the constructors. Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >>>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>>> +static String& preprocessString(String& string) >>> >>> Doesn’t seem like a great pattern to take a String& and also return it. >>> >>> I suggest we take either String&& or const String& and return a String. >> >> I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. > > Sure, lets just use m_input instead of string inside the constructors. m_input is not a String though and does not really store the String either (just the StringImpl). Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >>>>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>>>> +static String& preprocessString(String& string) >>>> >>>> Doesn’t seem like a great pattern to take a String& and also return it. >>>> >>>> I suggest we take either String&& or const String& and return a String. >>> >>> I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. >> >> Sure, lets just use m_input instead of string inside the constructors. > > m_input is not a String though and does not really store the String either (just the StringImpl). The proposal I have is to make the constructors private and add a makeCSSTokenizer() free function that would construct one for you, preprocessing the string before passing it to the constructors. Created attachment 395867 [details]
Patch
Comment on attachment 395867 [details]
Patch
Some inspector tests are crashing. I must have introduced a bug in this latest iteration.
Created attachment 395881 [details]
Patch
Committed r259773: <https://trac.webkit.org/changeset/259773> All reviewed patches have been landed. Closing bug and clearing flags on attachment 395881 [details]. |