Tested using the version of Safari deployed on wpt.fyi; I don't have a Mac available while working from home. URL: http://wpt.live/dom/nodes/ParentNode-querySelector-escapes.html Other browsers: Chrome fails, Firefox passes Per spec, selectors are parsed with a preprocessing step that transforms U+0000 to U+FFFD. This does not seem to be implemented, at least in the case tested here.
Looking at our code it looks like we indeed have no preprocessing stage and our CSSParser implementation tries to deal with this fact as a result. E.g.: static bool isNewLine(UChar cc) { // We check \r and \f here, since we have no preprocessing stage return (cc == '\r' || cc == '\n' || cc == '\f'); }
CSSTokenizer::CSSTokenizer(const String& string) : m_input(string) { // According to the spec, we should perform preprocessing here. // See: http://dev.w3.org/csswg/css-syntax/#input-preprocessing // // However, we can skip this step since: // * We're using HTML spaces (which accept \r and \f as a valid white space) // * Do not count white spaces // * CSSTokenizerInputStream::nextInputChar() replaces NULLs for replacement characters Clearly it is not 100% working though.
Created attachment 395820 [details] Patch
Created attachment 395844 [details] Patch
Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review > Source/WebCore/css/parser/CSSTokenizer.cpp:44 > +static String& preprocessString(String& string) Doesn’t seem like a great pattern to take a String& and also return it. I suggest we take either String&& or const String& and return a String. > Source/WebCore/css/parser/CSSTokenizer.cpp:48 > + return string.replace('\0', 0xFFFD); We have a named constant for this, replacementCharacter, in <wtf/unicode/CharacterNames.h>. Let's use that. > Source/WebCore/css/parser/CSSTokenizer.cpp:208 > + if (m_input.peek(0) == '!' > + && m_input.peek(1) == '-' > + && m_input.peek(2) == '-') { We could make this a one-liner now, easier to read. > Source/WebCore/css/parser/CSSTokenizer.cpp:227 > + if (m_input.peek(0) == '-' > + && m_input.peek(1) == '>') { We could make this a one-liner now, easier to read. > Source/WebCore/css/parser/CSSTokenizer.cpp:330 > + if (m_input.peek(0) == '+' > + && (isASCIIHexDigit(m_input.peek(1)) || m_input.peek(1) == '?')) { We could make this a one-liner now, easier to read. > Source/WebCore/css/parser/CSSTokenizer.h:49 > - explicit CSSTokenizer(const String&); > - CSSTokenizer(const String&, CSSParserObserverWrapper&); // For the inspector > + explicit CSSTokenizer(String); > + CSSTokenizer(String, CSSParserObserverWrapper&); // For the inspector We could have made this change without changing the argument type; it would simple cost a little reference count churn. If we want to change the interface to avoid the churn, then I suggest a String&& argument instead of a String argument. Or maybe you just like the simplicity of just String? > Source/WebCore/css/parser/CSSTokenizerInputStream.h:50 > return '\0'; Messy that we have kEndOfFileMarker but don’t use it much. > Source/WebCore/css/parser/CSSTokenizerInputStream.h:56 > + UChar peek(unsigned lookaheadOffset) const I almost wish this was just operator[] instead a named function.
Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >> +static String& preprocessString(String& string) > > Doesn’t seem like a great pattern to take a String& and also return it. > > I suggest we take either String&& or const String& and return a String. I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. >> Source/WebCore/css/parser/CSSTokenizer.cpp:48 >> + return string.replace('\0', 0xFFFD); > > We have a named constant for this, replacementCharacter, in <wtf/unicode/CharacterNames.h>. Let's use that. Ok. >> Source/WebCore/css/parser/CSSTokenizer.cpp:208 >> + && m_input.peek(2) == '-') { > > We could make this a one-liner now, easier to read. Will fix those.
Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>> +static String& preprocessString(String& string) >> >> Doesn’t seem like a great pattern to take a String& and also return it. >> >> I suggest we take either String&& or const String& and return a String. > > I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. Sure, lets just use m_input instead of string inside the constructors.
Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >>>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>>> +static String& preprocessString(String& string) >>> >>> Doesn’t seem like a great pattern to take a String& and also return it. >>> >>> I suggest we take either String&& or const String& and return a String. >> >> I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. > > Sure, lets just use m_input instead of string inside the constructors. m_input is not a String though and does not really store the String either (just the StringImpl).
Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >>>>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>>>> +static String& preprocessString(String& string) >>>> >>>> Doesn’t seem like a great pattern to take a String& and also return it. >>>> >>>> I suggest we take either String&& or const String& and return a String. >>> >>> I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. >> >> Sure, lets just use m_input instead of string inside the constructors. > > m_input is not a String though and does not really store the String either (just the StringImpl). The proposal I have is to make the constructors private and add a makeCSSTokenizer() free function that would construct one for you, preprocessing the string before passing it to the constructors.
Created attachment 395867 [details] Patch
Comment on attachment 395867 [details] Patch Some inspector tests are crashing. I must have introduced a bug in this latest iteration.
Created attachment 395881 [details] Patch
Committed r259773: <https://trac.webkit.org/changeset/259773> All reviewed patches have been landed. Closing bug and clearing flags on attachment 395881 [details].
<rdar://problem/61488027>