Bug 210119

Summary: querySelector("#\u0000") should match an element with ID U+FFFD
Product: WebKit Reporter: Domenic Denicola <d>
Component: CSSAssignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cmarcelo, darin, eric.carlson, esprehn+autocc, ews-watchlist, ggaren, glenn, gyuyoung.kim, jer.noble, koivisto, macpherson, menard, philipj, sergio, simon.fraser, webkit-bug-importer, zalan
Priority: P2 Keywords: InRadar
Version: Safari 13   
Hardware: Unspecified   
OS: Unspecified   
URL: https://drafts.csswg.org/css-syntax/#input-preprocessing
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch none

Domenic Denicola
Reported 2020-04-07 09:24:53 PDT
Tested using the version of Safari deployed on wpt.fyi; I don't have a Mac available while working from home. URL: http://wpt.live/dom/nodes/ParentNode-querySelector-escapes.html Other browsers: Chrome fails, Firefox passes Per spec, selectors are parsed with a preprocessing step that transforms U+0000 to U+FFFD. This does not seem to be implemented, at least in the case tested here.
Attachments
Patch (29.21 KB, patch)
2020-04-08 09:38 PDT, Chris Dumez
no flags
Patch (29.27 KB, patch)
2020-04-08 11:27 PDT, Chris Dumez
no flags
Patch (39.73 KB, patch)
2020-04-08 14:09 PDT, Chris Dumez
no flags
Patch (31.10 KB, patch)
2020-04-08 16:15 PDT, Chris Dumez
no flags
Chris Dumez
Comment 1 2020-04-07 12:06:35 PDT
Looking at our code it looks like we indeed have no preprocessing stage and our CSSParser implementation tries to deal with this fact as a result. E.g.: static bool isNewLine(UChar cc) { // We check \r and \f here, since we have no preprocessing stage return (cc == '\r' || cc == '\n' || cc == '\f'); }
Chris Dumez
Comment 2 2020-04-07 13:29:42 PDT
CSSTokenizer::CSSTokenizer(const String& string) : m_input(string) { // According to the spec, we should perform preprocessing here. // See: http://dev.w3.org/csswg/css-syntax/#input-preprocessing // // However, we can skip this step since: // * We're using HTML spaces (which accept \r and \f as a valid white space) // * Do not count white spaces // * CSSTokenizerInputStream::nextInputChar() replaces NULLs for replacement characters Clearly it is not 100% working though.
Chris Dumez
Comment 3 2020-04-08 09:38:14 PDT
Chris Dumez
Comment 4 2020-04-08 11:27:28 PDT
Darin Adler
Comment 5 2020-04-08 12:09:44 PDT
Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review > Source/WebCore/css/parser/CSSTokenizer.cpp:44 > +static String& preprocessString(String& string) Doesn’t seem like a great pattern to take a String& and also return it. I suggest we take either String&& or const String& and return a String. > Source/WebCore/css/parser/CSSTokenizer.cpp:48 > + return string.replace('\0', 0xFFFD); We have a named constant for this, replacementCharacter, in <wtf/unicode/CharacterNames.h>. Let's use that. > Source/WebCore/css/parser/CSSTokenizer.cpp:208 > + if (m_input.peek(0) == '!' > + && m_input.peek(1) == '-' > + && m_input.peek(2) == '-') { We could make this a one-liner now, easier to read. > Source/WebCore/css/parser/CSSTokenizer.cpp:227 > + if (m_input.peek(0) == '-' > + && m_input.peek(1) == '>') { We could make this a one-liner now, easier to read. > Source/WebCore/css/parser/CSSTokenizer.cpp:330 > + if (m_input.peek(0) == '+' > + && (isASCIIHexDigit(m_input.peek(1)) || m_input.peek(1) == '?')) { We could make this a one-liner now, easier to read. > Source/WebCore/css/parser/CSSTokenizer.h:49 > - explicit CSSTokenizer(const String&); > - CSSTokenizer(const String&, CSSParserObserverWrapper&); // For the inspector > + explicit CSSTokenizer(String); > + CSSTokenizer(String, CSSParserObserverWrapper&); // For the inspector We could have made this change without changing the argument type; it would simple cost a little reference count churn. If we want to change the interface to avoid the churn, then I suggest a String&& argument instead of a String argument. Or maybe you just like the simplicity of just String? > Source/WebCore/css/parser/CSSTokenizerInputStream.h:50 > return '\0'; Messy that we have kEndOfFileMarker but don’t use it much. > Source/WebCore/css/parser/CSSTokenizerInputStream.h:56 > + UChar peek(unsigned lookaheadOffset) const I almost wish this was just operator[] instead a named function.
Chris Dumez
Comment 6 2020-04-08 12:25:02 PDT
Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >> +static String& preprocessString(String& string) > > Doesn’t seem like a great pattern to take a String& and also return it. > > I suggest we take either String&& or const String& and return a String. I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. >> Source/WebCore/css/parser/CSSTokenizer.cpp:48 >> + return string.replace('\0', 0xFFFD); > > We have a named constant for this, replacementCharacter, in <wtf/unicode/CharacterNames.h>. Let's use that. Ok. >> Source/WebCore/css/parser/CSSTokenizer.cpp:208 >> + && m_input.peek(2) == '-') { > > We could make this a one-liner now, easier to read. Will fix those.
Darin Adler
Comment 7 2020-04-08 12:34:32 PDT
Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>> +static String& preprocessString(String& string) >> >> Doesn’t seem like a great pattern to take a String& and also return it. >> >> I suggest we take either String&& or const String& and return a String. > > I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. Sure, lets just use m_input instead of string inside the constructors.
Chris Dumez
Comment 8 2020-04-08 12:44:30 PDT
Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >>>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>>> +static String& preprocessString(String& string) >>> >>> Doesn’t seem like a great pattern to take a String& and also return it. >>> >>> I suggest we take either String&& or const String& and return a String. >> >> I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. > > Sure, lets just use m_input instead of string inside the constructors. m_input is not a String though and does not really store the String either (just the StringImpl).
Chris Dumez
Comment 9 2020-04-08 12:46:44 PDT
Comment on attachment 395844 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=395844&action=review >>>>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>>>> +static String& preprocessString(String& string) >>>> >>>> Doesn’t seem like a great pattern to take a String& and also return it. >>>> >>>> I suggest we take either String&& or const String& and return a String. >>> >>> I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. >> >> Sure, lets just use m_input instead of string inside the constructors. > > m_input is not a String though and does not really store the String either (just the StringImpl). The proposal I have is to make the constructors private and add a makeCSSTokenizer() free function that would construct one for you, preprocessing the string before passing it to the constructors.
Chris Dumez
Comment 10 2020-04-08 14:09:51 PDT
Chris Dumez
Comment 11 2020-04-08 15:30:24 PDT
Comment on attachment 395867 [details] Patch Some inspector tests are crashing. I must have introduced a bug in this latest iteration.
Chris Dumez
Comment 12 2020-04-08 16:15:16 PDT
EWS
Comment 13 2020-04-08 17:43:54 PDT
Committed r259773: <https://trac.webkit.org/changeset/259773> All reviewed patches have been landed. Closing bug and clearing flags on attachment 395881 [details].
Radar WebKit Bug Importer
Comment 14 2020-04-08 17:44:15 PDT
Note You need to log in before you can comment on or make changes to this bug.