WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
210119
querySelector("#\u0000") should match an element with ID U+FFFD
https://bugs.webkit.org/show_bug.cgi?id=210119
Summary
querySelector("#\u0000") should match an element with ID U+FFFD
Domenic Denicola
Reported
2020-04-07 09:24:53 PDT
Tested using the version of Safari deployed on wpt.fyi; I don't have a Mac available while working from home. URL:
http://wpt.live/dom/nodes/ParentNode-querySelector-escapes.html
Other browsers: Chrome fails, Firefox passes Per spec, selectors are parsed with a preprocessing step that transforms U+0000 to U+FFFD. This does not seem to be implemented, at least in the case tested here.
Attachments
Patch
(29.21 KB, patch)
2020-04-08 09:38 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(29.27 KB, patch)
2020-04-08 11:27 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(39.73 KB, patch)
2020-04-08 14:09 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(31.10 KB, patch)
2020-04-08 16:15 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Chris Dumez
Comment 1
2020-04-07 12:06:35 PDT
Looking at our code it looks like we indeed have no preprocessing stage and our CSSParser implementation tries to deal with this fact as a result. E.g.: static bool isNewLine(UChar cc) { // We check \r and \f here, since we have no preprocessing stage return (cc == '\r' || cc == '\n' || cc == '\f'); }
Chris Dumez
Comment 2
2020-04-07 13:29:42 PDT
CSSTokenizer::CSSTokenizer(const String& string) : m_input(string) { // According to the spec, we should perform preprocessing here. // See:
http://dev.w3.org/csswg/css-syntax/#input-preprocessing
// // However, we can skip this step since: // * We're using HTML spaces (which accept \r and \f as a valid white space) // * Do not count white spaces // * CSSTokenizerInputStream::nextInputChar() replaces NULLs for replacement characters Clearly it is not 100% working though.
Chris Dumez
Comment 3
2020-04-08 09:38:14 PDT
Created
attachment 395820
[details]
Patch
Chris Dumez
Comment 4
2020-04-08 11:27:28 PDT
Created
attachment 395844
[details]
Patch
Darin Adler
Comment 5
2020-04-08 12:09:44 PDT
Comment on
attachment 395844
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=395844&action=review
> Source/WebCore/css/parser/CSSTokenizer.cpp:44 > +static String& preprocessString(String& string)
Doesn’t seem like a great pattern to take a String& and also return it. I suggest we take either String&& or const String& and return a String.
> Source/WebCore/css/parser/CSSTokenizer.cpp:48 > + return string.replace('\0', 0xFFFD);
We have a named constant for this, replacementCharacter, in <wtf/unicode/CharacterNames.h>. Let's use that.
> Source/WebCore/css/parser/CSSTokenizer.cpp:208 > + if (m_input.peek(0) == '!' > + && m_input.peek(1) == '-' > + && m_input.peek(2) == '-') {
We could make this a one-liner now, easier to read.
> Source/WebCore/css/parser/CSSTokenizer.cpp:227 > + if (m_input.peek(0) == '-' > + && m_input.peek(1) == '>') {
We could make this a one-liner now, easier to read.
> Source/WebCore/css/parser/CSSTokenizer.cpp:330 > + if (m_input.peek(0) == '+' > + && (isASCIIHexDigit(m_input.peek(1)) || m_input.peek(1) == '?')) {
We could make this a one-liner now, easier to read.
> Source/WebCore/css/parser/CSSTokenizer.h:49 > - explicit CSSTokenizer(const String&); > - CSSTokenizer(const String&, CSSParserObserverWrapper&); // For the inspector > + explicit CSSTokenizer(String); > + CSSTokenizer(String, CSSParserObserverWrapper&); // For the inspector
We could have made this change without changing the argument type; it would simple cost a little reference count churn. If we want to change the interface to avoid the churn, then I suggest a String&& argument instead of a String argument. Or maybe you just like the simplicity of just String?
> Source/WebCore/css/parser/CSSTokenizerInputStream.h:50 > return '\0';
Messy that we have kEndOfFileMarker but don’t use it much.
> Source/WebCore/css/parser/CSSTokenizerInputStream.h:56 > + UChar peek(unsigned lookaheadOffset) const
I almost wish this was just operator[] instead a named function.
Chris Dumez
Comment 6
2020-04-08 12:25:02 PDT
Comment on
attachment 395844
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=395844&action=review
>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >> +static String& preprocessString(String& string) > > Doesn’t seem like a great pattern to take a String& and also return it. > > I suggest we take either String&& or const String& and return a String.
I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up.
>> Source/WebCore/css/parser/CSSTokenizer.cpp:48 >> + return string.replace('\0', 0xFFFD); > > We have a named constant for this, replacementCharacter, in <wtf/unicode/CharacterNames.h>. Let's use that.
Ok.
>> Source/WebCore/css/parser/CSSTokenizer.cpp:208 >> + && m_input.peek(2) == '-') { > > We could make this a one-liner now, easier to read.
Will fix those.
Darin Adler
Comment 7
2020-04-08 12:34:32 PDT
Comment on
attachment 395844
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=395844&action=review
>>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>> +static String& preprocessString(String& string) >> >> Doesn’t seem like a great pattern to take a String& and also return it. >> >> I suggest we take either String&& or const String& and return a String. > > I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up.
Sure, lets just use m_input instead of string inside the constructors.
Chris Dumez
Comment 8
2020-04-08 12:44:30 PDT
Comment on
attachment 395844
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=395844&action=review
>>>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>>> +static String& preprocessString(String& string) >>> >>> Doesn’t seem like a great pattern to take a String& and also return it. >>> >>> I suggest we take either String&& or const String& and return a String. >> >> I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. > > Sure, lets just use m_input instead of string inside the constructors.
m_input is not a String though and does not really store the String either (just the StringImpl).
Chris Dumez
Comment 9
2020-04-08 12:46:44 PDT
Comment on
attachment 395844
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=395844&action=review
>>>>> Source/WebCore/css/parser/CSSTokenizer.cpp:44 >>>>> +static String& preprocessString(String& string) >>>> >>>> Doesn’t seem like a great pattern to take a String& and also return it. >>>> >>>> I suggest we take either String&& or const String& and return a String. >>> >>> I was trying to reduce the amount of refactoring because the constructor uses the string both to initialize m_input (which is a CSSTokenizerInputStream) and then later on in its body. Taking a String& made this convenient. However, I agree this is not a great pattern so I will clean this up. >> >> Sure, lets just use m_input instead of string inside the constructors. > > m_input is not a String though and does not really store the String either (just the StringImpl).
The proposal I have is to make the constructors private and add a makeCSSTokenizer() free function that would construct one for you, preprocessing the string before passing it to the constructors.
Chris Dumez
Comment 10
2020-04-08 14:09:51 PDT
Created
attachment 395867
[details]
Patch
Chris Dumez
Comment 11
2020-04-08 15:30:24 PDT
Comment on
attachment 395867
[details]
Patch Some inspector tests are crashing. I must have introduced a bug in this latest iteration.
Chris Dumez
Comment 12
2020-04-08 16:15:16 PDT
Created
attachment 395881
[details]
Patch
EWS
Comment 13
2020-04-08 17:43:54 PDT
Committed
r259773
: <
https://trac.webkit.org/changeset/259773
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 395881
[details]
.
Radar WebKit Bug Importer
Comment 14
2020-04-08 17:44:15 PDT
<
rdar://problem/61488027
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug