Bug 240942
Summary: | Make StringView(const char*) private | ||
---|---|---|---|
Product: | WebKit | Reporter: | Chris Dumez <cdumez> |
Component: | Web Template Framework | Assignee: | Chris Dumez <cdumez> |
Status: | RESOLVED FIXED | ||
Severity: | Normal | CC: | darin, Hironori.Fujii, webkit-bug-importer |
Priority: | P2 | Keywords: | InRadar |
Version: | WebKit Nightly Build | ||
Hardware: | Unspecified | ||
OS: | Unspecified |
Chris Dumez
Make StringView(const char*) private and update existing call sites to use either StringView(ASCIILiteral) or StringView::fromLatin1(const char*).
Attachments | ||
---|---|---|
Add attachment proposed patch, testcase, etc. |
Chris Dumez
Pull request: https://github.com/WebKit/WebKit/pull/1040
EWS
Committed r294916 (251034@main): <https://commits.webkit.org/251034@main>
Reviewed commits have been landed. Closing PR #1040 and removing active labels.
Radar WebKit Bug Importer
<rdar://problem/94007259>
Fujii Hironori
bug#250017 is going to reimplement makeString with using more StringView internally, and makeString accepts a char pointer.
What is the reason you disallow constructing a StringView from a char pointer?
Why do we have to construct ASCIILiteral just to pass into a StringView?
Chris Dumez
(In reply to Fujii Hironori from comment #4)
> bug#250017 is going to reimplement makeString with using more StringView
> internally, and makeString accepts a char pointer.
> What is the reason you disallow constructing a StringView from a char
> pointer?
> Why do we have to construct ASCIILiteral just to pass into a StringView?
As I remember it, we merely replaced the `StringView(const char*)` constructor with a `StringView::fromLatin1(const char*)` function, to encourage people to use `StringView(ASCIILiteral)` whenever they are actually dealing with a string literal.
If you not dealing with a string literal and just dealing with a `const char*`, just use `StringView::fromLatin1(const char*)`. Don't construct an ASCIILiteral from something that is not a string literal, that would lead to bugs.
Fujii Hironori
(In reply to Chris Dumez from comment #5)
> As I remember it, we merely replaced the `StringView(const char*)`
> constructor with a `StringView::fromLatin1(const char*)` function, to
> encourage people to use `StringView(ASCIILiteral)` whenever they are
> actually dealing with a string literal.
In my understanding, we use ASCIILiteral to pass a string literal into a String.
> If you not dealing with a string literal and just dealing with a `const
> char*`, just use `StringView::fromLatin1(const char*)`.
`StringView::fromLatin1(const char*)` just calls `StringView::StringView(const char*)` constructor.
> Don't construct an
> ASCIILiteral from something that is not a string literal, that would lead to
> bugs.
Right.
But, it's not a problem to construct a StringView from a char* , something that is not a string literal.
Chris Dumez
(In reply to Fujii Hironori from comment #6)
> (In reply to Chris Dumez from comment #5)
> > As I remember it, we merely replaced the `StringView(const char*)`
> > constructor with a `StringView::fromLatin1(const char*)` function, to
> > encourage people to use `StringView(ASCIILiteral)` whenever they are
> > actually dealing with a string literal.
>
> In my understanding, we use ASCIILiteral to pass a string literal into a
> String.
We use ASCIILiteral to represent any string literal in WebKit, whether to construct a String or something else. This allows the recipient code to make some optimization if appropriate.
>
> > If you not dealing with a string literal and just dealing with a `const
> > char*`, just use `StringView::fromLatin1(const char*)`.
>
> `StringView::fromLatin1(const char*)` just calls
> `StringView::StringView(const char*)` constructor.
Yes, so what's the problem with you using it? It does what you want. It just makes it clearer that we're treating the input as latin1, which we thought was a good idea.
>
> > Don't construct an
> > ASCIILiteral from something that is not a string literal, that would lead to
> > bugs.
>
> Right.
> But, it's not a problem to construct a StringView from a char* , something
> that is not a string literal.
It is not a problem. We just wanted to be explicit about it being treated as latin1 iirc, exactly like we did for String.
Chris Dumez
(In reply to Chris Dumez from comment #7)
> (In reply to Fujii Hironori from comment #6)
> > (In reply to Chris Dumez from comment #5)
> > > As I remember it, we merely replaced the `StringView(const char*)`
> > > constructor with a `StringView::fromLatin1(const char*)` function, to
> > > encourage people to use `StringView(ASCIILiteral)` whenever they are
> > > actually dealing with a string literal.
> >
> > In my understanding, we use ASCIILiteral to pass a string literal into a
> > String.
>
> We use ASCIILiteral to represent any string literal in WebKit, whether to
> construct a String or something else. This allows the recipient code to make
> some optimization if appropriate.
>
> >
> > > If you not dealing with a string literal and just dealing with a `const
> > > char*`, just use `StringView::fromLatin1(const char*)`.
> >
> > `StringView::fromLatin1(const char*)` just calls
> > `StringView::StringView(const char*)` constructor.
>
> Yes, so what's the problem with you using it? It does what you want. It just
> makes it clearer that we're treating the input as latin1, which we thought
> was a good idea.
>
> >
> > > Don't construct an
> > > ASCIILiteral from something that is not a string literal, that would lead to
> > > bugs.
> >
> > Right.
> > But, it's not a problem to construct a StringView from a char* , something
> > that is not a string literal.
>
> It is not a problem. We just wanted to be explicit about it being treated as
> latin1 iirc, exactly like we did for String.
I added Darin in CC just in case. I *think* this change was one he suggested, with the reasoning I cited in my previous comment iirc.
Fujii Hironori
Please grep "makeString". WebKit is using a lot of string literals in it, not with ""_s.
https://searchfox.org/wubkat/search?q=makeString&path=&case=false®exp=false
If a function take a StringView, it'd be nice if we can call it with string literal without ""_s. That's the reason makeString accepts char*, I think.
Chris Dumez
(In reply to Fujii Hironori from comment #9)
> Please grep "makeString". WebKit is using a lot of string literals in it,
> not with ""_s.
> https://searchfox.org/wubkat/
> search?q=makeString&path=&case=false®exp=false
Yes, because we haven't gotten around to fixing this code yet. Nowadays, the expectation in WebKit is that you should be using the _s suffix whenever you are dealing with a String literal.
> If a function take a StringView, it'd be nice if we can call it with string
> literal without ""_s. That's the reason makeString accepts char*, I think.
Darin Adler
(In reply to Fujii Hironori from comment #9)
> If a function take a StringView, it'd be nice if we can call it with string
> literal without ""_s. That's the reason makeString accepts char*, I think.
I am not sure I agree about this, going forward. We should probably move away from this.
The reason makeString accepts const char* is historical. When we created it, ASCIILiteral wasn’t as easy to use as it is now. And I must admit that I often use the const char* version because I like not having to add "_s" to the ends of my string literals, even in new code.
But a const char* is ambiguous about encoding. In WebKit code, it could easily be either UTF-8 or Latin-1, or, less-likely some other encoding. But makeString treats const char* as Latin-1. If the string if UTF-8, then it will actively do the wrong thing.
One really great thing about ASCIILiteral is that it states explicitly that all characters will be ASCII. Such strings can be safely passed to functions that treat the bytes as Latin-1, functions that treat the bytes as UTF-8, or even functions that optimize by assuming all bytes are ASCII, and there are no 0x80-0xFF bytes. This is good for both optimizations and correctness.
A direction where we stop using non-ASCIILiteral string literals entirely is likely a good one for WebKit. I think we should consider setting some guidelines for new code so we can make this transition.
We have had many mistakes where we treat UTF-8 strings as Latin-1 accidentally and it would be good to make this error much more difficult in the future.
We may even want to add UTF8Literal at some point!
Darin Adler
So, I approve of this change Chris made.
Fujii Hironori
Makes sense. Thank you.