Bug 240942 - Make StringView(const char*) private
Summary: Make StringView(const char*) private
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2022-05-25 20:52 PDT by Chris Dumez
Modified: 2023-01-27 13:25 PST (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Dumez 2022-05-25 20:52:10 PDT
Make StringView(const char*) private and update existing call sites to use either StringView(ASCIILiteral) or StringView::fromLatin1(const char*).
Comment 1 Chris Dumez 2022-05-25 21:20:12 PDT
Pull request: https://github.com/WebKit/WebKit/pull/1040
Comment 2 EWS 2022-05-26 17:50:39 PDT
Committed r294916 (251034@main): <https://commits.webkit.org/251034@main>

Reviewed commits have been landed. Closing PR #1040 and removing active labels.
Comment 3 Radar WebKit Bug Importer 2022-05-26 17:51:15 PDT
<rdar://problem/94007259>
Comment 4 Fujii Hironori 2023-01-27 04:16:10 PST
bug#250017 is going to reimplement makeString with using more StringView internally, and makeString accepts a char pointer.
What is the reason you disallow constructing a StringView from a char pointer?
Why do we have to construct ASCIILiteral just to pass into a StringView?
Comment 5 Chris Dumez 2023-01-27 07:46:24 PST
(In reply to Fujii Hironori from comment #4)
> bug#250017 is going to reimplement makeString with using more StringView
> internally, and makeString accepts a char pointer.
> What is the reason you disallow constructing a StringView from a char
> pointer?
> Why do we have to construct ASCIILiteral just to pass into a StringView?

As I remember it, we merely replaced the `StringView(const char*)` constructor with a `StringView::fromLatin1(const char*)` function, to encourage people to use `StringView(ASCIILiteral)` whenever they are actually dealing with a string literal.

If you not dealing with a string literal and just dealing with a `const char*`, just use `StringView::fromLatin1(const char*)`. Don't construct an ASCIILiteral from something that is not a string literal, that would lead to bugs.
Comment 6 Fujii Hironori 2023-01-27 12:16:00 PST
(In reply to Chris Dumez from comment #5)
> As I remember it, we merely replaced the `StringView(const char*)`
> constructor with a `StringView::fromLatin1(const char*)` function, to
> encourage people to use `StringView(ASCIILiteral)` whenever they are
> actually dealing with a string literal.

In my understanding, we use ASCIILiteral to pass a string literal into a String.
 
> If you not dealing with a string literal and just dealing with a `const
> char*`, just use `StringView::fromLatin1(const char*)`.

`StringView::fromLatin1(const char*)` just calls `StringView::StringView(const char*)` constructor.

> Don't construct an
> ASCIILiteral from something that is not a string literal, that would lead to
> bugs.

Right.
But, it's not a problem to construct a StringView from a char* , something that is not a string literal.
Comment 7 Chris Dumez 2023-01-27 12:21:17 PST
(In reply to Fujii Hironori from comment #6)
> (In reply to Chris Dumez from comment #5)
> > As I remember it, we merely replaced the `StringView(const char*)`
> > constructor with a `StringView::fromLatin1(const char*)` function, to
> > encourage people to use `StringView(ASCIILiteral)` whenever they are
> > actually dealing with a string literal.
> 
> In my understanding, we use ASCIILiteral to pass a string literal into a
> String.

We use ASCIILiteral to represent any string literal in WebKit, whether to construct a String or something else. This allows the recipient code to make some optimization if appropriate.

>  
> > If you not dealing with a string literal and just dealing with a `const
> > char*`, just use `StringView::fromLatin1(const char*)`.
> 
> `StringView::fromLatin1(const char*)` just calls
> `StringView::StringView(const char*)` constructor.

Yes, so what's the problem with you using it? It does what you want. It just makes it clearer that we're treating the input as latin1, which we thought was a good idea.

> 
> > Don't construct an
> > ASCIILiteral from something that is not a string literal, that would lead to
> > bugs.
> 
> Right.
> But, it's not a problem to construct a StringView from a char* , something
> that is not a string literal.

It is not a problem. We just wanted to be explicit about it being treated as latin1 iirc, exactly like we did for String.
Comment 8 Chris Dumez 2023-01-27 12:22:46 PST
(In reply to Chris Dumez from comment #7)
> (In reply to Fujii Hironori from comment #6)
> > (In reply to Chris Dumez from comment #5)
> > > As I remember it, we merely replaced the `StringView(const char*)`
> > > constructor with a `StringView::fromLatin1(const char*)` function, to
> > > encourage people to use `StringView(ASCIILiteral)` whenever they are
> > > actually dealing with a string literal.
> > 
> > In my understanding, we use ASCIILiteral to pass a string literal into a
> > String.
> 
> We use ASCIILiteral to represent any string literal in WebKit, whether to
> construct a String or something else. This allows the recipient code to make
> some optimization if appropriate.
> 
> >  
> > > If you not dealing with a string literal and just dealing with a `const
> > > char*`, just use `StringView::fromLatin1(const char*)`.
> > 
> > `StringView::fromLatin1(const char*)` just calls
> > `StringView::StringView(const char*)` constructor.
> 
> Yes, so what's the problem with you using it? It does what you want. It just
> makes it clearer that we're treating the input as latin1, which we thought
> was a good idea.
> 
> > 
> > > Don't construct an
> > > ASCIILiteral from something that is not a string literal, that would lead to
> > > bugs.
> > 
> > Right.
> > But, it's not a problem to construct a StringView from a char* , something
> > that is not a string literal.
> 
> It is not a problem. We just wanted to be explicit about it being treated as
> latin1 iirc, exactly like we did for String.

I added Darin in CC just in case. I *think* this change was one he suggested, with the reasoning I cited in my previous comment iirc.
Comment 9 Fujii Hironori 2023-01-27 12:40:14 PST
Please grep "makeString". WebKit is using a lot of string literals in it, not with ""_s.
https://searchfox.org/wubkat/search?q=makeString&path=&case=false&regexp=false

If a function take a StringView, it'd be nice if we can call it with string literal without ""_s. That's the reason makeString accepts char*, I think.
Comment 10 Chris Dumez 2023-01-27 12:41:31 PST
(In reply to Fujii Hironori from comment #9)
> Please grep "makeString". WebKit is using a lot of string literals in it,
> not with ""_s.
> https://searchfox.org/wubkat/
> search?q=makeString&path=&case=false&regexp=false

Yes, because we haven't gotten around to fixing this code yet. Nowadays, the expectation in WebKit is that you should be using the _s suffix whenever you are dealing with a String literal.

> If a function take a StringView, it'd be nice if we can call it with string
> literal without ""_s. That's the reason makeString accepts char*, I think.
Comment 11 Darin Adler 2023-01-27 12:50:16 PST
(In reply to Fujii Hironori from comment #9)
> If a function take a StringView, it'd be nice if we can call it with string
> literal without ""_s. That's the reason makeString accepts char*, I think.

I am not sure I agree about this, going forward. We should probably move away from this.

The reason makeString accepts const char* is historical. When we created it, ASCIILiteral wasn’t as easy to use as it is now. And I must admit that I often use the const char* version because I like not having to add "_s" to the ends of my string literals, even in new code.

But a const char* is ambiguous about encoding. In WebKit code, it could easily be either UTF-8 or Latin-1, or, less-likely some other encoding. But makeString treats const char* as Latin-1. If the string if UTF-8, then it will actively do the wrong thing.

One really great thing about ASCIILiteral is that it states explicitly that all characters will be ASCII. Such strings can be safely passed to functions that treat the bytes as Latin-1, functions that treat the bytes as UTF-8, or even functions that optimize by assuming all bytes are ASCII, and there are no 0x80-0xFF bytes. This is good for both optimizations and correctness.

A direction where we stop using non-ASCIILiteral string literals entirely is likely a good one for WebKit. I think we should consider setting some guidelines for new code so we can make this transition.

We have had many mistakes where we treat UTF-8 strings as Latin-1 accidentally and it would be good to make this error much more difficult in the future.

We may even want to add UTF8Literal at some point!
Comment 12 Darin Adler 2023-01-27 12:51:01 PST
So, I approve of this change Chris made.
Comment 13 Fujii Hironori 2023-01-27 13:25:41 PST
Makes sense. Thank you.