Bug 224125

Summary: Define AtomString(ASCIILiteral) and use ASCIILiteral more to avoid memory allocation
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: New BugsAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: aboxhall, apinheiro, benjamin, cdumez, cfleizach, changseok, cmarcelo, darin, dino, dmazzoni, eric.carlson, esprehn+autocc, ews-watchlist, fmalita, glenn, gyuyoung.kim, jcraig, jdiggs, jer.noble, kangil.han, mifenton, mmaxfield, pdr, philipj, saam, sabouhallawa, samuel_white, schenney, sergio, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch none

Description Yusuke Suzuki 2021-04-02 12:57:16 PDT
Define AtomString(ASCIILiteral) and use ASCIILiteral more to avoid memory allocation
Comment 1 Yusuke Suzuki 2021-04-02 12:59:08 PDT
Created attachment 425044 [details]
Patch
Comment 2 EWS 2021-04-05 17:09:36 PDT
Committed r275457: <https://commits.webkit.org/r275457>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 425044 [details].
Comment 3 Radar WebKit Bug Importer 2021-04-05 17:10:18 PDT
<rdar://problem/76242929>
Comment 4 Darin Adler 2021-04-05 17:37:08 PDT
When Ben Poulain originally did this work, he selected ConstructFromLiteral over ASCIILiteral as the idiom for this. Has something changed since this time?
Comment 5 Yusuke Suzuki 2021-04-05 19:50:15 PDT
(In reply to Darin Adler from comment #4)
> When Ben Poulain originally did this work, he selected ConstructFromLiteral
> over ASCIILiteral as the idiom for this. Has something changed since this
> time?

I added `_s` user-defined-literal. It made creating ASCIILiteral significantly easy & it made hard to make ASCIILiteral with non-static char*.
Comment 6 Darin Adler 2021-04-06 14:27:55 PDT
(In reply to Yusuke Suzuki from comment #5)
> (In reply to Darin Adler from comment #4)
> > When Ben Poulain originally did this work, he selected ConstructFromLiteral
> > over ASCIILiteral as the idiom for this. Has something changed since this
> > time?
> 
> I added `_s` user-defined-literal. It made creating ASCIILiteral
> significantly easy & it made hard to make ASCIILiteral with non-static char*.

Yes, I remember that.

I don’t think that would have affected Ben’s decision. His choice of ConstructFromLiteral was about whether the length was computed at runtime or pre-computed and passed in to the constructor. Not about ease of use at call sites.
Comment 7 Darin Adler 2021-04-06 14:28:32 PDT
A code size vs. performance tradeoff. ConstructFromLiteral was larger, but faster, when he measured it.
Comment 8 Yusuke Suzuki 2021-04-06 15:24:26 PDT
(In reply to Darin Adler from comment #7)
> A code size vs. performance tradeoff. ConstructFromLiteral was larger, but
> faster, when he measured it.

But ConstructFromLiteral does not work for constexpr string's list like this patch, right?
While ASCIILiteral can be constructed via constexpr, AtomString cannot be. And string-literal can be constructed, but in that case size information in the type is missed because we are listing many string-literals in the list.
And I think this is why we are allocating AtomString in this patch's fixed places: ConstructFromLiteral cannot be applied in these places, and we end up allocating AtomString with heap storage.
So, when comparing between allocating AtomString with heap-storage v.s. ASCIILiteral, I think the latter is better.
Comment 9 Darin Adler 2021-04-06 15:39:20 PDT
(In reply to Yusuke Suzuki from comment #8)
> So, when comparing between allocating AtomString with heap-storage v.s.
> ASCIILiteral, I think the latter is better.

Of course! I am not questioning that.

1) For these case involving a literal, do we need an idiom that supplies the size at compile time, like ConstructFromLiteral, or should we use the one that supplies null-terminated characters at compile time, like ASCIILiteral? Both should be the same for heap memory use, because in both cases we can point to the ASCII literal characters.

2) Do we need to keep ConstructFromLiteral at all? If so, when should we use it instead of ASCIILiteral?
Comment 10 Yusuke Suzuki 2021-04-06 15:56:04 PDT
(In reply to Darin Adler from comment #9)
> (In reply to Yusuke Suzuki from comment #8)
> > So, when comparing between allocating AtomString with heap-storage v.s.
> > ASCIILiteral, I think the latter is better.
> 
> Of course! I am not questioning that.

Yeah!

> 1) For these case involving a literal, do we need an idiom that supplies the
> size at compile time, like ConstructFromLiteral, or should we use the one
> that supplies null-terminated characters at compile time, like ASCIILiteral?
> Both should be the same for heap memory use, because in both cases we can
> point to the ASCII literal characters.

I think the ideal implementation is that getting both benefits.
We should have ASCIILiteralWithSize<N> where N is size of string.
Upcoming C++20 allows us to define such a nice user-defined literal.

In C++20, our ASCIILiteral implementation can be,

template<size_t N>
class ASCIILiteralWithSize final : public ASCIILiteral {
public:
    constexpr ASCIILiteralWithSize(const char (&characters)[N])
        : ASCIILiteral(characters)
    { }

    constexpr unsigned length() { return N; }
};

template<ASCIILiteralWithSize result>
constexpr auto operator"" _x2() -> decltype(auto)
{
    return result;
}

Unfortunately, we are still in C++17 era, so we cannot encode size information into the type since size is passed as a function's parameter (not as a template parameter).
So, I think, in the future, we can remove ConstructFromLiteral completely, and just saying "always use _s".

But I wonder if, in most cases, we can always use ASCIILiteral. Now ASCIILiteral construction functions are well-annotated as constexpr.
I think this is likely that the compiler can just fold `ASCIILiteral::length()` call into a constant since it is invoking system-provided `strlen`, which is specially constant-folded in any C compilers typically.


> 2) Do we need to keep ConstructFromLiteral at all? If so, when should we use
> it instead of ASCIILiteral?

Now ASCIILiteral construction is constexpr-annotated, and the compiler can be encouraged by folding strlen into a constant number.
I'm thinking that using ASCIILiteral / ConstructFromLiteral will not change the output.
Comment 12 Darin Adler 2021-04-06 16:08:54 PDT
(In reply to Yusuke Suzuki from comment #10)
> I think the ideal implementation is that getting both benefits.
> We should have ASCIILiteralWithSize<N> where N is size of string.

You say "both benefits".

But this is a *tradeoff*, not two benefits.

A compiled-in length design (as opposed to a runtime-computed length) where we pass the string length is likely faster because it doesn't scan the characters looking for a null character, but likely creates larger code because the number of characters has to be compiled in, not just a pointer to characters with a null character terminator. I suppose if we found a way to enforce a maximum length of 255 and avoid the null character terminator we could store the length as a byte and avoid the code size difference.

> Now ASCIILiteral construction is constexpr-annotated, and the compiler can
> be encouraged by folding strlen into a constant number.
> I'm thinking that using ASCIILiteral / ConstructFromLiteral will not change
> the output.

I’m pretty sure that it *does* change the output.

I agree that because strlen folding can be done by the compiler, we can possibly make something with the compiled-in length behavior, but with nicer syntax like ASCIILiteral.

I’d like to get rid of the awkward ConstructFromLiteral syntax *entirely*. But I am concerned that in doing so I will hurt performance (speed), although likely will improve code size, because I will move from compiled-in length to runtime-computed length.

I was trying to recruit you to help me think this through.