Bug 239585 - Move global AtomStrings to a common header to promote reuse
Summary: Move global AtomStrings to a common header to promote reuse
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2022-04-20 21:06 PDT by Chris Dumez
Modified: 2022-04-22 21:23 PDT (History)
36 users (show)

See Also:


Attachments
Patch (125.67 KB, patch)
2022-04-20 21:15 PDT, Chris Dumez
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (125.68 KB, patch)
2022-04-20 21:25 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (125.45 KB, patch)
2022-04-21 08:24 PDT, Chris Dumez
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (123.74 KB, patch)
2022-04-21 08:38 PDT, Chris Dumez
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (122.22 KB, patch)
2022-04-21 10:37 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (123.02 KB, patch)
2022-04-21 15:16 PDT, Chris Dumez
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (124.75 KB, patch)
2022-04-21 16:57 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (119.90 KB, patch)
2022-04-22 17:45 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Dumez 2022-04-20 21:06:58 PDT
Move global AtomStrings to a common header to promote reuse.
Comment 1 Chris Dumez 2022-04-20 21:15:26 PDT
Created attachment 458035 [details]
Patch
Comment 2 Chris Dumez 2022-04-20 21:25:52 PDT
Created attachment 458038 [details]
Patch
Comment 3 Chris Dumez 2022-04-21 08:24:34 PDT
Created attachment 458063 [details]
Patch
Comment 4 Chris Dumez 2022-04-21 08:38:35 PDT
Created attachment 458065 [details]
Patch
Comment 5 Chris Dumez 2022-04-21 10:37:11 PDT
Created attachment 458075 [details]
Patch
Comment 6 Chris Dumez 2022-04-21 15:16:48 PDT
Created attachment 458097 [details]
Patch
Comment 7 Chris Dumez 2022-04-21 16:57:21 PDT
Created attachment 458103 [details]
Patch
Comment 8 Geoffrey Garen 2022-04-22 10:54:10 PDT
Comment on attachment 458103 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=458103&action=review

r=me

> Source/WebKit/NetworkProcess/NetworkProcess.cpp:322
> +    WebCore::initializeCommonAtomStrings();

As a reluctant expert in JSC::initialize(), WTF::initialize(), and WTF::initializeMainThread(), I am saddened to see yet another "make sure to call this initialization function at various entrypoints otherwise you will crash". Can we fold common atom string initialization into  some other initialization function? Maybe just fold it into AtomString::init()?

In the ideal world, the set of top-level initialization functions should shrink over time. Maybe we can even get down to just two, one for WebKit and one for JSC.
Comment 9 Chris Dumez 2022-04-22 10:56:24 PDT
(In reply to Geoffrey Garen from comment #8)
> Comment on attachment 458103 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=458103&action=review
> 
> r=me
> 
> > Source/WebKit/NetworkProcess/NetworkProcess.cpp:322
> > +    WebCore::initializeCommonAtomStrings();
> 
> As a reluctant expert in JSC::initialize(), WTF::initialize(), and
> WTF::initializeMainThread(), I am saddened to see yet another "make sure to
> call this initialization function at various entrypoints otherwise you will
> crash". Can we fold common atom string initialization into  some other
> initialization function? Maybe just fold it into AtomString::init()?

AtomString::init() is in WTF/. How could it do initialization for WebCore/?

I agree with the sentiment but not sure how to address it.
Comment 10 Darin Adler 2022-04-22 10:58:48 PDT
Comment on attachment 458103 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=458103&action=review

> Source/WebCore/contentextensions/ContentExtensionParser.cpp:208
>      AtomString::init();
> +    initializeCommonAtomStrings();

Seems like we could make this more foolproof by having one function that calls both AtomString::init and WebCore::initializeCommonAtomStrings instead of having both calls in each place. Could be as simple as having initializeCommonAtomStrings call AtomString::init.

> Source/WebCore/html/HTMLFormElement.cpp:1016
> +    return equalIgnoringASCIICase(attributeWithoutSynchronization(autocompleteAttr), "off") ? offAtom() : onAtom();

This can use equalLettersIgnoringASCIICase for slightly better performance.

> Source/WebCore/platform/CommonAtomStrings.cpp:60
> +    static std::once_flag initializeKey;
> +    std::call_once(initializeKey, [] {
> +        // Initialization is not thread safe, so this function must be called from the main thread first.
> +        ASSERT(isUIThread());

Use of std::call_once is a *little* confusing, since the AtomString table can only be used on one thread. I assume it’s specific to the iOS legacy "web thread" machinery; the comment doesn’t make that entirely clear.

If it wasn't iOS, presumably we would not need the thread safety from std::call_once and could just assert isMainThread. Might even be nice to write it that way if there was a clean way to do so.

I suggest moving the assertion out of the call_once, because we don’t want anyone to write code that calls initializeCommonAtomStrings on something that is not a UI thread, even if a particular call site happens to not to be the first caller at *this* time.

> Source/WebCore/platform/CommonAtomStrings.h:33
> +extern MainThreadLazyNeverDestroyed<const AtomString> alternativeAtomData;

I find it annoying to have these list repeated 4 times. Even if we use macros to achieve that, would be nice not to repeat the names 4 times, along with the strings.

> Source/WebCore/platform/CommonAtomStrings.h:81
> +WEBCORE_EXPORT void initializeCommonAtomStrings();

Is it a good tradeoff to always initialize all of these? Eventually as the list gets long that might not be so great, instead of lazy initialization do it all up front, and have them all in the AtomString table. This has given me pause in past when creating things like EventNames.h.

> Source/WebCore/svg/SVGAnimationElement.cpp:322
> +static const AtomString& sumAtom()

You decided to factor this out, but not move it into CommonAtomStrings.h?

> Source/WebCore/svg/animation/SVGSMILElement.cpp:71
> +static const AtomString& indefiniteAtom()

Same pattern here too. But it’s not mentioned in the change log. Not sure how you decided when to do which.
Comment 11 Chris Dumez 2022-04-22 11:04:12 PDT
(In reply to Darin Adler from comment #10)
> Comment on attachment 458103 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=458103&action=review
> 
> > Source/WebCore/contentextensions/ContentExtensionParser.cpp:208
> >      AtomString::init();
> > +    initializeCommonAtomStrings();
> 
> Seems like we could make this more foolproof by having one function that
> calls both AtomString::init and WebCore::initializeCommonAtomStrings instead
> of having both calls in each place. Could be as simple as having
> initializeCommonAtomStrings call AtomString::init.

Good idea.

> > Source/WebCore/html/HTMLFormElement.cpp:1016
> > +    return equalIgnoringASCIICase(attributeWithoutSynchronization(autocompleteAttr), "off") ? offAtom() : onAtom();
> 
> This can use equalLettersIgnoringASCIICase for slightly better performance.
> 

Ok.

> > Source/WebCore/platform/CommonAtomStrings.cpp:60
> > +    static std::once_flag initializeKey;
> > +    std::call_once(initializeKey, [] {
> > +        // Initialization is not thread safe, so this function must be called from the main thread first.
> > +        ASSERT(isUIThread());
> 
> Use of std::call_once is a *little* confusing, since the AtomString table
> can only be used on one thread. I assume it’s specific to the iOS legacy
> "web thread" machinery; the comment doesn’t make that entirely clear.
> 
> If it wasn't iOS, presumably we would not need the thread safety from
> std::call_once and could just assert isMainThread. Might even be nice to
> write it that way if there was a clean way to do so.
> I suggest moving the assertion out of the call_once, because we don’t want
> anyone to write code that calls initializeCommonAtomStrings on something
> that is not a UI thread, even if a particular call site happens to not to be
> the first caller at *this* time.

I was just copying the pattern in AtomString.

> > Source/WebCore/platform/CommonAtomStrings.h:33
> > +extern MainThreadLazyNeverDestroyed<const AtomString> alternativeAtomData;
> 
> I find it annoying to have these list repeated 4 times. Even if we use
> macros to achieve that, would be nice not to repeat the names 4 times, along
> with the strings.
> 
> > Source/WebCore/platform/CommonAtomStrings.h:81
> > +WEBCORE_EXPORT void initializeCommonAtomStrings();
> 
> Is it a good tradeoff to always initialize all of these? Eventually as the
> list gets long that might not be so great, instead of lazy initialization do
> it all up front, and have them all in the AtomString table. This has given
> me pause in past when creating things like EventNames.h.

So maybe instead of an upfront initialization, I can move to functions with an out of line body that has a:
MainThreadNeverDestroyed<const AtomString> alternativeAtomData("alternative"_s);

Then there wouldn't be any regression compared to existing code, would there?

> > Source/WebCore/svg/SVGAnimationElement.cpp:322
> > +static const AtomString& sumAtom()
> 
> You decided to factor this out, but not move it into CommonAtomStrings.h?
> 
> > Source/WebCore/svg/animation/SVGSMILElement.cpp:71
> > +static const AtomString& indefiniteAtom()
> 
> Same pattern here too. But it’s not mentioned in the change log. Not sure
> how you decided when to do which.

I could. I didn't do it because the callers were all in the same cpp so they just needed to be file global, not project global.
Comment 12 Darin Adler 2022-04-22 11:17:59 PDT
Comment on attachment 458103 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=458103&action=review

>>> Source/WebCore/platform/CommonAtomStrings.h:33
>>> +extern MainThreadLazyNeverDestroyed<const AtomString> alternativeAtomData;
>> 
>> I find it annoying to have these list repeated 4 times. Even if we use macros to achieve that, would be nice not to repeat the names 4 times, along with the strings.
> 
> So maybe instead of an upfront initialization, I can move to functions with an out of line body that has a:
> MainThreadNeverDestroyed<const AtomString> alternativeAtomData("alternative"_s);
> 
> Then there wouldn't be any regression compared to existing code, would there?

Probably don’t need to make a change.

It’s a tradeoff. Paying for a function call and a check if it’s already initialized each time the string is used vs. using speed, more memory, and more space in the AtomString table at the time WebCore is initialized.

Honestly I don’t know which side of the tradeoff we fall on, but I do know it changes as we add more and more strings, especially if they aren’t all used depending on web content.

There are possible optimizations for creating them too, some of which apply only if we do them all at once. Adding many strings to the AtomString table at once could likely be optimized; if nothing else, we could size the hash table before adding all the new entries instead of possibly doing multiple rehashes during the creation process, but maybe something even better. And with either approach, we could calculate the hashes at compile time instead of runtime for these strings. Not sure which of these techniques are important enough.

It’s tempting to have something where we know that callers just fetch a global pointer and do no other work, I think that’s the kind of efficiency we have for emptyString() and emptyAtom for example, and you have that here.

Good news is that since you are using function calls at each call site, we can change it later if we decide we got it wrong, without affecting the call sites.
Comment 13 Chris Dumez 2022-04-22 17:45:04 PDT
Created attachment 458190 [details]
Patch
Comment 14 EWS 2022-04-22 21:22:40 PDT
Committed r293285 (249910@main): <https://commits.webkit.org/249910@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 458190 [details].
Comment 15 Radar WebKit Bug Importer 2022-04-22 21:23:15 PDT
<rdar://problem/92204746>