Bug 222885 - Formalize rules for what is an Experimental Feature
Summary: Formalize rules for what is an Experimental Feature
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on: 247926
Blocks:
  Show dependency treegraph
 
Reported: 2021-03-07 08:54 PST by Sam Weinig
Modified: 2023-04-06 18:08 PDT (History)
16 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sam Weinig 2021-03-07 08:54:30 PST
We should formalize rules for what is an Experimental Feature and what isn't. Currently, it is a hodge-podge of features that don't work at all, features we would like to ship one day, and features we have already shipped. Having a clear set of rules would make things much easier to reason about.
Comment 1 Sam Weinig 2021-03-07 08:58:47 PST
I propose the following rules:

Experimental Features are:

- Features that have not shipped in an official release.
- Off by default (this means that if the experiment is disabling something, it should be phrased as a negative, e.g. Disable SQL Databases)
- Stable enough that we would like people to try them out.
Comment 2 Simon Fraser (smfr) 2021-03-08 09:58:56 PST
Would be good to clarify what happens when:
1. the feature gets stable enough to be on by default, but has not been shipped yet.
2. the feature has been shipped in an OS release (by Apple, at least)

There is utility in turning the feature off for QA testing (or, for a web dev, to test  behavior in older Safari versions), so the switch needs to be exposed somewhere.
Comment 3 Sam Weinig 2021-03-08 14:08:03 PST
(In reply to Simon Fraser (smfr) from comment #2)
> Would be good to clarify what happens when:
> 1. the feature gets stable enough to be on by default, but has not been
> shipped yet.

This might be good argument against my "Off by default" rule.

> 2. the feature has been shipped in an OS release (by Apple, at least)

This is a good point. Perhaps "experimental" status needs to be a per-port concept. Perhaps we should have an independent list of Features that are experimental, instead of binding it into the preferences themselves.

> There is utility in turning the feature off for QA testing (or, for a web
> dev, to test  behavior in older Safari versions), so the switch needs to be
> exposed somewhere.

Agreed. I think once a feature has been shipped on by default, using an Internal Setting should cover this.


Ok, updated proposal:

Experimental Features are:

- Features that have not shipped in an official release.
- Preferred to be off by default (this means that if the experiment is disabling something, it should be phrased as a negative, e.g. Disable SQL Databases), but a stable feature waiting for an upcoming release can stay in the experimental features set on by default until has shipped.
- Stable enough that we would like people to try them out.


Features that still have utility in being configurable after they ship should move to the Internal Features set.


Implementation Changes:
- Merge all WebPreferences*.yaml files into WebPreferences.yaml
- Add new ExperimentalFeatures.yaml, InternalFeatures.yaml and DebugFeatures.yaml files that list preferences from WebPreferences.yaml that should be in that set. 
- Stop relying on a feature being on by default in LayoutTests when it is an Experimental Feature, and rather add new keys to the defaultValue: dictionary in each preference definition for WebKitTestRunner and DumpRenderTree.
Comment 4 Ryosuke Niwa 2021-03-08 18:10:20 PST
(In reply to Sam Weinig from comment #3)
> Implementation Changes:
> - Merge all WebPreferences*.yaml files into WebPreferences.yaml
> - Add new ExperimentalFeatures.yaml, InternalFeatures.yaml and
> DebugFeatures.yaml files that list preferences from WebPreferences.yaml that
> should be in that set.

Can we have a simple Features.yaml and then specify what's experimental & what's internal/debug in that file? I find it annoying that I have to open different files just because something has changed its status from being experimental to internal, etc... then we can have all the information about which platform/port will enable which feature in this one file.

Maybe we can add some script to automatically spit out the all the features enabled / disabled by default in a given port as well so that figuring out what's enabled / disabled in a given release/branch will be easy as well.
Comment 5 Sam Weinig 2021-03-09 12:53:50 PST
(In reply to Ryosuke Niwa from comment #4)
> (In reply to Sam Weinig from comment #3)
> > Implementation Changes:
> > - Merge all WebPreferences*.yaml files into WebPreferences.yaml
> > - Add new ExperimentalFeatures.yaml, InternalFeatures.yaml and
> > DebugFeatures.yaml files that list preferences from WebPreferences.yaml that
> > should be in that set.
> 
> Can we have a simple Features.yaml and then specify what's experimental &
> what's internal/debug in that file? I find it annoying that I have to open
> different files just because something has changed its status from being
> experimental to internal, etc... then we can have all the information about
> which platform/port will enable which feature in this one file.
> 

I'd at least like to separate the list of all preferences (and their default values, etc) from the classification of experimental vs. internal vs. debug. Perhaps something like:


PreferenceClassification.yaml

Experimental:
   - AccessibilityObjectModelEnabled
   - AspectRatioEnabled
   - { 
         name: AspectRatioOfImgFromWidthAndHeightEnabled
         enabledFor:
             PLATFORM(COCOA): false
             default: true
     }
   - AsyncClipboardAPIEnabled 
   ...       

Internal:
    - AllowViewportShrinkToFitContent
    - AlwaysZoomOnDoubleTap
   ...

etc.
Comment 6 Sam Weinig 2021-03-09 12:57:38 PST
(actually, let's separate out the implementation part of this from the policy and keep this bug to just deciding the policy).
Comment 7 Sam Weinig 2021-03-09 13:01:53 PST
So, without the implementation aspects of the proposal, the latest proposal is:


The Experimental Features set is:

- Features that have not shipped in an official release.
- Preferred to be off by default (this means that if the experiment is disabling something, it should be phrased as a negative, e.g. Disable SQL Databases), but a stable feature waiting for an upcoming release can stay in the experimental features set on by default until has shipped.
- Stable enough that we would like people to try them out.
- A feature or bit of functionality we would like for developers to try out. 

Features that still have utility in being configurable after they ship should move to the Internal Features set.

Debug functionality or functionality never intended to ship should not be included in the Experimental Features set.
Comment 8 Ryosuke Niwa 2021-03-09 22:45:01 PST
(In reply to Sam Weinig from comment #7)
>
> - Features that have not shipped in an official release.

What does an official release mean though? Do any "official" release of any browser in any port quality?
Comment 9 Sam Weinig 2021-03-10 11:00:07 PST
(In reply to Ryosuke Niwa from comment #8)
> (In reply to Sam Weinig from comment #7)
> >
> > - Features that have not shipped in an official release.
> 
> What does an official release mean though? Do any "official" release of any
> browser in any port quality?

I was intending for it to be up to the individual ports to define what it means for them, letting each port define what is experimental for them at the moment.
Comment 10 Ryosuke Niwa 2021-03-10 16:19:48 PST
(In reply to Sam Weinig from comment #9)
> (In reply to Ryosuke Niwa from comment #8)
> > (In reply to Sam Weinig from comment #7)
> > >
> > > - Features that have not shipped in an official release.
> > 
> > What does an official release mean though? Do any "official" release of any
> > browser in any port quality?
> 
> I was intending for it to be up to the individual ports to define what it
> means for them, letting each port define what is experimental for them at
> the moment.

So the definition of an experimental feature is per port?
Comment 11 Sam Weinig 2021-03-10 18:04:05 PST
(In reply to Ryosuke Niwa from comment #10)
> (In reply to Sam Weinig from comment #9)
> > (In reply to Ryosuke Niwa from comment #8)
> > > (In reply to Sam Weinig from comment #7)
> > > >
> > > > - Features that have not shipped in an official release.
> > > 
> > > What does an official release mean though? Do any "official" release of any
> > > browser in any port quality?
> > 
> > I was intending for it to be up to the individual ports to define what it
> > means for them, letting each port define what is experimental for them at
> > the moment.
> 
> So the definition of an experimental feature is per port?

Yes, that is what I am proposing here.
Comment 12 Radar WebKit Bug Importer 2021-03-14 09:55:15 PDT
<rdar://problem/75408862>
Comment 13 Maciej Stachowiak 2021-03-18 18:53:55 PDT
(In reply to Sam Weinig from comment #1)
> I propose the following rules:
> 
> Experimental Features are:
> 
> - Features that have not shipped in an official release.
> - Off by default (this means that if the experiment is disabling something,
> it should be phrased as a negative, e.g. Disable SQL Databases)
> - Stable enough that we would like people to try them out.

What would be the correct status for a feature that meets this first two of these criteria but not the third (but which does have a runtime feature flag)?
Comment 14 Maciej Stachowiak 2021-03-18 18:59:58 PDT
(In reply to Sam Weinig from comment #7)
> So, without the implementation aspects of the proposal, the latest proposal
> is:
> 
> 
> The Experimental Features set is:
> 
> - Features that have not shipped in an official release.
> - Preferred to be off by default (this means that if the experiment is
> disabling something, it should be phrased as a negative, e.g. Disable SQL
> Databases), but a stable feature waiting for an upcoming release can stay in
> the experimental features set on by default until has shipped.
> - Stable enough that we would like people to try them out.
> - A feature or bit of functionality we would like for developers to try out. 
> 
> Features that still have utility in being configurable after they ship
> should move to the Internal Features set.

Seems like this would make it inconvenient for people to try turning such features off (at least in Apple ports), and would create a confusing disconnect for anyone testing on/off comparisons, where they now have to look in a different place.

I can see how it is conceptually wrong to call a shipping feature “experimental” but this aspect of the proposal seems to have serious downsides, unless we change the UI to make internal flags much more easily accessible, and put them closer to experimental flags.

By way of comparison, other browsers with feature flag UI do not move flags to a totally different and less easily accessible place when they become on by default. And I don’t think we should do so either.

Maybe this means we should drop the label “experimental” and just call them something neutral like “feature flags”.

> 
> Debug functionality or functionality never intended to ship should not be
> included in the Experimental Features set.
Comment 15 Sam Weinig 2021-03-22 10:49:01 PDT
(In reply to Maciej Stachowiak from comment #14)
> (In reply to Sam Weinig from comment #7)
> > So, without the implementation aspects of the proposal, the latest proposal
> > is:
> > 
> > 
> > The Experimental Features set is:
> > 
> > - Features that have not shipped in an official release.
> > - Preferred to be off by default (this means that if the experiment is
> > disabling something, it should be phrased as a negative, e.g. Disable SQL
> > Databases), but a stable feature waiting for an upcoming release can stay in
> > the experimental features set on by default until has shipped.
> > - Stable enough that we would like people to try them out.
> > - A feature or bit of functionality we would like for developers to try out. 
> > 
> > Features that still have utility in being configurable after they ship
> > should move to the Internal Features set.
> 
> Seems like this would make it inconvenient for people to try turning such
> features off (at least in Apple ports), and would create a confusing
> disconnect for anyone testing on/off comparisons, where they now have to
> look in a different place.
> 
> I can see how it is conceptually wrong to call a shipping feature
> “experimental” but this aspect of the proposal seems to have serious
> downsides, unless we change the UI to make internal flags much more easily
> accessible, and put them closer to experimental flags.
> 
> By way of comparison, other browsers with feature flag UI do not move flags
> to a totally different and less easily accessible place when they become on
> by default. And I don’t think we should do so either.
> 
> Maybe this means we should drop the label “experimental” and just call them
> something neutral like “feature flags”.

This is getting a bit too into Safari's UI probably, but the Develop menu has other features in it that aren't "experimental" (see the "Disable Styles", "Disable JavaScript" section). There is nothing stopping Safari from adding these features where it is useful to disable them to other parts of the menu. I also think it would be a useful exercise to make a decisions when a feature has shipped if it really is useful to have a way to disable it, and thus it's manual inclusion in the Develop menu will require more thought. I am not convinced that most features need a way to be disabled after shipping, so this seems ok to me.


That said, if we want to make it "easy" to have a way to disable some features after shipping, we could make another group, "Toggle-able Features" say, that exists parallel to the "Experimental" and "Internal" Features groups. Then, Safari could decide how it wants to display these.


I am making this argument because I think it is valuable to have clear delineation of what we think is "experimental", as it makes a clear indication both to those working on WebKit and to web developers on the status of the feature. The more we conflate shipping and experimental status, the muddier and more confusing that becomes.
Comment 16 Maciej Stachowiak 2021-03-22 11:23:57 PDT
(In reply to Sam Weinig from comment #15)
> (In reply to Maciej Stachowiak from comment #14)
> > (In reply to Sam Weinig from comment #7)
> > 
> > 
> > Seems like this would make it inconvenient for people to try turning such
> > features off (at least in Apple ports), and would create a confusing
> > disconnect for anyone testing on/off comparisons, where they now have to
> > look in a different place.
> > 
> > I can see how it is conceptually wrong to call a shipping feature
> > “experimental” but this aspect of the proposal seems to have serious
> > downsides, unless we change the UI to make internal flags much more easily
> > accessible, and put them closer to experimental flags.
> > 
> > By way of comparison, other browsers with feature flag UI do not move flags
> > to a totally different and less easily accessible place when they become on
> > by default. And I don’t think we should do so either.
> > 
> > Maybe this means we should drop the label “experimental” and just call them
> > something neutral like “feature flags”.
> 
> This is getting a bit too into Safari's UI probably, but the Develop menu
> has other features in it that aren't "experimental" (see the "Disable
> Styles", "Disable JavaScript" section). There is nothing stopping Safari
> from adding these features where it is useful to disable them to other parts
> of the menu. I also think it would be a useful exercise to make a decisions
> when a feature has shipped if it really is useful to have a way to disable
> it, and thus it's manual inclusion in the Develop menu will require more
> thought. I am not convinced that most features need a way to be disabled
> after shipping, so this seems ok to me.

I don't think it is reasonable to label flags for disabling recently-enabled features "Internal", and then hand-curate a list at the Safari level of ones that get put in the Develop menu instead of the place where all other Internal flags go. I think this is a poor approach because:

(a) It's misleading to call these types of flags Internal if they are not actually intended to be treated this way.

(b) WebKit has more expertise than Safari on which flags of this type should be offered for convenient disablement. This is analogous to the way WebKit has more expertise in which flags represent experimental features that are ready to try, so we don't ask WebKit-based browsers to manually curate their own list, we give them one.

(c) This hand-curated list would have to be manually kept in sync with the status of things in WebKit, including things where we might remove the runtime feature flag entirely.

(d) We sort of support running old Safari with new WebKit; that would automatically pick up new Experimental flags, but not flags in this category.
 
> 
> That said, if we want to make it "easy" to have a way to disable some
> features after shipping, we could make another group, "Toggle-able Features"
> say, that exists parallel to the "Experimental" and "Internal" Features
> groups. Then, Safari could decide how it wants to display these.

I agree that making a distinct group for recently-enabled features would be a better proposal.

This would also let browsers put both this category and "Experimental" flags in one UI, or in separate places, as they prefer.

Personally, I'd argue that any runtime feature flag representing a web-exposed feature, where we have not yet chosen to remove the flag and leave the feature enabled always, is likely worth toggling for testing. But if we think it needs to be a more case-by-case process, the WebKit project is in a better position to execute that process.


> 
> 
> I am making this argument because I think it is valuable to have clear
> delineation of what we think is "experimental", as it makes a clear
> indication both to those working on WebKit and to web developers on the
> status of the feature. The more we conflate shipping and experimental
> status, the muddier and more confusing that becomes.

Part of the problem here is trying to use a hierarchy of menus as the UI. Other browsers have a page-like experience that lets you toggle flags but also makes clear what the default value is. With such a UI, they have no need to distinguish flags that are default-off, but worth testing (what we call "Experimental") from flags that are enabled by default but sometimes worth turning off for testing; and no need to put them in different places.
Comment 17 Sam Weinig 2021-03-22 11:27:24 PDT
(In reply to Maciej Stachowiak from comment #13)
> (In reply to Sam Weinig from comment #1)
> > I propose the following rules:
> > 
> > Experimental Features are:
> > 
> > - Features that have not shipped in an official release.
> > - Off by default (this means that if the experiment is disabling something,
> > it should be phrased as a negative, e.g. Disable SQL Databases)
> > - Stable enough that we would like people to try them out.
> 
> What would be the correct status for a feature that meets this first two of
> these criteria but not the third (but which does have a runtime feature
> flag)?

It could either be an internal feature or an unannotated feature.
Comment 18 Sam Weinig 2021-03-22 11:34:53 PDT
The only reason the "other" option is "internal features" is that is the only other group we have that is automatically exposed. We can quite easily add more categories(In reply to Sam Weinig from comment #17)
> (In reply to Maciej Stachowiak from comment #13)
> > (In reply to Sam Weinig from comment #1)
> > > I propose the following rules:
> > > 
> > > Experimental Features are:
> > > 
> > > - Features that have not shipped in an official release.
> > > - Off by default (this means that if the experiment is disabling something,
> > > it should be phrased as a negative, e.g. Disable SQL Databases)
> > > - Stable enough that we would like people to try them out.
> > 
> > What would be the correct status for a feature that meets this first two of
> > these criteria but not the third (but which does have a runtime feature
> > flag)?
> 
> It could either be an internal feature or an unannotated feature.

Or a we could add a new category of toggle-able shipped features for it to go in.
Comment 19 Sam Sneddon [:gsnedders] 2021-03-22 17:10:33 PDT
https://chromium.googlesource.com/chromium/src/+/HEAD/third_party/blink/renderer/platform/RuntimeEnabledFeatures.md is the Blink guidelines for this.

In short, they have four basic categories:

 * status: <missing> - anything goes.

 * status: "test" - it must be in a sufficient state to permit internal testing. For example, enabling it should not be known to easily cause crashes, leak memory, or otherwise significantly effect the reliability of bots. Consideration should also be given to the potential for loss of test coverage of shipping behavior.

 * status: "experimental" - it should be far enough along to permit testing by early adopter web developers. Many chromium enthusiasts run with --enable-experimental-web-platform-features, and so promoting a feature to experimental status can be a good way to get early warning of any stability or compatibility problems. If such problems are discovered (e.g. major websites being seriously broken when the feature is enabled), the feature should be demoted back to no status or status: "test" to avoid creating undue problems for such users.

 * status: "stable" - it must be complete and ready for use by all chrome users.

Personally, I quite like this, as it gives a clear indication as to how usable we expect the browser to be with the feature enabled.

Roughly speaking, we currently have three statuses effectively:

 * off-by-default - anything goes

 * enabled in TestOptions::defaults - equivalent of Blink's status: "test"

 * on-by-default - equivalent of Blink's status: "stable"

What we don't have is anything equivalent to their status: "experimental", and as an outsider it’s very hard to distinguish features where they’re off-by-default because (e.g.) just the interfaces have been added with no/little implementation from those where they’re off-by-default because we’d rather they remain off until the next release branches (due to lack of confidence or whatever reason) but are otherwise believed to be complete.

Note that this does have some degree of ramification for https://wpt.fyi's default view, where STP is currently run (at the WebKit team’s request) with only features enabled by default, whereas Chrome Dev is run with --enable-experimental-web-platform-features and Firefox Nightly is run with all features enabled in CI, hence STP is at a disadvantage to the others, but currently there’s no easy route to determine what a reasonable set of features to enable would be (and leads to a bigger loss of coverage for Safari, given Chrome and Firefox both have their beta releases with only their on-by-default settings but Safari beta results aren’t available).

On the whole, my preference would be:

 * Have a single runtime-enabled features file (this means changing status should just be a -1/+1 change of the status line).

 * Adopt the four categories from Blink, defining them in some sensible but largely analogous way, perhaps splitting the status: <missing> into a further status: "debug only" for things like ITP disablement.

 * Expose those four categories through the WebKit API.

At that point, applications embedding WebKit can split the list up in their UI in a more meaningful chunks, and potentially add "enable all experimental features" and "enable all testable features" options too.
Comment 20 Maciej Stachowiak 2021-03-22 18:10:13 PDT
(In reply to Sam Weinig from comment #18)
> The only reason the "other" option is "internal features" is that is the
> only other group we have that is automatically exposed. We can quite easily
> add more categories(In reply to Sam Weinig from comment #17)
> > (In reply to Maciej Stachowiak from comment #13)
> > > (In reply to Sam Weinig from comment #1)
> > > > I propose the following rules:
> > > > 
> > > > Experimental Features are:
> > > > 
> > > > - Features that have not shipped in an official release.
> > > > - Off by default (this means that if the experiment is disabling something,
> > > > it should be phrased as a negative, e.g. Disable SQL Databases)
> > > > - Stable enough that we would like people to try them out.
> > > 
> > > What would be the correct status for a feature that meets this first two of
> > > these criteria but not the third (but which does have a runtime feature
> > > flag)?
> > 
> > It could either be an internal feature or an unannotated feature.
> 
> Or a we could add a new category of toggle-able shipped features for it to
> go in.

By assumption, a feature like this would not be "shipped". To be clear, I was asking about a feature that has not shipped in an official release, is off by default, but is _not_ stable enough for it to be on by default. I'm not sure what the category of "unannotated feature" implies but maybe it's a fit.
Comment 21 Maciej Stachowiak 2021-03-22 18:18:48 PDT
(In reply to Sam Sneddon [:gsnedders] from comment #19)
> https://chromium.googlesource.com/chromium/src/+/HEAD/third_party/blink/
> renderer/platform/RuntimeEnabledFeatures.md is the Blink guidelines for this.
> 
> In short, they have four basic categories:
> 
>  * status: <missing> - anything goes.
> 
>  * status: "test" - it must be in a sufficient state to permit internal
> testing. For example, enabling it should not be known to easily cause
> crashes, leak memory, or otherwise significantly effect the reliability of
> bots. Consideration should also be given to the potential for loss of test
> coverage of shipping behavior.
> 
>  * status: "experimental" - it should be far enough along to permit testing
> by early adopter web developers. Many chromium enthusiasts run with
> --enable-experimental-web-platform-features, and so promoting a feature to
> experimental status can be a good way to get early warning of any stability
> or compatibility problems. If such problems are discovered (e.g. major
> websites being seriously broken when the feature is enabled), the feature
> should be demoted back to no status or status: "test" to avoid creating
> undue problems for such users.
> 
>  * status: "stable" - it must be complete and ready for use by all chrome
> users.
> 
> Personally, I quite like this, as it gives a clear indication as to how
> usable we expect the browser to be with the feature enabled.
> 
> Roughly speaking, we currently have three statuses effectively:
> 
>  * off-by-default - anything goes
> 
>  * enabled in TestOptions::defaults - equivalent of Blink's status: "test"
> 
>  * on-by-default - equivalent of Blink's status: "stable"
> 
> What we don't have is anything equivalent to their status: "experimental",
> and as an outsider it’s very hard to distinguish features where they’re
> off-by-default because (e.g.) just the interfaces have been added with
> no/little implementation from those where they’re off-by-default because
> we’d rather they remain off until the next release branches (due to lack of
> confidence or whatever reason) but are otherwise believed to be complete.
> 
> Note that this does have some degree of ramification for https://wpt.fyi's
> default view, where STP is currently run (at the WebKit team’s request) with
> only features enabled by default, whereas Chrome Dev is run with
> --enable-experimental-web-platform-features and Firefox Nightly is run with
> all features enabled in CI, hence STP is at a disadvantage to the others,
> but currently there’s no easy route to determine what a reasonable set of
> features to enable would be (and leads to a bigger loss of coverage for
> Safari, given Chrome and Firefox both have their beta releases with only
> their on-by-default settings but Safari beta results aren’t available).
> 
> On the whole, my preference would be:
> 
>  * Have a single runtime-enabled features file (this means changing status
> should just be a -1/+1 change of the status line).
> 
>  * Adopt the four categories from Blink, defining them in some sensible but
> largely analogous way, perhaps splitting the status: <missing> into a
> further status: "debug only" for things like ITP disablement.
> 
>  * Expose those four categories through the WebKit API.
> 
> At that point, applications embedding WebKit can split the list up in their
> UI in a more meaningful chunks, and potentially add "enable all experimental
> features" and "enable all testable features" options too.

I like this proposal a lot!

One proposed friendly amendment: let's disallow missing status (build failure if you try to add a feature with no status) and add meaningful statuses for flags that are not test, experimental, or stable. Here's a few I can think of:
* unstable - meant to be an actual feature, which would eventually evolve to "test" status and beyond, but not yet functional and usable.
* internal debug - flags like "Disable Accelerated Compositing" that we think browser devs might need to use to diagnose problems, but that we don't think are reasonable useful for web developers
* developer debug - debugging flags that we think are reasonable for web developers to try to diagnose problems (perhaps the WebRTC flags, currently in their own submenu, would fall in this category)

Maybe pulling on this thread there'd be too many statuses to represent every one, but missing status can be a mistake that's easy to make and hard to spot (hard to notice lack of something), so I think it would be better to have statuses that can apply to any kind of flag we'd like to have.
Comment 22 Ryosuke Niwa 2021-03-22 18:26:05 PDT
(In reply to Maciej Stachowiak from comment #21)
> (In reply to Sam Sneddon [:gsnedders] from comment #19)
> > https://chromium.googlesource.com/chromium/src/+/HEAD/third_party/blink/
> > renderer/platform/RuntimeEnabledFeatures.md is the Blink guidelines for this.
> > 
> > In short, they have four basic categories:
> > 
> >  * status: <missing> - anything goes.
> > 
> >  * status: "test" - it must be in a sufficient state to permit internal
> > testing. For example, enabling it should not be known to easily cause
> > crashes, leak memory, or otherwise significantly effect the reliability of
> > bots. Consideration should also be given to the potential for loss of test
> > coverage of shipping behavior.
> > 
> >  * status: "experimental" - it should be far enough along to permit testing
> > by early adopter web developers. Many chromium enthusiasts run with
> > --enable-experimental-web-platform-features, and so promoting a feature to
> > experimental status can be a good way to get early warning of any stability
> > or compatibility problems. If such problems are discovered (e.g. major
> > websites being seriously broken when the feature is enabled), the feature
> > should be demoted back to no status or status: "test" to avoid creating
> > undue problems for such users.
> > 
> >  * status: "stable" - it must be complete and ready for use by all chrome
> > users.
> > 
> > Personally, I quite like this, as it gives a clear indication as to how
> > usable we expect the browser to be with the feature enabled.
> > 
> > Roughly speaking, we currently have three statuses effectively:
> > 
> >  * off-by-default - anything goes
> > 
> >  * enabled in TestOptions::defaults - equivalent of Blink's status: "test"
> > 
> >  * on-by-default - equivalent of Blink's status: "stable"
> > 
> > What we don't have is anything equivalent to their status: "experimental",
> > and as an outsider it’s very hard to distinguish features where they’re
> > off-by-default because (e.g.) just the interfaces have been added with
> > no/little implementation from those where they’re off-by-default because
> > we’d rather they remain off until the next release branches (due to lack of
> > confidence or whatever reason) but are otherwise believed to be complete.
> > 
> > Note that this does have some degree of ramification for https://wpt.fyi's
> > default view, where STP is currently run (at the WebKit team’s request) with
> > only features enabled by default, whereas Chrome Dev is run with
> > --enable-experimental-web-platform-features and Firefox Nightly is run with
> > all features enabled in CI, hence STP is at a disadvantage to the others,
> > but currently there’s no easy route to determine what a reasonable set of
> > features to enable would be (and leads to a bigger loss of coverage for
> > Safari, given Chrome and Firefox both have their beta releases with only
> > their on-by-default settings but Safari beta results aren’t available).
> > 
> > On the whole, my preference would be:
> > 
> >  * Have a single runtime-enabled features file (this means changing status
> > should just be a -1/+1 change of the status line).
> > 
> >  * Adopt the four categories from Blink, defining them in some sensible but
> > largely analogous way, perhaps splitting the status: <missing> into a
> > further status: "debug only" for things like ITP disablement.
> > 
> >  * Expose those four categories through the WebKit API.
> > 
> > At that point, applications embedding WebKit can split the list up in their
> > UI in a more meaningful chunks, and potentially add "enable all experimental
> > features" and "enable all testable features" options too.
> 
> I like this proposal a lot!
> 
> One proposed friendly amendment: let's disallow missing status (build
> failure if you try to add a feature with no status) and add meaningful
> statuses for flags that are not test, experimental, or stable. Here's a few
> I can think of:
> * unstable - meant to be an actual feature, which would eventually evolve to
> "test" status and beyond, but not yet functional and usable.
> * internal debug - flags like "Disable Accelerated Compositing" that we
> think browser devs might need to use to diagnose problems, but that we don't
> think are reasonable useful for web developers
> * developer debug - debugging flags that we think are reasonable for web
> developers to try to diagnose problems (perhaps the WebRTC flags, currently
> in their own submenu, would fall in this category)
> 
> Maybe pulling on this thread there'd be too many statuses to represent every
> one, but missing status can be a mistake that's easy to make and hard to
> spot (hard to notice lack of something), so I think it would be better to
> have statuses that can apply to any kind of flag we'd like to have.

So the list of categories we're thinking is:
"unstable" - feature in active/stale development
"internal debug" - for WebKit engineers
"developer debug" - toggling flags for web developers
"tests" - enabled by default in DRT/WTR
"experimental" - features enable by default in STP, etc... but not yet stable
"stable" - enabled by default & ready for prime time

I think "prototype" or "exploratory" might be better instead of "unstable". I also feel like the distinction between "experimental" vs "developer debug" might be a bit confusing for web developers.

Maybe we need to explicitly call it some of the flags are for feature enablement and others are for debugging purposes so something like:
"Internal Debug Options"
"Developer Debug Options"
"Exploratory Features"
"Test Enabled Features"
"Experimentally Enabled Features"
"Stably Enabled Features"
Comment 23 Sam Sneddon [:gsnedders] 2021-03-23 07:05:00 PDT
(In reply to Ryosuke Niwa from comment #22)
> (In reply to Maciej Stachowiak from comment #21)
> > I like this proposal a lot!
> > 
> > One proposed friendly amendment: let's disallow missing status (build
> > failure if you try to add a feature with no status) and add meaningful
> > statuses for flags that are not test, experimental, or stable. Here's a few
> > I can think of:
> > * unstable - meant to be an actual feature, which would eventually evolve to
> > "test" status and beyond, but not yet functional and usable.
> > * internal debug - flags like "Disable Accelerated Compositing" that we
> > think browser devs might need to use to diagnose problems, but that we don't
> > think are reasonable useful for web developers
> > * developer debug - debugging flags that we think are reasonable for web
> > developers to try to diagnose problems (perhaps the WebRTC flags, currently
> > in their own submenu, would fall in this category)
> > 
> > Maybe pulling on this thread there'd be too many statuses to represent every
> > one, but missing status can be a mistake that's easy to make and hard to
> > spot (hard to notice lack of something), so I think it would be better to
> > have statuses that can apply to any kind of flag we'd like to have.
> 
> So the list of categories we're thinking is:
> "unstable" - feature in active/stale development
> "internal debug" - for WebKit engineers
> "developer debug" - toggling flags for web developers
> "tests" - enabled by default in DRT/WTR
> "experimental" - features enable by default in STP, etc... but not yet stable
> "stable" - enabled by default & ready for prime time

This seems pretty reasonable. I think we can definitely bikeshed the descriptions, and I think there are reasonable questions about whether we should have "experimental" enabled by default in any given application, but that's a discussion for each application rather than one that needs to have any effect on the categorisation within WebKit.

> I think "prototype" or "exploratory" might be better instead of "unstable".

I think we convey anything we don't have the confidence to be running tests against likely is risky to enable, so unstable seems suitable to me?

> I also feel like the distinction between "experimental" vs "developer debug"
> might be a bit confusing for web developers.
> 
> Maybe we need to explicitly call it some of the flags are for feature
> enablement and others are for debugging purposes so something like:
> "Internal Debug Options"
> "Developer Debug Options"
> "Exploratory Features"
> "Test Enabled Features"
> "Experimentally Enabled Features"
> "Stably Enabled Features"

It's probably not worth getting too hung up on exactly what string we use in the YAML file, and allow application embedding WebKit to choose how to present this information to users. If an application segregates the different categories (i.e., doesn't have the current Safari behaviour of putting everything in a single menu), I think the distinction between "experimental" and "developer debug" should be pretty clear?

All this said, probably relevant to any documentation of policy here is also the question of when we want a runtime flag versus a compile-time flag, especially for features in the "unstable" category. Documenting when we want early development to happen behind a compile-time flag instead of a run-time flag is probably relevant. And in extreme, one could imagine an implementation of the runtime flag system whereby the runtime-check is an inline function which returns the constant false for (e.g.) features in the "unstable" category to effectively largely disable them at compile-time.
Comment 24 Ryosuke Niwa 2021-03-23 17:26:46 PDT
(In reply to Sam Sneddon [:gsnedders] from comment #23)
> (In reply to Ryosuke Niwa from comment #22)
> > (In reply to Maciej Stachowiak from comment #21)
> > > I like this proposal a lot!
> > > 
> > > One proposed friendly amendment: let's disallow missing status (build
> > > failure if you try to add a feature with no status) and add meaningful
> > > statuses for flags that are not test, experimental, or stable. Here's a few
> > > I can think of:
> > > * unstable - meant to be an actual feature, which would eventually evolve to
> > > "test" status and beyond, but not yet functional and usable.
> > > * internal debug - flags like "Disable Accelerated Compositing" that we
> > > think browser devs might need to use to diagnose problems, but that we don't
> > > think are reasonable useful for web developers
> > > * developer debug - debugging flags that we think are reasonable for web
> > > developers to try to diagnose problems (perhaps the WebRTC flags, currently
> > > in their own submenu, would fall in this category)
> > > 
> > > Maybe pulling on this thread there'd be too many statuses to represent every
> > > one, but missing status can be a mistake that's easy to make and hard to
> > > spot (hard to notice lack of something), so I think it would be better to
> > > have statuses that can apply to any kind of flag we'd like to have.
> > 
> > So the list of categories we're thinking is:
> > "unstable" - feature in active/stale development
> > "internal debug" - for WebKit engineers
> > "developer debug" - toggling flags for web developers
> > "tests" - enabled by default in DRT/WTR
> > "experimental" - features enable by default in STP, etc... but not yet stable
> > "stable" - enabled by default & ready for prime time
> 
> This seems pretty reasonable. I think we can definitely bikeshed the
> descriptions, and I think there are reasonable questions about whether we
> should have "experimental" enabled by default in any given application, but
> that's a discussion for each application rather than one that needs to have
> any effect on the categorisation within WebKit.
> 
> > I think "prototype" or "exploratory" might be better instead of "unstable".
> 
> I think we convey anything we don't have the confidence to be running tests
> against likely is risky to enable, so unstable seems suitable to me?

"unstable" to me conveys that it's simply crashy because "stability" often refers to how frequent a crash is encountered at least internally at Apple and elsewhere in the general software QA sense so I don't think we want to be using that specific term.

> > I also feel like the distinction between "experimental" vs "developer debug"
> > might be a bit confusing for web developers.
> > 
> > Maybe we need to explicitly call it some of the flags are for feature
> > enablement and others are for debugging purposes so something like:
> > "Internal Debug Options"
> > "Developer Debug Options"
> > "Exploratory Features"
> > "Test Enabled Features"
> > "Experimentally Enabled Features"
> > "Stably Enabled Features"
> 
> It's probably not worth getting too hung up on exactly what string we use in
> the YAML file, and allow application embedding WebKit to choose how to
> present this information to users. If an application segregates the
> different categories (i.e., doesn't have the current Safari behaviour of
> putting everything in a single menu), I think the distinction between
> "experimental" and "developer debug" should be pretty clear?

I disagree. That's one thing that separates WebKit from the rest of browser engine open source projects. We're very much insistent on using terminologies that make sense for humans even in our code so that we don't have code -> human-readable documentation. It has served us well, and we should continue to make our code self documentary.

> All this said, probably relevant to any documentation of policy here is also
> the question of when we want a runtime flag versus a compile-time flag,
> especially for features in the "unstable" category.

The current policy is that any feature that could be complied on all platforms should be a runtime enabled feature unless it poses a new security, privacy, or perf cost that could not be rectified easily.

> And in extreme, one could imagine an
> implementation of the runtime flag system whereby the runtime-check is an
> inline function which returns the constant false for (e.g.) features in the
> "unstable" category to effectively largely disable them at compile-time.

That is not sufficient in some cases because compile flags are needed for cases in which things just don't compile unless certain platform features / capabilities are available.
Comment 25 Maciej Stachowiak 2021-03-24 00:17:52 PDT
(In reply to Ryosuke Niwa from comment #24)
> (In reply to Sam Sneddon [:gsnedders] from comment #23)
> > (In reply to Ryosuke Niwa from comment #22)
> > > (In reply to Maciej Stachowiak from comment #21)
> > 
> > > I think "prototype" or "exploratory" might be better instead of "unstable".
> > 
> > I think we convey anything we don't have the confidence to be running tests
> > against likely is risky to enable, so unstable seems suitable to me?
> 
> "unstable" to me conveys that it's simply crashy because "stability" often
> refers to how frequent a crash is encountered at least internally at Apple
> and elsewhere in the general software QA sense so I don't think we want to
> be using that specific term.

Features in this category could potentially be crashy, or have serious security for perf issues. Or they could have nothing but stubs. It's probably good to use a name that sounds a bit scary. I'm not sure "prototype" or "exploratory" sufficiently convey that here there be dragons. It's possible there's a better word than "unstable" though.

> 
> > > I also feel like the distinction between "experimental" vs "developer debug"
> > > might be a bit confusing for web developers.
> > > 
> > > Maybe we need to explicitly call it some of the flags are for feature
> > > enablement and others are for debugging purposes so something like:
> > > "Internal Debug Options"
> > > "Developer Debug Options"
> > > "Exploratory Features"
> > > "Test Enabled Features"
> > > "Experimentally Enabled Features"
> > > "Stably Enabled Features"
> > 
> > It's probably not worth getting too hung up on exactly what string we use in
> > the YAML file, and allow application embedding WebKit to choose how to
> > present this information to users. If an application segregates the
> > different categories (i.e., doesn't have the current Safari behaviour of
> > putting everything in a single menu), I think the distinction between
> > "experimental" and "developer debug" should be pretty clear?
> 
> I disagree. That's one thing that separates WebKit from the rest of browser
> engine open source projects. We're very much insistent on using
> terminologies that make sense for humans even in our code so that we don't
> have code -> human-readable documentation. It has served us well, and we
> should continue to make our code self documentary.

I like distinguishing "option" flags from "feature" flags, but maybe we could try for something easy to type/read in config files. Maybe we don't need the word Enabled in all places, and perhaps these phrases could be all lowercase.

> 
> > All this said, probably relevant to any documentation of policy here is also
> > the question of when we want a runtime flag versus a compile-time flag,
> > especially for features in the "unstable" category.
> 
> The current policy is that any feature that could be complied on all
> platforms should be a runtime enabled feature unless it poses a new
> security, privacy, or perf cost that could not be rectified easily.

We have a policy document that explains when to use a compile-time flag, which includes the above conditions plus some others: https://webkit.org/feature-policy/

Probably the info about when runtime flags are enabled or disabled at the above link should be replaced with documentation of the various feature flag states.

For feature flag states that are features, not options, I wonder if we could get the WebKit feature status page to reflect them. It would be a bit more granular info than what we have now, where "Under Development" could be a wide range of conditions.
Comment 26 Ryosuke Niwa 2021-03-24 01:07:30 PDT
(In reply to Maciej Stachowiak from comment #25)
> (In reply to Ryosuke Niwa from comment #24)
> > (In reply to Sam Sneddon [:gsnedders] from comment #23)
> > > (In reply to Ryosuke Niwa from comment #22)
> > > > (In reply to Maciej Stachowiak from comment #21)
> > > 
> > > > I think "prototype" or "exploratory" might be better instead of "unstable".
> > > 
> > > I think we convey anything we don't have the confidence to be running tests
> > > against likely is risky to enable, so unstable seems suitable to me?
> > 
> > "unstable" to me conveys that it's simply crashy because "stability" often
> > refers to how frequent a crash is encountered at least internally at Apple
> > and elsewhere in the general software QA sense so I don't think we want to
> > be using that specific term.
> 
> Features in this category could potentially be crashy, or have serious
> security for perf issues. Or they could have nothing but stubs. It's
> probably good to use a name that sounds a bit scary. I'm not sure
> "prototype" or "exploratory" sufficiently convey that here there be dragons.
> It's possible there's a better word than "unstable" though.

Hm... maybe "untested" features? That conveys exactly what these things are.

> > > > I also feel like the distinction between "experimental" vs "developer debug"
> > > > might be a bit confusing for web developers.
> > > > 
> > > > Maybe we need to explicitly call it some of the flags are for feature
> > > > enablement and others are for debugging purposes so something like:
> > > > "Internal Debug Options"
> > > > "Developer Debug Options"
> > > > "Exploratory Features"
> > > > "Test Enabled Features"
> > > > "Experimentally Enabled Features"
> > > > "Stably Enabled Features"
> > > 
> > > It's probably not worth getting too hung up on exactly what string we use in
> > > the YAML file, and allow application embedding WebKit to choose how to
> > > present this information to users. If an application segregates the
> > > different categories (i.e., doesn't have the current Safari behaviour of
> > > putting everything in a single menu), I think the distinction between
> > > "experimental" and "developer debug" should be pretty clear?
> > 
> > I disagree. That's one thing that separates WebKit from the rest of browser
> > engine open source projects. We're very much insistent on using
> > terminologies that make sense for humans even in our code so that we don't
> > have code -> human-readable documentation. It has served us well, and we
> > should continue to make our code self documentary.
> 
> I like distinguishing "option" flags from "feature" flags, but maybe we
> could try for something easy to type/read in config files. Maybe we don't
> need the word Enabled in all places, and perhaps these phrases could be all
> lowercase.

I don't think we necessarily need to type in these names. We could either put a new field in yaml like below or just put debugging options and feature flags in two different files. It's really confusing now that WebPreferences list both the list of features as well as things we only disable or enable for debugging purposes. That's a lot more important distinction than whether a given feature is being tested vs. experimental to me.

AsyncClipboardAPIEnabled:
  type: bool
  kind: feature
  state: stable
  humanReadableName: "Async Clipboard API"
  humanReadableDescription: "Enable the async clipboard API"
  defaultValue:
...

WebGL2Enabled:
  type: bool
  kind: feature
  state: experimental
  humanReadableName: "WebGL 2.0"
  humanReadableDescription: "WebGL 2 prototype"
  webcoreBinding: RuntimeEnabledFeatures
  condition: ENABLE(WEBGL2)
  defaultValue:
...

RequestIdleCallbackEnabled:
  type: bool
  kind: feature
  state: untested
  humanReadableName: "requestIdleCallback"
  humanReadableDescription: "Enable requestIdleCallback support"
  defaultValue:
...

TextAreasAreResizable:
  type: bool
  kind: internal.debug
  webcoreOnChange: setNeedsRecalcStyleInAllFrames
  defaultValue:
...

> For feature flag states that are features, not options, I wonder if we could
> get the WebKit feature status page to reflect them. It would be a bit more
> granular info than what we have now, where "Under Development" could be a
> wide range of conditions.

Yeah, I think that would be ideal. We've had a number of cases where the feature status page wasn't updated in time because the person who was working on it simply forgot about it.

This is sort of the thing I was talking about other day when I said we have too many random things we need to do / update when we work on a new feature. Anyway we can reduce the total number of random house keeping to do, the less likely that things will get out-of-date / out-of-sync.
Comment 27 Sam Weinig 2021-03-24 09:50:00 PDT
(In reply to Ryosuke Niwa from comment #26)
> 
> This is sort of the thing I was talking about other day when I said we have
> too many random things we need to do / update when we work on a new feature.
> Anyway we can reduce the total number of random house keeping to do, the
> less likely that things will get out-of-date / out-of-sync.

To be clear, this is what I have been working to iteratively improve over the 6 months, and I have gotten rid of quite a few of them (no longer need to update WebCore, WebKitLegacy Mac, WebKit, DumpRenderTree, and WebKitTestRunner for runtime flags was a big one, (still have one more to go, but we will get there, see Bug 222866). But one thing that makes it a challenging project is the lack of definitions for things.

I would like to merge features.json and these preference yaml files at some point, but it is not clear they are always at the same granularity, so we will have to be careful in how we go about doing it, but it is a really important goal to me that we do (my memory is not good enough to remember more than 1 or 2 places to update anything).
Comment 28 Ryosuke Niwa 2021-03-24 13:10:20 PDT
(In reply to Sam Weinig from comment #27)
> (In reply to Ryosuke Niwa from comment #26)
> > 
> > This is sort of the thing I was talking about other day when I said we have
> > too many random things we need to do / update when we work on a new feature.
> > Anyway we can reduce the total number of random house keeping to do, the
> > less likely that things will get out-of-date / out-of-sync.
> 
> To be clear, this is what I have been working to iteratively improve over
> the 6 months, and I have gotten rid of quite a few of them (no longer need
> to update WebCore, WebKitLegacy Mac, WebKit, DumpRenderTree, and
> WebKitTestRunner for runtime flags was a big one, (still have one more to
> go, but we will get there, see Bug 222866). But one thing that makes it a
> challenging project is the lack of definitions for things.

Yeah, even just the fact we can share the default across WebKit & WebKitLegacy has been a huge plus! So thanks for working on that.

> I would like to merge features.json and these preference yaml files at some
> point, but it is not clear they are always at the same granularity, so we
> will have to be careful in how we go about doing it, but it is a really
> important goal to me that we do (my memory is not good enough to remember
> more than 1 or 2 places to update anything).

Yeah, but maybe we can define some kind of "Feature Category" or something for that kind of stuff. Or maybe it ain't so bad if we had another file defining these categories or how flags should be grouped together. I think one massive plus would be to have a URL to spec right next to feature flag. I often wonder what a given feature flag is for because the name of spec doesn't necessarily always match the feature flag.
Comment 29 Maciej Stachowiak 2021-04-22 13:35:10 PDT
(In reply to Ryosuke Niwa from comment #26)
> (In reply to Maciej Stachowiak from comment #25)
>
> > 
> > I like distinguishing "option" flags from "feature" flags, but maybe we
> > could try for something easy to type/read in config files. Maybe we don't
> > need the word Enabled in all places, and perhaps these phrases could be all
> > lowercase.
> 
> I don't think we necessarily need to type in these names. We could either
> put a new field in yaml like below or just put debugging options and feature
> flags in two different files. It's really confusing now that WebPreferences
> list both the list of features as well as things we only disable or enable
> for debugging purposes. That's a lot more important distinction than whether
> a given feature is being tested vs. experimental to me.
> 
> AsyncClipboardAPIEnabled:
>   type: bool
>   kind: feature
>   state: stable
>   humanReadableName: "Async Clipboard API"
>   humanReadableDescription: "Enable the async clipboard API"
>   defaultValue:
> ...
> 
> WebGL2Enabled:
>   type: bool
>   kind: feature
>   state: experimental
>   humanReadableName: "WebGL 2.0"
>   humanReadableDescription: "WebGL 2 prototype"
>   webcoreBinding: RuntimeEnabledFeatures
>   condition: ENABLE(WEBGL2)
>   defaultValue:
> ...
> 
> RequestIdleCallbackEnabled:
>   type: bool
>   kind: feature
>   state: untested
>   humanReadableName: "requestIdleCallback"
>   humanReadableDescription: "Enable requestIdleCallback support"
>   defaultValue:
> ...
> 
> TextAreasAreResizable:
>   type: bool
>   kind: internal.debug
>   webcoreOnChange: setNeedsRecalcStyleInAllFrames
>   defaultValue:
> ...

Reasonable direction but I'm not sure I like having both "kind" and "State" fields (especially since, in this example, state does not apply to option/debugging flags at all, and instead they say everything in the state.

Is the dot syntax because yaml doesn't allow spaces (without quotes) in the value field?
Comment 30 Maciej Stachowiak 2021-04-22 13:37:05 PDT
(In reply to Ryosuke Niwa from comment #28)
> (In reply to Sam Weinig from comment #27)
> > To be clear, this is what I have been working to iteratively improve over
> > the 6 months, and I have gotten rid of quite a few of them (no longer need
> > to update WebCore, WebKitLegacy Mac, WebKit, DumpRenderTree, and
> > WebKitTestRunner for runtime flags was a big one, (still have one more to
> > go, but we will get there, see Bug 222866). But one thing that makes it a
> > challenging project is the lack of definitions for things.
> 
> Yeah, even just the fact we can share the default across WebKit &
> WebKitLegacy has been a huge plus! So thanks for working on that.

+1
Comment 31 Beth Dakin 2022-03-29 15:45:01 PDT
What are next steps to make some of these changes a reality?
Comment 32 Maciej Stachowiak 2022-04-25 14:56:24 PDT
I think someone needs to write this up as a more formal proposal, and circulate in the WebKit community. Let's find someone to do that. (Not sure I have the time myself).
Comment 33 Ahmad Saleem 2023-04-06 18:08:13 PDT
@Brent - recently there was lot of work done to categorize Experimental Features in WebKit. Do we need to do anything more here?