222885 – Formalize rules for what is an Experimental Feature

NEW 222885

https://bugs.webkit.org/show_bug.cgi?id=222885

Sam Weinig

Reported 2021-03-07 08:54:30 PST

We should formalize rules for what is an Experimental Feature and what isn't. Currently, it is a hodge-podge of features that don't work at all, features we would like to ship one day, and features we have already shipped. Having a clear set of rules would make things much easier to reason about.

Sam Weinig

Comment 1 2021-03-07 08:58:47 PST

I propose the following rules:

Experimental Features are:

- Features that have not shipped in an official release.
- Off by default (this means that if the experiment is disabling something, it should be phrased as a negative, e.g. Disable SQL Databases)
- Stable enough that we would like people to try them out.

Simon Fraser (smfr)

Comment 2 2021-03-08 09:58:56 PST

Would be good to clarify what happens when:
1. the feature gets stable enough to be on by default, but has not been shipped yet.
2. the feature has been shipped in an OS release (by Apple, at least)

There is utility in turning the feature off for QA testing (or, for a web dev, to test behavior in older Safari versions), so the switch needs to be exposed somewhere.

Sam Weinig

Comment 3 2021-03-08 14:08:03 PST

(In reply to Simon Fraser (smfr) from comment #2)
> Would be good to clarify what happens when:
> 1. the feature gets stable enough to be on by default, but has not been
> shipped yet.

This might be a good argument against my "Off by default" rule.

> 2. the feature has been shipped in an OS release (by Apple, at least)

This is a good point. Perhaps "experimental" status needs to be a per-port concept. Perhaps we should have an independent list of Features that are experimental, instead of binding it into the preferences themselves.

> There is utility in turning the feature off for QA testing (or, for a web
> dev, to test behavior in older Safari versions), so the switch needs to be
> exposed somewhere.

Agreed. I think once a feature has been shipped on by default, using an Internal Setting should cover this.

Ok, updated proposal:

Experimental Features are:

- Features that have not shipped in an official release.
- Preferred to be off by default (this means that if the experiment is disabling something, it should be phrased as a negative, e.g. Disable SQL Databases), but a stable feature waiting for an upcoming release can stay in the experimental features set on by default until it has shipped.
- Stable enough that we would like people to try them out.

Features that still have utility in being configurable after they ship should move to the Internal Features set.

Implementation Changes:

- Merge all WebPreferences*.yaml files into WebPreferences.yaml
- Add new ExperimentalFeatures.yaml, InternalFeatures.yaml and DebugFeatures.yaml files that list preferences from WebPreferences.yaml that should be in that set.
- Stop relying on a feature being on by default in LayoutTests when it is an Experimental Feature, and rather add new keys to the defaultValue: dictionary in each preference definition for WebKitTestRunner and DumpRenderTree.

Ryosuke Niwa

Comment 4 2021-03-08 18:10:20 PST

(In reply to Sam Weinig from comment #3)
> Implementation Changes:
> - Merge all WebPreferences*.yaml files into WebPreferences.yaml
> - Add new ExperimentalFeatures.yaml, InternalFeatures.yaml and
> DebugFeatures.yaml files that list preferences from WebPreferences.yaml that
> should be in that set.

Can we have a simple Features.yaml and then specify what's experimental & what's internal/debug in that file? I find it annoying that I have to open different files just because something has changed its status from being experimental to internal, etc... then we can have all the information about which platform/port will enable which feature in this one file.

Maybe we can add some script to automatically spit out all the features enabled / disabled by default in a given port as well, so that figuring out what's enabled / disabled in a given release/branch will be easy too.
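The script suggested in the comment above might look roughly like the following sketch. Everything in it is an assumption made for illustration: the `FEATURES` table, its `status` and `default` keys, the port names, and the default values are hypothetical and do not reflect WebKit's actual preference files. Only the feature names are borrowed from this thread.

```python
# Hypothetical single-file feature table: each entry carries a status and
# per-port default values, with "default" as the fallback for unlisted ports.
FEATURES = {
    "AspectRatioEnabled": {
        "status": "experimental",
        "default": {"default": True},
    },
    "AsyncClipboardAPIEnabled": {
        "status": "experimental",
        "default": {"cocoa": True, "default": False},
    },
    "AlwaysZoomOnDoubleTap": {
        "status": "internal",
        "default": {"default": False},
    },
}


def default_for_port(feature, port):
    """Return the default value of one feature for the given port."""
    defaults = feature["default"]
    return defaults.get(port, defaults["default"])


def report(features, port):
    """Split feature names into (enabled, disabled) lists for a port."""
    enabled, disabled = [], []
    for name in sorted(features):
        target = enabled if default_for_port(features[name], port) else disabled
        target.append(name)
    return enabled, disabled


if __name__ == "__main__":
    for port in ("cocoa", "gtk"):
        enabled, disabled = report(FEATURES, port)
        print(f"{port}: enabled={enabled} disabled={disabled}")
```

Running such a script on a release branch would give the per-port enabled/disabled listing the comment asks for, provided the table really is the single source of truth.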

Sam Weinig

Comment 5 2021-03-09 12:53:50 PST

(In reply to Ryosuke Niwa from comment #4)
> (In reply to Sam Weinig from comment #3)
> > Implementation Changes:
> > - Merge all WebPreferences*.yaml files into WebPreferences.yaml
> > - Add new ExperimentalFeatures.yaml, InternalFeatures.yaml and
> > DebugFeatures.yaml files that list preferences from WebPreferences.yaml that
> > should be in that set.
>
> Can we have a simple Features.yaml and then specify what's experimental &
> what's internal/debug in that file? I find it annoying that I have to open
> different files just because something has changed its status from being
> experimental to internal, etc... then we can have all the information about
> which platform/port will enable which feature in this one file.
>

I'd at least like to separate the list of all preferences (and their default values, etc) from the classification of experimental vs. internal vs. debug.

Perhaps something like:

PreferenceClassification.yaml

Experimental:
    - AccessibilityObjectModelEnabled
    - AspectRatioEnabled
    - {
        name: AspectRatioOfImgFromWidthAndHeightEnabled
        enabledFor:
            PLATFORM(COCOA): false
            default: true
      }
    - AsyncClipboardAPIEnabled
    ...

Internal:
    - AllowViewportShrinkToFitContent
    - AlwaysZoomOnDoubleTap
    ...

etc.
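One consequence of keeping the classification in its own file is that tooling would need to look up a preference's category and guard against a preference being listed twice. A minimal sketch of that lookup follows, assuming the classification has already been parsed into a plain dictionary. The `CLASSIFICATION` table and the `category_by_preference` helper are hypothetical names; only the preference names come from the comment above.

```python
# Hypothetical in-memory form of a classification file: category -> names.
CLASSIFICATION = {
    "Experimental": ["AccessibilityObjectModelEnabled", "AspectRatioEnabled"],
    "Internal": ["AllowViewportShrinkToFitContent", "AlwaysZoomOnDoubleTap"],
}


def category_by_preference(classification):
    """Invert the mapping and reject a preference listed in two categories."""
    result = {}
    for category, names in classification.items():
        for name in names:
            if name in result:
                raise ValueError(
                    f"{name} is in both {result[name]} and {category}"
                )
            result[name] = category
    return result
```

Moving a preference from Experimental to Internal would then be a change to the classification only, leaving the preference definitions untouched.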

Sam Weinig

Comment 6 2021-03-09 12:57:38 PST

(actually, let's separate out the implementation part of this from the policy and keep this bug to just deciding the policy).

Sam Weinig

Comment 7 2021-03-09 13:01:53 PST

So, without the implementation aspects of the proposal, the latest proposal is:

The Experimental Features set is:

- Features that have not shipped in an official release.
- Preferred to be off by default (this means that if the experiment is disabling something, it should be phrased as a negative, e.g. Disable SQL Databases), but a stable feature waiting for an upcoming release can stay in the experimental features set on by default until it has shipped.
- Stable enough that we would like people to try them out.
- A feature or bit of functionality we would like for developers to try out.

Features that still have utility in being configurable after they ship should move to the Internal Features set.

Debug functionality or functionality never intended to ship should not be included in the Experimental Features set.

Ryosuke Niwa

Comment 8 2021-03-09 22:45:01 PST

(In reply to Sam Weinig from comment #7)
>
> - Features that have not shipped in an official release.

What does an official release mean though? Does any "official" release of any browser in any port qualify?

Sam Weinig

Comment 9 2021-03-10 11:00:07 PST

(In reply to Ryosuke Niwa from comment #8)
> (In reply to Sam Weinig from comment #7)
> >
> > - Features that have not shipped in an official release.
>
> What does an official release mean though? Does any "official" release of any
> browser in any port qualify?

I was intending for it to be up to the individual ports to define what it means for them, letting each port define what is experimental for them at the moment.

Ryosuke Niwa

Comment 10 2021-03-10 16:19:48 PST

(In reply to Sam Weinig from comment #9)
> (In reply to Ryosuke Niwa from comment #8)
> > (In reply to Sam Weinig from comment #7)
> > >
> > > - Features that have not shipped in an official release.
> >
> > What does an official release mean though? Does any "official" release of any
> > browser in any port qualify?
>
> I was intending for it to be up to the individual ports to define what it
> means for them, letting each port define what is experimental for them at
> the moment.

So the definition of an experimental feature is per port?

Sam Weinig

Comment 11 2021-03-10 18:04:05 PST

(In reply to Ryosuke Niwa from comment #10)
> (In reply to Sam Weinig from comment #9)
> > (In reply to Ryosuke Niwa from comment #8)
> > > (In reply to Sam Weinig from comment #7)
> > > >
> > > > - Features that have not shipped in an official release.
> > >
> > > What does an official release mean though? Does any "official" release of any
> > > browser in any port qualify?
> >
> > I was intending for it to be up to the individual ports to define what it
> > means for them, letting each port define what is experimental for them at
> > the moment.
>
> So the definition of an experimental feature is per port?

Yes, that is what I am proposing here.

Radar WebKit Bug Importer

Comment 12 2021-03-14 09:55:15 PDT

<rdar://problem/75408862>

Maciej Stachowiak

Comment 13 2021-03-18 18:53:55 PDT

(In reply to Sam Weinig from comment #1)
> I propose the following rules:
>
> Experimental Features are:
>
> - Features that have not shipped in an official release.
> - Off by default (this means that if the experiment is disabling something,
> it should be phrased as a negative, e.g. Disable SQL Databases)
> - Stable enough that we would like people to try them out.

What would be the correct status for a feature that meets the first two of these criteria but not the third (but which does have a runtime feature flag)?

Maciej Stachowiak

Comment 14 2021-03-18 18:59:58 PDT

(In reply to Sam Weinig from comment #7)
> So, without the implementation aspects of the proposal, the latest proposal
> is:
>
>
> The Experimental Features set is:
>
> - Features that have not shipped in an official release.
> - Preferred to be off by default (this means that if the experiment is
> disabling something, it should be phrased as a negative, e.g. Disable SQL
> Databases), but a stable feature waiting for an upcoming release can stay in
> the experimental features set on by default until it has shipped.
> - Stable enough that we would like people to try them out.
> - A feature or bit of functionality we would like for developers to try out.
>
> Features that still have utility in being configurable after they ship
> should move to the Internal Features set.

Seems like this would make it inconvenient for people to try turning such features off (at least in Apple ports), and would create a confusing disconnect for anyone testing on/off comparisons, where they now have to look in a different place.

I can see how it is conceptually wrong to call a shipping feature “experimental” but this aspect of the proposal seems to have serious downsides, unless we change the UI to make internal flags much more easily accessible, and put them closer to experimental flags.

By way of comparison, other browsers with feature flag UI do not move flags to a totally different and less easily accessible place when they become on by default. And I don’t think we should do so either.

Maybe this means we should drop the label “experimental” and just call them something neutral like “feature flags”.

>
> Debug functionality or functionality never intended to ship should not be
> included in the Experimental Features set.

Sam Weinig

Comment 15 2021-03-22 10:49:01 PDT

(In reply to Maciej Stachowiak from comment #14) > (In reply to Sam Weinig from comment #7) > > So, without the implementation aspects of the proposal, the latest proposal > > is: > > > > > > The Experimental Features set is: > > > > - Features that have not shipped in an official release. > > - Preferred to be off by default (this means that if the experiment is > > disabling something, it should be phrased as a negative, e.g. Disable SQL > > Databases), but a stable feature waiting for an upcoming release can stay in > > the experimental features set on by default until has shipped. > > - Stable enough that we would like people to try them out. > > - A feature or bit of functionality we would like for developers to try out. > > > > Features that still have utility in being configurable after they ship > > should move to the Internal Features set. > > Seems like this would make it inconvenient for people to try turning such > features off (at least in Apple ports), and would create a confusing > disconnect for anyone testing on/off comparisons, where they now have to > look in a different place. > > I can see how it is conceptually wrong to call a shipping feature > “experimental” but this aspect of the proposal seems to have serious > downsides, unless we change the UI to make internal flags much more easily > accessible, and put them closer to experimental flags. > > By way of comparison, other browsers with feature flag UI do not move flags > to a totally different and less easily accessible place when they become on > by default. And I don’t think we should do so either. > > Maybe this means we should drop the label “experimental” and just call them > something neutral like “feature flags”. This is getting a bit too into Safari's UI probably, but the Develop menu has other features in it that aren't "experimental" (see the "Disable Styles", "Disable JavaScript" section). 
There is nothing stopping Safari from adding these features, where it is useful to disable them, to other parts of the menu. I also think it would be a useful exercise to decide, when a feature has shipped, whether it really is useful to have a way to disable it, and thus its manual inclusion in the Develop menu will require more thought. I am not convinced that most features need a way to be disabled after shipping, so this seems ok to me.

That said, if we want to make it "easy" to have a way to disable some features after shipping, we could make another group, "Toggle-able Features" say, that exists parallel to the "Experimental" and "Internal" Features groups. Then, Safari could decide how it wants to display these.

I am making this argument because I think it is valuable to have clear delineation of what we think is "experimental", as it makes a clear indication both to those working on WebKit and to web developers on the status of the feature. The more we conflate shipping and experimental status, the muddier and more confusing that becomes.

Maciej Stachowiak

Comment 16 2021-03-22 11:23:57 PDT

(In reply to Sam Weinig from comment #15) > (In reply to Maciej Stachowiak from comment #14) > > (In reply to Sam Weinig from comment #7) > > > > > > Seems like this would make it inconvenient for people to try turning such > > features off (at least in Apple ports), and would create a confusing > > disconnect for anyone testing on/off comparisons, where they now have to > > look in a different place. > > > > I can see how it is conceptually wrong to call a shipping feature > > “experimental” but this aspect of the proposal seems to have serious > > downsides, unless we change the UI to make internal flags much more easily > > accessible, and put them closer to experimental flags. > > > > By way of comparison, other browsers with feature flag UI do not move flags > > to a totally different and less easily accessible place when they become on > > by default. And I don’t think we should do so either. > > > > Maybe this means we should drop the label “experimental” and just call them > > something neutral like “feature flags”. > > This is getting a bit too into Safari's UI probably, but the Develop menu > has other features in it that aren't "experimental" (see the "Disable > Styles", "Disable JavaScript" section). There is nothing stopping Safari > from adding these features where it is useful to disable them to other parts > of the menu. I also think it would be a useful exercise to make a decisions > when a feature has shipped if it really is useful to have a way to disable > it, and thus it's manual inclusion in the Develop menu will require more > thought. I am not convinced that most features need a way to be disabled > after shipping, so this seems ok to me. I don't think it is reasonable to label flags for disabling recently-enabled features "Internal", and then hand-curate a list at the Safari level of ones that get put in the Develop menu instead of the place where all other Internal flags go. 
I think this is a poor approach because:

(a) It's misleading to call these types of flags Internal if they are not actually intended to be treated this way.
(b) WebKit has more expertise than Safari on which flags of this type should be offered for convenient disablement. This is analogous to the way WebKit has more expertise in which flags represent experimental features that are ready to try, so we don't ask WebKit-based browsers to manually curate their own list, we give them one.
(c) This hand-curated list would have to be manually kept in sync with the status of things in WebKit, including things where we might remove the runtime feature flag entirely.
(d) We sort of support running old Safari with new WebKit; that would automatically pick up new Experimental flags, but not flags in this category.

>
> That said, if we want to make it "easy" to have a way to disable some
> features after shipping, we could make another group, "Toggle-able Features"
> say, that exists parallel to the "Experimental" and "Internal" Features
> groups. Then, Safari could decide how it wants to display these.

I agree that making a distinct group for recently-enabled features would be a better proposal. This would also let browsers put both this category and "Experimental" flags in one UI, or in separate places, as they prefer.

Personally, I'd argue that any runtime feature flag representing a web-exposed feature, where we have not yet chosen to remove the flag and leave the feature enabled always, is likely worth toggling for testing. But if we think it needs to be a more case-by-case process, the WebKit project is in a better position to execute that process.

>
>
> I am making this argument because I think it is valuable to have clear
> delineation of what we think is "experimental", as it makes a clear
> indication both to those working on WebKit and to web developers on the
> status of the feature. The more we conflate shipping and experimental
> status, the muddier and more confusing that becomes.

Part of the problem here is trying to use a hierarchy of menus as the UI. Other browsers have a page-like experience that lets you toggle flags but also makes clear what the default value is. With such a UI, they have no need to distinguish flags that are default-off, but worth testing (what we call "Experimental") from flags that are enabled by default but sometimes worth turning off for testing; and no need to put them in different places.

Sam Weinig

Comment 17 2021-03-22 11:27:24 PDT

(In reply to Maciej Stachowiak from comment #13)
> (In reply to Sam Weinig from comment #1)
> > I propose the following rules:
> >
> > Experimental Features are:
> >
> > - Features that have not shipped in an official release.
> > - Off by default (this means that if the experiment is disabling something,
> > it should be phrased as a negative, e.g. Disable SQL Databases)
> > - Stable enough that we would like people to try them out.
>
> What would be the correct status for a feature that meets the first two of
> these criteria but not the third (but which does have a runtime feature
> flag)?

It could either be an internal feature or an unannotated feature.

Sam Weinig

Comment 18 2021-03-22 11:34:53 PDT

The only reason the "other" option is "internal features" is that it is the only other group we have that is automatically exposed. We can quite easily add more categories.

(In reply to Sam Weinig from comment #17)
> (In reply to Maciej Stachowiak from comment #13)
> > (In reply to Sam Weinig from comment #1)
> > > I propose the following rules:
> > >
> > > Experimental Features are:
> > >
> > > - Features that have not shipped in an official release.
> > > - Off by default (this means that if the experiment is disabling something,
> > > it should be phrased as a negative, e.g. Disable SQL Databases)
> > > - Stable enough that we would like people to try them out.
> >
> > What would be the correct status for a feature that meets the first two of
> > these criteria but not the third (but which does have a runtime feature
> > flag)?
>
> It could either be an internal feature or an unannotated feature.

Or we could add a new category of toggle-able shipped features for it to go in.

Sam Sneddon [:gsnedders]

Comment 19 2021-03-22 17:10:33 PDT

https://chromium.googlesource.com/chromium/src/+/HEAD/third_party/blink/renderer/platform/RuntimeEnabledFeatures.md is the Blink guidelines for this.

In short, they have four basic categories:

* status: <missing> - anything goes.

* status: "test" - it must be in a sufficient state to permit internal testing. For example, enabling it should not be known to easily cause crashes, leak memory, or otherwise significantly effect the reliability of bots. Consideration should also be given to the potential for loss of test coverage of shipping behavior.

* status: "experimental" - it should be far enough along to permit testing by early adopter web developers. Many chromium enthusiasts run with --enable-experimental-web-platform-features, and so promoting a feature to experimental status can be a good way to get early warning of any stability or compatibility problems. If such problems are discovered (e.g. major websites being seriously broken when the feature is enabled), the feature should be demoted back to no status or status: "test" to avoid creating undue problems for such users.

* status: "stable" - it must be complete and ready for use by all chrome users.

Personally, I quite like this, as it gives a clear indication as to how usable we expect the browser to be with the feature enabled.

Roughly speaking, we currently have three statuses effectively:

* off-by-default - anything goes

* enabled in TestOptions::defaults - equivalent of Blink's status: "test"

* on-by-default - equivalent of Blink's status: "stable"

What we don't have is anything equivalent to their status: "experimental", and as an outsider it’s very hard to distinguish features where they’re off-by-default because (e.g.) just the interfaces have been added with no/little implementation from those where they’re off-by-default because we’d rather they remain off until the next release branches (due to lack of confidence or whatever reason) but are otherwise believed to be complete.

Note that this does have some degree of ramification for https://wpt.fyi's default view, where STP is currently run (at the WebKit team’s request) with only features enabled by default, whereas Chrome Dev is run with --enable-experimental-web-platform-features and Firefox Nightly is run with all features enabled in CI, hence STP is at a disadvantage to the others, but currently there’s no easy route to determine what a reasonable set of features to enable would be (and leads to a bigger loss of coverage for Safari, given Chrome and Firefox both have their beta releases with only their on-by-default settings but Safari beta results aren’t available).

On the whole, my preference would be:

* Have a single runtime-enabled features file (this means changing status should just be a -1/+1 change of the status line).

* Adopt the four categories from Blink, defining them in some sensible but largely analogous way, perhaps splitting the status: <missing> into a further status: "debug only" for things like ITP disablement.

* Expose those four categories through the WebKit API.

At that point, applications embedding WebKit can split the list up in their UI into more meaningful chunks, and potentially add "enable all experimental features" and "enable all testable features" options too.
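To illustrate how exposing the categories could support "enable all experimental features" and "enable all testable features" options, here is a minimal sketch. It is not WebKit or Blink API. The `Status` enum, its ordering, the feature names, and `enabled_features` are all assumptions made for this example.

```python
from enum import Enum


class Status(Enum):
    # Names follow the Blink categories quoted above; "UNSET" stands in for
    # a missing status. The ordering is an assumption made for this sketch.
    UNSET = 0
    TEST = 1
    EXPERIMENTAL = 2
    STABLE = 3


# Hypothetical feature names, used only to exercise the function below.
FEATURE_STATUS = {
    "HalfImplementedInterface": Status.UNSET,
    "FeatureUnderTest": Status.TEST,
    "FeatureForEarlyAdopters": Status.EXPERIMENTAL,
    "ShippedFeature": Status.STABLE,
}


def enabled_features(feature_status, minimum):
    """Return the names of features whose status is at least `minimum`."""
    return sorted(
        name
        for name, status in feature_status.items()
        if status.value >= minimum.value
    )
```

Under these assumptions, `enabled_features(FEATURE_STATUS, Status.EXPERIMENTAL)` would correspond to an "enable all experimental features" option, and passing `Status.TEST` to an "enable all testable features" option. Treating the categories as a strict ordering is a design choice of the sketch. The debug-style statuses discussed later in the thread would not fit on such a scale.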

Maciej Stachowiak

Comment 20 2021-03-22 18:10:13 PDT

(In reply to Sam Weinig from comment #18)
> The only reason the "other" option is "internal features" is that it is the
> only other group we have that is automatically exposed. We can quite easily
> add more categories.
>
> (In reply to Sam Weinig from comment #17)
> > (In reply to Maciej Stachowiak from comment #13)
> > > (In reply to Sam Weinig from comment #1)
> > > > I propose the following rules:
> > > >
> > > > Experimental Features are:
> > > >
> > > > - Features that have not shipped in an official release.
> > > > - Off by default (this means that if the experiment is disabling something,
> > > > it should be phrased as a negative, e.g. Disable SQL Databases)
> > > > - Stable enough that we would like people to try them out.
> > >
> > > What would be the correct status for a feature that meets the first two of
> > > these criteria but not the third (but which does have a runtime feature
> > > flag)?
> >
> > It could either be an internal feature or an unannotated feature.
>
> Or we could add a new category of toggle-able shipped features for it to
> go in.

By assumption, a feature like this would not be "shipped". To be clear, I was asking about a feature that has not shipped in an official release, is off by default, but is _not_ stable enough for it to be on by default.

I'm not sure what the category of "unannotated feature" implies but maybe it's a fit.

Maciej Stachowiak

Comment 21 2021-03-22 18:18:48 PDT

(In reply to Sam Sneddon [:gsnedders] from comment #19) > https://chromium.googlesource.com/chromium/src/+/HEAD/third_party/blink/ > renderer/platform/RuntimeEnabledFeatures.md is the Blink guidelines for this. > > In short, they have four basic categories: > > * status: <missing> - anything goes. > > * status: "test" - it must be in a sufficient state to permit internal > testing. For example, enabling it should not be known to easily cause > crashes, leak memory, or otherwise significantly effect the reliability of > bots. Consideration should also be given to the potential for loss of test > coverage of shipping behavior. > > * status: "experimental" - it should be far enough along to permit testing > by early adopter web developers. Many chromium enthusiasts run with > --enable-experimental-web-platform-features, and so promoting a feature to > experimental status can be a good way to get early warning of any stability > or compatibility problems. If such problems are discovered (e.g. major > websites being seriously broken when the feature is enabled), the feature > should be demoted back to no status or status: "test" to avoid creating > undue problems for such users. > > * status: "stable" - it must be complete and ready for use by all chrome > users. > > Personally, I quite like this, as it gives a clear indication as to how > usable we expect the browser to be with the feature enabled. > > Roughly speaking, we currently have three statuses effectively: > > * off-by-default - anything goes > > * enabled in TestOptions::defaults - equivalent of Blink's status: "test" > > * on-by-default - equivalent of Blink's status: "stable" > > What we don't have is anything equivalent to their status: "experimental", > and as an outsider it’s very hard to distinguish features where they’re > off-by-default because (e.g.) 
just the interfaces have been added with > no/little implementation from those where they’re off-by-default because > we’d rather they remain off until the next release branches (due to lack of > confidence or whatever reason) but are otherwise believed to be complete. > > Note that this does have some degree of ramification for https://wpt.fyi's > default view, where STP is currently run (at the WebKit team’s request) with > only features enabled by default, whereas Chrome Dev is run with > --enable-experimental-web-platform-features and Firefox Nightly is run with > all features enabled in CI, hence STP is at a disadvantage to the others, > but currently there’s no easy route to determine what a reasonable set of > features to enable would be (and leads to a bigger loss of coverage for > Safari, given Chrome and Firefox both have their beta releases with only > their on-by-default settings but Safari beta results aren’t available). > > On the whole, my preference would be: > > * Have a single runtime-enabled features file (this means changing status > should just be a -1/+1 change of the status line). > > * Adopt the four categories from Blink, defining them in some sensible but > largely analogous way, perhaps splitting the status: <missing> into a > further status: "debug only" for things like ITP disablement. > > * Expose those four categories through the WebKit API. > > At that point, applications embedding WebKit can split the list up in their > UI in a more meaningful chunks, and potentially add "enable all experimental > features" and "enable all testable features" options too. I like this proposal a lot! One proposed friendly amendment: let's disallow missing status (build failure if you try to add a feature with no status) and add meaningful statuses for flags that are not test, experimental, or stable. 
Here's a few I can think of:

* unstable - meant to be an actual feature, which would eventually evolve to "test" status and beyond, but not yet functional and usable.
* internal debug - flags like "Disable Accelerated Compositing" that we think browser devs might need to use to diagnose problems, but that we don't think are reasonably useful for web developers
* developer debug - debugging flags that we think are reasonable for web developers to try to diagnose problems (perhaps the WebRTC flags, currently in their own submenu, would fall in this category)

Maybe pulling on this thread there'd be too many statuses to represent every one, but missing status can be a mistake that's easy to make and hard to spot (hard to notice lack of something), so I think it would be better to have statuses that can apply to any kind of flag we'd like to have.
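The "build failure if you try to add a feature with no status" idea could be approximated by a check run at build time. The sketch below assumes features have been parsed into a dictionary of entries with an optional `status` key. The `validate` function and the table layout are hypothetical, and the allowed set simply combines the statuses named in this thread.

```python
# Statuses proposed in the comment above plus the three named Blink ones.
ALLOWED_STATUSES = {
    "unstable", "internal debug", "developer debug",
    "test", "experimental", "stable",
}


def validate(features):
    """Return a list of error strings; an empty list means the table is valid."""
    errors = []
    for name, entry in sorted(features.items()):
        status = entry.get("status")
        if status is None:
            errors.append(f"{name}: missing status")
        elif status not in ALLOWED_STATUSES:
            errors.append(f"{name}: unknown status {status!r}")
    return errors
```

A build step could call `validate` on the parsed file and exit with a non-zero code when the returned list is non-empty, which would surface a forgotten status immediately rather than leaving it to be noticed later.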

Ryosuke Niwa

Comment 22 2021-03-22 18:26:05 PDT

(In reply to Maciej Stachowiak from comment #21) > (In reply to Sam Sneddon [:gsnedders] from comment #19) > > https://chromium.googlesource.com/chromium/src/+/HEAD/third_party/blink/ > > renderer/platform/RuntimeEnabledFeatures.md is the Blink guidelines for this. > > > > In short, they have four basic categories: > > > > * status: <missing> - anything goes. > > > > * status: "test" - it must be in a sufficient state to permit internal > > testing. For example, enabling it should not be known to easily cause > > crashes, leak memory, or otherwise significantly effect the reliability of > > bots. Consideration should also be given to the potential for loss of test > > coverage of shipping behavior. > > > > * status: "experimental" - it should be far enough along to permit testing > > by early adopter web developers. Many chromium enthusiasts run with > > --enable-experimental-web-platform-features, and so promoting a feature to > > experimental status can be a good way to get early warning of any stability > > or compatibility problems. If such problems are discovered (e.g. major > > websites being seriously broken when the feature is enabled), the feature > > should be demoted back to no status or status: "test" to avoid creating > > undue problems for such users. > > > > * status: "stable" - it must be complete and ready for use by all chrome > > users. > > > > Personally, I quite like this, as it gives a clear indication as to how > > usable we expect the browser to be with the feature enabled. > > > > Roughly speaking, we currently have three statuses effectively: > > > > * off-by-default - anything goes > > > > * enabled in TestOptions::defaults - equivalent of Blink's status: "test" > > > > * on-by-default - equivalent of Blink's status: "stable" > > > > What we don't have is anything equivalent to their status: "experimental", > > and as an outsider it’s very hard to distinguish features where they’re > > off-by-default because (e.g.) 
just the interfaces have been added with > > no/little implementation from those where they’re off-by-default because > > we’d rather they remain off until the next release branches (due to lack of > > confidence or whatever reason) but are otherwise believed to be complete. > > > > Note that this does have some degree of ramification for https://wpt.fyi's > > default view, where STP is currently run (at the WebKit team’s request) with > > only features enabled by default, whereas Chrome Dev is run with > > --enable-experimental-web-platform-features and Firefox Nightly is run with > > all features enabled in CI, hence STP is at a disadvantage to the others, > > but currently there’s no easy route to determine what a reasonable set of > > features to enable would be (and leads to a bigger loss of coverage for > > Safari, given Chrome and Firefox both have their beta releases with only > > their on-by-default settings but Safari beta results aren’t available). > > > > On the whole, my preference would be: > > > > * Have a single runtime-enabled features file (this means changing status > > should just be a -1/+1 change of the status line). > > > > * Adopt the four categories from Blink, defining them in some sensible but > > largely analogous way, perhaps splitting the status: <missing> into a > > further status: "debug only" for things like ITP disablement. > > > > * Expose those four categories through the WebKit API. > > > > At that point, applications embedding WebKit can split the list up in their > > UI in a more meaningful chunks, and potentially add "enable all experimental > > features" and "enable all testable features" options too. > > I like this proposal a lot! > > One proposed friendly amendment: let's disallow missing status (build > failure if you try to add a feature with no status) and add meaningful > statuses for flags that are not test, experimental, or stable. 
Here's a few > I can think of: > * unstable - meant to be an actual feature, which would eventually evolve to > "test" status and beyond, but not yet functional and usable. > * internal debug - flags like "Disable Accelerated Compositing" that we > think browser devs might need to use to diagnose problems, but that we don't > think are reasonable useful for web developers > * developer debug - debugging flags that we think are reasonable for web > developers to try to diagnose problems (perhaps the WebRTC flags, currently > in their own submenu, would fall in this category) > > Maybe pulling on this thread there'd be too many statuses to represent every > one, but missing status can be a mistake that's easy to make and hard to > spot (hard to notice lack of something), so I think it would be better to > have statuses that can apply to any kind of flag we'd like to have.

So the list of categories we're thinking of is:
"unstable" - feature in active/stale development
"internal debug" - for WebKit engineers
"developer debug" - toggling flags for web developers
"tests" - enabled by default in DRT/WTR
"experimental" - features enabled by default in STP, etc... but not yet stable
"stable" - enabled by default & ready for prime time

I think "prototype" or "exploratory" might be better instead of "unstable".

I also feel like the distinction between "experimental" vs "developer debug" might be a bit confusing for web developers. Maybe we need to explicitly call out that some of the flags are for feature enablement and others are for debugging purposes, so something like:
"Internal Debug Options"
"Developer Debug Options"
"Exploratory Features"
"Test Enabled Features"
"Experimentally Enabled Features"
"Stably Enabled Features"
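A minimal sketch of the "disallow missing status" amendment quoted above, written in Python purely for illustration. The names `Status`, `MissingStatusError`, and `parse_status`, and the `status` key itself, are hypothetical; WebKit's actual preference generator is not shown in this thread. The thread uses both "test" and "tests" for the same category; the sketch assumes "test".

```python
from enum import Enum


class Status(Enum):
    """The six categories listed in this comment (names are illustrative)."""
    UNSTABLE = "unstable"
    INTERNAL_DEBUG = "internal debug"
    DEVELOPER_DEBUG = "developer debug"
    TEST = "test"
    EXPERIMENTAL = "experimental"
    STABLE = "stable"


class MissingStatusError(ValueError):
    """Raised when a flag definition has no status at all."""


def parse_status(name, entry):
    """Return the Status for one flag definition, or fail loudly.

    `entry` is a plain dict standing in for one parsed preference entry.
    """
    raw = entry.get("status")
    if raw is None:
        # The amendment: a missing status is an error, not "anything goes".
        raise MissingStatusError(f"{name}: every flag must declare a status")
    try:
        return Status(raw)
    except ValueError:
        allowed = ", ".join(s.value for s in Status)
        raise ValueError(
            f"{name}: unknown status {raw!r}; expected one of: {allowed}"
        ) from None
```

Run as part of a build step, a check like this would turn a forgotten status into a failure at the point where the flag is added, which is the "hard to notice lack of something" problem the amendment is aimed at.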

Sam Sneddon [:gsnedders]

Comment 23 2021-03-23 07:05:00 PDT

(In reply to Ryosuke Niwa from comment #22) > (In reply to Maciej Stachowiak from comment #21) > > I like this proposal a lot! > > > > One proposed friendly amendment: let's disallow missing status (build > > failure if you try to add a feature with no status) and add meaningful > > statuses for flags that are not test, experimental, or stable. Here's a few > > I can think of: > > * unstable - meant to be an actual feature, which would eventually evolve to > > "test" status and beyond, but not yet functional and usable. > > * internal debug - flags like "Disable Accelerated Compositing" that we > > think browser devs might need to use to diagnose problems, but that we don't > > think are reasonable useful for web developers > > * developer debug - debugging flags that we think are reasonable for web > > developers to try to diagnose problems (perhaps the WebRTC flags, currently > > in their own submenu, would fall in this category) > > > > Maybe pulling on this thread there'd be too many statuses to represent every > > one, but missing status can be a mistake that's easy to make and hard to > > spot (hard to notice lack of something), so I think it would be better to > > have statuses that can apply to any kind of flag we'd like to have. > > So the list of categories we're thinking is: > "unstable" - feature in active/stale development > "internal debug" - for WebKit engineers > "developer debug" - toggling flags for web developers > "tests" - enabled by default in DRT/WTR > "experimental" - features enable by default in STP, etc... but not yet stable > "stable" - enabled by default & ready for prime time This seems pretty reasonable. I think we can definitely bikeshed the descriptions, and I think there are reasonable questions about whether we should have "experimental" enabled by default in any given application, but that's a discussion for each application rather than one that needs to have any effect on the categorisation within WebKit. 
> I think "prototype" or "exploratory" might be better instead of "unstable".

I think we want to convey that anything we don't have the confidence to be running tests against is likely risky to enable, so "unstable" seems suitable to me?

> I also feel like the distinction between "experimental" vs "developer debug" > might be a bit confusing for web developers. > > Maybe we need to explicitly call it some of the flags are for feature > enablement and others are for debugging purposes so something like: > "Internal Debug Options" > "Developer Debug Options" > "Exploratory Features" > "Test Enabled Features" > "Experimentally Enabled Features" > "Stably Enabled Features"

It's probably not worth getting too hung up on exactly what string we use in the YAML file; we can allow applications embedding WebKit to choose how to present this information to users. If an application segregates the different categories (i.e., doesn't have the current Safari behaviour of putting everything in a single menu), I think the distinction between "experimental" and "developer debug" should be pretty clear?

All this said, probably relevant to any documentation of policy here is also the question of when we want a runtime flag versus a compile-time flag, especially for features in the "unstable" category: we should document when we want early development to happen behind a compile-time flag instead of a run-time flag. And in the extreme, one could imagine an implementation of the runtime flag system whereby the runtime check is an inline function which returns the constant false for (e.g.) features in the "unstable" category, to effectively largely disable them at compile-time.
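The point in this comment that enabling "experimental" by default is a decision for each application, not for the categorisation itself, can be sketched as follows. This is an illustration only: the application names, the `THRESHOLD` table, and the assumption that the feature statuses form an ordered ladder (so that enabling "test" implies enabling "experimental" and "stable") are not stated in the thread.

```python
# Feature statuses ordered from least to most ready (assumed ordering).
FEATURE_LADDER = ("unstable", "test", "experimental", "stable")

# Hypothetical per-application policy: the lowest status that application
# turns on by default. The categorisation does not change between rows.
THRESHOLD = {
    "release browser": "stable",
    "technology preview": "experimental",
    "test runner": "test",
}


def enabled_by_default(status, application):
    """Return True if `application` would enable a flag with this status."""
    if status not in FEATURE_LADDER:
        # Debug options are never switched on by a policy in this sketch.
        return False
    minimum = THRESHOLD[application]
    return FEATURE_LADDER.index(status) >= FEATURE_LADDER.index(minimum)
```

With this shape, changing what a given application enables means editing one row of its policy, while the status recorded for each flag stays the same everywhere.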

Ryosuke Niwa

Comment 24 2021-03-23 17:26:46 PDT

(In reply to Sam Sneddon [:gsnedders] from comment #23) > (In reply to Ryosuke Niwa from comment #22) > > (In reply to Maciej Stachowiak from comment #21) > > > I like this proposal a lot! > > > > > > One proposed friendly amendment: let's disallow missing status (build > > > failure if you try to add a feature with no status) and add meaningful > > > statuses for flags that are not test, experimental, or stable. Here's a few > > > I can think of: > > > * unstable - meant to be an actual feature, which would eventually evolve to > > > "test" status and beyond, but not yet functional and usable. > > > * internal debug - flags like "Disable Accelerated Compositing" that we > > > think browser devs might need to use to diagnose problems, but that we don't > > > think are reasonable useful for web developers > > > * developer debug - debugging flags that we think are reasonable for web > > > developers to try to diagnose problems (perhaps the WebRTC flags, currently > > > in their own submenu, would fall in this category) > > > > > > Maybe pulling on this thread there'd be too many statuses to represent every > > > one, but missing status can be a mistake that's easy to make and hard to > > > spot (hard to notice lack of something), so I think it would be better to > > > have statuses that can apply to any kind of flag we'd like to have. > > > > So the list of categories we're thinking is: > > "unstable" - feature in active/stale development > > "internal debug" - for WebKit engineers > > "developer debug" - toggling flags for web developers > > "tests" - enabled by default in DRT/WTR > > "experimental" - features enable by default in STP, etc... but not yet stable > > "stable" - enabled by default & ready for prime time > > This seems pretty reasonable. 
I think we can definitely bikeshed the > descriptions, and I think there are reasonable questions about whether we > should have "experimental" enabled by default in any given application, but > that's a discussion for each application rather than one that needs to have > any effect on the categorisation within WebKit. > > > I think "prototype" or "exploratory" might be better instead of "unstable". > > I think we convey anything we don't have the confidence to be running tests > against likely is risky to enable, so unstable seems suitable to me?

To me, "unstable" conveys that it's simply crashy, because "stability" often refers to how frequently a crash is encountered, at least internally at Apple and elsewhere in the general software QA sense, so I don't think we want to be using that specific term.

> > I also feel like the distinction between "experimental" vs "developer debug" > > might be a bit confusing for web developers. > > > > Maybe we need to explicitly call it some of the flags are for feature > > enablement and others are for debugging purposes so something like: > > "Internal Debug Options" > > "Developer Debug Options" > > "Exploratory Features" > > "Test Enabled Features" > > "Experimentally Enabled Features" > > "Stably Enabled Features" > > It's probably not worth getting too hung up on exactly what string we use in > the YAML file, and allow application embedding WebKit to choose how to > present this information to users. If an application segregates the > different categories (i.e., doesn't have the current Safari behaviour of > putting everything in a single menu), I think the distinction between > "experimental" and "developer debug" should be pretty clear?

I disagree. That's one thing that separates WebKit from the rest of browser engine open source projects. We're very much insistent on using terminology that makes sense to humans even in our code, so that we don't need a separate mapping from code to human-readable documentation. It has served us well, and we should continue to make our code self-documenting.

> All this said, probably relevant to any documentation of policy here is also > the question of when we want a runtime flag versus a compile-time flag, > especially for features in the "unstable" category.

The current policy is that any feature that could be compiled on all platforms should be a runtime-enabled feature unless it poses a new security, privacy, or perf cost that could not be rectified easily.

> And in extreme, one could imagine an > implementation of the runtime flag system whereby the runtime-check is an > inline function which returns the constant false for (e.g.) features in the > "unstable" category to effectively largely disable them at compile-time.

That is not sufficient in some cases, because compile-time flags are needed when things just don't compile unless certain platform features / capabilities are available.

Maciej Stachowiak

Comment 25 2021-03-24 00:17:52 PDT

(In reply to Ryosuke Niwa from comment #24) > (In reply to Sam Sneddon [:gsnedders] from comment #23) > > (In reply to Ryosuke Niwa from comment #22) > > > (In reply to Maciej Stachowiak from comment #21) > > > > > I think "prototype" or "exploratory" might be better instead of "unstable". > > > > I think we convey anything we don't have the confidence to be running tests > > against likely is risky to enable, so unstable seems suitable to me? > > "unstable" to me conveys that it's simply crashy because "stability" often > refers to how frequent a crash is encountered at least internally at Apple > and elsewhere in the general software QA sense so I don't think we want to > be using that specific term.

Features in this category could potentially be crashy, or have serious security or perf issues. Or they could have nothing but stubs. It's probably good to use a name that sounds a bit scary. I'm not sure "prototype" or "exploratory" sufficiently convey that here there be dragons. It's possible there's a better word than "unstable" though.

> > > > I also feel like the distinction between "experimental" vs "developer debug" > > > might be a bit confusing for web developers. > > > > > > Maybe we need to explicitly call it some of the flags are for feature > > > enablement and others are for debugging purposes so something like: > > > "Internal Debug Options" > > > "Developer Debug Options" > > > "Exploratory Features" > > > "Test Enabled Features" > > > "Experimentally Enabled Features" > > > "Stably Enabled Features" > > > > It's probably not worth getting too hung up on exactly what string we use in > > the YAML file, and allow application embedding WebKit to choose how to > > present this information to users. If an application segregates the > > different categories (i.e., doesn't have the current Safari behaviour of > > putting everything in a single menu), I think the distinction between > > "experimental" and "developer debug" should be pretty clear?
> > I disagree. That's one thing that separates WebKit from the rest of browser > engine open source projects. We're very much insistent on using > terminologies that make sense for humans even in our code so that we don't > have code -> human-readable documentation. It has served us well, and we > should continue to make our code self documentary. I like distinguishing "option" flags from "feature" flags, but maybe we could try for something easy to type/read in config files. Maybe we don't need the word Enabled in all places, and perhaps these phrases could be all lowercase. > > > All this said, probably relevant to any documentation of policy here is also > > the question of when we want a runtime flag versus a compile-time flag, > > especially for features in the "unstable" category. > > The current policy is that any feature that could be complied on all > platforms should be a runtime enabled feature unless it poses a new > security, privacy, or perf cost that could not be rectified easily. We have a policy document that explains when to use a compile-time flag, which includes the above conditions plus some others: https://webkit.org/feature-policy/ Probably the info about when runtime flags are enabled or disabled at the above link should be replaced with documentation of the various feature flag states. For feature flag states that are features, not options, I wonder if we could get the WebKit feature status page to reflect them. It would be a bit more granular info than what we have now, where "Under Development" could be a wide range of conditions.

Ryosuke Niwa

Comment 26 2021-03-24 01:07:30 PDT

(In reply to Maciej Stachowiak from comment #25) > (In reply to Ryosuke Niwa from comment #24) > > (In reply to Sam Sneddon [:gsnedders] from comment #23) > > > (In reply to Ryosuke Niwa from comment #22) > > > > (In reply to Maciej Stachowiak from comment #21) > > > > > > > I think "prototype" or "exploratory" might be better instead of "unstable". > > > > > > I think we convey anything we don't have the confidence to be running tests > > > against likely is risky to enable, so unstable seems suitable to me? > > > > "unstable" to me conveys that it's simply crashy because "stability" often > > refers to how frequent a crash is encountered at least internally at Apple > > and elsewhere in the general software QA sense so I don't think we want to > > be using that specific term. > > Features in this category could potentially be crashy, or have serious > security for perf issues. Or they could have nothing but stubs. It's > probably good to use a name that sounds a bit scary. I'm not sure > "prototype" or "exploratory" sufficiently convey that here there be dragons. > It's possible there's a better word than "unstable" though. Hm... maybe "untested" features? That conveys exactly what these things are. > > > > I also feel like the distinction between "experimental" vs "developer debug" > > > > might be a bit confusing for web developers. > > > > > > > > Maybe we need to explicitly call it some of the flags are for feature > > > > enablement and others are for debugging purposes so something like: > > > > "Internal Debug Options" > > > > "Developer Debug Options" > > > > "Exploratory Features" > > > > "Test Enabled Features" > > > > "Experimentally Enabled Features" > > > > "Stably Enabled Features" > > > > > > It's probably not worth getting too hung up on exactly what string we use in > > > the YAML file, and allow application embedding WebKit to choose how to > > > present this information to users. 
If an application segregates the > > > different categories (i.e., doesn't have the current Safari behaviour of > > > putting everything in a single menu), I think the distinction between > > > "experimental" and "developer debug" should be pretty clear? > > > > I disagree. That's one thing that separates WebKit from the rest of browser > > engine open source projects. We're very much insistent on using > > terminologies that make sense for humans even in our code so that we don't > > have code -> human-readable documentation. It has served us well, and we > > should continue to make our code self documentary. > > I like distinguishing "option" flags from "feature" flags, but maybe we > could try for something easy to type/read in config files. Maybe we don't > need the word Enabled in all places, and perhaps these phrases could be all > lowercase.

I don't think we necessarily need to type in these names. We could either put a new field in yaml like below or just put debugging options and feature flags in two different files. It's really confusing now that WebPreferences lists both features and things we only disable or enable for debugging purposes. To me, that's a much more important distinction than whether a given feature is being tested vs. experimental.

AsyncClipboardAPIEnabled:
  type: bool
  kind: feature
  state: stable
  humanReadableName: "Async Clipboard API"
  humanReadableDescription: "Enable the async clipboard API"
  defaultValue:
    ...

WebGL2Enabled:
  type: bool
  kind: feature
  state: experimental
  humanReadableName: "WebGL 2.0"
  humanReadableDescription: "WebGL 2 prototype"
  webcoreBinding: RuntimeEnabledFeatures
  condition: ENABLE(WEBGL2)
  defaultValue:
    ...

RequestIdleCallbackEnabled:
  type: bool
  kind: feature
  state: untested
  humanReadableName: "requestIdleCallback"
  humanReadableDescription: "Enable requestIdleCallback support"
  defaultValue:
    ...

TextAreasAreResizable:
  type: bool
  kind: internal.debug
  webcoreOnChange: setNeedsRecalcStyleInAllFrames
  defaultValue:
    ...

> For feature flag states that are features, not options, I wonder if we could > get the WebKit feature status page to reflect them. It would be a bit more > granular info than what we have now, where "Under Development" could be a > wide range of conditions.

Yeah, I think that would be ideal. We've had a number of cases where the feature status page wasn't updated in time because the person who was working on it simply forgot about it.

This is the sort of thing I was talking about the other day when I said we have too many random things we need to do / update when we work on a new feature. The more we can reduce the total amount of random housekeeping, the less likely it is that things will get out-of-date / out-of-sync.
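To illustrate how the proposed kind and state fields could let an embedding application split its menu, here is a small Python sketch. The `PREFERENCES` dict mirrors only the kind, state, and humanReadableName values from the YAML example in this comment; the function `group_for_menu` and the heading strings it produces are hypothetical.

```python
from collections import defaultdict

# Plain-dict stand-in for the YAML entries above (other keys omitted).
PREFERENCES = {
    "AsyncClipboardAPIEnabled": {
        "kind": "feature",
        "state": "stable",
        "humanReadableName": "Async Clipboard API",
    },
    "WebGL2Enabled": {
        "kind": "feature",
        "state": "experimental",
        "humanReadableName": "WebGL 2.0",
    },
    "RequestIdleCallbackEnabled": {
        "kind": "feature",
        "state": "untested",
        "humanReadableName": "requestIdleCallback",
    },
    "TextAreasAreResizable": {"kind": "internal.debug"},
}


def group_for_menu(preferences):
    """Group display names under one heading per feature state or option kind."""
    groups = defaultdict(list)
    for key, entry in sorted(preferences.items()):
        if entry["kind"] == "feature":
            # Features are split by how far along they are.
            heading = f"{entry['state']} features"
        else:
            # Debugging options carry no state; the kind alone names the group.
            heading = f"{entry['kind']} options"
        groups[heading].append(entry.get("humanReadableName", key))
    return dict(groups)
```

Calling `group_for_menu(PREFERENCES)` yields separate lists for stable, experimental, and untested features plus one for internal.debug options, which keeps debugging switches out of the feature lists.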

Sam Weinig

Comment 27 2021-03-24 09:50:00 PDT

(In reply to Ryosuke Niwa from comment #26) > > This is sort of the thing I was talking about other day when I said we have > too many random things we need to do / update when we work on a new feature. > Anyway we can reduce the total number of random house keeping to do, the > less likely that things will get out-of-date / out-of-sync.

To be clear, this is what I have been working to iteratively improve over the last 6 months, and I have gotten rid of quite a few of them. No longer needing to update WebCore, WebKitLegacy Mac, WebKit, DumpRenderTree, and WebKitTestRunner for runtime flags was a big one (there is still one more to go, but we will get there, see Bug 222866). But one thing that makes it a challenging project is the lack of definitions for things.

I would like to merge features.json and these preference yaml files at some point, but it is not clear they are always at the same granularity, so we will have to be careful in how we go about doing it. It is a really important goal to me that we do (my memory is not good enough to remember more than 1 or 2 places to update anything).

Ryosuke Niwa

Comment 28 2021-03-24 13:10:20 PDT

(In reply to Sam Weinig from comment #27) > (In reply to Ryosuke Niwa from comment #26) > > > > This is sort of the thing I was talking about other day when I said we have > > too many random things we need to do / update when we work on a new feature. > > Anyway we can reduce the total number of random house keeping to do, the > > less likely that things will get out-of-date / out-of-sync. > > To be clear, this is what I have been working to iteratively improve over > the 6 months, and I have gotten rid of quite a few of them (no longer need > to update WebCore, WebKitLegacy Mac, WebKit, DumpRenderTree, and > WebKitTestRunner for runtime flags was a big one, (still have one more to > go, but we will get there, see Bug 222866). But one thing that makes it a > challenging project is the lack of definitions for things.

Yeah, even just the fact that we can share the default across WebKit & WebKitLegacy has been a huge plus! So thanks for working on that.

> I would like to merge features.json and these preference yaml files at some > point, but it is not clear they are always at the same granularity, so we > will have to be careful in how we go about doing it, but it is a really > important goal to me that we do (my memory is not good enough to remember > more than 1 or 2 places to update anything).

Yeah, but maybe we can define some kind of "Feature Category" or something for that kind of stuff. Or maybe it ain't so bad if we had another file defining these categories or how flags should be grouped together.

I think one massive plus would be to have a URL to the spec right next to the feature flag. I often wonder what a given feature flag is for, because the name of the spec doesn't necessarily always match the feature flag.

Maciej Stachowiak

Comment 29 2021-04-22 13:35:10 PDT

(In reply to Ryosuke Niwa from comment #26) > (In reply to Maciej Stachowiak from comment #25) > > > > > I like distinguishing "option" flags from "feature" flags, but maybe we > > could try for something easy to type/read in config files. Maybe we don't > > need the word Enabled in all places, and perhaps these phrases could be all > > lowercase. > > I don't think we necessarily need to type in these names. We could either > put a new field in yaml like below or just put debugging options and feature > flags in two different files. It's really confusing now that WebPreferences > list both the list of features as well as things we only disable or enable > for debugging purposes. That's a lot more important distinction than whether > a given feature is being tested vs. experimental to me.
>
> AsyncClipboardAPIEnabled:
>   type: bool
>   kind: feature
>   state: stable
>   humanReadableName: "Async Clipboard API"
>   humanReadableDescription: "Enable the async clipboard API"
>   defaultValue:
>     ...
>
> WebGL2Enabled:
>   type: bool
>   kind: feature
>   state: experimental
>   humanReadableName: "WebGL 2.0"
>   humanReadableDescription: "WebGL 2 prototype"
>   webcoreBinding: RuntimeEnabledFeatures
>   condition: ENABLE(WEBGL2)
>   defaultValue:
>     ...
>
> RequestIdleCallbackEnabled:
>   type: bool
>   kind: feature
>   state: untested
>   humanReadableName: "requestIdleCallback"
>   humanReadableDescription: "Enable requestIdleCallback support"
>   defaultValue:
>     ...
>
> TextAreasAreResizable:
>   type: bool
>   kind: internal.debug
>   webcoreOnChange: setNeedsRecalcStyleInAllFrames
>   defaultValue:
>     ...

Reasonable direction, but I'm not sure I like having both "kind" and "state" fields (especially since, in this example, state does not apply to option/debugging flags at all, and instead they say everything in the kind). Is the dot syntax because yaml doesn't allow spaces (without quotes) in the value field?

Maciej Stachowiak

Comment 30 2021-04-22 13:37:05 PDT

(In reply to Ryosuke Niwa from comment #28) > (In reply to Sam Weinig from comment #27) > > To be clear, this is what I have been working to iteratively improve over > > the 6 months, and I have gotten rid of quite a few of them (no longer need > > to update WebCore, WebKitLegacy Mac, WebKit, DumpRenderTree, and > > WebKitTestRunner for runtime flags was a big one, (still have one more to > > go, but we will get there, see Bug 222866). But one thing that makes it a > > challenging project is the lack of definitions for things. > > Yeah, even just the fact we can share the default across WebKit & > WebKitLegacy has been a huge plus! So thanks for working on that. +1

Beth Dakin

Comment 31 2022-03-29 15:45:01 PDT

What are the next steps to make some of these changes a reality?

Maciej Stachowiak

Comment 32 2022-04-25 14:56:24 PDT

I think someone needs to write this up as a more formal proposal and circulate it in the WebKit community. Let's find someone to do that. (Not sure I have the time myself.)

Ahmad Saleem

Comment 33 2023-04-06 18:08:13 PDT

@Brent - recently there was a lot of work done to categorize Experimental Features in WebKit. Do we need to do anything more here?


Status NEW

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version Other

Hardware Unspecified

OS Unspecified

Product WebKit

Component WebKit Misc.

Assignee

Nobody

Reported

2021-03-07 08:54 PST

Modified

2023-04-06 18:08 PDT

CC List

16 users

URL

Keywords InRadar

Depends on

247926

Blocks
