Bug 110479 - [Meta] Implement support for TTML
Summary: [Meta] Implement support for TTML
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: Media
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-02-21 08:38 PST by Glenn Adams
Modified: 2023-05-09 05:58 PDT
Description Glenn Adams 2013-02-21 08:38:31 PST
The HTML5 track element permits reference to arbitrary types of caption, subtitle, and metadata resources. When making reference to a caption or subtitle resource, that resource may take the form of TTML (application/ttml+xml) [1] as well as WebVTT [2] and other caption/subtitle document formats.

Given that the U.S. Federal Communications Commission (FCC) has designated TTML (in the form of SMPTE-TT [3], a profile of TTML) as a safe harbor format for the delivery of captions over the Internet [4][5], it is expected that use of TTML (and its various profiles, such as SMPTE-TT and SDP-US [6]) will increase on the Internet and in the context of video media delivery over the Web. A number of significant providers of Web video content have already begun making use of TTML, including HBOGO [7] and Netflix.

This bug will serve as a meta bug for tracking other more specific bugs that provide an initial TTML implementation.

Note also that work is underway to publish a second edition of TTML 1.0 [8], and work has also begun on TTML 1.1 [9] which will add or change features in TTML.

[1] http://www.w3.org/TR/2010/REC-ttaf1-dfxp-20101118/
[2] http://dev.w3.org/html5/webvtt/
[3] https://www.smpte.org/sites/default/files/st2052-1-2010.pdf
[4] http://www.dwt.com/FCC-Adopts-Closed-Captioning-Rules-for-Online-Video-Programming-01-17-2012/
[5] http://www.broadbandlawadvisor.com/2011/07/articles/accessibility-persons-with-dis/accessibility-advisory-committee-releases-report-on-internet-closed-captioning-proposes-tiered-schedule-for-rule-compliance/
[6] http://www.w3.org/TR/2013/NOTE-ttml10-sdp-us-20130205/
[7] http://lists.w3.org/Archives/Public/public-tt/2012Jul/0007.html
[8] http://www.w3.org/TR/2013/WD-ttaf1-dfxp-20130131/
[9] https://dvcs.w3.org/hg/ttml/raw-file/default/ttml11/spec/ttml11.html
Comment 1 Maciej Stachowiak 2013-02-21 10:08:51 PST
If anyone plans to work on this, don't forget to follow this process: <http://www.webkit.org/coding/adding-features.html>.

I am not sure that the WebKit community will be interested in having an implementation of TTML, particularly given the dependency on the XSL-FO formatting model.
Comment 2 Glenn Adams 2013-02-21 10:22:32 PST
(In reply to comment #1)
> If anyone plans to work on this, don't forget to follow this process: <http://www.webkit.org/coding/adding-features.html>.

Certainly. Also, I would expect any code to be bracketed with

#if ENABLE(VIDEO_TRACK) && ENABLE(TTML)
#endif
 
> I am not sure that the WebKit community will be interested in having an implementation of TTML, particularly given the dependency on the XSL-FO formatting model.

(1) the use of XSL-FO in the TTML spec is only for explanatory purposes, namely, describing formatting semantics, and not used for implementation purposes; the same semantics can be restated in terms of CSS as well; so that is not really a substantive objection;

(2) IE10 implements support for a subset of TTML;

(3) the FCC rules could translate to mandatory support in a variety of deployment contexts, e.g., HTML5 UAs embedded in television devices available at retail that are already subject to a variety of FCC rules;
Comment 3 Maciej Stachowiak 2013-02-21 10:38:13 PST
(In reply to comment #2)
> (In reply to comment #1)
> > If anyone plans to work on this, don't forget to follow this process: <http://www.webkit.org/coding/adding-features.html>.
> 
> Certainly. Also, I would expect any code to be bracketed with
> 
> #if ENABLE(VIDEO_TRACK) && ENABLE(TTML)
> #endif

You need consensus to add the feature even if it has a feature flag.

> 
> > I am not sure that the WebKit community will be interested in having an implementation of TTML, particularly given the dependency on the XSL-FO formatting model.
> 
> (1) the use of XSL-FO in the TTML spec is only for explanatory purposes, namely, describing formatting semantics, and not used for implementation purposes; the same semantics can be restated in terms of CSS as well; so that is not really a substantive objection;

I don't believe the full XSL-FO formatting model can be mapped to CSS, or if it can, no one has done so; it may be possible for the subset of XSL-FO used by a subset of TTML.

> 
> (2) IE10 implements support for a subset of TTML;
> 
> (3) the FCC rules could translate to mandatory support in a variety of deployment contexts, e.g., HTML5 UAs embedded in television devices available at retail that are already subject to a variety of FCC rules;

There are many possible interpretations of the FCC rules. Some say that it's not necessary to specifically implement SMPTE-TT, the safe harbor format, as long as a format with all the mandated capabilities is supported. I have yet to hear an interpretation that it's required to implement a subset of TTML that's not the same profile as SMPTE-TT.

I think you will find addition of this feature is controversial, as WebVTT exists in part because of some browser implementors objecting to the complexity of TTML.
Comment 4 Glenn Adams 2013-02-21 11:20:57 PST
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > If anyone plans to work on this, don't forget to follow this process: <http://www.webkit.org/coding/adding-features.html>.
> > 
> > Certainly. Also, I would expect any code to be bracketed with
> > 
> > #if ENABLE(VIDEO_TRACK) && ENABLE(TTML)
> > #endif
> 
> You need consensus to add the feature even if it has a feature flag.

Sure.

> 
> > 
> > > I am not sure that the WebKit community will be interested in having an implementation of TTML, particularly given the dependency on the XSL-FO formatting model.
> > 
> > (1) the use of XSL-FO in the TTML spec is only for explanatory purposes, namely, describing formatting semantics, and not used for implementation purposes; the same semantics can be restated in terms of CSS as well; so that is not really a substantive objection;
> 
> I don't believe the full XSL-FO formatting model can be mapped to CSS, or if it can, no one has done so; it may be possible for the subset of XSL-FO used by a subset of TTML.

A very small amount of the XSL-FO model is referenced by TTML. The part that is referenced can easily be mapped to CSS. In any case, the XSL-FO issue is a non-issue IMO.

The TTML spec says:

"The semantics of TTML style presentation are described in terms of the model in [XSL 1.1]. The intended effect of the attributes in this section are to be compatible with the layout model of XSL. Presentation agents may however use any technology to satisfy the authorial intent of the document. In particular since [CSS2] is a subset of this model, a CSS processor may be used for the features that the models have in common."

"Implementors should recognize that it is the layout model of [XSL 1.1] that is being referenced by this specification, not the requirement to use a compliant [XSL 1.1] formatting processor"

If there are in fact XSL-FO semantics required by TTML that cannot be described in terms of CSS semantics, then I'm sure the TTWG would be interested in learning about them, since it was not the intent of the TTWG to make TTML dependent on XSL-FO from an implementation perspective.
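
As a rough illustration of the point that the referenced part of the model lines up with CSS, the sketch below translates a handful of tts:* style attributes into CSS declarations. The table TTS_TO_CSS and the function tts_to_css are hypothetical names made up for this example and are not taken from any specification. Values are copied through verbatim, which is an assumption that only holds for simple values such as named colors or pixel sizes.

```python
# Hypothetical, partial mapping from TTML style attribute local names to
# CSS property names. Illustration only; not from any specification.
TTS_TO_CSS = {
    "color": "color",
    "backgroundColor": "background-color",
    "fontSize": "font-size",
    "fontStyle": "font-style",
    "fontWeight": "font-weight",
    "textAlign": "text-align",
}

# TTML 1.0 styling namespace (the usual "tts" prefix).
TTS_NS = "http://www.w3.org/ns/ttml#styling"


def tts_to_css(attributes):
    """Turn a dict of element attributes into a CSS declaration string.

    Keys use ElementTree's "{namespace}local" form. Attributes outside the
    styling namespace, or without an entry in TTS_TO_CSS, are skipped.
    Values are passed through unchanged (an assumption, see above).
    """
    prefix = "{" + TTS_NS + "}"
    declarations = []
    for name, value in sorted(attributes.items()):
        if not name.startswith(prefix):
            continue
        css_name = TTS_TO_CSS.get(name[len(prefix):])
        if css_name is not None:
            declarations.append(f"{css_name}: {value}")
    return "; ".join(declarations)
```

A real mapping would also need per-property value translation, since some TTML value syntaxes differ from their CSS counterparts.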

> 
> > 
> > (2) IE10 implements support for a subset of TTML;
> > 
> > (3) the FCC rules could translate to mandatory support in a variety of deployment contexts, e.g., HTML5 UAs embedded in television devices available at retail that are already subject to a variety of FCC rules;
> 
> There are many possible interpretations of the FCC rules. Some say that it's not necessary to specifically implement SMPTE-TT, the safe harbor format, as long as a format with all the mandated capabilities is supported. I have yet to hear an interpretation that it's required to implement a subset of TTML that's not the same profile as SMPTE-TT.

The real question is whether video service providers will or would like to reference TTML content directly as a delivery format. We have some preliminary input that they will. We also have a variety of downstream ports of WK that are deploying on devices covered by FCC rules that relate to captions. I agree that we don't have the final word on this subject. But I think absence of finality on this matter is not a reasonable objection to exclude TTML from WK. The WK community has added many other features that have less finality or concrete requirements.

> 
> I think you will find addition of this feature is controversial, as WebVTT exists in part because of some browser implementors objecting to the complexity of TTML.

IMO, those objections were based in part in a misunderstanding of the role of XSL-FO in TTML. As I've pointed out, it serves only as a definitional device, and has little or no impact on implementation. I agree that TTML is more general than WebVTT, and as such has some complexities that are not found in WebVTT. However, there is nothing to indicate that we should not support one of the simpler profiles of TTML such as SMPTE-TT or SDP-US.

In any case, I would be willing to claim that an implementation of all of TTML is at least one or two orders of magnitude simpler than SVG or HTML5, or even some of the more complex, recent additions to CSS.

What would be useful is an implementation of a subset of TTML that works in WK against which objective judgments about complexity, etc., can be made. At present, the arguments are rather speculative in nature, wouldn't you agree?
Comment 5 Silvia Pfeiffer 2013-02-21 14:17:00 PST
Do you have a link to the rules for rendering TTML in browsers? HTML requires these: http://www.w3.org/TR/html5/single-page.html#replaced-elements
Comment 6 Glenn Adams 2013-02-21 14:51:51 PST
(In reply to comment #5)
> Do you have a link to the rules for rendering TTML in browsers? HTML requires these: http://www.w3.org/TR/html5/single-page.html#replaced-elements

I guess you are asking if I could provide a reference to rules for TTML that are equivalent to the following found in 10.4.1 Embedded content:

<quote>
Any subtitles or captions are expected to be overlayed directly on top of their video element, as defined by the relevant rendering rules; for WebVTT, those are the WebVTT cue text rendering rules. [WEBVTT]

When the user agent starts exposing a user interface for a video element, the user agent should run the rules for updating the text track rendering of each of the text tracks in the video element's list of text tracks that are showing (e.g., for text tracks based on WebVTT, the rules for updating the display of WebVTT text tracks). [WEBVTT]
</quote>

Is that what you are asking? At present, TTML leaves the decision about where to display the "root container region" up to the external authoring or presentation context (see [1], with a minor clarifying edit "[or presentation]"):

<quote>
If the tts:extent attribute is specified on the tt element, then it must adhere to 8.2.7 tts:extent, in which case it specifies the spatial extent of the root container region in which content regions are located and presented. If no tts:extent attribute is specified, then the spatial extent of the root container region is considered to be determined by the external authoring or presentation context. The root container origin is determined by the external authoring [or presentation] context.
</quote>

So the language you cite in HTML5 10.4.1 can certainly apply as the "external presentation context" rules. However, there is a recent thread [2] related to this matter, the resolution of which I expect will add further clarifying language, and possibly new mechanisms to permit the author to express intentions about the relation between the root container region and the related media object display region.
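
To make the fallback concrete, here is a minimal sketch of how a UA might pick the root container region's extent. The function name root_container_extent is hypothetical, and using the video element's box as the "external presentation context" is an assumption for illustration. Only the two-pixel-length form of tts:extent is handled; anything else, including "auto", takes the fallback.

```python
import re

# Matches two pixel lengths such as "640px 480px".
_EXTENT_PX = re.compile(r"^\s*(\d+(?:\.\d+)?)px\s+(\d+(?:\.\d+)?)px\s*$")


def root_container_extent(tts_extent, video_width, video_height):
    """Return (width, height) in pixels for the root container region.

    tts_extent is the raw tts:extent attribute value from the tt element,
    or None when the attribute is absent. When it is absent or not in the
    handled form, the video box is used (an assumed presentation context).
    """
    if tts_extent is not None:
        match = _EXTENT_PX.match(tts_extent)
        if match:
            return float(match.group(1)), float(match.group(2))
    return float(video_width), float(video_height)
```

How an authored extent that differs from the video box should be scaled or positioned is exactly the open question in the thread cited as [2], so the sketch does not attempt it.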

If you need something more specific, let me know and I'll raise an issue with the TTWG.

[1] https://dvcs.w3.org/hg/ttml/raw-file/default/ttml10/spec/ttaf1-dfxp.html#document-structure-vocabulary-tt
[2] http://lists.w3.org/Archives/Public/public-tt/2013Jan/0027.html
Comment 7 Silvia Pfeiffer 2013-02-21 15:23:34 PST
(In reply to comment #6)
> (In reply to comment #5)
> I guess you are asking if I could provide a reference to rules for TTML that are equivalent to the following found in 10.4.1 Embedded content:

Yes.

> Is that what you are asking? At present, TTML leaves the decision about where to display the "root container region" up to the external authoring or presentation context (see [1], with a minor clarifying edit "[or presentation]"):
> 
> <quote>
> If the tts:extent attribute is specified on the tt element, then it must adhere to 8.2.7 tts:extent, in which case it specifies the spatial extent of the root container region in which content regions are located and presented. If no tts:extent attribute is specified, then the spatial extent of the root container region is considered to be determined by the external authoring or presentation context. The root container origin is determined by the external authoring [or presentation] context.
> </quote>

No, this is not sufficient. You actually need to explain how you are rendering the information found in a TTML file. This is not something that is part of the TTML format specification, but more like a mapping layer between the file format and the browser.


> So the language you cite in HTML5 10.4.1 can certainly apply as the "external presentation context" rules.

It's not just about what the "root container" means, but what everything in TTML is mapped to in HTML.

See for example http://dev.w3.org/html5/webvtt/#webvtt-cue-text-dom-construction-rules to see how the tags and elements used in WebVTT are mapped to HTML tags. This is something that is not clear in TTML. You need to explain what a TTML <p> element maps to and any other elements in use. You need to explain how to populate TextTrackCues.

You need to explain how to get to the CSS styling, see for example http://dev.w3.org/html5/webvtt/#webvtt-cue-text-rendering-rules . What CSS does an XSL-FO rule create in the HTML DOM?

It's impossible to implement the full scope of TTML without these clarifications. And it's impossible to implement it compatibly between browsers without a specification for it.
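
For the simplest part of such a mapping, a sketch along these lines shows what turning TTML <p> elements into (start, end, text) cues could look like. The function names parse_time and ttml_to_cues are hypothetical, and this is not a specified algorithm. It assumes begin and end are given directly on each <p>, handles only clock times without frames and h/m/s/ms offsets, and ignores styling, regions, inherited timing, and <br/> line breaks.

```python
import re
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"  # TTML 1.0 namespace

_CLOCK = re.compile(r"^(\d{2,}):(\d{2}):(\d{2}(?:\.\d+)?)$")
_OFFSET = re.compile(r"^(\d+(?:\.\d+)?)(h|m|s|ms)$")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_time(expression):
    """Convert a (restricted) TTML time expression to seconds."""
    clock = _CLOCK.match(expression)
    if clock:
        hours, minutes, seconds = clock.groups()
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    offset = _OFFSET.match(expression)
    if offset:
        return float(offset.group(1)) * _UNIT_SECONDS[offset.group(2)]
    raise ValueError(f"unsupported time expression: {expression!r}")


def ttml_to_cues(source):
    """Return a list of (start, end, text) tuples, one per timed <p>."""
    root = ET.fromstring(source)
    cues = []
    for p in root.iter(f"{{{TT_NS}}}p"):
        begin, end = p.get("begin"), p.get("end")
        if begin is None or end is None:
            continue  # timing inherited from ancestors is not handled here
        # Flatten nested spans to plain text and collapse whitespace.
        text = " ".join("".join(p.itertext()).split())
        cues.append((parse_time(begin), parse_time(end), text))
    return cues
```

Everything beyond this (nested timing, regions, styling, and what getCueAsHTML() should return) is where an agreed mapping would be needed.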
Comment 8 Glenn Adams 2013-02-21 16:30:50 PST
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > I guess you are asking if I could provide a reference to rules for TTML that are equivalent to the following found in 10.4.1 Embedded content:
> 
> Yes.
> 
> > Is that what you are asking? At present, TTML leaves the decision about where to display the "root container region" up to the external authoring or presentation context (see [1], with a minor clarifying edit "[or presentation]"):
> > 
> > <quote>
> > If the tts:extent attribute is specified on the tt element, then it must adhere to 8.2.7 tts:extent, in which case it specifies the spatial extent of the root container region in which content regions are located and presented. If no tts:extent attribute is specified, then the spatial extent of the root container region is considered to be determined by the external authoring or presentation context. The root container origin is determined by the external authoring [or presentation] context.
> > </quote>
> 
> No, this is not sufficient. You actually need to explain how you are rendering the information found in a TTML file. This is not something that is part of the TTML format specification, but more like a mapping layer between the file format and the browser.
> 
> 
> > So the language you cite in HTML5 10.4.1 can certainly apply as the "external presentation context" rules.
> 
> It's not just about what the "root container" means, but what everything in TTML is mapped to in HTML.
> 
> See for example http://dev.w3.org/html5/webvtt/#webvtt-cue-text-dom-construction-rules to see how the tags and elements used in WebVTT are mapped to HTML tags. This is something that is not clear in TTML. You need to explain what a TTML <p> element maps to and any other elements in use. You need to explain how to populate TextTrackCues.
> 
> You need to explain how to get to the CSS styling, see for example http://dev.w3.org/html5/webvtt/#webvtt-cue-text-rendering-rules . What CSS does an XSL-FO rule create in the HTML DOM?
> 
> It's impossible to implement the full scope of TTML without these clarifications. And it's impossible to implement it compatibly between browsers without a specification for it.

I would not agree with a number of your above statements:

* that TTML content needs to be translated into a list of cues (as currently defined)
* that TTML content needs to be mapped to HTML or CSS
* that it is impossible to implement TTML without these clarifications
* that it is impossible to implement TTML compatibly between browsers

It may very well be possible to translate a profile of TTML, e.g., SMPTE-TT or SDP-US, into the current definition of text track cues, and that may be the first step in implementing support for such a profile. But it isn't necessary to do this in order to implement a compliant TTML presentation processor that is able to render a <track/> element that references a TTML resource.

Notwithstanding the above, I think it would be a useful exercise to define a mapping to HTML/CSS and the current constrained definition of TextTrackCue for a subset of TTML content. But doing so need not be a pre-condition for adding support for rendering TTML in WK.

Since IE10 does implement some level of support for TTML, it would be worth learning if they attempt to map to HTML/CSS and TextTrackCues as you suggest.
Comment 9 Silvia Pfeiffer 2013-02-21 18:21:11 PST
(In reply to comment #8)
> I would not agree with a number of your above statements:
> 
> * that TTML content needs to be translated into a list of cues (as currently defined)
> * that TTML content needs to be mapped to HTML or CSS
> * that it is impossible to implement TTML without these clarifications
> * that it is impossible to implement TTML compatibly between browsers
> 
> It may very well be possible to translate a profile of TTML, e.g., SMPTE-TT or SDP-US, into the current definition of text track cues, and that may be the first step in implementing support for such a profile. But it isn't necessary to do this in order to implement a compliant TTML presentation processor that is able to render a <track/> element that references a TTML resource.


<track> is a replaced element as listed here: http://www.w3.org/TR/html5/single-page.html#replaced-elements . Replaced elements are mapped into the DOM through mapping to an Object. 

<track> is mapped to a TextTrack object.
http://www.w3.org/TR/html5/single-page.html#the-track-element

A TextTrack object consists of a TextTrackCueList for active and for parsed cues.
http://www.w3.org/TR/html5/single-page.html#texttrack

I fail to understand how you can represent TTML in the browser without making use of these constructs that <track> stands for.


> Since IE10 does implement some level of support for TTML, it would be worth learning if they attempt to map to HTML/CSS and TextTrackCues as you suggest.

I don't have a Windows 7 machine and IE10 at hand to test it, but I am 99% sure that the parsed cues are mapped into a TextTrackCueList both for TTML and WebVTT. Since IE10 only supports simple captions (starttime, endtime, text) and no formatting, all they had to do was to map the <p> elements into cues. That's the simple part of the mapping, but it's a start.

You can experiment with their implementation here on their example page:
http://ie.microsoft.com/testdrive/Graphics/VideoCaptions/Default.html
Comment 10 Glenn Adams 2013-02-21 18:59:08 PST
(In reply to comment #9)
> (In reply to comment #8)
> > I would not agree with a number of your above statements:
> > 
> > * that TTML content needs to be translated into a list of cues (as currently defined)
> > * that TTML content needs to be mapped to HTML or CSS
> > * that it is impossible to implement TTML without these clarifications
> > * that it is impossible to implement TTML compatibly between browsers
> > 
> > It may very well be possible to translate a profile of TTML, e.g., SMPTE-TT or SDP-US, into the current definition of text track cues, and that may be the first step in implementing support for such a profile. But it isn't necessary to do this in order to implement a compliant TTML presentation processor that is able to render a <track/> element that references a TTML resource.
> 
> 
> <track> is a replaced element as listed here: http://www.w3.org/TR/html5/single-page.html#replaced-elements . Replaced elements are mapped into the DOM through mapping to an Object. 
> 
> <track> is mapped to a TextTrack object.
> http://www.w3.org/TR/html5/single-page.html#the-track-element
> 
> A TextTrack object consists of a TextTrackCueList for active and for parsed cues.
> http://www.w3.org/TR/html5/single-page.html#texttrack
> 
> I fail to understand how you can represent TTML in the browser without making use of these constructs that <track> stands for.

Where did I see I wouldn't make use of these constructs?

4.8.10.12.1

"A list of zero or more cues
A list of text track cues, along with rules for updating the text track rendering. For example, for WebVTT, the rules for updating the display of WebVTT text tracks. [WEBVTT]"

4.8.10.12.4

"How a specific format's text track cues are to be interpreted for the purposes of processing by an HTML user agent is defined by that format. In the absence of such a specification, this section provides some constraints within which implementations can attempt to consistently expose such formats."

4.8.10.12.5

"The getCueAsHTML() method must convert the text track cue text to a DocumentFragment for the script's document of the entry script, using the appropriate rules for doing so. For example, for WebVTT, those rules are the WebVTT cue text parsing rules and the WebVTT cue text DOM construction rules. [WEBVTT]"

Nothing here requires populating cues. That is, the set of cues could be exposed or not via the DOM. It doesn't mandate their exposure, and it leaves it up to the specific format to decide what it means to expose its content in the form of cues. Further, it doesn't mandate whether a cue, if present, returns an HTML representation, or if it does, what it means or how it relates to the source track format.

I'm just not seeing a problem here so far. I've agreed that it would be nice and even possible to specify an explicit mapping, and I think that is something worth doing. But failing to do so doesn't prevent supporting TTML and adhering to the currently defined APIs. (Though see the two new bugs I posted earlier [1][2] where too much is assumed about use of WebVTT.)

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=21079
[2] https://www.w3.org/Bugs/Public/show_bug.cgi?id=21080

> > Since IE10 does implement some level of support for TTML, it would be worth learning if they attempt to map to HTML/CSS and TextTrackCues as you suggest.
> 
> I don't have a Windows 7 machine and IE10 at hand to test it, but I am 99% sure that the parsed cues are mapped into a TextTrackCueList both for TTML and WebVTT. Since IE10 only supports simple captions (starttime, endtime, text) and no formatting, all they had to do was to map the <p> elements into cues. That's the simple part of the mapping, but it's a start.
> 
> You can experiment with their implementation here on their example page:
> http://ie.microsoft.com/testdrive/Graphics/VideoCaptions/Default.html

Thanks, I'll look into this more. In any case, I'm not sure I follow your desire to map cue content to HTML. What is the purpose of this? Is it because you are assuming the implementation will take the resulting HTML and format it to present the cue? If so, I think that assumption is implementation specific, and not borne out by the spec. If it is implied by the spec, then I will need to file a bug asking that implication be removed, since it is not semantically necessary.
Comment 11 Glenn Adams 2013-02-21 18:59:57 PST
(In reply to comment #10)
> Where did I see I wouldn't make use of these constructs?

s/see/say/
Comment 12 Silvia Pfeiffer 2013-02-21 19:17:01 PST
(In reply to comment #10)
> (In reply to comment #9)
> > I fail to understand how you can represent TTML in the browser without making use of these constructs that <track> stands for.
> 
> Where did I see I wouldn't make use of these constructs?

When you said:

> > > I would not agree with a number of your above statements:
> > > 
> > > * that TTML content needs to be translated into a list of cues (as currently defined)
> > > * that TTML content needs to be mapped to HTML or CSS
> > > * that it is impossible to implement TTML without these clarifications
> > > * that it is impossible to implement TTML compatibly between browsers


> Nothing here requires populating cues. That is, the set of cues could be exposed or not via the DOM.

Are you suggesting parsing a <track> element into an HTMLTrackElement, but not actually giving it a TextTrack object?

Are you further suggesting to "just render" the text on top of the video without adding it to the shadow DOM? How is this rendering supposed to work? Are you intending to render pixels into the video before the video is displayed?

The way in which <track> is specified in HTML5 is that it introduces content into the browser that a JS developer can manipulate. Therefore it populates all of the objects. This is intentional and not a bug.
Comment 13 Glenn Adams 2013-02-21 19:45:56 PST
(In reply to comment #12)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > I fail to understand how you can represent TTML in the browser without making use of these constructs that <track> stands for.
> > 
> > Where did I see I wouldn't make use of these constructs?
> 
> When you said:
> 
> > > > I would not agree with a number of your above statements:
> > > > 
> > > > * that TTML content needs to be translated into a list of cues (as currently defined)
> > > > * that TTML content needs to be mapped to HTML or CSS
> > > > * that it is impossible to implement TTML without these clarifications
> > > > * that it is impossible to implement TTML compatibly between browsers
> 
> 
> > Nothing here requires populating cues. That is, the set of cues could be exposed or not via the DOM.
> 
> Are you suggesting parsing a <track> element into an HTMLTrackElement, but not actually giving it a TextTrack object?

No. I'm saying that it isn't necessary to populate the cue list.

> 
> Are you further suggesting to "just render" the text on top of the video without adding it to the shadow DOM?

I'm not exactly sure what you are referring to as shadow DOM here, but if you are referring to TextTrack, then there is no reason not to provide a TextTrack object. But individual cues aren't needed in TextTrack to perform formatting of TTML content.

> How is this rendering supposed to work? Are you intending to render pixels into the video before the video is displayed?

I'm assuming there is a reasonable implementation specific way to overlay a graphics layer over a video layer in WK's display pipeline. Since WebVTT is displaying over video, then clearly there is some functionality there to support this.

> The way in which <track> is specified in HTML5 is that it introduces content into the browser that a JS developer can manipulate. Therefore it populates all of the objects. This is intentional and not a bug.

Sure. But there is nothing in the HTML5 spec that says that cues (or what might be considered a "cue") *must* be exposed and can't be implicitly presented or otherwise processed without exposing actual cue objects. At least there is nothing I'm aware of that mandates this. Please correct me if I've missed it.

One could also take a less minimalist approach (to exposing no cues) and populate cue objects that have nothing more than start/end times. My interpretation of UA behavior w.r.t. text tracks is that it is up to the UA and the specific track content type to define one or more mappings, one of which may be the empty set in terms of exposed cue objects.
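
The two exposure options described above can be sketched as follows. TimingCue and expose_cues are hypothetical names for this illustration, and the "none" and "timing" modes are labels invented here, not terms from the HTML5 or TTML specifications. The intervals are assumed to have been computed already from the TTML document's timing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingCue:
    """A cue that carries only its interval; text is deliberately empty."""
    start_time: float
    end_time: float
    text: str = ""


def expose_cues(intervals, mode):
    """Build the cue list a track would expose to script.

    intervals: iterable of (start, end) pairs in seconds.
    mode "none":   expose the empty set of cues.
    mode "timing": expose one cue per distinct interval, times only.
    """
    if mode == "none":
        return []
    if mode == "timing":
        return [TimingCue(start, end) for start, end in sorted(set(intervals))]
    raise ValueError(f"unknown mode: {mode!r}")
```

In either mode the presentation of the TTML content itself would be driven by the UA's own processing of the document rather than by the exposed cue objects.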
Comment 14 Silvia Pfeiffer 2013-02-21 20:15:32 PST
(In reply to comment #13)
> > > Nothing here requires populating cues. That is, the set of cues could be exposed or not via the DOM.

They don't go into the DOM, but into the shadow DOM. They are just represented as objects.


> > Are you suggesting parsing a <track> element into an HTMLTrackElement, but not actually giving it a TextTrack object?
> 
> No. I'm saying that it isn't necessary to populate the cue list.

OK... but a TextTrack object without cues is basically useless to the JS developer: it doesn't fire oncuechange events and it doesn't give access to the parsed cues. Might as well not create a TextTrack object.

> > Are you further suggesting to "just render" the text on top of the video without adding it to the shadow DOM?
> 
> I'm not exactly sure what you are referring to as shadow DOM here, but if you are referring to TextTrack, then there is no reason not to provide a TextTrack object. But individual cues aren't needed in TextTrack to perform formatting of TTML content.

The way it's currently specified, rendering relies on having TextTrackCues.

> > How is this rendering supposed to work? Are you intending to render pixels into the video before the video is displayed?
> 
> I'm assuming there is a reasonable implementation specific way to overlay a graphics layer over a video layer in WK's display pipeline. Since WebVTT is displaying over video, then clearly there is some functionality there to support this.

Try opening up an example video with <track> in Chromium or Chrome and open your inspector. Activate the "Shadow DOM" functionality of the inspector and look at how captions are rendered. They are not rendered into a graphics layer but into the Shadow DOM.


> > The way in which <track> is specified in HTML5 is that it introduces content into the browser that a JS developer can manipulate. Therefore it populates all of the objects. This is intentional and not a bug.
> 
> Sure. But there is nothing in the HTML5 spec that says that cues (or what might be considered a "cue") *must* be exposed and can't be implicitly presented or otherwise processed without exposing actual cue objects. At least there is nothing I'm aware of that mandates this. Please correct me if I've missed it.

I think you've missed it. Try implementing TTML support and you'll certainly come across it.
Comment 15 Silvia Pfeiffer 2013-02-21 20:47:14 PST
(In reply to comment #14)
> > > The way in which <track> is specified in HTML5 is that it introduces content into the browser that a JS developer can manipulate. Therefore it populates all of the objects. This is intentional and not a bug.
> > 
> > Sure. But there is nothing in the HTML5 spec that says that cues (or what might be considered a "cue") *must* be exposed and can't be implicitly presented or otherwise processed without exposing actual cue objects. At least there is nothing I'm aware of that mandates this. Please correct me if I've missed it.
> 
> I think you've missed it. Try implementing TTML support and you'll certainly come across it.

I should have been more precise and pointed you to the part that explains how to render cues:
http://www.w3.org/TR/html5/single-page.html#text-track-model

It states:
A text track consists of:
...
A list of zero or more cues:

* A list of text track cues, along with rules for updating the text track rendering. For example, for WebVTT, the rules for updating the display of WebVTT text tracks. [WEBVTT]

* The list of cues of a text track can change dynamically, either because the text track has not yet been loaded or is still loading, or due to DOM manipulation.

Without a list of text track cues and rules for updating the text track rendering there is no display of cues.
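
To make the model quoted above concrete, here is a minimal sketch in TypeScript. All names (Cue, Track, RenderingRules, activeCues, render) are hypothetical and chosen for illustration; this is not the HTML5 IDL and not WebKit code. It only shows that display output is a function of both the cue list and the rendering rules, so removing either one leaves nothing to display.

```typescript
// Hypothetical types for illustration; not the HTML5 TextTrack IDL.
interface Cue {
  start: number; // cue start time in seconds
  end: number;   // cue end time in seconds
  text: string;
}

// Stand-in for the "rules for updating the text track rendering":
// given the currently active cues, return the lines to display.
type RenderingRules = (active: Cue[]) => string[];

interface Track {
  cues: Cue[];            // list of zero or more cues
  rules?: RenderingRules; // format-specific rules (assumed optional here)
}

// Cues whose time interval contains the current playback time.
function activeCues(track: Track, time: number): Cue[] {
  return track.cues.filter(c => c.start <= time && time < c.end);
}

// Without rules, or without any active cues, nothing is displayed.
function render(track: Track, time: number): string[] {
  if (!track.rules) return [];
  return track.rules(activeCues(track, time));
}

// A trivial rule set that shows each active cue's text as one line.
const plainRules: RenderingRules = active => active.map(c => c.text);

const track: Track = {
  cues: [
    { start: 0, end: 2, text: "Hello" },
    { start: 1, end: 3, text: "World" },
  ],
  rules: plainRules,
};

console.log(render(track, 1.5));
console.log(render({ cues: track.cues }, 1.5));
```

In this sketch the first call yields both cue texts, while the second (same cues, no rules) yields an empty list.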
Comment 16 Glenn Adams 2013-02-21 21:01:40 PST
(In reply to comment #14)
> (In reply to comment #13)
> > > Are you further suggesting to "just render" the text on top of the video without adding it to the shadow DOM?
> > 
> > I'm not exactly sure what you are referring to as shadow DOM here, but if you are referring to TextTrack, then there is no reason not to provide a TextTrack object. But individual cues aren't needed in TextTrack to  perform formatting of TTML content.
> 
> The way it's currently specified, rendering relies on having TextTrackCues.

I don't see that. Please point out where the spec mandates that interpretation. I can see how some implementation may depend on this, e.g., the WK support for VTT, but it is certainly not a necessary implementation.

> 
> > > How is this rendering supposed to work? Are you intending to render pixels into the video before the video is displayed?
> > 
> > I'm assuming there is a reasonable implementation specific way to overlay a graphics layer over a video layer in WK's display pipeline. Since WebVTT is displaying over video, then clearly there is some functionality there to support this.
> 
> Try opening up an example video with <track> in Chromium or Chrome and open your inspector. Activate the "Shadow DOM" functionality of inspector and look at how captions are rendered. They are not rendered into a graphics layer but into the Shadow DOM.

That's just a possible implementation approach, and not mandated by the spec. If it is mandated by the spec, please tell me where.

> > > The way in which <track> is specified in HTML5 is that it introduces content into the browser that a JS developer can manipulate. Therefore it populates all of the objects. This is intentional and not a bug.

Sure, and that's great. But it is not a logical consequence of the current spec language. Further, exposing cues to JS to permit events or JS manipulation of cues is not a technical requirement to be able to process and present track content. It is simply the approach that has been taken for VTT.

Don't get me wrong, I'm not objecting to doing this, and I'd like to see a mapping of TTML into exposable cue objects. I'm only saying that I don't see it as necessary, either for providing TTML rendering or HTML5 spec compliance.

> > 
> > Sure. But there is nothing in the HTML5 spec that says that cues (or what might be considered a "cue") *must* be exposed and can't be implicitly presented or otherwise processed without exposing actual cue objects. At least there is nothing I'm aware of that mandates this. Please correct me if I've missed it.
> 
> I think you've missed it. Try implementing TTML support and you'll certainly come across it.

If I missed it (in the spec), then please point it out. If I move forward with implementing TTML in WK, then I will find out if it is possible to do as I suggest or not. I'm confident that TTML presentation processing can be readily accomplished without relying on the use of text cue objects, shadow dom objects, or translating to HTML/CSS. If I'm wrong, I'll be the first to admit it. At the same time, I will do what I can to define a reasonable mapping to cue objects and HTML/CSS fragments to permit JS exposure. I think both goals are reasonable and independent. Given the upcoming first meeting of the TT Task Force in the Web & TV IG, it might be useful to suggest they take up the task of defining this mapping. I trust that if a mapping is drafted, then you will be able to provide a valuable review of it.
Comment 17 Silvia Pfeiffer 2013-02-21 23:05:05 PST
I think your only alternative is to render TTML as an overlay graphic on the video. Is that your plan for this bug?

I'd take a look at a mapping specification when you've created it. However, I personally don't think we should proliferate more file formats for <track> on the Web, because it just causes support issues. I am very wary of the vast complexity of the TTML ecosystem with now at least 5 different and partially non-compatible formats and more forthcoming. It's a support nightmare in the making.
Comment 18 Maciej Stachowiak 2013-02-22 00:47:18 PST
(In reply to comment #4)
> (In reply to comment #3)
>> > 
> > There are many possible interpretations of the FCC rules. Some say that it's not necessary to specifically implement SMPTE-TT, the safe harbor format, as long as a format with all the mandated capabilities is supported. I have yet to hear an interpretation it's required to implement a subset of TTML that's not the same profile as SMPTE-TT.
> 
> The real question is whether video service providers will or would like to reference TTML content directly as a delivery format. We have some preliminary input that they will. We also have a variety of downstream ports of WK that are deploying on devices covered by FCC rules that relate to captions. I agree that we don't have the final word on this subject. But I think absence of finality on this matter is not a reasonable objection to exclude TTML from WK. The WK community has added many other features that have less finality or concrete requirements.

I'm sure they'd like all sorts of things. That's not enough reason to add a duplicative feature to WebKit.

I think adding features primarily for specialized downstream ports is also a bad idea. Adding support for WML turned out to be a regrettable decision and we should have rejected it on the basis that it's not valuable to mainstream ports.

> 
> In any case, I would be willing to claim that an implementation of all of TTML is at least one or two orders of magnitude simpler than SVG or HTML5, or even some of the more complex, recent additions to CSS.
> 
> What would be useful is an implementation of a subset of TTML that works in WK against which objective judgments about complexity, etc,  can be made. At present, the arguments are rather speculative in nature, wouldn't you agree?

I don't think there's any reason to natively support another caption format for out-of-band captions. If we do, then supporting an arbitrary subset of a spec rather than something actually defined by spec seems like a terrible idea for interoperability. So yes, I would object.
Comment 19 Glenn Adams 2013-02-22 05:05:50 PST
(In reply to comment #17)
> I think your only alternative is to render TTML as an overlay graphic on the video. Is that your plan for this bug?

Yes.
 
> I'd take a look at a mapping specification when you've created it. However, I personally don't think we should proliferate more file formats for <track> on the Web, because it just causes support issues. I am very wary of the vast complexity of the TTML ecosystem with now at least 5 different and partially non-compatible formats and more forthcoming. It's a support nightmare in the making.

Well, one could view WebVTT as a proliferation of file formats. One could view WebM as a proliferation of file formats. I find this argument about file formats unrealistic. They are there. Why do we have XHR and WebSockets and WebRTC and Server-Sent Events? WebVTT has no more claim to legitimacy in the Web than TTML, and probably less since it isn't even a published standard.

The issue of different profiles of TTML is also a non-issue. There are well defined ways in TTML to manage feature spaces and required features in an implementation. The same cannot be said for WebVTT. You are already promoting extensions to WebVTT that will eventually create different sets of implemented features. How will you manage them? TTML has an answer to this.

I really don't want to fall into a side by side comparison of WebVTT and TTML. It is not a productive use of time. There are legitimate reasons to use both and they are different reasons.
Comment 20 Glenn Adams 2013-02-22 05:37:17 PST
(In reply to comment #18)
> (In reply to comment #4)
> > (In reply to comment #3)
> >> > 
> > > There are many possible interpretations of the FCC rules. Some say that it's not necessary to specifically implement SMPTE-TT, the safe harbor format, as long as a format with all the mandated capabilities is supported. I have yet to hear an interpretation it's required to implement a subset of TTML that's not the same profile as SMPTE-TT.
> > 
> > The real question is whether video service providers will or would like to reference TTML content directly as a delivery format. We have some preliminary input that they will. We also have a variety of downstream ports of WK that are deploying on devices covered by FCC rules that relate to captions. I agree that we don't have the final word on this subject. But I think absence of finality on this matter is not a reasonable objection to exclude TTML from WK. The WK community has added many other features that have less finality or concrete requirements.
> 
> I'm sure they'd like all sorts of things. That's not enough reason to add a duplicative feature to WebKit.

It is not a duplicative feature.

> 
> I think adding features primarily for specialized downstream ports is also a bad idea. Adding support for WML turned out to be a regrettable decision and we should have rejected it on the basis that it's not valuable to mainstream ports.

First, I did not say that was the primary reason. In any case, downstream ports should have as much standing as upstream ports with respect to feature addition. This is an open community intended to address many needs. All ports will find the addition of TTML support a useful feature, and can make their own decision about enabling.

> > In any case, I would be willing to claim that an implementation of all of TTML is at least one or two orders of magnitude simpler than SVG or HTML5, or even some of the more complex, recent additions to CSS.
> > 
> > What would be useful is an implementation of a subset of TTML that works in WK against which objective judgments about complexity, etc,  can be made. At present, the arguments are rather speculative in nature, wouldn't you agree?
> 
> I don't think there's any reason to natively support another caption format for out-of-band captions. If we do, then supporting an arbitrary subset of a spec rather than something actually defined by spec seems like a terrible idea for interoperability. So yes, I would object.

Who said anything about implementing an arbitrary subset? The intention is to implement at least one or more of the well defined profiles of TTML, such as SMPTE-TT and SDP-US. These profiles are well defined by published standards.

It sounds like you are objecting to the notion of incremental implementation of a feature in WK. From my observation, the implementation of features in WK doesn't happen in a binary fashion. Instead, new features are built behind an ENABLE flag and are filled out functionally and tested up until the time that the feature can be announced as fully implemented. Take CSS, for example: many extensions to CSS have been added to WK. What is the bar for deciding to add an implementation of one of these extensions? It seems rather arbitrary. A number of CSS extensions have been implemented upon the early availability of an editorial draft that is in a high state of flux. In the case of TTML, we have a published W3C REC and a number of profiles already published in a final form.

I just counted 65 features that are disabled by default in build-webkit. What were the criteria for determining if they were worthy of inclusion or not in WK? The OWP (and WK) has many features that can be considered duplicative at some level, but not identical. For example, XHR and WebSockets. Should one of these be rejected because the other can provide similar or even identical functionality (in some cases)? Perhaps. But I would consider a call for rejection of one of these in favor of the other as exclusionary and probably short sighted. TTML and WebVTT have different uses in the industry of Web content, and will never have the same feature set or user communities. Sure, in some cases one or the other may be sufficient, but in other cases this won't be so. Without trying to play one of these off against the other, I would suggest they both have a legitimate role in WK.
Comment 21 Silvia Pfeiffer 2013-02-23 16:01:29 PST
(In reply to comment #19)
> Well, one could view WebVTT as a proliferation of file formats.

WebVTT is not exclusively a caption format - it is a custom text track file format developed for the Web to support captions, subtitles, video descriptions (in text format), chapters, and metadata.

TTML's spec says: "It is intended to be used for the purpose of transcoding or exchanging timed text information among legacy distribution content formats presently in use for subtitling and captioning functions."

> WebVTT has no more claim to legitimacy in the Web than TTML, and probably less since it isn't even a published standard.

Neither is HTML5 yet. Other specs, such as RSS, never turned into "published standards" and are still de facto standards. The maturity of a spec is not defined by its process status.

> The issue of different profiles of TTML is also a non-issue. There are well defined ways in TTML to manage feature spaces and required features in an implementation. The same cannot be said for WebVTT. You are already promoting extensions to WebVTT that will eventually create different sets of implemented features. How will you manage them? TTML has an answer to this.

Browsers prefer not to even differentiate between different features and just parse what they are given. That's what WebVTT (including the region extension) will be. We could make profiles for authors, but they are not relevant to browsers.

> I really don't want to fall into a side by side comparison of WebVTT and TTML. It is not a productive use of time. There are legitimate reasons to use both and they are different reasons.

Agreed, in particular since this bug is not about WebVTT.
Comment 22 Ian 'Hixie' Hickson 2013-02-25 10:36:08 PST
For the record, both WebVTT and HTML are "published standards".
Comment 23 Glenn Adams 2013-02-25 11:27:51 PST
(In reply to comment #21)
> TTML's spec says: "It is intended to be used for the purpose of transcoding or exchanging timed text information among legacy distribution content formats presently in use for subtitling and captioning functions."

Just so there's no misunderstanding, TTML as published is definitely intended to be used as a distribution format. Indeed, it was previously named DFXP (Distribution Format Exchange Profile) to distinguish it from a more ambitious, but never fully specified AFXP (Authoring Format Exchange Profile).

The paragraph just following the one you cite above states:

"In addition to being used for interchange among legacy distribution content formats, TTML content may be used directly as a distribution format..."

The System Model Diagram [1] makes it clear that DFXP (TTML) is intended for direct distribution.

[1] http://www.w3.org/TR/2010/REC-ttaf1-dfxp-20101118/#model-graphic

Historically, the TTWG started defining a more ambitious authoring format (AFXP), but eventually chose to focus on those features that gave priority to direct distribution, a process which eventually gave birth to DFXP, later relabeled as TTML.

It is certainly true, however, that TTML and WebVTT have different characteristics and efficiencies for use in direct distribution. For example, a full streaming model for TTML was never formalized, only suggested in Annex L [2]. Some of the more recently defined profiles of TTML, e.g., SDP-US, have added constraints that facilitate streaming use cases.

[2] http://www.w3.org/TR/2010/REC-ttaf1-dfxp-20101118/#streaming
Comment 24 Silvia Pfeiffer 2013-02-25 14:50:15 PST
(In reply to comment #23)
> (In reply to comment #21)
> > TTML's spec says: "It is intended to be used for the purpose of transcoding or exchanging timed text information among legacy distribution content formats presently in use for subtitling and captioning functions."
> 
> Just so there's no misunderstanding, TTML as published is definitely intended to be used as a distribution format. Indeed, it was previously named DFXP (Distribution Format Exchange Profile) to distinguish it from a more ambitious, but never fully specified AFXP (Authoring Format Exchange Profile).


I have no issues with this statement. TTML was built with two aims:
* as a distribution format ("distribution" in contrast to "rendering")
* with a focus on captioning and subtitling

That's all I wanted to point out.
Comment 25 Glenn Adams 2013-02-25 14:57:33 PST
(In reply to comment #24)
> (In reply to comment #23)
> > (In reply to comment #21)
> > > TTML's spec says: "It is intended to be used for the purpose of transcoding or exchanging timed text information among legacy distribution content formats presently in use for subtitling and captioning functions."
> > 
> > Just so there's no misunderstanding, TTML as published is definitely intended to be used as a distribution format. Indeed, it was previously named DFXP (Distribution Format Exchange Profile) to distinguish it from a more ambitious, but never fully specified AFXP (Authoring Format Exchange Profile).
> 
> 
> I have no issues with this statement. TTML was built with two aims:
> * as a distribution format ("distribution" in contrast to "rendering")
> * with a focus on captioning and subtitling
> 
> That's all I wanted to point out.

TTML defines two types of content processors: a transformation processor and a presentation processor. TTML authors indicate (within the document) which of these must be supported and used by the receiving entity. Therefore, the only two documented use cases for TTML upon distributing a document are to transform it or to render it. Most of the profiles in use today, such as SMPTE-TT, EBU-TT, and SDP-US, mandate support for presentation processing.

So it would be incorrect to imply that Distribution of TTML does not include rendering as a potential use case. Indeed, most existing uses of TTML today specify the use of a presentation processor.
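
As a rough illustration of how a receiver might act on that in-document indication, here is a minimal TypeScript sketch. It assumes the three standard TTML 1.0 profile designators (dfxp-transformation, dfxp-presentation, dfxp-full) under the profile namespace http://www.w3.org/ns/ttml/profile/, and it assumes dfxp-full implies both processor types; please check these against the REC before relying on them. The function name requiredProcessors and the type ProcessorKind are hypothetical.

```typescript
// Assumed TTML 1.0 profile namespace; verify against the REC.
const PROFILE_BASE = "http://www.w3.org/ns/ttml/profile/";

type ProcessorKind = "transformation" | "presentation";

// Hypothetical helper: map a profile designator (absolute, or relative to
// PROFILE_BASE) to the processor types a receiver would need to support.
// Returns undefined for designators this sketch does not recognize
// (for example, profiles defined by SMPTE-TT or SDP-US).
function requiredProcessors(designator: string): ProcessorKind[] | undefined {
  const name = designator.indexOf(PROFILE_BASE) === 0
    ? designator.slice(PROFILE_BASE.length)
    : designator;
  switch (name) {
    case "dfxp-transformation":
      return ["transformation"];
    case "dfxp-presentation":
      return ["presentation"];
    case "dfxp-full":
      // Assumption: the full profile covers both processor types.
      return ["transformation", "presentation"];
    default:
      return undefined;
  }
}

console.log(requiredProcessors(PROFILE_BASE + "dfxp-presentation"));
console.log(requiredProcessors("dfxp-full"));
```

A receiver that only implements presentation processing could use such a check to decide whether it can honor a given document or must reject it.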
Comment 26 Pierre Lemieux 2013-02-25 16:07:47 PST
(In reply to comment #24)
> (In reply to comment #23)
> > (In reply to comment #21)
> > > TTML's spec says: "It is intended to be used for the purpose of transcoding or exchanging timed text information among legacy distribution content formats presently in use for subtitling and captioning functions."
> > 
> > Just so there's no misunderstanding, TTML as published is definitely intended to be used as a distribution format. Indeed, it was previously named DFXP (Distribution Format Exchange Profile) to distinguish it from a more ambitious, but never fully specified AFXP (Authoring Format Exchange Profile).
> 
> 
> I have no issues with this statement. TTML was built with two aims:
> * as a distribution format ("distribution" in contrast to "rendering")
> * with a focus on captioning and subtitling
> 
> That's all I wanted to point out.

As pointed out earlier, there are many TTML-based applications worldwide, including EBU-TT [1], SMPTE-TT [2], SDP-US [3] and CFF-TT [4]. These applications do not preclude the distribution of TTML documents to client devices for rendering and, in fact, include the use case.

Why then introduce friction for TTML authors and end-users by actively discouraging the implementation of TTML rendering in WebKit, which is used in client devices?

[1] http://tech.ebu.ch/docs/tech/tech3350.pdf?vers=1.0
[2] https://www.smpte.org/sites/default/files/st2052-1-2010.pdf
[3] http://www.w3.org/TR/ttml10-sdp-us/
[4] http://www.uvvu.com/techspec-archive.php
Comment 27 Silvia Pfeiffer 2013-02-25 18:57:05 PST
(In reply to comment #26)
> 
> As pointed out earlier, there are many TTML-based applications worldwide, including EBU-TT [1], SMPTE-TT [2], SDP-US [3] and CFF-TT [4].

These are, in fact, different specs and not different applications.


> Why then introduce friction for TTML authors and end-users by actively discouraging the implementation of TTML rendering in WebKit, which is used in client devices?

See comment #17
Comment 28 Pierre Lemieux 2013-02-25 21:07:47 PST
(In reply to comment #27)
> (In reply to comment #26)
> > 
> > As pointed out earlier, there are many TTML-based applications worldwide, including EBU-TT [1], SMPTE-TT [2], SDP-US [3] and CFF-TT [4].
> 
> These are, in fact, different specs and not different applications.

In fact, they are distinct profiles of the same core specification (TTML). As such, they share a common document syntax and structure, timing model, and layout and rendering model. SMPTE-TT is a minor extension of TTML (it adds support for sub-picture), and CFF-TT and SDP-US are each essentially a subset of SMPTE-TT.

Let me know if your understanding is different. Happy to dig deeper and provide additional information.

> > Why then introduce friction for TTML authors and end-users by actively discouraging the implementation of TTML rendering in WebKit, which is used in client devices?
> 
> See comment #17

Yes. I am sympathetic with the goal of minimizing proliferation of formats (or versions of the same format!) on the web, or everywhere for that matter.

In the specific case of TTML, however, I see no evidence that preventing the implementation of TTML rendering in WebKit will prevent the use of TTML on the Internet -- based on my observations, the FCC safe-harbor provision has a strong influence in the entertainment media community.

In fact, I see strong evidence that preventing the implementation of TTML rendering in WebKit will frustrate and confuse content authors and end-users alike.
Comment 29 Anne van Kesteren 2023-05-09 05:58:38 PDT
Closing per prior comments from Maciej and lack of activity.