Bug 39485 - Speech for HTML input elements.
Summary: Speech for HTML input elements.
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Forms
Version: 528+ (Nightly build)
Hardware: All
OS: All
Importance: P2 Normal
Assignee: Hans Wennborg
URL: https://docs.google.com/View?id=dcfg7...
Keywords:
Depends on: 63162 39487 40776 40878 40925 40984 41518 42047 42367 42483 42603 43008 43146 43240 43261 43352 43425 43477 43563 43857 43922 44421 44427 45108 45181 45288 46457 46799 46873 47089 47127 48068 48339 48426 49736 50077 52325 52718 58577 59208 59613 59687 59874 60451 64625 65333
Blocks:
Reported: 2010-05-21 07:15 PDT by Satish Sampath
Modified: 2012-11-02 12:49 PDT (History)
11 users

Description Satish Sampath 2010-05-21 07:15:38 PDT
This is the parent bug for all patches related to adding a new input tag type for speech recognition. The element will render like a button (with an embedded status indicator) in the page, letting the user start/stop speech recognition. After speech recognition completes, the element's onchange handler fires with the recognized text as the event's value.

We discussed this proposal with some browser vendors earlier and received good feedback, so we'd like to move toward implementing it as a conditionally compiled feature in WebKit (off by default) and get more web-developer input before making a formal proposal to the W3C.

The speech input element itself will appear as a clickable push button with an embedded status indicator/icon. The indicator/icon will be themeable, so UAs/platforms can style it to match their current themes.

Backwards compatibility:
1. UAs which don't recognize this new input type will render it as a text input element, and any speech-specific API calls made from JavaScript code will throw an exception due to the missing properties/methods.
2. Once the initial implementation is ready, we intend to enable this API in Chrome behind a run-time flag, which will let web developers enable the feature on their own machines and experiment with it to give useful feedback.
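As a concrete illustration of the behaviour described above, a page using the proposed control might look like the following. This is a hypothetical sketch based on the informal spec: the 'speech' input type and the change-event behaviour are proposals, not a shipped API, and in a UA without support the element degrades to a plain text field.

```html
<!-- Hypothetical usage of the proposed control. In an unsupporting UA,
     type="speech" falls back to type="text" per the backwards-compat notes. -->
<input type="speech" id="query">
<script>
  document.getElementById('query').onchange = function (event) {
    // After recognition completes, the recognized text is available
    // as the element's value.
    console.log('Recognized: ' + event.target.value);
  };
</script>
```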

We intend to add this API to WebKit as a series of small steps:

1. Add a bare-bones 'speech' type to the existing HTML input element in WebKit and associated
    rendering code to render it like a push button. This gives a properly rendering control with
    no associated actions.
    a. Add speech input element styles to the UA style sheet.
    b. Make HTMLInputElement recognize 'speech' as a valid input type.
    c. Add a new renderer based on RenderButton to draw the speech input element.
    d. Add platform- and UA-specific themed rendering (RenderThemeXxxxx files) for the
       embedded status indicator.
2. Add a speech service to talk to the UA in a platform-specific manner, modeled after
    an existing service such as Geolocation.
    a. Add a new SpeechService and SpeechServiceClient under WebCore/platform.
    b. Add a Chromium-specific extension of the above under WebCore/platform/chromium.
    c. Add a Chromium-specific bridge class under WebKit/chromium to let multiple speech
        input elements in a single page talk to the same provider in the UA.
3. Hook up the speech input element with the speech service and the UA.
    a. Handle the click event and fire onchange in the speech control renderer.
    b. Render the various states of the speech control via the theme layer.
4. Implement UA-specific code (outside WebKit) for handling speech recognition.
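The service layer in step 2 can be sketched roughly as follows. This is an illustrative C++ sketch of the client/service split described above, loosely modeled on how a service like Geolocation is structured; the class names SpeechService and SpeechServiceClient come from the plan, but every method name and signature here is an assumption, not WebKit's actual API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Receives results from the platform recognizer (stand-in for the
// speech input element / its renderer).
class SpeechServiceClient {
public:
    virtual ~SpeechServiceClient() = default;
    virtual void didCompleteRecognition(const std::string& recognizedText) = 0;
};

// Platform-neutral service. A port (e.g. Chromium) would subclass this and
// forward startRecognition() to the UA's recognizer, then deliver the result
// back through the client interface.
class SpeechService {
public:
    explicit SpeechService(SpeechServiceClient* client) : m_client(client) {}
    virtual ~SpeechService() = default;

    virtual void startRecognition() { m_recognizing = true; }
    virtual void cancelRecognition() { m_recognizing = false; }

protected:
    // Called by the port when the platform recognizer finishes.
    void deliverResult(const std::string& text) {
        m_recognizing = false;
        if (m_client)
            m_client->didCompleteRecognition(text);
    }

    SpeechServiceClient* m_client;
    bool m_recognizing = false;
};

// Mock "port" used here only to exercise the interfaces.
class MockSpeechService : public SpeechService {
public:
    using SpeechService::SpeechService;
    void simulatePlatformResult(const std::string& text) { deliverResult(text); }
};

// Client that records results, standing in for the input element.
class RecordingClient : public SpeechServiceClient {
public:
    void didCompleteRecognition(const std::string& text) override {
        results.push_back(text);
    }
    std::vector<std::string> results;
};
```

The key design point, as with Geolocation, is that WebCore only sees the abstract pair; all platform-specific recognition lives behind the port's subclass.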

An informal spec of the new API, along with some sample apps and use cases, can be found at http://docs.google.com/Doc?docid=0AaYxrITemjbxZGNmZzc5cHpfM2Ryajc5Zmhx&hl=en.
Comment 1 Satish Sampath 2010-06-17 06:40:34 PDT
We got a lot of great feedback on the speech input API proposal, and the API spec has been updated to take it into account (available at https://docs.google.com/View?id=dcfg79pz_5dhnp23f5). Whereas the earlier proposal added a new <input type="speech"> control, the new API spec adds an @speech attribute to most input elements to enable speech input. The input element will let users start/stop speech recognition (perhaps via a button-like control that is part of the element), and the recognized text will be inserted as the element's value.

Backwards compatibility:
1. Web developers will use a new '@speech' attribute to explicitly enable individual
   form input fields for speech input. UAs which don't recognize this new attribute
   will render the input element in its normal form.
2. Once the initial implementation is ready, we intend to enable this API in Chrome
   behind a run-time flag, which will let web developers enable the feature on their
   own machines and experiment with it to give useful feedback.
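The attribute-based form described above might be used as follows. This is an illustrative sketch: the 'speech' attribute is the proposed (not shipped) API, and per the backwards-compatibility notes, UAs that don't recognize the attribute simply render ordinary fields.

```html
<!-- Hypothetical markup for the revised proposal: a boolean "speech"
     attribute on ordinary input elements, rather than a new input type. -->
<input type="text" name="q" speech>
<input type="search" name="find" speech>
```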

Relevant discussions can be found on the WHATWG lists:
  - http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-May/026338.html
  - http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2010-June/026747.html

We intend to add this API to WebKit as a series of small steps:

1. Add bare-bones '@speech' attribute support to the existing HTML text input element in WebKit,
    and associated rendering code. Input elements with '@speech' set will show a clickable button
    within the text field area, similar to the clear button that appears in <input type='search'>
    elements.
    a. Add speech input button styles to the UA style sheet and CSS parsing code.
    b. Recognize '@speech' as a valid attribute.
    c. RenderTextControlSingleLine will check if @speech is set and, if so, create the speech
       input button in the same fashion as the search field's cancel button.
    d. Add platform- and UA-specific themed rendering (RenderThemeXxxxx files) for the above
       speech input button.
2. Add a speech service to talk to the UA in a platform-specific manner, modeled after
    an existing service such as Geolocation.
    a. Add a new SpeechService and SpeechServiceClient under WebCore/platform.
    b. Add a Chromium-specific extension of the above under WebCore/platform/chromium.
    c. Add a Chromium-specific bridge class under WebKit/chromium to let multiple speech
        input elements in a single page talk to the same provider in the UA.
3. Hook up the speech input element with the speech service and the UA.
    a. Handle the click event and fire onchange in the speech control renderer.
    b. Render the various states of the speech control via the theme layer.
4. Implement UA-specific code (outside WebKit) for handling speech recognition.
Comment 2 Ian 'Hixie' Hickson 2010-09-09 23:26:41 PDT
Why require the speech="" attribute? It seems like this would be most helpful for users if it didn't require authors to do anything. That way it would already work on the billions of Web pages out there today.
Comment 3 Satish Sampath 2010-09-10 02:42:21 PDT
Our original idea was to allow speech input in all editable fields, but from the feedback received it looked like that use case is better served by a general-purpose speech IME, either as part of the browser or the OS. So we refined the proposal to aim only at web apps which have explicit speech input needs (e.g. requiring all recognition hypotheses/results rather than just the best match as in the IME case, requiring notification when speech recognition results are available, controlling the grammar used, and so on). While the current proposal is to add a 'speech' attribute to the input element, it could easily be adapted to either a new speech input tag (e.g. <asr>) or a non-text field tag (e.g. <button>).

A W3C Incubator Group for HTML Speech was recently formed (www.w3.org/2005/Incubator/htmlspeech/), and we are discussing this proposal there. We intend to update the implementation in WebKit as we get closer to a formal proposal.