This is a parent bug for all patches related to adding a new input tag type for speech recognition. This will render like a button element (with an embedded status indicator) in the page for the user to start/stop speech recognition. After speech recognition, the element's onchange handler is fired with the recognized text as the event's value (see the usage sketch at the end of this comment). We discussed this proposal with some browser vendors earlier and received good feedback, so we'd like to move towards implementing it as a conditionally compiled feature in WebKit (off by default) and get more web developer input before making a formal proposal to the W3C.

The speech input element itself will appear as a clickable push button with an embedded status indicator/icon. The embedded status indicator/icon will be themable, and UAs/platforms can style it to match their current themes.

Backwards compatibility:
1. UAs which don't recognize this new input type will render it as a text input element, and any speech-specific API calls made from JavaScript code will throw an exception due to missing properties/methods.
2. Once the initial implementation is ready, we intend to enable this API in Chrome behind a run-time flag, which will let web developers turn on the feature on their own machines and experiment with it to give useful feedback.

We intend to add this API to WebKit as a series of small steps:
1. Add a bare-bones 'speech' type to the existing HTML input element in WebKit and associated rendering code to render it like a push button. This gives a properly rendering control with no associated actions.
   a. Add speech input element styles to the UA style sheet.
   b. Make HTMLInputElement recognize 'speech' as a valid input type.
   c. Add a new renderer based on RenderButton to draw the speech input element.
   d. Add platform- and UA-specific themed rendering (RenderThemeXxxxx files) for the embedded status indicator.
2. Add a speech service to talk to the UA in a platform-specific manner, modeled after an existing service such as Geolocation.
   a. A new SpeechService and SpeechServiceClient under WebCore/platform.
   b. A Chromium-specific extension of the above under WebCore/platform/chromium.
   c. A Chromium-specific bridge class under WebKit/chromium to let multiple speech input elements in a single page talk to the same provider in the UA.
3. Hook up the speech input element with the speech service and the UA.
   a. Handle the click event and fire onchange in the speech control renderer.
   b. Render the various states of the speech control via the theme layer.
4. Implement UA-specific code (outside WebKit) for handling speech recognition.

An informal spec of the new API, along with some sample apps and use cases, can be found at http://docs.google.com/Doc?docid=0AaYxrITemjbxZGNmZzc5cHpfM2Ryajc5Zmhx&hl=en.
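For illustration, a minimal sketch of how a page might use the proposed control, assuming the draft's type name and the onchange behavior described above (none of this is final API):

  <!-- Hypothetical markup: 'type=speech' is the draft name and may change.
       UAs without support fall back to rendering a plain text input. -->
  <input type="speech" id="mic" onchange="onSpeech(event)">
  <script>
    function onSpeech(event) {
      // Per the draft, the recognized text is available as the element's
      // value when onchange fires after recognition completes.
      alert('You said: ' + event.target.value);
    }
  </script>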
We got a lot of great feedback on the speech input API proposal, and the API spec has been updated to take it into account (available at https://docs.google.com/View?id=dcfg79pz_5dhnp23f5). Whereas the earlier proposal was to add a new <input type="speech"> control, the new API spec adds a @speech attribute to most input elements for enabling speech input. The input element will allow users to start/stop speech recognition (perhaps via a button-like control as part of the element), and the recognized text will be inserted as the element's value (see the markup sketch at the end of this comment).

Backwards compatibility:
1. Web developers will use the new @speech attribute to explicitly enable individual form input fields for speech input. UAs which don't recognize this new attribute will render the input element in its normal form.
2. Once the initial implementation is ready, we intend to enable this API in Chrome behind a run-time flag, which will let web developers turn on the feature on their own machines and experiment with it to give useful feedback.

Relevant discussions can be found in the WHATWG lists:
- http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-May/026338.html
- http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2010-June/026747.html

We intend to add this API to WebKit as a series of small steps:
1. Add bare-bones @speech attribute support to the existing HTML text input element in WebKit and associated rendering code. Input elements with @speech set will show a clickable button within the text field area, similar to the clear button that appears in <input type='search'> elements.
   a. Add speech input button styles to the UA style sheet and CSS parsing code.
   b. Recognize @speech as a valid attribute.
   c. RenderTextControlSingleLine will check if @speech is set and, if so, create the speech input button in the same fashion as the search field's cancel button.
   d. Add platform- and UA-specific themed rendering (RenderThemeXxxxx files) for the above speech input button.
2. Add a speech service to talk to the UA in a platform-specific manner, modeled after an existing service such as Geolocation.
   a. A new SpeechService and SpeechServiceClient under WebCore/platform.
   b. A Chromium-specific extension of the above under WebCore/platform/chromium.
   c. A Chromium-specific bridge class under WebKit/chromium to let multiple speech input elements in a single page talk to the same provider in the UA.
3. Hook up the speech input element with the speech service and the UA.
   a. Handle the click event and fire onchange in the speech control renderer.
   b. Render the various states of the speech control via the theme layer.
4. Implement UA-specific code (outside WebKit) for handling speech recognition.
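A minimal sketch of the revised API, assuming the attribute keeps the draft name 'speech' (the exact name and event details may change before a formal proposal):

  <!-- Hypothetical markup: a speech-enabled text field. UAs that don't
       recognize the 'speech' attribute render an ordinary text input. -->
  <form>
    <input type="text" name="q" speech
           onchange="console.log('Recognized: ' + this.value)">
  </form>

Because the recognized text is simply inserted as the element's value, existing form submission and validation logic should work unchanged.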
Why require the speech="" attribute? It seems like this would be most helpful for users if it didn't require authors to do anything. That way it would already work on the billions of Web pages out there today.
Our original idea was to allow speech input in all editable fields, but all the feedback we received suggested that use case is better served by a general-purpose speech IME, either as part of the browser or the OS. So we refined the proposal to aim only at web apps which have explicit speech input needs (e.g. requiring all recognition hypotheses/results rather than just the best match as in the IME case, being notified when speech recognition results are available, controlling the grammar used, and so on); a sketch of accessing such results follows this comment. While the current proposal is to add a 'speech' attribute to the input element, it could easily be adapted to either a new speech input tag (e.g. <asr>) or a non-text-field tag (e.g. <button>). A W3C Incubator Group for HTML Speech was recently formed (www.w3.org/2005/Incubator/htmlspeech/) and we are discussing this proposal there. We intend to update the implementation in WebKit as we get closer to a formal proposal.
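To illustrate the "more than just the best match" need, a hypothetical sketch of how a page might inspect the full n-best list; the 'results' property and its 'utterance'/'confidence' fields are illustrative names only, not settled API:

  <input type="text" speech onchange="showHypotheses(this)">
  <script>
    function showHypotheses(input) {
      // Illustrative only: assume the element exposes the recognizer's
      // n-best list after recognition completes.
      var results = input.results || [];
      for (var i = 0; i < results.length; i++) {
        console.log(results[i].utterance + ' (confidence ' +
                    results[i].confidence + ')');
      }
    }
  </script>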