Bug 60170

Summary:

[Chromium] Speech-enabled input fields need the ability to trigger without a click

Product:

WebKit

Reporter:

Dominic Mazzoni <dmazzoni>

Component:

New Bugs

Assignee:

Nobody <webkit-unassigned>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

dglazkov, fishd, gustavo.noronha, gustavo, satish, tonyg, webkit.review.bot, xan.lopez

Priority:

Version:

528+ (Nightly build)

Hardware:

OS:

OS X 10.5

Attachments:

Description	Flags
Patch for initial feedback, still has debugging printfs - is the approach good?	webkit.review.bot: commit-queue-
Updated patch with layout test, ready for review.	none
Fixed style error.	fishd: review-
Rename methods to be more clear.	gustavo.noronha: commit-queue-
Fix gtk & qt errors	none
Patch	none
Patch for landing	none
Patch	none

Dominic Mazzoni

Reported 2011-05-04 08:23:43 PDT

It needs to be possible to start speech input on speech-enabled input fields with something other than a mouse click - like a keypress.

Attachments
Patch for initial feedback, still has debugging printfs - is the approach good? (8.04 KB, patch) 2011-05-04 08:29 PDT, Dominic Mazzoni	webkit.review.bot: commit-queue-	Details Formatted Diff Diff
Updated patch with layout test, ready for review. (18.82 KB, patch) 2011-06-06 13:17 PDT, Dominic Mazzoni	no flags	Details Formatted Diff Diff
Fixed style error. (18.82 KB, patch) 2011-06-06 13:45 PDT, Dominic Mazzoni	fishd: review-	Details Formatted Diff Diff
Rename methods to be more clear. (19.12 KB, patch) 2011-06-17 17:41 PDT, Dominic Mazzoni	gustavo.noronha: commit-queue-	Details Formatted Diff Diff
Fix gtk & qt errors (19.88 KB, patch) 2011-06-20 04:47 PDT, Dominic Mazzoni	no flags	Details Formatted Diff Diff
Patch (23.87 KB, patch) 2011-09-01 01:19 PDT, Dominic Mazzoni	no flags	Details Formatted Diff Diff
Patch for landing (23.87 KB, patch) 2011-09-01 12:16 PDT, Dominic Mazzoni	no flags	Details Formatted Diff Diff
Patch (23.87 KB, patch) 2011-09-01 13:32 PDT, Dominic Mazzoni	no flags	Details Formatted Diff Diff
Show Obsolete (7) View All Add attachment proposed patch, testcase, etc.

Dominic Mazzoni

Comment 1 2011-05-04 08:29:39 PDT

Created attachment 92246 [details] Patch for initial feedback, still has debugging printfs - is the approach good?

WebKit Review Bot

Comment 2 2011-05-18 12:52:16 PDT

Attachment 92246 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/html/shadow/TextControlInnerElements.cpp:495: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 1 in 9 files If any of these errors are false positives, please file a bug against check-webkit-style.

WebKit Review Bot

Comment 3 2011-05-18 13:19:34 PDT

Comment on attachment 92246 [details] Patch for initial feedback, still has debugging printfs - is the approach good? Attachment 92246 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/8702872

Early Warning System Bot

Comment 4 2011-05-18 14:26:40 PDT

Comment on attachment 92246 [details] Patch for initial feedback, still has debugging printfs - is the approach good? Attachment 92246 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/8705839

Collabora GTK+ EWS bot

Comment 5 2011-05-18 17:40:52 PDT

Comment on attachment 92246 [details] Patch for initial feedback, still has debugging printfs - is the approach good? Attachment 92246 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/8702956

Gyuyoung Kim

Comment 6 2011-05-19 02:23:36 PDT

Comment on attachment 92246 [details] Patch for initial feedback, still has debugging printfs - is the approach good? Attachment 92246 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/8719031

Dominic Mazzoni

Comment 7 2011-06-06 13:17:57 PDT

Created attachment 96115 [details] Updated patch with layout test, ready for review.

WebKit Review Bot

Comment 8 2011-06-06 13:20:31 PDT

Attachment 96115 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1 Source/WebCore/html/shadow/TextControlInnerElements.cpp:531: Place brace on its own line for function definitions. [whitespace/braces] [4] Total errors found: 1 in 20 files If any of these errors are false positives, please file a bug against check-webkit-style.

Dominic Mazzoni

Comment 9 2011-06-06 13:45:44 PDT

Created attachment 96120 [details] Fixed style error.

Satish Sampath

Comment 10 2011-06-16 04:09:39 PDT

Comment on attachment 96120 [details] Fixed style error. > InputFieldSpeechButtonElement::onToggle() onToggle doesn't seem like the right name for this function. It doesn't toggle between 2 states but moves the state from idle to recording and recording to waiting states. I would suggest moving the code under each case statement to separate functions and add a new toggleSpeechInputState() function which actually moves between Idle and Recording states. I think you'd want a pending recognition session to be cancelled and a new one started if this toggleSpeechInputState() function was called when state == Recognizing. > RenderTextControlSingleLine::toggleSpeechRecognition() could use the same name as mentioned above - toggleSpeechInputState() > WEBKIT_API bool isSpeechEnabled() const; suggest changing to isSpeechInputEnabled - "speech" is not the same as "speechInput" > WEBKIT_API void toggleSpeechRecognition(); > LayoutTestController::toggleSpeechInput change to toggleSpeechInputState() as above

Tony Gentilcore

Comment 11 2011-06-16 08:52:18 PDT

You'll also need a stamp from fishd since this touches Source/WebKit/chromium/public/*.

Darin Fisher (:fishd, Google)

Comment 12 2011-06-16 10:38:37 PDT

Comment on attachment 96120 [details] Fixed style error. View in context: https://bugs.webkit.org/attachment.cgi?id=96120&action=review > Source/WebKit/chromium/public/WebInputElement.h:80 > + WEBKIT_API bool isSpeechEnabled() const; it looks like toggleSpeechRecognition and isSpeechEnabled are related methods, but they do not use the same terms. if isSpeechEnabled is supposed to return true when toggleSpeechRecognition enabled the feature, then isSpeechEnabled should really be named isSpeechRecognitionEnabled. that way the names of the methods reinforces the relationship without need for further documentation. also, we don't have any other "toggle" APIs like this. normally, we have isFooEnabled and enableFoo(bool). why not follow that pattern here? > Tools/DumpRenderTree/LayoutTestController.h:112 > + void toggleSpeechInput(JSContextRef inputElement); same nit: seems like this would be better as a boolean setter. why does the caller have to guess or know the state in order to decide if they should call toggle or not?

Dominic Mazzoni

Comment 13 2011-06-17 16:50:22 PDT

> --- Comment #10 from Satish Sampath <satish@chromium.org> 2011-06-16 04:09:39 PST --- > onToggle doesn't seem like the right name for this function. It doesn't toggle between 2 states but moves the state from idle to recording and recording to waiting states. I would suggest moving the code under each case statement to separate functions and add a new toggleSpeechInputState() function which actually moves between Idle and Recording states. I think you'd want a pending recognition session to be cancelled and a new one started if this toggleSpeechInputState() function was called when state == Recognizing. I'm happy with using toggleSpeechInputState, but I don't want to change the behavior of what happens when you cancel while already in the recognition stage - that'd affect clicks as well. The goal is simply to abstract what happens when you click now so that the "click" could be triggered by a keypress instead - I don't think there's anything wrong with the current behavior, which is to start recording the first click, and stop recording the second click (but not cancel recognition). If you want different behavior for the second click, let's save that for another change. Or, if the current behavior is fine, maybe toggleSpeechInputState isn't the right name for the method. How about startOrStopSpeechInput? > Darin Fisher (:fishd, Google) <fishd@chromium.org> changed: >> Source/WebKit/chromium/public/WebInputElement.h:80 >> + WEBKIT_API bool isSpeechEnabled() const; > > it looks like toggleSpeechRecognition and isSpeechEnabled are related methods, but > they do not use the same terms. if isSpeechEnabled is supposed to return true isSpeechEnabled means, "does this input element have the x-webkit-speech attribute, meaning that speech input is allowed?" toggleSpeechRecognition means, "for an input that has speech input enabled, start recognizing if it's not already, or stop recognizing if you've already started." It doesn't change the result of isSpeechEnabled - that's controlled by the DOM element's attribute. My proposed change: isSpeechInputEnabled and startOrStopSpeechInput. (We could provide startSpeechInput and stopSpeechInput also, if we think they'd be needed or useful. However, a startOrStopSpeechInput is still needed because when the user triggers speech input by an action in the browser process, whether by a key, context menu, etc. it doesn't make sense to send one round-trip to the renderer process to determine the current state, then flip the state in another RPC, with a possible race condition. Having a startOrStopSpeechInput method exactly mimics the user behavior when they click on the microphone icon, which is what we want. - Dominic

Dominic Mazzoni

Comment 14 2011-06-17 17:41:13 PDT

Created attachment 97672 [details] Rename methods to be more clear.

Collabora GTK+ EWS bot

Comment 15 2011-06-17 18:55:16 PDT

Comment on attachment 97672 [details] Rename methods to be more clear. Attachment 97672 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/8872723

Early Warning System Bot

Comment 16 2011-06-20 01:29:17 PDT

Comment on attachment 97672 [details] Rename methods to be more clear. Attachment 97672 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/8913438

Dominic Mazzoni

Comment 17 2011-06-20 04:47:42 PDT

Created attachment 97778 [details] Fix gtk & qt errors

Satish Sampath

Comment 18 2011-06-21 03:26:55 PDT

> The goal is simply to abstract what happens when you > click now so that the "click" could be triggered by a > keypress instead I think we should not mimic the click implementation for hotkey. When early prototypes of speech input were being tested we saw many users didn't understand that they can just stop speaking and the browser would detect silence to trigger recognition. Instead they were clicking on the button again thinking it was necessary. That is the reason why we handle a click on the speech button and move to recognition state if audio was already recorded. In the hotkey case, we can message easily to the user that they need to press the hotkey only to start speaking and once they stop speaking results will be available for them. The implementation is also much simpler as you only need to start recording on hotkey press and don't have to toggle (and can ignore hotkey if audio was already being recorded).

Darin Fisher (:fishd, Google)

Comment 19 2011-08-25 15:46:01 PDT

Comment on attachment 97778 [details] Fix gtk & qt errors View in context: https://bugs.webkit.org/attachment.cgi?id=97778&action=review > Source/WebKit/chromium/public/WebInputElement.h:81 > + WEBKIT_API void startOrStopSpeechInput(); This startOrStop method is a bit awkward. It seems like the caller has no way to know what the current state is, which means that they have to just make a guess. This might be OK for certain kinds of UI, or embeddings, but it seems like it could be a pretty confusing API for others. How about just having separate methods for starting and stopping speech input?

Darin Fisher (:fishd, Google)

Comment 20 2011-08-25 15:48:22 PDT

(In reply to comment #13) > (We could provide startSpeechInput and stopSpeechInput also, if we > think they'd be needed or useful. However, a startOrStopSpeechInput is > still needed because when the user triggers speech input by an action > in the browser process, whether by a key, context menu, etc. it > doesn't make sense to send one round-trip to the renderer process to > determine the current state, then flip the state in another RPC, with > a possible race condition. Having a startOrStopSpeechInput method > exactly mimics the user behavior when they click on the microphone > icon, which is what we want. OK, I see that you addressed my question here. I don't think these IPC considerations have much to do with how the WebKit API is shaped. You could very easily have an IPC named StartOrStopSpeechInput, which could use local WebKit APIs to determine whether it makes sense to call startSpeechInput or stopSpeechInput.

Dominic Mazzoni

Comment 21 2011-08-25 16:24:34 PDT

(In reply to comment #20) > I don't think these IPC considerations have much to do with how the WebKit API is shaped. You could very easily have an IPC named StartOrStopSpeechInput, which could use local WebKit APIs to determine whether it makes sense to call startSpeechInput or stopSpeechInput. Ah, that's clear now. Thanks, I'll make that change.

Dominic Mazzoni

Comment 22 2011-09-01 01:19:39 PDT

Created attachment 105921 [details] Patch

Dominic Mazzoni

Comment 23 2011-09-01 12:16:34 PDT

Created attachment 106005 [details] Patch for landing

WebKit Review Bot

Comment 24 2011-09-01 12:17:27 PDT

Comment on attachment 106005 [details] Patch for landing Rejecting attachment 106005 [details] from commit-queue. dmazzoni@google.com does not have committer permissions according to http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/common/config/committers.py. - If you do not have committer rights please read http://webkit.org/coding/contributing.html for instructions on how to use bugzilla flags. - If you have committer rights please correct the error in Tools/Scripts/webkitpy/common/config/committers.py by adding yourself to the file (no review needed). The commit-queue restarts itself every 2 hours. After restart the commit-queue will correctly respect your committer rights.

Dimitri Glazkov (Google)

Comment 25 2011-09-01 13:25:44 PDT

Comment on attachment 106005 [details] Patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=106005&action=review Sorry, I have an extra nit! :) > Source/WebCore/html/shadow/TextControlInnerElements.cpp:527 > + RefPtr<HTMLInputElement> input(static_cast<HTMLInputElement*>(shadowAncestorNode())); use input = foo syntax instead.

Dominic Mazzoni

Comment 26 2011-09-01 13:32:19 PDT

Created attachment 106017 [details] Patch

WebKit Review Bot

Comment 27 2011-09-01 14:14:02 PDT

Comment on attachment 106017 [details] Patch Clearing flags on attachment: 106017 Committed r94343: <http://trac.webkit.org/changeset/94343>

WebKit Review Bot

Comment 28 2011-09-01 14:14:08 PDT

All reviewed patches have been landed. Closing bug.

Leandro Graciá Gil

Comment 29 2011-09-05 05:19:54 PDT

Comment on attachment 106017 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=106017&action=review > Source/WebCore/html/shadow/TextControlInnerElements.cpp:530 > + IntRect rect = input->renderer()->absoluteBoundingBoxRect(); This line is introducing a regression that undoes this patch: https://bugs.webkit.org/show_bug.cgi?id=65333 I've created a separate bug to fix this: https://bugs.webkit.org/show_bug.cgi?id=67597

Note You need to log in before you can comment on or make changes to this bug.