Bug 60170

Summary: [Chromium] Speech-enabled input fields need the ability to trigger without a click
Product: WebKit Reporter: Dominic Mazzoni <dmazzoni>
Component: New BugsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: dglazkov, fishd, gustavo.noronha, gustavo, satish, tonyg, webkit.review.bot, xan.lopez
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Attachments:
Description Flags
Patch for initial feedback, still has debugging printfs - is the approach good?
webkit.review.bot: commit-queue-
Updated patch with layout test, ready for review.
none
Fixed style error.
fishd: review-
Rename methods to be more clear.
gustavo.noronha: commit-queue-
Fix gtk & qt errors
none
Patch
none
Patch for landing
none
Patch none

Description Dominic Mazzoni 2011-05-04 08:23:43 PDT
It needs to be possible to start speech input on speech-enabled input fields with something other than a mouse click - like a keypress.
Comment 1 Dominic Mazzoni 2011-05-04 08:29:39 PDT
Created attachment 92246 [details]
Patch for initial feedback, still has debugging printfs - is the approach good?
Comment 2 WebKit Review Bot 2011-05-18 12:52:16 PDT
Attachment 92246 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1

Source/WebCore/html/shadow/TextControlInnerElements.cpp:495:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 WebKit Review Bot 2011-05-18 13:19:34 PDT
Comment on attachment 92246 [details]
Patch for initial feedback, still has debugging printfs - is the approach good?

Attachment 92246 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8702872
Comment 4 Early Warning System Bot 2011-05-18 14:26:40 PDT
Comment on attachment 92246 [details]
Patch for initial feedback, still has debugging printfs - is the approach good?

Attachment 92246 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8705839
Comment 5 Collabora GTK+ EWS bot 2011-05-18 17:40:52 PDT
Comment on attachment 92246 [details]
Patch for initial feedback, still has debugging printfs - is the approach good?

Attachment 92246 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8702956
Comment 6 Gyuyoung Kim 2011-05-19 02:23:36 PDT
Comment on attachment 92246 [details]
Patch for initial feedback, still has debugging printfs - is the approach good?

Attachment 92246 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/8719031
Comment 7 Dominic Mazzoni 2011-06-06 13:17:57 PDT
Created attachment 96115 [details]
Updated patch with layout test, ready for review.
Comment 8 WebKit Review Bot 2011-06-06 13:20:31 PDT
Attachment 96115 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1

Source/WebCore/html/shadow/TextControlInnerElements.cpp:531:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 1 in 20 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Dominic Mazzoni 2011-06-06 13:45:44 PDT
Created attachment 96120 [details]
Fixed style error.
Comment 10 Satish Sampath 2011-06-16 04:09:39 PDT
Comment on attachment 96120 [details]
Fixed style error.

> InputFieldSpeechButtonElement::onToggle()

onToggle doesn't seem like the right name for this function. It doesn't toggle between 2 states but moves the state from idle to recording and recording to waiting states. I would suggest moving the code under each case statement to separate functions and add a new toggleSpeechInputState() function which actually moves between Idle and Recording states. I think you'd want a pending recognition session to be cancelled and a new one started if this toggleSpeechInputState() function was called when state == Recognizing.

> RenderTextControlSingleLine::toggleSpeechRecognition()

could use the same name as mentioned above - toggleSpeechInputState()

> WEBKIT_API bool isSpeechEnabled() const;

suggest changing to isSpeechInputEnabled - "speech" is not the same as "speechInput"

> WEBKIT_API void toggleSpeechRecognition();
> LayoutTestController::toggleSpeechInput

change to toggleSpeechInputState() as above
Comment 11 Tony Gentilcore 2011-06-16 08:52:18 PDT
You'll also need a stamp from fishd since this touches Source/WebKit/chromium/public/*.
Comment 12 Darin Fisher (:fishd, Google) 2011-06-16 10:38:37 PDT
Comment on attachment 96120 [details]
Fixed style error.

View in context: https://bugs.webkit.org/attachment.cgi?id=96120&action=review

> Source/WebKit/chromium/public/WebInputElement.h:80
> +        WEBKIT_API bool isSpeechEnabled() const;

it looks like toggleSpeechRecognition and isSpeechEnabled are related methods, but
they do not use the same terms.  if isSpeechEnabled is supposed to return true
when toggleSpeechRecognition enabled the feature, then isSpeechEnabled should really
be named isSpeechRecognitionEnabled.  that way the names of the methods reinforces
the relationship without need for further documentation.

also, we don't have any other "toggle" APIs like this.  normally, we have
isFooEnabled and enableFoo(bool).  why not follow that pattern here?

> Tools/DumpRenderTree/LayoutTestController.h:112
> +    void toggleSpeechInput(JSContextRef inputElement);

same nit:  seems like this would be better as a boolean setter.  why does the
caller have to guess or know the state in order to decide if they should call
toggle or not?
Comment 13 Dominic Mazzoni 2011-06-17 16:50:22 PDT
> --- Comment #10 from Satish Sampath <satish@chromium.org>  2011-06-16 04:09:39 PST ---
> onToggle doesn't seem like the right name for this function. It doesn't toggle between 2 states but moves the state from idle to recording and recording to waiting states. I would suggest moving the code under each case statement to separate functions and add a new toggleSpeechInputState() function which actually moves between Idle and Recording states. I think you'd want a pending recognition session to be cancelled and a new one started if this toggleSpeechInputState() function was called when state == Recognizing.

I'm happy with using toggleSpeechInputState, but I don't want to
change the behavior of what happens when you cancel while already in
the recognition stage - that'd affect clicks as well. The goal is
simply to abstract what happens when you click now so that the "click"
could be triggered by a keypress instead - I don't think there's
anything wrong with the current behavior, which is to start recording
the first click, and stop recording the second click (but not cancel
recognition).

If you want different behavior for the second click, let's save that
for another change.

Or, if the current behavior is fine, maybe toggleSpeechInputState
isn't the right name for the method. How about startOrStopSpeechInput?

> Darin Fisher (:fishd, Google) <fishd@chromium.org> changed:
>> Source/WebKit/chromium/public/WebInputElement.h:80
>> +        WEBKIT_API bool isSpeechEnabled() const;
>
> it looks like toggleSpeechRecognition and isSpeechEnabled are related methods, but
> they do not use the same terms.  if isSpeechEnabled is supposed to return true

isSpeechEnabled means, "does this input element have the
x-webkit-speech attribute, meaning that speech input is allowed?"

toggleSpeechRecognition means, "for an input that has speech input
enabled, start recognizing if it's not already, or stop recognizing if
you've already started." It doesn't change the result of
isSpeechEnabled - that's controlled by the DOM element's attribute.

My proposed change: isSpeechInputEnabled and startOrStopSpeechInput.

(We could provide startSpeechInput and stopSpeechInput also, if we
think they'd be needed or useful. However, a startOrStopSpeechInput is
still needed because when the user triggers speech input by an action
in the browser process, whether by a key, context menu, etc. it
doesn't make sense to send one round-trip to the renderer process to
determine the current state, then flip the state in another RPC, with
a possible race condition. Having a startOrStopSpeechInput method
exactly mimics the user behavior when they click on the microphone
icon, which is what we want.

- Dominic
Comment 14 Dominic Mazzoni 2011-06-17 17:41:13 PDT
Created attachment 97672 [details]
Rename methods to be more clear.
Comment 15 Collabora GTK+ EWS bot 2011-06-17 18:55:16 PDT
Comment on attachment 97672 [details]
Rename methods to be more clear.

Attachment 97672 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8872723
Comment 16 Early Warning System Bot 2011-06-20 01:29:17 PDT
Comment on attachment 97672 [details]
Rename methods to be more clear.

Attachment 97672 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8913438
Comment 17 Dominic Mazzoni 2011-06-20 04:47:42 PDT
Created attachment 97778 [details]
Fix gtk & qt errors
Comment 18 Satish Sampath 2011-06-21 03:26:55 PDT
> The goal is simply to abstract what happens when you
> click now so that the "click" could be triggered by a
> keypress instead

I think we should not mimic the click implementation for hotkey. When early prototypes of speech input were being tested we saw many users didn't understand that they can just stop speaking and the browser would detect silence to trigger recognition. Instead they were clicking on the button again thinking it was necessary. That is the reason why we handle a click on the speech button and move to recognition state if audio was already recorded.

In the hotkey case, we can message easily to the user that they need to press the hotkey only to start speaking and once they stop speaking results will be available for them. The implementation is also much simpler as you only need to start recording on hotkey press and don't have to toggle (and can ignore hotkey if audio was already being recorded).
Comment 19 Darin Fisher (:fishd, Google) 2011-08-25 15:46:01 PDT
Comment on attachment 97778 [details]
Fix gtk & qt errors

View in context: https://bugs.webkit.org/attachment.cgi?id=97778&action=review

> Source/WebKit/chromium/public/WebInputElement.h:81
> +        WEBKIT_API void startOrStopSpeechInput();

This startOrStop method is a bit awkward.  It seems like the caller has
no way to know what the current state is, which means that they have to
just make a guess.  This might be OK for certain kinds of UI, or embeddings,
but it seems like it could be a pretty confusing API for others.

How about just having separate methods for starting and stopping speech input?
Comment 20 Darin Fisher (:fishd, Google) 2011-08-25 15:48:22 PDT
(In reply to comment #13)
> (We could provide startSpeechInput and stopSpeechInput also, if we
> think they'd be needed or useful. However, a startOrStopSpeechInput is
> still needed because when the user triggers speech input by an action
> in the browser process, whether by a key, context menu, etc. it
> doesn't make sense to send one round-trip to the renderer process to
> determine the current state, then flip the state in another RPC, with
> a possible race condition. Having a startOrStopSpeechInput method
> exactly mimics the user behavior when they click on the microphone
> icon, which is what we want.

OK, I see that you addressed my question here.

I don't think these IPC considerations have much to do with how the WebKit API is shaped.  You could very easily have an IPC named StartOrStopSpeechInput, which could use local WebKit APIs to determine whether it makes sense to call startSpeechInput or stopSpeechInput.
Comment 21 Dominic Mazzoni 2011-08-25 16:24:34 PDT
(In reply to comment #20)
> I don't think these IPC considerations have much to do with how the WebKit API is shaped.  You could very easily have an IPC named StartOrStopSpeechInput, which could use local WebKit APIs to determine whether it makes sense to call startSpeechInput or stopSpeechInput.

Ah, that's clear now. Thanks, I'll make that change.
Comment 22 Dominic Mazzoni 2011-09-01 01:19:39 PDT
Created attachment 105921 [details]
Patch
Comment 23 Dominic Mazzoni 2011-09-01 12:16:34 PDT
Created attachment 106005 [details]
Patch for landing
Comment 24 WebKit Review Bot 2011-09-01 12:17:27 PDT
Comment on attachment 106005 [details]
Patch for landing

Rejecting attachment 106005 [details] from commit-queue.

dmazzoni@google.com does not have committer permissions according to http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/common/config/committers.py.

- If you do not have committer rights please read http://webkit.org/coding/contributing.html for instructions on how to use bugzilla flags.

- If you have committer rights please correct the error in Tools/Scripts/webkitpy/common/config/committers.py by adding yourself to the file (no review needed).  The commit-queue restarts itself every 2 hours.  After restart the commit-queue will correctly respect your committer rights.
Comment 25 Dimitri Glazkov (Google) 2011-09-01 13:25:44 PDT
Comment on attachment 106005 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=106005&action=review

Sorry, I have an extra nit! :)

> Source/WebCore/html/shadow/TextControlInnerElements.cpp:527
> +    RefPtr<HTMLInputElement> input(static_cast<HTMLInputElement*>(shadowAncestorNode()));

use input = foo syntax instead.
Comment 26 Dominic Mazzoni 2011-09-01 13:32:19 PDT
Created attachment 106017 [details]
Patch
Comment 27 WebKit Review Bot 2011-09-01 14:14:02 PDT
Comment on attachment 106017 [details]
Patch

Clearing flags on attachment: 106017

Committed r94343: <http://trac.webkit.org/changeset/94343>
Comment 28 WebKit Review Bot 2011-09-01 14:14:08 PDT
All reviewed patches have been landed.  Closing bug.
Comment 29 Leandro GraciĆ” Gil 2011-09-05 05:19:54 PDT
Comment on attachment 106017 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=106017&action=review

> Source/WebCore/html/shadow/TextControlInnerElements.cpp:530
> +    IntRect rect = input->renderer()->absoluteBoundingBoxRect();

This line is introducing a regression that undoes this patch: https://bugs.webkit.org/show_bug.cgi?id=65333

I've created a separate bug to fix this: https://bugs.webkit.org/show_bug.cgi?id=67597