Bug 42367

Summary: Speech input plumbing in webkit
Product: WebKit Reporter: Satish Sampath <satish>
Component: FormsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: bulach, commit-queue, dglazkov, fishd, jorlow, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 39485    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Satish Sampath 2010-07-15 06:43:54 PDT
Adds the following in WebKit:
- interface WebSpeechInputClient under chromium/public, implemented by the embedder and callable by WebKit
- interface WebSpeechInputClientListener under chromium/public, implemented by WebKit and callable by the embedder
- class SpeechInputClientImpl under chrome/src which acts as a bi-directional channel between WebCore and the embedder for speech input

In the next patch I'll be adding a mock implementation of SpeechInputClientImpl, LayoutTestController additions and relevant layout tests.

All possible code changes are behind the speech input compile time flag (disabled by default).

More information about the speech input API proposal can be found in the following links:
 Parent bug: https://bugs.webkit.org/show_bug.cgi?id=39485
 Full API proposal: https://docs.google.com/View?id=dcfg79pz_5dhnp23f5
 Relevant discussions in the WHATWG list:
  - http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-May/026338.html
  - http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2010-June/026747.html
Comment 1 Satish Sampath 2010-07-15 08:44:50 PDT
Created attachment 61666 [details]
Patch
Comment 2 WebKit Review Bot 2010-07-15 10:13:53 PDT
Attachment 61666 [details] did not build on chromium:
Build output: http://webkit-commit-queue.appspot.com/results/3558030
Comment 3 Jeremy Orlow 2010-07-15 10:17:07 PDT
Darin, can you please review the WebKit part?  Marcus, can you please review the entire thing?
Comment 4 Satish Sampath 2010-07-15 10:27:52 PDT
Doh, didn't add the new files to chromium/WebKit.gyp (since I tested on windows in local machine and added manually to the .vcproj). I'll add them along with the review comments once they come in.
Comment 5 Marcus Bulach 2010-07-16 04:17:16 PDT
(In reply to comment #4)
> Doh, didn't add the new files to chromium/WebKit.gyp (since I tested on windows in local machine and added manually to the .vcproj). I'll add them along with the review comments once they come in.

Hi Satish,

I can't do normative reviews yet, so here's some drive-by comments:

39 class WebSpeechInputClient {
 40 public:
 41     // Attaches a listener which receives future callbacks.
 42     // Returns false if a listener was already attached.
 43     virtual bool attachListener(WebSpeechInputClientListener* listener) = 0;
 I think the style is now to avoid pure virtuals and have:
 WEBKIT_ASSERT_NOT_REACHED();
 see: http://trac.webkit.org/browser/trunk/WebKit/chromium/public/WebIDBCallbacks.h
Also, no need to name the parameter when it's obvious from the type (please, check all other headers).
 
  39 class WebSpeechInputClientListener {
As above.
    
47 class SpeechInputClientImpl
both file / classname should be prefixed with "Web", since they're in WebKit namespace.

43 SpeechInputClientImpl::SpeechInputClientImpl(WebSpeechInputClient* client)
use initializer for m_client, and also zero-initialize m_listener!

    
64     WebCore::SpeechInputClientListener* m_listener; // Valid when recognition is in progress.
This should probably be a WebPrivatePtr:
http://trac.webkit.org/browser/trunk/WebKit/chromium/public/WebSerializedScriptValue.h
    
 71 void SpeechInputClientImpl::recordingComplete()
 72 {
 73     ASSERT(m_listener);
 74     if (m_listener)
no need for the if since the assert will terminate.
  
 78 void SpeechInputClientImpl::setRecognitionResult(const WebString& result)
 79 {
 80     ASSERT(m_listener);
 81     if (m_listener)   
as above.
   
67 } // namespace WebCore
 68 
 69 #endif // ENABLE(INPUT_SPEECH)
this is closing WebKit, not WebCore namespace.
Comment 6 Satish Sampath 2010-07-16 08:31:15 PDT
Created attachment 61809 [details]
Patch

Addressed all of Marcus's comments, except changing the name of SpeechInputClientImpl to WebSpeechInputClientImpl. It looks to me that the 'Web' prefix is used primarily for interfaces/classes exposed from WebKit via the headers in chromium/public directory. Majority of classes in chromium/src doesn't have the 'Web' prefix and most of those which do have the prefix are implementations of classes exposed in the chromium/public directory. I'm happy to change the name if my understanding is incorrect and there is a strong reason to have the 'Web' prefix even for classes internal to WebKit.
Comment 7 Darin Fisher (:fishd, Google) 2010-07-16 11:04:38 PDT
Comment on attachment 61809 [details]
Patch

WebKit/chromium/public/WebViewClient.h:340
 +       virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; }
indentation is wrong.  no need for WebKit:: prefix

WebKit/chromium/public/WebViewClient.h:340
 +       virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; }
instead of introducing a new interface here, just add the methods
from WebSpeechInputClient to WebViewClient.

Then rename WebSpeechInputClientListener to WebSpeechInputListener.

WebKit/chromium/public/WebSpeechInputClientListener.h:44
 +      virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); }
nit: didCompleteRecording

WebKit/chromium/public/WebSpeechInputClientListener.h:49
 +      virtual void setRecognitionResult(const WebString&) { WEBKIT_ASSERT_NOT_REACHED(); }
didCompleteRecognition?

WebKit/chromium/public/WebSpeechInputClient.h:45
 +      virtual bool attachListener(WebSpeechInputClientListener*) { WEBKIT_ASSERT_NOT_REACHED(); }
why do we need multiple listeners?

WebKit/chromium/public/WebSpeechInputClient.h:57
 +      virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); }
why do we need the recordingComplete (didCompleteRecording) event if
there is an explicit stopRecording method?  are there other ways that
a recording could stop such that WebKit would require the notification
that recording stopped?

WebKit/chromium/public/WebSpeechInputClient.h:53
 +      virtual bool startRecognition() { WEBKIT_ASSERT_NOT_REACHED(); }
nit: might be nice to put the startRecognition / cancelRecognition methods
next to each other since they seem related.

WebKit/chromium/public/WebSpeechInputClient.h:57
 +      virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); }
should there be some way to start recording again?

WebKit/chromium/public/WebSpeechInputClientListener.h:44
 +      virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); }
since these methods are implemented by WebKit, they should be pure virtual.

what if there are multiple pages independently listening for speech
input?  given these interfaces, it seems like one of the pages could
cancel the speech recognition process, hampering the efforts of the
other page.  or, is that dealt with at a lower level within WebCore?
Comment 8 Satish Sampath 2010-07-16 11:40:01 PDT
(In reply to comment #7)

Thanks for the comments, some questions/replies below.

> WebKit/chromium/public/WebViewClient.h:340
>  +       virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; }
> instead of introducing a new interface here, just add the methods
> from WebSpeechInputClient to WebViewClient.
> 
> Then rename WebSpeechInputClientListener to WebSpeechInputListener.

I see other features such as Geolocation have an interface (WebKit::WebGeolocationService) and implemented in the chromium render process as a separate dispatcher-like class which seemed clean. Just double checking with you if we are moving away from that model..

> WebKit/chromium/public/WebSpeechInputClientListener.h:49
>  +      virtual void setRecognitionResult(const WebString&) { WEBKIT_ASSERT_NOT_REACHED(); }
> didCompleteRecognition?

This method can potentially get called multiple times, if there are partial results available as the user keeps speaking. The current name may be more suitable for such cases, is it ok to leave it as is?

> WebKit/chromium/public/WebSpeechInputClient.h:45
>  +      virtual bool attachListener(WebSpeechInputClientListener*) { WEBKIT_ASSERT_NOT_REACHED(); }
> why do we need multiple listeners?

We don't need, there should be a 1:1 mapping between the client and listener. But since the client is fetched from the embedder on creation of WebViewImpl, I was not sure how this attach can be done before that. Any suggestions for making this better?

> WebKit/chromium/public/WebSpeechInputClient.h:57
>  +      virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); }
> why do we need the recordingComplete (didCompleteRecording) event if
> there is an explicit stopRecording method?  are there other ways that
> a recording could stop such that WebKit would require the notification
> that recording stopped?

stopRecording() is an optional call, without that the browser's speech recording 'endpointer' will detect silence in the input and stop recording automatically once the user stops speaking. stopRecording() is there to let users explicitly click the speech/mic button again in case they are not familiar with the workings or for better feature discoverability/usability.
recordingComplete() will be issued in both these cases. The speech element in WebKit explicitly needs to know when recording stops so it can update the UI and indicate that recognition is in progress (which can take a while if done via a speech recognition server).

> WebKit/chromium/public/WebSpeechInputClient.h:57
>  +      virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); }
> should there be some way to start recording again?

startRecognition() does that. This method is named stopRecording() and not stopRecognition() because the audio recorded so far is still recognized and the result returned to the input element.

> WebKit/chromium/public/WebSpeechInputClientListener.h:44
>  +      virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); }
> since these methods are implemented by WebKit, they should be pure virtual.

I got comments from Marcus (above) to change from pure virtual to this model as he felt this was the new hotness, I'm happy to move back to pure virtual if you think that is the way to go.

> what if there are multiple pages independently listening for speech
> input?  given these interfaces, it seems like one of the pages could
> cancel the speech recognition process, hampering the efforts of the
> other page.  or, is that dealt with at a lower level within WebCore?

In my understanding each WebViewImpl manages only a single page, but each page can have multiple speech enabled input elements. The code so far handles the multiple input element case by allowing only one input element to record at a time. Speech input code in the browser layer should handle the multiple page case, which is not done yet.
Comment 9 Darin Fisher (:fishd, Google) 2010-07-21 09:39:36 PDT
(In reply to comment #8)
> (In reply to comment #7)
> 
> Thanks for the comments, some questions/replies below.
> 
> > WebKit/chromium/public/WebViewClient.h:340
> >  +       virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; }
> > instead of introducing a new interface here, just add the methods
> > from WebSpeechInputClient to WebViewClient.
> > 
> > Then rename WebSpeechInputClientListener to WebSpeechInputListener.
> 
> I see other features such as Geolocation have an interface (WebKit::WebGeolocationService)
> and implemented in the chromium render process as a separate dispatcher-like class which 
> seemed clean. Just double checking with you if we are moving away from that model..

Yes, that's because WebGeolocationService has a lot of methods.  I'm OK creating
a separate interface for speech input if you think it is going to make things
cleaner.  I would just not call it WebSpeechInputClient.  That name suggests
that it is the "client" to a WebSpeechInput interface, which does not appear
to exist.

Instead, I would rename WebSpeechInputClient to WebSpeechInputService.  Then
I would rename WebSpeechInputClientListener to just WebSpeechInputListener.

i.e., the speech input service can have listeners.


> > WebKit/chromium/public/WebSpeechInputClientListener.h:49
> >  +      virtual void setRecognitionResult(const WebString&) { WEBKIT_ASSERT_NOT_REACHED(); }
> > didCompleteRecognition?
> 
> This method can potentially get called multiple times, if there are partial results 
> available as the user keeps speaking. The current name may be more suitable for such
> cases, is it ok to leave it as is?

OK, but please add comments explaining all of this.  I found these interfaces
to be very confusing when I first read them.


> > WebKit/chromium/public/WebSpeechInputClient.h:45
> >  +      virtual bool attachListener(WebSpeechInputClientListener*) { WEBKIT_ASSERT_NOT_REACHED(); }
> > why do we need multiple listeners?
> 
> We don't need, there should be a 1:1 mapping between the client and listener.
> But since the client is fetched from the embedder on creation of WebViewImpl,
> I was not sure how this attach can be done before that. Any suggestions for
> making this better?

You should pass the WebSpeechInputListener to the method that returns
the WebSpeechInputService.  It becomes like a "constructor parameter"
in that case.


> > WebKit/chromium/public/WebSpeechInputClient.h:57
> >  +      virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); }
> > why do we need the recordingComplete (didCompleteRecording) event if
> > there is an explicit stopRecording method?  are there other ways that
> > a recording could stop such that WebKit would require the notification
> > that recording stopped?
> 
> stopRecording() is an optional call, without that the browser's speech recording
> 'endpointer' will detect silence in the input and stop recording automatically
> once the user stops speaking. stopRecording() is there to let users explicitly
> click the speech/mic button again in case they are not familiar with the workings
> or for better feature discoverability/usability.
> recordingComplete() will be issued in both these cases. The speech element in
> WebKit explicitly needs to know when recording stops so it can update the UI and
> indicate that recognition is in progress (which can take a while if done via a
> speech recognition server).

OK, please explain this in the comments.


> > WebKit/chromium/public/WebSpeechInputClient.h:57
> >  +      virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); }
> > should there be some way to start recording again?
> 
> startRecognition() does that. This method is named stopRecording() and not
> stopRecognition() because the audio recorded so far is still recognized and
> the result returned to the input element.

OK, please explain this in the comments.


> > WebKit/chromium/public/WebSpeechInputClientListener.h:44
> >  +      virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); }
> > since these methods are implemented by WebKit, they should be pure virtual.
> 
> I got comments from Marcus (above) to change from pure virtual to this model
> as he felt this was the new hotness, I'm happy to move back to pure virtual
> if you think that is the way to go.

The rule is very simple:  use pure virtual for methods implemented by WebKit,
but provide default implementations of methods implemented by the embedder.
Please share this rule :)

The point here is that we would prefer to always use pure virtual methods
as it allows the compiler to catch cases in which we forget to implement
a method.  However, to make it easier to roll new versions of webkit into
chromium, we provide default implementations for methods that chromium
may need to implement.  that way if the chromium side hasn't yet implemented
the new method, things will still compile (although they might not function
perfectly).


> > what if there are multiple pages independently listening for speech
> > input?  given these interfaces, it seems like one of the pages could
> > cancel the speech recognition process, hampering the efforts of the
> > other page.  or, is that dealt with at a lower level within WebCore?
> 
> In my understanding each WebViewImpl manages only a single page, but each
> page can have multiple speech enabled input elements. The code so far
> handles the multiple input element case by allowing only one input element
> to record at a time. Speech input code in the browser layer should handle
> the multiple page case, which is not done yet.

Makes sense.
Comment 10 Darin Fisher (:fishd, Google) 2010-07-21 09:43:03 PDT
(In reply to comment #9)
> Instead, I would rename WebSpeechInputClient to WebSpeechInputService.  Then
> I would rename WebSpeechInputClientListener to just WebSpeechInputListener.

Actually, even better:  Instead of WebSpeechInputService, how about if we call
it WebSpeechInputController?  Since there is one of these per WebView,
it seems wrong to add the "Service" suffix, and "Controller" seems like a
more accurate description of its responsibility.
Comment 11 Satish Sampath 2010-07-21 13:31:42 PDT
Created attachment 62225 [details]
Patch

Thanks Darin. This latest patch addresses all your comments, renamed interfaces and methods, added comments.. Please take another look.
Comment 12 Darin Fisher (:fishd, Google) 2010-07-21 13:40:04 PDT
Comment on attachment 62225 [details]
Patch

I think you should similarly rename the WebCore classes, but perhaps that can be
done in a separate patch.

Looking at the WebCore level, you have several classes (in WebCore/page/):

  SpeechInput
  SpeechInputListener
  SpeechInputClient
  SpeechInputClientListener

Why so many classes?  SpeechInputListener and SpeechInputClientListener seem
to be identical!
Comment 13 Satish Sampath 2010-07-21 14:06:04 PDT
The original idea was to have more functionality in the SpeechInput class and expose a subset to SpeechInputListener, but it has turned out to be a simple proxy now.
In a subsequent patch I was planning to remove SpeechInput and SpeechInputListener , rename SpeechInputClientListener->SpeechInputListener and get InputFieldSpeechButtonElement implement that directly. Is that ok?
Comment 14 Satish Sampath 2010-07-23 09:18:59 PDT
Created attachment 62432 [details]
Patch

Removed unused headers and added a missing return statement.
Comment 15 Darin Fisher (:fishd, Google) 2010-07-23 10:08:11 PDT
Comment on attachment 62432 [details]
Patch

WebKit/chromium/public/WebViewClient.h:341
 +      virtual WebSpeechInputController* createSpeechInputController(
this method is named create*, yet it returns an interface that defines
a protected virtual destructor, so i have no way to destroy the thing
that was created for me.  perhaps you don't intend to allocate a new
WebSpeechInputController each time this method is called?  if that is
the case, then this method should just have the create prefix dropped.
it should just be named "speechInputController".

cleaning up the webcore classes in a separate patch is completely fine.
my concern with this patch is that the interfaces should match up well
with what we want the webcore classes to look like.  i'm happy if that's
the plan.

R=me with createSpeechInputController renamed to speechInputController.
Comment 16 Satish Sampath 2010-07-23 11:38:28 PDT
Created attachment 62451 [details]
Patch

Removed 'create' from method name as suggested. Could you also please give me a commit-queue+ since I am not a committer yet?
Comment 17 Steve Block 2010-07-26 02:28:15 PDT
Comment on attachment 62451 [details]
Patch

r=me, given Darin's earlier review
Comment 18 WebKit Commit Bot 2010-07-26 03:32:27 PDT
Comment on attachment 62451 [details]
Patch

Rejecting patch 62451 from commit-queue.

Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--exit-after-n-failures=1', '--ignore-tests', 'compositing', '--quiet']" exit_code: 1
Running build-dumprendertree
Compiling Java tests
make: Nothing to be done for `default'.
Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests
Testing 20718 test cases.
fast/loader/recursive-before-unload-crash.html -> failed

Exiting early after 1 failures. 14202 tests run.
214.25s total testing time

14201 test cases (99%) succeeded
1 test case (<1%) had incorrect layout
1 test case (<1%) had stderr output

Full output: http://queues.webkit.org/results/3585565
Comment 19 Steve Block 2010-07-26 04:08:56 PDT
Comment on attachment 62451 [details]
Patch

Observed LayoutTest crash is unrelated to this patch, trying again.
Comment 20 WebKit Commit Bot 2010-07-26 04:53:02 PDT
Comment on attachment 62451 [details]
Patch

Clearing flags on attachment: 62451

Committed r64042: <http://trac.webkit.org/changeset/64042>
Comment 21 WebKit Commit Bot 2010-07-26 04:53:08 PDT
All reviewed patches have been landed.  Closing bug.