RESOLVED FIXED Bug 42367
Speech input plumbing in webkit
https://bugs.webkit.org/show_bug.cgi?id=42367
Summary Speech input plumbing in webkit
Satish Sampath
Reported 2010-07-15 06:43:54 PDT
Adds the following in WebKit: - interface WebSpeechInputClient under chromium/public, implemented by the embedder and callable by WebKit - interface WebSpeechInputClientListener under chromium/public, implemented by WebKit and callable by the embedder - class SpeechInputClientImpl under chrome/src which acts as a bi-directional channel between WebCore and the embedder for speech input In the next patch I'll be adding a mock implementation of SpeechInputClientImpl, LayoutTestController additions and relevant layout tests. All possible code changes are behind the speech input compile time flag (disabled by default). More information about the speech input API proposal can be found in the following links: Parent bug: https://bugs.webkit.org/show_bug.cgi?id=39485 Full API proposal: https://docs.google.com/View?id=dcfg79pz_5dhnp23f5 Relevant discussions in the WHATWG list: - http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-May/026338.html - http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2010-June/026747.html
Attachments
Patch (17.60 KB, patch)
2010-07-15 08:44 PDT, Satish Sampath
no flags
Patch (18.43 KB, patch)
2010-07-16 08:31 PDT, Satish Sampath
no flags
Patch (23.86 KB, patch)
2010-07-21 13:31 PDT, Satish Sampath
no flags
Patch (24.60 KB, patch)
2010-07-23 09:18 PDT, Satish Sampath
no flags
Patch (24.58 KB, patch)
2010-07-23 11:38 PDT, Satish Sampath
no flags
Satish Sampath
Comment 1 2010-07-15 08:44:50 PDT
WebKit Review Bot
Comment 2 2010-07-15 10:13:53 PDT
Jeremy Orlow
Comment 3 2010-07-15 10:17:07 PDT
Darin, can you please review the WebKit part? Marcus, can you please review the entire thing?
Satish Sampath
Comment 4 2010-07-15 10:27:52 PDT
Doh, didn't add the new files to chromium/WebKit.gyp (since I tested on windows in local machine and added manually to the .vcproj). I'll add them along with the review comments once they come in.
Marcus Bulach
Comment 5 2010-07-16 04:17:16 PDT
(In reply to comment #4) > Doh, didn't add the new files to chromium/WebKit.gyp (since I tested on windows in local machine and added manually to the .vcproj). I'll add them along with the review comments once they come in. Hi Satish, I can't do normative reviews yet, so here's some drive-by comments: 39 class WebSpeechInputClient { 40 public: 41 // Attaches a listener which receives future callbacks. 42 // Returns false if a listener was already attached. 43 virtual bool attachListener(WebSpeechInputClientListener* listener) = 0; I think the style is now to avoid pure virtuals and have: WEBKIT_ASSERT_NOT_REACHED(); see: http://trac.webkit.org/browser/trunk/WebKit/chromium/public/WebIDBCallbacks.h Also, no need to name the parameter when it's obvious from the type (please, check all other headers). 39 class WebSpeechInputClientListener { As above. 47 class SpeechInputClientImpl both file / classname should be prefixed with "Web", since they're in WebKit namespace. 43 SpeechInputClientImpl::SpeechInputClientImpl(WebSpeechInputClient* client) use initializer for m_client, and also zero-initialize m_listener! 64 WebCore::SpeechInputClientListener* m_listener; // Valid when recognition is in progress. This should probably be a WebPrivatePtr: http://trac.webkit.org/browser/trunk/WebKit/chromium/public/WebSerializedScriptValue.h 71 void SpeechInputClientImpl::recordingComplete() 72 { 73 ASSERT(m_listener); 74 if (m_listener) no need for the if since the assert will terminate. 78 void SpeechInputClientImpl::setRecognitionResult(const WebString& result) 79 { 80 ASSERT(m_listener); 81 if (m_listener) as above. 67 } // namespace WebCore 68 69 #endif // ENABLE(INPUT_SPEECH) this is closing WebKit, not WebCore namespace.
Satish Sampath
Comment 6 2010-07-16 08:31:15 PDT
Created attachment 61809 [details] Patch Addressed all of Marcus's comments, except changing the name of SpeechInputClientImpl to WebSpeechInputClientImpl. It looks to me that the 'Web' prefix is used primarily for interfaces/classes exposed from WebKit via the headers in chromium/public directory. Majority of classes in chromium/src doesn't have the 'Web' prefix and most of those which do have the prefix are implementations of classes exposed in the chromium/public directory. I'm happy to change the name if my understanding is incorrect and there is a strong reason to have the 'Web' prefix even for classes internal to WebKit.
Darin Fisher (:fishd, Google)
Comment 7 2010-07-16 11:04:38 PDT
Comment on attachment 61809 [details] Patch WebKit/chromium/public/WebViewClient.h:340 + virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; } indentation is wrong. no need for WebKit:: prefix WebKit/chromium/public/WebViewClient.h:340 + virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; } instead of introducing a new interface here, just add the methods from WebSpeechInputClient to WebViewClient. Then rename WebSpeechInputClientListener to WebSpeechInputListener. WebKit/chromium/public/WebSpeechInputClientListener.h:44 + virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); } nit: didCompleteRecording WebKit/chromium/public/WebSpeechInputClientListener.h:49 + virtual void setRecognitionResult(const WebString&) { WEBKIT_ASSERT_NOT_REACHED(); } didCompleteRecognition? WebKit/chromium/public/WebSpeechInputClient.h:45 + virtual bool attachListener(WebSpeechInputClientListener*) { WEBKIT_ASSERT_NOT_REACHED(); } why do we need multiple listeners? WebKit/chromium/public/WebSpeechInputClient.h:57 + virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); } why do we need the recordingComplete (didCompleteRecording) event if there is an explicit stopRecording method? are there other ways that a recording could stop such that WebKit would require the notification that recording stopped? WebKit/chromium/public/WebSpeechInputClient.h:53 + virtual bool startRecognition() { WEBKIT_ASSERT_NOT_REACHED(); } nit: might be nice to put the startRecognition / cancelRecognition methods next to each other since they seem related. WebKit/chromium/public/WebSpeechInputClient.h:57 + virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); } should there be some way to start recording again? WebKit/chromium/public/WebSpeechInputClientListener.h:44 + virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); } since these methods are implemented by WebKit, they should be pure virtual. what if there are multiple pages independently listening for speech input? given these interfaces, it seems like one of the pages could cancel the speech recognition process, hampering the efforts of the other page. or, is that dealt with at a lower level within WebCore?
Satish Sampath
Comment 8 2010-07-16 11:40:01 PDT
(In reply to comment #7) Thanks for the comments, some questions/replies below. > WebKit/chromium/public/WebViewClient.h:340 > + virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; } > instead of introducing a new interface here, just add the methods > from WebSpeechInputClient to WebViewClient. > > Then rename WebSpeechInputClientListener to WebSpeechInputListener. I see other features such as Geolocation have an interface (WebKit::WebGeolocationService) and implemented in the chromium render process as a separate dispatcher-like class which seemed clean. Just double checking with you if we are moving away from that model.. > WebKit/chromium/public/WebSpeechInputClientListener.h:49 > + virtual void setRecognitionResult(const WebString&) { WEBKIT_ASSERT_NOT_REACHED(); } > didCompleteRecognition? This method can potentially get called multiple times, if there are partial results available as the user keeps speaking. The current name may be more suitable for such cases, is it ok to leave it as is? > WebKit/chromium/public/WebSpeechInputClient.h:45 > + virtual bool attachListener(WebSpeechInputClientListener*) { WEBKIT_ASSERT_NOT_REACHED(); } > why do we need multiple listeners? We don't need, there should be a 1:1 mapping between the client and listener. But since the client is fetched from the embedder on creation of WebViewImpl, I was not sure how this attach can be done before that. Any suggestions for making this better? > WebKit/chromium/public/WebSpeechInputClient.h:57 > + virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); } > why do we need the recordingComplete (didCompleteRecording) event if > there is an explicit stopRecording method? are there other ways that > a recording could stop such that WebKit would require the notification > that recording stopped? stopRecording() is an optional call, without that the browser's speech recording 'endpointer' will detect silence in the input and stop recording automatically once the user stops speaking. stopRecording() is there to let users explicitly click the speech/mic button again in case they are not familiar with the workings or for better feature discoverability/usability. recordingComplete() will be issued in both these cases. The speech element in WebKit explicitly needs to know when recording stops so it can update the UI and indicate that recognition is in progress (which can take a while if done via a speech recognition server). > WebKit/chromium/public/WebSpeechInputClient.h:57 > + virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); } > should there be some way to start recording again? startRecognition() does that. This method is named stopRecording() and not stopRecognition() because the audio recorded so far is still recognized and the result returned to the input element. > WebKit/chromium/public/WebSpeechInputClientListener.h:44 > + virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); } > since these methods are implemented by WebKit, they should be pure virtual. I got comments from Marcus (above) to change from pure virtual to this model as he felt this was the new hotness, I'm happy to move back to pure virtual if you think that is the way to go. > what if there are multiple pages independently listening for speech > input? given these interfaces, it seems like one of the pages could > cancel the speech recognition process, hampering the efforts of the > other page. or, is that dealt with at a lower level within WebCore? In my understanding each WebViewImpl manages only a single page, but each page can have multiple speech enabled input elements. The code so far handles the multiple input element case by allowing only one input element to record at a time. Speech input code in the browser layer should handle the multiple page case, which is not done yet.
Darin Fisher (:fishd, Google)
Comment 9 2010-07-21 09:39:36 PDT
(In reply to comment #8) > (In reply to comment #7) > > Thanks for the comments, some questions/replies below. > > > WebKit/chromium/public/WebViewClient.h:340 > > + virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; } > > instead of introducing a new interface here, just add the methods > > from WebSpeechInputClient to WebViewClient. > > > > Then rename WebSpeechInputClientListener to WebSpeechInputListener. > > I see other features such as Geolocation have an interface (WebKit::WebGeolocationService) > and implemented in the chromium render process as a separate dispatcher-like class which > seemed clean. Just double checking with you if we are moving away from that model.. Yes, that's because WebGeolocationService has a lot of methods. I'm OK creating a separate interface for speech input if you think it is going to make things cleaner. I would just not call it WebSpeechInputClient. That name suggests that it is the "client" to a WebSpeechInput interface, which does not appear to exist. Instead, I would rename WebSpeechInputClient to WebSpeechInputService. Then I would rename WebSpeechInputClientListener to just WebSpeechInputListener. i.e., the speech input service can have listeners. > > WebKit/chromium/public/WebSpeechInputClientListener.h:49 > > + virtual void setRecognitionResult(const WebString&) { WEBKIT_ASSERT_NOT_REACHED(); } > > didCompleteRecognition? > > This method can potentially get called multiple times, if there are partial results > available as the user keeps speaking. The current name may be more suitable for such > cases, is it ok to leave it as is? OK, but please add comments explaining all of this. I found these interfaces to be very confusing when I first read them. > > WebKit/chromium/public/WebSpeechInputClient.h:45 > > + virtual bool attachListener(WebSpeechInputClientListener*) { WEBKIT_ASSERT_NOT_REACHED(); } > > why do we need multiple listeners? > > We don't need, there should be a 1:1 mapping between the client and listener. > But since the client is fetched from the embedder on creation of WebViewImpl, > I was not sure how this attach can be done before that. Any suggestions for > making this better? You should pass the WebSpeechInputListener to the method that returns the WebSpeechInputService. It becomes like a "constructor parameter" in that case. > > WebKit/chromium/public/WebSpeechInputClient.h:57 > > + virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); } > > why do we need the recordingComplete (didCompleteRecording) event if > > there is an explicit stopRecording method? are there other ways that > > a recording could stop such that WebKit would require the notification > > that recording stopped? > > stopRecording() is an optional call, without that the browser's speech recording > 'endpointer' will detect silence in the input and stop recording automatically > once the user stops speaking. stopRecording() is there to let users explicitly > click the speech/mic button again in case they are not familiar with the workings > or for better feature discoverability/usability. > recordingComplete() will be issued in both these cases. The speech element in > WebKit explicitly needs to know when recording stops so it can update the UI and > indicate that recognition is in progress (which can take a while if done via a > speech recognition server). OK, please explain this in the comments. > > WebKit/chromium/public/WebSpeechInputClient.h:57 > > + virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); } > > should there be some way to start recording again? > > startRecognition() does that. This method is named stopRecording() and not > stopRecognition() because the audio recorded so far is still recognized and > the result returned to the input element. OK, please explain this in the comments. > > WebKit/chromium/public/WebSpeechInputClientListener.h:44 > > + virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); } > > since these methods are implemented by WebKit, they should be pure virtual. > > I got comments from Marcus (above) to change from pure virtual to this model > as he felt this was the new hotness, I'm happy to move back to pure virtual > if you think that is the way to go. The rule is very simple: use pure virtual for methods implemented by WebKit, but provide default implementations of methods implemented by the embedder. Please share this rule :) The point here is that we would prefer to always use pure virtual methods as it allows the compiler to catch cases in which we forget to implement a method. However, to make it easier to roll new versions of webkit into chromium, we provide default implementations for methods that chromium may need to implement. that way if the chromium side hasn't yet implemented the new method, things will still compile (although they might not function perfectly). > > what if there are multiple pages independently listening for speech > > input? given these interfaces, it seems like one of the pages could > > cancel the speech recognition process, hampering the efforts of the > > other page. or, is that dealt with at a lower level within WebCore? > > In my understanding each WebViewImpl manages only a single page, but each > page can have multiple speech enabled input elements. The code so far > handles the multiple input element case by allowing only one input element > to record at a time. Speech input code in the browser layer should handle > the multiple page case, which is not done yet. Makes sense.
Darin Fisher (:fishd, Google)
Comment 10 2010-07-21 09:43:03 PDT
(In reply to comment #9) > Instead, I would rename WebSpeechInputClient to WebSpeechInputService. Then > I would rename WebSpeechInputClientListener to just WebSpeechInputListener. Actually, even better: Instead of WebSpeechInputService, how about if we call it WebSpeechInputController? Since there is one of these per WebView, it seems wrong to add the "Service" suffix, and "Controller" seems like a more accurate description of its responsibility.
Satish Sampath
Comment 11 2010-07-21 13:31:42 PDT
Created attachment 62225 [details] Patch Thanks Darin. This latest patch addresses all your comments, renamed interfaces and methods, added comments.. Please take another look.
Darin Fisher (:fishd, Google)
Comment 12 2010-07-21 13:40:04 PDT
Comment on attachment 62225 [details] Patch I think you should similarly rename the WebCore classes, but perhaps that can be done in a separate patch. Looking at the WebCore level, you have several classes (in WebCore/page/): SpeechInput SpeechInputListener SpeechInputClient SpeechInputClientListener Why so many classes? SpeechInputListener and SpeechInputClientListener seem to be identical!
Satish Sampath
Comment 13 2010-07-21 14:06:04 PDT
The original idea was to have more functionality in the SpeechInput class and expose a subset to SpeechInputListener, but it has turned out to be a simple proxy now. In a subsequent patch I was planning to remove SpeechInput and SpeechInputListener , rename SpeechInputClientListener->SpeechInputListener and get InputFieldSpeechButtonElement implement that directly. Is that ok?
Satish Sampath
Comment 14 2010-07-23 09:18:59 PDT
Created attachment 62432 [details] Patch Removed unused headers and added a missing return statement.
Darin Fisher (:fishd, Google)
Comment 15 2010-07-23 10:08:11 PDT
Comment on attachment 62432 [details] Patch WebKit/chromium/public/WebViewClient.h:341 + virtual WebSpeechInputController* createSpeechInputController( this method is named create*, yet it returns an interface that defines a protected virtual destructor, so i have no way to destroy the thing that was created for me. perhaps you don't intend to allocate a new WebSpeechInputController each time this method is called? if that is the case, then this method should just have the create prefix dropped. it should just be named "speechInputController". cleaning up the webcore classes in a separate patch is completely fine. my concern with this patch is that the interfaces should match up well with what we want the webcore classes to look like. i'm happy if that's the plan. R=me with createSpeechInputController renamed to speechInputController.
Satish Sampath
Comment 16 2010-07-23 11:38:28 PDT
Created attachment 62451 [details] Patch Removed 'create' from method name as suggested. Could you also please give me a commit-queue+ since I am not a committer yet?
Steve Block
Comment 17 2010-07-26 02:28:15 PDT
Comment on attachment 62451 [details] Patch r=me, given Darin's earlier review
WebKit Commit Bot
Comment 18 2010-07-26 03:32:27 PDT
Comment on attachment 62451 [details] Patch Rejecting patch 62451 from commit-queue. Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--exit-after-n-failures=1', '--ignore-tests', 'compositing', '--quiet']" exit_code: 1 Running build-dumprendertree Compiling Java tests make: Nothing to be done for `default'. Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests Testing 20718 test cases. fast/loader/recursive-before-unload-crash.html -> failed Exiting early after 1 failures. 14202 tests run. 214.25s total testing time 14201 test cases (99%) succeeded 1 test case (<1%) had incorrect layout 1 test case (<1%) had stderr output Full output: http://queues.webkit.org/results/3585565
Steve Block
Comment 19 2010-07-26 04:08:56 PDT
Comment on attachment 62451 [details] Patch Observed LayoutTest crash is unrelated to this patch, trying again.
WebKit Commit Bot
Comment 20 2010-07-26 04:53:02 PDT
Comment on attachment 62451 [details] Patch Clearing flags on attachment: 62451 Committed r64042: <http://trac.webkit.org/changeset/64042>
WebKit Commit Bot
Comment 21 2010-07-26 04:53:08 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.