Summary: | Speech input plumbing in webkit | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Satish Sampath <satish> | ||||||||||||
Component: | Forms | Assignee: | Nobody <webkit-unassigned> | ||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||
Severity: | Normal | CC: | bulach, commit-queue, dglazkov, fishd, jorlow, webkit.review.bot | ||||||||||||
Priority: | P2 | ||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||
Hardware: | All | ||||||||||||||
OS: | All | ||||||||||||||
Bug Depends on: | |||||||||||||||
Bug Blocks: | 39485 | ||||||||||||||
Attachments: |
|
Description
Satish Sampath
2010-07-15 06:43:54 PDT
Created attachment 61666 [details]
Patch
Attachment 61666 [details] did not build on chromium: Build output: http://webkit-commit-queue.appspot.com/results/3558030 Darin, can you please review the WebKit part? Marcus, can you please review the entire thing? Doh, didn't add the new files to chromium/WebKit.gyp (since I tested on windows in local machine and added manually to the .vcproj). I'll add them along with the review comments once they come in. (In reply to comment #4) > Doh, didn't add the new files to chromium/WebKit.gyp (since I tested on windows in local machine and added manually to the .vcproj). I'll add them along with the review comments once they come in. Hi Satish, I can't do normative reviews yet, so here's some drive-by comments: 39 class WebSpeechInputClient { 40 public: 41 // Attaches a listener which receives future callbacks. 42 // Returns false if a listener was already attached. 43 virtual bool attachListener(WebSpeechInputClientListener* listener) = 0; I think the style is now to avoid pure virtuals and have: WEBKIT_ASSERT_NOT_REACHED(); see: http://trac.webkit.org/browser/trunk/WebKit/chromium/public/WebIDBCallbacks.h Also, no need to name the parameter when it's obvious from the type (please, check all other headers). 39 class WebSpeechInputClientListener { As above. 47 class SpeechInputClientImpl both file / classname should be prefixed with "Web", since they're in WebKit namespace. 43 SpeechInputClientImpl::SpeechInputClientImpl(WebSpeechInputClient* client) use initializer for m_client, and also zero-initialize m_listener! 64 WebCore::SpeechInputClientListener* m_listener; // Valid when recognition is in progress. This should probably be a WebPrivatePtr: http://trac.webkit.org/browser/trunk/WebKit/chromium/public/WebSerializedScriptValue.h 71 void SpeechInputClientImpl::recordingComplete() 72 { 73 ASSERT(m_listener); 74 if (m_listener) no need for the if since the assert will terminate. 78 void SpeechInputClientImpl::setRecognitionResult(const WebString& result) 79 { 80 ASSERT(m_listener); 81 if (m_listener) as above. 67 } // namespace WebCore 68 69 #endif // ENABLE(INPUT_SPEECH) this is closing WebKit, not WebCore namespace. Created attachment 61809 [details]
Patch
Addressed all of Marcus's comments, except changing the name of SpeechInputClientImpl to WebSpeechInputClientImpl. It looks to me that the 'Web' prefix is used primarily for interfaces/classes exposed from WebKit via the headers in chromium/public directory. Majority of classes in chromium/src doesn't have the 'Web' prefix and most of those which do have the prefix are implementations of classes exposed in the chromium/public directory. I'm happy to change the name if my understanding is incorrect and there is a strong reason to have the 'Web' prefix even for classes internal to WebKit.
Comment on attachment 61809 [details]
Patch
WebKit/chromium/public/WebViewClient.h:340
+ virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; }
indentation is wrong. no need for WebKit:: prefix
WebKit/chromium/public/WebViewClient.h:340
+ virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; }
instead of introducing a new interface here, just add the methods
from WebSpeechInputClient to WebViewClient.
Then rename WebSpeechInputClientListener to WebSpeechInputListener.
WebKit/chromium/public/WebSpeechInputClientListener.h:44
+ virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); }
nit: didCompleteRecording
WebKit/chromium/public/WebSpeechInputClientListener.h:49
+ virtual void setRecognitionResult(const WebString&) { WEBKIT_ASSERT_NOT_REACHED(); }
didCompleteRecognition?
WebKit/chromium/public/WebSpeechInputClient.h:45
+ virtual bool attachListener(WebSpeechInputClientListener*) { WEBKIT_ASSERT_NOT_REACHED(); }
why do we need multiple listeners?
WebKit/chromium/public/WebSpeechInputClient.h:57
+ virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); }
why do we need the recordingComplete (didCompleteRecording) event if
there is an explicit stopRecording method? are there other ways that
a recording could stop such that WebKit would require the notification
that recording stopped?
WebKit/chromium/public/WebSpeechInputClient.h:53
+ virtual bool startRecognition() { WEBKIT_ASSERT_NOT_REACHED(); }
nit: might be nice to put the startRecognition / cancelRecognition methods
next to each other since they seem related.
WebKit/chromium/public/WebSpeechInputClient.h:57
+ virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); }
should there be some way to start recording again?
WebKit/chromium/public/WebSpeechInputClientListener.h:44
+ virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); }
since these methods are implemented by WebKit, they should be pure virtual.
what if there are multiple pages independently listening for speech
input? given these interfaces, it seems like one of the pages could
cancel the speech recognition process, hampering the efforts of the
other page. or, is that dealt with at a lower level within WebCore?
(In reply to comment #7) Thanks for the comments, some questions/replies below. > WebKit/chromium/public/WebViewClient.h:340 > + virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; } > instead of introducing a new interface here, just add the methods > from WebSpeechInputClient to WebViewClient. > > Then rename WebSpeechInputClientListener to WebSpeechInputListener. I see other features such as Geolocation have an interface (WebKit::WebGeolocationService) and implemented in the chromium render process as a separate dispatcher-like class which seemed clean. Just double checking with you if we are moving away from that model.. > WebKit/chromium/public/WebSpeechInputClientListener.h:49 > + virtual void setRecognitionResult(const WebString&) { WEBKIT_ASSERT_NOT_REACHED(); } > didCompleteRecognition? This method can potentially get called multiple times, if there are partial results available as the user keeps speaking. The current name may be more suitable for such cases, is it ok to leave it as is? > WebKit/chromium/public/WebSpeechInputClient.h:45 > + virtual bool attachListener(WebSpeechInputClientListener*) { WEBKIT_ASSERT_NOT_REACHED(); } > why do we need multiple listeners? We don't need, there should be a 1:1 mapping between the client and listener. But since the client is fetched from the embedder on creation of WebViewImpl, I was not sure how this attach can be done before that. Any suggestions for making this better? > WebKit/chromium/public/WebSpeechInputClient.h:57 > + virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); } > why do we need the recordingComplete (didCompleteRecording) event if > there is an explicit stopRecording method? are there other ways that > a recording could stop such that WebKit would require the notification > that recording stopped? stopRecording() is an optional call, without that the browser's speech recording 'endpointer' will detect silence in the input and stop recording automatically once the user stops speaking. stopRecording() is there to let users explicitly click the speech/mic button again in case they are not familiar with the workings or for better feature discoverability/usability. recordingComplete() will be issued in both these cases. The speech element in WebKit explicitly needs to know when recording stops so it can update the UI and indicate that recognition is in progress (which can take a while if done via a speech recognition server). > WebKit/chromium/public/WebSpeechInputClient.h:57 > + virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); } > should there be some way to start recording again? startRecognition() does that. This method is named stopRecording() and not stopRecognition() because the audio recorded so far is still recognized and the result returned to the input element. > WebKit/chromium/public/WebSpeechInputClientListener.h:44 > + virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); } > since these methods are implemented by WebKit, they should be pure virtual. I got comments from Marcus (above) to change from pure virtual to this model as he felt this was the new hotness, I'm happy to move back to pure virtual if you think that is the way to go. > what if there are multiple pages independently listening for speech > input? given these interfaces, it seems like one of the pages could > cancel the speech recognition process, hampering the efforts of the > other page. or, is that dealt with at a lower level within WebCore? In my understanding each WebViewImpl manages only a single page, but each page can have multiple speech enabled input elements. The code so far handles the multiple input element case by allowing only one input element to record at a time. Speech input code in the browser layer should handle the multiple page case, which is not done yet. (In reply to comment #8) > (In reply to comment #7) > > Thanks for the comments, some questions/replies below. > > > WebKit/chromium/public/WebViewClient.h:340 > > + virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; } > > instead of introducing a new interface here, just add the methods > > from WebSpeechInputClient to WebViewClient. > > > > Then rename WebSpeechInputClientListener to WebSpeechInputListener. > > I see other features such as Geolocation have an interface (WebKit::WebGeolocationService) > and implemented in the chromium render process as a separate dispatcher-like class which > seemed clean. Just double checking with you if we are moving away from that model.. Yes, that's because WebGeolocationService has a lot of methods. I'm OK creating a separate interface for speech input if you think it is going to make things cleaner. I would just not call it WebSpeechInputClient. That name suggests that it is the "client" to a WebSpeechInput interface, which does not appear to exist. Instead, I would rename WebSpeechInputClient to WebSpeechInputService. Then I would rename WebSpeechInputClientListener to just WebSpeechInputListener. i.e., the speech input service can have listeners. > > WebKit/chromium/public/WebSpeechInputClientListener.h:49 > > + virtual void setRecognitionResult(const WebString&) { WEBKIT_ASSERT_NOT_REACHED(); } > > didCompleteRecognition? > > This method can potentially get called multiple times, if there are partial results > available as the user keeps speaking. The current name may be more suitable for such > cases, is it ok to leave it as is? OK, but please add comments explaining all of this. I found these interfaces to be very confusing when I first read them. > > WebKit/chromium/public/WebSpeechInputClient.h:45 > > + virtual bool attachListener(WebSpeechInputClientListener*) { WEBKIT_ASSERT_NOT_REACHED(); } > > why do we need multiple listeners? > > We don't need, there should be a 1:1 mapping between the client and listener. > But since the client is fetched from the embedder on creation of WebViewImpl, > I was not sure how this attach can be done before that. Any suggestions for > making this better? You should pass the WebSpeechInputListener to the method that returns the WebSpeechInputService. It becomes like a "constructor parameter" in that case. > > WebKit/chromium/public/WebSpeechInputClient.h:57 > > + virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); } > > why do we need the recordingComplete (didCompleteRecording) event if > > there is an explicit stopRecording method? are there other ways that > > a recording could stop such that WebKit would require the notification > > that recording stopped? > > stopRecording() is an optional call, without that the browser's speech recording > 'endpointer' will detect silence in the input and stop recording automatically > once the user stops speaking. stopRecording() is there to let users explicitly > click the speech/mic button again in case they are not familiar with the workings > or for better feature discoverability/usability. > recordingComplete() will be issued in both these cases. The speech element in > WebKit explicitly needs to know when recording stops so it can update the UI and > indicate that recognition is in progress (which can take a while if done via a > speech recognition server). OK, please explain this in the comments. > > WebKit/chromium/public/WebSpeechInputClient.h:57 > > + virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); } > > should there be some way to start recording again? > > startRecognition() does that. This method is named stopRecording() and not > stopRecognition() because the audio recorded so far is still recognized and > the result returned to the input element. OK, please explain this in the comments. > > WebKit/chromium/public/WebSpeechInputClientListener.h:44 > > + virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); } > > since these methods are implemented by WebKit, they should be pure virtual. > > I got comments from Marcus (above) to change from pure virtual to this model > as he felt this was the new hotness, I'm happy to move back to pure virtual > if you think that is the way to go. The rule is very simple: use pure virtual for methods implemented by WebKit, but provide default implementations of methods implemented by the embedder. Please share this rule :) The point here is that we would prefer to always use pure virtual methods as it allows the compiler to catch cases in which we forget to implement a method. However, to make it easier to roll new versions of webkit into chromium, we provide default implementations for methods that chromium may need to implement. that way if the chromium side hasn't yet implemented the new method, things will still compile (although they might not function perfectly). > > what if there are multiple pages independently listening for speech > > input? given these interfaces, it seems like one of the pages could > > cancel the speech recognition process, hampering the efforts of the > > other page. or, is that dealt with at a lower level within WebCore? > > In my understanding each WebViewImpl manages only a single page, but each > page can have multiple speech enabled input elements. The code so far > handles the multiple input element case by allowing only one input element > to record at a time. Speech input code in the browser layer should handle > the multiple page case, which is not done yet. Makes sense. (In reply to comment #9) > Instead, I would rename WebSpeechInputClient to WebSpeechInputService. Then > I would rename WebSpeechInputClientListener to just WebSpeechInputListener. Actually, even better: Instead of WebSpeechInputService, how about if we call it WebSpeechInputController? Since there is one of these per WebView, it seems wrong to add the "Service" suffix, and "Controller" seems like a more accurate description of its responsibility. Created attachment 62225 [details]
Patch
Thanks Darin. This latest patch addresses all your comments, renamed interfaces and methods, added comments.. Please take another look.
Comment on attachment 62225 [details]
Patch
I think you should similarly rename the WebCore classes, but perhaps that can be
done in a separate patch.
Looking at the WebCore level, you have several classes (in WebCore/page/):
SpeechInput
SpeechInputListener
SpeechInputClient
SpeechInputClientListener
Why so many classes? SpeechInputListener and SpeechInputClientListener seem
to be identical!
The original idea was to have more functionality in the SpeechInput class and expose a subset to SpeechInputListener, but it has turned out to be a simple proxy now. In a subsequent patch I was planning to remove SpeechInput and SpeechInputListener , rename SpeechInputClientListener->SpeechInputListener and get InputFieldSpeechButtonElement implement that directly. Is that ok? Created attachment 62432 [details]
Patch
Removed unused headers and added a missing return statement.
Comment on attachment 62432 [details]
Patch
WebKit/chromium/public/WebViewClient.h:341
+ virtual WebSpeechInputController* createSpeechInputController(
this method is named create*, yet it returns an interface that defines
a protected virtual destructor, so i have no way to destroy the thing
that was created for me. perhaps you don't intend to allocate a new
WebSpeechInputController each time this method is called? if that is
the case, then this method should just have the create prefix dropped.
it should just be named "speechInputController".
cleaning up the webcore classes in a separate patch is completely fine.
my concern with this patch is that the interfaces should match up well
with what we want the webcore classes to look like. i'm happy if that's
the plan.
R=me with createSpeechInputController renamed to speechInputController.
Created attachment 62451 [details]
Patch
Removed 'create' from method name as suggested. Could you also please give me a commit-queue+ since I am not a committer yet?
Comment on attachment 62451 [details]
Patch
r=me, given Darin's earlier review
Comment on attachment 62451 [details] Patch Rejecting patch 62451 from commit-queue. Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--exit-after-n-failures=1', '--ignore-tests', 'compositing', '--quiet']" exit_code: 1 Running build-dumprendertree Compiling Java tests make: Nothing to be done for `default'. Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests Testing 20718 test cases. fast/loader/recursive-before-unload-crash.html -> failed Exiting early after 1 failures. 14202 tests run. 214.25s total testing time 14201 test cases (99%) succeeded 1 test case (<1%) had incorrect layout 1 test case (<1%) had stderr output Full output: http://queues.webkit.org/results/3585565 Comment on attachment 62451 [details]
Patch
Observed LayoutTest crash is unrelated to this patch, trying again.
Comment on attachment 62451 [details] Patch Clearing flags on attachment: 62451 Committed r64042: <http://trac.webkit.org/changeset/64042> All reviewed patches have been landed. Closing bug. |