Bug 81667 - Speech JavaScript API: Plumbing for Chromium
Summary: Speech JavaScript API: Plumbing for Chromium
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Hans Wennborg
Depends on: 83126
Blocks: 80261
Reported: 2012-03-20 08:23 PDT by Hans Wennborg
Modified: 2012-04-04 03:57 PDT
CC: 8 users

Attachments
Patch (40.57 KB, patch) - 2012-03-20 08:38 PDT, Hans Wennborg (no flags)
Patch (40.67 KB, patch) - 2012-03-21 08:10 PDT, Hans Wennborg (no flags)
Patch (42.76 KB, patch) - 2012-03-23 07:41 PDT, Hans Wennborg (no flags)
Patch (49.89 KB, patch) - 2012-03-26 07:44 PDT, Hans Wennborg (no flags)
Patch (54.64 KB, patch) - 2012-03-29 07:27 PDT, Hans Wennborg (no flags)
Patch (54.12 KB, patch) - 2012-03-30 07:11 PDT, Hans Wennborg (no flags)
Patch (60.12 KB, patch) - 2012-04-02 06:24 PDT, Hans Wennborg (no flags)
Patch (58.32 KB, patch) - 2012-04-03 05:56 PDT, Hans Wennborg (fishd: review+, commit-queue-)

Description Hans Wennborg 2012-03-20 08:23:08 PDT
Speech JavaScript API: Plumbing for Chromium
Comment 1 Hans Wennborg 2012-03-20 08:38:11 PDT
Created attachment 132832 [details]
Patch
Comment 2 WebKit Review Bot 2012-03-20 08:40:55 PDT
Please wait for approval from fishd@chromium.org before submitting because this patch contains changes to the Chromium public API.
Comment 3 Hans Wennborg 2012-03-20 08:42:10 PDT
I've uploaded this as a single patch so it's easy to see how it all fits together. The bulk of the patch is in SpeechRecognitionProxy. If desired, I could break out the other wrapper classes to a separate patch.
Comment 4 Darin Fisher (:fishd, Google) 2012-03-20 12:41:54 PDT
Comment on attachment 132832 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=132832&action=review

> Source/WebKit/chromium/public/WebSpeechGrammar.h:45
> +    WEBKIT_EXPORT WebURL src() const;

nit: src -> sourceURL ?

> Source/WebKit/chromium/public/WebSpeechRecognitionClient.h:46
> +    virtual void visibilityHidden() { WEBKIT_ASSERT_NOT_REACHED(); }

is visibilityHidden a command?  it seems like it is a notification about the visibility state changing to hidden?

> Source/WebKit/chromium/public/WebViewClient.h:320
> +    // Scripted speech -----------------------------------------------------

does scripted speech really require a different section?  can we just extend
the "Speech" section?

> Source/WebKit/chromium/public/WebViewClient.h:323
> +    virtual WebSpeechRecognitionClient* speechRecognitionClient() { return 0; }

is the caller expected to delete this pointer when they are done with it?  i'm guessing
the answer is NO since this function is not named "createFoo".  assuming the answer is
NO, then WebSpeechRecognitionClient should have a protected destructor to prevent unwanted
deletion through this pointer.

> Source/WebKit/chromium/src/SpeechRecognitionClientProxy.cpp:50
> +    if (!m_recognitionMap.contains(recognition)) {

usually we put ID maps like this on the chromium side?  does WebKit need
this map?  it seems like it might be better to create an API wrapper for
WebSpeechRecognition instead.

the WebKit API is supposed to be a thin layer between chromium and webcore.
if chromium needs IDs for IPC purposes, then that is a chromium concern,
not a webkit concern.
Comment 5 Hans Wennborg 2012-03-21 08:09:57 PDT
(In reply to comment #4)
> (From update of attachment 132832 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=132832&action=review

Thank you very much for the review!

> > Source/WebKit/chromium/public/WebSpeechGrammar.h:45
> > +    WEBKIT_EXPORT WebURL src() const;
> 
> nit: src -> sourceURL ?

It's called src() in the wrapped object, which in turn matches the IDL.
I'm happy to rename if you think sourceURL is preferable, though.

> > Source/WebKit/chromium/public/WebSpeechRecognitionClient.h:46
> > +    virtual void visibilityHidden() { WEBKIT_ASSERT_NOT_REACHED(); }
> 
> is visibilityHidden a command?  it seems like it is a notification about the visibility state changing to hidden?

You're right. Renaming to notifyVisibilityHidden(). Happy to accept better naming suggestions too :)

> > Source/WebKit/chromium/public/WebViewClient.h:320
> > +    // Scripted speech -----------------------------------------------------
> 
> does scripted speech really require a different section?  can we just extend
> the "Speech" section?

Done, putting it in the existing speech section.

> > Source/WebKit/chromium/public/WebViewClient.h:323
> > +    virtual WebSpeechRecognitionClient* speechRecognitionClient() { return 0; }
> 
> is the caller expected to delete this pointer when they are done with it?  i'm guessing
> the answer is NO since this function is not named "createFoo".  assuming the answer is
> NO, then WebSpeechRecognitionClient should have a protected destructor to prevent unwanted
> deletion through this pointer.

Nice catch! Done.
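
For reference, the pattern looks roughly like this (a minimal sketch with the method signatures elided, not the exact header):

  class WebSpeechRecognitionClient {
  public:
      virtual void start() { WEBKIT_ASSERT_NOT_REACHED(); }  // other methods elided

  protected:
      // Protected and non-virtual: WebViewClient::speechRecognitionClient()
      // returns this pointer without transferring ownership, so callers must
      // not be able to delete through it.
      ~WebSpeechRecognitionClient() { }
  };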

> > Source/WebKit/chromium/src/SpeechRecognitionClientProxy.cpp:50
> > +    if (!m_recognitionMap.contains(recognition)) {
> 
> usually we put ID maps like this on the chromium side?  does WebKit need
> this map?  it seems like it might be better to create an API wrapper for
> WebSpeechRecognition instead.
> 
> the WebKit API is supposed to be a thin layer between chromium and webcore.
> if chromium needs IDs for IPC purposes, then that is a chromium concern,
> not a webkit concern.

Yes, passing a pointer to a WebSpeechRecognition object rather than an id seems nicer, I'll do that.

I still need to keep track of the mapping between the SpeechRecognition objects and the wrappers, though. It's important that the same wrapper gets used, e.g., for a start() and a subsequent stop() call.
Comment 6 Hans Wennborg 2012-03-21 08:10:25 PDT
Created attachment 133043 [details]
Patch
Comment 7 WebKit Review Bot 2012-03-21 08:13:36 PDT
Please wait for approval from abarth@webkit.org, dglazkov@chromium.org, fishd@chromium.org, jamesr@chromium.org or tkent@chromium.org before submitting, as this patch contains changes to the Chromium public API. See also https://trac.webkit.org/wiki/ChromiumWebKitAPI.
Comment 8 Darin Fisher (:fishd, Google) 2012-03-22 10:00:27 PDT
Comment on attachment 133043 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=133043&action=review

> Source/WebKit/chromium/public/WebSpeechGrammar.h:45
> +    WEBKIT_EXPORT WebURL src() const;

I understand that this is just a reflection of a DOM attribute.  What is
the purpose of exposing this attribute?  Are you going to use it for
security decisions?  Or, will you do some resource loading externally to
WebKit?

> Source/WebKit/chromium/public/WebSpeechRecognition.h:43
> +#if WEBKIT_IMPLEMENTATION

nit: we usually put WEBKIT_IMPLEMENTATION sections at the end of the 'public' section.

> Source/WebKit/chromium/public/WebSpeechRecognition.h:47
> +    WEBKIT_EXPORT void audioStartCallback();

nit: how about giving these "event style" names?

  didStartAudio
  didStartSound
  didStartSpeech
  didEndSpeech
  didEndAudio
  ...

It is not quite clear to me how to rename resultCallback or
noMatchCallback or resultDeletedCallback.  Are those three
possible responses to something?

It would also be helpful to document some of the event flow here.
How is the embedder expected to call these methods?

My point is that I think you should do more than just cleave your
code into two pieces.  I think you should try make the API make
sense on its own.  Imagine someone is trying to hook this up to
a different embedder.  What do they need to know?  Please explain
with comments and helpful function names.

> Source/WebKit/chromium/public/WebSpeechRecognition.h:60
> +    WebCore::SpeechRecognition* m_speechRecognition;

are you sure this shouldn't be a WebPrivatePtr?  SpeechRecognition is
a reference counted class, right?

> Source/WebKit/chromium/public/WebSpeechRecognitionClient.h:40
> +    virtual void start(WebSpeechRecognition*, const WebVector<WebSpeechGrammar>&, WebString language, bool continuous) { WEBKIT_ASSERT_NOT_REACHED(); }

are you sure you want to pass WebSpeechRecognition by pointer and not by const ref?
WebSpeechRecognition could just be a smart pointer around SpeechRecognition.

nit: "language" should be passed as "const WebString&"

nit: are you sure this interface wouldn't be better named WebSpeechRecognitionController?  with methods like 'start', 'stop', and 'abort', it sure seems like this interface is more of a controller and less of a client.  A client is an interface that receives notifications, observes a system, updates UI, implements policy, etc.  A controller is an interface that takes action, drives a system, updates state, etc.

> Source/WebKit/chromium/src/SpeechRecognitionClientProxy.cpp:51
> +        m_recognitionMap.set(recognition, adoptPtr(new WebSpeechRecognition(recognition)));

I don't understand why you have a map here.  A WebSpeechRecognition should just be
a wrapper around a SpeechRecognition.  You should be able to map back-n-forth between
WebSpeechRecognition and SpeechRecognition just like we can map back-n-forth between
WebNode and Node.

To support an IDMap on the Chromium side, you can give WebSpeechRecognition an equals
and lessThan method, like we have for WebNode.  Also, you can define operator== and
operator<.  Then you can use it as a key in a map.  Would that address your concern?
Comment 9 Hans Wennborg 2012-03-23 07:40:24 PDT
(In reply to comment #8)
> (From update of attachment 133043 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=133043&action=review

Thanks for the review!

> > Source/WebKit/chromium/public/WebSpeechGrammar.h:45
> > +    WEBKIT_EXPORT WebURL src() const;
> 
> I understand that this is just a reflection of a DOM attribute.  What is
> the purpose of exposing this attribute?  Are you going to use it for
> security decisions?  Or, will you do some resource loading externally to
> WebKit?

The latter. We will pass this URL in our request to the speech server, which I suppose will then load it.

> > Source/WebKit/chromium/public/WebSpeechRecognition.h:43
> > +#if WEBKIT_IMPLEMENTATION
> 
> nit: we usually put WEBKIT_IMPLEMENTATION sections at the end of the 'public' section.

Done.

> > Source/WebKit/chromium/public/WebSpeechRecognition.h:47
> > +    WEBKIT_EXPORT void audioStartCallback();
> 
> nit: how about giving these "event style" names?
> 
>   didStartAudio
>   didStartSound
>   didStartSpeech
>   didEndSpeech
>   didEndAudio
>   ...
> 
> It is not quite clear to me how to rename resultCallback or
> noMatchCallback or resultDeletedCallback.  Are those three
> possible responses to something?

Yes, those three are called as results are streamed back from the speech server. (As per your next comment, I'll try to document this better.)

The flow is that the user calls start() on the SpeechRecognitionClient, which gets forwarded via the proxy to the embedder. The embedder then calls back via the WebSpeechRecognition interface: audioStartCallback(), speechStartCallback(), some resultCallback()s, etc. The callback methods correspond exactly to the "startaudio", "result", etc. events from the spec: http://speech-javascript-api-spec.googlecode.com/git/speechapi.html#speechreco-events

I agree that the function names aren't great. "didStartAudio", etc. sound better, but I don't know how "resultCallback" would be renamed to fit with that.
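
To spell out the flow with the did*-style names (a rough sketch; the names for the result callbacks are still open):

  // WebKit -> embedder:
  //   start(...)
  //
  // embedder -> WebKit, roughly in this order (mirroring the spec's events):
  //   didStartAudio()    - "audiostart": audio capture has begun
  //   didStartSound()    - "soundstart": some sound, possibly speech, detected
  //   didStartSpeech()   - "speechstart": speech detected
  //   (result callbacks) - "result" / "nomatch" / "resultdeleted", zero or
  //                        more, as results stream back from the server
  //   didEndSpeech()     - "speechend"
  //   didEndSound()      - "soundend"
  //   didEndAudio()      - "audioend"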

> It would also be helpful to document some of the event flow here.
> How is the embedder expected to call these methods?
> 
> My point is that I think you should do more than just cleave your
> code into two pieces.  I think you should try make the API make
> sense on its own.  Imagine someone is trying to hook this up to
> a different embedder.  What do they need to know?  Please explain
> with comments and helpful function names.

I've tried to add comments to make it more clear.

> > Source/WebKit/chromium/public/WebSpeechRecognition.h:60
> > +    WebCore::SpeechRecognition* m_speechRecognition;
> 
> are you sure this shouldn't be a WebPrivatePtr?  SpeechRecognition is
> a reference counted class, right?

Done.
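
The shape follows other WebPrivatePtr-based classes such as WebNode (a sketch assuming the usual conventions; reset() is defined in src/, where SpeechRecognition is a complete type):

  class WebSpeechRecognition {
  public:
      ~WebSpeechRecognition() { reset(); }

      WEBKIT_EXPORT void reset();  // drops the reference to the WebCore object

  #if WEBKIT_IMPLEMENTATION
      WebSpeechRecognition(const WTF::PassRefPtr<WebCore::SpeechRecognition>&);
  #endif

  private:
      WebPrivatePtr<WebCore::SpeechRecognition> m_private;
  };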

> > Source/WebKit/chromium/public/WebSpeechRecognitionClient.h:40
> > +    virtual void start(WebSpeechRecognition*, const WebVector<WebSpeechGrammar>&, WebString language, bool continuous) { WEBKIT_ASSERT_NOT_REACHED(); }
> 
> are you sure you want to pass WebSpeechRecognition by pointer and not by const ref?
> WebSpeechRecognition could just be a smart pointer around SpeechRecognition.

I was worried about how I would insert the WebRecognition objects in an IDMap Chromium-side, which is why I passed a pointer. As you pointed out below, this can be fixed, so I've changed the code to pass it by const ref.

> nit: "language" should be passed as "const WebString&"

Done.

> nit: are you sure this interface wouldn't be better named WebSpeechRecognitionController?  with methods like 'start', 'stop', and 'abort', it sure seems like this interface is more of a controller and less of a client.  A client is an interface that receives notifications, observes a system, updates UI, implements policy, etc.  A controller is an interface that takes action, drives a system, updates state, etc.

Hmm, that does sound sensible. The problem is we already have a Controller class. In WebCore, there is both a SpeechRecognitionController object and the SpeechRecognitionClient interface. This was done to squeeze the design into the Supplement pattern, similarly to UserMediaController and UserMediaClient (from http://trac.webkit.org/changeset/108437).

> > Source/WebKit/chromium/src/SpeechRecognitionClientProxy.cpp:51
> > +        m_recognitionMap.set(recognition, adoptPtr(new WebSpeechRecognition(recognition)));
> 
> I don't understand why you have a map here.  A WebSpeechRecognition should just be
> a wrapper around a SpeechRecognition.  You should be able to map back-n-forth between
> WebSpeechRecognition and SpeechRecognition just like we can map back-n-forth between
> WebNode and Node.
> 
> To support an IDMap on the Chromium side, you can give WebSpeechRecognition an equals
> and lessThan method, like we have for WebNode.  Also, you can define operator== and
> operator<.  Then you can use it as a key in a map.  Would that address your concern?

Yes, that addresses my concerns and makes everything much nicer :) Thanks for helping me get this straight.
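
Following the WebNode precedent, that would look something like this (sketch; the implementations just compare the wrapped pointers):

  class WebSpeechRecognition {
  public:
      WEBKIT_EXPORT bool equals(const WebSpeechRecognition&) const;
      WEBKIT_EXPORT bool lessThan(const WebSpeechRecognition&) const;
      // rest of the class elided
  };

  inline bool operator==(const WebSpeechRecognition& a, const WebSpeechRecognition& b)
  {
      return a.equals(b);
  }

  inline bool operator<(const WebSpeechRecognition& a, const WebSpeechRecognition& b)
  {
      return a.lessThan(b);
  }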
Comment 10 Hans Wennborg 2012-03-23 07:41:01 PDT
Created attachment 133484 [details]
Patch
Comment 11 Satish Sampath 2012-03-23 07:50:06 PDT
Comment on attachment 133043 [details]
Patch

> I agree that the function names aren't great. "didStartAudio", etc. sound better, but I don't know how "resultCallback" would be renamed to fit with that.

In this case didReceiveResult, didReceiveNoMatch, didDeletePreviousResult and didReceiveError sound good to me.
I think there would be a bit of confusion between didStart, didStartAudio, didStartSound and didStartSpeech for a first-time viewer, so documenting them with a call flow and a URL to the spec would be useful.
Comment 12 Darin Fisher (:fishd, Google) 2012-03-23 17:32:58 PDT
(In reply to comment #11)
> (From update of attachment 133043 [details])
> > I agree that the function names aren't great. "didStartAudio", etc. sound better, but I don't know how "resultCallback" would be renamed to fit with that.
> 
> In this case didReceiveResult, didReceiveNoMatch, didDeletePreviousResult and didReceiveError sound good to me.
> I think there would be a bit of confusion between didStart, didStartAudio, didStartSound and didStartSpeech for a first-time viewer, so documenting them with a call flow and a URL to the spec would be useful.

^^^ I like Satish's suggestions.
Comment 13 Hans Wennborg 2012-03-26 07:44:12 PDT
(In reply to comment #11)
> (From update of attachment 133043 [details])
> > I agree that the function names aren't great. "didStartAudio", etc. sound better, but I don't know how "resultCallback" would be renamed to fit with that.
> 
> In this case didReceiveResult, didReceiveNoMatch, didDeletePreviousResult and didReceiveError sound good to me.
Done.

> I think there would be a bit of confusion between didStart, didStartAudio, didStartSound and didStartSpeech for a first-time viewer, so documenting them with a call flow and a URL to the spec would be useful.
Done.


(In reply to comment #12)
> ^^^ I like Satish's suggestions.
Done :)
Comment 14 Hans Wennborg 2012-03-26 07:44:39 PDT
Created attachment 133811 [details]
Patch
Comment 15 Darin Fisher (:fishd, Google) 2012-03-28 10:20:12 PDT
Comment on attachment 133811 [details]
Patch

Sorry for the high latency in getting feedback to you.  I think there are some
pretty considerable design problems in this API that should be resolved.


View in context: https://bugs.webkit.org/attachment.cgi?id=133811&action=review

> Source/WebKit/chromium/public/WebSpeechRecognition.h:42
> +// Wrapper around a SpeechRecognition pointer.

nit: this comment is not very helpful to WebKit API users.  the point of the
API is to hide WebCore details.  it would be better to have a comment that
explains what a WebSpeechRecognition object is.  it seems to be an object that
represents a "notification sink" for events related to a speech recognition request.
i think the class is probably poorly named since its name doesn't really match up
with its function.

> Source/WebKit/chromium/public/WebSpeechRecognition.h:107
> +    WEBKIT_EXPORT void didReceiveError(const WebString& msg, unsigned short code);

nit: spell it out... msg -> message

> Source/WebKit/chromium/public/WebSpeechRecognitionClient.h:38
> +class WebSpeechRecognitionClient {

i'm still not sure why this is named with a Client suffix.  its methods are clearly
about controlling something...

> Source/WebKit/chromium/public/WebSpeechRecognitionClient.h:49
> +    virtual void notifyVisibilityHidden() { WEBKIT_ASSERT_NOT_REACHED(); }

i don't understand why this method exists.  the embedder already knows when the
page is hidden.  after all, it is the embedder who communicates that to WebKit
via WebView::setVisibilityState.  Why does the embedder need WebKit to tell it
what it already knows?

perhaps you are just trying to add a method to tell the embedder to abort any
ongoing speech recognition?  in that case, the method name should be specific
to that.
Comment 16 Hans Wennborg 2012-03-29 07:26:54 PDT
(In reply to comment #15)
> (From update of attachment 133811 [details])
> Sorry for the high latency in getting feedback to you.  I think there are some
> pretty considerable design problems in this API that should be resolved.

Thanks for the comments. I've tried to rework the patch so that the API makes sense on its own, rather than just wrapping WebCore objects and taking names from them.

The two most interesting classes in the API are now:

 - WebSpeechRecognizer: the interface used for starting and stopping recognition,
   which should be implemented by the embedder. (This could be renamed to
   WebSpeechRecognitionController if you prefer.)

 - WebSpeechRecognitionClient: a client that is passed as an argument to the
   start, stop, and abort functions; used for reporting on progress.

Hopefully this makes things more clear.


> View in context: https://bugs.webkit.org/attachment.cgi?id=133811&action=review
> 
> > Source/WebKit/chromium/public/WebSpeechRecognition.h:42
> > +// Wrapper around a SpeechRecognition pointer.
> 
> nit: this comment is not very helpful to WebKit API users.  the point of the
> API is to hide WebCore details.  it would be better to have a comment that
> explains what a WebSpeechRecognition object is.  it seems to be an object that
> represents a "notification sink" for events related to a speech recognition request.
> i think the class is probably poorly named since its name doesn't really match up
> with its function.
This class has been renamed to WebSpeechRecognitionClient.
 
> > Source/WebKit/chromium/public/WebSpeechRecognition.h:107
> > +    WEBKIT_EXPORT void didReceiveError(const WebString& msg, unsigned short code);
> 
> nit: spell it out... msg -> message
Done.

> > Source/WebKit/chromium/public/WebSpeechRecognitionClient.h:38
> > +class WebSpeechRecognitionClient {
> 
> i'm still not sure why this is named with a Client suffix.  its methods are clearly
> about controlling something...
It's now WebSpeechRecognizer.

> > Source/WebKit/chromium/public/WebSpeechRecognitionClient.h:49
> > +    virtual void notifyVisibilityHidden() { WEBKIT_ASSERT_NOT_REACHED(); }
> 
> i don't understand why this method exists.  the embedder already knows when the
> page is hidden.  after all, it is the embedder who communicates that to WebKit
> via WebView::setVisibilityState.  Why does the embedder need WebKit to tell it
> what it already knows?
You're right, this can be handled in the embedder. Removing the function.
Comment 17 Hans Wennborg 2012-03-29 07:27:24 PDT
Created attachment 134578 [details]
Patch
Comment 18 Darin Fisher (:fishd, Google) 2012-03-29 11:42:11 PDT
Comment on attachment 134578 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=134578&action=review

> Source/WebKit/chromium/public/WebSpeechRecognitionClient.h:60
> +    WEBKIT_EXPORT bool equals(const WebSpeechRecognitionClient&) const;

I think it probably makes more sense for this to be an interface and for the
recognizer to be instantiated each time you need to start a speech recognition
task.

> Source/WebKit/chromium/public/WebSpeechRecognitionParameters.h:39
> +    WebSpeechRecognitionParameters(const WebVector<WebSpeechGrammar>&, const WebString& language, bool continuous);

nit: elsewhere, we use a suffix of "Params"

> Source/WebKit/chromium/public/WebSpeechRecognitionParameters.h:41
> +    WEBKIT_EXPORT const WebVector<WebSpeechGrammar>& grammars() const;

nit: here you can just define these as inline getters since they just simply
return member variables.

> Source/WebKit/chromium/public/WebViewClient.h:321
> +    virtual WebSpeechRecognizer* speechRecognizer() { return 0; }

I really like the WebSpeechRecognizer name!  However, maybe this should return
an instance and be named createSpeechRecognizer?

Maybe WebSpeechRecognitionClient should be named WebSpeechRecognizerClient, and
it should also be an interface, which gets passed to createSpeechRecognizer.
Comment 19 Hans Wennborg 2012-03-30 07:11:11 PDT
(In reply to comment #18)
> (From update of attachment 134578 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=134578&action=review
> 
> > Source/WebKit/chromium/public/WebSpeechRecognitionClient.h:60
> > +    WEBKIT_EXPORT bool equals(const WebSpeechRecognitionClient&) const;
> 
> I think it probably makes more sense for this to be an interface and for the
> recognizer to be instantiated each time you need to start a speech recognition
> task.

Hmm, I'm not sure about this one. If we instantiate a new recognizer each time we start a recognition, we need a mapping from SpeechRecognition objects to WebSpeechRecognizer objects, so that a stop() call can be propagated to the correct recognizer.

That brings us back to the state we had before, with SpeechRecognitionClientProxy having a HashMap member and logic to maintain the mapping. It would also need to manage the lifetime of the recognizers and destroy them at the right moment.

It seems to me that this would bring back some of the problems from previous versions of this patch. The current approach means WebKit doesn't have to worry about any mappings or lifetime issues; that responsibility shifts to the embedder, where it can be handled easily, since the embedder knows exactly how long it needs to keep things alive (until it has sent the last event).

> > Source/WebKit/chromium/public/WebSpeechRecognitionParameters.h:39
> > +    WebSpeechRecognitionParameters(const WebVector<WebSpeechGrammar>&, const WebString& language, bool continuous);
> 
> nit: elsewhere, we use a suffix of "Params"
Done.

> > Source/WebKit/chromium/public/WebSpeechRecognitionParameters.h:41
> > +    WEBKIT_EXPORT const WebVector<WebSpeechGrammar>& grammars() const;
> 
> nit: here you can just define these as inline getters since they just simply
> return member variables.
Done.
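
I.e., along these lines (member names assumed):

  class WebSpeechRecognitionParams {
  public:
      const WebVector<WebSpeechGrammar>& grammars() const { return m_grammars; }
      const WebString& language() const { return m_language; }
      bool continuous() const { return m_continuous; }

  private:
      WebVector<WebSpeechGrammar> m_grammars;
      WebString m_language;
      bool m_continuous;
  };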

> > Source/WebKit/chromium/public/WebViewClient.h:321
> > +    virtual WebSpeechRecognizer* speechRecognizer() { return 0; }
> 
> I really like the WebSpeechRecognizer name!
So do I :)

> However, maybe this should return
> an instance and be named createSpeechRecognizer?
I think creating multiple instances of WebSpeechRecognizer would complicate things, as mentioned above.

> Maybe WebSpeechRecognitionClient should be named WebSpeechRecognizerClient
Done.

> , and
> it should also be an interface, which gets passed to createSpeechRecognizer.
I'm not sure about this one, as mentioned above.
Comment 20 Hans Wennborg 2012-03-30 07:11:44 PDT
Created attachment 134803 [details]
Patch
Comment 21 Darin Fisher (:fishd, Google) 2012-03-30 10:03:45 PDT
(In reply to comment #19)
> (In reply to comment #18)
> > (From update of attachment 134578 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=134578&action=review
> > 
> > > Source/WebKit/chromium/public/WebSpeechRecognitionClient.h:60
> > > +    WEBKIT_EXPORT bool equals(const WebSpeechRecognitionClient&) const;
> > 
> > I think it probably makes more sense for this to be an interface and for the
> > recognizer to be instantiated each time you need to start a speech recognition
> > task.
> 
> Hmm, I'm not sure about this one. If we instantiate a new recognizer each time we start a recognition, we need a mapping from SpeechRecognition objects to WebSpeechRecognizer objects, so that a stop() call can be propagated to the correct recognizer.
> 
> That brings us back to the state we had before, with SpeechRecognitionClientProxy having a HashMap member and logic to maintain the mapping. It would also need to manage the lifetime of the recognizers and destroy them at the right moment.
> 
> It seems to me that this would bring back some of the problems from previous versions of this patch. The current approach means WebKit doesn't have to worry about any mappings or lifetime issues; that responsibility shifts to the embedder, where it can be handled easily, since the embedder knows exactly how long it needs to keep things alive (until it has sent the last event).

OK, then how about defining WebSpeechRecognizer and WebSpeechRecognizerClient as interfaces, but keeping WebSpeechRecognizer as a singleton, and then introducing WebSpeechRecognition as a wrapper for SpeechRecognition?  WebSpeechRecognition would just serve as a handle to identify a recognition task.  It would not have any other state or methods.  I think WebSpeechRecognizerClient could be implemented by SpeechRecognitionClientProxy.
Comment 22 Hans Wennborg 2012-04-02 06:23:29 PDT
(In reply to comment #21)
> OK, then how about defining WebSpeechRecognizer and WebSpeechRecognizerClient as interfaces, but keeping WebSpeechRecognizer as a singleton, and then introducing WebSpeechRecognition as a wrapper for SpeechRecognition?  WebSpeechRecognition would just serve as a handle to identify a recognition task.  It would not have any other state or methods.  I think WebSpeechRecognizerClient could be implemented by SpeechRecognitionClientProxy.

Yes, this sounds like a good approach! Patch coming up.
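
Roughly, the recognizer side would look like this (a sketch, not the final header; the stop/abort signatures are assumed analogous to start):

  // A WebSpeechRecognitionHandle identifies one recognition task; it is
  // passed to every recognizer call and echoed back in every client callback.
  class WebSpeechRecognizer {
  public:
      virtual void start(WebSpeechRecognitionHandle, const WebSpeechRecognitionParams&, WebSpeechRecognizerClient*) { WEBKIT_ASSERT_NOT_REACHED(); }
      virtual void stop(WebSpeechRecognitionHandle, WebSpeechRecognizerClient*) { WEBKIT_ASSERT_NOT_REACHED(); }
      virtual void abort(WebSpeechRecognitionHandle, WebSpeechRecognizerClient*) { WEBKIT_ASSERT_NOT_REACHED(); }

  protected:
      ~WebSpeechRecognizer() { }  // not owned through this interface
  };

SpeechRecognitionClientProxy would implement WebSpeechRecognizerClient and unwrap the handle back to the underlying recognition task when delivering events.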
Comment 23 Hans Wennborg 2012-04-02 06:24:01 PDT
Created attachment 135084 [details]
Patch
Comment 24 Darin Fisher (:fishd, Google) 2012-04-02 09:56:33 PDT
Comment on attachment 135084 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=135084&action=review

Interface structure looks pretty good.  Just some minor nits now.

> Source/WebKit/chromium/public/WebSpeechRecognitionHandle.h:41
> +// A handle to a SpeechRecognition object. As long as there is a handle, the

nit: avoid making the user think about WebCore objects.  mentioning the SpeechRecognition
type in this comment breaks that rule.  I would just drop the first two sentences here,
and then I would change the third sentence to begin as "WebSpeechRecognitionHandle is
used by the WebSpeechRecognizer..."

> Source/WebKit/chromium/public/WebSpeechRecognitionParams.h:39
> +    WebSpeechRecognitionParams(const WebVector<WebSpeechGrammar>&, const WebString& language, bool continuous);

nit: implement the constructor inline or else it needs to be tagged with WEBKIT_EXPORT.  usually we just implement inline.

> Source/WebKit/chromium/public/WebSpeechRecognizer.h:41
> +    virtual void start(WebSpeechRecognitionHandle, const WebSpeechRecognitionParams&, WebSpeechRecognizerClient*) { WEBKIT_ASSERT_NOT_REACHED(); }

nit: pass handles as "const WebSpeechRecognitionHandle&" ... you should see other classes that are implemented using WebPrivatePtr being passed as const ref.

> Source/WebKit/chromium/public/WebSpeechRecognizerClient.h:44
> +    virtual void didStartAudio(WebSpeechRecognitionHandle&) = 0;

nit: pass handles as "const WebSpeechRecognitionHandle&"

> Source/WebKit/chromium/src/SpeechRecognitionClientProxy.cpp:76
> +    RefPtr<SpeechRecognition> recognition = static_cast<PassRefPtr<SpeechRecognition> >(handle);

the explicit cast to PassRefPtr<SpeechRecognition> should become unnecessary once you
change the argument type to "const WebSpeechRecognitionHandle&" as the conversion
operator defined on WebSpeechRecognitionHandle will then be used implicitly.
Comment 25 Hans Wennborg 2012-04-03 05:56:05 PDT
(In reply to comment #24)
> (From update of attachment 135084 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=135084&action=review
> 
> Interface structure looks pretty good.  Just some minor nits now.
> 
> > Source/WebKit/chromium/public/WebSpeechRecognitionHandle.h:41
> > +// A handle to a SpeechRecognition object. As long as there is a handle, the
> 
> nit: avoid making the user think about WebCore objects.  mentioning the SpeechRecognition
> type in this comment breaks that rule.  I would just drop the first two sentences here,
> and then I would change the third sentence to begin as "WebSpeechRecognitionHandle is
> used by the WebSpeechRecognizer..."
Done.

> > Source/WebKit/chromium/public/WebSpeechRecognitionParams.h:39
> > +    WebSpeechRecognitionParams(const WebVector<WebSpeechGrammar>&, const WebString& language, bool continuous);
> 
> nit: implement the constructor inline or else it needs to be tagged with WEBKIT_EXPORT.  usually we just implement inline.
Done.

> > Source/WebKit/chromium/public/WebSpeechRecognizer.h:41
> > +    virtual void start(WebSpeechRecognitionHandle, const WebSpeechRecognitionParams&, WebSpeechRecognizerClient*) { WEBKIT_ASSERT_NOT_REACHED(); }
> 
> nit: pass handles as "const WebSpeechRecognitionHandle&" ... you should see other classes that are implemented using WebPrivatePtr being passed as const ref.
Done.

> > Source/WebKit/chromium/public/WebSpeechRecognizerClient.h:44
> > +    virtual void didStartAudio(WebSpeechRecognitionHandle&) = 0;
> 
> nit: pass handles as "const WebSpeechRecognitionHandle&"
Done.

> > Source/WebKit/chromium/src/SpeechRecognitionClientProxy.cpp:76
> > +    RefPtr<SpeechRecognition> recognition = static_cast<PassRefPtr<SpeechRecognition> >(handle);
> 
> the explicit cast to PassRefPtr<SpeechRecognition> should become unnecessary once you
> change the argument type to "const WebSpeechRecognitionHandle&" as the conversion
> operator defined on WebSpeechRecognitionHandle will then be used implicitly.

Hmm, I still can't do RefPtr<SpeechRecognition> recognition = handle;

RefPtr<T> doesn't have a ctor that takes a PassRefPtr<T>.
What it does have is a ctor template that takes a PassRefPtr<U>, but since handle isn't a PassRefPtr in itself, the compiler can't do template argument deduction for U.

I could do this:

  RefPtr<SpeechRecognition> recognition;
  recognition = handle;
  recognition->didReceiveResult(...);

Which might be less ugly, but feels more weird.

I could also use the PassRefPtr directly:

  PassRefPtr<SpeechRecognition> recognition = handle;
  recognition->didReceiveResult(...);

But that fails on the "Local variables should never be PassRefPtr" rule.

This would be another solution:

  static PassRefPtr<SpeechRecognition> unwrapHandle(const WebSpeechRecognitionHandle& handle)
  {
      return handle;
  }

  void SpeechRecognitionClientProxy::didStartAudio(const WebSpeechRecognitionHandle& handle)
  {
      RefPtr<SpeechRecognition> recognition = unwrapHandle(handle);
      recognition->didStartAudio();
  }

Still a bit weird, but it avoids the static_cast.

I'm not sure what the best way is here, so I've kept the static_casts in for now.
Comment 26 Hans Wennborg 2012-04-03 05:56:47 PDT
Created attachment 135316 [details]
Patch
Comment 27 Darin Fisher (:fishd, Google) 2012-04-03 09:01:19 PDT
Comment on attachment 135316 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=135316&action=review

> Source/WebKit/chromium/public/WebSpeechRecognizerClient.h:39
> +public:

this interface should probably have a protected destructor to ensure
that the WebSpeechRecognizer implementation does not try to delete
its client.

> Source/WebKit/chromium/src/SpeechRecognitionClientProxy.cpp:106
> +    RefPtr<SpeechRecognition> recognition = static_cast<PassRefPtr<SpeechRecognition> >(handle);

OK... looking around the codebase, the more common pattern in cases like this is to do:

  RefPtr<SpeechRecognition> recognition = PassRefPtr<SpeechRecognition>(handle);

^^^ slightly less verbosity

It seems like it would be cool if we had a global unwrap function that somehow
did the right thing here.

  unwrap<SpeechRecognition>(handle)->didEndAudio()

But, such a change would probably be a bigger effort.  I'd say it seems out-of-scope
for this patch.
Comment 28 Hans Wennborg 2012-04-04 00:52:06 PDT
(In reply to comment #27)
> (From update of attachment 135316 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=135316&action=review
> 
> > Source/WebKit/chromium/public/WebSpeechRecognizerClient.h:39
> > +public:
> 
> this interface should probably have a protected destructor to ensure
> that the WebSpeechRecognizer implementation does not try to delete
> its client.
Done.

> > Source/WebKit/chromium/src/SpeechRecognitionClientProxy.cpp:106
> > +    RefPtr<SpeechRecognition> recognition = static_cast<PassRefPtr<SpeechRecognition> >(handle);
> 
> OK... looking around the codebase, the more common pattern in cases like this is to do:
> 
>   RefPtr<SpeechRecognition> recognition = PassRefPtr<SpeechRecognition>(handle);
> 
> ^^^ slightly less verbosity
Ah, yes that's nicer. Done.

> It seems like it would be cool if we had a global unwrap function that somehow
> did the right thing here.
> 
>   unwrap<SpeechRecognition>(handle)->didEndAudio()
> 
> But, such a change would probably be a bigger effort.  I'd say it seems out-of-scope
> for this patch.
Yeah that sounds like it might be a good idea. I'll have a think about it.
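
For the record, one possible shape for such a helper (hypothetical, not part of this patch), leaning on the handle's conversion operator:

  template <typename T, typename Handle>
  PassRefPtr<T> unwrap(const Handle& handle)
  {
      // Uses Handle's "operator PassRefPtr<T>()"; the returned temporary
      // keeps the object alive for the duration of the full expression.
      return handle;
  }

  // Usage, e.g. in SpeechRecognitionClientProxy:
  //   unwrap<SpeechRecognition>(handle)->didEndAudio();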
Comment 29 Hans Wennborg 2012-04-04 01:37:36 PDT
Committed r113149: <http://trac.webkit.org/changeset/113149>
Comment 30 Hans Wennborg 2012-04-04 03:57:06 PDT
Committed r113164: <http://trac.webkit.org/changeset/113164>