WebKit Bugzilla
Bug 218216: Set up basic infrastructure for SpeechRecognition
https://bugs.webkit.org/show_bug.cgi?id=218216
Status: RESOLVED FIXED
Reported by Sihui Liu, 2020-10-26 18:09:36 PDT
...
Attachments
Patch (90.26 KB, patch), 2020-10-26 18:31 PDT, Sihui Liu, no flags
Patch (90.88 KB, patch), 2020-10-26 21:47 PDT, Sihui Liu, ews-feeder: commit-queue-
Patch (90.89 KB, patch), 2020-10-26 22:24 PDT, Sihui Liu, ews-feeder: commit-queue-
Patch (90.96 KB, patch), 2020-10-26 22:42 PDT, Sihui Liu, ews-feeder: commit-queue-
Patch (90.96 KB, patch), 2020-10-26 22:55 PDT, Sihui Liu, ews-feeder: commit-queue-
Patch (91.00 KB, patch), 2020-10-26 22:59 PDT, Sihui Liu, no flags
Patch (60.88 KB, patch), 2020-10-28 00:57 PDT, Sihui Liu, no flags
Patch (60.98 KB, patch), 2020-10-28 10:13 PDT, Sihui Liu, ews-feeder: commit-queue-
Patch (61.27 KB, patch), 2020-10-28 10:34 PDT, Sihui Liu, no flags
Patch (61.31 KB, patch), 2020-10-28 11:42 PDT, Sihui Liu, no flags
Patch (111.76 KB, patch), 2020-10-29 22:00 PDT, Sihui Liu, ews-feeder: commit-queue-
Patch (111.83 KB, patch), 2020-10-29 23:23 PDT, Sihui Liu, ews-feeder: commit-queue-
Patch (111.91 KB, patch), 2020-10-29 23:33 PDT, Sihui Liu, ews-feeder: commit-queue-
Patch (112.21 KB, patch), 2020-10-30 00:08 PDT, Sihui Liu, ews-feeder: commit-queue-
Patch (112.34 KB, patch), 2020-10-30 00:20 PDT, Sihui Liu, no flags
Patch (112.49 KB, patch), 2020-10-30 00:38 PDT, Sihui Liu, ews-feeder: commit-queue-
Patch (124.51 KB, patch), 2020-10-30 16:49 PDT, Sihui Liu, ews-feeder: commit-queue-
Patch (127.07 KB, patch), 2020-10-30 18:41 PDT, Sihui Liu, no flags
Patch (128.55 KB, patch), 2020-10-30 20:29 PDT, Sihui Liu, no flags
Patch (130.39 KB, patch), 2020-11-02 11:39 PST, Sihui Liu, no flags
Patch (130.42 KB, patch), 2020-11-02 12:01 PST, Sihui Liu, no flags
Patch (130.39 KB, patch), 2020-11-02 12:13 PST, Sihui Liu, no flags
Patch (130.39 KB, patch), 2020-11-02 13:22 PST, Sihui Liu, no flags
Patch (138.74 KB, patch), 2020-11-02 19:35 PST, Sihui Liu, no flags
Patch (136.92 KB, patch), 2020-11-03 11:25 PST, Sihui Liu, ews-feeder: commit-queue-
Patch (139.81 KB, patch), 2020-11-03 11:30 PST, Sihui Liu, ews-feeder: commit-queue-
Patch (139.81 KB, patch), 2020-11-03 12:20 PST, Sihui Liu, ews-feeder: commit-queue-
Patch (140.06 KB, patch), 2020-11-03 13:10 PST, Sihui Liu, no flags
Radar WebKit Bug Importer
Comment 1
2020-10-26 18:09:51 PDT
<rdar://problem/70703788>
Sihui Liu
Comment 2
2020-10-26 18:31:39 PDT
Created attachment 412369 [details]
Patch
Sihui Liu
Comment 3
2020-10-26 21:47:38 PDT
Created attachment 412387 [details]
Patch
Sihui Liu
Comment 4
2020-10-26 22:24:11 PDT
Created attachment 412391 [details]
Patch
Sihui Liu
Comment 5
2020-10-26 22:42:21 PDT
Created attachment 412393 [details]
Patch
Sihui Liu
Comment 6
2020-10-26 22:55:28 PDT
Created attachment 412394 [details]
Patch
Sihui Liu
Comment 7
2020-10-26 22:59:15 PDT
Created attachment 412396 [details]
Patch
Sihui Liu
Comment 8
2020-10-28 00:57:32 PDT
Created attachment 412511 [details]
Patch
Sihui Liu
Comment 9
2020-10-28 10:13:27 PDT
Created attachment 412542 [details]
Patch
Sihui Liu
Comment 10
2020-10-28 10:34:48 PDT
Created attachment 412543 [details]
Patch
Sihui Liu
Comment 11
2020-10-28 11:42:27 PDT
Created attachment 412553 [details]
Patch
Alex Christensen
Comment 12
2020-10-29 07:32:35 PDT
Comment on attachment 412553 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=412553&action=review
> Source/WebCore/ChangeLog:12
> + one. SpeechRecogntionConnection is currently one per process.

Is there a reason we can only have one per process? If not, it would probably be better to have one per Page.
> Source/WebCore/Modules/speech/SpeechRecognition.cpp:164
> + alternatives.append(SpeechRecognitionAlternative::create(WTFMove(alternativeData.transcript), alternativeData.confidence));

reserveInitialCapacity and uncheckedAppend removes some branches.
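For illustration, a minimal sketch of the suggested pattern; the element type of the Vector and the surrounding loop are assumptions based on the quoted patch context:

    Vector<Ref<SpeechRecognitionAlternative>> alternatives;
    alternatives.reserveInitialCapacity(resultData.alternatives.size());
    for (auto& alternativeData : resultData.alternatives)
        alternatives.uncheckedAppend(SpeechRecognitionAlternative::create(WTFMove(alternativeData.transcript), alternativeData.confidence));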
> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:60
> + HashMap<SpeechRecognitionClientIdentifier, SpeechRecognitionClient*> m_clientMap;

Could this be WeakPtr<SpeechRecognitionClient> instead of storing raw pointers?
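That is, roughly the following, assuming SpeechRecognitionClient derives from CanMakeWeakPtr<SpeechRecognitionClient>:

    HashMap<SpeechRecognitionClientIdentifier, WeakPtr<SpeechRecognitionClient>> m_clientMap;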
> Source/WebCore/Modules/speech/SpeechRecognitionError.h:32
> +enum class SpeechRecognitionErrorType {

: uint8_t
> Source/WebCore/Modules/speech/SpeechRecognitionError.h:67
> + return result;

I have a preference that we consistently use the Optional based decoding.

...

Optional<String> message;
decoder >> message;
if (!message)
    return WTF::nullopt;

return {{
    WTFMove(*type),
    WTFMove(*message)
}};
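For context, a full decoder following that pattern would look roughly like this; the exact signature, the field order, and the assumption that SpeechRecognitionErrorType is decodable this way are not taken from the patch:

    template<class Decoder>
    static Optional<SpeechRecognitionError> decode(Decoder& decoder)
    {
        Optional<SpeechRecognitionErrorType> type;
        decoder >> type;
        if (!type)
            return WTF::nullopt;

        Optional<String> message;
        decoder >> message;
        if (!message)
            return WTF::nullopt;

        return {{ WTFMove(*type), WTFMove(*message) }};
    }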
> Source/WebCore/Modules/speech/SpeechRecognizer.h:46
> + uint64_t m_maxAlternatives;

This and the booleans should have a default initializer because they aren't initialized in the constructor.
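E.g., roughly the following; the boolean member names and the chosen default values are assumptions, not the patch's actual values:

    uint64_t m_maxAlternatives { 0 };
    bool m_continuous { false };
    bool m_interimResults { false };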
youenn fablet
Comment 13
2020-10-29 07:46:48 PDT
Comment on attachment 412553 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=412553&action=review
> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:61
> + Deque<RefPtr<SpeechRecognitionRequest>> m_pendingRequests;

Can it be a Deque<Ref<>>?

> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:36
> + SpeechRecognitionClientIdentifier clientIdentifier() { return m_clientIdentifier; }

Should be const.

> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:46
> + String m_lang { emptyString() };

Why emptyString and not null string
> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:47
> + m_connection.recognizerDidStart();

The flow is not really clear to me. It seems start is called by m_connection and then we call here m_connection.recognizerDidStart. If the intent is to have a completion handler as start might be done async, how about adding a CompletionHandler<void()> here? Ditto in other call sites. This might even remove the need for m_connection.

Also, in other call sites, we usually have virtual methods on the connection as it could be subclassed (IPC based impl for instance). I am not clear whether in your case the connection is to be subclassed, or the recogniser is to be subclassed or something else. Maybe change log could give some insights about the design you have in mind.
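A rough sketch of the completion-handler shape being suggested; the SpeechRecognizer interface and the calling code below are assumptions for illustration, not the patch's actual design:

    // Requires <wtf/CompletionHandler.h> and <wtf/WeakPtr.h>.
    void SpeechRecognizer::start(CompletionHandler<void()>&& completionHandler)
    {
        // Kick off platform recognition here; invoke the handler once startup
        // has completed (startup may be asynchronous in a real implementation).
        completionHandler();
    }

    // Caller side, instead of the recognizer calling back into m_connection,
    // assuming the caller is CanMakeWeakPtr:
    recognizer->start([weakThis = makeWeakPtr(*this)] {
        if (weakThis)
            weakThis->recognizerDidStart();
    });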
> Source/WebCore/Modules/speech/SpeechRecognizer.h:35
> + SpeechRecognizer(SpeechRecognitionConnection&);

explicit
Sihui Liu
Comment 14
2020-10-29 09:03:12 PDT
Comment on attachment 412553 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=412553&action=review
>> Source/WebCore/ChangeLog:12
>> + one. SpeechRecogntionConnection is currently one per process.
>
> Is there a reason we can only have one per process? If not, it would probably be better to have one per Page.

My original thought was that only one SpeechRecognition can listen to the microphone at a time, so one per process would be enough. Then I found we may need a user permission check via the UI delegate, so per page seems more reasonable; will update.

>> Source/WebCore/Modules/speech/SpeechRecognition.cpp:164
>> + alternatives.append(SpeechRecognitionAlternative::create(WTFMove(alternativeData.transcript), alternativeData.confidence));
>
> reserveInitialCapacity and uncheckedAppend removes some branches.

Will do.

>> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:60
>> + HashMap<SpeechRecognitionClientIdentifier, SpeechRecognitionClient*> m_clientMap;
>
> Could this be WeakPtr<SpeechRecognitionClient> instead of storing raw pointers?

Yes.

>> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:61
>> + Deque<RefPtr<SpeechRecognitionRequest>> m_pendingRequests;
>
> Can it be a Deque<Ref<>>?

Yes.

>> Source/WebCore/Modules/speech/SpeechRecognitionError.h:32
>> +enum class SpeechRecognitionErrorType {
>
> : uint8_t

Sure.

>> Source/WebCore/Modules/speech/SpeechRecognitionError.h:67
>> + return result;
>
> I have a preference that we consistently use the Optional based decoding.
>
> ...
>
> Optional<String> message;
> decoder >> message;
> if (!message)
>     return WTF::nullopt;
>
> return {{
>     WTFMove(*type),
>     WTFMove(*message)
> }};

Will change.

>> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:36
>> + SpeechRecognitionClientIdentifier clientIdentifier() { return m_clientIdentifier; }
>
> Should be const.

Okay.

>> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:46
>> + String m_lang { emptyString() };
>
> Why emptyString and not null string

A null string should work too; any empty string should fall into the default case.

>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:47
>> + m_connection.recognizerDidStart();
>
> The flow is not really clear to me.
> It seems start is called by m_connection and then we call here m_connection.recognizerDidStart.
> If the intent is to have a completion handler as start might be done async, how about adding a CompletionHandler<void()> here?
> Ditto in other call sites.
> This might even remove the need for m_connection.
>
> Also, in other call sites, we usually have virtual methods on the connection as it could be subclassed (IPC based impl for instance).
> I am not clear whether in your case the connection is to be subclassed, or the recogniser is to be subclassed or something else.
> Maybe change log could give some insights about the design you have in mind.

A CompletionHandler will work for didStart and didStop. There could be other events generated by the recognizer; I guess an alternative is to store multiple CompletionHandlers, one per event, in SpeechRecognizer. Since the connection is not subclassed in this patch, I didn't add the virtuals. I planned to do a functional in-process version first.

>> Source/WebCore/Modules/speech/SpeechRecognizer.h:35
>> + SpeechRecognizer(SpeechRecognitionConnection&);
>
> explicit

Okay.

>> Source/WebCore/Modules/speech/SpeechRecognizer.h:46
>> + uint64_t m_maxAlternatives;
>
> This and the booleans should have a default initializer because they aren't initialized in the constructor.

Sure.
Sihui Liu
Comment 15
2020-10-29 22:00:48 PDT
Comment hidden (obsolete)
Created attachment 412712 [details]
Patch
Sihui Liu
Comment 16
2020-10-29 23:23:40 PDT
Comment hidden (obsolete)
Created attachment 412717 [details]
Patch
Sihui Liu
Comment 17
2020-10-29 23:33:48 PDT
Comment hidden (obsolete)
Created attachment 412718 [details]
Patch
Sihui Liu
Comment 18
2020-10-30 00:08:42 PDT
Comment hidden (obsolete)
Created attachment 412719 [details]
Patch
Sihui Liu
Comment 19
2020-10-30 00:20:55 PDT
Comment hidden (obsolete)
Created attachment 412720 [details]
Patch
Sihui Liu
Comment 20
2020-10-30 00:38:07 PDT
Created attachment 412721 [details]
Patch
youenn fablet
Comment 21
2020-10-30 08:14:46 PDT
Can you describe why we would want to run the speech recognition engine? Why not doing speech recognition in GPU Process?

We are for instance able to do audio capture either in UIProcess or GPUProcess but are trying to move audio capture in GPU Process. If we are not totally sure about UIProcess vs GPUProcess, it might be good to isolate classes so that they do not rely on WebPageProxy.

For instance, SpeechRecognitionServer is currently taking a WebPageProxy& but it could probably do things by just taking an IPC::Connection and do IPC messaging outside of WebPage.
Sihui Liu
Comment 22
2020-10-30 11:19:50 PDT
(In reply to youenn fablet from comment #21)
> Can you describe why we would want to run the speech recognition engine?
> Why not doing speech recognition in GPU Process?

Do you mean where we want to run the speech recognition engine? I plan to put it in SpeechRecognitionServer in UI process. As we will do user permission check in the UI process and we can do audio capture in the UI process, put engine in UI process can reduce some IPCs and make the structure easier.

It looks like our current audio capture code is tied to media stream, so I am thinking whether I can have an independent, simpler audio capture implementation inside SpeechRecognitionServer, e.g. using AVAudioEngine and AVAudioNode it may be a few lines. If this can happen (not conflicting with our current coreaudio implementation), SpeechRecognition implementation can be independent of media code.

> We are for instance able to do audio capture either in UIProcess or GPUProcess but are trying to move audio capture in GPU Process.
> If we are not totally sure about UIProcess vs GPUProcess, it might be good to isolate classes so that they do not rely on WebPageProxy.
>
> For instance, SpeechRecognitionServer is currently taking a WebPageProxy& but it could probably do things by just taking an IPC::Connection and do IPC messaging outside of WebPage.

Sounds good, will try.
youenn fablet
Comment 23
2020-10-30 11:48:02 PDT
(In reply to Sihui Liu from comment #22)
> (In reply to youenn fablet from comment #21)
> > Can you describe why we would want to run the speech recognition engine?
> > Why not doing speech recognition in GPU Process?
>
> Do you mean where we want to run the speech recognition engine?

Yes

> I plan to put it in SpeechRecognitionServer in UI process. As we will do user permission check in the UI process and we can do audio capture in the UI process, put engine in UI process can reduce some IPCs and make the structure easier.

Plan is to move audio capture in GPUProcess.

On MacOS, this is done in UIProcess currently as we could not do it in WebProcess due to an OS limitation. You can enable GPU Process audio capture through experimental features.

On iOS, capture happens in WebProcess but experimental features allow to move them to GPU process. It is important on iOS that capture happens where audio rendering happens (audio sessions in different processes might interrupt each other).

All of this processing is moving to GPUProcess so it might be best to move SpeechRecognition there as well.

> It looks like our current audio capture code is tied to media stream, so I am thinking whether I can have an independent, simpler audio capture implementation inside SpeechRecognitionServer, e.g. using AVAudioEngine and AVAudioNode it may be a few lines. If this can happen (not conflicting with our current coreaudio implementation), SpeechRecognition implementation can be independent of media code.

Not really, you can create a CoreAudioCaptureSource and you will get data from it. I don't know much about AVAudioEngine/AVAudioNode, maybe it will be better, Eric could provide advice there.

If we plan to support iOS, we definitely need to make sure that a page that does both getUserMedia and speech recognition work jointly.
Sihui Liu
Comment 24
2020-10-30 13:26:16 PDT
(In reply to youenn fablet from comment #23)
> (In reply to Sihui Liu from comment #22)
> > (In reply to youenn fablet from comment #21)
> > > Can you describe why we would want to run the speech recognition engine?
> > > Why not doing speech recognition in GPU Process?
> >
> > Do you mean where we want to run the speech recognition engine?
>
> Yes
>
> > I plan to put it in SpeechRecognitionServer in UI process. As we will do user permission check in the UI process and we can do audio capture in the UI process, put engine in UI process can reduce some IPCs and make the structure easier.
>
> Plan is to move audio capture in GPUProcess.
> On MacOS, this is done in UIProcess currently as we could not do it in WebProcess due to an OS limitation. You can enable GPU Process audio capture through experimental features.
> On iOS, capture happens in WebProcess but experimental features allow to move them to GPU process. It is important on iOS that capture happens where audio rendering happens (audio sessions in different processes might interrupt each other).
>
> All of this processing is moving to GPUProcess so it might be best to move SpeechRecognition there as well.

I see. I will make SpeechRecognitionServer take a ProcessProxy so it is easier to move around. If the web process is doing capture on iOS, how do we coordinate sessions in different web processes now?

> > It looks like our current audio capture code is tied to media stream, so I am thinking whether I can have an independent, simpler audio capture implementation inside SpeechRecognitionServer, e.g. using AVAudioEngine and AVAudioNode it may be a few lines. If this can happen (not conflicting with our current coreaudio implementation), SpeechRecognition implementation can be independent of media code.
>
> Not really, you can create a CoreAudioCaptureSource and you will get data from it.

Yes, we will need the device info, permission, hash salt... will try using it first when doing audio capture.

> I don't know much about AVAudioEngine/AVAudioNode, maybe it will be better, Eric could provide advice there.
>
> If we plan to support iOS, we definitely need to make sure that a page that does both getUserMedia and speech recognition work jointly.

I think we want to support both iOS and macOS; the recognition service is available on both platforms.
Sihui Liu
Comment 25
2020-10-30 16:49:11 PDT
Created attachment 412810 [details]
Patch
Sihui Liu
Comment 26
2020-10-30 18:41:07 PDT
Created attachment 412822 [details]
Patch
Sihui Liu
Comment 27
2020-10-30 20:29:22 PDT
Created attachment 412827 [details]
Patch
youenn fablet
Comment 28
2020-11-02 01:09:54 PST
Comment on attachment 412827 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=412827&action=review
> Source/WebCore/Modules/speech/SpeechRecognition.cpp:39
> +#include <wtf/WeakPtr.h>

Probably not needed.

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:69
> + return Exception { InvalidStateError, "Recognition is being started or already started"_s };

Can we have a test for this one?

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:72
> + m_connection->start(identifier(), m_lang, m_continuous, m_interimResults, m_maxAlternatives);

If we do not have a connection, should we throw? Why not taking a Ref of m_connection instead of a weak ptr?

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:110
> + dispatchEvent(Event::create(eventNames().startEvent, Event::CanBubble::No, Event::IsCancelable::No));

Should we dispatch the event right away or enqueue a task to dispatch it?

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:154
> + uint64_t nonFinalResultIndex = allResults.size();

auto

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:159
> + alternatives.uncheckedAppend(SpeechRecognitionAlternative::create(WTFMove(alternativeData.transcript), alternativeData.confidence));

Can we do something like:

auto alternatives = WTF::map(resultData.alternatives, [](auto& data) {
    return ...
});
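Filling in the reviewer's sketch with the create() call from the quoted line above (the element type of the resulting Vector is whatever SpeechRecognitionAlternative::create returns):

    auto alternatives = WTF::map(resultData.alternatives, [](auto& alternativeData) {
        return SpeechRecognitionAlternative::create(WTFMove(alternativeData.transcript), alternativeData.confidence);
    });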
> Source/WebCore/Modules/speech/SpeechRecognition.cpp:170
> + dispatchEvent(SpeechRecognitionEvent::create(eventNames().resultEvent, nonFinalResultIndex, makeRefPtr(resultList.get())));

WTFMove(resultList)

> Source/WebCore/Modules/speech/SpeechRecognition.h:39
> +class SpeechRecognition : public SpeechRecognitionClient, public ActiveDOMObject, public RefCounted<SpeechRecognition>, public EventTargetWithInlineData {

It is strange that SpeechRecognition is its own client? Should SpeechRecognitionClient really be SpeechRecognitionConnectionClient?

> Source/WebCore/Modules/speech/SpeechRecognitionClient.h:38
> + SpeechRecognitionClient()

explicit

> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:35
> +class SpeechRecognitionConnection : public RefCounted<SpeechRecognitionConnection>, public CanMakeWeakPtr<SpeechRecognitionConnection> {

We could add here include <weakptr.h>. Do we need it though? Maybe we can take a Ref everywhere if we do not have ref cycles.

> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:44
> + SpeechRecognitionRequest(const SpeechRecognitionRequestInfo&);

explicit, Could take a &&

> Source/WebCore/Modules/speech/SpeechRecognitionRequestInfo.h:34
> + String lang { nullString() };

No need for nullString here.

> Source/WebCore/page/SpeechRecognitionProvider.h:55
> + std::unique_ptr<DummySpeechRecognitionConnection> m_dummyConnection;

Might be better to have an empty provider with m_dummyConnection and real providers without this dummy connection.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:36
> +SpeechRecognitionServer::SpeechRecognitionServer(WebProcessProxy& process, SpeechRecognitionServerIdentifier identifier)

We could make SpeechRecognitionServer WebProcessProxy agnostic by passing a Ref<IPC::Connection>&& here and moving the addMessageReceiver/removeMessageReceiver to WebProcessProxy::createSpeechRecognitionServer/removeSpeechRecognitionServer.
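A hypothetical shape of that constructor, assuming the message receiver registration moves to the WebProcessProxy side (member names are placeholders):

    SpeechRecognitionServer::SpeechRecognitionServer(Ref<IPC::Connection>&& connection, SpeechRecognitionServerIdentifier identifier)
        : m_connection(WTFMove(connection))
        , m_identifier(identifier)
    {
    }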
> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:48
> +void SpeechRecognitionServer::start(const WebCore::SpeechRecognitionRequestInfo& requestInfo)

Could be &&?

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:68
> +void SpeechRecognitionServer::stop(const WebCore::SpeechRecognitionRequestInfo& requestInfo)

Can we pass just an identifier? It seems only clientIdentifier is used. Or should we introduce a request identifier? Ditto for abort and invalidate.

> Source/WebKit/UIProcess/SpeechRecognitionServer.h:42
> + explicit SpeechRecognitionServer(WebProcessProxy&, SpeechRecognitionServerIdentifier);

No need for explicit

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1714
> + speechRecognitionServer.iterator->value = makeUnique<SpeechRecognitionServer>(*this, identifier);

Is there a possibility for trying to create two servers with the same identifier? Usually, we do ASSERT(!map.contains(identifier)); map.add(...)
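Applied to the quoted code, that would look roughly like this; the constructor arguments are copied from the quoted line, and the map name is taken from the next quoted hunk:

    ASSERT(!m_speechRecognitionServerMap.contains(identifier));
    m_speechRecognitionServerMap.add(identifier, makeUnique<SpeechRecognitionServer>(*this, identifier));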
> Source/WebKit/UIProcess/WebProcessProxy.cpp:1719
> + m_speechRecognitionServerMap.remove(identifier);

We could add an ASSERT(map.contains()).

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.cpp:47
> + WebProcess::singleton().addMessageReceiver(Messages::WebSpeechRecognitionConnection::messageReceiverName(), m_identifier, *this);

In theory, we should probably move addMessageReceiver before sending a message.

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.cpp:137
> + return WebProcess::singleton().parentProcessConnection();

This strategy might not work if we move the server to GPUProcess and we do not crash WebProcess in case of a GPUProcess crash.

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.h:50
> + void registerClient(WebCore::SpeechRecognitionClient&) final;

Can we make them private?
Sihui Liu
Comment 29
2020-11-02 11:39:53 PST
Created attachment 412951 [details]
Patch
Sihui Liu
Comment 30
2020-11-02 12:01:43 PST
Created attachment 412957 [details]
Patch
Sihui Liu
Comment 31
2020-11-02 12:13:04 PST
Created attachment 412959 [details]
Patch
Sihui Liu
Comment 32
2020-11-02 13:22:46 PST
Created attachment 412964 [details]
Patch
Sihui Liu
Comment 33
2020-11-02 13:32:37 PST
(In reply to youenn fablet from comment #28)
> Comment on attachment 412827 [details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=412827&action=review
> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:39
> > +#include <wtf/WeakPtr.h>
>
> Probably not needed.

Removed.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:69
> > + return Exception { InvalidStateError, "Recognition is being started or already started"_s };
>
> Can we have a test for this one?

Test is added.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:72
> > + m_connection->start(identifier(), m_lang, m_continuous, m_interimResults, m_maxAlternatives);
>
> If we do not have a connection, should we throw?
> Why not taking a Ref of m_connection instead of a weak ptr?

This is not defined in the spec; I guess there is no harm in adding it. Made m_connection a RefPtr.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:110
> > + dispatchEvent(Event::create(eventNames().startEvent, Event::CanBubble::No, Event::IsCancelable::No));
>
> Should we dispatch the event right away or enqueue a task to dispatch it?

Not specified in the spec. Maybe we can enqueue a task if we find it's necessary later?

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:154
> > + uint64_t nonFinalResultIndex = allResults.size();
>
> auto

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:159
> > + alternatives.uncheckedAppend(SpeechRecognitionAlternative::create(WTFMove(alternativeData.transcript), alternativeData.confidence));
>
> Can we do something like:
> auto alternatives = WTF::map(resultData.alternatives, [](auto& data) {
>     return ...
> });

Changed.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:170
> > + dispatchEvent(SpeechRecognitionEvent::create(eventNames().resultEvent, nonFinalResultIndex, makeRefPtr(resultList.get())));
>
> WTFMove(resultList)

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognition.h:39
> > +class SpeechRecognition : public SpeechRecognitionClient, public ActiveDOMObject, public RefCounted<SpeechRecognition>, public EventTargetWithInlineData {
>
> It is strange that SpeechRecognition is its own client?
> Should SpeechRecognitionClient be really SpeechRecognitionConnectionClient?

Renamed.

> > Source/WebCore/Modules/speech/SpeechRecognitionClient.h:38
> > + SpeechRecognitionClient()
>
> explicit

Added.

> > Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:35
> > +class SpeechRecognitionConnection : public RefCounted<SpeechRecognitionConnection>, public CanMakeWeakPtr<SpeechRecognitionConnection> {
>
> We could add here include <weakptr.h>
> Do we need it though? Maybe we can take a Ref everywhere if we do not have ref cycles.

No cycle so far, so using a Ref and removing the WeakPtr.

> > Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:44
> > + SpeechRecognitionRequest(const SpeechRecognitionRequestInfo&);
>
> explicit, Could take a &&

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognitionRequestInfo.h:34
> > + String lang { nullString() };
>
> No need for nullString here.

Removed.

> > Source/WebCore/page/SpeechRecognitionProvider.h:55
> > + std::unique_ptr<DummySpeechRecognitionConnection> m_dummyConnection;
>
> Might be better to have an empty provider with m_dummyConnection and real providers without this dummy connection

Created a DummySpeechRecognitionProvider for WebKitLegacy.

> > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:36
> > +SpeechRecognitionServer::SpeechRecognitionServer(WebProcessProxy& process, SpeechRecognitionServerIdentifier identifier)
>
> We could make SpeechRecognitionServer WebProcessProxy agnostic by passing a Ref<IPC::Connection>&& here and moving the addMessageReceiver/removeMessageReceiver to WebProcessProxy::createSpeechRecognitionServer/removeSpeechRecognitionServer

Okay.

> > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:48
> > +void SpeechRecognitionServer::start(const WebCore::SpeechRecognitionRequestInfo& requestInfo)
>
> Could be &&?

Yes.

> > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:68
> > +void SpeechRecognitionServer::stop(const WebCore::SpeechRecognitionRequestInfo& requestInfo)
>
> Can we pass just an identifier?
> It seems only clientIdentifier is used.
> Or should we introduce a request identifier?
>
> Ditto for abort and invalidate

Passed the client identifier instead. A client should only have one outgoing request at a time, so the client id seems enough.

> > Source/WebKit/UIProcess/SpeechRecognitionServer.h:42
> > + explicit SpeechRecognitionServer(WebProcessProxy&, SpeechRecognitionServerIdentifier);
>
> No need for explicit

Removed.

> > Source/WebKit/UIProcess/WebProcessProxy.cpp:1714
> > + speechRecognitionServer.iterator->value = makeUnique<SpeechRecognitionServer>(*this, identifier);
>
> Is there a possibility for trying to create two servers with the same identifier?
> Usually, we do ASSERT(!map.contains(identifier)); map.add(...)

Added assert.

> > Source/WebKit/UIProcess/WebProcessProxy.cpp:1719
> > + m_speechRecognitionServerMap.remove(identifier);
>
> We could add an ASSERT(map.contains())

Added assert.

> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.cpp:47
> > + WebProcess::singleton().addMessageReceiver(Messages::WebSpeechRecognitionConnection::messageReceiverName(), m_identifier, *this);
>
> In theory, we should probably move addMessageReceiver before sending a message.

Moved.

> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.cpp:137
> > + return WebProcess::singleton().parentProcessConnection();
>
> This strategy might not work if we move server to GPUProcess and we do not crash WebProcess in case of GPUProcess crash.

Maybe we can change this to getGPUProcessConnection() at that time.

> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.h:50
> > + void registerClient(WebCore::SpeechRecognitionClient&) final;
>
> Can we make them private?

Okay.
Sihui Liu
Comment 34
2020-11-02 19:35:44 PST
Created attachment 412997 [details]
Patch
youenn fablet
Comment 35
2020-11-03 06:08:12 PST
Comment on attachment 412997 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=412997&action=review
> Source/WebKit/ChangeLog:55
> +2020-11-02  Sihui Liu  <sihui_liu@apple.com>

double log

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:53
> + if (auto* page = document.page()) {

Maybe we could throw if there is no page and make m_connection a Ref<>. Do you know what other browsers are doing?

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:69
> + if (!m_connection)

If we make it a ref, we could remove that check.

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:105
> + dispatchEvent(Event::create(eventNames().startEvent, Event::CanBubble::No, Event::IsCancelable::No));

I think we should enqueue a task. There is always the risk that didStart will be called from receiving an IPC message from UIProcess while the document is in the page cache. The alternative would be to neuter the object when the document goes into the page cache.

> Source/WebCore/Modules/speech/SpeechRecognition.h:63
> + // SpeechRecognitionConnectionClient

Should probably be private.

> Source/WebCore/Modules/speech/SpeechRecognitionConnectionClient.h:34
> +struct SpeechRecognitionError;

reverse the two.

> Source/WebCore/Modules/speech/SpeechRecognitionConnectionClient.h:38
> + explicit SpeechRecognitionConnectionClient()

No need for explicit

> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:44
> + SpeechRecognitionRequest(SpeechRecognitionRequestInfo&&);

explicit

> Source/WebCore/Modules/speech/SpeechRecognitionResultData.h:32
> + double confidence;

Should we set to 0 by default?

> Source/WebCore/Modules/speech/SpeechRecognitionUpdate.cpp:69
> + return update;

one liner?

> Source/WebCore/Modules/speech/SpeechRecognitionUpdate.cpp:75
> + return update;

one liner?

> Source/WebCore/page/SpeechRecognitionProvider.h:37
> + virtual ~SpeechRecognitionProvider() { };

= default to keep it consistent

> Source/WebCore/page/SpeechRecognitionProvider.h:41
> +class DummySpeechRecognitionProvider final : public WebCore::SpeechRecognitionProvider {

Maybe move it to its own file?

> Source/WebKit/UIProcess/WebProcessProxy.h:399
> + void destroySpeechRecognitionServer(SpeechRecognitionServerIdentifier);

Could be private?

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.h:53
> + void didReceiveUpdate(const WebCore::SpeechRecognitionUpdate&) final;

private?

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionProvider.h:36
> + WebSpeechRecognitionProvider(WebPage& page)

explicit. Do we need to take a page reference? Maybe we can just keep the page ID?

> Tools/DumpRenderTree/win/DumpRenderTree.cpp:924
> + prefsPrivate->setSpeechRecognitionEnabled(TRUE);

Do we really want it to be TRUE?
Sihui Liu
Comment 36
2020-11-03 10:38:25 PST
(In reply to youenn fablet from comment #35)
> Comment on attachment 412997 [details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=412997&action=review
> > Source/WebKit/ChangeLog:55
> > +2020-11-02  Sihui Liu  <sihui_liu@apple.com>
>
> double log

Oops, will remove.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:53
> > + if (auto* page = document.page()) {
>
> Maybe we could throw if there is no page and make m_connection a Ref<>.
> Do you know what other browsers are doing?

Blink does not throw an exception; if the page is destroyed, it sets the connection to null, which is similar to what we do.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:69
> > + if (!m_connection)
>
> If we make it a ref, we could remove that check.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:105
> > + dispatchEvent(Event::create(eventNames().startEvent, Event::CanBubble::No, Event::IsCancelable::No));
>
> I think we should enqueue a task.
> There is always the risk that didStart will be called from receiving an IPC message from UIProcess but document would be in page cache.
> The alternative would be to neuter the object when document goes into page cache.

Okay, will use queueTaskToDispatchEvent.
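Concretely, the change being agreed on is roughly the following; the TaskSource value shown here is an assumption:

    // Before (dispatches synchronously, quoted above):
    dispatchEvent(Event::create(eventNames().startEvent, Event::CanBubble::No, Event::IsCancelable::No));

    // After: defer to the event loop so the event cannot fire on a document that is in the page cache.
    queueTaskToDispatchEvent(*this, TaskSource::Speech, Event::create(eventNames().startEvent, Event::CanBubble::No, Event::IsCancelable::No));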
> > Source/WebCore/Modules/speech/SpeechRecognition.h:63
> > + // SpeechRecognitionConnectionClient
>
> Should be private probably

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognitionConnectionClient.h:34
> > +struct SpeechRecognitionError;
>
> reverse the two.

Sure.

> > Source/WebCore/Modules/speech/SpeechRecognitionConnectionClient.h:38
> > + explicit SpeechRecognitionConnectionClient()
>
> No need for explicit

Will remove.

> > Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:44
> > + SpeechRecognitionRequest(SpeechRecognitionRequestInfo&&);
>
> explicit

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognitionResultData.h:32
> > + double confidence;
>
> Should we set to 0 by default?

Will set.

> > Source/WebCore/Modules/speech/SpeechRecognitionUpdate.cpp:69
> > + return update;
>
> one liner?

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognitionUpdate.cpp:75
> > + return update;
>
> one liner?

Okay.

> > Source/WebCore/page/SpeechRecognitionProvider.h:37
> > + virtual ~SpeechRecognitionProvider() { };
>
> = default to keep it consistent

Sure.

> > Source/WebCore/page/SpeechRecognitionProvider.h:41
> > +class DummySpeechRecognitionProvider final : public WebCore::SpeechRecognitionProvider {
>
> Maybe move it to its own file?

Will move.

> > Source/WebKit/UIProcess/WebProcessProxy.h:399
> > + void destroySpeechRecognitionServer(SpeechRecognitionServerIdentifier);
>
> Could be private?

Will change.

> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.h:53
> > + void didReceiveUpdate(const WebCore::SpeechRecognitionUpdate&) final;
>
> private?

Okay.

> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionProvider.h:36
> > + WebSpeechRecognitionProvider(WebPage& page)
>
> explicit.
> Do we need to take a page reference.
> Maybe we can just keep the page ID?

Sure, will change to take the ID.

> > Tools/DumpRenderTree/win/DumpRenderTree.cpp:924
> > + prefsPrivate->setSpeechRecognitionEnabled(TRUE);
>
> Do we really want it to be TRUE?

According to Sam's comment in https://bugs.webkit.org/show_bug.cgi?id=217780#c21, we want to make tests cross-platform if possible. The added test does not test actual speech recognition and should pass on Windows.
Sihui Liu
Comment 37
2020-11-03 11:25:46 PST
Created attachment 413080 [details]
Patch
Sihui Liu
Comment 38
2020-11-03 11:30:21 PST
Created attachment 413083 [details]
Patch
Sihui Liu
Comment 39
2020-11-03 12:20:02 PST
Created attachment 413089 [details]
Patch
Sihui Liu
Comment 40
2020-11-03 13:10:59 PST
Created attachment 413099 [details]
Patch
EWS
Comment 41
2020-11-03 17:44:07 PST
Committed r269348: <https://trac.webkit.org/changeset/269348>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 413099 [details].