RESOLVED FIXED 218216
Set up basic infrastructure for SpeechRecognition
https://bugs.webkit.org/show_bug.cgi?id=218216
Summary: Set up basic infrastructure for SpeechRecognition
Sihui Liu
Reported 2020-10-26 18:09:36 PDT
...
Attachments
Patch (90.26 KB, patch)
2020-10-26 18:31 PDT, Sihui Liu
no flags
Patch (90.88 KB, patch)
2020-10-26 21:47 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (90.89 KB, patch)
2020-10-26 22:24 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (90.96 KB, patch)
2020-10-26 22:42 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (90.96 KB, patch)
2020-10-26 22:55 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (91.00 KB, patch)
2020-10-26 22:59 PDT, Sihui Liu
no flags
Patch (60.88 KB, patch)
2020-10-28 00:57 PDT, Sihui Liu
no flags
Patch (60.98 KB, patch)
2020-10-28 10:13 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (61.27 KB, patch)
2020-10-28 10:34 PDT, Sihui Liu
no flags
Patch (61.31 KB, patch)
2020-10-28 11:42 PDT, Sihui Liu
no flags
Patch (111.76 KB, patch)
2020-10-29 22:00 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (111.83 KB, patch)
2020-10-29 23:23 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (111.91 KB, patch)
2020-10-29 23:33 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (112.21 KB, patch)
2020-10-30 00:08 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (112.34 KB, patch)
2020-10-30 00:20 PDT, Sihui Liu
no flags
Patch (112.49 KB, patch)
2020-10-30 00:38 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (124.51 KB, patch)
2020-10-30 16:49 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (127.07 KB, patch)
2020-10-30 18:41 PDT, Sihui Liu
no flags
Patch (128.55 KB, patch)
2020-10-30 20:29 PDT, Sihui Liu
no flags
Patch (130.39 KB, patch)
2020-11-02 11:39 PST, Sihui Liu
no flags
Patch (130.42 KB, patch)
2020-11-02 12:01 PST, Sihui Liu
no flags
Patch (130.39 KB, patch)
2020-11-02 12:13 PST, Sihui Liu
no flags
Patch (130.39 KB, patch)
2020-11-02 13:22 PST, Sihui Liu
no flags
Patch (138.74 KB, patch)
2020-11-02 19:35 PST, Sihui Liu
no flags
Patch (136.92 KB, patch)
2020-11-03 11:25 PST, Sihui Liu
ews-feeder: commit-queue-
Patch (139.81 KB, patch)
2020-11-03 11:30 PST, Sihui Liu
ews-feeder: commit-queue-
Patch (139.81 KB, patch)
2020-11-03 12:20 PST, Sihui Liu
ews-feeder: commit-queue-
Patch (140.06 KB, patch)
2020-11-03 13:10 PST, Sihui Liu
no flags
Radar WebKit Bug Importer
Comment 1 2020-10-26 18:09:51 PDT
Sihui Liu
Comment 2 2020-10-26 18:31:39 PDT
Sihui Liu
Comment 3 2020-10-26 21:47:38 PDT
Sihui Liu
Comment 4 2020-10-26 22:24:11 PDT
Sihui Liu
Comment 5 2020-10-26 22:42:21 PDT
Sihui Liu
Comment 6 2020-10-26 22:55:28 PDT
Sihui Liu
Comment 7 2020-10-26 22:59:15 PDT
Sihui Liu
Comment 8 2020-10-28 00:57:32 PDT
Sihui Liu
Comment 9 2020-10-28 10:13:27 PDT
Sihui Liu
Comment 10 2020-10-28 10:34:48 PDT
Sihui Liu
Comment 11 2020-10-28 11:42:27 PDT
Alex Christensen
Comment 12 2020-10-29 07:32:35 PDT
Comment on attachment 412553 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=412553&action=review

> Source/WebCore/ChangeLog:12
> + one. SpeechRecogntionConnection is currently one per process.

Is there a reason we can only have one per process? If not, it would probably be better to have one per Page.

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:164
> + alternatives.append(SpeechRecognitionAlternative::create(WTFMove(alternativeData.transcript), alternativeData.confidence));

reserveInitialCapacity and uncheckedAppend remove some branches.

> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:60
> + HashMap<SpeechRecognitionClientIdentifier, SpeechRecognitionClient*> m_clientMap;

Could this be WeakPtr<SpeechRecognitionClient> instead of storing raw pointers?

> Source/WebCore/Modules/speech/SpeechRecognitionError.h:32
> +enum class SpeechRecognitionErrorType {

: uint8_t

> Source/WebCore/Modules/speech/SpeechRecognitionError.h:67
> + return result;

I have a preference that we consistently use the Optional-based decoding.

...

Optional<String> message;
decoder >> message;
if (!message)
    return WTF::nullopt;

return {{
    WTFMove(*type),
    WTFMove(*message)
}};

> Source/WebCore/Modules/speech/SpeechRecognizer.h:46
> + uint64_t m_maxAlternatives;

This and the booleans should have default initializers because they aren't initialized in the constructor.
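For context, the Optional-based decoding pattern suggested above expands to roughly the following complete decoder. This is a sketch only: the member names `type` and `message`, and the assumption that the enum is decodable via `decoder >>` (e.g. through EnumTraits), are illustrative, not the patch's actual code.

    template<class Decoder>
    Optional<SpeechRecognitionError> SpeechRecognitionError::decode(Decoder& decoder)
    {
        // Decode each field into an Optional; any failed decode yields WTF::nullopt.
        Optional<SpeechRecognitionErrorType> type;
        decoder >> type;
        if (!type)
            return WTF::nullopt;

        Optional<String> message;
        decoder >> message;
        if (!message)
            return WTF::nullopt;

        // Assumes the struct aggregates exactly { type, message } in this order.
        return {{ WTFMove(*type), WTFMove(*message) }};
    }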
youenn fablet
Comment 13 2020-10-29 07:46:48 PDT
Comment on attachment 412553 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=412553&action=review

> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:61
> + Deque<RefPtr<SpeechRecognitionRequest>> m_pendingRequests;

Can it be a Deque<Ref<>>?

> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:36
> + SpeechRecognitionClientIdentifier clientIdentifier() { return m_clientIdentifier; }

Should be const.

> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:46
> + String m_lang { emptyString() };

Why emptyString() and not the null string?

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:47
> + m_connection.recognizerDidStart();

The flow is not really clear to me. It seems start is called by m_connection, and then we call m_connection.recognizerDidStart() here. If the intent is to have a completion handler because start might complete asynchronously, how about adding a CompletionHandler<void()> here? Ditto in other call sites. This might even remove the need for m_connection.

Also, in other call sites we usually have virtual methods on the connection, as it could be subclassed (an IPC-based implementation, for instance). I am not clear whether in your case the connection is to be subclassed, or the recognizer is to be subclassed, or something else. Maybe the ChangeLog could give some insight into the design you have in mind.

> Source/WebCore/Modules/speech/SpeechRecognizer.h:35
> + SpeechRecognizer(SpeechRecognitionConnection&);

explicit
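The CompletionHandler-based flow suggested above could look roughly like this. This is a sketch under the assumption that the platform recognizer starts asynchronously; startPlatformRecognition is a hypothetical helper, and the signatures are illustrative rather than the patch's actual API.

    #include <wtf/CompletionHandler.h>

    // SpeechRecognizer side (sketch): start() takes a completion handler instead of
    // calling back through m_connection.recognizerDidStart().
    class SpeechRecognizer {
    public:
        void start(CompletionHandler<void()>&& completionHandler)
        {
            // Platform recognition may start asynchronously; the handler is invoked
            // once the recognizer has actually started.
            startPlatformRecognition(WTFMove(completionHandler));
        }

    private:
        void startPlatformRecognition(CompletionHandler<void()>&&);
    };

    // Connection side (sketch): the recognizer no longer needs a reference back to
    // the connection.
    // m_recognizer->start([weakThis = makeWeakPtr(*this)] {
    //     if (weakThis)
    //         weakThis->recognizerDidStart();
    // });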
Sihui Liu
Comment 14 2020-10-29 09:03:12 PDT
Comment on attachment 412553 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=412553&action=review

>> Source/WebCore/ChangeLog:12
>> + one. SpeechRecogntionConnection is currently one per process.
>
> Is there a reason we can only have one per process? If not, it would probably be better to have one per Page.

My original thought was that only one SpeechRecognition can listen to the microphone at a time, so making it one per process is enough. Then I found we may need a user permission check via the UI delegate, so per page seems more reasonable; will update.

>> Source/WebCore/Modules/speech/SpeechRecognition.cpp:164
>> + alternatives.append(SpeechRecognitionAlternative::create(WTFMove(alternativeData.transcript), alternativeData.confidence));
>
> reserveInitialCapacity and uncheckedAppend remove some branches.

Will do.

>> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:60
>> + HashMap<SpeechRecognitionClientIdentifier, SpeechRecognitionClient*> m_clientMap;
>
> Could this be WeakPtr<SpeechRecognitionClient> instead of storing raw pointers?

Yes.

>> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:61
>> + Deque<RefPtr<SpeechRecognitionRequest>> m_pendingRequests;
>
> Can it be a Deque<Ref<>>?

Yes.

>> Source/WebCore/Modules/speech/SpeechRecognitionError.h:32
>> +enum class SpeechRecognitionErrorType {
>
> : uint8_t

Sure.

>> Source/WebCore/Modules/speech/SpeechRecognitionError.h:67
>> + return result;
>
> I have a preference that we consistently use the Optional-based decoding.
>
> ...
>
> Optional<String> message;
> decoder >> message;
> if (!message)
>     return WTF::nullopt;
>
> return {{
>     WTFMove(*type),
>     WTFMove(*message)
> }};

Will change.

>> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:36
>> + SpeechRecognitionClientIdentifier clientIdentifier() { return m_clientIdentifier; }
>
> Should be const.

Okay.

>> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:46
>> + String m_lang { emptyString() };
>
> Why emptyString() and not the null string?

The null string should work too; any empty string should fall into the default case.

>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:47
>> + m_connection.recognizerDidStart();
>
> The flow is not really clear to me. It seems start is called by m_connection, and then we call m_connection.recognizerDidStart() here. If the intent is to have a completion handler because start might complete asynchronously, how about adding a CompletionHandler<void()> here? Ditto in other call sites. This might even remove the need for m_connection.
>
> Also, in other call sites we usually have virtual methods on the connection, as it could be subclassed (an IPC-based implementation, for instance). I am not clear whether in your case the connection is to be subclassed, or the recognizer is to be subclassed, or something else. Maybe the ChangeLog could give some insight into the design you have in mind.

A CompletionHandler will work for didStart and didStop. There could be other events generated by the recognizer; I guess an alternative is to store a CompletionHandler per event in SpeechRecognizer. Since the connection is not subclassed in this patch, I didn't add the virtuals. I planned to do a functional in-process version first.

>> Source/WebCore/Modules/speech/SpeechRecognizer.h:35
>> + SpeechRecognizer(SpeechRecognitionConnection&);
>
> explicit

Okay.

>> Source/WebCore/Modules/speech/SpeechRecognizer.h:46
>> + uint64_t m_maxAlternatives;
>
> This and the booleans should have default initializers because they aren't initialized in the constructor.

Sure.
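A minimal sketch of the WeakPtr-backed client map agreed on above. It assumes SpeechRecognitionClient inherits CanMakeWeakPtr<SpeechRecognitionClient> and exposes an identifier() accessor, and the helper names are hypothetical.

    #include <wtf/HashMap.h>
    #include <wtf/WeakPtr.h>

    class SpeechRecognitionConnection {
    public:
        void registerClient(SpeechRecognitionClient& client)
        {
            // Entries do not keep clients alive.
            m_clientMap.add(client.identifier(), makeWeakPtr(client));
        }

        void dispatchToClient(SpeechRecognitionClientIdentifier identifier)
        {
            auto client = m_clientMap.get(identifier);
            if (!client)
                return; // The client was destroyed; the stale entry can be pruned.
            // ... forward the update to *client ...
        }

    private:
        HashMap<SpeechRecognitionClientIdentifier, WeakPtr<SpeechRecognitionClient>> m_clientMap;
    };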
Sihui Liu
Comment 15 2020-10-29 22:00:48 PDT Comment hidden (obsolete)
Sihui Liu
Comment 16 2020-10-29 23:23:40 PDT Comment hidden (obsolete)
Sihui Liu
Comment 17 2020-10-29 23:33:48 PDT Comment hidden (obsolete)
Sihui Liu
Comment 18 2020-10-30 00:08:42 PDT Comment hidden (obsolete)
Sihui Liu
Comment 19 2020-10-30 00:20:55 PDT Comment hidden (obsolete)
Sihui Liu
Comment 20 2020-10-30 00:38:07 PDT
youenn fablet
Comment 21 2020-10-30 08:14:46 PDT
Can you describe why we would want to run the speech recognition engine? Why not doing speech recognition in GPU Process?

We are for instance able to do audio capture either in UIProcess or GPUProcess but are trying to move audio capture to the GPU Process. If we are not totally sure about UIProcess vs GPUProcess, it might be good to isolate classes so that they do not rely on WebPageProxy.

For instance, SpeechRecognitionServer is currently taking a WebPageProxy&, but it could probably do things by just taking an IPC::Connection and doing IPC messaging outside of WebPage.
Sihui Liu
Comment 22 2020-10-30 11:19:50 PDT
(In reply to youenn fablet from comment #21)
> Can you describe why we would want to run the speech recognition engine?
> Why not doing speech recognition in GPU Process?

Do you mean where we want to run the speech recognition engine? I plan to put it in SpeechRecognitionServer in the UI process. As we will do the user permission check in the UI process and we can do audio capture in the UI process, putting the engine in the UI process can reduce some IPC and make the structure simpler.

It looks like our current audio capture code is tied to media stream, so I am thinking about whether I can have an independent, simpler audio capture implementation inside SpeechRecognitionServer; e.g. with AVAudioEngine and AVAudioNode it may be only a few lines. If this can happen (not conflicting with our current CoreAudio implementation), the SpeechRecognition implementation can be independent of the media code.

> We are for instance able to do audio capture either in UIProcess or GPUProcess but are trying to move audio capture to the GPU Process. If we are not totally sure about UIProcess vs GPUProcess, it might be good to isolate classes so that they do not rely on WebPageProxy.
>
> For instance, SpeechRecognitionServer is currently taking a WebPageProxy&, but it could probably do things by just taking an IPC::Connection and doing IPC messaging outside of WebPage.

Sounds good, will try.
youenn fablet
Comment 23 2020-10-30 11:48:02 PDT
(In reply to Sihui Liu from comment #22)
> (In reply to youenn fablet from comment #21)
> > Can you describe why we would want to run the speech recognition engine?
> > Why not doing speech recognition in GPU Process?
>
> Do you mean where we want to run the speech recognition engine?

Yes.

> I plan to put it in SpeechRecognitionServer in the UI process. As we will do the user permission check in the UI process and we can do audio capture in the UI process, putting the engine in the UI process can reduce some IPC and make the structure simpler.

The plan is to move audio capture to the GPUProcess. On macOS, this is currently done in the UIProcess because we could not do it in the WebProcess due to an OS limitation; you can enable GPU Process audio capture through experimental features. On iOS, capture happens in the WebProcess, but experimental features allow moving it to the GPU Process. It is important on iOS that capture happens where audio rendering happens (audio sessions in different processes might interrupt each other).

All of this processing is moving to the GPUProcess, so it might be best to move SpeechRecognition there as well.

> It looks like our current audio capture code is tied to media stream, so I am thinking about whether I can have an independent, simpler audio capture implementation inside SpeechRecognitionServer; e.g. with AVAudioEngine and AVAudioNode it may be only a few lines. If this can happen (not conflicting with our current CoreAudio implementation), the SpeechRecognition implementation can be independent of the media code.

Not really; you can create a CoreAudioCaptureSource and you will get data from it. I don't know much about AVAudioEngine/AVAudioNode, maybe it will be better; Eric could provide advice there.

If we plan to support iOS, we definitely need to make sure that a page that does both getUserMedia and speech recognition works jointly.
Sihui Liu
Comment 24 2020-10-30 13:26:16 PDT
(In reply to youenn fablet from comment #23)
> (In reply to Sihui Liu from comment #22)
> > (In reply to youenn fablet from comment #21)
> > > Can you describe why we would want to run the speech recognition engine?
> > > Why not doing speech recognition in GPU Process?
> >
> > Do you mean where we want to run the speech recognition engine?
>
> Yes.
>
> > I plan to put it in SpeechRecognitionServer in the UI process. As we will do the user permission check in the UI process and we can do audio capture in the UI process, putting the engine in the UI process can reduce some IPC and make the structure simpler.
>
> The plan is to move audio capture to the GPUProcess. On macOS, this is currently done in the UIProcess because we could not do it in the WebProcess due to an OS limitation; you can enable GPU Process audio capture through experimental features. On iOS, capture happens in the WebProcess, but experimental features allow moving it to the GPU Process. It is important on iOS that capture happens where audio rendering happens (audio sessions in different processes might interrupt each other).
>
> All of this processing is moving to the GPUProcess, so it might be best to move SpeechRecognition there as well.

I see. I will make SpeechRecognitionServer take a ProcessProxy so it is easier to move around. If the web process is doing capture on iOS, how do we coordinate sessions in different web processes now?

> > It looks like our current audio capture code is tied to media stream, so I am thinking about whether I can have an independent, simpler audio capture implementation inside SpeechRecognitionServer; e.g. with AVAudioEngine and AVAudioNode it may be only a few lines. If this can happen (not conflicting with our current CoreAudio implementation), the SpeechRecognition implementation can be independent of the media code.
>
> Not really; you can create a CoreAudioCaptureSource and you will get data from it.

Yes, we will need the device info, permission, hash salt... I will try using it first when doing audio capture.

> I don't know much about AVAudioEngine/AVAudioNode, maybe it will be better; Eric could provide advice there.
>
> If we plan to support iOS, we definitely need to make sure that a page that does both getUserMedia and speech recognition works jointly.

I think we want to support both iOS and macOS; the recognition service is available on both platforms.
Sihui Liu
Comment 25 2020-10-30 16:49:11 PDT
Sihui Liu
Comment 26 2020-10-30 18:41:07 PDT
Sihui Liu
Comment 27 2020-10-30 20:29:22 PDT
youenn fablet
Comment 28 2020-11-02 01:09:54 PST
Comment on attachment 412827 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=412827&action=review

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:39
> +#include <wtf/WeakPtr.h>

Probably not needed.

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:69
> + return Exception { InvalidStateError, "Recognition is being started or already started"_s };

Can we have a test for this one?

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:72
> + m_connection->start(identifier(), m_lang, m_continuous, m_interimResults, m_maxAlternatives);

If we do not have a connection, should we throw? Why not take a Ref to m_connection instead of a WeakPtr?

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:110
> + dispatchEvent(Event::create(eventNames().startEvent, Event::CanBubble::No, Event::IsCancelable::No));

Should we dispatch the event right away or enqueue a task to dispatch it?

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:154
> + uint64_t nonFinalResultIndex = allResults.size();

auto

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:159
> + alternatives.uncheckedAppend(SpeechRecognitionAlternative::create(WTFMove(alternativeData.transcript), alternativeData.confidence));

Can we do something like:
auto alternatives = WTF::map(resultData.alternatives, [](auto& data) { return ...; });

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:170
> + dispatchEvent(SpeechRecognitionEvent::create(eventNames().resultEvent, nonFinalResultIndex, makeRefPtr(resultList.get())));

WTFMove(resultList)

> Source/WebCore/Modules/speech/SpeechRecognition.h:39
> +class SpeechRecognition : public SpeechRecognitionClient, public ActiveDOMObject, public RefCounted<SpeechRecognition>, public EventTargetWithInlineData {

It is strange that SpeechRecognition is its own client. Should SpeechRecognitionClient really be SpeechRecognitionConnectionClient?

> Source/WebCore/Modules/speech/SpeechRecognitionClient.h:38
> + SpeechRecognitionClient()

explicit

> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:35
> +class SpeechRecognitionConnection : public RefCounted<SpeechRecognitionConnection>, public CanMakeWeakPtr<SpeechRecognitionConnection> {

We could add #include <wtf/WeakPtr.h> here. Do we need it, though? Maybe we can take a Ref everywhere if we do not have ref cycles.

> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:44
> + SpeechRecognitionRequest(const SpeechRecognitionRequestInfo&);

explicit; could take a &&.

> Source/WebCore/Modules/speech/SpeechRecognitionRequestInfo.h:34
> + String lang { nullString() };

No need for nullString here.

> Source/WebCore/page/SpeechRecognitionProvider.h:55
> + std::unique_ptr<DummySpeechRecognitionConnection> m_dummyConnection;

It might be better to have an empty provider with m_dummyConnection and real providers without this dummy connection.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:36
> +SpeechRecognitionServer::SpeechRecognitionServer(WebProcessProxy& process, SpeechRecognitionServerIdentifier identifier)

We could make SpeechRecognitionServer WebProcessProxy-agnostic by passing a Ref<IPC::Connection>&& here and moving the addMessageReceiver/removeMessageReceiver calls to WebProcessProxy::createSpeechRecognitionServer/removeSpeechRecognitionServer.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:48
> +void SpeechRecognitionServer::start(const WebCore::SpeechRecognitionRequestInfo& requestInfo)

Could be &&?

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:68
> +void SpeechRecognitionServer::stop(const WebCore::SpeechRecognitionRequestInfo& requestInfo)

Can we pass just an identifier? It seems only clientIdentifier is used. Or should we introduce a request identifier? Ditto for abort and invalidate.

> Source/WebKit/UIProcess/SpeechRecognitionServer.h:42
> + explicit SpeechRecognitionServer(WebProcessProxy&, SpeechRecognitionServerIdentifier);

No need for explicit.

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1714
> + speechRecognitionServer.iterator->value = makeUnique<SpeechRecognitionServer>(*this, identifier);

Is there a possibility of trying to create two servers with the same identifier? Usually we do ASSERT(!map.contains(identifier)); map.add(...).

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1719
> + m_speechRecognitionServerMap.remove(identifier);

We could add an ASSERT(map.contains()).

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.cpp:47
> + WebProcess::singleton().addMessageReceiver(Messages::WebSpeechRecognitionConnection::messageReceiverName(), m_identifier, *this);

In theory, we should probably move addMessageReceiver before sending a message.

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.cpp:137
> + return WebProcess::singleton().parentProcessConnection();

This strategy might not work if we move the server to the GPUProcess and we do not crash the WebProcess in case of a GPUProcess crash.

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.h:50
> + void registerClient(WebCore::SpeechRecognitionClient&) final;

Can we make them private?
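Filled in, the WTF::map() form suggested above would look roughly like this. It is a sketch: the member names follow the quoted snippet, and the transcript is copied here rather than moved to keep the example independent of which WTF::map overload is picked.

    auto alternatives = WTF::map(resultData.alternatives, [](auto& alternativeData) {
        return SpeechRecognitionAlternative::create(String { alternativeData.transcript }, alternativeData.confidence);
    });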
Sihui Liu
Comment 29 2020-11-02 11:39:53 PST
Sihui Liu
Comment 30 2020-11-02 12:01:43 PST
Sihui Liu
Comment 31 2020-11-02 12:13:04 PST
Sihui Liu
Comment 32 2020-11-02 13:22:46 PST
Sihui Liu
Comment 33 2020-11-02 13:32:37 PST
(In reply to youenn fablet from comment #28)
> Comment on attachment 412827 [details]
> Patch
>
> View in context: https://bugs.webkit.org/attachment.cgi?id=412827&action=review
>
> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:39
> > +#include <wtf/WeakPtr.h>
>
> Probably not needed.

Removed.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:69
> > + return Exception { InvalidStateError, "Recognition is being started or already started"_s };
>
> Can we have a test for this one?

Test is added.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:72
> > + m_connection->start(identifier(), m_lang, m_continuous, m_interimResults, m_maxAlternatives);
>
> If we do not have a connection, should we throw? Why not take a Ref to m_connection instead of a WeakPtr?

This is not defined in the spec. I guess there is no harm in adding it. Made m_connection a RefPtr.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:110
> > + dispatchEvent(Event::create(eventNames().startEvent, Event::CanBubble::No, Event::IsCancelable::No));
>
> Should we dispatch the event right away or enqueue a task to dispatch it?

Not specified in the spec. Maybe we can enqueue a task if we find it's necessary later?

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:154
> > + uint64_t nonFinalResultIndex = allResults.size();
>
> auto

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:159
> > + alternatives.uncheckedAppend(SpeechRecognitionAlternative::create(WTFMove(alternativeData.transcript), alternativeData.confidence));
>
> Can we do something like:
> auto alternatives = WTF::map(resultData.alternatives, [](auto& data) { return ...; });

Changed.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:170
> > + dispatchEvent(SpeechRecognitionEvent::create(eventNames().resultEvent, nonFinalResultIndex, makeRefPtr(resultList.get())));
>
> WTFMove(resultList)

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognition.h:39
> > +class SpeechRecognition : public SpeechRecognitionClient, public ActiveDOMObject, public RefCounted<SpeechRecognition>, public EventTargetWithInlineData {
>
> It is strange that SpeechRecognition is its own client. Should SpeechRecognitionClient really be SpeechRecognitionConnectionClient?

Renamed.

> > Source/WebCore/Modules/speech/SpeechRecognitionClient.h:38
> > + SpeechRecognitionClient()
>
> explicit

Added.

> > Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:35
> > +class SpeechRecognitionConnection : public RefCounted<SpeechRecognitionConnection>, public CanMakeWeakPtr<SpeechRecognitionConnection> {
>
> We could add #include <wtf/WeakPtr.h> here. Do we need it, though? Maybe we can take a Ref everywhere if we do not have ref cycles.

No cycle so far, so using Ref and removing the WeakPtr.

> > Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:44
> > + SpeechRecognitionRequest(const SpeechRecognitionRequestInfo&);
>
> explicit; could take a &&.

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognitionRequestInfo.h:34
> > + String lang { nullString() };
>
> No need for nullString here.

Removed.

> > Source/WebCore/page/SpeechRecognitionProvider.h:55
> > + std::unique_ptr<DummySpeechRecognitionConnection> m_dummyConnection;
>
> It might be better to have an empty provider with m_dummyConnection and real providers without this dummy connection.

Created a DummySpeechRecognitionProvider for WebKitLegacy.

> > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:36
> > +SpeechRecognitionServer::SpeechRecognitionServer(WebProcessProxy& process, SpeechRecognitionServerIdentifier identifier)
>
> We could make SpeechRecognitionServer WebProcessProxy-agnostic by passing a Ref<IPC::Connection>&& here and moving the addMessageReceiver/removeMessageReceiver calls to WebProcessProxy::createSpeechRecognitionServer/removeSpeechRecognitionServer.

Okay.

> > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:48
> > +void SpeechRecognitionServer::start(const WebCore::SpeechRecognitionRequestInfo& requestInfo)
>
> Could be &&?

Yes.

> > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:68
> > +void SpeechRecognitionServer::stop(const WebCore::SpeechRecognitionRequestInfo& requestInfo)
>
> Can we pass just an identifier? It seems only clientIdentifier is used. Or should we introduce a request identifier?
>
> Ditto for abort and invalidate.

Passed the client identifier instead. A client should only have one outgoing request at a time, so the client id seems enough.

> > Source/WebKit/UIProcess/SpeechRecognitionServer.h:42
> > + explicit SpeechRecognitionServer(WebProcessProxy&, SpeechRecognitionServerIdentifier);
>
> No need for explicit.

Removed.

> > Source/WebKit/UIProcess/WebProcessProxy.cpp:1714
> > + speechRecognitionServer.iterator->value = makeUnique<SpeechRecognitionServer>(*this, identifier);
>
> Is there a possibility of trying to create two servers with the same identifier? Usually we do ASSERT(!map.contains(identifier)); map.add(...).

Added assert.

> > Source/WebKit/UIProcess/WebProcessProxy.cpp:1719
> > + m_speechRecognitionServerMap.remove(identifier);
>
> We could add an ASSERT(map.contains()).

Added assert.

> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.cpp:47
> > + WebProcess::singleton().addMessageReceiver(Messages::WebSpeechRecognitionConnection::messageReceiverName(), m_identifier, *this);
>
> In theory, we should probably move addMessageReceiver before sending a message.

Moved.

> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.cpp:137
> > + return WebProcess::singleton().parentProcessConnection();
>
> This strategy might not work if we move the server to the GPUProcess and we do not crash the WebProcess in case of a GPUProcess crash.

Maybe we can change this to getGPUProcessConnection() at that time.

> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.h:50
> > + void registerClient(WebCore::SpeechRecognitionClient&) final;
>
> Can we make them private?

Okay.
Sihui Liu
Comment 34 2020-11-02 19:35:44 PST
youenn fablet
Comment 35 2020-11-03 06:08:12 PST
Comment on attachment 412997 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=412997&action=review

> Source/WebKit/ChangeLog:55
> +2020-11-02 Sihui Liu <sihui_liu@apple.com>

Double ChangeLog entry.

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:53
> + if (auto* page = document.page()) {

Maybe we could throw if there is no page and make m_connection a Ref<>. Do you know what other browsers are doing?

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:69
> + if (!m_connection)

If we make it a Ref, we could remove that check.

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:105
> + dispatchEvent(Event::create(eventNames().startEvent, Event::CanBubble::No, Event::IsCancelable::No));

I think we should enqueue a task. There is always the risk that didStart will be called from receiving an IPC message from the UIProcess while the document is in the page cache. The alternative would be to neuter the object when the document goes into the page cache.

> Source/WebCore/Modules/speech/SpeechRecognition.h:63
> + // SpeechRecognitionConnectionClient

Should probably be private.

> Source/WebCore/Modules/speech/SpeechRecognitionConnectionClient.h:34
> +struct SpeechRecognitionError;

Reverse the two.

> Source/WebCore/Modules/speech/SpeechRecognitionConnectionClient.h:38
> + explicit SpeechRecognitionConnectionClient()

No need for explicit.

> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:44
> + SpeechRecognitionRequest(SpeechRecognitionRequestInfo&&);

explicit

> Source/WebCore/Modules/speech/SpeechRecognitionResultData.h:32
> + double confidence;

Should we set it to 0 by default?

> Source/WebCore/Modules/speech/SpeechRecognitionUpdate.cpp:69
> + return update;

One-liner?

> Source/WebCore/Modules/speech/SpeechRecognitionUpdate.cpp:75
> + return update;

One-liner?

> Source/WebCore/page/SpeechRecognitionProvider.h:37
> + virtual ~SpeechRecognitionProvider() { };

= default, to keep it consistent.

> Source/WebCore/page/SpeechRecognitionProvider.h:41
> +class DummySpeechRecognitionProvider final : public WebCore::SpeechRecognitionProvider {

Maybe move it to its own file?

> Source/WebKit/UIProcess/WebProcessProxy.h:399
> + void destroySpeechRecognitionServer(SpeechRecognitionServerIdentifier);

Could be private?

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.h:53
> + void didReceiveUpdate(const WebCore::SpeechRecognitionUpdate&) final;

private?

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionProvider.h:36
> + WebSpeechRecognitionProvider(WebPage& page)

explicit. Do we need to take a page reference? Maybe we can just keep the page ID?

> Tools/DumpRenderTree/win/DumpRenderTree.cpp:924
> + prefsPrivate->setSpeechRecognitionEnabled(TRUE);

Do we really want it to be TRUE?
Sihui Liu
Comment 36 2020-11-03 10:38:25 PST
(In reply to youenn fablet from comment #35)
> Comment on attachment 412997 [details]
> Patch
>
> View in context: https://bugs.webkit.org/attachment.cgi?id=412997&action=review
>
> > Source/WebKit/ChangeLog:55
> > +2020-11-02 Sihui Liu <sihui_liu@apple.com>
>
> Double ChangeLog entry.

Oops, will remove.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:53
> > + if (auto* page = document.page()) {
>
> Maybe we could throw if there is no page and make m_connection a Ref<>. Do you know what other browsers are doing?

Blink does not throw an exception, and if the page is destroyed it sets the connection to null, which is similar to what we do.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:69
> > + if (!m_connection)
>
> If we make it a Ref, we could remove that check.

> > Source/WebCore/Modules/speech/SpeechRecognition.cpp:105
> > + dispatchEvent(Event::create(eventNames().startEvent, Event::CanBubble::No, Event::IsCancelable::No));
>
> I think we should enqueue a task. There is always the risk that didStart will be called from receiving an IPC message from the UIProcess while the document is in the page cache. The alternative would be to neuter the object when the document goes into the page cache.

Okay, will use queueTaskToDispatchEvent.

> > Source/WebCore/Modules/speech/SpeechRecognition.h:63
> > + // SpeechRecognitionConnectionClient
>
> Should probably be private.

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognitionConnectionClient.h:34
> > +struct SpeechRecognitionError;
>
> Reverse the two.

Sure.

> > Source/WebCore/Modules/speech/SpeechRecognitionConnectionClient.h:38
> > + explicit SpeechRecognitionConnectionClient()
>
> No need for explicit.

Will remove.

> > Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:44
> > + SpeechRecognitionRequest(SpeechRecognitionRequestInfo&&);
>
> explicit

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognitionResultData.h:32
> > + double confidence;
>
> Should we set it to 0 by default?

Will set.

> > Source/WebCore/Modules/speech/SpeechRecognitionUpdate.cpp:69
> > + return update;
>
> One-liner?

Okay.

> > Source/WebCore/Modules/speech/SpeechRecognitionUpdate.cpp:75
> > + return update;
>
> One-liner?

Okay.

> > Source/WebCore/page/SpeechRecognitionProvider.h:37
> > + virtual ~SpeechRecognitionProvider() { };
>
> = default, to keep it consistent.

Sure.

> > Source/WebCore/page/SpeechRecognitionProvider.h:41
> > +class DummySpeechRecognitionProvider final : public WebCore::SpeechRecognitionProvider {
>
> Maybe move it to its own file?

Will move.

> > Source/WebKit/UIProcess/WebProcessProxy.h:399
> > + void destroySpeechRecognitionServer(SpeechRecognitionServerIdentifier);
>
> Could be private?

Will change.

> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.h:53
> > + void didReceiveUpdate(const WebCore::SpeechRecognitionUpdate&) final;
>
> private?

Okay.

> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionProvider.h:36
> > + WebSpeechRecognitionProvider(WebPage& page)
>
> explicit. Do we need to take a page reference? Maybe we can just keep the page ID?

Sure, will change it to take the ID.

> > Tools/DumpRenderTree/win/DumpRenderTree.cpp:924
> > + prefsPrivate->setSpeechRecognitionEnabled(TRUE);
>
> Do we really want it to be TRUE?

According to Sam's comment in https://bugs.webkit.org/show_bug.cgi?id=217780#c21, we want to make tests cross-platform if possible. The added test does not test actual speech recognition and should pass on Windows.
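A sketch of the queued dispatch mentioned above, assuming the queueTaskToDispatchEvent() helper declared in ActiveDOMObject.h and a Speech task source (the exact task-source name is an assumption):

    void SpeechRecognition::didStart()
    {
        // Enqueue the event on the document's event loop instead of dispatching it
        // synchronously from the incoming IPC message, so a document in the
        // back/forward cache does not see the event fire.
        queueTaskToDispatchEvent(*this, TaskSource::Speech, Event::create(eventNames().startEvent, Event::CanBubble::No, Event::IsCancelable::No));
    }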
Sihui Liu
Comment 37 2020-11-03 11:25:46 PST
Sihui Liu
Comment 38 2020-11-03 11:30:21 PST
Sihui Liu
Comment 39 2020-11-03 12:20:02 PST
Sihui Liu
Comment 40 2020-11-03 13:10:59 PST
EWS
Comment 41 2020-11-03 17:44:07 PST
Committed r269348: <https://trac.webkit.org/changeset/269348>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 413099 [details].