219371 – Implement audio capture for SpeechRecognition on iOS

RESOLVED FIXED219371

Implement audio capture for SpeechRecognition on iOS

https://bugs.webkit.org/show_bug.cgi?id=219371

Summary Implement audio capture for SpeechRecognition on iOS

Sihui Liu

Reported 2020-11-30 14:42:44 PST

...

Attachments
Patch (75.37 KB, patch) 2020-11-30 14:46 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (76.00 KB, patch) 2020-11-30 18:04 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (76.69 KB, patch) 2020-11-30 18:22 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (77.14 KB, patch) 2020-11-30 20:29 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (77.18 KB, patch) 2020-11-30 20:43 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (82.26 KB, patch) 2020-12-01 08:57 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (81.24 KB, patch) 2020-12-01 09:06 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (81.27 KB, patch) 2020-12-01 09:17 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (81.53 KB, patch) 2020-12-01 10:00 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (84.39 KB, patch) 2020-12-01 10:39 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (85.96 KB, patch) 2020-12-01 14:28 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (85.96 KB, patch) 2020-12-01 14:33 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (84.72 KB, patch) 2020-12-01 23:23 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (85.76 KB, patch) 2020-12-01 23:35 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (146.59 KB, patch) 2020-12-03 16:12 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (147.32 KB, patch) 2020-12-03 17:53 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (147.08 KB, patch) 2020-12-03 18:03 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (148.80 KB, patch) 2020-12-03 21:55 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (150.29 KB, patch) 2020-12-04 02:25 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (146.29 KB, patch) 2020-12-04 10:05 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (146.29 KB, patch) 2020-12-04 10:14 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (146.30 KB, patch) 2020-12-04 10:36 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Fix MediaStream tests (123.04 KB, patch) 2020-12-04 12:07 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (82.79 KB, patch) 2020-12-04 17:04 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (82.76 KB, patch) 2020-12-04 20:39 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (84.05 KB, patch) 2020-12-04 21:51 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (82.90 KB, patch) 2020-12-04 22:19 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (89.15 KB, patch) 2020-12-07 14:09 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (89.15 KB, patch) 2020-12-07 14:10 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (88.63 KB, patch) 2020-12-08 17:21 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch for landing (88.59 KB, patch) 2020-12-08 22:47 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch for landing (88.46 KB, patch) 2020-12-08 23:16 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch for landing (88.40 KB, patch) 2020-12-08 23:55 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Show Obsolete (32) View All Add attachment proposed patch, testcase, etc.

Radar WebKit Bug Importer

Comment 1 2020-11-30 14:42:54 PST

<rdar://problem/71824049>

Sihui Liu

Comment 2 2020-11-30 14:46:27 PST Comment hidden (obsolete)

Created attachment 415061 [details] Patch

Sihui Liu

Comment 3 2020-11-30 18:04:15 PST Comment hidden (obsolete)

Created attachment 415086 [details] Patch

Sihui Liu

Comment 4 2020-11-30 18:22:42 PST Comment hidden (obsolete)

Created attachment 415088 [details] Patch

Sihui Liu

Comment 5 2020-11-30 20:29:32 PST Comment hidden (obsolete)

Created attachment 415092 [details] Patch

Sihui Liu

Comment 6 2020-11-30 20:43:42 PST Comment hidden (obsolete)

Created attachment 415093 [details] Patch

Sihui Liu

Comment 7 2020-12-01 08:57:18 PST Comment hidden (obsolete)

Created attachment 415138 [details] Patch

Sihui Liu

Comment 8 2020-12-01 09:06:35 PST Comment hidden (obsolete)

Created attachment 415139 [details] Patch

Sihui Liu

Comment 9 2020-12-01 09:17:05 PST Comment hidden (obsolete)

Created attachment 415141 [details] Patch

Sihui Liu

Comment 10 2020-12-01 10:00:55 PST Comment hidden (obsolete)

Created attachment 415144 [details] Patch

Sihui Liu

Comment 11 2020-12-01 10:39:47 PST Comment hidden (obsolete)

Created attachment 415147 [details] Patch

Sihui Liu

Comment 12 2020-12-01 14:28:38 PST Comment hidden (obsolete)

Created attachment 415164 [details] Patch

Sihui Liu

Comment 13 2020-12-01 14:33:00 PST Comment hidden (obsolete)

Created attachment 415166 [details] Patch

Sihui Liu

Comment 14 2020-12-01 23:23:49 PST Comment hidden (obsolete)

Created attachment 415194 [details] Patch

Sihui Liu

Comment 15 2020-12-01 23:35:01 PST

Created attachment 415196 [details] Patch

Sihui Liu

Comment 16 2020-12-02 09:06:02 PST

Win bot failure should be irrelevant

youenn fablet

Comment 17 2020-12-02 14:06:13 PST

Comment on attachment 415196 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415196&action=review > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:60 > + return RealtimeMediaSourceCenter::singleton().audioCaptureFactory().createAudioCaptureSource(captureDevice, { }, { }); In case we do capture in GPU process, I think it will work out as follows: - GPU process proxy capture audio - Web process receives audio - Web process forwards audio to UIProcess. In practice though, this might not work well due to the sandbox extensions that GPU process proxy will need to have. I think we will need to do direct UIProcess<->UIProcess communication. It seems overall that the current classes are IPC::Connection based so they should be adaptable for that purpose. It is worth keeping that in mind in that patch though. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91 > + if (isMainThread()) { We should not be in main thread there? I guess we can add a FIXME somewhere. If the issue is IPC, we should add a message thread receiver and dispatch IPC messages to a specific dispatch queue. We can leave that to a followup though. By the way, why is it needed for the state update callback to be called synchronously here. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:103 > + if (isMainThread()) { Ditto > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:115 > + m_source = makeUnique<SpeechRecognitionCaptureSource>(*m_clientIdentifier, WTFMove(dataCallback), WTFMove(stateUpdateCallback), result.source()); It seems simpler to just always have a m_realtimeMediaSourceCreateFunction. Or we can simply pass a Ref<RealtimeMediaSource> directly in start method or in SpeechRecognizer constructor. > Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSource.h:39 > +class SpeechRecognitionRemoteRealtimeMediaSource : public WebCore::RealtimeMediaSource { This code is probably very similar to UserMediaCaptureManagerProxy::SourceProxy and its counter part RemoteCaptureSampleManager. Could we try to share code here? Maybe we could have RemoteCaptureSampleManager handle capture failed/stopped to unify both classes. > Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:51 > +class SpeechRecognitionRemoteRealtimeMediaSourceManager : public CanMakeWeakPtr<SpeechRecognitionRemoteRealtimeMediaSourceManager>, public IPC::MessageReceiver, private IPC::MessageSender { Can we try to see whether we can reuse code from RemoteCaptureSampleManager? > Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:54 > + SpeechRecognitionRemoteRealtimeMediaSourceManager(Ref<IPC::Connection>&&); explicit > Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:64 > + void remoteSourceStopped(WebCore::RealtimeMediaSourceIdentifier); These methods can probably be private. > Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:77 > + HashMap<WebCore::RealtimeMediaSourceIdentifier, SpeechRecognitionRemoteRealtimeMediaSource*> m_sources; This seems safe given SpeechRecognitionRemoteRealtimeMediaSource makes sure to unregister itself in destructor. I would still go with a HashMap of WeakPtr though for extra safety. > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:97 > + m_recognizer->setRealtimeMediaSourceCreateFunction([this](auto& device) { It seems this should go in constructor of recogniser. > Source/WebKit/UIProcess/SpeechRecognitionServer.h:61 > + void setRealtimeMediaSourceCreateFunction(WebCore::SpeechRecognizer::RealtimeMediaSourceCreateFunction&& function) { m_realtimeMediaSourceCreateFunction = WTFMove(function); } It seems this could go in constructor since it will not change over time. > Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.cpp:52 > +class SpeechRecognitionRealtimeMediaSourceManager::RealtimeMediaSource It seems we should try to share code with UserMediaCaptureManagerProxy::SourceProxy > Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.messages.in:34 > + RevokeSandboxExtensions() Maybe we can do it another patch, but it would be good to have one place where we handle sandbox extension granting and revoking for both getUserMedia and recogniser.

Sihui Liu

Comment 18 2020-12-02 16:54:20 PST

Comment on attachment 415196 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415196&action=review >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:60 >> + return RealtimeMediaSourceCenter::singleton().audioCaptureFactory().createAudioCaptureSource(captureDevice, { }, { }); > > In case we do capture in GPU process, I think it will work out as follows: > - GPU process proxy capture audio > - Web process receives audio > - Web process forwards audio to UIProcess. > In practice though, this might not work well due to the sandbox extensions that GPU process proxy will need to have. > I think we will need to do direct UIProcess<->UIProcess communication. > It seems overall that the current classes are IPC::Connection based so they should be adaptable for that purpose. > It is worth keeping that in mind in that patch though. Yes, I think direct UI - GPU process connection is the way to go. We probably just need to set the correct connection in SpeechRecognitionRemoteRealtimeMediaSourceManager. Current implementation will return error if capture in GPU process is enabled on iOS. >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91 >> + if (isMainThread()) { > > We should not be in main thread there? > I guess we can add a FIXME somewhere. If the issue is IPC, we should add a message thread receiver and dispatch IPC messages to a specific dispatch queue. > We can leave that to a followup though. > > By the way, why is it needed for the state update callback to be called synchronously here. Yes, this is because we handle the remoteAudioSamplesAvailable on main thread. It does not need to be called synchronously I guess (did not break existing tests); just seems unnecessary to schedule a future task if we already on the main thread. >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:103 >> + if (isMainThread()) { > > Ditto Ditto >> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:115 >> + m_source = makeUnique<SpeechRecognitionCaptureSource>(*m_clientIdentifier, WTFMove(dataCallback), WTFMove(stateUpdateCallback), result.source()); > > It seems simpler to just always have a m_realtimeMediaSourceCreateFunction. > Or we can simply pass a Ref<RealtimeMediaSource> directly in start method or in SpeechRecognizer constructor. Okay, will change. >> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSource.h:39 >> +class SpeechRecognitionRemoteRealtimeMediaSource : public WebCore::RealtimeMediaSource { > > This code is probably very similar to UserMediaCaptureManagerProxy::SourceProxy and its counter part RemoteCaptureSampleManager. > Could we try to share code here? > Maybe we could have RemoteCaptureSampleManager handle capture failed/stopped to unify both classes. Okay, will try. Seems we can move the code from different processes to Shared/. >> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:51 >> +class SpeechRecognitionRemoteRealtimeMediaSourceManager : public CanMakeWeakPtr<SpeechRecognitionRemoteRealtimeMediaSourceManager>, public IPC::MessageReceiver, private IPC::MessageSender { > > Can we try to see whether we can reuse code from RemoteCaptureSampleManager? Okay. >> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:64 >> + void remoteSourceStopped(WebCore::RealtimeMediaSourceIdentifier); > > These methods can probably be private. Sure. >> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:77 >> + HashMap<WebCore::RealtimeMediaSourceIdentifier, SpeechRecognitionRemoteRealtimeMediaSource*> m_sources; > > This seems safe given SpeechRecognitionRemoteRealtimeMediaSource makes sure to unregister itself in destructor. > I would still go with a HashMap of WeakPtr though for extra safety. Will change to use WeakPtr. >> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:97 >> + m_recognizer->setRealtimeMediaSourceCreateFunction([this](auto& device) { > > It seems this should go in constructor of recogniser. Sure, made it separate for the macro. >> Source/WebKit/UIProcess/SpeechRecognitionServer.h:61 >> + void setRealtimeMediaSourceCreateFunction(WebCore::SpeechRecognizer::RealtimeMediaSourceCreateFunction&& function) { m_realtimeMediaSourceCreateFunction = WTFMove(function); } > > It seems this could go in constructor since it will not change over time. Okay. >> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.cpp:52 >> +class SpeechRecognitionRealtimeMediaSourceManager::RealtimeMediaSource > > It seems we should try to share code with UserMediaCaptureManagerProxy::SourceProxy Will try. >> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.messages.in:34 >> + RevokeSandboxExtensions() > > Maybe we can do it another patch, but it would be good to have one place where we handle sandbox extension granting and revoking for both getUserMedia and recogniser. Yes, getUserMedia currently has two places (UserMediaProcessManager and UserMediaPermissionRequestManagerProxy) to grant sandbox extensions and may also be simplified by then.

Sihui Liu

Comment 19 2020-12-03 16:12:55 PST Comment hidden (obsolete)

Created attachment 415364 [details] Patch

Sihui Liu

Comment 20 2020-12-03 17:53:14 PST Comment hidden (obsolete)

Created attachment 415371 [details] Patch

Sihui Liu

Comment 21 2020-12-03 18:03:18 PST Comment hidden (obsolete)

Created attachment 415372 [details] Patch

Sihui Liu

Comment 22 2020-12-03 21:55:49 PST Comment hidden (obsolete)

Created attachment 415386 [details] Patch

Sihui Liu

Comment 23 2020-12-04 02:25:39 PST

Created attachment 415397 [details] Patch

youenn fablet

Comment 24 2020-12-04 07:18:02 PST

Comment on attachment 415397 [details] Patch Some getDisplayMedia tests seem to be failing. getDisplayMedia capture in MacOS happens in UIProcess, WebProcess receiving the video samples over IPC. View in context: https://bugs.webkit.org/attachment.cgi?id=415397&action=review > Source/WebKit/ChangeLog:149 > +2020-12-03 Sihui Liu <sihui_liu@apple.com> double entry. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91 > + if (isMainThread()) { Since they are not needed, let's just not add these switches. Ditto below. > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:106 > + if (m_realtimeMediaSourceCreateFunction) { I would make this callback mandatory so that we simplify this code path to only handle error cases or have a source. In the future, we will always capture in GPU process anyway. > Source/WebKit/UIProcess/SpeechRecognitionServer.h:61 > + using RealtimeMediaSourceCreateFunction = Function<RefPtr<WebCore::RealtimeMediaSource>(const WebCore::CaptureDevice&)>; We could probably use something like SpeechRecognitionErrorOr<Ref<WebCore::RealtimeMediaSource>> > Source/WebKit/WebProcess/cocoa/UserMediaCaptureManager.cpp:48 > + : RemoteRealtimeMediaSourceManager(nullptr, false) These values nullptr and false are not very clear.

Sihui Liu

Comment 25 2020-12-04 10:05:29 PST Comment hidden (obsolete)

Created attachment 415424 [details] Patch

Sihui Liu

Comment 26 2020-12-04 10:14:03 PST

Created attachment 415425 [details] Patch

Sihui Liu

Comment 27 2020-12-04 10:20:53 PST

(In reply to youenn fablet from comment #24) > Comment on attachment 415397 [details] > Patch > > Some getDisplayMedia tests seem to be failing. > getDisplayMedia capture in MacOS happens in UIProcess, WebProcess receiving > the video samples over IPC. > > View in context: > https://bugs.webkit.org/attachment.cgi?id=415397&action=review > > > Source/WebKit/ChangeLog:149 > > +2020-12-03 Sihui Liu <sihui_liu@apple.com> > > double entry. Removed! > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91 > > + if (isMainThread()) { > > Since they are not needed, let's just not add these switches. Ditto below. Sure. > > > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:106 > > + if (m_realtimeMediaSourceCreateFunction) { > > I would make this callback mandatory so that we simplify this code path to > only handle error cases or have a source. > In the future, we will always capture in GPU process anyway. Made m_realtimeMediaSourceCreateFunction mandatory. > > > Source/WebKit/UIProcess/SpeechRecognitionServer.h:61 > > + using RealtimeMediaSourceCreateFunction = Function<RefPtr<WebCore::RealtimeMediaSource>(const WebCore::CaptureDevice&)>; > > We could probably use something like > SpeechRecognitionErrorOr<Ref<WebCore::RealtimeMediaSource>> I used WebCore::CaptureSourceOrError. > > > Source/WebKit/WebProcess/cocoa/UserMediaCaptureManager.cpp:48 > > + : RemoteRealtimeMediaSourceManager(nullptr, false) > > These values nullptr and false are not very clear. I agree, I've changed bool to enum class ShouldHandleSandboxExtensions. I also removed RemoteRealtimeMediaSourceManager from UserMediaCaptureManager, as this patch is getting bigger and harder to debug with specific code change for UserMedia. Will do replace/change existing classes UserMediaCaptureManager and UserMediaCaptureManagerProxy with shared RemoteRemoteRealtimeMediaSourceManager and RemoteRealtimeMediaSourceManager in a separate patch. What do you think?

Sihui Liu

Comment 28 2020-12-04 10:36:08 PST

Created attachment 415428 [details] Patch

Sihui Liu

Comment 29 2020-12-04 12:07:06 PST

Created attachment 415441 [details] Fix MediaStream tests

youenn fablet

Comment 30 2020-12-04 13:40:34 PST

Comment on attachment 415441 [details] Fix MediaStream tests View in context: https://bugs.webkit.org/attachment.cgi?id=415441&action=review > Source/WTF/ChangeLog:9 > + Turn on WebSpeech on iOS simulator by default for testing. Not very clear why we need this. We do not need them for WTR and for regular testing, we have the UI for that, or we can compile switched on locally. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:65 > + m_impl = makeUnique<SpeechRecognitionCaptureSourceImpl>(clientIdentifier, WTFMove(dataCallback), WTFMove(stateUpdateCallback), WTFMove(source)); Could be written as: : m_impl(makeUnique<>(...)) > Source/WebKit/ChangeLog:17 > + Original WebKit::RemoteRealtimeMediaSource is renamed to WebKit::WebRemoteRealtimeMediaSource. We should What is missing from replacing WebRemoteRealtimeMediaSource by the new RemoteRealtimeMediaSource? Maybe we should first do this refactoring as preliminary patches? That would make sure we have code that is clear at any point in time. Also, it is difficult to see how the new code is close to replacing the old one. If that is too much work or too difficult, maybe we should revert to having specific speech classes for now. If we want to continue in that direction, ObsoleteRemoteRealtimeMediaSource might be a better name since that is the intent I think. > Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:53 > + : private WebCore::RealtimeMediaSource::Observer There are probably a lot of WebCore:: We could be doing using namespace WebCore. > Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:184 > + int64_t m_numberOfFrames { 0 }; Should be uint64_t probably > Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:237 > + case WebCore::CaptureDevice::DeviceType::Microphone: This code is mostly dealing with audio (not processing video samples or camera sandbox extensions). Let's ASSERT_NOT_REACHED for the other cases. We could also remove orientation code as well for now. > Source/WebKit/Shared/media/RealtimeMediaSourceManager.h:49 > + RealtimeMediaSourceManager(IPC::Connection&); explicit, could take a Ref<IPC::Connection>&&. > Source/WebKit/Shared/media/RealtimeMediaSourceManager.h:78 > + HashMap<WebCore::RealtimeMediaSourceIdentifier, std::unique_ptr<RealtimeMediaSource>> m_sources; RealtimeMediaSource is not a great name since it makes think this is some kind of WebCore::RealtimeMediaSource which are ref counted. Let's use a shorter name like Source. > Source/WebKit/UIProcess/SpeechRecognitionServer.h:62 > + void setRealtimeMediaSourceCreateFunction(RealtimeMediaSourceCreateFunction&& function) { m_realtimeMediaSourceCreateFunction = WTFMove(function); } Could be in SpeechRecognitionServer constructor. > Source/WebKit/UIProcess/WebProcessProxy.cpp:1732 > + speechRecognitionServer->setRealtimeMediaSourceCreateFunction([weakThis = makeWeakPtr(this), weakPage = makeWeakPtr(targetPage)]() { If we move it to constructor, we could also make the lambda a method of WebPageProxy to keep the lambda small enough. > Source/WebKit/UIProcess/WebProcessProxy.cpp:1736 > + if (!weakPage || weakPage->preferences().captureAudioInGPUProcessEnabled()) I would move the weakPage check with weakThis check. And add a "not implemented for GPU process" specific message. Also, maybe only the weakPage is actually necessary since the page proxy keeps a ref to its process proxy. > Source/WebKit/UIProcess/WebProcessProxy.cpp:1744 > + auto& sourceManager = weakThis->ensureRemoteRealtimeMediaSourceManager(); sourceManager is only used once > Source/WebKit/UIProcess/WebProcessProxy.cpp:1768 > + addMessageReceiver(Messages::RemoteRealtimeMediaSourceManager::messageReceiverName(), *m_remoteRealtimeMediaSourceManager); It is probably safer to remove the message receiver at destruction time.

Sihui Liu

Comment 31 2020-12-04 14:16:48 PST

(In reply to youenn fablet from comment #30) > Comment on attachment 415441 [details] > Fix MediaStream tests > > View in context: > https://bugs.webkit.org/attachment.cgi?id=415441&action=review > > > Source/WTF/ChangeLog:9 > > + Turn on WebSpeech on iOS simulator by default for testing. > > Not very clear why we need this. > We do not need them for WTR and for regular testing, we have the UI for > that, or we can compile switched on locally. Okay, will revert. > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:65 > > + m_impl = makeUnique<SpeechRecognitionCaptureSourceImpl>(clientIdentifier, WTFMove(dataCallback), WTFMove(stateUpdateCallback), WTFMove(source)); > > Could be written as: > : m_impl(makeUnique<>(...)) > > > Source/WebKit/ChangeLog:17 > > + Original WebKit::RemoteRealtimeMediaSource is renamed to WebKit::WebRemoteRealtimeMediaSource. We should > > What is missing from replacing WebRemoteRealtimeMediaSource by the new > RemoteRealtimeMediaSource? Replacing WebRemoteRealtimeMediaSource requires change in UserMediaCaptureManager and RemoteRealtimeMediaSourceManager; and possibly some behavior changes that need to verify by testing. Then I found the refactoring work overweighs what I want to implement in this patch. > Maybe we should first do this refactoring as preliminary patches? That would > make sure we have code that is clear at any point in time. > Also, it is difficult to see how the new code is close to replacing the old > one. I was thinking about replacing UserMediaCaptureManagerProxy and UserMediaCaptureManager functionalities with RealtimeMediaSourceManager and RemoteRealtimeMediaSourceManager, not only the RemoteRealtimeMediaSource. > If that is too much work or too difficult, maybe we should revert to having > specific speech classes for now. Will do! > > If we want to continue in that direction, ObsoleteRemoteRealtimeMediaSource > might be a better name since that is the intent I think. > > > Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:53 > > + : private WebCore::RealtimeMediaSource::Observer > > There are probably a lot of WebCore:: > We could be doing using namespace WebCore. Okay. > > > Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:184 > > + int64_t m_numberOfFrames { 0 }; > > Should be uint64_t probably Will change. > > > Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:237 > > + case WebCore::CaptureDevice::DeviceType::Microphone: > > This code is mostly dealing with audio (not processing video samples or > camera sandbox extensions). > Let's ASSERT_NOT_REACHED for the other cases. > We could also remove orientation code as well for now. Will revert to audio-only and speechrecognition-only version. > > > Source/WebKit/Shared/media/RealtimeMediaSourceManager.h:49 > > + RealtimeMediaSourceManager(IPC::Connection&); > > explicit, could take a Ref<IPC::Connection>&&. Sure. > > > Source/WebKit/Shared/media/RealtimeMediaSourceManager.h:78 > > + HashMap<WebCore::RealtimeMediaSourceIdentifier, std::unique_ptr<RealtimeMediaSource>> m_sources; > > RealtimeMediaSource is not a great name since it makes think this is some > kind of WebCore::RealtimeMediaSource which are ref counted. > Let's use a shorter name like Source. Okay. > > > Source/WebKit/UIProcess/SpeechRecognitionServer.h:62 > > + void setRealtimeMediaSourceCreateFunction(RealtimeMediaSourceCreateFunction&& function) { m_realtimeMediaSourceCreateFunction = WTFMove(function); } > > Could be in SpeechRecognitionServer constructor. Will move. > > > Source/WebKit/UIProcess/WebProcessProxy.cpp:1732 > > + speechRecognitionServer->setRealtimeMediaSourceCreateFunction([weakThis = makeWeakPtr(this), weakPage = makeWeakPtr(targetPage)]() { > > If we move it to constructor, we could also make the lambda a method of > WebPageProxy to keep the lambda small enough. I guess you mean WebProcessProxy, sure! > > > Source/WebKit/UIProcess/WebProcessProxy.cpp:1736 > > + if (!weakPage || weakPage->preferences().captureAudioInGPUProcessEnabled()) > > I would move the weakPage check with weakThis check. > And add a "not implemented for GPU process" specific message. > Also, maybe only the weakPage is actually necessary since the page proxy > keeps a ref to its process proxy. Will change. > > > Source/WebKit/UIProcess/WebProcessProxy.cpp:1744 > > + auto& sourceManager = weakThis->ensureRemoteRealtimeMediaSourceManager(); > > sourceManager is only used once Will remove. > > > Source/WebKit/UIProcess/WebProcessProxy.cpp:1768 > > + addMessageReceiver(Messages::RemoteRealtimeMediaSourceManager::messageReceiverName(), *m_remoteRealtimeMediaSourceManager); > > It is probably safer to remove the message receiver at destruction time. Sure.

Sihui Liu

Comment 32 2020-12-04 17:04:27 PST Comment hidden (obsolete)

Created attachment 415477 [details] Patch

Sihui Liu

Comment 33 2020-12-04 20:39:36 PST

Created attachment 415483 [details] Patch

Sihui Liu

Comment 34 2020-12-04 21:51:20 PST Comment hidden (obsolete)

Created attachment 415485 [details] Patch

Sihui Liu

Comment 35 2020-12-04 22:19:22 PST Comment hidden (obsolete)

Created attachment 415486 [details] Patch

youenn fablet

Comment 36 2020-12-07 06:16:57 PST

Comment on attachment 415486 [details] Patch Some nits below, some tests would be good. View in context: https://bugs.webkit.org/attachment.cgi?id=415486&action=review > Source/WebCore/ChangeLog:9 > + Add a way to set RealtimeMediaSource member of SpeechRecognitionCaptureSourceImpl. We should be able to add some tests with mock audio source. Can we add/enable some tests? > Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.cpp:57 > + send(Messages::SpeechRecognitionRealtimeMediaSourceManager::AddSource(identifier, captureDevice)); I would go with createSource/deleteSource instead of addSource/removeSource (for both methods and IPC messages). > Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:54 > + SpeechRecognitionRemoteRealtimeMediaSourceManager(Ref<IPC::Connection>&&); explicit > Source/WebKit/UIProcess/WebProcessProxy.cpp:1754 > + return createRealtimeMediaSourceForSpeechRecognition(); If we make ensureSpeechRecognitionRemoteRealtimeMediaSourceManager public, then we could move createRealtimeMediaSourceForSpeechRecognition in WebPageProxy and write the lambda as: if (!weakPage) return error; return weakPage? weakPage->createRealtimeMediaSourceForSpeechRecognition() : CaptureSourceOrError { "Page is invalid" }; That will allow removing capturing 'this' in the lambda. > Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:41 > +class SpeechRecognitionRealtimeMediaSourceManager : public CanMakeWeakPtr<SpeechRecognitionRealtimeMediaSourceManager>, public IPC::MessageReceiver, private IPC::MessageSender { final? > Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:44 > + SpeechRecognitionRealtimeMediaSourceManager(Ref<IPC::Connection>&&); explicit > Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:59 > + void didReceiveMessage(IPC::Connection&, IPC::Decoder&) override; final

Sihui Liu

Comment 37 2020-12-07 11:43:18 PST

Comment on attachment 415486 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415486&action=review >> Source/WebCore/ChangeLog:9 >> + Add a way to set RealtimeMediaSource member of SpeechRecognitionCaptureSourceImpl. > > We should be able to add some tests with mock audio source. > Can we add/enable some tests? Hmm it seems using remote source or local source have no difference in generating events for SpeechRecognition, so this change does not affect the behavior of existing tests. What tests do you think should be added for this? >> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.cpp:57 >> + send(Messages::SpeechRecognitionRealtimeMediaSourceManager::AddSource(identifier, captureDevice)); > > I would go with createSource/deleteSource instead of addSource/removeSource (for both methods and IPC messages). Will change. >> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:54 >> + SpeechRecognitionRemoteRealtimeMediaSourceManager(Ref<IPC::Connection>&&); > > explicit Okay. >> Source/WebKit/UIProcess/WebProcessProxy.cpp:1754 >> + return createRealtimeMediaSourceForSpeechRecognition(); > > If we make ensureSpeechRecognitionRemoteRealtimeMediaSourceManager public, then we could move createRealtimeMediaSourceForSpeechRecognition in WebPageProxy and write the lambda as: > if (!weakPage) > return error; > return weakPage? weakPage->createRealtimeMediaSourceForSpeechRecognition() : CaptureSourceOrError { "Page is invalid" }; > > That will allow removing capturing 'this' in the lambda. Sure, will move it to WebPageProxy. >> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:41 >> +class SpeechRecognitionRealtimeMediaSourceManager : public CanMakeWeakPtr<SpeechRecognitionRealtimeMediaSourceManager>, public IPC::MessageReceiver, private IPC::MessageSender { > > final? Okay. >> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:44 >> + SpeechRecognitionRealtimeMediaSourceManager(Ref<IPC::Connection>&&); > > explicit Will add. >> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:59 >> + void didReceiveMessage(IPC::Connection&, IPC::Decoder&) override; > > final Will change.

youenn fablet

Comment 38 2020-12-07 11:47:33 PST

(In reply to Sihui Liu from comment #37) > Comment on attachment 415486 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=415486&action=review > > >> Source/WebCore/ChangeLog:9 > >> + Add a way to set RealtimeMediaSource member of SpeechRecognitionCaptureSourceImpl. > > > > We should be able to add some tests with mock audio source. > > Can we add/enable some tests? The audio session category is changed to playrecord. After this patch, this will be done in WebProcess, so a layout test could check this actually happens from internals API.

Sihui Liu

Comment 39 2020-12-07 11:49:22 PST

(In reply to youenn fablet from comment #38) > (In reply to Sihui Liu from comment #37) > > Comment on attachment 415486 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=415486&action=review > > > > >> Source/WebCore/ChangeLog:9 > > >> + Add a way to set RealtimeMediaSource member of SpeechRecognitionCaptureSourceImpl. > > > > > > We should be able to add some tests with mock audio source. > > > Can we add/enable some tests? > > The audio session category is changed to playrecord. > After this patch, this will be done in WebProcess, so a layout test could > check this actually happens from internals API. I see, will add.

Sihui Liu

Comment 40 2020-12-07 14:09:25 PST

Created attachment 415582 [details] Patch

Sihui Liu

Comment 41 2020-12-07 14:10:15 PST

Created attachment 415583 [details] Patch

youenn fablet

Comment 42 2020-12-08 01:02:04 PST

Comment on attachment 415583 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415583&action=review > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:119 > m_recognizer->start(clientIdentifier); This is a bit weird to have the return here and some code below. I think we should rewrite it a bit: - Remove SpeechRecognizer::start(SpeechRecognitionConnectionClientIdentifier identifier) - Add a #else here that, in case ENABLE(MEDIA_STREAM) is not defined will send back a SpeechRecognitionErrorType::AudioCapture error. We should probably at some point, either change compilation flags to enable RealtimeMediaSource if ENABLE(MEDIA_STREAM) && ENABLE(SPEECH_API), or just disable speech api if MEDIA_STREAM is not on. That part should be left to a follow-up patch. > Source/WebKit/UIProcess/WebProcessProxy.cpp:1718 > return; We should probably update m_speechRecognitionServerMap after getting the targetPage. > LayoutTests/fast/speechrecognition/ios/audio-capture.html:20 > + shouldBeEqualToString("internals.audioSessionCategory()", "PlayAndRecord"); Can we add a negative check before, for instance in start event handler?

youenn fablet

Comment 43 2020-12-08 01:02:42 PST

Please check api-mac before landing.

Sihui Liu

Comment 44 2020-12-08 17:21:38 PST

Created attachment 415693 [details] Patch

Sihui Liu

Comment 45 2020-12-08 22:47:58 PST

Created attachment 415719 [details] Patch for landing

Sihui Liu

Comment 46 2020-12-08 23:16:51 PST

Created attachment 415722 [details] Patch for landing

EWS

Comment 47 2020-12-08 23:51:29 PST

commit-queue failed to commit attachment 415722 [details] to WebKit repository.

Sihui Liu

Comment 48 2020-12-08 23:55:59 PST

Created attachment 415723 [details] Patch for landing

EWS

Comment 49 2020-12-09 00:28:44 PST

Committed r270574: <https://trac.webkit.org/changeset/270574> All reviewed patches have been landed. Closing bug and clearing flags on attachment 415723 [details].

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Accessibility

Assignee

Sihui Liu

Reported

2020-11-30 14:42 PST

Modified

2020-12-10 09:47 PST History

CC List

12 users Show

URL

Keywords InRadar

Depends on

Blocks