Bug 219371

Summary: Implement audio capture for SpeechRecognition on iOS
Product: WebKit Reporter: Sihui Liu <sihui_liu>
Component: AccessibilityAssignee: Sihui Liu <sihui_liu>
Status: RESOLVED FIXED    
Severity: Normal CC: annulen, eric.carlson, ews-watchlist, glenn, gyuyoung.kim, jer.noble, mkwst, philipj, ryuan.choi, sergio, webkit-bug-importer, youennf
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=219745
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
ews-feeder: commit-queue-
Patch
none
Patch
none
Patch
ews-feeder: commit-queue-
Patch
ews-feeder: commit-queue-
Patch
none
Patch
ews-feeder: commit-queue-
Patch
ews-feeder: commit-queue-
Patch
none
Patch
none
Patch
ews-feeder: commit-queue-
Patch
ews-feeder: commit-queue-
Patch
none
Fix MediaStream tests
none
Patch
none
Patch
ews-feeder: commit-queue-
Patch
ews-feeder: commit-queue-
Patch
none
Patch
none
Patch
none
Patch
none
Patch for landing
none
Patch for landing
none
Patch for landing none

Description Sihui Liu 2020-11-30 14:42:44 PST
...
Comment 1 Radar WebKit Bug Importer 2020-11-30 14:42:54 PST
<rdar://problem/71824049>
Comment 2 Sihui Liu 2020-11-30 14:46:27 PST Comment hidden (obsolete)
Comment 3 Sihui Liu 2020-11-30 18:04:15 PST Comment hidden (obsolete)
Comment 4 Sihui Liu 2020-11-30 18:22:42 PST Comment hidden (obsolete)
Comment 5 Sihui Liu 2020-11-30 20:29:32 PST Comment hidden (obsolete)
Comment 6 Sihui Liu 2020-11-30 20:43:42 PST Comment hidden (obsolete)
Comment 7 Sihui Liu 2020-12-01 08:57:18 PST Comment hidden (obsolete)
Comment 8 Sihui Liu 2020-12-01 09:06:35 PST Comment hidden (obsolete)
Comment 9 Sihui Liu 2020-12-01 09:17:05 PST Comment hidden (obsolete)
Comment 10 Sihui Liu 2020-12-01 10:00:55 PST Comment hidden (obsolete)
Comment 11 Sihui Liu 2020-12-01 10:39:47 PST Comment hidden (obsolete)
Comment 12 Sihui Liu 2020-12-01 14:28:38 PST Comment hidden (obsolete)
Comment 13 Sihui Liu 2020-12-01 14:33:00 PST Comment hidden (obsolete)
Comment 14 Sihui Liu 2020-12-01 23:23:49 PST Comment hidden (obsolete)
Comment 15 Sihui Liu 2020-12-01 23:35:01 PST
Created attachment 415196 [details]
Patch
Comment 16 Sihui Liu 2020-12-02 09:06:02 PST
Win bot failure should be irrelevant
Comment 17 youenn fablet 2020-12-02 14:06:13 PST
Comment on attachment 415196 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=415196&action=review

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:60
> +    return RealtimeMediaSourceCenter::singleton().audioCaptureFactory().createAudioCaptureSource(captureDevice, { }, { });

In case we do capture in GPU process, I think it will work out as follows:
- GPU process proxy capture audio
- Web process receives audio
- Web process forwards audio to UIProcess.
In practice though, this might not work well due to the sandbox extensions that GPU process proxy will need to have.
I think we will need to do direct UIProcess<->UIProcess communication.
It seems overall that the current classes are IPC::Connection based so they should be adaptable for that purpose.
It is worth keeping that in mind in that patch though.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91
> +            if (isMainThread()) {

We should not be in main thread there?
I guess we can add a FIXME somewhere. If the issue is IPC, we should add a message thread receiver and dispatch IPC messages to a specific dispatch queue.
We can leave that to a followup though.

By the way, why is it needed for the state update callback to be called synchronously here.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:103
> +            if (isMainThread()) {

Ditto

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:115
> +    m_source = makeUnique<SpeechRecognitionCaptureSource>(*m_clientIdentifier, WTFMove(dataCallback), WTFMove(stateUpdateCallback), result.source());

It seems simpler to just always have a m_realtimeMediaSourceCreateFunction.
Or we can simply pass a Ref<RealtimeMediaSource> directly in start method or in SpeechRecognizer constructor.

> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSource.h:39
> +class SpeechRecognitionRemoteRealtimeMediaSource : public WebCore::RealtimeMediaSource {

This code is probably very similar to UserMediaCaptureManagerProxy::SourceProxy and its counter part RemoteCaptureSampleManager.
Could we try to share code here?
Maybe we could have RemoteCaptureSampleManager handle capture failed/stopped to unify both classes.

> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:51
> +class SpeechRecognitionRemoteRealtimeMediaSourceManager : public CanMakeWeakPtr<SpeechRecognitionRemoteRealtimeMediaSourceManager>, public IPC::MessageReceiver, private IPC::MessageSender {

Can we try to see whether we can reuse code from RemoteCaptureSampleManager?

> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:54
> +    SpeechRecognitionRemoteRealtimeMediaSourceManager(Ref<IPC::Connection>&&);

explicit

> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:64
> +    void remoteSourceStopped(WebCore::RealtimeMediaSourceIdentifier);

These methods can probably be private.

> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:77
> +    HashMap<WebCore::RealtimeMediaSourceIdentifier, SpeechRecognitionRemoteRealtimeMediaSource*> m_sources;

This seems safe given SpeechRecognitionRemoteRealtimeMediaSource makes sure to unregister itself in destructor.
I would still go with a HashMap of WeakPtr though for extra safety.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:97
> +            m_recognizer->setRealtimeMediaSourceCreateFunction([this](auto& device) {

It seems this should go in constructor of recogniser.

> Source/WebKit/UIProcess/SpeechRecognitionServer.h:61
> +    void setRealtimeMediaSourceCreateFunction(WebCore::SpeechRecognizer::RealtimeMediaSourceCreateFunction&& function) { m_realtimeMediaSourceCreateFunction = WTFMove(function); }

It seems this could go in constructor since it will not change over time.

> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.cpp:52
> +class SpeechRecognitionRealtimeMediaSourceManager::RealtimeMediaSource

It seems we should try to share code with UserMediaCaptureManagerProxy::SourceProxy

> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.messages.in:34
> +    RevokeSandboxExtensions()

Maybe we can do it another patch, but it would be good to have one place where we handle sandbox extension granting and revoking for both getUserMedia and recogniser.
Comment 18 Sihui Liu 2020-12-02 16:54:20 PST
Comment on attachment 415196 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=415196&action=review

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:60
>> +    return RealtimeMediaSourceCenter::singleton().audioCaptureFactory().createAudioCaptureSource(captureDevice, { }, { });
> 
> In case we do capture in GPU process, I think it will work out as follows:
> - GPU process proxy capture audio
> - Web process receives audio
> - Web process forwards audio to UIProcess.
> In practice though, this might not work well due to the sandbox extensions that GPU process proxy will need to have.
> I think we will need to do direct UIProcess<->UIProcess communication.
> It seems overall that the current classes are IPC::Connection based so they should be adaptable for that purpose.
> It is worth keeping that in mind in that patch though.

Yes, I think direct UI - GPU process connection is the way to go. We probably just need to set the correct connection in SpeechRecognitionRemoteRealtimeMediaSourceManager.  
Current implementation will return error if capture in GPU process is enabled on iOS.

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91
>> +            if (isMainThread()) {
> 
> We should not be in main thread there?
> I guess we can add a FIXME somewhere. If the issue is IPC, we should add a message thread receiver and dispatch IPC messages to a specific dispatch queue.
> We can leave that to a followup though.
> 
> By the way, why is it needed for the state update callback to be called synchronously here.

Yes, this is because we handle the remoteAudioSamplesAvailable on main thread.

It does not need to be called synchronously I guess (did not break existing tests); just seems unnecessary to schedule a future task if we already on the main thread.

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:103
>> +            if (isMainThread()) {
> 
> Ditto

Ditto

>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:115
>> +    m_source = makeUnique<SpeechRecognitionCaptureSource>(*m_clientIdentifier, WTFMove(dataCallback), WTFMove(stateUpdateCallback), result.source());
> 
> It seems simpler to just always have a m_realtimeMediaSourceCreateFunction.
> Or we can simply pass a Ref<RealtimeMediaSource> directly in start method or in SpeechRecognizer constructor.

Okay, will change.

>> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSource.h:39
>> +class SpeechRecognitionRemoteRealtimeMediaSource : public WebCore::RealtimeMediaSource {
> 
> This code is probably very similar to UserMediaCaptureManagerProxy::SourceProxy and its counter part RemoteCaptureSampleManager.
> Could we try to share code here?
> Maybe we could have RemoteCaptureSampleManager handle capture failed/stopped to unify both classes.

Okay, will try. Seems we can move the code from different processes to Shared/.

>> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:51
>> +class SpeechRecognitionRemoteRealtimeMediaSourceManager : public CanMakeWeakPtr<SpeechRecognitionRemoteRealtimeMediaSourceManager>, public IPC::MessageReceiver, private IPC::MessageSender {
> 
> Can we try to see whether we can reuse code from RemoteCaptureSampleManager?

Okay.

>> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:64
>> +    void remoteSourceStopped(WebCore::RealtimeMediaSourceIdentifier);
> 
> These methods can probably be private.

Sure.

>> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:77
>> +    HashMap<WebCore::RealtimeMediaSourceIdentifier, SpeechRecognitionRemoteRealtimeMediaSource*> m_sources;
> 
> This seems safe given SpeechRecognitionRemoteRealtimeMediaSource makes sure to unregister itself in destructor.
> I would still go with a HashMap of WeakPtr though for extra safety.

Will change to use WeakPtr.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:97
>> +            m_recognizer->setRealtimeMediaSourceCreateFunction([this](auto& device) {
> 
> It seems this should go in constructor of recogniser.

Sure, made it separate for the macro.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.h:61
>> +    void setRealtimeMediaSourceCreateFunction(WebCore::SpeechRecognizer::RealtimeMediaSourceCreateFunction&& function) { m_realtimeMediaSourceCreateFunction = WTFMove(function); }
> 
> It seems this could go in constructor since it will not change over time.

Okay.

>> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.cpp:52
>> +class SpeechRecognitionRealtimeMediaSourceManager::RealtimeMediaSource
> 
> It seems we should try to share code with UserMediaCaptureManagerProxy::SourceProxy

Will try.

>> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.messages.in:34
>> +    RevokeSandboxExtensions()
> 
> Maybe we can do it another patch, but it would be good to have one place where we handle sandbox extension granting and revoking for both getUserMedia and recogniser.

Yes, getUserMedia currently has two places (UserMediaProcessManager and UserMediaPermissionRequestManagerProxy) to grant sandbox extensions and may also be simplified by then.
Comment 19 Sihui Liu 2020-12-03 16:12:55 PST Comment hidden (obsolete)
Comment 20 Sihui Liu 2020-12-03 17:53:14 PST Comment hidden (obsolete)
Comment 21 Sihui Liu 2020-12-03 18:03:18 PST Comment hidden (obsolete)
Comment 22 Sihui Liu 2020-12-03 21:55:49 PST Comment hidden (obsolete)
Comment 23 Sihui Liu 2020-12-04 02:25:39 PST
Created attachment 415397 [details]
Patch
Comment 24 youenn fablet 2020-12-04 07:18:02 PST
Comment on attachment 415397 [details]
Patch

Some getDisplayMedia tests seem to be failing.
getDisplayMedia capture in MacOS happens in UIProcess, WebProcess receiving the video samples over IPC.

View in context: https://bugs.webkit.org/attachment.cgi?id=415397&action=review

> Source/WebKit/ChangeLog:149
> +2020-12-03  Sihui Liu  <sihui_liu@apple.com>

double entry.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91
> +            if (isMainThread()) {

Since they are not needed, let's just not add these switches. Ditto below.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:106
> +    if (m_realtimeMediaSourceCreateFunction) {

I would make this callback mandatory so that we simplify this code path to only handle error cases or have a source.
In the future, we will always capture in GPU process anyway.

> Source/WebKit/UIProcess/SpeechRecognitionServer.h:61
> +    using RealtimeMediaSourceCreateFunction = Function<RefPtr<WebCore::RealtimeMediaSource>(const WebCore::CaptureDevice&)>;

We could probably use something like SpeechRecognitionErrorOr<Ref<WebCore::RealtimeMediaSource>>

> Source/WebKit/WebProcess/cocoa/UserMediaCaptureManager.cpp:48
> +    : RemoteRealtimeMediaSourceManager(nullptr, false)

These values nullptr and false are not very clear.
Comment 25 Sihui Liu 2020-12-04 10:05:29 PST Comment hidden (obsolete)
Comment 26 Sihui Liu 2020-12-04 10:14:03 PST
Created attachment 415425 [details]
Patch
Comment 27 Sihui Liu 2020-12-04 10:20:53 PST
(In reply to youenn fablet from comment #24)
> Comment on attachment 415397 [details]
> Patch
> 
> Some getDisplayMedia tests seem to be failing.
> getDisplayMedia capture in MacOS happens in UIProcess, WebProcess receiving
> the video samples over IPC.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=415397&action=review
> 
> > Source/WebKit/ChangeLog:149
> > +2020-12-03  Sihui Liu  <sihui_liu@apple.com>
> 
> double entry.

Removed!

> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91
> > +            if (isMainThread()) {
> 
> Since they are not needed, let's just not add these switches. Ditto below.

Sure.

> 
> > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:106
> > +    if (m_realtimeMediaSourceCreateFunction) {
> 
> I would make this callback mandatory so that we simplify this code path to
> only handle error cases or have a source.
> In the future, we will always capture in GPU process anyway.

Made m_realtimeMediaSourceCreateFunction mandatory.

> 
> > Source/WebKit/UIProcess/SpeechRecognitionServer.h:61
> > +    using RealtimeMediaSourceCreateFunction = Function<RefPtr<WebCore::RealtimeMediaSource>(const WebCore::CaptureDevice&)>;
> 
> We could probably use something like
> SpeechRecognitionErrorOr<Ref<WebCore::RealtimeMediaSource>>

I used WebCore::CaptureSourceOrError.

> 
> > Source/WebKit/WebProcess/cocoa/UserMediaCaptureManager.cpp:48
> > +    : RemoteRealtimeMediaSourceManager(nullptr, false)
> 
> These values nullptr and false are not very clear.

I agree, I've changed bool to enum class ShouldHandleSandboxExtensions. I also removed RemoteRealtimeMediaSourceManager from UserMediaCaptureManager, as this patch is getting bigger and harder to debug with specific code change for UserMedia.

Will do replace/change existing classes UserMediaCaptureManager and UserMediaCaptureManagerProxy with shared RemoteRemoteRealtimeMediaSourceManager and RemoteRealtimeMediaSourceManager in a separate patch. What do you think?
Comment 28 Sihui Liu 2020-12-04 10:36:08 PST
Created attachment 415428 [details]
Patch
Comment 29 Sihui Liu 2020-12-04 12:07:06 PST
Created attachment 415441 [details]
Fix MediaStream tests
Comment 30 youenn fablet 2020-12-04 13:40:34 PST
Comment on attachment 415441 [details]
Fix MediaStream tests

View in context: https://bugs.webkit.org/attachment.cgi?id=415441&action=review

> Source/WTF/ChangeLog:9
> +        Turn on WebSpeech on iOS simulator by default for testing.

Not very clear why we need this.
We do not need them for WTR and for regular testing, we have the UI for that, or we can compile switched on locally.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:65
> +    m_impl = makeUnique<SpeechRecognitionCaptureSourceImpl>(clientIdentifier, WTFMove(dataCallback), WTFMove(stateUpdateCallback), WTFMove(source));

Could be written as:
   : m_impl(makeUnique<>(...))

> Source/WebKit/ChangeLog:17
> +        Original WebKit::RemoteRealtimeMediaSource is renamed to WebKit::WebRemoteRealtimeMediaSource. We should

What is missing from replacing WebRemoteRealtimeMediaSource by the new RemoteRealtimeMediaSource?
Maybe we should first do this refactoring as preliminary patches? That would make sure we have code that is clear at any point in time.
Also, it is difficult to see how the new code is close to replacing the old one.
If that is too much work or too difficult, maybe we should revert to having specific speech classes for now.

If we want to continue in that direction, ObsoleteRemoteRealtimeMediaSource might be a better name since that is the intent I think.

> Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:53
> +    : private WebCore::RealtimeMediaSource::Observer

There are probably a lot of WebCore::
We could be doing using namespace WebCore.

> Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:184
> +    int64_t m_numberOfFrames { 0 };

Should be uint64_t probably

> Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:237
> +    case WebCore::CaptureDevice::DeviceType::Microphone:

This code is mostly dealing with audio (not processing video samples or camera sandbox extensions).
Let's ASSERT_NOT_REACHED for the other cases.
We could also remove orientation code as well for now.

> Source/WebKit/Shared/media/RealtimeMediaSourceManager.h:49
> +    RealtimeMediaSourceManager(IPC::Connection&);

explicit, could take a Ref<IPC::Connection>&&.

> Source/WebKit/Shared/media/RealtimeMediaSourceManager.h:78
> +    HashMap<WebCore::RealtimeMediaSourceIdentifier, std::unique_ptr<RealtimeMediaSource>> m_sources;

RealtimeMediaSource is not a great name since it makes think this is some kind of WebCore::RealtimeMediaSource which are ref counted.
Let's use a shorter name like Source.

> Source/WebKit/UIProcess/SpeechRecognitionServer.h:62
> +    void setRealtimeMediaSourceCreateFunction(RealtimeMediaSourceCreateFunction&& function) { m_realtimeMediaSourceCreateFunction = WTFMove(function); }

Could be in SpeechRecognitionServer constructor.

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1732
> +    speechRecognitionServer->setRealtimeMediaSourceCreateFunction([weakThis = makeWeakPtr(this), weakPage = makeWeakPtr(targetPage)]() {

If we move it to constructor, we could also make the lambda a method of WebPageProxy to keep the lambda small enough.

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1736
> +        if (!weakPage || weakPage->preferences().captureAudioInGPUProcessEnabled())

I would move the weakPage check with weakThis check.
And add a "not implemented for GPU process" specific message.
 Also, maybe only the weakPage is actually necessary since the page proxy keeps a ref to its process proxy.

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1744
> +        auto& sourceManager = weakThis->ensureRemoteRealtimeMediaSourceManager();

sourceManager is only used once

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1768
> +        addMessageReceiver(Messages::RemoteRealtimeMediaSourceManager::messageReceiverName(), *m_remoteRealtimeMediaSourceManager);

It is probably safer to remove the message receiver at destruction time.
Comment 31 Sihui Liu 2020-12-04 14:16:48 PST
(In reply to youenn fablet from comment #30)
> Comment on attachment 415441 [details]
> Fix MediaStream tests
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=415441&action=review
> 
> > Source/WTF/ChangeLog:9
> > +        Turn on WebSpeech on iOS simulator by default for testing.
> 
> Not very clear why we need this.
> We do not need them for WTR and for regular testing, we have the UI for
> that, or we can compile switched on locally.

Okay, will revert.

> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:65
> > +    m_impl = makeUnique<SpeechRecognitionCaptureSourceImpl>(clientIdentifier, WTFMove(dataCallback), WTFMove(stateUpdateCallback), WTFMove(source));
> 
> Could be written as:
>    : m_impl(makeUnique<>(...))
> 
> > Source/WebKit/ChangeLog:17
> > +        Original WebKit::RemoteRealtimeMediaSource is renamed to WebKit::WebRemoteRealtimeMediaSource. We should
> 
> What is missing from replacing WebRemoteRealtimeMediaSource by the new
> RemoteRealtimeMediaSource?

Replacing WebRemoteRealtimeMediaSource requires change in UserMediaCaptureManager and RemoteRealtimeMediaSourceManager; and possibly some behavior changes that need to verify by testing. Then I found the refactoring work overweighs what I want to implement in this patch.

> Maybe we should first do this refactoring as preliminary patches? That would
> make sure we have code that is clear at any point in time.
> Also, it is difficult to see how the new code is close to replacing the old
> one.

I was thinking about replacing UserMediaCaptureManagerProxy and UserMediaCaptureManager functionalities with RealtimeMediaSourceManager and RemoteRealtimeMediaSourceManager, not only the RemoteRealtimeMediaSource.

> If that is too much work or too difficult, maybe we should revert to having
> specific speech classes for now.

Will do!

> 
> If we want to continue in that direction, ObsoleteRemoteRealtimeMediaSource
> might be a better name since that is the intent I think.
> 
> > Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:53
> > +    : private WebCore::RealtimeMediaSource::Observer
> 
> There are probably a lot of WebCore::
> We could be doing using namespace WebCore.

Okay.

> 
> > Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:184
> > +    int64_t m_numberOfFrames { 0 };
> 
> Should be uint64_t probably

Will change.

> 
> > Source/WebKit/Shared/media/RealtimeMediaSourceManager.cpp:237
> > +    case WebCore::CaptureDevice::DeviceType::Microphone:
> 
> This code is mostly dealing with audio (not processing video samples or
> camera sandbox extensions).
> Let's ASSERT_NOT_REACHED for the other cases.
> We could also remove orientation code as well for now.

Will revert to audio-only and speechrecognition-only version.

> 
> > Source/WebKit/Shared/media/RealtimeMediaSourceManager.h:49
> > +    RealtimeMediaSourceManager(IPC::Connection&);
> 
> explicit, could take a Ref<IPC::Connection>&&.

Sure.

> 
> > Source/WebKit/Shared/media/RealtimeMediaSourceManager.h:78
> > +    HashMap<WebCore::RealtimeMediaSourceIdentifier, std::unique_ptr<RealtimeMediaSource>> m_sources;
> 
> RealtimeMediaSource is not a great name since it makes think this is some
> kind of WebCore::RealtimeMediaSource which are ref counted.
> Let's use a shorter name like Source.

Okay.

> 
> > Source/WebKit/UIProcess/SpeechRecognitionServer.h:62
> > +    void setRealtimeMediaSourceCreateFunction(RealtimeMediaSourceCreateFunction&& function) { m_realtimeMediaSourceCreateFunction = WTFMove(function); }
> 
> Could be in SpeechRecognitionServer constructor.

Will move.

> 
> > Source/WebKit/UIProcess/WebProcessProxy.cpp:1732
> > +    speechRecognitionServer->setRealtimeMediaSourceCreateFunction([weakThis = makeWeakPtr(this), weakPage = makeWeakPtr(targetPage)]() {
> 
> If we move it to constructor, we could also make the lambda a method of
> WebPageProxy to keep the lambda small enough.

I guess you mean WebProcessProxy, sure!

> 
> > Source/WebKit/UIProcess/WebProcessProxy.cpp:1736
> > +        if (!weakPage || weakPage->preferences().captureAudioInGPUProcessEnabled())
> 
> I would move the weakPage check with weakThis check.
> And add a "not implemented for GPU process" specific message.
>  Also, maybe only the weakPage is actually necessary since the page proxy
> keeps a ref to its process proxy.

Will change.

> 
> > Source/WebKit/UIProcess/WebProcessProxy.cpp:1744
> > +        auto& sourceManager = weakThis->ensureRemoteRealtimeMediaSourceManager();
> 
> sourceManager is only used once

Will remove.

> 
> > Source/WebKit/UIProcess/WebProcessProxy.cpp:1768
> > +        addMessageReceiver(Messages::RemoteRealtimeMediaSourceManager::messageReceiverName(), *m_remoteRealtimeMediaSourceManager);
> 
> It is probably safer to remove the message receiver at destruction time.

Sure.
Comment 32 Sihui Liu 2020-12-04 17:04:27 PST Comment hidden (obsolete)
Comment 33 Sihui Liu 2020-12-04 20:39:36 PST
Created attachment 415483 [details]
Patch
Comment 34 Sihui Liu 2020-12-04 21:51:20 PST Comment hidden (obsolete)
Comment 35 Sihui Liu 2020-12-04 22:19:22 PST Comment hidden (obsolete)
Comment 36 youenn fablet 2020-12-07 06:16:57 PST
Comment on attachment 415486 [details]
Patch

Some nits below, some tests would be good.

View in context: https://bugs.webkit.org/attachment.cgi?id=415486&action=review

> Source/WebCore/ChangeLog:9
> +        Add a way to set RealtimeMediaSource member of SpeechRecognitionCaptureSourceImpl.

We should be able to add some tests with mock audio source.
Can we add/enable some tests?

> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.cpp:57
> +    send(Messages::SpeechRecognitionRealtimeMediaSourceManager::AddSource(identifier, captureDevice));

I would go with createSource/deleteSource instead of addSource/removeSource (for both methods and IPC messages).

> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:54
> +    SpeechRecognitionRemoteRealtimeMediaSourceManager(Ref<IPC::Connection>&&);

explicit

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1754
> +        return createRealtimeMediaSourceForSpeechRecognition();

If we make ensureSpeechRecognitionRemoteRealtimeMediaSourceManager public, then we could move createRealtimeMediaSourceForSpeechRecognition in WebPageProxy and write the lambda as:
if (!weakPage)
    return error;
return weakPage? weakPage->createRealtimeMediaSourceForSpeechRecognition() : CaptureSourceOrError { "Page is invalid" };

That will allow removing capturing 'this' in the lambda.

> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:41
> +class SpeechRecognitionRealtimeMediaSourceManager : public CanMakeWeakPtr<SpeechRecognitionRealtimeMediaSourceManager>, public IPC::MessageReceiver, private IPC::MessageSender {

final?

> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:44
> +    SpeechRecognitionRealtimeMediaSourceManager(Ref<IPC::Connection>&&);

explicit

> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:59
> +    void didReceiveMessage(IPC::Connection&, IPC::Decoder&) override;

final
Comment 37 Sihui Liu 2020-12-07 11:43:18 PST
Comment on attachment 415486 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=415486&action=review

>> Source/WebCore/ChangeLog:9
>> +        Add a way to set RealtimeMediaSource member of SpeechRecognitionCaptureSourceImpl.
> 
> We should be able to add some tests with mock audio source.
> Can we add/enable some tests?

Hmm it seems using remote source or local source have no difference in generating events for SpeechRecognition, so this change does not affect the behavior of existing tests. 
What tests do you think should be added for this?

>> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.cpp:57
>> +    send(Messages::SpeechRecognitionRealtimeMediaSourceManager::AddSource(identifier, captureDevice));
> 
> I would go with createSource/deleteSource instead of addSource/removeSource (for both methods and IPC messages).

Will change.

>> Source/WebKit/UIProcess/SpeechRecognitionRemoteRealtimeMediaSourceManager.h:54
>> +    SpeechRecognitionRemoteRealtimeMediaSourceManager(Ref<IPC::Connection>&&);
> 
> explicit

Okay.

>> Source/WebKit/UIProcess/WebProcessProxy.cpp:1754
>> +        return createRealtimeMediaSourceForSpeechRecognition();
> 
> If we make ensureSpeechRecognitionRemoteRealtimeMediaSourceManager public, then we could move createRealtimeMediaSourceForSpeechRecognition in WebPageProxy and write the lambda as:
> if (!weakPage)
>     return error;
> return weakPage? weakPage->createRealtimeMediaSourceForSpeechRecognition() : CaptureSourceOrError { "Page is invalid" };
> 
> That will allow removing capturing 'this' in the lambda.

Sure, will move it to WebPageProxy.

>> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:41
>> +class SpeechRecognitionRealtimeMediaSourceManager : public CanMakeWeakPtr<SpeechRecognitionRealtimeMediaSourceManager>, public IPC::MessageReceiver, private IPC::MessageSender {
> 
> final?

Okay.

>> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:44
>> +    SpeechRecognitionRealtimeMediaSourceManager(Ref<IPC::Connection>&&);
> 
> explicit

Will add.

>> Source/WebKit/WebProcess/Speech/SpeechRecognitionRealtimeMediaSourceManager.h:59
>> +    void didReceiveMessage(IPC::Connection&, IPC::Decoder&) override;
> 
> final

Will change.
Comment 38 youenn fablet 2020-12-07 11:47:33 PST
(In reply to Sihui Liu from comment #37)
> Comment on attachment 415486 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=415486&action=review
> 
> >> Source/WebCore/ChangeLog:9
> >> +        Add a way to set RealtimeMediaSource member of SpeechRecognitionCaptureSourceImpl.
> > 
> > We should be able to add some tests with mock audio source.
> > Can we add/enable some tests?

The audio session category is changed to playrecord.
After this patch, this will be done in WebProcess, so a layout test could check this actually happens from internals API.
Comment 39 Sihui Liu 2020-12-07 11:49:22 PST
(In reply to youenn fablet from comment #38)
> (In reply to Sihui Liu from comment #37)
> > Comment on attachment 415486 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=415486&action=review
> > 
> > >> Source/WebCore/ChangeLog:9
> > >> +        Add a way to set RealtimeMediaSource member of SpeechRecognitionCaptureSourceImpl.
> > > 
> > > We should be able to add some tests with mock audio source.
> > > Can we add/enable some tests?
> 
> The audio session category is changed to playrecord.
> After this patch, this will be done in WebProcess, so a layout test could
> check this actually happens from internals API.

I see, will add.
Comment 40 Sihui Liu 2020-12-07 14:09:25 PST
Created attachment 415582 [details]
Patch
Comment 41 Sihui Liu 2020-12-07 14:10:15 PST
Created attachment 415583 [details]
Patch
Comment 42 youenn fablet 2020-12-08 01:02:04 PST
Comment on attachment 415583 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=415583&action=review

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:119
>      m_recognizer->start(clientIdentifier);

This is a bit weird to have the return here and some code below.
I think we should rewrite it a bit:
- Remove SpeechRecognizer::start(SpeechRecognitionConnectionClientIdentifier identifier)
- Add a #else here that, in case ENABLE(MEDIA_STREAM) is not defined will send back a SpeechRecognitionErrorType::AudioCapture error.

We should probably at some point, either change compilation flags to enable RealtimeMediaSource if ENABLE(MEDIA_STREAM) && ENABLE(SPEECH_API), or just disable speech api if MEDIA_STREAM is not on.
That part should be left to a follow-up patch.

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1718
>          return;

We should probably update m_speechRecognitionServerMap after getting the targetPage.

> LayoutTests/fast/speechrecognition/ios/audio-capture.html:20
> +    shouldBeEqualToString("internals.audioSessionCategory()", "PlayAndRecord");

Can we add a negative check before, for instance in start event handler?
Comment 43 youenn fablet 2020-12-08 01:02:42 PST
Please check api-mac before landing.
Comment 44 Sihui Liu 2020-12-08 17:21:38 PST
Created attachment 415693 [details]
Patch
Comment 45 Sihui Liu 2020-12-08 22:47:58 PST
Created attachment 415719 [details]
Patch for landing
Comment 46 Sihui Liu 2020-12-08 23:16:51 PST
Created attachment 415722 [details]
Patch for landing
Comment 47 EWS 2020-12-08 23:51:29 PST
commit-queue failed to commit attachment 415722 [details] to WebKit repository.
Comment 48 Sihui Liu 2020-12-08 23:55:59 PST
Created attachment 415723 [details]
Patch for landing
Comment 49 EWS 2020-12-09 00:28:44 PST
Committed r270574: <https://trac.webkit.org/changeset/270574>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 415723 [details].