Bug 218476

Summary: Implement basic permission check for SpeechRecognition
Product: WebKit    Reporter: Sihui Liu <sihui_liu>
Component: New Bugs    Assignee: Sihui Liu <sihui_liu>
Status: RESOLVED FIXED    
Severity: Normal CC: annulen, benjamin, cdumez, cmarcelo, eric.carlson, ews-watchlist, glenn, gyuyoung.kim, jer.noble, jiewen_tan, philipj, ryuan.choi, sergio, webkit-bug-importer, youennf
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments (Description, Flags):
Patch (none)
Patch (none)
Patch (ews-feeder: commit-queue-)
Patch (ews-feeder: commit-queue-)
Patch (none)
Patch (none)
Patch (none)
Patch (none)
Patch (none)
Patch (ews-feeder: commit-queue-)
Patch (none)
Patch (ews-feeder: commit-queue-)
Patch (none)
Patch (none)
Patch (none)
Patch (none)
Patch (none)
Patch (none)
Patch (none)
Patch (none)
Patch (none)
Patch (none)
Patch for landing (none)

Description Sihui Liu 2020-11-02 19:42:33 PST
...
Comment 1 Sihui Liu 2020-11-02 23:40:13 PST Comment hidden (obsolete)
Comment 2 Sihui Liu 2020-11-04 17:21:50 PST Comment hidden (obsolete)
Comment 3 Sihui Liu 2020-11-04 22:30:53 PST Comment hidden (obsolete)
Comment 4 Sihui Liu 2020-11-04 23:22:17 PST Comment hidden (obsolete)
Comment 5 Sihui Liu 2020-11-04 23:42:42 PST Comment hidden (obsolete)
Comment 6 Sihui Liu 2020-11-04 23:44:32 PST Comment hidden (obsolete)
Comment 7 Sihui Liu 2020-11-06 01:12:51 PST Comment hidden (obsolete)
Comment 8 Sihui Liu 2020-11-06 08:11:35 PST
Created attachment 413434 [details]
Patch
Comment 9 youenn fablet 2020-11-06 08:52:31 PST
Comment on attachment 413434 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=413434&action=review

> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:33
> +class SpeechRecognitionRequest : public RefCounted<SpeechRecognitionRequest>, public CanMakeWeakPtr<SpeechRecognitionRequest> {

I am not sure we need SpeechRecognitionRequest to be ref counted.
Either we could use directly SpeechRecognitionRequestInfo or a unique_ptr<SpeechRecognitionRequest>

> Source/WebCore/Modules/speech/SpeechRecognitionUpdate.h:68
> +    WEBCORE_EXPORT explicit SpeechRecognitionUpdate(SpeechRecognitionConnectionClientIdentifier, SpeechRecognitionUpdateType, Content);

No need for explicit

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:89
> +    checkMicrophoneAccess();

I wonder what the exact flow should be for TCC and Safari prompts.
It could be something like:
- We check for speech recognition service access. If denied, we fail.
- We check whether there are microphones, whether they can be accessed, but without triggering TCC prompt at this point. If denied, we fail.
- We ask application (or user via a prompt) whether user is fine to give access to speech recognition service. If denied, we fail.
- We start capturing microphone at which point we might want to trigger the TCC prompt.

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:92
> +void SpeechRecognitionPermissionManager::checkMicrophoneAccess()

We should try to share code with UserMediaPermissionRequestManagerProxy::requestSystemValidation and requestAVCaptureAccessForMediaType

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:50
> +    Deque<Ref<SpeechRecognitionPermissionRequest>> m_requests;

Why should we have more than one request?
It seems that currently with your design, the server is doing the queueing, in which case we could simplify things by trying to move the Ref<SpeechRecognitionPermissionRequest> as a parameter to most of these methods or capture it in lambdas...

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:53
> +    Optional<bool> m_isSpeechRecognitionServiceAccessAuthorized;

We should probably do these checks only once but this is good for now

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:54
> +    HashMap<ClientOrigin, bool> m_originPermissionMap;

Given we might do a prompt, I think we will probably need something like a feature policy to allow delegation to third party iframes.
In which case we might only want top level origin in the map.
This is something we should probably do for getusermedia as well.

> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:36
> +    static Ref<SpeechRecognitionPermissionRequest> create(const WebCore::ClientOrigin& origin, CompletionHandler<void(bool)>&& completionHandler)

ClientOrigin&&

> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:41
> +    enum class Decision { Deny, Allow };

It seems the completion handler should take a Decision instead of a bool

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:119
> +        weakRequest->setIsWaitingForPermissionCheck(false);

I am not sure we need this flag.

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1715
> +    speechRecognitionServer.iterator->value = makeUnique<SpeechRecognitionServer>(makeRef(*connection()), identifier, [identifier, weakThis = makeWeakPtr(this)](const ClientOrigin& origin, CompletionHandler<void(bool)>&& completionHandler) {

Speech recognition server is per process and seems to enqueue requests.
Is that correct?
What happens if two pages in the same process make a request at the same time, with one request made by a backgrounded page trying to show a prompt, leaving the visible page to wait for eternity?
It seems the enqueuing for permission at least might best be done on a per page basis.

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.cpp:65
>      send(Messages::SpeechRecognitionServer::Start(info));

It is a bit sad to create a temporary SpeechRecognitionRequestInfo which will trigger some ref count churn.
Maybe we should instead pass all these parameters to Messages::SpeechRecognitionServer::Start.
Comment 10 Sihui Liu 2020-11-06 09:26:54 PST
(In reply to youenn fablet from comment #9)
> Comment on attachment 413434 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=413434&action=review
> 
> > Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:33
> > +class SpeechRecognitionRequest : public RefCounted<SpeechRecognitionRequest>, public CanMakeWeakPtr<SpeechRecognitionRequest> {
> 
> I am not sure we need SpeechRecognitionRequest to be ref counted.
> Either we could use directly SpeechRecognitionRequestInfo or a
> unique_ptr<SpeechRecognitionRequest>

Okay, unique_ptr<SpeechRecognitionRequest> sounds good.

> > Source/WebCore/Modules/speech/SpeechRecognitionUpdate.h:68
> > +    WEBCORE_EXPORT explicit SpeechRecognitionUpdate(SpeechRecognitionConnectionClientIdentifier, SpeechRecognitionUpdateType, Content);
> 
> No need for explicit

Will remove.

>
> > Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:89
> > +    checkMicrophoneAccess();
> 
> I wonder what the exact flow should be for TCC and Safari prompts.
> It could be something like:
> - We check for speech recognition service access. If denied, we fail.
> - We check whether there are microphones, whether they can be accessed, but
> without triggering TCC prompt at this point. If denied, we fail.
> - We ask application (or user via a prompt) whether user is fine to give
> access to speech recognition service. If denied, we fail.
> - We start capturing microphone at which point we might want to trigger the
> TCC prompt.
 
I see, so you are suggesting checking all existing states before any prompt. Will adjust the ordering.
 
> 
> > Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:92
> > +void SpeechRecognitionPermissionManager::checkMicrophoneAccess()
> 
> We should try to share code with
> UserMediaPermissionRequestManagerProxy::requestSystemValidation and
> requestAVCaptureAccessForMediaType

Will try merging.

> 
> > Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:50
> > +    Deque<Ref<SpeechRecognitionPermissionRequest>> m_requests;
> 
> Why should we have more than one request?
> It seems that currently with your design, the server is doing the queueing,
> in which case we could simplify things by trying to move the
> Ref<SpeechRecognitionPermissionRequest> as a parameter to most of these
> methods or capture it in lambdas...

In this design, the server does not cancel ongoing permission requests. So if the server starts handling a SpeechRecognitionRequest, cancels it, and handles the next request, there can be two SpeechRecognitionPermissionRequests in flight (since the decision is not made synchronously), and we want to handle SpeechRecognitionPermissionRequests one by one, not in parallel. Otherwise we would need to record which origins are in a pending permission state so we don't ask for the same origin twice.
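The one-at-a-time handling described here can be sketched in plain C++ (std::deque and std::function stand in for WTF's Deque/CompletionHandler; names are illustrative, not the actual WebKit code):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Illustrative permission request: an origin plus a decision callback.
struct PermissionRequest {
    std::string origin;
    std::function<void(bool)> completionHandler;
};

class PermissionRequestQueue {
public:
    void enqueue(PermissionRequest&& request)
    {
        m_requests.push_back(std::move(request));
        // Only start processing if nothing else is in flight; later
        // requests wait until the current one completes.
        if (m_requests.size() == 1)
            processNextRequest();
    }

    // Called once the user or application decides on the current request.
    void completeCurrentRequest(bool granted)
    {
        auto request = std::move(m_requests.front());
        m_requests.pop_front();
        request.completionHandler(granted);
        if (!m_requests.empty())
            processNextRequest();
    }

    size_t pendingCount() const { return m_requests.size(); }

private:
    void processNextRequest()
    {
        // A real implementation would show a prompt for m_requests.front().
    }

    std::deque<PermissionRequest> m_requests;
};
```

Serializing through a single deque avoids tracking which origins have a prompt pending, at the cost of later requests waiting behind the current one.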

> 
> > Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:53
> > +    Optional<bool> m_isSpeechRecognitionServiceAccessAuthorized;
> 
> We should probably do these checks only once but this is good for now
> 
> > Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:54
> > +    HashMap<ClientOrigin, bool> m_originPermissionMap;
> 
> Given we might do a prompt, I think we will probably need something like a
> feature policy to allow delegation to third party iframes.
> In which case we might only want top level origin in the map.
> This is something we should probably do for getusermedia as well.

You mean I should change the delegate callback to only ask for the top-level origin now?

> 
> > Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:36
> > +    static Ref<SpeechRecognitionPermissionRequest> create(const WebCore::ClientOrigin& origin, CompletionHandler<void(bool)>&& completionHandler)
> 
> ClientOrigin&&

Sure.

> 
> > Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:41
> > +    enum class Decision { Deny, Allow };
> 
> It seems the completion handler should take a Decision instead of a bool

Will change.

> 
> > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:119
> > +        weakRequest->setIsWaitingForPermissionCheck(false);
> 
> I am not sure we need this flag.

This flag is for checking the SpeechRecognitionRequest state. If the request is still in the permission check state, meaning no capturing or speech recognition has started yet, then we can start the next request immediately, without waiting for disconnection from the audio/recognition services or for final results to be generated. The corresponding SpeechRecognitionPermissionRequest will be advanced to its last stage.
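The distinction being made here can be illustrated with a minimal sketch (the stage names are hypothetical; the actual patch tracks this with the isWaitingForPermissionCheck flag):

```cpp
#include <cassert>

// Hypothetical request stages for illustration only.
enum class RequestStage { PermissionCheck, Capturing, Recognizing };

// While a request is still in its permission check, no audio capture or
// recognition service connection exists yet, so it can be abandoned and
// the next request started immediately, with no teardown to wait for.
bool canStartNextRequestImmediately(RequestStage stage)
{
    return stage == RequestStage::PermissionCheck;
}
```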

> 
> > Source/WebKit/UIProcess/WebProcessProxy.cpp:1715
> > +    speechRecognitionServer.iterator->value = makeUnique<SpeechRecognitionServer>(makeRef(*connection()), identifier, [identifier, weakThis = makeWeakPtr(this)](const ClientOrigin& origin, CompletionHandler<void(bool)>&& completionHandler) {
> 
> Speech recognition server is per process and seems to enqueue requests.
> Is that correct?
> What happens if two pages in the same process make a request at the same
> time, with one request made by a backgrounded page trying to show a prompt,
> leaving the visible page to wait for eternity?
> It seems the enqueuing for permission at least might best be done on a per
> page basis.

Nope, the server is per page now (its identifier is the page identifier), and the permission manager is also per page. Maybe we should handle requests from the server of the visible page.

> 
> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechRecognitionConnection.cpp:65
> >      send(Messages::SpeechRecognitionServer::Start(info));
> 
> It is a bit sad to create a temporary SpeechRecognitionRequestInfo which
> will trigger some ref count churn.
> Maybe we should instead pass all these parameters to
> Messages::SpeechRecognitionServer::Start.

Sure, will change.
Comment 11 Sihui Liu 2020-11-06 20:37:25 PST Comment hidden (obsolete)
Comment 12 Sihui Liu 2020-11-06 20:48:35 PST Comment hidden (obsolete)
Comment 13 Sihui Liu 2020-11-06 20:53:14 PST Comment hidden (obsolete)
Comment 14 Sihui Liu 2020-11-08 20:50:45 PST
Created attachment 413555 [details]
Patch
Comment 15 youenn fablet 2020-11-09 11:22:51 PST
> > I wonder what the exact flow should be for TCC and Safari prompts.
> > It could be something like:
> > - We check for speech recognition service access. If denied, we fail.
> > - We check whether there are microphones, whether they can be accessed, but
> > without triggering TCC prompt at this point. If denied, we fail.
> > - We ask application (or user via a prompt) whether user is fine to give
> > access to speech recognition service. If denied, we fail.
> > - We start capturing microphone at which point we might want to trigger the
> > TCC prompt.
>  
> I see, so you are suggesting checking all existing states before any prompt.
> Will adjust the ordering.

Eric mentioned that the TCC microphone prompt should probably be done before the application prompt.
I was fearing that user might say no to microphone while user might have 
I guess that is fine, as the Speech Service TCC prompt will happen first: if the user is interested, they should grant speech service access and then grant microphone access.
If the user is not interested, they will deny the speech service TCC prompt and will not have to see the microphone prompt.
Comment 16 Sihui Liu 2020-11-09 11:40:11 PST
(In reply to youenn fablet from comment #15)
> > > I wonder what the exact flow should be for TCC and Safari prompts.
> > > It could be something like:
> > > - We check for speech recognition service access. If denied, we fail.
> > > - We check whether there are microphones, whether they can be accessed, but
> > > without triggering TCC prompt at this point. If denied, we fail.
> > > - We ask application (or user via a prompt) whether user is fine to give
> > > access to speech recognition service. If denied, we fail.
> > > - We start capturing microphone at which point we might want to trigger the
> > > TCC prompt.
> >  
> > I see, so you are suggesting checking all existing states before any prompt.
> > Will adjust the ordering.
> 
> Eric mentioned that the TCC microphone prompt should probably be done before
> the application prompt.
> I was fearing that user might say no to microphone while user might have 
> I guess that is fine, as the Speech Service TCC prompt will happen first: if
> the user is interested, they should grant speech service access and then
> grant microphone access.
> If the user is not interested, they will deny the speech service TCC prompt
> and will not have to see the microphone prompt.

Okay, will change the ordering to be:
1. speech recognition TCC
2. microphone TCC
3. user/client permission
Comment 17 youenn fablet 2020-11-09 11:46:35 PST
> Okay, will change the ordering to be:
> 1. speech recognition TCC
> 2. microphone TCC
> 3. user/client permission

Right, something like:
1. Check speech TCC, check microphone TCC. Deny if any is denied
2. If speech or microphone is unknown, request with TCC prompt. Deny if any is denied.
3. user/client permission
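The three-step ordering agreed on in this comment could be sketched roughly as follows (a simplified illustration with hypothetical helper names, not the actual WebKit API; the prompt callbacks stand in for the real TCC and application prompts):

```cpp
#include <cassert>
#include <functional>

// Tri-state access as seen by the UI process before any prompting.
enum class AccessState { Denied, Unknown, Granted };

bool runPermissionFlow(AccessState speechTCC, AccessState microphoneTCC,
    const std::function<bool()>& promptSpeechTCC,
    const std::function<bool()>& promptMicrophoneTCC,
    const std::function<bool()>& promptUser)
{
    // 1. Deny outright if either TCC state is already denied.
    if (speechTCC == AccessState::Denied || microphoneTCC == AccessState::Denied)
        return false;

    // 2. Trigger TCC prompts only for undetermined states, speech first,
    // so an uninterested user never sees the microphone prompt.
    if (speechTCC == AccessState::Unknown && !promptSpeechTCC())
        return false;
    if (microphoneTCC == AccessState::Unknown && !promptMicrophoneTCC())
        return false;

    // 3. Finally ask the user/client for permission.
    return promptUser();
}
```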
Comment 18 youenn fablet 2020-11-09 12:07:38 PST
Comment on attachment 413555 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=413555&action=review

> Source/WebKit/UIProcess/API/C/WKPage.cpp:2114
> +        void decidePolicyForSpeechRecognitionPermissionRequest(WebPageProxy& page, API::SecurityOrigin& requestingOrigin, API::SecurityOrigin& topOrigin, CompletionHandler<void(bool)>&& completionHandler) final

Given we go with feature permission, we should do as if requestingOrigin == topOrigin, so no need to pass the requesting origin.

> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.h:37
> +void requestAVCaptureAccessForMediaTypes(Vector<AVMediaType>&&, CompletionHandler<void(bool authorized)>&&);

I am not sure having a Vector is great given we only have two options.

> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.h:38
> +Optional<Vector<AVMediaType>> checkAVCaptureAccessForMediaTypesAndFilter(const Vector<AVMediaType>& types);

Ditto.
I am not sure a Vector is great here. It might be simpler to directly return an enum.

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:48
> +void SpeechRecognitionPermissionManager::request(WebCore::ClientOrigin&& origin, CompletionHandler<void(SpeechRecognitionPermissionDecision)>&& completiontHandler)

It seems some of these methods could go in the cpp file.

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:98
> +        checkAndRequestUserPermissionForOrigin();

Should it be checkSpeechRecognitionServiceAccess()?

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:124
> +    checkSpeechRecognitionServiceAccess();

It seems a bit weird to have checkMicrophoneAccess() do checkSpeechRecognitionServiceAccess().
Maybe we should rename it or restructure a bit the code.

For instance, we could have something like:
void SpeechRecognitionPermissionManager::checkAccess(CompletionHandler<>&& callback)
// Do checks first.
auto speechServiceState = speechRecognitionServiceAccessState();
if (speechServiceState == Denied) {
   callback(Deny);
   return;
}
auto microphoneState = microphoneAccessState();
if (microphoneState == Denied) {
   callback(Deny);
   return;
}
// Request checks for undetermined case.
callback = microphoneState == Granted ? WTFMove(callback) : [callback = WTFMove(callback)](bool isGranted) {
    // request microphone TCC.
    if (!isGranted)
       return callback(Denied);
     requestMicrophone([callback = WTFMove(callback)](bool isGranted) { ... });
};

if (speechServiceState == Unknown) {
   requestSpeechService([callback = WTFMove(callback)](bool isGranted) { ... });
   return;
}
callback(granted);

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:186
> +    auto topOrigin =clientOrigin.topOrigin.securityOrigin();

s/=/= /

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:60
> +    HashMap<WebCore::SecurityOriginData, SpeechRecognitionPermissionDecision> m_originPermissionMap;

I am not sure we need this map at all.
For getUserMedia, we are deleting the permission manager whenever the top level document changes.
We should probably do so and not persist the permission from one page to another, even if they share the same origin.
We could for instance have a { Unknown, Denied, Granted } state and reset it each time the top level document is changing.
We could also reset it from Granted to Unknown based on a timer.
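The tri-state with reset-on-navigation and a timer-based downgrade suggested in this comment could look something like this sketch (plain C++ with illustrative names, not WebKit code):

```cpp
#include <cassert>
#include <chrono>

// { Unknown, Denied, Granted } state for the current top-level document.
enum class CachedDecision { Unknown, Denied, Granted };

class PagePermissionState {
public:
    using Clock = std::chrono::steady_clock;

    void setGranted(Clock::time_point now)
    {
        m_decision = CachedDecision::Granted;
        m_grantTime = now;
    }
    void setDenied() { m_decision = CachedDecision::Denied; }

    // Called whenever the top-level document changes, so the permission
    // never persists from one page to another.
    void resetForNavigation() { m_decision = CachedDecision::Unknown; }

    CachedDecision decision(Clock::time_point now) const
    {
        // Downgrade Granted back to Unknown after a fixed interval.
        if (m_decision == CachedDecision::Granted && now - m_grantTime > m_grantLifetime)
            return CachedDecision::Unknown;
        return m_decision;
    }

private:
    CachedDecision m_decision { CachedDecision::Unknown };
    Clock::time_point m_grantTime {};
    Clock::duration m_grantLifetime { std::chrono::minutes(1) }; // illustrative value
};
```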

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:109
> +    m_permissionChecker(request.clientOrigin(), [this, weakThis = makeWeakPtr(this), weakRequest = makeWeakPtr(request)](SpeechRecognitionPermissionDecision decision) {

The issue here is that a prompt in one page would remove the prompt in another page of the same origin.
We tend not to do that for getUserMedia: we do prompt in both pages. We do not prompt if the same page is doing several requests.
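The getUserMedia behavior described here (prompt in each page, even for the same origin, but never twice in one page) can be sketched with a per-page map (hypothetical types; the real code keys off WebKit page identifiers):

```cpp
#include <cassert>
#include <map>

using PageIdentifier = int;
enum class PromptState { Pending, Granted, Denied };

class PerPagePromptTracker {
public:
    // Returns true if a new prompt should be shown for this page: only the
    // first request in a page prompts; later ones reuse its state.
    bool shouldPrompt(PageIdentifier page)
    {
        return m_statePerPage.emplace(page, PromptState::Pending).second;
    }

    void recordDecision(PageIdentifier page, bool granted)
    {
        m_statePerPage[page] = granted ? PromptState::Granted : PromptState::Denied;
    }

private:
    // Keyed by page, not by origin, so two pages of the same origin each
    // get their own prompt.
    std::map<PageIdentifier, PromptState> m_statePerPage;
};
```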
Comment 19 Eric Carlson 2020-11-09 12:36:06 PST
Comment on attachment 413555 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=413555&action=review

> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.mm:72
> +Optional<Vector<AVMediaType>> checkAVCaptureAccessForMediaTypesAndFilter(const Vector<AVMediaType>& types)
> +{

I don't think it is ever correct for `types` to be empty, so I'd ASSERT that the list isn't empty here.

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:77
> +#if PLATFORM(MAC)
> +    if (currentProcessIsSandboxed() && !!sandbox_check(getpid(), "device-microphone", SANDBOX_FILTER_NONE))
> +        return false;
> +#endif

Don't you also need to check TCCAccessPreflight?

However, rather than doing this specifically for speech recognition, why not move UserMediaPermissionRequestManagerProxy::permittedToCaptureAudio (and UserMediaPermissionRequestManagerProxy::permittedToCaptureVideo while you're at it) into MediaPermissionUtilities.mm so the code can be shared?

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:115
> +        m_isMicrophoneAccessAuthorized = true;

I'm not sure this is right. How can capture work if we're not using mock devices and don't have AVCapture?

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:119
> +    if (m_isMicrophoneAccessAuthorized.hasValue() && !m_isMicrophoneAccessAuthorized.value()) {

Is it possible for m_isMicrophoneAccessAuthorized to not have a value at this point?

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:142
> +        m_isSpeechRecognitionServiceAccessAuthorized = true;

Ditto, how can speech recognition work if HAVE_SPEECHRECOGNIZER is false?

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:146
> +    if (m_isMicrophoneAccessAuthorized.hasValue() && !m_isMicrophoneAccessAuthorized.value()) {

I think you mean m_isSpeechRecognitionServiceAccessAuthorized, not m_isMicrophoneAccessAuthorized? Is it possible for it to not have a value at this point?

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:187
> +    bool shouldRequestMicrophoneAccessIfNeeded = !m_page.preferences().mockCaptureDevicesEnabled();
> +    auto decisionHandler = [this, weakThis = makeWeakPtr(*this), shouldRequestMicrophoneAccessIfNeeded](bool granted) {
> +        if (!weakThis)
> +            return;
> +
> +        m_originPermissionMap.set(m_requests.first()->origin().topOrigin, granted ? SpeechRecognitionPermissionDecision::Grant : SpeechRecognitionPermissionDecision::Deny);
> +        if (!granted) {
> +            completeCurrentRequest(SpeechRecognitionPermissionDecision::Deny);
> +            return;
> +        }
> +
> +        if (!shouldRequestMicrophoneAccessIfNeeded) {
> +            completeCurrentRequest(SpeechRecognitionPermissionDecision::Grant);
> +            return;
> +        }
> +
> +        requestMicrophoneAccessIfNeeded();
> +    };
> +
> +    auto requestingOrigin = clientOrigin.clientOrigin.securityOrigin();
> +    auto topOrigin =clientOrigin.topOrigin.securityOrigin();
> +    m_page.uiClient().decidePolicyForSpeechRecognitionPermissionRequest(m_page, API::SecurityOrigin::create(requestingOrigin.get()).get(), API::SecurityOrigin::create(topOrigin.get()).get(), WTFMove(decisionHandler));

I think the order of prompts here is wrong - I think it would be more logical to show the TCC prompts before the application prompt. Since this is ultimately about speech recognition, the microphone prompt only makes sense if the user allows the speech recognizer, so I suggest this order: speech recognizer TCC, microphone access TCC, application.

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:235
> +        auto decisionHandler = makeBlockPtr([mainThreadHandler = WTFMove(mainThreadHandler)](SFSpeechRecognizerAuthorizationStatus status) mutable {

Is it safe to not create and check a weak pointer to `this`?

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:118
> +            auto error = SpeechRecognitionError { SpeechRecognitionErrorType::NotAllowed, "permission check fails"_s };

I would change "permission check fails" to "Permission check failed" or "Failed permission check"
Comment 20 Sihui Liu 2020-11-09 13:47:58 PST
Comment on attachment 413555 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=413555&action=review

>> Source/WebKit/UIProcess/API/C/WKPage.cpp:2114
>> +        void decidePolicyForSpeechRecognitionPermissionRequest(WebPageProxy& page, API::SecurityOrigin& requestingOrigin, API::SecurityOrigin& topOrigin, CompletionHandler<void(bool)>&& completionHandler) final
> 
> Given we go with feature permission, we should do as if requestingOrigin == topOrigin, so no need to pass the requesting origin.

Sure, will remove requesting origin

>> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.h:37
>> +void requestAVCaptureAccessForMediaTypes(Vector<AVMediaType>&&, CompletionHandler<void(bool authorized)>&&);
> 
> I am not sure having a Vector is great given we only have two options.

I saw AVMediaType has 8 values, so I chose to use a vector. Considering we only use 2 values for now, maybe it's okay to make separate functions.

>> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.h:38
>> +Optional<Vector<AVMediaType>> checkAVCaptureAccessForMediaTypesAndFilter(const Vector<AVMediaType>& types);
> 
> Ditto.
> I am not sure a Vector is great here. It might be simpler to directly return an enum.

Okay, will change to enum.

>> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.mm:72
>> +{
> 
> I don't think it is ever correct for `types` to be empty, so I'd ASSERT that the list isn't empty here.

Will change the signature of this function to not use a Vector, based on Youenn's comment.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:48
>> +void SpeechRecognitionPermissionManager::request(WebCore::ClientOrigin&& origin, CompletionHandler<void(SpeechRecognitionPermissionDecision)>&& completiontHandler)
> 
> It seems some of these methods could go in the cpp file.

True, will move.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:77
>> +#endif
> 
> Don't you also need to check TCCAccessPreflight?
> 
> However, rather than doing this specifically for speech recognition, why not move UserMediaPermissionRequestManagerProxy::permittedToCaptureAudio (and UserMediaPermissionRequestManagerProxy::permittedToCaptureVideo while you're at it) into MediaPermissionUtilities.mm so the code can be shared?

I am not sure what TCCAccessPreflight does.

But moving UserMediaPermissionRequestManagerProxy::permittedToCaptureVideo into MediaPermissionUtilities.mm sounds good.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:98
>> +        checkAndRequestUserPermissionForOrigin();
> 
> Should it be checkSpeechRecognitionServiceAccess()?

We skip the checks for speech recognition or microphone access if using a mock capture device.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:115
>> +        m_isMicrophoneAccessAuthorized = true;
> 
> I'm not sure this is right. How can capture work if we're not using mock devices and don't have AVCapture?

I made the permission check pass by default on platforms without a speech recognition backend implementation, so we can test the APIs on all platforms (e.g. firing the End event correctly when stop() is called); otherwise the tests would fail on start().

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:119
>> +    if (m_isMicrophoneAccessAuthorized.hasValue() && !m_isMicrophoneAccessAuthorized.value()) {
> 
> Is it possible for m_isMicrophoneAccessAuthorized to not have a value at this point?

Yes, when microphoneAuthorizationStatus == AVAuthorizationStatusNotDetermined && hasDescriptionString.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:124
>> +    checkSpeechRecognitionServiceAccess();
> 
> It seems a bit weird to have checkMicrophoneAccess() do checkSpeechRecognitionServiceAccess().
> Maybe we should rename it or restructure a bit the code.
> 
> For instance, we could have something like:
> void SpeechRecognitionPermissionManager::checkAccess(CompletionHandler<>&& callback)
> // Do checks first.
> auto speechServiceState = speechRecognitionServiceAccessState();
> if (speechServiceState == Denied) {
>    callback(Deny);
>    return;
> }
> auto microphoneState = microphoneAccessState();
> if (microphoneState == Denied) {
>    callback(Deny);
>    return;
> }
> // Request checks for undetermined case.
> callback = microphoneState == Granted ? WTFMove(callback) : [callback = WTFMove(callback)](bool isGranted) {
>     // request microphone TCC.
>     if (!isGranted)
>        return callback(Denied);
>      requestMicrophone([callback = WTFMove(callback)](bool isGranted) { ... });
> };
> 
> if (speechServiceState == Unknown) {
>    requestSpeechService([callback = WTFMove(callback)](bool isGranted) { ... });
>    return;
> }
> callback(granted);

Okay, will update.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:142
>> +        m_isSpeechRecognitionServiceAccessAuthorized = true;
> 
> Ditto, how can speech recognition work if HAVE_SPEECHRECOGNIZER is false?

To make Speech APIs testable on platforms without a backend implementation of recognition.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:146
>> +    if (m_isMicrophoneAccessAuthorized.hasValue() && !m_isMicrophoneAccessAuthorized.value()) {
> 
> I think you mean m_isSpeechRecognitionServiceAccessAuthorized, not m_isMicrophoneAccessAuthorized? Is it possible for it to not have a value at this point?

Yes it should be m_isSpeechRecognitionServiceAccessAuthorized!

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:186
>> +    auto topOrigin =clientOrigin.topOrigin.securityOrigin();
> 
> s/=/= /

Will change.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:187
>> +    m_page.uiClient().decidePolicyForSpeechRecognitionPermissionRequest(m_page, API::SecurityOrigin::create(requestingOrigin.get()).get(), API::SecurityOrigin::create(topOrigin.get()).get(), WTFMove(decisionHandler));
> 
> I think the order of prompts here is wrong - I think it would be more logical to show the TCC prompts before the application prompt. Since this is ultimately about speech recognition, the microphone prompt only makes sense if the user allows the speech recognizer, so I suggest this order: speech recognizer TCC, microphone access TCC, application.

Youenn mentioned this too, will restructure this.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:235
>> +        auto decisionHandler = makeBlockPtr([mainThreadHandler = WTFMove(mainThreadHandler)](SFSpeechRecognizerAuthorizationStatus status) mutable {
> 
> Is it safe to not create and check a weak pointer to `this`?

Why? I think mainThreadHandler already checks weakThis.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:60
>> +    HashMap<WebCore::SecurityOriginData, SpeechRecognitionPermissionDecision> m_originPermissionMap;
> 
> I am not sure we need this map at all.
> For getUserMedia, we are deleting the permission manager whenever the top level document changes.
> We should probably do so and not persist the permission from one page to another, even if they share the same origin.
> We could for instance have a { Unknown, Denied, Granted } state and reset it each time the top level document is changing.
> We could also reset it from Granted to Unknown based on a timer.

I see, the permission state should be cleared when the page's document changes.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:118
>> +            auto error = SpeechRecognitionError { SpeechRecognitionErrorType::NotAllowed, "permission check fails"_s };
> 
> I would change "permission check fails" to "Permission check failed" or "Failed permission check"

Sure.
Comment 21 Radar WebKit Bug Importer 2020-11-09 19:43:17 PST
<rdar://problem/71222638>
Comment 22 Sihui Liu 2020-11-09 22:04:34 PST
Created attachment 413662 [details]
Patch
Comment 23 Sihui Liu 2020-11-09 22:14:35 PST
Created attachment 413664 [details]
Patch
Comment 24 Sihui Liu 2020-11-09 23:43:27 PST
Created attachment 413672 [details]
Patch
Comment 25 Sihui Liu 2020-11-10 07:13:11 PST
Created attachment 413693 [details]
Patch
Comment 26 Sihui Liu 2020-11-10 08:40:37 PST
Created attachment 413699 [details]
Patch
Comment 27 youenn fablet 2020-11-10 12:26:16 PST
Comment on attachment 413699 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=413699&action=review

> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:43
> +    ClientOrigin clientOrigin() const { return m_info.clientOrigin; }

could be const&

> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.mm:93
> +    if (type == MediaPermissionType::Audio) {

switch statement

> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.mm:106
> +    if (type == MediaPermissionType::Audio) {

Should be a switch statement

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:44
> +void SpeechRecognitionPermissionManager::startProcessingRequest()

It seems most of SpeechRecognitionPermissionManagerCocoa.mm could go to a cpp file (either SpeechRecognitionPermissionManagerCocoa.mm or SpeechRecognitionPermissionManager.cpp).

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:101
> +void SpeechRecognitionPermissionManager::requestSpeechRecognitionServiceAccess()

I would tend to move checkSpeechRecognitionServiceAccess and requestSpeechRecognitionServiceAccess as free function into MediaPermissionUtilities.h/.mm

> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:154
> +void SpeechRecognitionPermissionManager::requestUserPermission()

This function in particular would best be in the general cpp file.

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:39
> +        request->complete(SpeechRecognitionPermissionDecision::Deny);

I would expect reset to do that and clear m_requests

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:84
> +    ASSERT(!m_requests.isEmpty());

If there is a reset in between that clears requests, you might need to do a check here instead of an assert.

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:93
> +    m_userPermissionCheck = CheckResult::Unknown;

If you reset, you might also need to reset all queued requests.

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:60
> +    CheckResult m_speechRecognitionServiceCheck { CheckResult::Unknown };

I wonder whether a user might be able to change the TCC setting from Granted to Denied or Denied to Granted during the lifetime of a page.
In such a case, we might need to do these two checks for every request like we do for getUserMedia.

> LayoutTests/fast/speechrecognition/permission-error.html:10
> +    testRunner.setIsSpeechRecognitionPermissionGranted(false);

We could add some tests for the persistency.
Say do a request with permission granted, change it to deny but still able to proceed.
Maybe a layout test or API test showing that on page reload, we ask again the delegate.
Comment 28 Sihui Liu 2020-11-10 14:44:20 PST
Comment on attachment 413699 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=413699&action=review

>> Source/WebCore/Modules/speech/SpeechRecognitionRequest.h:43
>> +    ClientOrigin clientOrigin() const { return m_info.clientOrigin; }
> 
> could be const&

Okay.

>> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.mm:93
>> +    if (type == MediaPermissionType::Audio) {
> 
> switch statement

Okay.

>> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.mm:106
>> +    if (type == MediaPermissionType::Audio) {
> 
> Should be a switch statement

Sure.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:44
>> +void SpeechRecognitionPermissionManager::startProcessingRequest()
> 
> It seems most of SpeechRecognitionPermissionManagerCocoa.mm could go to a cpp file (either SpeechRecognitionPermissionManagerCocoa.mm or SpeechRecognitionPermissionManager.cpp).

Okay.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:101
>> +void SpeechRecognitionPermissionManager::requestSpeechRecognitionServiceAccess()
> 
> I would tend to move checkSpeechRecognitionServiceAccess and requestSpeechRecognitionServiceAccess as free function into MediaPermissionUtilities.h/.mm

Okay.

>> Source/WebKit/UIProcess/Cocoa/SpeechRecognitionPermissionManagerCocoa.mm:154
>> +void SpeechRecognitionPermissionManager::requestUserPermission()
> 
> This function in particular would best be in the general cpp file.

Okay.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:39
>> +        request->complete(SpeechRecognitionPermissionDecision::Deny);
> 
> I would expect reset to do that and clear m_requests

Will move to reset

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:84
>> +    ASSERT(!m_requests.isEmpty());
> 
> If there is a reset in between that clears requests, you might need to do a check here instead of an assert.

Hmm, in this case it seems easier to recreate the SpeechRecognitionPermissionManager than to reset it, because we might: append a request -> start the request -> reset -> append a new request (now the only request in the queue) -> receive the decision for the originally started request; then the wrong request would be completed.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:93
>> +    m_userPermissionCheck = CheckResult::Unknown;
> 
> If you reset, you might also need to reset all queued requests.

Will remove reset() and create a new SpeechRecognitionPermissionManager for resetting.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:60
>> +    CheckResult m_speechRecognitionServiceCheck { CheckResult::Unknown };
> 
> I wonder whether a user might be able to change the TCC setting from Granted to Denied or Denied to Granted during the lifetime of a page.
> In such a case, we might need to do these two checks for every request like we do for getUserMedia.

Looks like it's possible. There is a tccutil command line tool for resetting.
Will remove these variables.

>> LayoutTests/fast/speechrecognition/permission-error.html:10
>> +    testRunner.setIsSpeechRecognitionPermissionGranted(false);
> 
> We could add some tests for the persistency.
> Say do a request with permission granted, change it to deny but still able to proceed.
> Maybe a layout test or API test showing that on page reload, we ask again the delegate.

Okay, will add.
Comment 29 Sihui Liu 2020-11-10 20:36:07 PST
Created attachment 413766 [details]
Patch
Comment 30 Sihui Liu 2020-11-10 20:48:55 PST
Created attachment 413769 [details]
Patch
Comment 31 Sihui Liu 2020-11-10 22:26:41 PST
Created attachment 413782 [details]
Patch
Comment 32 youenn fablet 2020-11-11 03:12:54 PST
Comment on attachment 413782 [details]
Patch

Looks good to me.
Some questions below before r+ing it

View in context: https://bugs.webkit.org/attachment.cgi?id=413782&action=review

> Source/WebKit/UIProcess/Cocoa/UserMediaPermissionRequestManagerProxy.mm:77
> +    if (audioStatus == MediaPermissionResult::Denied) {

videoStatus.

> Source/WebKit/UIProcess/Cocoa/UserMediaPermissionRequestManagerProxy.mm:91
> +            });

requestAVCaptureAccessForType(MediaPermissionType::Video, WTFMove(completionHandler))

> Source/WebKit/UIProcess/Cocoa/UserMediaPermissionRequestManagerProxy.mm:99
> +        });

requestAVCaptureAccessForType(MediaPermissionType::Video, WTFMove(callback))

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:84
> +    if (!m_page.preferences().mockCaptureDevicesEnabled()) {

You could simplify this by setting m_microphoneCheck to Granted in case of m_page.preferences().mockCaptureDevicesEnabled()

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:121
> +    if (m_microphoneCheck == CheckResult::Unknown) {

I would probably make this a free function that would return a CheckResult.
Then you could write in SpeechRecognitionPermissionManager::startNextRequest()
m_microphoneCheck = computeMicrophoneAccess()
Ditto for checkSpeechRecognitionServiceAccess

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:42
> +    void reset();

No longer defined

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:62
> +    bool m_isUsingMockedCaptureDevice { false };

m_isUsingMockedCaptureDevice no longer needed

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:109
> +    m_permissionChecker(request.clientOrigin(), [this, weakThis = makeWeakPtr(this), weakRequest = makeWeakPtr(request)](SpeechRecognitionPermissionDecision decision) mutable {

AIUI, SpeechRecognitionServer is buffering all requests and processing them one at a time.
SpeechRecognitionPermissionManager is also doing the same.
One of the buffering might be unneeded if both objects have a one-to-one mapping.

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1726
> +        }

It seems like we should not do this search for each permission request.
The lambda could take a weakPage for instance.
Also, it is a bit weird to compare a pageId and a SpeechRecognitionServerIdentifier.
Maybe the pageId should be also sent through IPC?
Comment 34 Sihui Liu 2020-11-11 09:21:41 PST
Comment on attachment 413782 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=413782&action=review

>> Source/WebKit/UIProcess/Cocoa/UserMediaPermissionRequestManagerProxy.mm:77
>> +    if (audioStatus == MediaPermissionResult::Denied) {
> 
> videoStatus.

Nice catch!

>> Source/WebKit/UIProcess/Cocoa/UserMediaPermissionRequestManagerProxy.mm:91
>> +            });
> 
> requestAVCaptureAccessForType(MediaPermissionType::Video, WTFMove(completionHandler))

Sure.

>> Source/WebKit/UIProcess/Cocoa/UserMediaPermissionRequestManagerProxy.mm:99
>> +        });
> 
> requestAVCaptureAccessForType(MediaPermissionType::Video, WTFMove(callback))

Sure.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:84
>> +    if (!m_page.preferences().mockCaptureDevicesEnabled()) {
> 
> You could simplify this by setting m_microphoneCheck to Granted in case of m_page.preferences().mockCaptureDevicesEnabled()

Right, since we are resetting it for every request now.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:121
>> +    if (m_microphoneCheck == CheckResult::Unknown) {
> 
> I would probably make this a free function that would return a CheckResult.
> Then you could write in SpeechRecognitionPermissionManager::startNextRequest()
> m_microphoneCheck = computeMicrophoneAccess()
> Ditto for checkSpeechRecognitionServiceAccess

Okay.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:42
>> +    void reset();
> 
> No longer defined

Will remove.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.h:62
>> +    bool m_isUsingMockedCaptureDevice { false };
> 
> m_isUsingMockedCaptureDevice no longer needed

Will remove.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:109
>> +    m_permissionChecker(request.clientOrigin(), [this, weakThis = makeWeakPtr(this), weakRequest = makeWeakPtr(request)](SpeechRecognitionPermissionDecision decision) mutable {
> 
> AIUI, SpeechRecognitionServer is buffering all requests and processing them one at a time.
> SpeechRecognitionPermissionManager is also doing the same.
> One of the buffering might be unneeded if both objects have a one-to-one mapping.

That's true. In that case, we need to ensure either that SpeechRecognitionServer only sends one request to SpeechRecognitionPermissionManager at a time, so SpeechRecognitionPermissionManager does not need to record the check status for each request (using the queue in SpeechRecognitionServer), or that SpeechRecognitionServer may run multiple requests at the same time after permission is granted (using the queue in SpeechRecognitionPermissionManager).

>> Source/WebKit/UIProcess/WebProcessProxy.cpp:1726
>> +        }
> 
> It seems like we should not do this search for each permission request.
> The lambda could take a weakPage for instance.
> Also, it is a bit weird to compare a pageId and a SpeechRecognitionServerIdentifier.
> Maybe the pageId should be also sent through IPC?

Sure. For simplicity, SpeechRecognitionServerIdentifier is actually WebCore::PageIdentifier right now (using SpeechRecognitionServerIdentifier = WebCore::PageIdentifier).
Comment 35 Sihui Liu 2020-11-11 10:40:38 PST
Created attachment 413839 [details]
Patch
Comment 36 youenn fablet 2020-11-13 00:31:17 PST
Comment on attachment 413839 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=413839&action=review

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:73
> +    auto& document = downcast<Document>(*scriptExecutionContext());

Can we add a test trying to call start on an object created in an iframe that is gone.
scriptExecutionContext might be null and I am not sure m_state and m_connection checks cover this case.

> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.mm:94
> +    }

Should we use std::call_once here as well?

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:92
> +    m_speechRecognitionServiceCheck = computeSpeechRecognitionServiceAccess();

I would tend to move these in startProcessingRequest and inline checkMicrophoneAccess and checkSpeechRecognitionServiceAccess inside startProcessingRequest.

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:149
> +        m_microphoneCheck = CheckResult::Granted;

checkMicrophoneAccess is changing m_microphoneCheck while its name looks as if it should be const.
I would inline this code in startNextRequest

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:164
> +    ASSERT(isMainThread());

No longer needed?

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:185
> +    ASSERT(isMainThread());

No longer needed?

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:212
> +    if (!requestingOrigin->isSameOriginAs(topOrigin)) {

We should probably do that same origin check before the TCC prompts.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:45
> +    ASSERT(pendingRequest.isNewEntry);

In theory, we cannot really trust WebProcess and this clientIdentifier comes straight from IPC.
Maybe we should add a message check instead.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:46
> +    pendingRequest.iterator->value = makeUnique<WebCore::SpeechRecognitionRequest>(SpeechRecognitionRequestInfo { clientIdentifier, WTFMove(lang), continuous, interimResults, maxAlternatives, WTFMove(origin) });

I tend to prefer:
ASSERT(!m_pendingRequests.contains(clientIdentifier));
auto& pendingRequest = m_pendingRequests.add(clientIdentifier, makeUnique<>...).iterator->value;

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:68
> +        m_ongoingRequests.add(identifier, WTFMove(takenRequest));

Are we sure m_ongoingRequests does not have identifier?
It seems maybe we should ensure in start that m_pendingRequests and m_ongoingIdentifiers do not contain identifier.
Or we could have just one map and have a status in the request object.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:75
> +    if (m_pendingRequests.contains(clientIdentifier)) {

I would write it as if (m_pendingRequests.remove(clientIdentifier)) to improve efficiency.
Ditto below.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:81
> +    ASSERT(m_ongoingRequests.contains(clientIdentifier));

This also comes straight from IPC.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:93
> +    ASSERT(m_ongoingRequests.contains(clientIdentifier));

Ditto.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:102
> +    auto request = m_ongoingRequests.take(clientIdentifier);

There is no guarantee request is not null.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:127
> +        update = WebCore::SpeechRecognitionUpdate::createResult(clientIdentifier, *result);

We allocate a vector here for a very quick moment.
It might be better to just send all members in IPC directly and reconstruct the object in the WebProcess to remove ref count churn and Vector allocations.

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1723
> +    }

If there is no page, there is no need to change the map. We should exit early.
There is no guarantee that m_speechRecognitionServerMap does not already contain the identifier as well.

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1725
> +    speechRecognitionServer.iterator->value = makeUnique<SpeechRecognitionServer>(makeRef(*connection()), identifier, [weakThis = makeWeakPtr(this), weakPage = makeWeakPtr(targetPage)](const ClientOrigin& origin, CompletionHandler<void(SpeechRecognitionPermissionDecision)>&& completionHandler) mutable {

(auto& origin, auto&& completionHandler)

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1727
> +            return;

No need for weakThis, let's not capture it.
Comment 37 Sihui Liu 2020-11-13 11:48:13 PST
Comment on attachment 413839 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=413839&action=review

>> Source/WebCore/Modules/speech/SpeechRecognition.cpp:73
>> +    auto& document = downcast<Document>(*scriptExecutionContext());
> 
> Can we add a test trying to call start on an object created in an iframe that is gone.
> scriptExecutionContext might be null and I am not sure m_state and m_connection checks cover this case.

Sure, will add.

>> Source/WebKit/UIProcess/Cocoa/MediaPermissionUtilities.mm:94
>> +    }
> 
> Should we use std::call_once here as well?

Sure, this value seems unlikely to change during the use of an app. Will change.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:92
>> +    m_speechRecognitionServiceCheck = computeSpeechRecognitionServiceAccess();
> 
> I would tend to move these in startProcessingRequest and inline checkMicrophoneAccess and checkSpeechRecognitionServiceAccess inside startProcessingRequest.

Sure.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:149
>> +        m_microphoneCheck = CheckResult::Granted;
> 
> checkMicrophoneAccess is changing m_microphoneCheck while its name looks as if it should be const.
> I would inline this code in startNextRequest

Sure.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:164
>> +    ASSERT(isMainThread());
> 
> No longer needed?

Will remove.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:185
>> +    ASSERT(isMainThread());
> 
> No longer needed?

Will remove.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:212
>> +    if (!requestingOrigin->isSameOriginAs(topOrigin)) {
> 
> We should probably do that same origin check before the TCC prompts.

Okay, will add.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:45
>> +    ASSERT(pendingRequest.isNewEntry);
> 
> In theory, we cannot really trust WebProcess and this clientIdentifier comes straight from IPC.
> Maybe we should add a message check instead.

Will add.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:46
>> +    pendingRequest.iterator->value = makeUnique<WebCore::SpeechRecognitionRequest>(SpeechRecognitionRequestInfo { clientIdentifier, WTFMove(lang), continuous, interimResults, maxAlternatives, WTFMove(origin) });
> 
> I tend to prefer:
> ASSERT(!m_pendingRequests.contains(clientIdentifier));
> auto& pendingRequest = m_pendingRequests.add(clientIdentifier, makeUnique<>...).iterator->value;

Sure.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:68
>> +        m_ongoingRequests.add(identifier, WTFMove(takenRequest));
> 
> Are we sure m_ongoingRequests does not have identifier?
> It seems maybe we should ensure in start that m_pendingRequests and m_ongoingIdentifiers do not contain identifier.
> Or we could have just one map and have a status in the request object.

Yes, a client can only have one request at a time.  Will add an assert for m_ongoingIdentifiers in start().

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:75
>> +    if (m_pendingRequests.contains(clientIdentifier)) {
> 
> I would write it as if (m_pendingRequests.remove(clientIdentifier)) to improve efficiency.
> Ditto below.

Sure.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:81
>> +    ASSERT(m_ongoingRequests.contains(clientIdentifier));
> 
> This also comes straight from IPC.

Will add message check.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:93
>> +    ASSERT(m_ongoingRequests.contains(clientIdentifier));
> 
> Ditto.

Will add message check.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:102
>> +    auto request = m_ongoingRequests.take(clientIdentifier);
> 
> There is no guarantee request is not null.

True, will add a check.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:127
>> +        update = WebCore::SpeechRecognitionUpdate::createResult(clientIdentifier, *result);
> 
> We allocate a vector here for a very quick moment.
> It might be better to just send all members in IPC directly and reconstruct the object in WebProcess to remove count churn and Vector allocations.

Sounds good. Since that will change the IPC message format and this patch is big enough already, we can do it in a follow-up patch.

>> Source/WebKit/UIProcess/WebProcessProxy.cpp:1723
>> +    }
> 
> If there is no page, there is no need to change the map. We should exit early.
> There is no guarantee that m_speechRecognitionServerMap does not already contain the identifier as well.

That's true, we don't need to change the map when the page is gone.
I think we should receive createSpeechRecognitionServer and destroySpeechRecognitionServer in pairs, unless the WebProcessProxy is destroyed or the connection is broken before the destroySpeechRecognitionServer message is received, in which case we will have received one more create message than destroy messages.
Therefore, adding to m_speechRecognitionServerMap on the create message and removing from m_speechRecognitionServerMap on the destroy message should do the bookkeeping work.
If we receive two create messages for the same ID before a destroy message (i.e. m_speechRecognitionServerMap already contains the identifier), that seems like a bug, because we need to ensure the server is destroyed only after we receive the last destroy message.

>> Source/WebKit/UIProcess/WebProcessProxy.cpp:1725
>> +    speechRecognitionServer.iterator->value = makeUnique<SpeechRecognitionServer>(makeRef(*connection()), identifier, [weakThis = makeWeakPtr(this), weakPage = makeWeakPtr(targetPage)](const ClientOrigin& origin, CompletionHandler<void(SpeechRecognitionPermissionDecision)>&& completionHandler) mutable {
> 
> (auto& origin, auto&& completionHandler)

Okay.

>> Source/WebKit/UIProcess/WebProcessProxy.cpp:1727
>> +            return;
> 
> No need for weakThis, let's not capture it.

Okay.
Comment 38 Sihui Liu 2020-11-13 14:26:54 PST
Created attachment 414091 [details]
Patch
Comment 39 Sihui Liu 2020-11-13 16:28:15 PST
Created attachment 414104 [details]
Patch for landing
Comment 40 EWS 2020-11-13 18:17:14 PST
Committed r269810: <https://trac.webkit.org/changeset/269810>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 414104 [details].