Bug 221082 - Use user media permission prompt for speech recognition
Summary: Use user media permission prompt for speech recognition
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Media (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Sihui Liu
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-01-28 00:42 PST by Sihui Liu
Modified: 2021-08-19 14:18 PDT (History)
10 users (show)

See Also:


Attachments
Patch (43.05 KB, patch)
2021-01-28 01:08 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (44.26 KB, patch)
2021-01-28 12:33 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (36.76 KB, patch)
2021-01-28 14:10 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (40.10 KB, patch)
2021-01-29 11:44 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (40.33 KB, patch)
2021-01-29 12:22 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (44.98 KB, patch)
2021-01-29 16:23 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (44.92 KB, patch)
2021-01-29 17:40 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (45.14 KB, patch)
2021-01-29 19:07 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch for landing (45.36 KB, patch)
2021-02-01 09:59 PST, Sihui Liu
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sihui Liu 2021-01-28 00:42:00 PST
...
Comment 1 Sihui Liu 2021-01-28 01:08:47 PST
Created attachment 418629 [details]
Patch
Comment 2 Sihui Liu 2021-01-28 12:33:35 PST
Created attachment 418662 [details]
Patch
Comment 3 Sihui Liu 2021-01-28 14:10:53 PST
Created attachment 418667 [details]
Patch
Comment 4 Sihui Liu 2021-01-28 23:45:36 PST
rdar://problem/73372499
Comment 5 youenn fablet 2021-01-29 00:26:29 PST
Comment on attachment 418667 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=418667&action=review

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:76
> +        return { };

We should probably throw an exception there. And add a test covering this case.

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:78
> +    m_connection->start(identifier(), m_lang, m_continuous, m_interimResults, m_maxAlternatives, ClientOrigin { document.topOrigin().data(), document.securityOrigin().data() }, *frame);

Do we need to pass the frame or just the frame identifier?
Can we use Document::frameID()?

> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:42
> +    virtual void start(SpeechRecognitionConnectionClientIdentifier, const String& lang, bool continuous, bool interimResults, uint64_t maxAlternatives, ClientOrigin&&, Frame&) = 0;

const Frame&, or FrameIdentifier?

> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:37
> +    static Ref<SpeechRecognitionPermissionRequest> create(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler)

I would tend to give a name to the completion handler, something like:
using PermissionCallback = CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>;

> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:50
> +    const WebCore::FrameIdentifier frameIdentifier() const { return m_frameIdentifier; }

s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/

> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:53
> +    SpeechRecognitionPermissionRequest(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler)

s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/

> Source/WebKit/UIProcess/WebPageProxy.cpp:10407
> +    userMediaPermissionRequestManager().checkUserMediaPermissionForSpeechRecognition(frameIdentifier, requestingOrigin, topOrigin, *captureDevice, WTFMove(completionHandler));

Why not using UserMediaPermissionRequestManagerProxy::requestUserMediaPermissionForFrame and passing it an optional completion handler?
Completion handler could be stored in UserMediaPermissionRequestProxy to override the current behavior.

To limit the code changes, we could initially only use the completion handler for speech recognition code path.
Then use the completion handler in gum code path as well as this would allow to move back WebPage code from UserMediaPermissionRequestManagerProxy.cpp back to WebPage.cpp.
Or maybe we can do everything in this patch?
Callback could be called in UserMediaPermissionRequestManagerProxy::finishGrantingRequest and UserMediaPermissionRequestManagerProxy::denyRequest.
Comment 6 Sihui Liu 2021-01-29 10:28:01 PST
Comment on attachment 418667 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=418667&action=review

>> Source/WebCore/Modules/speech/SpeechRecognition.cpp:76
>> +        return { };
> 
> We should probably throw an exception there. And add a test covering this case.

Okay, we have a layout test that covers this exception, was not sure about changing the behavior. Will update.

>> Source/WebCore/Modules/speech/SpeechRecognition.cpp:78
>> +    m_connection->start(identifier(), m_lang, m_continuous, m_interimResults, m_maxAlternatives, ClientOrigin { document.topOrigin().data(), document.securityOrigin().data() }, *frame);
> 
> Do we need to pass the frame or just the frame identifier?
> Can we use Document::frameID()?

Just the identifier. Didn't know there is a Document::frameID(), will update.

>> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:42
>> +    virtual void start(SpeechRecognitionConnectionClientIdentifier, const String& lang, bool continuous, bool interimResults, uint64_t maxAlternatives, ClientOrigin&&, Frame&) = 0;
> 
> const Frame&, or FrameIdentifier?

FrameIdentifier

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:37
>> +    static Ref<SpeechRecognitionPermissionRequest> create(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler)
> 
> I would tend to give a name to the completion handler, something like:
> using PermissionCallback = CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>;

Okay.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:50
>> +    const WebCore::FrameIdentifier frameIdentifier() const { return m_frameIdentifier; }
> 
> s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/

Okay.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:53
>> +    SpeechRecognitionPermissionRequest(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler)
> 
> s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/

Okay.

>> Source/WebKit/UIProcess/WebPageProxy.cpp:10407
>> +    userMediaPermissionRequestManager().checkUserMediaPermissionForSpeechRecognition(frameIdentifier, requestingOrigin, topOrigin, *captureDevice, WTFMove(completionHandler));
> 
> Why not using UserMediaPermissionRequestManagerProxy::requestUserMediaPermissionForFrame and passing it an optional completion handler?
> Completion handler could be stored in UserMediaPermissionRequestProxy to override the current behavior.
> 
> To limit the code changes, we could initially only use the completion handler for speech recognition code path.
> Then use the completion handler in gum code path as well as this would allow to move back WebPage code from UserMediaPermissionRequestManagerProxy.cpp back to WebPage.cpp.
> Or maybe we can do everything in this patch?
> Callback could be called in UserMediaPermissionRequestManagerProxy::finishGrantingRequest and UserMediaPermissionRequestManagerProxy::denyRequest.

I did not use requestUserMediaPermissionForFrame to limit the changes. This patch basically only changes to use a different UIDelegate callback and invoke a different prompt.

requestUserMediaPermissionForFrame seems to go through the whole process including finding a device(getUserMediaPermissionInfo, invoking delegate callback from checkUserMediaPermissionForOrigin), checking deviceIdHashSalt, validating user media request constraints, checking a set of settings (like mockCaptureDevicesEnabled, which is checked by SpeechRecognitionPermissionManager), requesting system validation(checking TCC permission, which is done by SpeechRecognitionPermissionManager). I am not sure all steps are necessary for speech recognition request.

To use requestUserMediaPermissionForFrame, it seems better to merge some functionalities of SpeechRecognitionPermissionManager and userMediaPermissionRequestManager. How about switching to the same delegate callback first, and refactoring to merge in a follow up patch?
Comment 7 Sihui Liu 2021-01-29 11:44:29 PST
Created attachment 418751 [details]
Patch
Comment 8 Sihui Liu 2021-01-29 12:22:31 PST
Created attachment 418754 [details]
Patch
Comment 9 youenn fablet 2021-01-29 12:47:21 PST
Comment on attachment 418754 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=418754&action=review

> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.h:167
> +    HashMap<RefPtr<UserMediaPermissionRequestProxy>, CompletionHandler<void(bool)>> m_speechRecognitionRequestCallbacks;

I think it would be best to have UserMediaPermissionRequestProxy owns the CompletionHandler and only set it for speech requests.
It is fine to start with a dedicated checkUserMediaPermissionForSpeechRecognition, when having the request decision, we can check whether it has a callback.
If the request has a callback, we can keep calling it and returning early for now, and in a later patch, make full use of it.
Comment 10 Sihui Liu 2021-01-29 13:09:17 PST
(In reply to youenn fablet from comment #9)
> Comment on attachment 418754 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=418754&action=review
> 
> > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.h:167
> > +    HashMap<RefPtr<UserMediaPermissionRequestProxy>, CompletionHandler<void(bool)>> m_speechRecognitionRequestCallbacks;
> 
> I think it would be best to have UserMediaPermissionRequestProxy owns the
> CompletionHandler and only set it for speech requests.
> It is fine to start with a dedicated
> checkUserMediaPermissionForSpeechRecognition, when having the request
> decision, we can check whether it has a callback.
> If the request has a callback, we can keep calling it and returning early
> for now, and in a later patch, make full use of it.

I see, will make the change.
Comment 11 Sihui Liu 2021-01-29 16:23:36 PST
Created attachment 418785 [details]
Patch
Comment 12 Sihui Liu 2021-01-29 17:40:07 PST
Created attachment 418794 [details]
Patch
Comment 13 Sihui Liu 2021-01-29 19:07:48 PST
Created attachment 418799 [details]
Patch
Comment 14 youenn fablet 2021-02-01 00:23:44 PST
Comment on attachment 418799 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=418799&action=review

> Source/WebCore/Modules/speech/SpeechRecognition.cpp:79
> +    auto frameIdentifier = optionalFrameIdentifier ? *optionalFrameIdentifier : FrameIdentifier { };

We probably do not need to check frame.
We could do:
auto frameID = document.frameID();
if (!frameID)
    return Exception { InvalidStateError, "Recognition is not in a valid frame"_s };
...

> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:78
> +void SpeechRecognitionPermissionManager::request(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, SpeechRecognitionPermissionRequestCallback&& completiontHandler)

s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/
Ditto below.
Or is it to make sure it stays const?

> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:65
> +    const WebCore::FrameIdentifier m_frameIdentifier;

Ditto here.

> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:244
> +        callback(true);

We will miss device ID exposure which is done in finishGrantingRequest.
I guess we should fix this in a refactoring.

> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:616
> +void UserMediaPermissionRequestManagerProxy::checkUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, const WebCore::CaptureDevice& device, CompletionHandler<void(bool)>&& completionHandler)

ditto for const WebCore::FrameIdentifier

> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:632
> +    }

Can we replace wasRequestDenied and searchForGrantedRequest with getRequestAction().
We will miss the ability of being able to reprompt user if speech recognition is made as part of a user gesture.

> Source/WebKit/UIProcess/UserMediaPermissionRequestProxy.cpp:44
> +    , m_decisionCompletionHandler(WTFMove(decisionCompletionHandler))

We probably need to call this completion handler in UserMediaPermissionRequestManagerProxy::invalidatePendingRequests() or UserMediaPermissionRequestProxy::invalidate().

> Source/WebKit/UIProcess/WebPageProxy.cpp:10418
> +void WebPageProxy::requestUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, CompletionHandler<void(bool)>&& completionHandler)

const WebCore::FrameIdentifier?
Comment 15 Sihui Liu 2021-02-01 09:51:38 PST
Comment on attachment 418799 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=418799&action=review

>> Source/WebCore/Modules/speech/SpeechRecognition.cpp:79
>> +    auto frameIdentifier = optionalFrameIdentifier ? *optionalFrameIdentifier : FrameIdentifier { };
> 
> We probably do not need to check frame.
> We could do:
> auto frameID = document.frameID();
> if (!frameID)
>     return Exception { InvalidStateError, "Recognition is not in a valid frame"_s };
> ...

I found WebKitLegacy does not have frameIdentifier (WebFrameLoaderClient::frameID()). If we throw exception here, we will have different behavior in WebKit and WebKitLegacy and break some layout test.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:78
>> +void SpeechRecognitionPermissionManager::request(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, SpeechRecognitionPermissionRequestCallback&& completiontHandler)
> 
> s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/
> Ditto below.
> Or is it to make sure it stays const?

No, will remove const.

>> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:65
>> +    const WebCore::FrameIdentifier m_frameIdentifier;
> 
> Ditto here.

Okay.

>> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:244
>> +        callback(true);
> 
> We will miss device ID exposure which is done in finishGrantingRequest.
> I guess we should fix this in a refactoring.

Not sure what is device ID exposure. 
I leave a comment in https://bugs.webkit.org/show_bug.cgi?id=221213 to remind myself about this.

>> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:616
>> +void UserMediaPermissionRequestManagerProxy::checkUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, const WebCore::CaptureDevice& device, CompletionHandler<void(bool)>&& completionHandler)
> 
> ditto for const WebCore::FrameIdentifier

Okay.

>> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:632
>> +    }
> 
> Can we replace wasRequestDenied and searchForGrantedRequest with getRequestAction().
> We will miss the ability of being able to reprompt user if speech recognition is made as part of a user gesture.

Sure.

>> Source/WebKit/UIProcess/UserMediaPermissionRequestProxy.cpp:44
>> +    , m_decisionCompletionHandler(WTFMove(decisionCompletionHandler))
> 
> We probably need to call this completion handler in UserMediaPermissionRequestManagerProxy::invalidatePendingRequests() or UserMediaPermissionRequestProxy::invalidate().

Okay.

>> Source/WebKit/UIProcess/WebPageProxy.cpp:10418
>> +void WebPageProxy::requestUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, CompletionHandler<void(bool)>&& completionHandler)
> 
> const WebCore::FrameIdentifier?

Sure.
Comment 16 Sihui Liu 2021-02-01 09:59:45 PST
Created attachment 418890 [details]
Patch for landing
Comment 17 EWS 2021-02-01 12:30:50 PST
Committed r272165: <https://trac.webkit.org/changeset/272165>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 418890 [details].
Comment 18 Simon Fraser (smfr) 2021-08-19 13:50:48 PDT
Comment on attachment 418890 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=418890&action=review

> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:624
> +    auto request = UserMediaPermissionRequestProxy::create(*this, 0, frameIdentifier, frameIdentifier, requestingOrigin.isolatedCopy(), topOrigin.isolatedCopy(), Vector<WebCore::CaptureDevice> { device }, { }, { }, WTFMove(completionHandler));

Why is it OK to pass 0 as the userRequestID here?
Comment 19 Sihui Liu 2021-08-19 14:18:21 PDT
(In reply to Simon Fraser (smfr) from comment #18)
> Comment on attachment 418890 [details]
> Patch for landing
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=418890&action=review
> 
> > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:624
> > +    auto request = UserMediaPermissionRequestProxy::create(*this, 0, frameIdentifier, frameIdentifier, requestingOrigin.isolatedCopy(), topOrigin.isolatedCopy(), Vector<WebCore::CaptureDevice> { device }, { }, { }, WTFMove(completionHandler));
> 
> Why is it OK to pass 0 as the userRequestID here?

This does not correspond to a UserMediaPermissionRequest in web process. We create the RequestProxy only to check the media permission for speech.