Bug 221082

Summary: Use user media permission prompt for speech recognition
Product: WebKit Reporter: Sihui Liu <sihui_liu>
Component: MediaAssignee: Sihui Liu <sihui_liu>
Status: RESOLVED FIXED    
Severity: Normal CC: eric.carlson, ews-watchlist, ggaren, glenn, jer.noble, philipj, sergio, simon.fraser, webkit-bug-importer, youennf
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
ews-feeder: commit-queue-
Patch
none
Patch
ews-feeder: commit-queue-
Patch
ews-feeder: commit-queue-
Patch
none
Patch for landing none

Sihui Liu
Reported 2021-01-28 00:42:00 PST
...
Attachments
Patch (43.05 KB, patch)
2021-01-28 01:08 PST, Sihui Liu
no flags
Patch (44.26 KB, patch)
2021-01-28 12:33 PST, Sihui Liu
no flags
Patch (36.76 KB, patch)
2021-01-28 14:10 PST, Sihui Liu
no flags
Patch (40.10 KB, patch)
2021-01-29 11:44 PST, Sihui Liu
ews-feeder: commit-queue-
Patch (40.33 KB, patch)
2021-01-29 12:22 PST, Sihui Liu
no flags
Patch (44.98 KB, patch)
2021-01-29 16:23 PST, Sihui Liu
ews-feeder: commit-queue-
Patch (44.92 KB, patch)
2021-01-29 17:40 PST, Sihui Liu
ews-feeder: commit-queue-
Patch (45.14 KB, patch)
2021-01-29 19:07 PST, Sihui Liu
no flags
Patch for landing (45.36 KB, patch)
2021-02-01 09:59 PST, Sihui Liu
no flags
Sihui Liu
Comment 1 2021-01-28 01:08:47 PST
Sihui Liu
Comment 2 2021-01-28 12:33:35 PST
Sihui Liu
Comment 3 2021-01-28 14:10:53 PST
Sihui Liu
Comment 4 2021-01-28 23:45:36 PST
youenn fablet
Comment 5 2021-01-29 00:26:29 PST
Comment on attachment 418667 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=418667&action=review > Source/WebCore/Modules/speech/SpeechRecognition.cpp:76 > + return { }; We should probably throw an exception there. And add a test covering this case. > Source/WebCore/Modules/speech/SpeechRecognition.cpp:78 > + m_connection->start(identifier(), m_lang, m_continuous, m_interimResults, m_maxAlternatives, ClientOrigin { document.topOrigin().data(), document.securityOrigin().data() }, *frame); Do we need to pass the frame or just the frame identifier? Can we use Document::frameID()? > Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:42 > + virtual void start(SpeechRecognitionConnectionClientIdentifier, const String& lang, bool continuous, bool interimResults, uint64_t maxAlternatives, ClientOrigin&&, Frame&) = 0; const Frame&, or FrameIdentifier? > Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:37 > + static Ref<SpeechRecognitionPermissionRequest> create(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler) I would tend to give a name to the completion handler, something like: using PermissionCallback = CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>; > Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:50 > + const WebCore::FrameIdentifier frameIdentifier() const { return m_frameIdentifier; } s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ > Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:53 > + SpeechRecognitionPermissionRequest(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler) s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ > Source/WebKit/UIProcess/WebPageProxy.cpp:10407 > + userMediaPermissionRequestManager().checkUserMediaPermissionForSpeechRecognition(frameIdentifier, requestingOrigin, topOrigin, *captureDevice, WTFMove(completionHandler)); Why not using UserMediaPermissionRequestManagerProxy::requestUserMediaPermissionForFrame and passing it an optional completion handler? Completion handler could be stored in UserMediaPermissionRequestProxy to override the current behavior. To limit the code changes, we could initially only use the completion handler for speech recognition code path. Then use the completion handler in gum code path as well as this would allow to move back WebPage code from UserMediaPermissionRequestManagerProxy.cpp back to WebPage.cpp. Or maybe we can do everything in this patch? Callback could be called in UserMediaPermissionRequestManagerProxy::finishGrantingRequest and UserMediaPermissionRequestManagerProxy::denyRequest.
Sihui Liu
Comment 6 2021-01-29 10:28:01 PST
Comment on attachment 418667 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=418667&action=review >> Source/WebCore/Modules/speech/SpeechRecognition.cpp:76 >> + return { }; > > We should probably throw an exception there. And add a test covering this case. Okay, we have a layout test that covers this exception, was not sure about changing the behavior. Will update. >> Source/WebCore/Modules/speech/SpeechRecognition.cpp:78 >> + m_connection->start(identifier(), m_lang, m_continuous, m_interimResults, m_maxAlternatives, ClientOrigin { document.topOrigin().data(), document.securityOrigin().data() }, *frame); > > Do we need to pass the frame or just the frame identifier? > Can we use Document::frameID()? Just the identifier. Didn't know there is a Document::frameID(), will update. >> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:42 >> + virtual void start(SpeechRecognitionConnectionClientIdentifier, const String& lang, bool continuous, bool interimResults, uint64_t maxAlternatives, ClientOrigin&&, Frame&) = 0; > > const Frame&, or FrameIdentifier? FrameIdentifier >> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:37 >> + static Ref<SpeechRecognitionPermissionRequest> create(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler) > > I would tend to give a name to the completion handler, something like: > using PermissionCallback = CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>; Okay. >> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:50 >> + const WebCore::FrameIdentifier frameIdentifier() const { return m_frameIdentifier; } > > s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ Okay. >> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:53 >> + SpeechRecognitionPermissionRequest(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler) > > s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ Okay. >> Source/WebKit/UIProcess/WebPageProxy.cpp:10407 >> + userMediaPermissionRequestManager().checkUserMediaPermissionForSpeechRecognition(frameIdentifier, requestingOrigin, topOrigin, *captureDevice, WTFMove(completionHandler)); > > Why not using UserMediaPermissionRequestManagerProxy::requestUserMediaPermissionForFrame and passing it an optional completion handler? > Completion handler could be stored in UserMediaPermissionRequestProxy to override the current behavior. > > To limit the code changes, we could initially only use the completion handler for speech recognition code path. > Then use the completion handler in gum code path as well as this would allow to move back WebPage code from UserMediaPermissionRequestManagerProxy.cpp back to WebPage.cpp. > Or maybe we can do everything in this patch? > Callback could be called in UserMediaPermissionRequestManagerProxy::finishGrantingRequest and UserMediaPermissionRequestManagerProxy::denyRequest. I did not use requestUserMediaPermissionForFrame to limit the changes. This patch basically only changes to use a different UIDelegate callback and invoke a different prompt. requestUserMediaPermissionForFrame seems to go through the whole process including finding a device(getUserMediaPermissionInfo, invoking delegate callback from checkUserMediaPermissionForOrigin), checking deviceIdHashSalt, validating user media request constraints, checking a set of settings (like mockCaptureDevicesEnabled, which is checked by SpeechRecognitionPermissionManager), requesting system validation(checking TCC permission, which is done by SpeechRecognitionPermissionManager). I am not sure all steps are necessary for speech recognition request. To use requestUserMediaPermissionForFrame, it seems better to merge some functionalities of SpeechRecognitionPermissionManager and userMediaPermissionRequestManager. How about switching to the same delegate callback first, and refactoring to merge in a follow up patch?
Sihui Liu
Comment 7 2021-01-29 11:44:29 PST
Sihui Liu
Comment 8 2021-01-29 12:22:31 PST
youenn fablet
Comment 9 2021-01-29 12:47:21 PST
Comment on attachment 418754 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=418754&action=review > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.h:167 > + HashMap<RefPtr<UserMediaPermissionRequestProxy>, CompletionHandler<void(bool)>> m_speechRecognitionRequestCallbacks; I think it would be best to have UserMediaPermissionRequestProxy owns the CompletionHandler and only set it for speech requests. It is fine to start with a dedicated checkUserMediaPermissionForSpeechRecognition, when having the request decision, we can check whether it has a callback. If the request has a callback, we can keep calling it and returning early for now, and in a later patch, make full use of it.
Sihui Liu
Comment 10 2021-01-29 13:09:17 PST
(In reply to youenn fablet from comment #9) > Comment on attachment 418754 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=418754&action=review > > > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.h:167 > > + HashMap<RefPtr<UserMediaPermissionRequestProxy>, CompletionHandler<void(bool)>> m_speechRecognitionRequestCallbacks; > > I think it would be best to have UserMediaPermissionRequestProxy owns the > CompletionHandler and only set it for speech requests. > It is fine to start with a dedicated > checkUserMediaPermissionForSpeechRecognition, when having the request > decision, we can check whether it has a callback. > If the request has a callback, we can keep calling it and returning early > for now, and in a later patch, make full use of it. I see, will make the change.
Sihui Liu
Comment 11 2021-01-29 16:23:36 PST
Sihui Liu
Comment 12 2021-01-29 17:40:07 PST
Sihui Liu
Comment 13 2021-01-29 19:07:48 PST
youenn fablet
Comment 14 2021-02-01 00:23:44 PST
Comment on attachment 418799 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=418799&action=review > Source/WebCore/Modules/speech/SpeechRecognition.cpp:79 > + auto frameIdentifier = optionalFrameIdentifier ? *optionalFrameIdentifier : FrameIdentifier { }; We probably do not need to check frame. We could do: auto frameID = document.frameID(); if (!frameID) return Exception { InvalidStateError, "Recognition is not in a valid frame"_s }; ... > Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:78 > +void SpeechRecognitionPermissionManager::request(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, SpeechRecognitionPermissionRequestCallback&& completiontHandler) s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ Ditto below. Or is it to make sure it stays const? > Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:65 > + const WebCore::FrameIdentifier m_frameIdentifier; Ditto here. > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:244 > + callback(true); We will miss device ID exposure which is done in finishGrantingRequest. I guess we should fix this in a refactoring. > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:616 > +void UserMediaPermissionRequestManagerProxy::checkUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, const WebCore::CaptureDevice& device, CompletionHandler<void(bool)>&& completionHandler) ditto for const WebCore::FrameIdentifier > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:632 > + } Can we replace wasRequestDenied and searchForGrantedRequest with getRequestAction(). We will miss the ability of being able to reprompt user if speech recognition is made as part of a user gesture. > Source/WebKit/UIProcess/UserMediaPermissionRequestProxy.cpp:44 > + , m_decisionCompletionHandler(WTFMove(decisionCompletionHandler)) We probably need to call this completion handler in UserMediaPermissionRequestManagerProxy::invalidatePendingRequests() or UserMediaPermissionRequestProxy::invalidate(). > Source/WebKit/UIProcess/WebPageProxy.cpp:10418 > +void WebPageProxy::requestUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, CompletionHandler<void(bool)>&& completionHandler) const WebCore::FrameIdentifier?
Sihui Liu
Comment 15 2021-02-01 09:51:38 PST
Comment on attachment 418799 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=418799&action=review >> Source/WebCore/Modules/speech/SpeechRecognition.cpp:79 >> + auto frameIdentifier = optionalFrameIdentifier ? *optionalFrameIdentifier : FrameIdentifier { }; > > We probably do not need to check frame. > We could do: > auto frameID = document.frameID(); > if (!frameID) > return Exception { InvalidStateError, "Recognition is not in a valid frame"_s }; > ... I found WebKitLegacy does not have frameIdentifier (WebFrameLoaderClient::frameID()). If we throw exception here, we will have different behavior in WebKit and WebKitLegacy and break some layout test. >> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:78 >> +void SpeechRecognitionPermissionManager::request(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, SpeechRecognitionPermissionRequestCallback&& completiontHandler) > > s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ > Ditto below. > Or is it to make sure it stays const? No, will remove const. >> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:65 >> + const WebCore::FrameIdentifier m_frameIdentifier; > > Ditto here. Okay. >> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:244 >> + callback(true); > > We will miss device ID exposure which is done in finishGrantingRequest. > I guess we should fix this in a refactoring. Not sure what is device ID exposure. I leave a comment in https://bugs.webkit.org/show_bug.cgi?id=221213 to remind myself about this. >> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:616 >> +void UserMediaPermissionRequestManagerProxy::checkUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, const WebCore::CaptureDevice& device, CompletionHandler<void(bool)>&& completionHandler) > > ditto for const WebCore::FrameIdentifier Okay. >> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:632 >> + } > > Can we replace wasRequestDenied and searchForGrantedRequest with getRequestAction(). > We will miss the ability of being able to reprompt user if speech recognition is made as part of a user gesture. Sure. >> Source/WebKit/UIProcess/UserMediaPermissionRequestProxy.cpp:44 >> + , m_decisionCompletionHandler(WTFMove(decisionCompletionHandler)) > > We probably need to call this completion handler in UserMediaPermissionRequestManagerProxy::invalidatePendingRequests() or UserMediaPermissionRequestProxy::invalidate(). Okay. >> Source/WebKit/UIProcess/WebPageProxy.cpp:10418 >> +void WebPageProxy::requestUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, CompletionHandler<void(bool)>&& completionHandler) > > const WebCore::FrameIdentifier? Sure.
Sihui Liu
Comment 16 2021-02-01 09:59:45 PST
Created attachment 418890 [details] Patch for landing
EWS
Comment 17 2021-02-01 12:30:50 PST
Committed r272165: <https://trac.webkit.org/changeset/272165> All reviewed patches have been landed. Closing bug and clearing flags on attachment 418890 [details].
Simon Fraser (smfr)
Comment 18 2021-08-19 13:50:48 PDT
Comment on attachment 418890 [details] Patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=418890&action=review > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:624 > + auto request = UserMediaPermissionRequestProxy::create(*this, 0, frameIdentifier, frameIdentifier, requestingOrigin.isolatedCopy(), topOrigin.isolatedCopy(), Vector<WebCore::CaptureDevice> { device }, { }, { }, WTFMove(completionHandler)); Why is it OK to pass 0 as the userRequestID here?
Sihui Liu
Comment 19 2021-08-19 14:18:21 PDT
(In reply to Simon Fraser (smfr) from comment #18) > Comment on attachment 418890 [details] > Patch for landing > > View in context: > https://bugs.webkit.org/attachment.cgi?id=418890&action=review > > > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:624 > > + auto request = UserMediaPermissionRequestProxy::create(*this, 0, frameIdentifier, frameIdentifier, requestingOrigin.isolatedCopy(), topOrigin.isolatedCopy(), Vector<WebCore::CaptureDevice> { device }, { }, { }, WTFMove(completionHandler)); > > Why is it OK to pass 0 as the userRequestID here? This does not correspond to a UserMediaPermissionRequest in web process. We create the RequestProxy only to check the media permission for speech.
Note You need to log in before you can comment on or make changes to this bug.