Bug 221082

Summary:

Use user media permission prompt for speech recognition

Product:

WebKit

Reporter:

Sihui Liu <sihui_liu>

Component:

Media

Assignee:

Sihui Liu <sihui_liu>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

eric.carlson, ews-watchlist, ggaren, glenn, jer.noble, philipj, sergio, simon.fraser, webkit-bug-importer, youennf

Priority:

Keywords:

InRadar

Version:

WebKit Nightly Build

Hardware:

Unspecified

OS:

Unspecified

Attachments:

Description	Flags
Patch	none
Patch	none
Patch	none
Patch	ews-feeder: commit-queue-
Patch	none
Patch	ews-feeder: commit-queue-
Patch	ews-feeder: commit-queue-
Patch	none
Patch for landing	none

Sihui Liu

Reported 2021-01-28 00:42:00 PST

...

Attachments
Patch (43.05 KB, patch) 2021-01-28 01:08 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (44.26 KB, patch) 2021-01-28 12:33 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (36.76 KB, patch) 2021-01-28 14:10 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (40.10 KB, patch) 2021-01-29 11:44 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (40.33 KB, patch) 2021-01-29 12:22 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (44.98 KB, patch) 2021-01-29 16:23 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (44.92 KB, patch) 2021-01-29 17:40 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (45.14 KB, patch) 2021-01-29 19:07 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch for landing (45.36 KB, patch) 2021-02-01 09:59 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Show Obsolete (8) View All Add attachment proposed patch, testcase, etc.

Sihui Liu

Comment 1 2021-01-28 01:08:47 PST

Created attachment 418629 [details] Patch

Sihui Liu

Comment 2 2021-01-28 12:33:35 PST

Created attachment 418662 [details] Patch

Sihui Liu

Comment 3 2021-01-28 14:10:53 PST

Created attachment 418667 [details] Patch

Sihui Liu

Comment 4 2021-01-28 23:45:36 PST

rdar://problem/73372499

youenn fablet

Comment 5 2021-01-29 00:26:29 PST

Comment on attachment 418667 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=418667&action=review > Source/WebCore/Modules/speech/SpeechRecognition.cpp:76 > + return { }; We should probably throw an exception there. And add a test covering this case. > Source/WebCore/Modules/speech/SpeechRecognition.cpp:78 > + m_connection->start(identifier(), m_lang, m_continuous, m_interimResults, m_maxAlternatives, ClientOrigin { document.topOrigin().data(), document.securityOrigin().data() }, *frame); Do we need to pass the frame or just the frame identifier? Can we use Document::frameID()? > Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:42 > + virtual void start(SpeechRecognitionConnectionClientIdentifier, const String& lang, bool continuous, bool interimResults, uint64_t maxAlternatives, ClientOrigin&&, Frame&) = 0; const Frame&, or FrameIdentifier? > Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:37 > + static Ref<SpeechRecognitionPermissionRequest> create(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler) I would tend to give a name to the completion handler, something like: using PermissionCallback = CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>; > Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:50 > + const WebCore::FrameIdentifier frameIdentifier() const { return m_frameIdentifier; } s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ > Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:53 > + SpeechRecognitionPermissionRequest(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler) s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ > Source/WebKit/UIProcess/WebPageProxy.cpp:10407 > + userMediaPermissionRequestManager().checkUserMediaPermissionForSpeechRecognition(frameIdentifier, requestingOrigin, topOrigin, *captureDevice, WTFMove(completionHandler)); Why not using UserMediaPermissionRequestManagerProxy::requestUserMediaPermissionForFrame and passing it an optional completion handler? Completion handler could be stored in UserMediaPermissionRequestProxy to override the current behavior. To limit the code changes, we could initially only use the completion handler for speech recognition code path. Then use the completion handler in gum code path as well as this would allow to move back WebPage code from UserMediaPermissionRequestManagerProxy.cpp back to WebPage.cpp. Or maybe we can do everything in this patch? Callback could be called in UserMediaPermissionRequestManagerProxy::finishGrantingRequest and UserMediaPermissionRequestManagerProxy::denyRequest.

Sihui Liu

Comment 6 2021-01-29 10:28:01 PST

Comment on attachment 418667 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=418667&action=review >> Source/WebCore/Modules/speech/SpeechRecognition.cpp:76 >> + return { }; > > We should probably throw an exception there. And add a test covering this case. Okay, we have a layout test that covers this exception, was not sure about changing the behavior. Will update. >> Source/WebCore/Modules/speech/SpeechRecognition.cpp:78 >> + m_connection->start(identifier(), m_lang, m_continuous, m_interimResults, m_maxAlternatives, ClientOrigin { document.topOrigin().data(), document.securityOrigin().data() }, *frame); > > Do we need to pass the frame or just the frame identifier? > Can we use Document::frameID()? Just the identifier. Didn't know there is a Document::frameID(), will update. >> Source/WebCore/Modules/speech/SpeechRecognitionConnection.h:42 >> + virtual void start(SpeechRecognitionConnectionClientIdentifier, const String& lang, bool continuous, bool interimResults, uint64_t maxAlternatives, ClientOrigin&&, Frame&) = 0; > > const Frame&, or FrameIdentifier? FrameIdentifier >> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:37 >> + static Ref<SpeechRecognitionPermissionRequest> create(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler) > > I would tend to give a name to the completion handler, something like: > using PermissionCallback = CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>; Okay. >> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:50 >> + const WebCore::FrameIdentifier frameIdentifier() const { return m_frameIdentifier; } > > s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ Okay. >> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:53 >> + SpeechRecognitionPermissionRequest(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, CompletionHandler<void(Optional<WebCore::SpeechRecognitionError>&&)>&& completionHandler) > > s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ Okay. >> Source/WebKit/UIProcess/WebPageProxy.cpp:10407 >> + userMediaPermissionRequestManager().checkUserMediaPermissionForSpeechRecognition(frameIdentifier, requestingOrigin, topOrigin, *captureDevice, WTFMove(completionHandler)); > > Why not using UserMediaPermissionRequestManagerProxy::requestUserMediaPermissionForFrame and passing it an optional completion handler? > Completion handler could be stored in UserMediaPermissionRequestProxy to override the current behavior. > > To limit the code changes, we could initially only use the completion handler for speech recognition code path. > Then use the completion handler in gum code path as well as this would allow to move back WebPage code from UserMediaPermissionRequestManagerProxy.cpp back to WebPage.cpp. > Or maybe we can do everything in this patch? > Callback could be called in UserMediaPermissionRequestManagerProxy::finishGrantingRequest and UserMediaPermissionRequestManagerProxy::denyRequest. I did not use requestUserMediaPermissionForFrame to limit the changes. This patch basically only changes to use a different UIDelegate callback and invoke a different prompt. requestUserMediaPermissionForFrame seems to go through the whole process including finding a device(getUserMediaPermissionInfo, invoking delegate callback from checkUserMediaPermissionForOrigin), checking deviceIdHashSalt, validating user media request constraints, checking a set of settings (like mockCaptureDevicesEnabled, which is checked by SpeechRecognitionPermissionManager), requesting system validation(checking TCC permission, which is done by SpeechRecognitionPermissionManager). I am not sure all steps are necessary for speech recognition request. To use requestUserMediaPermissionForFrame, it seems better to merge some functionalities of SpeechRecognitionPermissionManager and userMediaPermissionRequestManager. How about switching to the same delegate callback first, and refactoring to merge in a follow up patch?

Sihui Liu

Comment 7 2021-01-29 11:44:29 PST

Created attachment 418751 [details] Patch

Sihui Liu

Comment 8 2021-01-29 12:22:31 PST

Created attachment 418754 [details] Patch

youenn fablet

Comment 9 2021-01-29 12:47:21 PST

Comment on attachment 418754 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=418754&action=review > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.h:167 > + HashMap<RefPtr<UserMediaPermissionRequestProxy>, CompletionHandler<void(bool)>> m_speechRecognitionRequestCallbacks; I think it would be best to have UserMediaPermissionRequestProxy owns the CompletionHandler and only set it for speech requests. It is fine to start with a dedicated checkUserMediaPermissionForSpeechRecognition, when having the request decision, we can check whether it has a callback. If the request has a callback, we can keep calling it and returning early for now, and in a later patch, make full use of it.

Sihui Liu

Comment 10 2021-01-29 13:09:17 PST

(In reply to youenn fablet from comment #9) > Comment on attachment 418754 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=418754&action=review > > > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.h:167 > > + HashMap<RefPtr<UserMediaPermissionRequestProxy>, CompletionHandler<void(bool)>> m_speechRecognitionRequestCallbacks; > > I think it would be best to have UserMediaPermissionRequestProxy owns the > CompletionHandler and only set it for speech requests. > It is fine to start with a dedicated > checkUserMediaPermissionForSpeechRecognition, when having the request > decision, we can check whether it has a callback. > If the request has a callback, we can keep calling it and returning early > for now, and in a later patch, make full use of it. I see, will make the change.

Sihui Liu

Comment 11 2021-01-29 16:23:36 PST

Created attachment 418785 [details] Patch

Sihui Liu

Comment 12 2021-01-29 17:40:07 PST

Created attachment 418794 [details] Patch

Sihui Liu

Comment 13 2021-01-29 19:07:48 PST

Created attachment 418799 [details] Patch

youenn fablet

Comment 14 2021-02-01 00:23:44 PST

Comment on attachment 418799 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=418799&action=review > Source/WebCore/Modules/speech/SpeechRecognition.cpp:79 > + auto frameIdentifier = optionalFrameIdentifier ? *optionalFrameIdentifier : FrameIdentifier { }; We probably do not need to check frame. We could do: auto frameID = document.frameID(); if (!frameID) return Exception { InvalidStateError, "Recognition is not in a valid frame"_s }; ... > Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:78 > +void SpeechRecognitionPermissionManager::request(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, SpeechRecognitionPermissionRequestCallback&& completiontHandler) s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ Ditto below. Or is it to make sure it stays const? > Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:65 > + const WebCore::FrameIdentifier m_frameIdentifier; Ditto here. > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:244 > + callback(true); We will miss device ID exposure which is done in finishGrantingRequest. I guess we should fix this in a refactoring. > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:616 > +void UserMediaPermissionRequestManagerProxy::checkUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, const WebCore::CaptureDevice& device, CompletionHandler<void(bool)>&& completionHandler) ditto for const WebCore::FrameIdentifier > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:632 > + } Can we replace wasRequestDenied and searchForGrantedRequest with getRequestAction(). We will miss the ability of being able to reprompt user if speech recognition is made as part of a user gesture. > Source/WebKit/UIProcess/UserMediaPermissionRequestProxy.cpp:44 > + , m_decisionCompletionHandler(WTFMove(decisionCompletionHandler)) We probably need to call this completion handler in UserMediaPermissionRequestManagerProxy::invalidatePendingRequests() or UserMediaPermissionRequestProxy::invalidate(). > Source/WebKit/UIProcess/WebPageProxy.cpp:10418 > +void WebPageProxy::requestUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, CompletionHandler<void(bool)>&& completionHandler) const WebCore::FrameIdentifier?

Sihui Liu

Comment 15 2021-02-01 09:51:38 PST

Comment on attachment 418799 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=418799&action=review >> Source/WebCore/Modules/speech/SpeechRecognition.cpp:79 >> + auto frameIdentifier = optionalFrameIdentifier ? *optionalFrameIdentifier : FrameIdentifier { }; > > We probably do not need to check frame. > We could do: > auto frameID = document.frameID(); > if (!frameID) > return Exception { InvalidStateError, "Recognition is not in a valid frame"_s }; > ... I found WebKitLegacy does not have frameIdentifier (WebFrameLoaderClient::frameID()). If we throw exception here, we will have different behavior in WebKit and WebKitLegacy and break some layout test. >> Source/WebKit/UIProcess/SpeechRecognitionPermissionManager.cpp:78 >> +void SpeechRecognitionPermissionManager::request(const String& lang, const WebCore::ClientOrigin& origin, const WebCore::FrameIdentifier frameIdentifier, SpeechRecognitionPermissionRequestCallback&& completiontHandler) > > s/const WebCore::FrameIdentifier/WebCore::FrameIdentifier/ > Ditto below. > Or is it to make sure it stays const? No, will remove const. >> Source/WebKit/UIProcess/SpeechRecognitionPermissionRequest.h:65 >> + const WebCore::FrameIdentifier m_frameIdentifier; > > Ditto here. Okay. >> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:244 >> + callback(true); > > We will miss device ID exposure which is done in finishGrantingRequest. > I guess we should fix this in a refactoring. Not sure what is device ID exposure. I leave a comment in https://bugs.webkit.org/show_bug.cgi?id=221213 to remind myself about this. >> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:616 >> +void UserMediaPermissionRequestManagerProxy::checkUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, const WebCore::CaptureDevice& device, CompletionHandler<void(bool)>&& completionHandler) > > ditto for const WebCore::FrameIdentifier Okay. >> Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:632 >> + } > > Can we replace wasRequestDenied and searchForGrantedRequest with getRequestAction(). > We will miss the ability of being able to reprompt user if speech recognition is made as part of a user gesture. Sure. >> Source/WebKit/UIProcess/UserMediaPermissionRequestProxy.cpp:44 >> + , m_decisionCompletionHandler(WTFMove(decisionCompletionHandler)) > > We probably need to call this completion handler in UserMediaPermissionRequestManagerProxy::invalidatePendingRequests() or UserMediaPermissionRequestProxy::invalidate(). Okay. >> Source/WebKit/UIProcess/WebPageProxy.cpp:10418 >> +void WebPageProxy::requestUserMediaPermissionForSpeechRecognition(const WebCore::FrameIdentifier frameIdentifier, const WebCore::SecurityOrigin& requestingOrigin, const WebCore::SecurityOrigin& topOrigin, CompletionHandler<void(bool)>&& completionHandler) > > const WebCore::FrameIdentifier? Sure.

Sihui Liu

Comment 16 2021-02-01 09:59:45 PST

Created attachment 418890 [details] Patch for landing

EWS

Comment 17 2021-02-01 12:30:50 PST

Committed r272165: <https://trac.webkit.org/changeset/272165> All reviewed patches have been landed. Closing bug and clearing flags on attachment 418890 [details].

Simon Fraser (smfr)

Comment 18 2021-08-19 13:50:48 PDT

Comment on attachment 418890 [details] Patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=418890&action=review > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:624 > + auto request = UserMediaPermissionRequestProxy::create(*this, 0, frameIdentifier, frameIdentifier, requestingOrigin.isolatedCopy(), topOrigin.isolatedCopy(), Vector<WebCore::CaptureDevice> { device }, { }, { }, WTFMove(completionHandler)); Why is it OK to pass 0 as the userRequestID here?

Sihui Liu

Comment 19 2021-08-19 14:18:21 PDT

(In reply to Simon Fraser (smfr) from comment #18) > Comment on attachment 418890 [details] > Patch for landing > > View in context: > https://bugs.webkit.org/attachment.cgi?id=418890&action=review > > > Source/WebKit/UIProcess/UserMediaPermissionRequestManagerProxy.cpp:624 > > + auto request = UserMediaPermissionRequestProxy::create(*this, 0, frameIdentifier, frameIdentifier, requestingOrigin.isolatedCopy(), topOrigin.isolatedCopy(), Vector<WebCore::CaptureDevice> { device }, { }, { }, WTFMove(completionHandler)); > > Why is it OK to pass 0 as the userRequestID here? This does not correspond to a UserMediaPermissionRequest in web process. We create the RequestProxy only to check the media permission for speech.

Note You need to log in before you can comment on or make changes to this bug.