218855 – Implement audio capture for SpeechRecognition on macOS

RESOLVED FIXED 218855

Implement audio capture for SpeechRecognition on macOS

https://bugs.webkit.org/show_bug.cgi?id=218855

Summary Implement audio capture for SpeechRecognition on macOS

Sihui Liu

Reported 2020-11-12 09:55:05 PST

...

Attachments
WIP (19.82 KB, patch) 2020-11-12 09:56 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (43.67 KB, patch) 2020-11-16 18:05 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (45.23 KB, patch) 2020-11-17 00:05 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (43.31 KB, patch) 2020-11-17 23:21 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (43.43 KB, patch) 2020-11-17 23:35 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (43.69 KB, patch) 2020-11-17 23:54 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (43.72 KB, patch) 2020-11-17 23:59 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (43.72 KB, patch) 2020-11-18 00:06 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (43.72 KB, patch) 2020-11-18 00:35 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (43.70 KB, patch) 2020-11-18 00:50 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (44.26 KB, patch) 2020-11-18 01:34 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (44.38 KB, patch) 2020-11-18 09:28 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (67.23 KB, patch) 2020-11-20 08:35 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (67.24 KB, patch) 2020-11-20 08:38 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (67.13 KB, patch) 2020-11-20 08:43 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (67.24 KB, patch) 2020-11-20 08:53 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (69.84 KB, patch) 2020-11-20 10:13 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (69.84 KB, patch) 2020-11-20 10:14 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (69.84 KB, patch) 2020-11-20 10:21 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (190.44 KB, patch) 2020-11-20 14:06 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (69.44 KB, patch) 2020-11-20 14:09 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (69.44 KB, patch) 2020-11-20 14:17 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (69.50 KB, patch) 2020-11-20 14:52 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (73.83 KB, patch) 2020-11-20 22:26 PST, Sihui Liu	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (73.94 KB, patch) 2020-11-20 22:54 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (70.04 KB, patch) 2020-11-21 18:44 PST, Sihui Liu	no flags	Details Formatted Diff Diff
Show Obsolete (25) View All Add attachment proposed patch, testcase, etc.

Radar WebKit Bug Importer

Comment 1 2020-11-12 09:55:19 PST

<rdar://problem/71331001>

Sihui Liu

Comment 2 2020-11-12 09:56:51 PST

Created attachment 413947 [details] WIP

Sihui Liu

Comment 3 2020-11-16 18:05:32 PST

Created attachment 414296 [details] Patch

Sihui Liu

Comment 4 2020-11-17 00:05:07 PST

Created attachment 414314 [details] Patch

youenn fablet

Comment 5 2020-11-17 11:46:30 PST

Comment on attachment 414314 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=414314&action=review > Source/WebCore/Modules/speech/SpeechRecognizerDelegate.h:39 > + virtual CaptureSourceOrError createCaptureSource() = 0; It is a bit weird to have this callback here. It might be simpler to focus this interface on reporting state only. > Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:54 > + auto sourceOrError = m_delegate->createCaptureSource(); Instead of doing that, it seems SpeechRecognizer could get a Ref<RealtimeMediaSource> as input. If needed, we could add a completion handler to start method as well. > Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:73 > + m_delegate->didStop(); Should we add a completion handler to stop instead? > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:101 > + startNextGrantedRequest(); These changes seem somehow independent of the audio capture part. Maybe we should split them. > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:118 > + m_grantedRequests.append(makeWeakPtr(request)); I would have expected that a granted request would override the previous one, i.e stop it I guess. Is this behavior shared with Chrome and Firefox. > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:137 > + m_recognizer->start(); Is it needed to recreate a recogniser for every request? > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:168 > + } I would try to search for the default device and use that one, otherwise pick the first enabled device. This does not require a Vector though. > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:175 > + return CaptureSourceOrError { "createCaptureSource is not implemented" }; on iOS, we might need to send IPC to WebProcess to create a realtime media source there instead of in UIProcess. > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:186 > +void SpeechRecognitionServer::didStartCapturingAudio() Maybe we could simplify the API so that there is a sendUpdate delegate taking a WebCore::SpeechRecognitionUpdateType? That would reduce the delegate API a bit.

Sihui Liu

Comment 6 2020-11-17 12:52:39 PST

Comment on attachment 414314 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=414314&action=review >> Source/WebCore/Modules/speech/SpeechRecognizerDelegate.h:39 >> + virtual CaptureSourceOrError createCaptureSource() = 0; > > It is a bit weird to have this callback here. > It might be simpler to focus this interface on reporting state only. Okay, will remove. >> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:54 >> + auto sourceOrError = m_delegate->createCaptureSource(); > > Instead of doing that, it seems SpeechRecognizer could get a Ref<RealtimeMediaSource> as input. > If needed, we could add a completion handler to start method as well. Okay, will add Ref<RealtimeMediaSource> as input. >> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:73 >> + m_delegate->didStop(); > > Should we add a completion handler to stop instead? Probably not, because completion handler of stop may not be called if abort() is called after stop(). >> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:101 >> + startNextGrantedRequest(); > > These changes seem somehow independent of the audio capture part. > Maybe we should split them. That sounds good. Added this here as I figured out what SpeechRecognitionServer should do after creating SpeechRecognizer. >> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:118 >> + m_grantedRequests.append(makeWeakPtr(request)); > > I would have expected that a granted request would override the previous one, i.e stop it I guess. > Is this behavior shared with Chrome and Firefox. You are right; tried on Chrome and it aborts ongoing request when another request is granted. Will change this. >> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:137 >> + m_recognizer->start(); > > Is it needed to recreate a recogniser for every request? Not really I guess. It is just easier to distinguish between abort and stop. (stop: wait until recognizer acknowledges it stops; abort: throw away everything immediately). And technically we will need different recognizers for requests of different languages, which is not needed for now. >> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:168 >> + } > > I would try to search for the default device and use that one, otherwise pick the first enabled device. > This does not require a Vector though. Sure, will change. >> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:175 >> + return CaptureSourceOrError { "createCaptureSource is not implemented" }; > > on iOS, we might need to send IPC to WebProcess to create a realtime media source there instead of in UIProcess. Yes, seems we need to create something like RemoteRealtimeMediaSource in UI process. Will try that in another patch. >> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:186 >> +void SpeechRecognitionServer::didStartCapturingAudio() > > Maybe we could simplify the API so that there is a sendUpdate delegate taking a WebCore::SpeechRecognitionUpdateType? > That would reduce the delegate API a bit. You mean simplifying interfaces of SpeechRecognizerDelegate? Okay will try.

Sihui Liu

Comment 7 2020-11-17 23:21:16 PST Comment hidden (obsolete)

Created attachment 414417 [details] Patch

Sihui Liu

Comment 8 2020-11-17 23:35:03 PST Comment hidden (obsolete)

Created attachment 414419 [details] Patch

Sihui Liu

Comment 9 2020-11-17 23:54:04 PST Comment hidden (obsolete)

Created attachment 414420 [details] Patch

Sihui Liu

Comment 10 2020-11-17 23:59:02 PST Comment hidden (obsolete)

Created attachment 414422 [details] Patch

Sihui Liu

Comment 11 2020-11-18 00:06:47 PST Comment hidden (obsolete)

Created attachment 414423 [details] Patch

Sihui Liu

Comment 12 2020-11-18 00:35:30 PST Comment hidden (obsolete)

Created attachment 414424 [details] Patch

Sihui Liu

Comment 13 2020-11-18 00:50:19 PST Comment hidden (obsolete)

Created attachment 414426 [details] Patch

Sihui Liu

Comment 14 2020-11-18 01:34:43 PST Comment hidden (obsolete)

Created attachment 414429 [details] Patch

Sihui Liu

Comment 15 2020-11-18 09:28:43 PST

Created attachment 414456 [details] Patch

youenn fablet

Comment 16 2020-11-19 04:37:44 PST

Comment on attachment 414456 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=414456&action=review > Source/WebCore/ChangeLog:14 > + Manually tested with MiniBrowser. Can we write some layout test. We should be able to get mocks going on so that some events get fired in WTR. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:37 > +class SpeechRecognitionCaptureSource : public RealtimeMediaSource::Observer, public RealtimeMediaSource::AudioSampleObserver { final? > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:56 > + m_source = makeUnique<SpeechRecognitionCaptureSource>([this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionCaptureSource::Buffer buffer) { This will probably be called in a background thread so I am not sure this is good. Maybe you do not need to do processing in a background thread and could hop to the main thread here if latency for the speech recogniser is not an issue. Speech recognition layoutTests with mock audio in debug should be able to capture these kind of issues. Also, this processing might be done in the audio high priority thread where we try to not allocate memory. The typical way of doing things is with a ring buffer or AudioSampleDataSource. > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80 > + m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) { s.const ...&/auto/ > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:104 > return; We also probably need some tests where we trigger several requests in parallel and verify we have the right events on the page.

Sihui Liu

Comment 17 2020-11-19 09:10:45 PST

(In reply to youenn fablet from comment #16) > Comment on attachment 414456 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=414456&action=review > > > Source/WebCore/ChangeLog:14 > > + Manually tested with MiniBrowser. > > Can we write some layout test. > We should be able to get mocks going on so that some events get fired in WTR. Sure. Will write. > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:37 > > +class SpeechRecognitionCaptureSource : public RealtimeMediaSource::Observer, public RealtimeMediaSource::AudioSampleObserver { > > final? Will add. > > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:56 > > + m_source = makeUnique<SpeechRecognitionCaptureSource>([this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionCaptureSource::Buffer buffer) { > > This will probably be called in a background thread so I am not sure this is > good. > Maybe you do not need to do processing in a background thread and could hop > to the main thread here if latency for the speech recogniser is not an issue. > Speech recognition layoutTests with mock audio in debug should be able to > capture these kind of issues. > > Also, this processing might be done in the audio high priority thread where > we try to not allocate memory. > The typical way of doing things is with a ring buffer or > AudioSampleDataSource. I see, will try hopping back to main thread and using ring buffer. > > > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80 > > + m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) { > > s.const ...&/auto/ Okay. > > > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:104 > > return; > > We also probably need some tests where we trigger several requests in > parallel and verify we have the right events on the page. Sure, will add.

Sihui Liu

Comment 18 2020-11-20 08:35:25 PST Comment hidden (obsolete)

Created attachment 414679 [details] Patch

Sihui Liu

Comment 19 2020-11-20 08:38:19 PST Comment hidden (obsolete)

Created attachment 414681 [details] Patch

Sihui Liu

Comment 20 2020-11-20 08:43:35 PST

Created attachment 414684 [details] Patch

Sihui Liu

Comment 21 2020-11-20 08:53:29 PST

Created attachment 414687 [details] Patch

youenn fablet

Comment 22 2020-11-20 09:16:22 PST

Comment on attachment 414687 [details] Patch Looks almost ready to go. Can you fix the m_errorCallback and check the weakThis creation? View in context: https://bugs.webkit.org/attachment.cgi?id=414687&action=review > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:47 > + using ErrorCallback = Function<void(String)>; String&& > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:78 > + setUpSource(*captureDevice); s/setUp/setup/ ? Or maybe just setSource. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:85 > + , m_errorCallback(WTFMove(errorCallback)) I wound tend to merge m_startCallback and m_errorCallback in a m_updateStateCallback. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:127 > + auto dataSource = AudioSampleDataSource::create(description.sampleRate() * 2, *m_source); We want to reduce it to less than 2 seconds but I guess that is fine. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:134 > + m_errorCallback("Unable to set the output format of data source"); Not great to have m_errorCallback called on either main thread or background thread. Maybe we should hop there. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:143 > + callOnMainThread([this, weakThis = makeWeakPtr(this), time, audioDescription, sampleCount] { You need to make sure weak pointer is created at constructor time, either using WeakPtrFactoryInitialization::Eager or explicitly in constructor. > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:42 > + if (m_clientIdentifier) { if (!m_clientIdentifier) return; > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:85 > + m_delegateCallback(SpeechRecognitionUpdate::createError(identifier, error)); It seems errorCallback and startCallback could be merged by them taking a SpeechRecognitionUpdate object. You might just need to add a check to nullptr m_source > Source/WebCore/platform/cocoa/MediaUtilities.cpp:52 > + auto format = createAudioFormatDescription(description); Maybe add a FIXME to check whether we can reuse the format for several sample buffers. > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80 > + m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) { auto&

Sihui Liu

Comment 23 2020-11-20 10:13:50 PST

Created attachment 414693 [details] Patch

Sihui Liu

Comment 24 2020-11-20 10:14:39 PST

Created attachment 414694 [details] Patch

Sihui Liu

Comment 25 2020-11-20 10:20:41 PST

(In reply to youenn fablet from comment #22) > Comment on attachment 414687 [details] > Patch > > Looks almost ready to go. > Can you fix the m_errorCallback and check the weakThis creation? > Updated patch to merge m_errorCallback and m_startCallback to m_updateStateCallback and initialize m_weakThis in constructor of SpeechRecognitionCaptureSourceImpl. > View in context: > https://bugs.webkit.org/attachment.cgi?id=414687&action=review > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:47 > > + using ErrorCallback = Function<void(String)>; > > String&& > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:78 > > + setUpSource(*captureDevice); > > s/setUp/setup/ ? > Or maybe just setSource. Changed to setSource. > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:85 > > + , m_errorCallback(WTFMove(errorCallback)) > > I wound tend to merge m_startCallback and m_errorCallback in a > m_updateStateCallback. > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:127 > > + auto dataSource = AudioSampleDataSource::create(description.sampleRate() * 2, *m_source); > > We want to reduce it to less than 2 seconds but I guess that is fine. Changed to 1. > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:134 > > + m_errorCallback("Unable to set the output format of data source"); > > Not great to have m_errorCallback called on either main thread or background > thread. > Maybe we should hop there. Added SpeechRecognitionCaptureSourceImpl::invokeUpdateStateCallback to hop to main thread. > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:143 > > + callOnMainThread([this, weakThis = makeWeakPtr(this), time, audioDescription, sampleCount] { > > You need to make sure weak pointer is created at constructor time, either > using WeakPtrFactoryInitialization::Eager or explicitly in constructor. Choose the second as RealtimeMediaSource::Observer, which SpeechRecognitionCaptureSourceImpl.cpp inherits, is not WeakPtrFactoryInitialization::Eager. > > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:42 > > + if (m_clientIdentifier) { > > if (!m_clientIdentifier) > return; > > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:85 > > + m_delegateCallback(SpeechRecognitionUpdate::createError(identifier, error)); > > It seems errorCallback and startCallback could be merged by them taking a > SpeechRecognitionUpdate object. > You might just need to add a check to nullptr m_source For distinguishing start and error event, Optional<String> seems enough. > > > Source/WebCore/platform/cocoa/MediaUtilities.cpp:52 > > + auto format = createAudioFormatDescription(description); > > Maybe add a FIXME to check whether we can reuse the format for several > sample buffers. Added. > > > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80 > > + m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) { > > auto& Okay.

Sihui Liu

Comment 26 2020-11-20 10:21:38 PST

Created attachment 414695 [details] Patch

youenn fablet

Comment 27 2020-11-20 11:54:35 PST

Comment on attachment 414695 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=414695&action=review > Source/WebKit/ChangeLog:21 > +2020-11-20 Sihui Liu <sihui_liu@apple.com> Double entry > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:34 > + m_impl = makeUnique<SpeechRecognitionCaptureSourceImpl>(WTFMove(dataCallback), WTFMove(updateStateCallback)); Where do we set the source? > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:44 > + SpeechRecognitionCaptureSource() = default; Do we need this constructor? > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:59 > + , m_weakThis(makeWeakPtr(this)) You might not need m_weakThis, just call makeWeakPtr(this) > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:87 > + , m_updateStateCallback(WTFMove(updateStateCallback)) It would probably be better to have SpeechRecognitionCaptureSourceImpl constructor take a Ref<RealtimeMediaSource>&&. Might use a static create method. That can be done as a follow-up. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:118 > + m_source->stop(); We should probably call removeObserver and removeAudioSampleObserver. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:141 > + m_dataSource = WTFMove(dataSource); We need a lock here when setting m_dataSource since m_dataSource is called from a background thread. We should tryLock in the background thread. If failing, return early and wait for next time to get the lock. We can just lock in the main thread. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:151 > + m_dataSource->pullSamples(*data.list(), sampleCount, time.timeValue(), 0, AudioSampleDataSource::Copy); Just there, we probably do not want to keep the lock in m_dataCallback, just in case. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:182 > + callOnMainThread([this, weakThis = m_weakThis, optionalError = errorMessage] { This is potentially not safe as we do not isolate copy string. I think it might just be simpler to call callOnMainThread directly in audioSamplesAvailable. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54 > + SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&); Why do we need both constructors? > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 > + m_source = nullptr; I would make updateStateCallback take a SpeechRecognitionUpdate. And write is as follow: if (!weakThis) return; m_delegateCallback(WTFMove(update)); if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) m_source = nullptr; > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:104 > + Vector<SpeechRecognitionResultData> resultDatas; We might need to build a mock backend in the future given that the real backend requires a TCC prompt.

Sihui Liu

Comment 28 2020-11-20 12:07:10 PST

Comment on attachment 414695 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=414695&action=review >> Source/WebKit/ChangeLog:21 >> +2020-11-20 Sihui Liu <sihui_liu@apple.com> > > Double entry Will remove. >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:34 >> + m_impl = makeUnique<SpeechRecognitionCaptureSourceImpl>(WTFMove(dataCallback), WTFMove(updateStateCallback)); > > Where do we set the source? SpeechRecognizer? >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:44 >> + SpeechRecognitionCaptureSource() = default; > > Do we need this constructor? Plan to subclass it for iOS implementation, not needed now. >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:59 >> + , m_weakThis(makeWeakPtr(this)) > > You might not need m_weakThis, just call makeWeakPtr(this) Okay. >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:87 >> + , m_updateStateCallback(WTFMove(updateStateCallback)) > > It would probably be better to have SpeechRecognitionCaptureSourceImpl constructor take a Ref<RealtimeMediaSource>&&. > Might use a static create method. That can be done as a follow-up. Okay, will do a follow-up >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:118 >> + m_source->stop(); > > We should probably call removeObserver and removeAudioSampleObserver. Sure. >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:141 >> + m_dataSource = WTFMove(dataSource); > > We need a lock here when setting m_dataSource since m_dataSource is called from a background thread. > We should tryLock in the background thread. If failing, return early and wait for next time to get the lock. > We can just lock in the main thread. Okay, will add lock. >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:151 >> + m_dataSource->pullSamples(*data.list(), sampleCount, time.timeValue(), 0, AudioSampleDataSource::Copy); > > Just there, we probably do not want to keep the lock in m_dataCallback, just in case. Okay. >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:182 >> + callOnMainThread([this, weakThis = m_weakThis, optionalError = errorMessage] { > > This is potentially not safe as we do not isolate copy string. > I think it might just be simpler to call callOnMainThread directly in audioSamplesAvailable. Will do isolated copy. In audioSamplesAvailable? >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54 >> + SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&); > > Why do we need both constructors? Supposedly we should specify device/source for iOS. >> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 >> + m_source = nullptr; > > I would make updateStateCallback take a SpeechRecognitionUpdate. > And write is as follow: > if (!weakThis) > return; > m_delegateCallback(WTFMove(update)); > if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) > m_source = nullptr; Sure. >> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:104 >> + Vector<SpeechRecognitionResultData> resultDatas; > > We might need to build a mock backend in the future given that the real backend requires a TCC prompt. Okay.

youenn fablet

Comment 29 2020-11-20 12:10:56 PST

> > This is potentially not safe as we do not isolate copy string. > > I think it might just be simpler to call callOnMainThread directly in audioSamplesAvailable. > > Will do isolated copy. > In audioSamplesAvailable? Not needed, you might want to callOnMainThread in audioSamplesAvailable and create the string in main thread. > >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54 > >> + SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&); > > > > Why do we need both constructors? > > Supposedly we should specify device/source for iOS. Let's remove the one that is not used for now.

Sihui Liu

Comment 30 2020-11-20 12:35:17 PST

Comment on attachment 414695 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=414695&action=review >>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 >>> + m_source = nullptr; >> >> I would make updateStateCallback take a SpeechRecognitionUpdate. >> And write is as follow: >> if (!weakThis) >> return; >> m_delegateCallback(WTFMove(update)); >> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) >> m_source = nullptr; > > Sure. hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate. Do you think we should pass clientIdentifier to it?

youenn fablet

Comment 31 2020-11-20 12:41:15 PST

Comment on attachment 414695 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=414695&action=review >>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54 >>> + SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&); >> >> Why do we need both constructors? > > Supposedly we should specify device/source for iOS. Well, for now, let's just keep the first one, that will remove some duplicated code. >>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 >>>> + m_source = nullptr; >>> >>> I would make updateStateCallback take a SpeechRecognitionUpdate. >>> And write is as follow: >>> if (!weakThis) >>> return; >>> m_delegateCallback(WTFMove(update)); >>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) >>> m_source = nullptr; >> >> Sure. > > hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate. > Do you think we should pass clientIdentifier to it? Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier It seems SpeechRecognitionCaptureSourceIdentifier is not used. Otherwise, we could just pass a SpeechRecognitionUpdateType. Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging.

Sihui Liu

Comment 32 2020-11-20 12:57:01 PST

Comment on attachment 414695 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=414695&action=review >>>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 >>>>> + m_source = nullptr; >>>> >>>> I would make updateStateCallback take a SpeechRecognitionUpdate. >>>> And write is as follow: >>>> if (!weakThis) >>>> return; >>>> m_delegateCallback(WTFMove(update)); >>>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) >>>> m_source = nullptr; >>> >>> Sure. >> >> hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate. >> Do you think we should pass clientIdentifier to it? > > Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier > It seems SpeechRecognitionCaptureSourceIdentifier is not used. > Otherwise, we could just pass a SpeechRecognitionUpdateType. > Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging. SpeechRecognitionCaptureSourceIdentifier will be used for syncing sources between processes. Okay, will pass a type instead.

youenn fablet

Comment 33 2020-11-20 12:59:46 PST

(In reply to Sihui Liu from comment #32) > Comment on attachment 414695 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=414695&action=review > > >>>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 > >>>>> + m_source = nullptr; > >>>> > >>>> I would make updateStateCallback take a SpeechRecognitionUpdate. > >>>> And write is as follow: > >>>> if (!weakThis) > >>>> return; > >>>> m_delegateCallback(WTFMove(update)); > >>>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) > >>>> m_source = nullptr; > >>> > >>> Sure. > >> > >> hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate. > >> Do you think we should pass clientIdentifier to it? > > > > Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier > > It seems SpeechRecognitionCaptureSourceIdentifier is not used. > > Otherwise, we could just pass a SpeechRecognitionUpdateType. > > Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging. > > SpeechRecognitionCaptureSourceIdentifier will be used for syncing sources > between processes. Okay, will pass a type instead. I am not sure SpeechRecognitionCaptureSourceIdentifier is the right thing then. You might best use a RealtimeMediaSource which will actually be remote. The remote version will have an ID but this will be hidden from speech recognition.

Sihui Liu

Comment 34 2020-11-20 13:14:46 PST

(In reply to youenn fablet from comment #33) > (In reply to Sihui Liu from comment #32) > > Comment on attachment 414695 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=414695&action=review > > > > >>>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 > > >>>>> + m_source = nullptr; > > >>>> > > >>>> I would make updateStateCallback take a SpeechRecognitionUpdate. > > >>>> And write is as follow: > > >>>> if (!weakThis) > > >>>> return; > > >>>> m_delegateCallback(WTFMove(update)); > > >>>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) > > >>>> m_source = nullptr; > > >>> > > >>> Sure. > > >> > > >> hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate. > > >> Do you think we should pass clientIdentifier to it? > > > > > > Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier > > > It seems SpeechRecognitionCaptureSourceIdentifier is not used. > > > Otherwise, we could just pass a SpeechRecognitionUpdateType. > > > Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging. > > > > SpeechRecognitionCaptureSourceIdentifier will be used for syncing sources > > between processes. Okay, will pass a type instead. > > I am not sure SpeechRecognitionCaptureSourceIdentifier is the right thing > then. > You might best use a RealtimeMediaSource which will actually be remote. > The remote version will have an ID but this will be hidden from speech > recognition. ah I see, You are suggesting changing RealtimeMediaSource in SpeechRecognitionCaptureSourceImpl for remote case. I was thinking about having another RemoteSpeechRecognitionCaptureSource.

Sihui Liu

Comment 35 2020-11-20 14:06:24 PST

Created attachment 414720 [details] Patch

Sihui Liu

Comment 36 2020-11-20 14:09:26 PST

Created attachment 414722 [details] Patch

Sihui Liu

Comment 37 2020-11-20 14:17:53 PST

Created attachment 414724 [details] Patch

youenn fablet

Comment 38 2020-11-20 14:35:11 PST

r=me once bots are green. I think it might be good to do a follow-up patch that would merge SpeechRecognitionCaptureSourceImpl inside SpeechRecognitionCaptureSource. RealtimeMediaSource should be able to encapsulate whether the source is remote or not. For instance MediaStreamTrack is using a RealtimeMediaSource which is either local or remote without any issue. View in context: https://bugs.webkit.org/attachment.cgi?id=414722&action=review > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:130 > + invokeStartOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format"); I would write it as: callOnMainThread([this, weakThis = makeWeakPtr(this)] { if (weakThis) m_startOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format"); }); This is not much longer and removes the need for invokeStartOrErrorCallback. I would tend to rename m_startOrErrorCallback to m_updateStateCallback. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:139 > + if (m_dataSourceLock.tryLock()) m_dataSourceLock is not unlocked. It might be best to write it like that: if (auto locker = tryHoldLock(m_dataSourceLock)). I am wondering whether the layout tests are actually running that code path. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:154 > + m_dataSourceLock.unlock(); This is fine like this, I tend to prefer holdLock though. > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:186 > + callOnMainThread([this, weakThis = makeWeakPtr(this), type, error = error.isolatedCopy()] { This makes an unnecessary string copy here. We can prevent it by using String&& and doing WTFMove(error).isolatedCopy(). > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:52 > + m_delegateCallback(SpeechRecognitionUpdate::createError(*m_clientIdentifier, error)); We might be doing some unnecessary count churning here when copying error. > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:67 > + auto dataCallback = [weakThis = makeWeakPtr(this)](const WTF::MediaTime& time, const PlatformAudioData& data, const AudioStreamDescription& description, size_t sampleCount) { could use auto for the parameters > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:82 > + auto updateStateCallback = [this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionUpdateType type, const String& errorMessage) { could use auto for the parameters

Sihui Liu

Comment 39 2020-11-20 14:52:16 PST

Created attachment 414733 [details] Patch

Sihui Liu

Comment 40 2020-11-20 22:26:19 PST

Created attachment 414754 [details] Patch

Sihui Liu

Comment 41 2020-11-20 22:54:38 PST

Created attachment 414756 [details] Patch

Sihui Liu

Comment 42 2020-11-20 23:14:09 PST

(In reply to youenn fablet from comment #38) > r=me once bots are green. > I think it might be good to do a follow-up patch that would merge > SpeechRecognitionCaptureSourceImpl inside SpeechRecognitionCaptureSource. > RealtimeMediaSource should be able to encapsulate whether the source is > remote or not. > For instance MediaStreamTrack is using a RealtimeMediaSource which is either > local or remote without any issue. Sure. > > View in context: > https://bugs.webkit.org/attachment.cgi?id=414722&action=review > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:130 > > + invokeStartOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format"); > > I would write it as: > callOnMainThread([this, weakThis = makeWeakPtr(this)] { > if (weakThis) > m_startOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable > to set input format"); > }); > This is not much longer and removes the need for invokeStartOrErrorCallback. > I would tend to rename m_startOrErrorCallback to m_updateStateCallback. Do you mean > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:139 > > + if (m_dataSourceLock.tryLock()) > > m_dataSourceLock is not unlocked. > It might be best to write it like that: if (auto locker = > tryHoldLock(m_dataSourceLock)). > I am wondering whether the layout tests are actually running that code path. Right, the layout tests did catch that (not releasing lock). Will change to use locker. > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:154 > > + m_dataSourceLock.unlock(); > > This is fine like this, I tend to prefer holdLock though. Changed. > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:186 > > + callOnMainThread([this, weakThis = makeWeakPtr(this), type, error = error.isolatedCopy()] { > > This makes an unnecessary string copy here. > We can prevent it by using String&& and doing WTFMove(error).isolatedCopy(). I've updated the patch to make the callback take a SpeechRecognitionUpdate instead of string as you suggested. > > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:52 > > + m_delegateCallback(SpeechRecognitionUpdate::createError(*m_clientIdentifier, error)); > > We might be doing some unnecessary count churning here when copying error. > > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:67 > > + auto dataCallback = [weakThis = makeWeakPtr(this)](const WTF::MediaTime& time, const PlatformAudioData& data, const AudioStreamDescription& description, size_t sampleCount) { > > could use auto for the parameters Sure. > > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:82 > > + auto updateStateCallback = [this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionUpdateType type, const String& errorMessage) { > > could use auto for the parameters Sure.

youenn fablet

Comment 43 2020-11-21 01:58:32 PST

Comment on attachment 414756 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=414756&action=review > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91 > + invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format"); Let's use an explicit callOnMainThread here so that we do not need to go in the business of using isolatedCopy. We could add to invokeStateUpdateCallback an ASSERT(isMainThread()); > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:96 > + invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set output format"); Ditto

Sihui Liu

Comment 44 2020-11-21 18:43:09 PST

(In reply to youenn fablet from comment #43) > Comment on attachment 414756 [details] > Patch Thanks for the review! Will update. > > View in context: > https://bugs.webkit.org/attachment.cgi?id=414756&action=review > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91 > > + invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format"); > > Let's use an explicit callOnMainThread here so that we do not need to go in > the business of using isolatedCopy. > We could add to invokeStateUpdateCallback an ASSERT(isMainThread()); > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:96 > > + invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set output format"); > > Ditto

Sihui Liu

Comment 45 2020-11-21 18:44:48 PST

Created attachment 414779 [details] Patch

EWS

Comment 46 2020-11-21 21:51:33 PST

Committed r270158: <https://trac.webkit.org/changeset/270158> All reviewed patches have been landed. Closing bug and clearing flags on attachment 414779 [details].

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Accessibility

Assignee

Sihui Liu

Reported

2020-11-12 09:55 PST

Modified

2020-11-21 21:51 PST History

CC List

14 users Show

URL

Keywords InRadar

Depends on

Blocks