WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
218855
Implement audio capture for SpeechRecognition on macOS
https://bugs.webkit.org/show_bug.cgi?id=218855
Summary
Implement audio capture for SpeechRecognition on macOS
Sihui Liu
Reported
2020-11-12 09:55:05 PST
...
Attachments
WIP
(19.82 KB, patch)
2020-11-12 09:56 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(43.67 KB, patch)
2020-11-16 18:05 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(45.23 KB, patch)
2020-11-17 00:05 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(43.31 KB, patch)
2020-11-17 23:21 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(43.43 KB, patch)
2020-11-17 23:35 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(43.69 KB, patch)
2020-11-17 23:54 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(43.72 KB, patch)
2020-11-17 23:59 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(43.72 KB, patch)
2020-11-18 00:06 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(43.72 KB, patch)
2020-11-18 00:35 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(43.70 KB, patch)
2020-11-18 00:50 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(44.26 KB, patch)
2020-11-18 01:34 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(44.38 KB, patch)
2020-11-18 09:28 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(67.23 KB, patch)
2020-11-20 08:35 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(67.24 KB, patch)
2020-11-20 08:38 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(67.13 KB, patch)
2020-11-20 08:43 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(67.24 KB, patch)
2020-11-20 08:53 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(69.84 KB, patch)
2020-11-20 10:13 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(69.84 KB, patch)
2020-11-20 10:14 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(69.84 KB, patch)
2020-11-20 10:21 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(190.44 KB, patch)
2020-11-20 14:06 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(69.44 KB, patch)
2020-11-20 14:09 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(69.44 KB, patch)
2020-11-20 14:17 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(69.50 KB, patch)
2020-11-20 14:52 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(73.83 KB, patch)
2020-11-20 22:26 PST
,
Sihui Liu
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(73.94 KB, patch)
2020-11-20 22:54 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Patch
(70.04 KB, patch)
2020-11-21 18:44 PST
,
Sihui Liu
no flags
Details
Formatted Diff
Diff
Show Obsolete
(25)
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2020-11-12 09:55:19 PST
<
rdar://problem/71331001
>
Sihui Liu
Comment 2
2020-11-12 09:56:51 PST
Created
attachment 413947
[details]
WIP
Sihui Liu
Comment 3
2020-11-16 18:05:32 PST
Created
attachment 414296
[details]
Patch
Sihui Liu
Comment 4
2020-11-17 00:05:07 PST
Created
attachment 414314
[details]
Patch
youenn fablet
Comment 5
2020-11-17 11:46:30 PST
Comment on
attachment 414314
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=414314&action=review
> Source/WebCore/Modules/speech/SpeechRecognizerDelegate.h:39 > + virtual CaptureSourceOrError createCaptureSource() = 0;
It is a bit weird to have this callback here. It might be simpler to focus this interface on reporting state only.
> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:54 > + auto sourceOrError = m_delegate->createCaptureSource();
Instead of doing that, it seems SpeechRecognizer could get a Ref<RealtimeMediaSource> as input. If needed, we could add a completion handler to start method as well.
> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:73 > + m_delegate->didStop();
Should we add a completion handler to stop instead?
> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:101 > + startNextGrantedRequest();
These changes seem somehow independent of the audio capture part. Maybe we should split them.
> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:118 > + m_grantedRequests.append(makeWeakPtr(request));
I would have expected that a granted request would override the previous one, i.e stop it I guess. Is this behavior shared with Chrome and Firefox.
> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:137 > + m_recognizer->start();
Is it needed to recreate a recogniser for every request?
> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:168 > + }
I would try to search for the default device and use that one, otherwise pick the first enabled device. This does not require a Vector though.
> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:175 > + return CaptureSourceOrError { "createCaptureSource is not implemented" };
on iOS, we might need to send IPC to WebProcess to create a realtime media source there instead of in UIProcess.
> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:186 > +void SpeechRecognitionServer::didStartCapturingAudio()
Maybe we could simplify the API so that there is a sendUpdate delegate taking a WebCore::SpeechRecognitionUpdateType? That would reduce the delegate API a bit.
Sihui Liu
Comment 6
2020-11-17 12:52:39 PST
Comment on
attachment 414314
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=414314&action=review
>> Source/WebCore/Modules/speech/SpeechRecognizerDelegate.h:39 >> + virtual CaptureSourceOrError createCaptureSource() = 0; > > It is a bit weird to have this callback here. > It might be simpler to focus this interface on reporting state only.
Okay, will remove.
>> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:54 >> + auto sourceOrError = m_delegate->createCaptureSource(); > > Instead of doing that, it seems SpeechRecognizer could get a Ref<RealtimeMediaSource> as input. > If needed, we could add a completion handler to start method as well.
Okay, will add Ref<RealtimeMediaSource> as input.
>> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:73 >> + m_delegate->didStop(); > > Should we add a completion handler to stop instead?
Probably not, because completion handler of stop may not be called if abort() is called after stop().
>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:101 >> + startNextGrantedRequest(); > > These changes seem somehow independent of the audio capture part. > Maybe we should split them.
That sounds good. Added this here as I figured out what SpeechRecognitionServer should do after creating SpeechRecognizer.
>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:118 >> + m_grantedRequests.append(makeWeakPtr(request)); > > I would have expected that a granted request would override the previous one, i.e stop it I guess. > Is this behavior shared with Chrome and Firefox.
You are right; tried on Chrome and it aborts ongoing request when another request is granted. Will change this.
>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:137 >> + m_recognizer->start(); > > Is it needed to recreate a recogniser for every request?
Not really I guess. It is just easier to distinguish between abort and stop. (stop: wait until recognizer acknowledges it stops; abort: throw away everything immediately). And technically we will need different recognizers for requests of different languages, which is not needed for now.
>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:168 >> + } > > I would try to search for the default device and use that one, otherwise pick the first enabled device. > This does not require a Vector though.
Sure, will change.
>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:175 >> + return CaptureSourceOrError { "createCaptureSource is not implemented" }; > > on iOS, we might need to send IPC to WebProcess to create a realtime media source there instead of in UIProcess.
Yes, seems we need to create something like RemoteRealtimeMediaSource in UI process. Will try that in another patch.
>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:186 >> +void SpeechRecognitionServer::didStartCapturingAudio() > > Maybe we could simplify the API so that there is a sendUpdate delegate taking a WebCore::SpeechRecognitionUpdateType? > That would reduce the delegate API a bit.
You mean simplifying interfaces of SpeechRecognizerDelegate? Okay will try.
Sihui Liu
Comment 7
2020-11-17 23:21:16 PST
Comment hidden (obsolete)
Created
attachment 414417
[details]
Patch
Sihui Liu
Comment 8
2020-11-17 23:35:03 PST
Comment hidden (obsolete)
Created
attachment 414419
[details]
Patch
Sihui Liu
Comment 9
2020-11-17 23:54:04 PST
Comment hidden (obsolete)
Created
attachment 414420
[details]
Patch
Sihui Liu
Comment 10
2020-11-17 23:59:02 PST
Comment hidden (obsolete)
Created
attachment 414422
[details]
Patch
Sihui Liu
Comment 11
2020-11-18 00:06:47 PST
Comment hidden (obsolete)
Created
attachment 414423
[details]
Patch
Sihui Liu
Comment 12
2020-11-18 00:35:30 PST
Comment hidden (obsolete)
Created
attachment 414424
[details]
Patch
Sihui Liu
Comment 13
2020-11-18 00:50:19 PST
Comment hidden (obsolete)
Created
attachment 414426
[details]
Patch
Sihui Liu
Comment 14
2020-11-18 01:34:43 PST
Comment hidden (obsolete)
Created
attachment 414429
[details]
Patch
Sihui Liu
Comment 15
2020-11-18 09:28:43 PST
Created
attachment 414456
[details]
Patch
youenn fablet
Comment 16
2020-11-19 04:37:44 PST
Comment on
attachment 414456
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=414456&action=review
> Source/WebCore/ChangeLog:14 > + Manually tested with MiniBrowser.
Can we write some layout test. We should be able to get mocks going on so that some events get fired in WTR.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:37 > +class SpeechRecognitionCaptureSource : public RealtimeMediaSource::Observer, public RealtimeMediaSource::AudioSampleObserver {
final?
> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:56 > + m_source = makeUnique<SpeechRecognitionCaptureSource>([this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionCaptureSource::Buffer buffer) {
This will probably be called in a background thread so I am not sure this is good. Maybe you do not need to do processing in a background thread and could hop to the main thread here if latency for the speech recogniser is not an issue. Speech recognition layoutTests with mock audio in debug should be able to capture these kind of issues. Also, this processing might be done in the audio high priority thread where we try to not allocate memory. The typical way of doing things is with a ring buffer or AudioSampleDataSource.
> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80 > + m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) {
s.const ...&/auto/
> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:104 > return;
We also probably need some tests where we trigger several requests in parallel and verify we have the right events on the page.
Sihui Liu
Comment 17
2020-11-19 09:10:45 PST
(In reply to youenn fablet from
comment #16
)
> Comment on
attachment 414456
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=414456&action=review
> > > Source/WebCore/ChangeLog:14 > > + Manually tested with MiniBrowser. > > Can we write some layout test. > We should be able to get mocks going on so that some events get fired in WTR.
Sure. Will write.
> > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:37 > > +class SpeechRecognitionCaptureSource : public RealtimeMediaSource::Observer, public RealtimeMediaSource::AudioSampleObserver { > > final?
Will add.
> > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:56 > > + m_source = makeUnique<SpeechRecognitionCaptureSource>([this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionCaptureSource::Buffer buffer) { > > This will probably be called in a background thread so I am not sure this is > good. > Maybe you do not need to do processing in a background thread and could hop > to the main thread here if latency for the speech recogniser is not an issue. > Speech recognition layoutTests with mock audio in debug should be able to > capture these kind of issues. > > Also, this processing might be done in the audio high priority thread where > we try to not allocate memory. > The typical way of doing things is with a ring buffer or > AudioSampleDataSource.
I see, will try hopping back to main thread and using ring buffer.
> > > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80 > > + m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) { > > s.const ...&/auto/
Okay.
> > > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:104 > > return; > > We also probably need some tests where we trigger several requests in > parallel and verify we have the right events on the page.
Sure, will add.
Sihui Liu
Comment 18
2020-11-20 08:35:25 PST
Comment hidden (obsolete)
Created
attachment 414679
[details]
Patch
Sihui Liu
Comment 19
2020-11-20 08:38:19 PST
Comment hidden (obsolete)
Created
attachment 414681
[details]
Patch
Sihui Liu
Comment 20
2020-11-20 08:43:35 PST
Created
attachment 414684
[details]
Patch
Sihui Liu
Comment 21
2020-11-20 08:53:29 PST
Created
attachment 414687
[details]
Patch
youenn fablet
Comment 22
2020-11-20 09:16:22 PST
Comment on
attachment 414687
[details]
Patch Looks almost ready to go. Can you fix the m_errorCallback and check the weakThis creation? View in context:
https://bugs.webkit.org/attachment.cgi?id=414687&action=review
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:47 > + using ErrorCallback = Function<void(String)>;
String&&
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:78 > + setUpSource(*captureDevice);
s/setUp/setup/ ? Or maybe just setSource.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:85 > + , m_errorCallback(WTFMove(errorCallback))
I wound tend to merge m_startCallback and m_errorCallback in a m_updateStateCallback.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:127 > + auto dataSource = AudioSampleDataSource::create(description.sampleRate() * 2, *m_source);
We want to reduce it to less than 2 seconds but I guess that is fine.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:134 > + m_errorCallback("Unable to set the output format of data source");
Not great to have m_errorCallback called on either main thread or background thread. Maybe we should hop there.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:143 > + callOnMainThread([this, weakThis = makeWeakPtr(this), time, audioDescription, sampleCount] {
You need to make sure weak pointer is created at constructor time, either using WeakPtrFactoryInitialization::Eager or explicitly in constructor.
> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:42 > + if (m_clientIdentifier) {
if (!m_clientIdentifier) return;
> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:85 > + m_delegateCallback(SpeechRecognitionUpdate::createError(identifier, error));
It seems errorCallback and startCallback could be merged by them taking a SpeechRecognitionUpdate object. You might just need to add a check to nullptr m_source
> Source/WebCore/platform/cocoa/MediaUtilities.cpp:52 > + auto format = createAudioFormatDescription(description);
Maybe add a FIXME to check whether we can reuse the format for several sample buffers.
> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80 > + m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) {
auto&
Sihui Liu
Comment 23
2020-11-20 10:13:50 PST
Created
attachment 414693
[details]
Patch
Sihui Liu
Comment 24
2020-11-20 10:14:39 PST
Created
attachment 414694
[details]
Patch
Sihui Liu
Comment 25
2020-11-20 10:20:41 PST
(In reply to youenn fablet from
comment #22
)
> Comment on
attachment 414687
[details]
> Patch > > Looks almost ready to go. > Can you fix the m_errorCallback and check the weakThis creation? >
Updated patch to merge m_errorCallback and m_startCallback to m_updateStateCallback and initialize m_weakThis in constructor of SpeechRecognitionCaptureSourceImpl.
> View in context: >
https://bugs.webkit.org/attachment.cgi?id=414687&action=review
> > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:47 > > + using ErrorCallback = Function<void(String)>; > > String&& > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:78 > > + setUpSource(*captureDevice); > > s/setUp/setup/ ? > Or maybe just setSource.
Changed to setSource.
> > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:85 > > + , m_errorCallback(WTFMove(errorCallback)) > > I wound tend to merge m_startCallback and m_errorCallback in a > m_updateStateCallback. > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:127 > > + auto dataSource = AudioSampleDataSource::create(description.sampleRate() * 2, *m_source); > > We want to reduce it to less than 2 seconds but I guess that is fine.
Changed to 1.
> > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:134 > > + m_errorCallback("Unable to set the output format of data source"); > > Not great to have m_errorCallback called on either main thread or background > thread. > Maybe we should hop there.
Added SpeechRecognitionCaptureSourceImpl::invokeUpdateStateCallback to hop to main thread.
> > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:143 > > + callOnMainThread([this, weakThis = makeWeakPtr(this), time, audioDescription, sampleCount] { > > You need to make sure weak pointer is created at constructor time, either > using WeakPtrFactoryInitialization::Eager or explicitly in constructor.
Choose the second as RealtimeMediaSource::Observer, which SpeechRecognitionCaptureSourceImpl.cpp inherits, is not WeakPtrFactoryInitialization::Eager.
> > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:42 > > + if (m_clientIdentifier) { > > if (!m_clientIdentifier) > return; > > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:85 > > + m_delegateCallback(SpeechRecognitionUpdate::createError(identifier, error)); > > It seems errorCallback and startCallback could be merged by them taking a > SpeechRecognitionUpdate object. > You might just need to add a check to nullptr m_source
For distinguishing start and error event, Optional<String> seems enough.
> > > Source/WebCore/platform/cocoa/MediaUtilities.cpp:52 > > + auto format = createAudioFormatDescription(description); > > Maybe add a FIXME to check whether we can reuse the format for several > sample buffers.
Added.
> > > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80 > > + m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) { > > auto&
Okay.
Sihui Liu
Comment 26
2020-11-20 10:21:38 PST
Created
attachment 414695
[details]
Patch
youenn fablet
Comment 27
2020-11-20 11:54:35 PST
Comment on
attachment 414695
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=414695&action=review
> Source/WebKit/ChangeLog:21 > +2020-11-20 Sihui Liu <
sihui_liu@apple.com
>
Double entry
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:34 > + m_impl = makeUnique<SpeechRecognitionCaptureSourceImpl>(WTFMove(dataCallback), WTFMove(updateStateCallback));
Where do we set the source?
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:44 > + SpeechRecognitionCaptureSource() = default;
Do we need this constructor?
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:59 > + , m_weakThis(makeWeakPtr(this))
You might not need m_weakThis, just call makeWeakPtr(this)
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:87 > + , m_updateStateCallback(WTFMove(updateStateCallback))
It would probably be better to have SpeechRecognitionCaptureSourceImpl constructor take a Ref<RealtimeMediaSource>&&. Might use a static create method. That can be done as a follow-up.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:118 > + m_source->stop();
We should probably call removeObserver and removeAudioSampleObserver.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:141 > + m_dataSource = WTFMove(dataSource);
We need a lock here when setting m_dataSource since m_dataSource is called from a background thread. We should tryLock in the background thread. If failing, return early and wait for next time to get the lock. We can just lock in the main thread.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:151 > + m_dataSource->pullSamples(*data.list(), sampleCount, time.timeValue(), 0, AudioSampleDataSource::Copy);
Just there, we probably do not want to keep the lock in m_dataCallback, just in case.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:182 > + callOnMainThread([this, weakThis = m_weakThis, optionalError = errorMessage] {
This is potentially not safe as we do not isolate copy string. I think it might just be simpler to call callOnMainThread directly in audioSamplesAvailable.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54 > + SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&);
Why do we need both constructors?
> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 > + m_source = nullptr;
I would make updateStateCallback take a SpeechRecognitionUpdate. And write is as follow: if (!weakThis) return; m_delegateCallback(WTFMove(update)); if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) m_source = nullptr;
> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:104 > + Vector<SpeechRecognitionResultData> resultDatas;
We might need to build a mock backend in the future given that the real backend requires a TCC prompt.
Sihui Liu
Comment 28
2020-11-20 12:07:10 PST
Comment on
attachment 414695
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=414695&action=review
>> Source/WebKit/ChangeLog:21 >> +2020-11-20 Sihui Liu <
sihui_liu@apple.com
> > > Double entry
Will remove.
>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:34 >> + m_impl = makeUnique<SpeechRecognitionCaptureSourceImpl>(WTFMove(dataCallback), WTFMove(updateStateCallback)); > > Where do we set the source?
SpeechRecognizer?
>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:44 >> + SpeechRecognitionCaptureSource() = default; > > Do we need this constructor?
Plan to subclass it for iOS implementation, not needed now.
>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:59 >> + , m_weakThis(makeWeakPtr(this)) > > You might not need m_weakThis, just call makeWeakPtr(this)
Okay.
>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:87 >> + , m_updateStateCallback(WTFMove(updateStateCallback)) > > It would probably be better to have SpeechRecognitionCaptureSourceImpl constructor take a Ref<RealtimeMediaSource>&&. > Might use a static create method. That can be done as a follow-up.
Okay, will do a follow-up
>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:118 >> + m_source->stop(); > > We should probably call removeObserver and removeAudioSampleObserver.
Sure.
>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:141 >> + m_dataSource = WTFMove(dataSource); > > We need a lock here when setting m_dataSource since m_dataSource is called from a background thread. > We should tryLock in the background thread. If failing, return early and wait for next time to get the lock. > We can just lock in the main thread.
Okay, will add lock.
>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:151 >> + m_dataSource->pullSamples(*data.list(), sampleCount, time.timeValue(), 0, AudioSampleDataSource::Copy); > > Just there, we probably do not want to keep the lock in m_dataCallback, just in case.
Okay.
>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:182 >> + callOnMainThread([this, weakThis = m_weakThis, optionalError = errorMessage] { > > This is potentially not safe as we do not isolate copy string. > I think it might just be simpler to call callOnMainThread directly in audioSamplesAvailable.
Will do isolated copy. In audioSamplesAvailable?
>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54 >> + SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&); > > Why do we need both constructors?
Supposedly we should specify device/source for iOS.
>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 >> + m_source = nullptr; > > I would make updateStateCallback take a SpeechRecognitionUpdate. > And write is as follow: > if (!weakThis) > return; > m_delegateCallback(WTFMove(update)); > if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) > m_source = nullptr;
Sure.
>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:104 >> + Vector<SpeechRecognitionResultData> resultDatas; > > We might need to build a mock backend in the future given that the real backend requires a TCC prompt.
Okay.
youenn fablet
Comment 29
2020-11-20 12:10:56 PST
> > This is potentially not safe as we do not isolate copy string. > > I think it might just be simpler to call callOnMainThread directly in audioSamplesAvailable. > > Will do isolated copy. > In audioSamplesAvailable?
Not needed, you might want to callOnMainThread in audioSamplesAvailable and create the string in main thread.
> >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54 > >> + SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&); > > > > Why do we need both constructors? > > Supposedly we should specify device/source for iOS.
Let's remove the one that is not used for now.
Sihui Liu
Comment 30
2020-11-20 12:35:17 PST
Comment on
attachment 414695
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=414695&action=review
>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 >>> + m_source = nullptr; >> >> I would make updateStateCallback take a SpeechRecognitionUpdate. >> And write is as follow: >> if (!weakThis) >> return; >> m_delegateCallback(WTFMove(update)); >> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) >> m_source = nullptr; > > Sure.
hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate. Do you think we should pass clientIdentifier to it?
youenn fablet
Comment 31
2020-11-20 12:41:15 PST
Comment on
attachment 414695
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=414695&action=review
>>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54 >>> + SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&); >> >> Why do we need both constructors? > > Supposedly we should specify device/source for iOS.
Well, for now, let's just keep the first one, that will remove some duplicated code.
>>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 >>>> + m_source = nullptr; >>> >>> I would make updateStateCallback take a SpeechRecognitionUpdate. >>> And write is as follow: >>> if (!weakThis) >>> return; >>> m_delegateCallback(WTFMove(update)); >>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) >>> m_source = nullptr; >> >> Sure. > > hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate. > Do you think we should pass clientIdentifier to it?
Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier It seems SpeechRecognitionCaptureSourceIdentifier is not used. Otherwise, we could just pass a SpeechRecognitionUpdateType. Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging.
Sihui Liu
Comment 32
2020-11-20 12:57:01 PST
Comment on
attachment 414695
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=414695&action=review
>>>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 >>>>> + m_source = nullptr; >>>> >>>> I would make updateStateCallback take a SpeechRecognitionUpdate. >>>> And write is as follow: >>>> if (!weakThis) >>>> return; >>>> m_delegateCallback(WTFMove(update)); >>>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) >>>> m_source = nullptr; >>> >>> Sure. >> >> hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate. >> Do you think we should pass clientIdentifier to it? > > Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier > It seems SpeechRecognitionCaptureSourceIdentifier is not used. > Otherwise, we could just pass a SpeechRecognitionUpdateType. > Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging.
SpeechRecognitionCaptureSourceIdentifier will be used for syncing sources between processes. Okay, will pass a type instead.
youenn fablet
Comment 33
2020-11-20 12:59:46 PST
(In reply to Sihui Liu from
comment #32
)
> Comment on
attachment 414695
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=414695&action=review
> > >>>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 > >>>>> + m_source = nullptr; > >>>> > >>>> I would make updateStateCallback take a SpeechRecognitionUpdate. > >>>> And write is as follow: > >>>> if (!weakThis) > >>>> return; > >>>> m_delegateCallback(WTFMove(update)); > >>>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) > >>>> m_source = nullptr; > >>> > >>> Sure. > >> > >> hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate. > >> Do you think we should pass clientIdentifier to it? > > > > Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier > > It seems SpeechRecognitionCaptureSourceIdentifier is not used. > > Otherwise, we could just pass a SpeechRecognitionUpdateType. > > Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging. > > SpeechRecognitionCaptureSourceIdentifier will be used for syncing sources > between processes. Okay, will pass a type instead.
I am not sure SpeechRecognitionCaptureSourceIdentifier is the right thing then. You might best use a RealtimeMediaSource which will actually be remote. The remote version will have an ID but this will be hidden from speech recognition.
Sihui Liu
Comment 34
2020-11-20 13:14:46 PST
(In reply to youenn fablet from
comment #33
)
> (In reply to Sihui Liu from
comment #32
) > > Comment on
attachment 414695
[details]
> > Patch > > > > View in context: > >
https://bugs.webkit.org/attachment.cgi?id=414695&action=review
> > > > >>>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89 > > >>>>> + m_source = nullptr; > > >>>> > > >>>> I would make updateStateCallback take a SpeechRecognitionUpdate. > > >>>> And write is as follow: > > >>>> if (!weakThis) > > >>>> return; > > >>>> m_delegateCallback(WTFMove(update)); > > >>>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End) > > >>>> m_source = nullptr; > > >>> > > >>> Sure. > > >> > > >> hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate. > > >> Do you think we should pass clientIdentifier to it? > > > > > > Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier > > > It seems SpeechRecognitionCaptureSourceIdentifier is not used. > > > Otherwise, we could just pass a SpeechRecognitionUpdateType. > > > Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging. > > > > SpeechRecognitionCaptureSourceIdentifier will be used for syncing sources > > between processes. Okay, will pass a type instead. > > I am not sure SpeechRecognitionCaptureSourceIdentifier is the right thing > then. > You might best use a RealtimeMediaSource which will actually be remote. > The remote version will have an ID but this will be hidden from speech > recognition.
ah I see, You are suggesting changing RealtimeMediaSource in SpeechRecognitionCaptureSourceImpl for remote case. I was thinking about having another RemoteSpeechRecognitionCaptureSource.
Sihui Liu
Comment 35
2020-11-20 14:06:24 PST
Created
attachment 414720
[details]
Patch
Sihui Liu
Comment 36
2020-11-20 14:09:26 PST
Created
attachment 414722
[details]
Patch
Sihui Liu
Comment 37
2020-11-20 14:17:53 PST
Created
attachment 414724
[details]
Patch
youenn fablet
Comment 38
2020-11-20 14:35:11 PST
r=me once bots are green. I think it might be good to do a follow-up patch that would merge SpeechRecognitionCaptureSourceImpl inside SpeechRecognitionCaptureSource. RealtimeMediaSource should be able to encapsulate whether the source is remote or not. For instance MediaStreamTrack is using a RealtimeMediaSource which is either local or remote without any issue. View in context:
https://bugs.webkit.org/attachment.cgi?id=414722&action=review
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:130 > + invokeStartOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format");
I would write it as: callOnMainThread([this, weakThis = makeWeakPtr(this)] { if (weakThis) m_startOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format"); }); This is not much longer and removes the need for invokeStartOrErrorCallback. I would tend to rename m_startOrErrorCallback to m_updateStateCallback.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:139 > + if (m_dataSourceLock.tryLock())
m_dataSourceLock is not unlocked. It might be best to write it like that: if (auto locker = tryHoldLock(m_dataSourceLock)). I am wondering whether the layout tests are actually running that code path.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:154 > + m_dataSourceLock.unlock();
This is fine like this, I tend to prefer holdLock though.
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:186 > + callOnMainThread([this, weakThis = makeWeakPtr(this), type, error = error.isolatedCopy()] {
This makes an unnecessary string copy here. We can prevent it by using String&& and doing WTFMove(error).isolatedCopy().
> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:52 > + m_delegateCallback(SpeechRecognitionUpdate::createError(*m_clientIdentifier, error));
We might be doing some unnecessary count churning here when copying error.
> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:67 > + auto dataCallback = [weakThis = makeWeakPtr(this)](const WTF::MediaTime& time, const PlatformAudioData& data, const AudioStreamDescription& description, size_t sampleCount) {
could use auto for the parameters
> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:82 > + auto updateStateCallback = [this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionUpdateType type, const String& errorMessage) {
could use auto for the parameters
Sihui Liu
Comment 39
2020-11-20 14:52:16 PST
Created
attachment 414733
[details]
Patch
Sihui Liu
Comment 40
2020-11-20 22:26:19 PST
Created
attachment 414754
[details]
Patch
Sihui Liu
Comment 41
2020-11-20 22:54:38 PST
Created
attachment 414756
[details]
Patch
Sihui Liu
Comment 42
2020-11-20 23:14:09 PST
(In reply to youenn fablet from
comment #38
)
> r=me once bots are green. > I think it might be good to do a follow-up patch that would merge > SpeechRecognitionCaptureSourceImpl inside SpeechRecognitionCaptureSource. > RealtimeMediaSource should be able to encapsulate whether the source is > remote or not. > For instance MediaStreamTrack is using a RealtimeMediaSource which is either > local or remote without any issue.
Sure.
> > View in context: >
https://bugs.webkit.org/attachment.cgi?id=414722&action=review
> > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:130 > > + invokeStartOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format"); > > I would write it as: > callOnMainThread([this, weakThis = makeWeakPtr(this)] { > if (weakThis) > m_startOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable > to set input format"); > }); > This is not much longer and removes the need for invokeStartOrErrorCallback. > I would tend to rename m_startOrErrorCallback to m_updateStateCallback.
Do you mean
> > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:139 > > + if (m_dataSourceLock.tryLock()) > > m_dataSourceLock is not unlocked. > It might be best to write it like that: if (auto locker = > tryHoldLock(m_dataSourceLock)). > I am wondering whether the layout tests are actually running that code path.
Right, the layout tests did catch that (not releasing lock). Will change to use locker.
> > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:154 > > + m_dataSourceLock.unlock(); > > This is fine like this, I tend to prefer holdLock though.
Changed.
> > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:186 > > + callOnMainThread([this, weakThis = makeWeakPtr(this), type, error = error.isolatedCopy()] { > > This makes an unnecessary string copy here. > We can prevent it by using String&& and doing WTFMove(error).isolatedCopy().
I've updated the patch to make the callback take a SpeechRecognitionUpdate instead of string as you suggested.
> > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:52 > > + m_delegateCallback(SpeechRecognitionUpdate::createError(*m_clientIdentifier, error)); > > We might be doing some unnecessary count churning here when copying error. > > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:67 > > + auto dataCallback = [weakThis = makeWeakPtr(this)](const WTF::MediaTime& time, const PlatformAudioData& data, const AudioStreamDescription& description, size_t sampleCount) { > > could use auto for the parameters
Sure.
> > > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:82 > > + auto updateStateCallback = [this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionUpdateType type, const String& errorMessage) { > > could use auto for the parameters
Sure.
youenn fablet
Comment 43
2020-11-21 01:58:32 PST
Comment on
attachment 414756
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=414756&action=review
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91 > + invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format");
Let's use an explicit callOnMainThread here so that we do not need to go in the business of using isolatedCopy. We could add to invokeStateUpdateCallback an ASSERT(isMainThread());
> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:96 > + invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set output format");
Ditto
Sihui Liu
Comment 44
2020-11-21 18:43:09 PST
(In reply to youenn fablet from
comment #43
)
> Comment on
attachment 414756
[details]
> Patch
Thanks for the review! Will update.
> > View in context: >
https://bugs.webkit.org/attachment.cgi?id=414756&action=review
> > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91 > > + invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format"); > > Let's use an explicit callOnMainThread here so that we do not need to go in > the business of using isolatedCopy. > We could add to invokeStateUpdateCallback an ASSERT(isMainThread()); > > > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:96 > > + invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set output format"); > > Ditto
Sihui Liu
Comment 45
2020-11-21 18:44:48 PST
Created
attachment 414779
[details]
Patch
EWS
Comment 46
2020-11-21 21:51:33 PST
Committed
r270158
: <
https://trac.webkit.org/changeset/270158
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 414779
[details]
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug