Bug 218855 - Implement audio capture for SpeechRecognition on macOS
Summary: Implement audio capture for SpeechRecognition on macOS
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Accessibility (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Sihui Liu
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-11-12 09:55 PST by Sihui Liu
Modified: 2020-11-21 21:51 PST (History)
14 users (show)

See Also:


Attachments
WIP (19.82 KB, patch)
2020-11-12 09:56 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (43.67 KB, patch)
2020-11-16 18:05 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (45.23 KB, patch)
2020-11-17 00:05 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (43.31 KB, patch)
2020-11-17 23:21 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (43.43 KB, patch)
2020-11-17 23:35 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (43.69 KB, patch)
2020-11-17 23:54 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (43.72 KB, patch)
2020-11-17 23:59 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (43.72 KB, patch)
2020-11-18 00:06 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (43.72 KB, patch)
2020-11-18 00:35 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (43.70 KB, patch)
2020-11-18 00:50 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (44.26 KB, patch)
2020-11-18 01:34 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (44.38 KB, patch)
2020-11-18 09:28 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (67.23 KB, patch)
2020-11-20 08:35 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (67.24 KB, patch)
2020-11-20 08:38 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (67.13 KB, patch)
2020-11-20 08:43 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (67.24 KB, patch)
2020-11-20 08:53 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (69.84 KB, patch)
2020-11-20 10:13 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (69.84 KB, patch)
2020-11-20 10:14 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (69.84 KB, patch)
2020-11-20 10:21 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (190.44 KB, patch)
2020-11-20 14:06 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (69.44 KB, patch)
2020-11-20 14:09 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (69.44 KB, patch)
2020-11-20 14:17 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (69.50 KB, patch)
2020-11-20 14:52 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (73.83 KB, patch)
2020-11-20 22:26 PST, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (73.94 KB, patch)
2020-11-20 22:54 PST, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (70.04 KB, patch)
2020-11-21 18:44 PST, Sihui Liu
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sihui Liu 2020-11-12 09:55:05 PST
...
Comment 1 Radar WebKit Bug Importer 2020-11-12 09:55:19 PST
<rdar://problem/71331001>
Comment 2 Sihui Liu 2020-11-12 09:56:51 PST
Created attachment 413947 [details]
WIP
Comment 3 Sihui Liu 2020-11-16 18:05:32 PST
Created attachment 414296 [details]
Patch
Comment 4 Sihui Liu 2020-11-17 00:05:07 PST
Created attachment 414314 [details]
Patch
Comment 5 youenn fablet 2020-11-17 11:46:30 PST
Comment on attachment 414314 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414314&action=review

> Source/WebCore/Modules/speech/SpeechRecognizerDelegate.h:39
> +    virtual CaptureSourceOrError createCaptureSource() = 0;

It is a bit weird to have this callback here.
It might be simpler to focus this interface on reporting state only.

> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:54
> +    auto sourceOrError = m_delegate->createCaptureSource();

Instead of doing that, it seems SpeechRecognizer could get a Ref<RealtimeMediaSource> as input.
If needed, we could add a completion handler to start method as well.

> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:73
> +    m_delegate->didStop();

Should we add a completion handler to stop instead?

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:101
> +        startNextGrantedRequest();

These changes seem somehow independent of the audio capture part.
Maybe we should split them.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:118
> +    m_grantedRequests.append(makeWeakPtr(request));

I would have expected that a granted request would override the previous one, i.e stop it I guess.
Is this behavior shared with Chrome and Firefox.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:137
> +    m_recognizer->start();

Is it needed to recreate a recogniser for every request?

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:168
> +    }

I would try to search for the default device and use that one, otherwise pick the first enabled device.
This does not require a Vector though.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:175
> +    return CaptureSourceOrError { "createCaptureSource is not implemented" };

on iOS, we might need to send IPC to WebProcess to create a realtime media source there instead of in UIProcess.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:186
> +void SpeechRecognitionServer::didStartCapturingAudio()

Maybe we could simplify the API so that there is a sendUpdate delegate taking a WebCore::SpeechRecognitionUpdateType?
That would reduce the delegate API a bit.
Comment 6 Sihui Liu 2020-11-17 12:52:39 PST
Comment on attachment 414314 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414314&action=review

>> Source/WebCore/Modules/speech/SpeechRecognizerDelegate.h:39
>> +    virtual CaptureSourceOrError createCaptureSource() = 0;
> 
> It is a bit weird to have this callback here.
> It might be simpler to focus this interface on reporting state only.

Okay, will remove.

>> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:54
>> +    auto sourceOrError = m_delegate->createCaptureSource();
> 
> Instead of doing that, it seems SpeechRecognizer could get a Ref<RealtimeMediaSource> as input.
> If needed, we could add a completion handler to start method as well.

Okay, will add Ref<RealtimeMediaSource> as input.

>> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:73
>> +    m_delegate->didStop();
> 
> Should we add a completion handler to stop instead?

Probably not, because completion handler of stop may not be called if abort() is called after stop().

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:101
>> +        startNextGrantedRequest();
> 
> These changes seem somehow independent of the audio capture part.
> Maybe we should split them.

That sounds good. Added this here as I figured out what SpeechRecognitionServer should do after creating SpeechRecognizer.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:118
>> +    m_grantedRequests.append(makeWeakPtr(request));
> 
> I would have expected that a granted request would override the previous one, i.e stop it I guess.
> Is this behavior shared with Chrome and Firefox.

You are right; tried on Chrome and it aborts ongoing request when another request is granted. Will change this.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:137
>> +    m_recognizer->start();
> 
> Is it needed to recreate a recogniser for every request?

Not really I guess. It is just easier to distinguish between abort and stop. (stop: wait until recognizer acknowledges it stops; abort: throw away everything immediately).
And technically we will need different recognizers for requests of different languages, which is not needed for now.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:168
>> +    }
> 
> I would try to search for the default device and use that one, otherwise pick the first enabled device.
> This does not require a Vector though.

Sure, will change.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:175
>> +    return CaptureSourceOrError { "createCaptureSource is not implemented" };
> 
> on iOS, we might need to send IPC to WebProcess to create a realtime media source there instead of in UIProcess.

Yes, seems we need to create something like RemoteRealtimeMediaSource in UI process. Will try that in another patch.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:186
>> +void SpeechRecognitionServer::didStartCapturingAudio()
> 
> Maybe we could simplify the API so that there is a sendUpdate delegate taking a WebCore::SpeechRecognitionUpdateType?
> That would reduce the delegate API a bit.

You mean simplifying interfaces of SpeechRecognizerDelegate? Okay will try.
Comment 7 Sihui Liu 2020-11-17 23:21:16 PST Comment hidden (obsolete)
Comment 8 Sihui Liu 2020-11-17 23:35:03 PST Comment hidden (obsolete)
Comment 9 Sihui Liu 2020-11-17 23:54:04 PST Comment hidden (obsolete)
Comment 10 Sihui Liu 2020-11-17 23:59:02 PST Comment hidden (obsolete)
Comment 11 Sihui Liu 2020-11-18 00:06:47 PST Comment hidden (obsolete)
Comment 12 Sihui Liu 2020-11-18 00:35:30 PST Comment hidden (obsolete)
Comment 13 Sihui Liu 2020-11-18 00:50:19 PST Comment hidden (obsolete)
Comment 14 Sihui Liu 2020-11-18 01:34:43 PST Comment hidden (obsolete)
Comment 15 Sihui Liu 2020-11-18 09:28:43 PST
Created attachment 414456 [details]
Patch
Comment 16 youenn fablet 2020-11-19 04:37:44 PST
Comment on attachment 414456 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414456&action=review

> Source/WebCore/ChangeLog:14
> +        Manually tested with MiniBrowser.

Can we write some layout test.
We should be able to get mocks going on so that some events get fired in WTR.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:37
> +class SpeechRecognitionCaptureSource : public RealtimeMediaSource::Observer, public RealtimeMediaSource::AudioSampleObserver {

final?

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:56
> +    m_source = makeUnique<SpeechRecognitionCaptureSource>([this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionCaptureSource::Buffer buffer) {

This will probably be called in a background thread so I am not sure this is good.
Maybe you do not need to do processing in a background thread and could hop to the main thread here if latency for the speech recogniser is not an issue.
Speech recognition layoutTests with mock audio in debug should be able to capture these kind of issues.

Also, this processing might be done in the audio high priority thread where we try to not allocate memory.
The typical way of doing things is with a ring buffer or AudioSampleDataSource.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80
> +        m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) {

s.const ...&/auto/

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:104
>          return;

We also probably need some tests where we trigger several requests in parallel and verify we have the right events on the page.
Comment 17 Sihui Liu 2020-11-19 09:10:45 PST
(In reply to youenn fablet from comment #16)
> Comment on attachment 414456 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=414456&action=review
> 
> > Source/WebCore/ChangeLog:14
> > +        Manually tested with MiniBrowser.
> 
> Can we write some layout test.
> We should be able to get mocks going on so that some events get fired in WTR.

Sure. Will write.

> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:37
> > +class SpeechRecognitionCaptureSource : public RealtimeMediaSource::Observer, public RealtimeMediaSource::AudioSampleObserver {
> 
> final?

Will add.

> 
> > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:56
> > +    m_source = makeUnique<SpeechRecognitionCaptureSource>([this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionCaptureSource::Buffer buffer) {
> 
> This will probably be called in a background thread so I am not sure this is
> good.
> Maybe you do not need to do processing in a background thread and could hop
> to the main thread here if latency for the speech recogniser is not an issue.
> Speech recognition layoutTests with mock audio in debug should be able to
> capture these kind of issues.
> 
> Also, this processing might be done in the audio high priority thread where
> we try to not allocate memory.
> The typical way of doing things is with a ring buffer or
> AudioSampleDataSource.

I see, will try hopping back to main thread and using ring buffer.
> 
> > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80
> > +        m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) {
> 
> s.const ...&/auto/

Okay.

> 
> > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:104
> >          return;
> 
> We also probably need some tests where we trigger several requests in
> parallel and verify we have the right events on the page.

Sure, will add.
Comment 18 Sihui Liu 2020-11-20 08:35:25 PST Comment hidden (obsolete)
Comment 19 Sihui Liu 2020-11-20 08:38:19 PST Comment hidden (obsolete)
Comment 20 Sihui Liu 2020-11-20 08:43:35 PST
Created attachment 414684 [details]
Patch
Comment 21 Sihui Liu 2020-11-20 08:53:29 PST
Created attachment 414687 [details]
Patch
Comment 22 youenn fablet 2020-11-20 09:16:22 PST
Comment on attachment 414687 [details]
Patch

Looks almost ready to go.
Can you fix the m_errorCallback and check the weakThis creation?

View in context: https://bugs.webkit.org/attachment.cgi?id=414687&action=review

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:47
> +    using ErrorCallback = Function<void(String)>;

String&&

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:78
> +    setUpSource(*captureDevice);

s/setUp/setup/ ?
Or maybe just setSource.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:85
> +    , m_errorCallback(WTFMove(errorCallback))

I wound tend to merge m_startCallback and m_errorCallback in a m_updateStateCallback.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:127
> +        auto dataSource = AudioSampleDataSource::create(description.sampleRate() * 2, *m_source);

We want to reduce it to less than 2 seconds but I guess that is fine.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:134
> +            m_errorCallback("Unable to set the output format of data source");

Not great to have m_errorCallback called on either main thread or background thread.
Maybe we should hop there.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:143
> +    callOnMainThread([this, weakThis = makeWeakPtr(this), time, audioDescription, sampleCount] {

You need to make sure weak pointer is created at constructor time, either using WeakPtrFactoryInitialization::Eager or explicitly in constructor.

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:42
> +    if (m_clientIdentifier) {

if (!m_clientIdentifier)
    return;

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:85
> +        m_delegateCallback(SpeechRecognitionUpdate::createError(identifier, error));

It seems errorCallback and startCallback could be merged by them taking a SpeechRecognitionUpdate object.
You might just need to add a check to nullptr m_source

> Source/WebCore/platform/cocoa/MediaUtilities.cpp:52
> +    auto format = createAudioFormatDescription(description);

Maybe add a FIXME to check whether we can reuse the format for several sample buffers.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80
> +        m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) {

auto&
Comment 23 Sihui Liu 2020-11-20 10:13:50 PST
Created attachment 414693 [details]
Patch
Comment 24 Sihui Liu 2020-11-20 10:14:39 PST
Created attachment 414694 [details]
Patch
Comment 25 Sihui Liu 2020-11-20 10:20:41 PST
(In reply to youenn fablet from comment #22)
> Comment on attachment 414687 [details]
> Patch
> 
> Looks almost ready to go.
> Can you fix the m_errorCallback and check the weakThis creation?
> 

Updated patch to merge m_errorCallback and m_startCallback to m_updateStateCallback and initialize m_weakThis in constructor of SpeechRecognitionCaptureSourceImpl.

> View in context:
> https://bugs.webkit.org/attachment.cgi?id=414687&action=review
> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:47
> > +    using ErrorCallback = Function<void(String)>;
> 
> String&&
> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:78
> > +    setUpSource(*captureDevice);
> 
> s/setUp/setup/ ?
> Or maybe just setSource.

Changed to setSource.

> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:85
> > +    , m_errorCallback(WTFMove(errorCallback))
> 
> I wound tend to merge m_startCallback and m_errorCallback in a
> m_updateStateCallback.
> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:127
> > +        auto dataSource = AudioSampleDataSource::create(description.sampleRate() * 2, *m_source);
> 
> We want to reduce it to less than 2 seconds but I guess that is fine.

Changed to 1.

> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:134
> > +            m_errorCallback("Unable to set the output format of data source");
> 
> Not great to have m_errorCallback called on either main thread or background
> thread.
> Maybe we should hop there.

Added SpeechRecognitionCaptureSourceImpl::invokeUpdateStateCallback to hop to main thread.
> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:143
> > +    callOnMainThread([this, weakThis = makeWeakPtr(this), time, audioDescription, sampleCount] {
> 
> You need to make sure weak pointer is created at constructor time, either
> using WeakPtrFactoryInitialization::Eager or explicitly in constructor.

Choose the second as RealtimeMediaSource::Observer, which SpeechRecognitionCaptureSourceImpl.cpp inherits, is not WeakPtrFactoryInitialization::Eager.

> 
> > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:42
> > +    if (m_clientIdentifier) {
> 
> if (!m_clientIdentifier)
>     return;
> 
> > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:85
> > +        m_delegateCallback(SpeechRecognitionUpdate::createError(identifier, error));
> 
> It seems errorCallback and startCallback could be merged by them taking a
> SpeechRecognitionUpdate object.
> You might just need to add a check to nullptr m_source

For distinguishing start and error event, Optional<String> seems enough.

> 
> > Source/WebCore/platform/cocoa/MediaUtilities.cpp:52
> > +    auto format = createAudioFormatDescription(description);
> 
> Maybe add a FIXME to check whether we can reuse the format for several
> sample buffers.

Added.

> 
> > Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:80
> > +        m_recognizer = makeUnique<SpeechRecognizer>([this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) {
> 
> auto&

Okay.
Comment 26 Sihui Liu 2020-11-20 10:21:38 PST
Created attachment 414695 [details]
Patch
Comment 27 youenn fablet 2020-11-20 11:54:35 PST
Comment on attachment 414695 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414695&action=review

> Source/WebKit/ChangeLog:21
> +2020-11-20  Sihui Liu  <sihui_liu@apple.com>

Double entry

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:34
> +    m_impl = makeUnique<SpeechRecognitionCaptureSourceImpl>(WTFMove(dataCallback), WTFMove(updateStateCallback));

Where do we set the source?

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:44
> +    SpeechRecognitionCaptureSource() = default;

Do we need this constructor?

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:59
> +    , m_weakThis(makeWeakPtr(this))

You might not need m_weakThis, just call makeWeakPtr(this)

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:87
> +    , m_updateStateCallback(WTFMove(updateStateCallback))

It would probably be better to have SpeechRecognitionCaptureSourceImpl constructor take a Ref<RealtimeMediaSource>&&.
Might use a static create method. That can be done as a follow-up.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:118
> +        m_source->stop();

We should probably call removeObserver and removeAudioSampleObserver.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:141
> +        m_dataSource = WTFMove(dataSource);

We need a lock here when setting m_dataSource since m_dataSource is called from a background thread.
We should tryLock in the background thread. If failing, return early and wait for next time to get the lock.
We can just lock in the main thread.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:151
> +        m_dataSource->pullSamples(*data.list(), sampleCount, time.timeValue(), 0, AudioSampleDataSource::Copy);

Just there, we probably do not want to keep the lock in m_dataCallback, just in case.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:182
> +    callOnMainThread([this, weakThis = m_weakThis, optionalError = errorMessage] {

This is potentially not safe as we do not isolate copy string.
I think it might just be simpler to call callOnMainThread directly in audioSamplesAvailable.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54
> +    SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&);

Why do we need both constructors?

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89
> +        m_source = nullptr;

I would make updateStateCallback take a SpeechRecognitionUpdate.
And write is as follow:
if (!weakThis)
   return;
m_delegateCallback(WTFMove(update));
if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End)
    m_source = nullptr;

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:104
> +        Vector<SpeechRecognitionResultData> resultDatas;

We might need to build a mock backend in the future given that the real backend requires a TCC prompt.
Comment 28 Sihui Liu 2020-11-20 12:07:10 PST
Comment on attachment 414695 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414695&action=review

>> Source/WebKit/ChangeLog:21
>> +2020-11-20  Sihui Liu  <sihui_liu@apple.com>
> 
> Double entry

Will remove.

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.cpp:34
>> +    m_impl = makeUnique<SpeechRecognitionCaptureSourceImpl>(WTFMove(dataCallback), WTFMove(updateStateCallback));
> 
> Where do we set the source?

SpeechRecognizer?

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSource.h:44
>> +    SpeechRecognitionCaptureSource() = default;
> 
> Do we need this constructor?

Plan to subclass it for iOS implementation, not needed now.

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:59
>> +    , m_weakThis(makeWeakPtr(this))
> 
> You might not need m_weakThis, just call makeWeakPtr(this)

Okay.

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:87
>> +    , m_updateStateCallback(WTFMove(updateStateCallback))
> 
> It would probably be better to have SpeechRecognitionCaptureSourceImpl constructor take a Ref<RealtimeMediaSource>&&.
> Might use a static create method. That can be done as a follow-up.

Okay, will do a follow-up

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:118
>> +        m_source->stop();
> 
> We should probably call removeObserver and removeAudioSampleObserver.

Sure.

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:141
>> +        m_dataSource = WTFMove(dataSource);
> 
> We need a lock here when setting m_dataSource since m_dataSource is called from a background thread.
> We should tryLock in the background thread. If failing, return early and wait for next time to get the lock.
> We can just lock in the main thread.

Okay, will add lock.

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:151
>> +        m_dataSource->pullSamples(*data.list(), sampleCount, time.timeValue(), 0, AudioSampleDataSource::Copy);
> 
> Just there, we probably do not want to keep the lock in m_dataCallback, just in case.

Okay.

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:182
>> +    callOnMainThread([this, weakThis = m_weakThis, optionalError = errorMessage] {
> 
> This is potentially not safe as we do not isolate copy string.
> I think it might just be simpler to call callOnMainThread directly in audioSamplesAvailable.

Will do isolated copy. 
In audioSamplesAvailable?

>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54
>> +    SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&);
> 
> Why do we need both constructors?

Supposedly we should specify device/source for iOS.

>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89
>> +        m_source = nullptr;
> 
> I would make updateStateCallback take a SpeechRecognitionUpdate.
> And write is as follow:
> if (!weakThis)
>    return;
> m_delegateCallback(WTFMove(update));
> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End)
>     m_source = nullptr;

Sure.

>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:104
>> +        Vector<SpeechRecognitionResultData> resultDatas;
> 
> We might need to build a mock backend in the future given that the real backend requires a TCC prompt.

Okay.
Comment 29 youenn fablet 2020-11-20 12:10:56 PST
> > This is potentially not safe as we do not isolate copy string.
> > I think it might just be simpler to call callOnMainThread directly in audioSamplesAvailable.
> 
> Will do isolated copy. 
> In audioSamplesAvailable?

Not needed, you might want to callOnMainThread in audioSamplesAvailable and create the string in main thread.

> >> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54
> >> +    SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&);
> > 
> > Why do we need both constructors?
> 
> Supposedly we should specify device/source for iOS.

Let's remove the one that is not used for now.
Comment 30 Sihui Liu 2020-11-20 12:35:17 PST
Comment on attachment 414695 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414695&action=review

>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89
>>> +        m_source = nullptr;
>> 
>> I would make updateStateCallback take a SpeechRecognitionUpdate.
>> And write is as follow:
>> if (!weakThis)
>>    return;
>> m_delegateCallback(WTFMove(update));
>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End)
>>     m_source = nullptr;
> 
> Sure.

hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate.
Do you think we should pass clientIdentifier to it?
Comment 31 youenn fablet 2020-11-20 12:41:15 PST
Comment on attachment 414695 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414695&action=review

>>> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.h:54
>>> +    SpeechRecognitionCaptureSourceImpl(SpeechRecognitionCaptureSourceIdentifier, const CaptureDevice&, DataCallback&&, UpdateStateCallback&&);
>> 
>> Why do we need both constructors?
> 
> Supposedly we should specify device/source for iOS.

Well, for now, let's just keep the first one, that will remove some duplicated code.

>>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89
>>>> +        m_source = nullptr;
>>> 
>>> I would make updateStateCallback take a SpeechRecognitionUpdate.
>>> And write is as follow:
>>> if (!weakThis)
>>>    return;
>>> m_delegateCallback(WTFMove(update));
>>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End)
>>>     m_source = nullptr;
>> 
>> Sure.
> 
> hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate.
> Do you think we should pass clientIdentifier to it?

Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier
It seems SpeechRecognitionCaptureSourceIdentifier is not used.
Otherwise, we could just pass a SpeechRecognitionUpdateType.
Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging.
Comment 32 Sihui Liu 2020-11-20 12:57:01 PST
Comment on attachment 414695 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414695&action=review

>>>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89
>>>>> +        m_source = nullptr;
>>>> 
>>>> I would make updateStateCallback take a SpeechRecognitionUpdate.
>>>> And write is as follow:
>>>> if (!weakThis)
>>>>    return;
>>>> m_delegateCallback(WTFMove(update));
>>>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End)
>>>>     m_source = nullptr;
>>> 
>>> Sure.
>> 
>> hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate.
>> Do you think we should pass clientIdentifier to it?
> 
> Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier
> It seems SpeechRecognitionCaptureSourceIdentifier is not used.
> Otherwise, we could just pass a SpeechRecognitionUpdateType.
> Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging.

SpeechRecognitionCaptureSourceIdentifier will be used for syncing sources between processes. Okay, will pass a type instead.
Comment 33 youenn fablet 2020-11-20 12:59:46 PST
(In reply to Sihui Liu from comment #32)
> Comment on attachment 414695 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=414695&action=review
> 
> >>>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89
> >>>>> +        m_source = nullptr;
> >>>> 
> >>>> I would make updateStateCallback take a SpeechRecognitionUpdate.
> >>>> And write is as follow:
> >>>> if (!weakThis)
> >>>>    return;
> >>>> m_delegateCallback(WTFMove(update));
> >>>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End)
> >>>>     m_source = nullptr;
> >>> 
> >>> Sure.
> >> 
> >> hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate.
> >> Do you think we should pass clientIdentifier to it?
> > 
> > Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier
> > It seems SpeechRecognitionCaptureSourceIdentifier is not used.
> > Otherwise, we could just pass a SpeechRecognitionUpdateType.
> > Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging.
> 
> SpeechRecognitionCaptureSourceIdentifier will be used for syncing sources
> between processes. Okay, will pass a type instead.

I am not sure SpeechRecognitionCaptureSourceIdentifier is the right thing then.
You might best use a RealtimeMediaSource which will actually be remote.
The remote version will have an ID but this will be hidden from speech recognition.
Comment 34 Sihui Liu 2020-11-20 13:14:46 PST
(In reply to youenn fablet from comment #33)
> (In reply to Sihui Liu from comment #32)
> > Comment on attachment 414695 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=414695&action=review
> > 
> > >>>>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:89
> > >>>>> +        m_source = nullptr;
> > >>>> 
> > >>>> I would make updateStateCallback take a SpeechRecognitionUpdate.
> > >>>> And write is as follow:
> > >>>> if (!weakThis)
> > >>>>    return;
> > >>>> m_delegateCallback(WTFMove(update));
> > >>>> if if (type == SpeechRecognitionUpdateType::Error || type == SpeechRecognitionUpdateType::End)
> > >>>>     m_source = nullptr;
> > >>> 
> > >>> Sure.
> > >> 
> > >> hmm SpeechReocognitionCaptureSource does not know the clientIdentifier so it cannot create a SpeechRecognitionUpdate.
> > >> Do you think we should pass clientIdentifier to it?
> > > 
> > > Ah, I thought the clientIdentifier was the same as SpeechRecognitionCaptureSourceIdentifier
> > > It seems SpeechRecognitionCaptureSourceIdentifier is not used.
> > > Otherwise, we could just pass a SpeechRecognitionUpdateType.
> > > Maybe this is ok to not pass a precise error string to the web page, we usually can get that information through release logging.
> > 
> > SpeechRecognitionCaptureSourceIdentifier will be used for syncing sources
> > between processes. Okay, will pass a type instead.
> 
> I am not sure SpeechRecognitionCaptureSourceIdentifier is the right thing
> then.
> You might best use a RealtimeMediaSource which will actually be remote.
> The remote version will have an ID but this will be hidden from speech
> recognition.

ah I see, You are suggesting changing RealtimeMediaSource in SpeechRecognitionCaptureSourceImpl for remote case. I was thinking about having another RemoteSpeechRecognitionCaptureSource.
Comment 35 Sihui Liu 2020-11-20 14:06:24 PST
Created attachment 414720 [details]
Patch
Comment 36 Sihui Liu 2020-11-20 14:09:26 PST
Created attachment 414722 [details]
Patch
Comment 37 Sihui Liu 2020-11-20 14:17:53 PST
Created attachment 414724 [details]
Patch
Comment 38 youenn fablet 2020-11-20 14:35:11 PST
r=me once bots are green.
I think it might be good to do a follow-up patch that would merge SpeechRecognitionCaptureSourceImpl inside SpeechRecognitionCaptureSource.
RealtimeMediaSource should be able to encapsulate whether the source is remote or not.
For instance MediaStreamTrack is using a RealtimeMediaSource which is either local or remote without any issue. 

View in context: https://bugs.webkit.org/attachment.cgi?id=414722&action=review

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:130
> +            invokeStartOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format");

I would write it as:
callOnMainThread([this, weakThis = makeWeakPtr(this)] {
     if (weakThis)
        m_startOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format");
});
This is not much longer and removes the need for invokeStartOrErrorCallback.
I would tend to rename m_startOrErrorCallback to m_updateStateCallback.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:139
> +        if (m_dataSourceLock.tryLock())

m_dataSourceLock is not unlocked.
It might be best to write it like that: if (auto locker = tryHoldLock(m_dataSourceLock)).
I am wondering whether the layout tests are actually running that code path.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:154
> +        m_dataSourceLock.unlock();

This is fine like this, I tend to prefer holdLock though.

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:186
> +    callOnMainThread([this, weakThis = makeWeakPtr(this), type, error = error.isolatedCopy()] {

This makes an unnecessary string copy here.
We can prevent it by using String&& and doing WTFMove(error).isolatedCopy().

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:52
> +    m_delegateCallback(SpeechRecognitionUpdate::createError(*m_clientIdentifier, error));

We might be doing some unnecessary count churning here when copying error.

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:67
> +    auto dataCallback = [weakThis = makeWeakPtr(this)](const WTF::MediaTime& time, const PlatformAudioData& data, const AudioStreamDescription& description, size_t sampleCount) {

could use auto for the parameters

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:82
> +    auto updateStateCallback = [this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionUpdateType type, const String& errorMessage) {

could use auto for the parameters
Comment 39 Sihui Liu 2020-11-20 14:52:16 PST
Created attachment 414733 [details]
Patch
Comment 40 Sihui Liu 2020-11-20 22:26:19 PST
Created attachment 414754 [details]
Patch
Comment 41 Sihui Liu 2020-11-20 22:54:38 PST
Created attachment 414756 [details]
Patch
Comment 42 Sihui Liu 2020-11-20 23:14:09 PST
(In reply to youenn fablet from comment #38)
> r=me once bots are green.
> I think it might be good to do a follow-up patch that would merge
> SpeechRecognitionCaptureSourceImpl inside SpeechRecognitionCaptureSource.
> RealtimeMediaSource should be able to encapsulate whether the source is
> remote or not.
> For instance MediaStreamTrack is using a RealtimeMediaSource which is either
> local or remote without any issue. 

Sure.

> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=414722&action=review
> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:130
> > +            invokeStartOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format");
> 
> I would write it as:
> callOnMainThread([this, weakThis = makeWeakPtr(this)] {
>      if (weakThis)
>         m_startOrErrorCallback(SpeechRecognitionUpdateType::Error, "Unable
> to set input format");
> });
> This is not much longer and removes the need for invokeStartOrErrorCallback.
> I would tend to rename m_startOrErrorCallback to m_updateStateCallback.

Do you mean 
> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:139
> > +        if (m_dataSourceLock.tryLock())
> 
> m_dataSourceLock is not unlocked.
> It might be best to write it like that: if (auto locker =
> tryHoldLock(m_dataSourceLock)).
> I am wondering whether the layout tests are actually running that code path.

Right, the layout tests did catch that (not releasing lock). Will change to use locker.

> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:154
> > +        m_dataSourceLock.unlock();
> 
> This is fine like this, I tend to prefer holdLock though.

Changed.

> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:186
> > +    callOnMainThread([this, weakThis = makeWeakPtr(this), type, error = error.isolatedCopy()] {
> 
> This makes an unnecessary string copy here.
> We can prevent it by using String&& and doing WTFMove(error).isolatedCopy().

I've updated the patch to make the callback take a SpeechRecognitionUpdate instead of string as you suggested.

> 
> > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:52
> > +    m_delegateCallback(SpeechRecognitionUpdate::createError(*m_clientIdentifier, error));
> 
> We might be doing some unnecessary count churning here when copying error.
> 
> > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:67
> > +    auto dataCallback = [weakThis = makeWeakPtr(this)](const WTF::MediaTime& time, const PlatformAudioData& data, const AudioStreamDescription& description, size_t sampleCount) {
> 
> could use auto for the parameters

Sure.

> 
> > Source/WebCore/Modules/speech/SpeechRecognizer.cpp:82
> > +    auto updateStateCallback = [this, weakThis = makeWeakPtr(this), identifier](SpeechRecognitionUpdateType type, const String& errorMessage) {
> 
> could use auto for the parameters

Sure.
Comment 43 youenn fablet 2020-11-21 01:58:32 PST
Comment on attachment 414756 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414756&action=review

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91
> +            invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format");

Let's use an explicit callOnMainThread here so that we do not need to go in the business of using isolatedCopy.
We could add to invokeStateUpdateCallback an ASSERT(isMainThread());

> Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:96
> +            invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set output format");

Ditto
Comment 44 Sihui Liu 2020-11-21 18:43:09 PST
(In reply to youenn fablet from comment #43)
> Comment on attachment 414756 [details]
> Patch
Thanks for the review! Will update.

> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=414756&action=review
> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:91
> > +            invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set input format");
> 
> Let's use an explicit callOnMainThread here so that we do not need to go in
> the business of using isolatedCopy.
> We could add to invokeStateUpdateCallback an ASSERT(isMainThread());
> 
> > Source/WebCore/Modules/speech/SpeechRecognitionCaptureSourceImpl.cpp:96
> > +            invokeStateUpdateCallback(SpeechRecognitionUpdateType::Error, "Unable to set output format");
> 
> Ditto
Comment 45 Sihui Liu 2020-11-21 18:44:48 PST
Created attachment 414779 [details]
Patch
Comment 46 EWS 2020-11-21 21:51:33 PST
Committed r270158: <https://trac.webkit.org/changeset/270158>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 414779 [details].