Bug 219459 - Implement recognizer for SpeechRecognition
Summary: Implement recognizer for SpeechRecognition
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Accessibility
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Sihui Liu
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-12-02 15:56 PST by Sihui Liu
Modified: 2020-12-14 09:48 PST
CC List: 3 users

See Also:


Attachments
Patch (28.66 KB, patch): 2020-12-07 11:09 PST, Sihui Liu, no flags
Patch (29.73 KB, patch): 2020-12-07 14:19 PST, Sihui Liu, ews-feeder: commit-queue-
Patch (29.73 KB, patch): 2020-12-07 14:28 PST, Sihui Liu, no flags
Patch (29.77 KB, patch): 2020-12-07 14:34 PST, Sihui Liu, no flags
Patch (37.22 KB, patch): 2020-12-08 00:54 PST, Sihui Liu, no flags
Patch (36.23 KB, patch): 2020-12-08 07:51 PST, Sihui Liu, ews-feeder: commit-queue-
Patch (36.24 KB, patch): 2020-12-08 08:06 PST, Sihui Liu, darin: review-
Patch (60.56 KB, patch): 2020-12-10 02:18 PST, Sihui Liu, no flags
Patch (62.09 KB, patch): 2020-12-10 08:13 PST, Sihui Liu, ews-feeder: commit-queue-
Patch (62.07 KB, patch): 2020-12-10 08:20 PST, Sihui Liu, ews-feeder: commit-queue-
Patch (62.39 KB, patch): 2020-12-10 09:05 PST, Sihui Liu, ews-feeder: commit-queue-
Patch (62.85 KB, patch): 2020-12-10 11:05 PST, Sihui Liu, no flags
Patch (63.86 KB, patch): 2020-12-10 14:57 PST, Sihui Liu, no flags
Patch (68.60 KB, patch): 2020-12-11 11:26 PST, Sihui Liu, ews-feeder: commit-queue-
Patch (68.31 KB, patch): 2020-12-11 12:07 PST, Sihui Liu, no flags
Patch (70.84 KB, patch): 2020-12-14 08:46 PST, Sihui Liu, no flags

Description Sihui Liu 2020-12-02 15:56:53 PST
...
Comment 1 Radar WebKit Bug Importer 2020-12-02 15:57:01 PST
<rdar://problem/71914465>
Comment 2 Sihui Liu 2020-12-07 11:09:43 PST
Created attachment 415566 [details]
Patch
Comment 3 Sihui Liu 2020-12-07 14:19:20 PST
Created attachment 415585 [details]
Patch
Comment 4 Sihui Liu 2020-12-07 14:28:29 PST
Created attachment 415586 [details]
Patch
Comment 5 Sihui Liu 2020-12-07 14:34:51 PST
Created attachment 415587 [details]
Patch
Comment 6 Sihui Liu 2020-12-08 00:54:34 PST
Created attachment 415617 [details]
Patch
Comment 7 Sihui Liu 2020-12-08 07:51:17 PST
Created attachment 415644 [details]
Patch
Comment 8 Sihui Liu 2020-12-08 08:06:28 PST
Created attachment 415646 [details]
Patch
Comment 9 Darin Adler 2020-12-08 11:36:35 PST
Comment on attachment 415646 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=415646&action=review

> Source/WebCore/Configurations/WebCore.xcconfig:128
> +WK_SPEECH_LDFLAGS = $(WK_SPEECH_LDFLAGS_$(WK_PLATFORM_NAME));
> +WK_SPEECH_LDFLAGS_macosx = -framework Speech;
> +WK_SPEECH_LDFLAGS_iphoneos = -framework Speech;
> +WK_SPEECH_LDFLAGS_iphonesimulator = -framework Speech;
> +WK_SPEECH_LDFLAGS_maccatalyst = -framework Speech;

Is this OK for the base system on macOS? Is the Speech framework included in that reduced system?

Does this affect launch time? Adding another framework that mostly isn’t used can slow down launch if that framework is not part of the shared cache, I believe.

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:46
> +        auto error = WebCore::SpeechRecognitionError { WebCore::SpeechRecognitionErrorType::ServiceNotAllowed, "Failed to start recognition" };

Should use _s on the string constant here since it’s going to be used to make a WTF::String.
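For illustration, the suggested change (a minimal sketch using WTF's "_s" literal suffix):

    auto error = WebCore::SpeechRecognitionError { WebCore::SpeechRecognitionErrorType::ServiceNotAllowed, "Failed to start recognition"_s };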

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:48
> +        auto update = WebCore::SpeechRecognitionUpdate::createError(identifier, error);
> +        invokeDelegateCallback(update);

Seems like this could be collapsed into a single line like the other similar cases below.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:89
> +        return nil;

This is missing a call to either "[self release]" or "[self dealloc]".
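For illustration, a minimal sketch of the fix being described (non-ARC; the enclosing init method and the failing check are assumed):

    if (!(self = [super init]))
        return nil;

    if (![_recognizer isAvailable]) {
        [self release]; // without this, the half-initialized object leaks under manual retain/release
        return nil;
    }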

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:91
> +    if (![_recognizer.get() isAvailable])

The "get()" here is not needed.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:92
> +        return nil;

This is missing a call to either "[self release]" or "[self dealloc]".

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:104
> +    [_request _setMaximumRecognitionDuration:60 * 60];

Where does this magic constant come from? Maybe a named constant at the top of the file would be better. We could also comment how we chose that value.
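For reference, the later revision adopts exactly this shape (see comment 23); the trailing comment here is illustrative:

    static constexpr size_t maximumRecognitionDuration = 60 * 60; // seconds; cap a single recognition task at one hour
    [_request _setMaximumRecognitionDuration:maximumRecognitionDuration];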

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:107
> +    _task = [_recognizer.get() recognitionTaskWithRequest:_request.get() delegate:self];

The first "get()" here isn't needed.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:119
> +        data.alternatives.append(WebCore::SpeechRecognitionAlternativeData { [[segment substring] UTF8String], [segment confidence] });

There’s no reason to call UTF8String here. We’re going to store this in a WTF::String, and it can do that directly from an NSString. Calling UTF8String makes the operation less efficient, and I don’t think it provides any value.
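For illustration, the suggested form, letting WTF::String convert from the NSString directly:

    data.alternatives.append(WebCore::SpeechRecognitionAlternativeData { [segment substring], [segment confidence] });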

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:124
> +            data.alternatives.append(WebCore::SpeechRecognitionAlternativeData { [segmentAlternative UTF8String], 0.0 });

There’s no reason to call UTF8String here. We’re going to store this in a WTF::String, and it can do that directly from an NSString. Calling UTF8String makes the operation less efficient, and I don’t think it provides any value.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:193
> +    // FIXME: This is never called.

This is a confusing comment. Why is this never called? Since we say FIXME, what should someone be fixing?

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:214
> +    // FIXME: This is never called.

Ditto.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:231
> +    auto error = WebCore::SpeechRecognitionError { WebCore::SpeechRecognitionErrorType::Aborted, [task.error.localizedDescription UTF8String] };

There’s no reason to call UTF8String here. We’re going to store this in a WTF::String, and it can do that directly from an NSString. Calling UTF8String makes the operation less efficient, and I don’t think it provides any value.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:233
> +    auto update = WebCore::SpeechRecognitionUpdate::createError(_identifier, error);
> +    _delegateCallback(update);

I think we could collapse these into a single line.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:285
> +    [m_task.get() audioSamplesAvailable:buffer.get()];

The first "get()" here isn’t needed.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:304
> +    [m_task.get() stop];

I don’t think this get() is needed.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:310
> +    [m_task.get() abort];

Ditto.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:319
> +    [task.get() abort];

Ditto.

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1728
> +        if (!weakPage || !weakPage->preferences().mockCaptureDevicesEnabled())
> +            return false;
> +        return true;

I would have written this:

    return weakPage && weakPage->preferences().mockCaptureDevicesEnabled();
Comment 10 youenn fablet 2020-12-09 01:27:57 PST
Comment on attachment 415646 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=415646&action=review

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:123
> +    invokeDelegateCallback(SpeechRecognitionUpdate::create(*m_clientIdentifier, SpeechRecognitionUpdateType::AudioEnd));

Are we sure m_clientIdentifier is never null here?
Maybe, as a clean-up patch, we should change the model to create a SpeechRecognizer for every new recognition task.
Then we would pass the client identifier to the SpeechRecognizer constructor and it would no longer be optional.
We could keep a separate boolean to track whether it has been stopped/aborted.
The recognizer could also be made mock or not at creation time.

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:128
> +void SpeechRecognizer::dataCaptured(const WTF::MediaTime&, const PlatformAudioData&, const AudioStreamDescription&, size_t)

We can probably remove WTF:: here.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:114
> +    auto segments = [transcription segments];

Can we use reserveInitialCapacity?
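A minimal sketch of the suggestion, sizing the vector to the segment count (as the next revision does):

    auto segments = [transcription segments];
    Vector<WebCore::SpeechRecognitionResultData> datas;
    datas.reserveInitialCapacity(segments.count);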

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:133
> +    [_request appendAudioSampleBuffer:sampleBuffer];

We tend to push audio samples on background threads only.
If this method is only supposed to be called on the main thread, we could add an isMainThread ASSERT.
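A minimal sketch of the suggested assertion, using WTF's isMainThread():

    ASSERT(isMainThread());
    [_request appendAudioSampleBuffer:sampleBuffer];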

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:200
> +    [self callbackWithResult:transcription isFinal:NO];

Is the delegate guaranteed to be called on the main thread?

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:239
> +@interface WebSpeechRecognizerTaskMock : WebSpeechRecognizerTask

I would tend to move WebSpeechRecognizerTaskMock to its own file.

> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:290
> +    auto taskClass = mockCaptureDevicesEnabled ? [WebSpeechRecognizerTaskMock class] : [WebSpeechRecognizerTaskMock class];

In both cases we are creating a WebSpeechRecognizerTaskMock; the second one should probably be WebSpeechRecognizerTask.
Comment 11 Sihui Liu 2020-12-09 11:35:42 PST
Comment on attachment 415646 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=415646&action=review

>> Source/WebCore/Configurations/WebCore.xcconfig:128
>> +WK_SPEECH_LDFLAGS_maccatalyst = -framework Speech;
> 
> Is this OK for the base system on macOS? Is the Speech framework included in that reduced system?
> 
> Does this affect launch time? Adding another framework that mostly isn’t used can slow down launch if that framework is not part of the shared cache, I believe.

Thanks for pointing this out! The framework is not in the base system; we should add -weak.

It may affect launch time, but the framework seems too small to cause a noticeable change.

>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:46
>> +        auto error = WebCore::SpeechRecognitionError { WebCore::SpeechRecognitionErrorType::ServiceNotAllowed, "Failed to start recognition" };
> 
> Should use _s on the string constant here since it’s going to be used to make a WTF::String.

Sure.

>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:48
>> +        invokeDelegateCallback(update);
> 
> Seems like this could be collapsed into a single line like the other similar cases below.

Will collapse this.

>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:123
>> +    invokeDelegateCallback(SpeechRecognitionUpdate::create(*m_clientIdentifier, SpeechRecognitionUpdateType::AudioEnd));
> 
> Are we sure m_clientIdentifier is never null here?
> Maybe, as a clean-up patch, we should change the model to create a SpeechRecognizer for every new recognition task.
> Then we would pass the client identifier to the SpeechRecognizer constructor and it would no longer be optional.
> We could keep a separate boolean to track whether it has been stopped/aborted.
> The recognizer could also be made mock or not at creation time.

Sounds good. Filed https://bugs.webkit.org/show_bug.cgi?id=219699.

>> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:128
>> +void SpeechRecognizer::dataCaptured(const WTF::MediaTime&, const PlatformAudioData&, const AudioStreamDescription&, size_t)
> 
> We can probably remove WTF:: here.

Sure.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:89
>> +        return nil;
> 
> This is missing a call to either "[self release]" or "[self dealloc]".

Ah, will add!

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:91
>> +    if (![_recognizer.get() isAvailable])
> 
> The "get()" here is not needed.

Will remove.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:92
>> +        return nil;
> 
> This is missing a call to either "[self release]" or "[self dealloc]".

Will add!

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:104
>> +    [_request _setMaximumRecognitionDuration:60 * 60];
> 
> Where does this magic constant come from? Maybe a named constant at the top of the file would be better. We could also comment how we chose that value.

Will add!

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:107
>> +    _task = [_recognizer.get() recognitionTaskWithRequest:_request.get() delegate:self];
> 
> The first "get()" here isn't needed.

Will remove.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:114
>> +    auto segments = [transcription segments];
> 
> Can we use reserveInitialCapacity?

Sure.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:119
>> +        data.alternatives.append(WebCore::SpeechRecognitionAlternativeData { [[segment substring] UTF8String], [segment confidence] });
> 
> There’s no reason to call UTF8String here. We’re going to store this in a WTF::String, and it can do that directly from an NSString. Calling UTF8String makes the operation less efficient, and I don’t think it provides any value.

Will remove this.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:124
>> +            data.alternatives.append(WebCore::SpeechRecognitionAlternativeData { [segmentAlternative UTF8String], 0.0 });
> 
> There’s no reason to call UTF8String here. We’re going to store this in a WTF::String, and it can do that directly from an NSString. Calling UTF8String makes the operation less efficient, and I don’t think it provides any value.

Will remove.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:133
>> +    [_request appendAudioSampleBuffer:sampleBuffer];
> 
> We tend to push audio samples on background threads only.
> If this method is only supposed to be called on the main thread, we could add an isMainThread ASSERT.

Sure.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:193
>> +    // FIXME: This is never called.
> 
> This is a confusing comment. Why is this never called? Since we say FIXME, what should someone be fixing?

Hmm, I left this comment to remind myself that this delegate callback never got called during my testing. Will remove.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:200
>> +    [self callbackWithResult:transcription isFinal:NO];
> 
> Is the delegate guaranteed to be called on the main thread?

Good question! Will check this in testing, along with all the other callbacks.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:214
>> +    // FIXME: This is never called.
> 
> Ditto.

Will remove.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:231
>> +    auto error = WebCore::SpeechRecognitionError { WebCore::SpeechRecognitionErrorType::Aborted, [task.error.localizedDescription UTF8String] };
> 
> There’s no reason to call UTF8String here. We’re going to store this in a WTF::String, and it can do that directly from an NSString. Calling UTF8String makes the operation less efficient, and I don’t think it provides any value.

Will remove.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:233
>> +    _delegateCallback(update);
> 
> I think we could collapse these into a single line.

Okay.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:239
>> +@interface WebSpeechRecognizerTaskMock : WebSpeechRecognizerTask
> 
> I would tend to move WebSpeechRecognizerTaskMock to its own file.

Will move.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:285
>> +    [m_task.get() audioSamplesAvailable:buffer.get()];
> 
> The first "get()" here isn’t needed.

Will remove.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:290
>> +    auto taskClass = mockCaptureDevicesEnabled ? [WebSpeechRecognizerTaskMock class] : [WebSpeechRecognizerTaskMock class];
> 
> In both cases we are creating a WebSpeechRecognizerTaskMock; the second one should probably be WebSpeechRecognizerTask.

Oops!

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:304
>> +    [m_task.get() stop];
> 
> I don’t think this get() is needed.

Will remove.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:310
>> +    [m_task.get() abort];
> 
> Ditto.

Will remove.

>> Source/WebCore/Modules/speech/SpeechRecognizerCocoa.mm:319
>> +    [task.get() abort];
> 
> Ditto.

Will remove.

>> Source/WebKit/UIProcess/WebProcessProxy.cpp:1728
>> +        return true;
> 
> I would have written this:
> 
>     return weakPage && weakPage->preferences().mockCaptureDevicesEnabled();

Nice.
Comment 12 Darin Adler 2020-12-09 12:20:50 PST
(In reply to Sihui Liu from comment #11)
> It may affect launch time, but the framework seems too small to cause a
> noticeable change.

Please double-check this with the people who worked so hard to make web processes launch quickly, so we can measure after the fact and be sure.
Comment 13 Sihui Liu 2020-12-10 02:18:29 PST
Created attachment 415848 [details]
Patch
Comment 14 Sihui Liu 2020-12-10 08:13:16 PST
Created attachment 415877 [details]
Patch
Comment 15 Sihui Liu 2020-12-10 08:20:08 PST
Created attachment 415878 [details]
Patch
Comment 16 Sihui Liu 2020-12-10 09:05:11 PST
Created attachment 415881 [details]
Patch
Comment 17 Sihui Liu 2020-12-10 11:05:55 PST
Created attachment 415903 [details]
Patch
Comment 18 Sihui Liu 2020-12-10 14:57:29 PST
Created attachment 415936 [details]
Patch
Comment 19 youenn fablet 2020-12-11 01:09:09 PST
Comment on attachment 415936 [details]
Patch

This looks almost ready to me.
It would be good to get some more tests.

View in context: https://bugs.webkit.org/attachment.cgi?id=415936&action=review

> Source/WebCore/ChangeLog:11
> +        Manually tested in MiniBrowser.

If we beef up the mock implementation to return some fake recognized words, we could probably add some tests.

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:69
> +void SpeechRecognizer::start(SpeechRecognitionConnectionClientIdentifier clientIdentifier, Ref<RealtimeMediaSource>&& source, bool mockCaptureDevicesEnabled, const String& localeIdentifier, bool continuous, bool interimResults, uint64_t maxAlternatives)

s/mockCaptureDevicesEnabled/mockSpeechRecognitionEnabled/

> Source/WebCore/Modules/speech/SpeechRecognizer.cpp:77
> +        auto update = WebCore::SpeechRecognitionUpdate::createError(clientIdentifier, error);

Use WTFMove() here, or inline the error creation on this line.
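For illustration, the WTFMove() variant:

    auto update = WebCore::SpeechRecognitionUpdate::createError(clientIdentifier, WTFMove(error));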

> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:45
> +bool SpeechRecognizer::startRecognition(bool mockCaptureDevicesEnabled, SpeechRecognitionConnectionClientIdentifier identifier, const String& localeIdentifier, bool continuous, bool interimResults, uint64_t alternatives)

s/mockCaptureDevicesEnabled/mockSpeechDevicesEnabled/

> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:70
> +void SpeechRecognizer::resetRecognition()

Should we merge the abort and reset methods?
If we abort, do we want m_task to stay non-null?

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:119
> +    Vector<WebCore::SpeechRecognitionResultData> datas;

No need for WebCore:: prefix in this file

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:120
> +    datas.reserveInitialCapacity(_maxAlternatives <= segments.count ? _maxAlternatives : segments.count);

This should probably be: datas.reserveInitialCapacity(segments.count);

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:127
> +            if (data.alternatives.size() == _maxAlternatives)

This check is for data.alternatives, so you could potentially also do reserveInitialCapacity(_maxAlternatives <= segments.alternativeSubstrings.count...) on it.

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:132
> +        datas.append(WTFMove(data));

Could do uncheckedAppend here.
Or you could write it as: datas.append(SpeechRecognitionResultData { isFinal, WTFMove(alternatives) });

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:134
> +    _delegateCallback(WebCore::SpeechRecognitionUpdate::createResult(_identifier, datas));

Might need a WTFMove here.

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:215
> +    auto update = WebCore::SpeechRecognitionUpdate::createError(_identifier, error);

WTFMove()

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:253
> +    if (successfully) {

We usually do if (!successfully)

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:258
> +    auto error = WebCore::SpeechRecognitionError { WebCore::SpeechRecognitionErrorType::Aborted, task.error.localizedDescription};

s/}/ }/

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:259
> +    _delegateCallback(WebCore::SpeechRecognitionUpdate::createError(_identifier, error));

WTFMove

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTaskMock.mm:50
> +    }

Do we not want to also exercise the WebCore::SpeechRecognitionUpdate::createResult code path?
Can we do something like adding a timer that outputs some bip bop words regularly, or as a one-shot?
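A hypothetical sketch of such a mock, reusing names visible elsewhere in the patch (the method shape, field names, and "bip bop" payload are illustrative, not the actual implementation):

    - (void)audioSamplesAvailable:(CMSampleBufferRef)sampleBuffer
    {
        UNUSED_PARAM(sampleBuffer);
        // Fabricate one final result per received buffer so the createResult path is exercised.
        WebCore::SpeechRecognitionResultData data;
        data.isFinal = true;
        data.alternatives.append(WebCore::SpeechRecognitionAlternativeData { "bip bop"_s, 1.0 });
        Vector<WebCore::SpeechRecognitionResultData> datas;
        datas.append(WTFMove(data));
        _delegateCallback(WebCore::SpeechRecognitionUpdate::createResult(_identifier, WTFMove(datas)));
    }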

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:48
> +    , m_checkIfMockCaptureDevicesEnabled(WTFMove(checkIfEnabled))

I would rename it to checkIfMockSpeechEnabled.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:107
> +            m_recognizer->reset();

Are we using reset elsewhere?
If not, can we implement reset so that it does not send any update?
That way, we could remove m_isResetting.

> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:116
> +        m_recognizer->reset();

If we call reset, isn't there a chance that the callback above will send an update message when it should not?
reset no longer explicitly sends a message, so maybe we no longer need m_isResetting.

> Source/WebKit/UIProcess/WebProcessProxy.cpp:1729
> +        return !weakPage || weakPage->preferences().mockCaptureDevicesEnabled();

In theory it should be return weakPage && weakPage->preferences().mockCaptureDevicesEnabled();
I would name it checkIfMockSpeechDevicesEnabled.
Comment 20 Sihui Liu 2020-12-11 11:26:40 PST
Created attachment 416017 [details]
Patch
Comment 21 Sihui Liu 2020-12-11 12:07:16 PST
Created attachment 416028 [details]
Patch
Comment 22 Sihui Liu 2020-12-11 12:11:19 PST
Comment on attachment 415936 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=415936&action=review

>> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:70
>> +void SpeechRecognizer::resetRecognition()
> 
> Should we merge the abort and reset methods?
> If we abort, do we want m_task to stay non-null?

abort() means abort the task but keep listening for async updates (error or end) and wait for the task to complete naturally.
reset() means abort the task and stop listening for any async updates by setting the task to null.
This is a bit confusing and could be simplified.
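An illustrative sketch of the distinction (method names from the discussion; bodies reconstructed from the description above, not the actual code):

    void SpeechRecognizer::abort()
    {
        // Abort the task but keep it, so the final error/end updates still arrive.
        [m_task abort];
    }

    void SpeechRecognizer::reset()
    {
        // Abort the task and drop it, so no further async updates are delivered.
        [m_task abort];
        m_task = nullptr;
    }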

>> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:119
>> +    Vector<WebCore::SpeechRecognitionResultData> datas;
> 
> No need for WebCore:: prefix in this file

Added "using namespace WebCore".

>> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTaskMock.mm:50
>> +    }
> 
> Do we not want to also exercise the WebCore::SpeechRecognitionUpdate::createResult code path?
> Can we do something like adding a timer that outputs some bip bop words regularly, or as a one-shot?

Added one fake final result for each received buffer.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:107
>> +            m_recognizer->reset();
> 
> Are we using reset elsewhere?
> If not, can we implement reset so that it does not send any update?
> That way, we could remove m_isResetting.

reset() is called after an error/end update from the recognizer delegate callback (from audio capture/recognition), or after an error update when the server starts another request.

>> Source/WebKit/UIProcess/SpeechRecognitionServer.cpp:116
>> +        m_recognizer->reset();
> 
> If we call reset, isn't there a chance that the callback above will send an update message when it should not?
> reset no longer explicitly sends a message, so maybe we no longer need m_isResetting.

Yes; basically reset() means we stop waiting for async updates but still allow sending synchronous update messages.
So this event ordering is possible for a recognition: Error -> AudioEnd -> SpeechEnd -> End*
Comment 23 youenn fablet 2020-12-14 01:09:42 PST
Comment on attachment 416028 [details]
Patch

LGTM.
As a side question, do we have enough existing tests for abort and stop cases?

View in context: https://bugs.webkit.org/attachment.cgi?id=416028&action=review

> Source/WebCore/Modules/speech/cocoa/SpeechRecognizerCocoa.mm:48
> +    m_task = adoptNS([[taskClass alloc] initWithIdentifier:identifier locale:localeIdentifier doMultipleRecognitions:continuous reportInterimResults:interimResults maxAlternatives:alternatives delegateCallback:[this, weakThis = makeWeakPtr(this)](const SpeechRecognitionUpdate& update) {

auto& update

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:38
> +static constexpr size_t maximumRecognitionDuration = 60 * 60;

We should probably think about our strategy for a page that is doing speech recognition and goes into the background on iOS.
It seems to me it might be best to mute the audio capture or maybe stop the recognition.

There is also the case of a page that is doing recognition and another page starts audio capture.
Except on iPadOS, the above strategy would also handle this case.

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:85
> +    _maxAlternatives = alternatives;

Can be done as a one-liner:
_maxAlternatives = alternatives ? alternatives : 1;

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:136
> +    _delegateCallback(SpeechRecognitionUpdate::createResult(_identifier, datas));

WTFMove(datas)

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:214
> +        return;

Can be combined with the if above.

> Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTaskMock.mm:64
> +    _delegateCallback(SpeechRecognitionUpdate::createResult(_identifier, datas));

WTFMove()
Comment 24 Sihui Liu 2020-12-14 08:46:23 PST
Created attachment 416163 [details]
Patch
Comment 25 Sihui Liu 2020-12-14 09:30:00 PST
(In reply to youenn fablet from comment #23)
> Comment on attachment 416028 [details]
> Patch
> 
> LGTM.
> As a side question, do we have enough existing tests for abort and stop
> cases?

Added another test for restarting the same recognition after stop.
The two added tests should be the most basic cases for stop().
abort() is no different from stop() in our current test setup. We could probably make WebSpeechRecognizerTaskMock behave more like SFSpeechRecognizer, e.g. by calling the callback asynchronously.

> > Source/WebCore/Modules/speech/cocoa/WebSpeechRecognizerTask.mm:38
> > +static constexpr size_t maximumRecognitionDuration = 60 * 60;
> 
> We should probably think about our strategy for a page that is doing speech
> recognition and goes into the background on iOS.
> It seems to me it might be best to mute the audio capture or maybe stop the
> recognition.
> 
> There is also the case of a page that is doing recognition and another page
> starts audio capture.
> Except on iPadOS, the above strategy would also handle this case.

Ah, good point. We should abort recognition when the page becomes invisible (its process is backgrounded) or when audio capture is about to start in a different web page (or in a different process; it seems CoreAudioCaptureSource::startProducingData() can set the active source and mute inactive ones within the same process).

Comment 26 EWS 2020-12-14 09:48:12 PST
Committed r270772: <https://trac.webkit.org/changeset/270772>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 416163 [details].