Bug 195645 - [macOS] Broker access to Speech Synthesis
Summary: [macOS] Broker access to Speech Synthesis
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc.
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Per Arne Vollan
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-03-12 15:21 PDT by Per Arne Vollan
Modified: 2019-03-15 11:18 PDT
CC List: 6 users

See Also:


Attachments
Patch (52.20 KB, patch)
2019-03-12 15:53 PDT, Per Arne Vollan
Patch (52.46 KB, patch)
2019-03-12 16:32 PDT, Per Arne Vollan
Patch (55.24 KB, patch)
2019-03-12 16:50 PDT, Per Arne Vollan
Patch (55.92 KB, patch)
2019-03-12 17:14 PDT, Per Arne Vollan
Archive of layout-test-results from ews106 for mac-highsierra-wk2 (3.00 MB, application/zip)
2019-03-12 18:43 PDT, EWS Watchlist
Patch (56.09 KB, patch)
2019-03-12 20:00 PDT, Per Arne Vollan
Patch (58.22 KB, patch)
2019-03-13 19:15 PDT, Per Arne Vollan
Patch (58.75 KB, patch)
2019-03-13 20:46 PDT, Per Arne Vollan
Patch (59.01 KB, patch)
2019-03-14 16:54 PDT, Per Arne Vollan

Description Per Arne Vollan 2019-03-12 15:21:58 PDT
Speech synthesis should be performed in the UI process.
Comment 1 Per Arne Vollan 2019-03-12 15:22:20 PDT
rdar://problem/35369026
Comment 2 Per Arne Vollan 2019-03-12 15:53:22 PDT
Created attachment 364464 [details]
Patch
Comment 3 Per Arne Vollan 2019-03-12 16:32:40 PDT
Created attachment 364469 [details]
Patch
Comment 4 Per Arne Vollan 2019-03-12 16:50:33 PDT
Created attachment 364472 [details]
Patch
Comment 5 Per Arne Vollan 2019-03-12 17:14:41 PDT
Created attachment 364479 [details]
Patch
Comment 6 EWS Watchlist 2019-03-12 18:43:34 PDT
Comment on attachment 364479 [details]
Patch

Attachment 364479 [details] did not pass mac-wk2-ews (mac-wk2):
Output: https://webkit-queues.webkit.org/results/11483397

New failing tests:
fast/speechsynthesis/speech-synthesis-elapsed-time.html
fast/speechsynthesis/speech-synthesis-boundary-events.html
fast/speechsynthesis/speech-synthesis-gc-utterance-crash.html
fast/speechsynthesis/speech-synthesis-voices.html
fast/speechsynthesis/speech-synthesis-cancel-crash.html
fast/speechsynthesis/speech-synthesis-speak.html
fast/speechsynthesis/speech-synthesis-pause-resume.html
Comment 7 EWS Watchlist 2019-03-12 18:43:35 PDT
Created attachment 364492 [details]
Archive of layout-test-results from ews106 for mac-highsierra-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-highsierra-wk2  Platform: Mac OS X 10.13.6
Comment 8 Per Arne Vollan 2019-03-12 20:00:11 PDT
Created attachment 364499 [details]
Patch
Comment 9 Brent Fulgham 2019-03-12 20:40:25 PDT
Comment on attachment 364499 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364499&action=review

Looks great! I think we could remove some of the boilerplate message passing code by using the “sendWithAsyncReply” features of the IPC layer.

> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:125
> +        m_speechSynthesisClient->speak(*utterance.platformUtterance());

It seems like both of these ‘speak’ implementations should take an object (not a pointer), or both should take a pointer, but not this mix.
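For instance (a sketch only; these signatures are illustrative, not the patch’s):

    // Take the utterance by reference in both places:
    void SpeechSynthesisClient::speak(PlatformSpeechSynthesisUtterance&);
    void PlatformSpeechSynthesizer::speak(PlatformSpeechSynthesisUtterance&);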

> Source/WebCore/Modules/speech/SpeechSynthesis.h:105
> +    SpeechSynthesisClient* m_speechSynthesisClient { nullptr };

This should either be a WeakPtr or a RefPtr. We should always be leery of bare pointers.
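Something along these lines (sketch; assumes SpeechSynthesisClient supports CanMakeWeakPtr):

    // Hold the client weakly rather than as a bare pointer.
    WeakPtr<SpeechSynthesisClient> m_speechSynthesisClient;

    // Call sites then null-check before dereferencing:
    if (m_speechSynthesisClient)
        m_speechSynthesisClient->speak(*utterance.platformUtterance());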

> Source/WebCore/platform/PlatformSpeechSynthesizer.h:43
> +enum class SpeechBoundary : uint8_t {

Good!

> Source/WebKit/UIProcess/WebPageProxy.cpp:8808
> +void WebPageProxy::speechSynthesisVoiceList(Vector<WebSpeechSynthesisVoice>& result)

If these were implemented for sendWithAsyncReply, this would take a completion handler you could call when done to get the response back to the WebProcess.

> Source/WebKit/UIProcess/WebPageProxy.cpp:8811
> +        m_synthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);

Maybe this should be a helper “ensureSynthesizer()” that returns a PlatformSpeechSynthesizer& that you can use elsewhere.
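E.g. (sketch; the name is just a suggestion):

    PlatformSpeechSynthesizer& WebPageProxy::ensureSynthesizer()
    {
        if (!m_synthesizer)
            m_synthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);
        return *m_synthesizer;
    }

Call sites then become ensureSynthesizer().speak(m_utterance.get()), etc.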

> Source/WebKit/UIProcess/WebPageProxy.cpp:8832
> +    m_synthesizer->speak(m_utterance.get());

You could pass the CompletionHandler to PlatformSpeechSynthesizer::speak

> Source/WebKit/UIProcess/WebPageProxy.cpp:8837
> +    m_synthesizer->cancel();

It seems like m_synthesizer can be nullptr in a lot of cases. Should these null check? Or at least assert?
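E.g. (sketch):

    ASSERT(m_synthesizer);
    if (!m_synthesizer)
        return;
    m_synthesizer->cancel();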

> Source/WebKit/UIProcess/WebPageProxy.cpp:8842
> +    m_synthesizer->pause();

Ditto ->pause(WTFMove(completionHandler))

> Source/WebKit/UIProcess/WebPageProxy.messages.in:555
> +    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync

I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList)
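i.e. something like this in the messages.in file (sketch):

    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) Async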

> Source/WebKit/UIProcess/WebPageProxy.messages.in:556
> +    SpeechSynthesisSpeak(String text, String lang, float volume, float rate, float pitch, MonotonicTime startTime, String voiceURI, String voiceName, String voiceLang, bool localService, bool defaultVoice)

If these were Async() you could pass a Lambda on the calling side and it would automatically be called...

> Source/WebKit/WebProcess/com.apple.WebProcess.sb.in:-606
> -       (global-name "com.apple.speech.speechsynthesisd")

Hooray!

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:43
> +        voices.append(WebCore::PlatformSpeechSynthesisVoice::create(voice.voiceURI, voice.name, voice.lang, voice.localService, voice.defaultLang));

Might be nicer to WTFMove() the parts of voice into the new object, since we discard it all when we are done.
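i.e. something like (sketch; this only actually avoids copies if create() takes its String parameters in a movable form):

    for (auto& voice : voiceList)
        voices.append(WebCore::PlatformSpeechSynthesisVoice::create(WTFMove(voice.voiceURI), WTFMove(voice.name), WTFMove(voice.lang), voice.localService, voice.defaultLang));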

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:52
> +    m_page.send(Messages::WebPageProxy::SpeechSynthesisSpeak(utterance.text(), utterance.lang(), utterance.volume(), utterance.rate(), utterance.pitch(), utterance.startTime(), voice->voiceURI(), voice->name(), voice->lang(), voice->localService(), voice->isDefault()));

If you used sendWithAsyncReply you could pass a Lambda to be called when the speak command completed and avoid writing a “didSpeak” response handler.
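Sketch of the shape I mean (the lambda body is illustrative, and real code would need lifetime protection for the capture):

    m_page.sendWithAsyncReply(Messages::WebPageProxy::SpeechSynthesisSpeak(utterance.text(), utterance.lang(), utterance.volume(), utterance.rate(), utterance.pitch(), utterance.startTime(), voice->voiceURI(), voice->name(), voice->lang(), voice->localService(), voice->isDefault()), [this] {
        // The reply arrives when the UI process finishes speaking.
        m_page.corePage()->speechSynthesisClient()->observer()->didFinishSpeaking();
    });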

> Source/WebKit/WebProcess/WebPage/WebPage.cpp:6406
> +    corePage()->speechSynthesisClient()->observer()->didStartSpeaking();

These could be done as Lambdas in the speechSynthesisClient if you used sendWithAsyncReply on the sending side.

> Source/WebKit/WebProcess/WebPage/WebPage.messages.in:554
> +    DidResumeSpeaking()

I don’t think you need these ‘DidXYZ’ implementations if they are the return call of a sendWithAsyncReply.
Comment 10 Sam Weinig 2019-03-12 22:16:57 PDT
Comment on attachment 364499 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364499&action=review

>> Source/WebKit/UIProcess/WebPageProxy.messages.in:555
>> +    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync
> 
> I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList)

Adding more LegacySync messages is really not ideal. Have you considered alternatives? Like sending the voice list over on process initialization?
Comment 11 Per Arne Vollan 2019-03-13 10:12:45 PDT
(In reply to Sam Weinig from comment #10)
> Comment on attachment 364499 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=364499&action=review
> 
> >> Source/WebKit/UIProcess/WebPageProxy.messages.in:555
> >> +    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync
> > 
> > I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList)
> 
> Adding more LegacySync messages is really not ideal. Have you considered
> alternatives? Like sending the voice list over on process initialization?

This is definitely a good idea, although I am a little concerned about the performance of process startup when doing this.
Comment 12 Per Arne Vollan 2019-03-13 16:27:00 PDT
(In reply to Per Arne Vollan from comment #11)
> (In reply to Sam Weinig from comment #10)
> > Comment on attachment 364499 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=364499&action=review
> > 
> > >> Source/WebKit/UIProcess/WebPageProxy.messages.in:555
> > >> +    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync
> > > 
> > > I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList)
> > 
> > Adding more LegacySync messages is really not ideal. Have you considered
> > alternatives? Like sending the voice list over on process initialization?
> 
> This is definitely a good idea, although I am a little concerned about the
> performance of process startup when doing this.

It takes about 150ms to get the voice list the first time, so I don't think it would be good to add this to the startup time of the UI process.
Comment 13 Per Arne Vollan 2019-03-13 19:15:10 PDT
Created attachment 364608 [details]
Patch
Comment 14 Per Arne Vollan 2019-03-13 19:21:27 PDT
(In reply to Brent Fulgham from comment #9)
> Comment on attachment 364499 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=364499&action=review
> 
> Looks great! I think we could remove some of the boilerplate message passing
> code by using the “sendWithAsyncReply” features of the IPC layer.
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:125
> > +        m_speechSynthesisClient->speak(*utterance.platformUtterance());
> 
> It seems like both of these ‘speak’ implementations should take an object
> (not a pointer), or both should take a pointer, but not this mix.
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.h:105
> > +    SpeechSynthesisClient* m_speechSynthesisClient { nullptr };
> 
> This should either be a WeakPtr or a RefPtr. We should always be leery of
> bare pointers.
> 
> > Source/WebCore/platform/PlatformSpeechSynthesizer.h:43
> > +enum class SpeechBoundary : uint8_t {
> 
> Good!
> 
> > Source/WebKit/UIProcess/WebPageProxy.cpp:8808
> > +void WebPageProxy::speechSynthesisVoiceList(Vector<WebSpeechSynthesisVoice>& result)
> 
> If these were implemented for sendWithAsyncReply, this would take a
> completion handler you could call when done to get the response back to the
> WebProcess.
> 
> > Source/WebKit/UIProcess/WebPageProxy.cpp:8811
> > +        m_synthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);
> 
> Maybe this should be a helper “ensureSynthesizer()” that returns a
> PlatformSpeechSynthesizer& that you can use elsewhere.
> 
> > Source/WebKit/UIProcess/WebPageProxy.cpp:8832
> > +    m_synthesizer->speak(m_utterance.get());
> 
> You could pass the CompletionHandler to PlatformSpeechSynthesizer::speak
> 
> > Source/WebKit/UIProcess/WebPageProxy.cpp:8837
> > +    m_synthesizer->cancel();
> 
> It seems like m_synthesizer can be nullptr in a lot of cases. Should these
> null check? Or at least assert?
> 
> > Source/WebKit/UIProcess/WebPageProxy.cpp:8842
> > +    m_synthesizer->pause();
> 
> Ditto ->pause(WTFMove(completionHandler))
> 
> > Source/WebKit/UIProcess/WebPageProxy.messages.in:555
> > +    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync
> 
> I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice>
> voiceList)
> 

I created https://bugs.webkit.org/show_bug.cgi?id=195723 to follow up on this.


> > Source/WebKit/UIProcess/WebPageProxy.messages.in:556
> > +    SpeechSynthesisSpeak(String text, String lang, float volume, float rate, float pitch, MonotonicTime startTime, String voiceURI, String voiceName, String voiceLang, bool localService, bool defaultVoice)
> 
> If these were Async() you could pass a Lambda on the calling side and it
> would automatically be called...
> 
> > Source/WebKit/WebProcess/com.apple.WebProcess.sb.in:-606
> > -       (global-name "com.apple.speech.speechsynthesisd")
> 
> Hooray!
> 
> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:43
> > +        voices.append(WebCore::PlatformSpeechSynthesisVoice::create(voice.voiceURI, voice.name, voice.lang, voice.localService, voice.defaultLang));
> 
> Might be nicer to WTFMove() the parts of voice into the new object, since
> we discard it all when we are done.
> 
> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:52
> > +    m_page.send(Messages::WebPageProxy::SpeechSynthesisSpeak(utterance.text(), utterance.lang(), utterance.volume(), utterance.rate(), utterance.pitch(), utterance.startTime(), voice->voiceURI(), voice->name(), voice->lang(), voice->localService(), voice->isDefault()));
> 
> If you used sendWithAsyncReply you could pass a Lambda to be called when the
> speak command completed and avoid writing a “didSpeak” response handler.
> 
> > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6406
> > +    corePage()->speechSynthesisClient()->observer()->didStartSpeaking();
> 
> These could be done as Lambdas in the speechSynthesisClient if you used
> sendWithAsyncReply on the sending side.
> 
> > Source/WebKit/WebProcess/WebPage/WebPage.messages.in:554
> > +    DidResumeSpeaking()
> 
> I don’t think you need these ‘DidXYZ’ implementations if they are the return
> call of a sendWithAsyncReply.

Thanks for reviewing! I have updated the patch.
Comment 15 Per Arne Vollan 2019-03-13 20:46:11 PDT
Created attachment 364622 [details]
Patch
Comment 16 Brent Fulgham 2019-03-13 22:48:03 PDT
Comment on attachment 364622 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364622&action=review

I think this is close, but we could do better by removing more boilerplate method implementations. This might mean touching other platform-specific files to thread the completion handler through.

> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:75
>      if (m_voiceList.size())

Nit: I really prefer !m_voiceList.empty() (or isEmpty?)

> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:84
> +            m_platformSpeechSynthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);

You repeat this pattern in a few places. I suggest making a “ensureSpeechSynthesizer” method that does this and returns a reference.

> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:88
> +            m_voiceList.append(SpeechSynthesisVoice::create(*voice));

It bugs me that these two m_voiceList loops exist! Can they be shared somehow? Seems like we could just use a reference to the voiceList from either source.

> Source/WebCore/Modules/speech/SpeechSynthesis.h:45
> +    static Ref<SpeechSynthesis> create(WeakPtr<SpeechSynthesisClient>);

Do we ever create a SpeechSynthesis object without a Client? I think we could pass a reference, and do the makeWeakPtr inside the constructor.
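For example (sketch, assuming a client is always present at creation):

    static Ref<SpeechSynthesis> create(SpeechSynthesisClient& client)
    {
        return adoptRef(*new SpeechSynthesis(client));
    }

    SpeechSynthesis::SpeechSynthesis(SpeechSynthesisClient& client)
        : m_speechSynthesisClient(makeWeakPtr(client))
    {
    }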

> Source/WebKit/UIProcess/WebPageProxy.cpp:8839
> +    m_speechSynthesisData->speakingFinishedCompletionHandler = WTFMove(completionHandler);

It seems like we should be able to pass the completion handler to “speak”, below. The implementation of speak would take a completion handler as the final argument, and call it when complete, rather than calling “didFinishSpeaking”. (Ditto for the other methods.)

It might not be necessary to have the SpeechSynthesisData object at all, and just thread the state through the call stack.

This also seems to be the only place where the Utterance is used. I wonder if we need to remember it elsewhere. Can we just WTFMove it into the “speak” method?
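The shape I’m imagining (declaration sketch only):

    // PlatformSpeechSynthesizer::speak() grows a CompletionHandler argument and
    // invokes it when speaking finishes, instead of calling back through
    // didFinishSpeaking():
    void speak(RefPtr<PlatformSpeechSynthesisUtterance>&&, CompletionHandler<void()>&&);

Then speechSynthesisSpeak() could WTFMove() the handler (and the utterance) straight into speak(), and the stored SpeechSynthesisData members might go away entirely.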

> Source/WebKit/UIProcess/WebPageProxy.h:2465
> +        CompletionHandler<void()> speakingResumedCompletionHandler;

You might not need to store these three handlers, and perhaps not the utterance.

> Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:243
> +void WebPageProxy::didStartSpeaking(WebCore::PlatformSpeechSynthesisUtterance&)

I can't tell where these get called; I guess the SpeechSynthesis object calls them at some point?

I think these could be discarded if you just passed the completion handler into “speak”, “cancel”, etc., and invoked the handler instead of calling this method.

> Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:248
> +{

Do we need to remember the utterance at all, or even pass the argument? We never seem to use it...

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:75
> +        m_page.corePage()->speechSynthesisClient()->observer()->didPauseSpeaking();

This looks like weird spacing (missing indent)
Comment 17 Per Arne Vollan 2019-03-14 16:02:08 PDT
(In reply to Brent Fulgham from comment #16)
> Comment on attachment 364622 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=364622&action=review
> 
> I think this is close, but we could do better by removing more boilerplate
> method implementations. This might mean touching other platform-specific
> files to thread the completion handler through.
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:75
> >      if (m_voiceList.size())
> 
> Nit: I really prefer !m_voiceList.empty() (or isEmpty?)
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:84
> > +            m_platformSpeechSynthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);
> 
> You repeat this pattern in a few places. I suggest making a
> “ensureSpeechSynthesizer” method that does this and returns a reference.
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:88
> > +            m_voiceList.append(SpeechSynthesisVoice::create(*voice));
> 
> It bugs me that these two m_voiceList loops exist! Can they be shared
> somehow? Seems like we could just use a reference to the voiceList from
> either source.
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.h:45
> > +    static Ref<SpeechSynthesis> create(WeakPtr<SpeechSynthesisClient>);
> 
> Do we ever create a SpeechSynthesis object without a Client? I think we
> could pass a reference, and do the makeWeakPtr inside the constructor.
> 

Yes, for WebKitLegacy there will be no client.

> > Source/WebKit/UIProcess/WebPageProxy.cpp:8839
> > +    m_speechSynthesisData->speakingFinishedCompletionHandler = WTFMove(completionHandler);
> 
> It seems like we should be able to pass the completion handler to “speak”,
> below. The implementation of speak would take a completion handler as the
> final argument, and call it when complete, rather than calling
> “didFinishSpeaking”. (Ditto for the other methods.)
> 

I think that's an excellent approach. Since this means changing the existing PlatformSpeechSynthesizer, which will make the scope of the patch bigger, do you think we could do this in a follow-up patch?

> It might not be necessary to have the SpeechSynthesisData object at all, and
> just thread the state through the call stack.
> 
> This also seems to be the only place where the Utterance is used. I wonder
> if we need to remember it elsewhere. Can we just WTFMove it into the “speak”
> method?
> 
> > Source/WebKit/UIProcess/WebPageProxy.h:2465
> > +        CompletionHandler<void()> speakingResumedCompletionHandler;
> 
> You might not need to store these three handlers, and perhaps not the
> utterance.
> 
> > Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:243
> > +void WebPageProxy::didStartSpeaking(WebCore::PlatformSpeechSynthesisUtterance&)
> 
> I can't tell where these get called; I guess the SpeechSynthesis object calls
> them at some point?
> 
> I think these could be discarded if you just passed the completion handler
> into “speak”, “cancel”, etc., and invoked the handler instead of calling
> this method.
> 
> > Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:248
> > +{
> 
> Do we need to remember the utterance at all, or even pass the argument? We
> never seem to use it...
> 
> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:75
> > +        m_page.corePage()->speechSynthesisClient()->observer()->didPauseSpeaking();
> 
> This looks like weird spacing (missing indent)

Thanks for reviewing! I will update the patch.
Comment 18 Per Arne Vollan 2019-03-14 16:54:27 PDT
Created attachment 364719 [details]
Patch
Comment 19 Brent Fulgham 2019-03-14 18:33:50 PDT
Comment on attachment 364622 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364622&action=review

>>> Source/WebKit/UIProcess/WebPageProxy.cpp:8839
>>> +    m_speechSynthesisData->speakingFinishedCompletionHandler = WTFMove(completionHandler);
>> 
>> It seems like we should be able to pass the completion handler to “speak”, below. The implementation of speak would take a completion handler as the final argument, and call it when complete, rather than calling “didFinishSpeaking”. (Ditto for the other methods.)
>> 
>> It might not be necessary to have the SpeechSynthesisData object at all, and just thread the state through the call stack.
>> 
>> This also seems to be the only place where the Utterance is used. I wonder if we need to remember it elsewhere. Can we just WTFMove it into the “speak” method?
> 
> I think that's an excellent approach. Since this means changing the existing PlatformSpeechSynthesizer, which will make the scope of the patch bigger, do you think we could do this in a follow-up patch?

Sure!
Comment 20 Brent Fulgham 2019-03-15 10:39:18 PDT
Comment on attachment 364719 [details]
Patch

r=me.
Comment 21 Per Arne Vollan 2019-03-15 10:52:39 PDT
Comment on attachment 364719 [details]
Patch

Thanks for reviewing!
Comment 22 WebKit Commit Bot 2019-03-15 11:18:42 PDT
Comment on attachment 364719 [details]
Patch

Clearing flags on attachment: 364719

Committed r243002: <https://trac.webkit.org/changeset/243002>
Comment 23 WebKit Commit Bot 2019-03-15 11:18:45 PDT
All reviewed patches have been landed.  Closing bug.