Bug 195645 - [macOS] Broker access to Speech Synthesis
Summary: [macOS] Broker access to Speech Synthesis
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc.
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Per Arne Vollan
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-03-12 15:21 PDT by Per Arne Vollan
Modified: 2019-03-15 11:18 PDT
CC List: 6 users

See Also:


Attachments
Patch (52.20 KB, patch)
2019-03-12 15:53 PDT, Per Arne Vollan
Patch (52.46 KB, patch)
2019-03-12 16:32 PDT, Per Arne Vollan
Patch (55.24 KB, patch)
2019-03-12 16:50 PDT, Per Arne Vollan
Patch (55.92 KB, patch)
2019-03-12 17:14 PDT, Per Arne Vollan
Archive of layout-test-results from ews106 for mac-highsierra-wk2 (3.00 MB, application/zip)
2019-03-12 18:43 PDT, EWS Watchlist
Patch (56.09 KB, patch)
2019-03-12 20:00 PDT, Per Arne Vollan
Patch (58.22 KB, patch)
2019-03-13 19:15 PDT, Per Arne Vollan
Patch (58.75 KB, patch)
2019-03-13 20:46 PDT, Per Arne Vollan
Patch (59.01 KB, patch)
2019-03-14 16:54 PDT, Per Arne Vollan

Description Per Arne Vollan 2019-03-12 15:21:58 PDT
Speech synthesis should be performed in the UI process.
Comment 1 Per Arne Vollan 2019-03-12 15:22:20 PDT
rdar://problem/35369026
Comment 2 Per Arne Vollan 2019-03-12 15:53:22 PDT
Created attachment 364464 [details]
Patch
Comment 3 Per Arne Vollan 2019-03-12 16:32:40 PDT
Created attachment 364469 [details]
Patch
Comment 4 Per Arne Vollan 2019-03-12 16:50:33 PDT
Created attachment 364472 [details]
Patch
Comment 5 Per Arne Vollan 2019-03-12 17:14:41 PDT
Created attachment 364479 [details]
Patch
Comment 6 EWS Watchlist 2019-03-12 18:43:34 PDT
Comment on attachment 364479 [details]
Patch

Attachment 364479 [details] did not pass mac-wk2-ews (mac-wk2):
Output: https://webkit-queues.webkit.org/results/11483397

New failing tests:
fast/speechsynthesis/speech-synthesis-elapsed-time.html
fast/speechsynthesis/speech-synthesis-boundary-events.html
fast/speechsynthesis/speech-synthesis-gc-utterance-crash.html
fast/speechsynthesis/speech-synthesis-voices.html
fast/speechsynthesis/speech-synthesis-cancel-crash.html
fast/speechsynthesis/speech-synthesis-speak.html
fast/speechsynthesis/speech-synthesis-pause-resume.html
Comment 7 EWS Watchlist 2019-03-12 18:43:35 PDT
Created attachment 364492 [details]
Archive of layout-test-results from ews106 for mac-highsierra-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-highsierra-wk2  Platform: Mac OS X 10.13.6
Comment 8 Per Arne Vollan 2019-03-12 20:00:11 PDT
Created attachment 364499 [details]
Patch
Comment 9 Brent Fulgham 2019-03-12 20:40:25 PDT
Comment on attachment 364499 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364499&action=review

Looks great! I think we could remove some of the boilerplate message passing code by using the “sendWithAsyncReply” features of the IPC layer.

> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:125
> +        m_speechSynthesisClient->speak(*utterance.platformUtterance());

It seems like both of these ‘speak’ implementations should take an object (not a pointer), or both should take a pointer, but not this mix.
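For instance (a sketch only; these signatures are illustrative, not the patch’s):

    // Take the utterance by reference in both places:
    void SpeechSynthesisClient::speak(PlatformSpeechSynthesisUtterance&);
    void PlatformSpeechSynthesizer::speak(PlatformSpeechSynthesisUtterance&);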

> Source/WebCore/Modules/speech/SpeechSynthesis.h:105
> +    SpeechSynthesisClient* m_speechSynthesisClient { nullptr };

This should either be a WeakPtr or a RefPtr. We should always be leery of bare pointers.
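Something along these lines (sketch; assumes SpeechSynthesisClient supports CanMakeWeakPtr):

    // Hold the client weakly rather than as a bare pointer.
    WeakPtr<SpeechSynthesisClient> m_speechSynthesisClient;

    // Call sites then null-check before dereferencing:
    if (m_speechSynthesisClient)
        m_speechSynthesisClient->speak(*utterance.platformUtterance());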

> Source/WebCore/platform/PlatformSpeechSynthesizer.h:43
> +enum class SpeechBoundary : uint8_t {

Good!

> Source/WebKit/UIProcess/WebPageProxy.cpp:8808
> +void WebPageProxy::speechSynthesisVoiceList(Vector<WebSpeechSynthesisVoice>& result)

If these were implemented for sendWithAsyncReply, this would take a completion handler you could call when done to get the response back to the WebProcess.

> Source/WebKit/UIProcess/WebPageProxy.cpp:8811
> +        m_synthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);

Maybe this should be a helper “ensureSynthesizer()” that returns a PlatformSpeechSynthesizer& that you can use elsewhere.
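E.g. (sketch; the name is just a suggestion):

    PlatformSpeechSynthesizer& WebPageProxy::ensureSynthesizer()
    {
        if (!m_synthesizer)
            m_synthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);
        return *m_synthesizer;
    }

Call sites then become ensureSynthesizer().speak(m_utterance.get()), etc.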

> Source/WebKit/UIProcess/WebPageProxy.cpp:8832
> +    m_synthesizer->speak(m_utterance.get());

You could pass the CompletionHandler to PlatformSpeechSynthesizer::speak

> Source/WebKit/UIProcess/WebPageProxy.cpp:8837
> +    m_synthesizer->cancel();

It seems like m_synthesizer can be nullptr in a lot of cases. Should these null check? Or at least assert?
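E.g. (sketch):

    ASSERT(m_synthesizer);
    if (!m_synthesizer)
        return;
    m_synthesizer->cancel();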

> Source/WebKit/UIProcess/WebPageProxy.cpp:8842
> +    m_synthesizer->pause();

Ditto ->pause(WTFMove(completionHandler))

> Source/WebKit/UIProcess/WebPageProxy.messages.in:555
> +    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync

I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList)
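i.e. something like this in the messages.in file (sketch):

    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) Async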

> Source/WebKit/UIProcess/WebPageProxy.messages.in:556
> +    SpeechSynthesisSpeak(String text, String lang, float volume, float rate, float pitch, MonotonicTime startTime, String voiceURI, String voiceName, String voiceLang, bool localService, bool defaultVoice)

If these were Async() you could pass a Lambda on the calling side and it would automatically be called...

> Source/WebKit/WebProcess/com.apple.WebProcess.sb.in:-606
> -       (global-name "com.apple.speech.speechsynthesisd")

Hooray!

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:43
> +        voices.append(WebCore::PlatformSpeechSynthesisVoice::create(voice.voiceURI, voice.name, voice.lang, voice.localService, voice.defaultLang));

Might be nicer to WTFMove() the parts of voice into the new object, since we discard it all when we are done.
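i.e. something like (sketch; this only actually avoids copies if create() takes its String parameters in a movable form):

    for (auto& voice : voiceList)
        voices.append(WebCore::PlatformSpeechSynthesisVoice::create(WTFMove(voice.voiceURI), WTFMove(voice.name), WTFMove(voice.lang), voice.localService, voice.defaultLang));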

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:52
> +    m_page.send(Messages::WebPageProxy::SpeechSynthesisSpeak(utterance.text(), utterance.lang(), utterance.volume(), utterance.rate(), utterance.pitch(), utterance.startTime(), voice->voiceURI(), voice->name(), voice->lang(), voice->localService(), voice->isDefault()));

If you used sendWithAsyncReply you could pass a Lambda to be called when the speak command completed and avoid writing a “didSpeak” response handler.
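Sketch of the shape I mean (the lambda body is illustrative, and real code would need lifetime protection for the capture):

    m_page.sendWithAsyncReply(Messages::WebPageProxy::SpeechSynthesisSpeak(utterance.text(), utterance.lang(), utterance.volume(), utterance.rate(), utterance.pitch(), utterance.startTime(), voice->voiceURI(), voice->name(), voice->lang(), voice->localService(), voice->isDefault()), [this] {
        // The reply arrives when the UI process finishes speaking.
        m_page.corePage()->speechSynthesisClient()->observer()->didFinishSpeaking();
    });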

> Source/WebKit/WebProcess/WebPage/WebPage.cpp:6406
> +    corePage()->speechSynthesisClient()->observer()->didStartSpeaking();

These could be done as Lambdas in the speechSynthesisClient if you used sendWithAsyncReply on the sending side.

> Source/WebKit/WebProcess/WebPage/WebPage.messages.in:554
> +    DidResumeSpeaking()

I don’t think you need these ‘DidXYZ’ implementations if they are the return call of a sendWithAsyncReply.
Comment 10 Sam Weinig 2019-03-12 22:16:57 PDT
Comment on attachment 364499 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364499&action=review

>> Source/WebKit/UIProcess/WebPageProxy.messages.in:555
>> +    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync
> 
> I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList)

Adding more LegacySync messages is really not ideal. Have you considered alternatives? Like sending the voice list over on process initialization?
Comment 11 Per Arne Vollan 2019-03-13 10:12:45 PDT
(In reply to Sam Weinig from comment #10)
> Comment on attachment 364499 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=364499&action=review
> 
> >> Source/WebKit/UIProcess/WebPageProxy.messages.in:555
> >> +    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync
> > 
> > I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList)
> 
> Adding more LegacySync messages is really not ideal. Have you considered
> alternatives? Like sending the voice list over on process initialization?

This is definitely a good idea, although I am a little concerned about the performance of process startup when doing this.
Comment 12 Per Arne Vollan 2019-03-13 16:27:00 PDT
(In reply to Per Arne Vollan from comment #11)
> (In reply to Sam Weinig from comment #10)
> > Comment on attachment 364499 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=364499&action=review
> > 
> > >> Source/WebKit/UIProcess/WebPageProxy.messages.in:555
> > >> +    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync
> > > 
> > > I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList)
> > 
> > Adding more LegacySync messages is really not ideal. Have you considered
> > alternatives? Like sending the voice list over on process initialization?
> 
> This is definitely a good idea, although I am a little concerned about the
> performance of process startup when doing this.

It takes about 150ms to get the voice list the first time, so I don't think it would be good to add this to the startup time of the UI process.
Comment 13 Per Arne Vollan 2019-03-13 19:15:10 PDT
Created attachment 364608 [details]
Patch
Comment 14 Per Arne Vollan 2019-03-13 19:21:27 PDT
(In reply to Brent Fulgham from comment #9)
> Comment on attachment 364499 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=364499&action=review
> 
> Looks great! I think we could remove some of the boilerplate message passing
> code by using the “sendWithAsyncReply” features of the IPC layer.
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:125
> > +        m_speechSynthesisClient->speak(*utterance.platformUtterance());
> 
> It seems like both of these ‘speak’ implementations should take an object
> (not a pointer), or both should take a pointer, but not this mix.
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.h:105
> > +    SpeechSynthesisClient* m_speechSynthesisClient { nullptr };
> 
> This should either be a WeakPtr or a RefPtr. We should always be leery of
> bare pointers.
> 
> > Source/WebCore/platform/PlatformSpeechSynthesizer.h:43
> > +enum class SpeechBoundary : uint8_t {
> 
> Good!
> 
> > Source/WebKit/UIProcess/WebPageProxy.cpp:8808
> > +void WebPageProxy::speechSynthesisVoiceList(Vector<WebSpeechSynthesisVoice>& result)
> 
> If these were implemented for sendWithAsyncReply, this would take a
> completion handler you could call when done to get the response back to the
> WebProcess.
> 
> > Source/WebKit/UIProcess/WebPageProxy.cpp:8811
> > +        m_synthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);
> 
> Maybe this should be a helper “ensureSynthesizer()” that returns a
> PlatformSpeechSynthesizer& that you can use elsewhere.
> 
> > Source/WebKit/UIProcess/WebPageProxy.cpp:8832
> > +    m_synthesizer->speak(m_utterance.get());
> 
> You could pass the CompletionHandler to PlatformSpeechSynthesizer::speak
> 
> > Source/WebKit/UIProcess/WebPageProxy.cpp:8837
> > +    m_synthesizer->cancel();
> 
> It seems like m_synthesizer can be nullptr in a lot of cases. Should these
> null check? Or at least assert?
> 
> > Source/WebKit/UIProcess/WebPageProxy.cpp:8842
> > +    m_synthesizer->pause();
> 
> Ditto ->pause(WTFMove(completionHandler))
> 
> > Source/WebKit/UIProcess/WebPageProxy.messages.in:555
> > +    SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync
> 
> I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice>
> voiceList)
> 

I created https://bugs.webkit.org/show_bug.cgi?id=195723 to follow up on this.


> > Source/WebKit/UIProcess/WebPageProxy.messages.in:556
> > +    SpeechSynthesisSpeak(String text, String lang, float volume, float rate, float pitch, MonotonicTime startTime, String voiceURI, String voiceName, String voiceLang, bool localService, bool defaultVoice)
> 
> If these were Async() you could pass a Lambda on the calling side and it
> would automatically be called...
> 
> > Source/WebKit/WebProcess/com.apple.WebProcess.sb.in:-606
> > -       (global-name "com.apple.speech.speechsynthesisd")
> 
> Hooray!
> 
> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:43
> > +        voices.append(WebCore::PlatformSpeechSynthesisVoice::create(voice.voiceURI, voice.name, voice.lang, voice.localService, voice.defaultLang));
> 
> Might be nicer to WTFMove() the parts of voice into the new object, since
> we discard it all when we are done.
> 
> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:52
> > +    m_page.send(Messages::WebPageProxy::SpeechSynthesisSpeak(utterance.text(), utterance.lang(), utterance.volume(), utterance.rate(), utterance.pitch(), utterance.startTime(), voice->voiceURI(), voice->name(), voice->lang(), voice->localService(), voice->isDefault()));
> 
> If you used sendWithAsyncReply you could pass a Lambda to be called when the
> speak command completed and avoid writing a “didSpeak” response handler.
> 
> > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6406
> > +    corePage()->speechSynthesisClient()->observer()->didStartSpeaking();
> 
> These could be done as Lambdas in the speechSynthesisClient if you used
> sendWithAsyncReply on the sending side.
> 
> > Source/WebKit/WebProcess/WebPage/WebPage.messages.in:554
> > +    DidResumeSpeaking()
> 
> I don’t think you need these ‘DidXYZ’ implementations if they are the return
> call of a sendWithAsyncReply.

Thanks for reviewing! I have updated the patch.
Comment 15 Per Arne Vollan 2019-03-13 20:46:11 PDT
Created attachment 364622 [details]
Patch
Comment 16 Brent Fulgham 2019-03-13 22:48:03 PDT
Comment on attachment 364622 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364622&action=review

I think this is close, but we could do better by removing more boilerplate method implementations. This might mean touching other platform-specific files to thread the completion handler through.

> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:75
>      if (m_voiceList.size())

Nit: I really prefer !m_voiceList.empty() (or isEmpty?)

> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:84
> +            m_platformSpeechSynthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);

You repeat this pattern in a few places. I suggest making a “ensureSpeechSynthesizer” method that does this and returns a reference.

> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:88
> +            m_voiceList.append(SpeechSynthesisVoice::create(*voice));

It bugs me that these two m_voiceList loops exist! Can they be shared somehow? Seems like we could just use a reference to the voiceList from either source.

> Source/WebCore/Modules/speech/SpeechSynthesis.h:45
> +    static Ref<SpeechSynthesis> create(WeakPtr<SpeechSynthesisClient>);

Do we ever create a SpeechSynthesis object without a Client? I think we could pass a reference, and do the makeWeakPtr inside the constructor.
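For example (sketch, assuming a client is always present at creation):

    static Ref<SpeechSynthesis> create(SpeechSynthesisClient& client)
    {
        return adoptRef(*new SpeechSynthesis(client));
    }

    SpeechSynthesis::SpeechSynthesis(SpeechSynthesisClient& client)
        : m_speechSynthesisClient(makeWeakPtr(client))
    {
    }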

> Source/WebKit/UIProcess/WebPageProxy.cpp:8839
> +    m_speechSynthesisData->speakingFinishedCompletionHandler = WTFMove(completionHandler);

It seems like we should be able to pass the completion handler to “speak”, below. The implementation of speak would take a completion handler as the final argument, and call it when complete, rather than calling “didFinishSpeaking”. (Ditto for the other methods.)

It might not be necessary to have the SpeechSynthesisData object at all, and just thread the state through the call stack.

This also seems to be the only place where the Utterance is used. I wonder if we need to remember it elsewhere. Can we just WTFMove it into the “speak” method?
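The shape I’m imagining (declaration sketch only):

    // PlatformSpeechSynthesizer::speak() grows a CompletionHandler argument and
    // invokes it when speaking finishes, instead of calling back through
    // didFinishSpeaking():
    void speak(RefPtr<PlatformSpeechSynthesisUtterance>&&, CompletionHandler<void()>&&);

Then speechSynthesisSpeak() could WTFMove() the handler (and the utterance) straight into speak(), and the stored SpeechSynthesisData members might go away entirely.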

> Source/WebKit/UIProcess/WebPageProxy.h:2465
> +        CompletionHandler<void()> speakingResumedCompletionHandler;

You might not need to store these three handlers, and perhaps not the utterance.

> Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:243
> +void WebPageProxy::didStartSpeaking(WebCore::PlatformSpeechSynthesisUtterance&)

I can't tell where these get called; I guess the SpeechSynthesis object calls them at some point?

I think these could be discarded if you just passed the completion handler into “speak”, “cancel”, etc., and invoked the handler instead of calling this method.

> Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:248
> +{

Do we need to remember the utterance at all, or even pass the argument? We never seem to use it...

> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:75
> +        m_page.corePage()->speechSynthesisClient()->observer()->didPauseSpeaking();

This looks like weird spacing (missing indent)
Comment 17 Per Arne Vollan 2019-03-14 16:02:08 PDT
(In reply to Brent Fulgham from comment #16)
> Comment on attachment 364622 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=364622&action=review
> 
> I think this is close, but we could do better by removing more boilerplate
> method implementations. This might mean touching other platform-specific
> files to thread the completion handler through.
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:75
> >      if (m_voiceList.size())
> 
> Nit: I really prefer !m_voiceList.empty() (or isEmpty?)
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:84
> > +            m_platformSpeechSynthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);
> 
> You repeat this pattern in a few places. I suggest making a
> “ensureSpeechSynthesizer” method that does this and returns a reference.
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:88
> > +            m_voiceList.append(SpeechSynthesisVoice::create(*voice));
> 
> It bugs me that these two m_voiceList loops exist! Can they be shared
> somehow? Seems like we could just use a reference to the voiceList from
> either source.
> 
> > Source/WebCore/Modules/speech/SpeechSynthesis.h:45
> > +    static Ref<SpeechSynthesis> create(WeakPtr<SpeechSynthesisClient>);
> 
> Do we ever create a SpeechSynthesis object without a Client? I think we
> could pass a reference, and do the makeWeakPtr inside the constructor.
> 

Yes, for WebKitLegacy there will be no client.

> > Source/WebKit/UIProcess/WebPageProxy.cpp:8839
> > +    m_speechSynthesisData->speakingFinishedCompletionHandler = WTFMove(completionHandler);
> 
> It seems like we should be able to pass the completion handler to “speak”,
> below. The implementation of speak would take a completion handler as the
> final argument, and call it when complete, rather than calling
> “didFinishSpeaking”. (Ditto for the other methods.)
> 

I think that's an excellent approach. Since this means changing the existing PlatformSpeechSynthesizer, which will make the scope of the patch bigger, do you think we could do this in a follow-up patch?

> It might not be necessary to have the SpeechSynthesisData object at all, and
> just thread the state through the call stack.
> 
> This also seems to be the only place where the Utterance is used. I wonder
> if we need to remember it elsewhere. Can we just WTFMove it into the “speak”
> method?
> 
> > Source/WebKit/UIProcess/WebPageProxy.h:2465
> > +        CompletionHandler<void()> speakingResumedCompletionHandler;
> 
> You might not need to store these three handlers, and perhaps not the
> utterance.
> 
> > Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:243
> > +void WebPageProxy::didStartSpeaking(WebCore::PlatformSpeechSynthesisUtterance&)
> 
> I can't tell where these get called; I guess the SpeechSynthesis object calls
> them at some point?
> 
> I think these could be discarded if you just passed the completion handler
> into “speak”, “cancel”, etc., and invoked the handler instead of calling
> this method.
> 
> > Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:248
> > +{
> 
> Do we need to remember the utterance at all, or even pass the argument? We
> never seem to use it...
> 
> > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:75
> > +        m_page.corePage()->speechSynthesisClient()->observer()->didPauseSpeaking();
> 
> This looks like weird spacing (missing indent)

Thanks for reviewing! I will update the patch.
Comment 18 Per Arne Vollan 2019-03-14 16:54:27 PDT
Created attachment 364719 [details]
Patch
Comment 19 Brent Fulgham 2019-03-14 18:33:50 PDT
Comment on attachment 364622 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364622&action=review

>>> Source/WebKit/UIProcess/WebPageProxy.cpp:8839
>>> +    m_speechSynthesisData->speakingFinishedCompletionHandler = WTFMove(completionHandler);
>> 
>> It seems like we should be able to pass the completion handler to “speak”, below. The implementation of speak would take a completion handler as the final argument, and call it when complete, rather than calling “didFinishSpeaking”. (Ditto for the other methods.)
>> 
>> It might not be necessary to have the SpeechSynthesisData object at all, and just thread the state through the call stack.
>> 
>> This also seems to be the only place where the Utterance is used. I wonder if we need to remember it elsewhere. Can we just WTFMove it into the “speak” method?
> 
> I think that's an excellent approach. Since this means changing the existing PlatformSpeechSynthesizer, which will make the scope of the patch bigger, do you think we could do this in a follow-up patch?

Sure!
Comment 20 Brent Fulgham 2019-03-15 10:39:18 PDT
Comment on attachment 364719 [details]
Patch

r=me.
Comment 21 Per Arne Vollan 2019-03-15 10:52:39 PDT
Comment on attachment 364719 [details]
Patch

Thanks for reviewing!
Comment 22 WebKit Commit Bot 2019-03-15 11:18:42 PDT
Comment on attachment 364719 [details]
Patch

Clearing flags on attachment: 364719

Committed r243002: <https://trac.webkit.org/changeset/243002>
Comment 23 WebKit Commit Bot 2019-03-15 11:18:45 PDT
All reviewed patches have been landed.  Closing bug.