WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
195645
[macOS] Broker access to Speech Synthesis
https://bugs.webkit.org/show_bug.cgi?id=195645
Summary
[macOS] Broker access to Speech Synthesis
Per Arne Vollan
Reported
2019-03-12 15:21:58 PDT
Speech synthesis should be performed in the UI process.
Attachments
Patch
(52.20 KB, patch)
2019-03-12 15:53 PDT
,
Per Arne Vollan
no flags
Details
Formatted Diff
Diff
Patch
(52.46 KB, patch)
2019-03-12 16:32 PDT
,
Per Arne Vollan
no flags
Details
Formatted Diff
Diff
Patch
(55.24 KB, patch)
2019-03-12 16:50 PDT
,
Per Arne Vollan
no flags
Details
Formatted Diff
Diff
Patch
(55.92 KB, patch)
2019-03-12 17:14 PDT
,
Per Arne Vollan
no flags
Details
Formatted Diff
Diff
Archive of layout-test-results from ews106 for mac-highsierra-wk2
(3.00 MB, application/zip)
2019-03-12 18:43 PDT
,
EWS Watchlist
no flags
Details
Patch
(56.09 KB, patch)
2019-03-12 20:00 PDT
,
Per Arne Vollan
no flags
Details
Formatted Diff
Diff
Patch
(58.22 KB, patch)
2019-03-13 19:15 PDT
,
Per Arne Vollan
no flags
Details
Formatted Diff
Diff
Patch
(58.75 KB, patch)
2019-03-13 20:46 PDT
,
Per Arne Vollan
no flags
Details
Formatted Diff
Diff
Patch
(59.01 KB, patch)
2019-03-14 16:54 PDT
,
Per Arne Vollan
no flags
Details
Formatted Diff
Diff
Show Obsolete
(8)
View All
Add attachment
proposed patch, testcase, etc.
Per Arne Vollan
Comment 1
2019-03-12 15:22:20 PDT
rdar://problem/35369026
Per Arne Vollan
Comment 2
2019-03-12 15:53:22 PDT
Created
attachment 364464
[details]
Patch
Per Arne Vollan
Comment 3
2019-03-12 16:32:40 PDT
Created
attachment 364469
[details]
Patch
Per Arne Vollan
Comment 4
2019-03-12 16:50:33 PDT
Created
attachment 364472
[details]
Patch
Per Arne Vollan
Comment 5
2019-03-12 17:14:41 PDT
Created
attachment 364479
[details]
Patch
EWS Watchlist
Comment 6
2019-03-12 18:43:34 PDT
Comment on
attachment 364479
[details]
Patch
Attachment 364479
[details]
did not pass mac-wk2-ews (mac-wk2): Output:
https://webkit-queues.webkit.org/results/11483397
New failing tests: fast/speechsynthesis/speech-synthesis-elapsed-time.html fast/speechsynthesis/speech-synthesis-boundary-events.html fast/speechsynthesis/speech-synthesis-gc-utterance-crash.html fast/speechsynthesis/speech-synthesis-voices.html fast/speechsynthesis/speech-synthesis-cancel-crash.html fast/speechsynthesis/speech-synthesis-speak.html fast/speechsynthesis/speech-synthesis-pause-resume.html
EWS Watchlist
Comment 7
2019-03-12 18:43:35 PDT
Created
attachment 364492
[details]
Archive of layout-test-results from ews106 for mac-highsierra-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews106 Port: mac-highsierra-wk2 Platform: Mac OS X 10.13.6
Per Arne Vollan
Comment 8
2019-03-12 20:00:11 PDT
Created
attachment 364499
[details]
Patch
Brent Fulgham
Comment 9
2019-03-12 20:40:25 PDT
Comment on
attachment 364499
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=364499&action=review
Looks great! I think we could remove some of the boilerplate message passing code by using the “sendWithAsyncReply” features of the IPC layer.
> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:125 > + m_speechSynthesisClient->speak(*utterance.platformUtterance());
It seems like both of these ‘speech’ implementations should take an object (not a pointer), or both should take a pointer, but not this mix.
> Source/WebCore/Modules/speech/SpeechSynthesis.h:105 > + SpeechSynthesisClient* m_speechSynthesisClient { nullptr };
This should either be a WeakPtr or a RefPtr. We should always be leery of bare pointers.
> Source/WebCore/platform/PlatformSpeechSynthesizer.h:43 > +enum class SpeechBoundary : uint8_t {
Good!
> Source/WebKit/UIProcess/WebPageProxy.cpp:8808 > +void WebPageProxy::speechSynthesisVoiceList(Vector<WebSpeechSynthesisVoice>& result)
If these were Implemented for sendWithAsyncRepky, this would take a completion handler you could call when done to get the response back to the WebProcess.
> Source/WebKit/UIProcess/WebPageProxy.cpp:8811 > + m_synthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);
Maybe this should be a helper “ensureSynthesizer()” that returns a PlatformSpeechSynthesizer& that you can use elsewhere.
> Source/WebKit/UIProcess/WebPageProxy.cpp:8832 > + m_synthesizer->speak(m_utterance.get());
You could pass the CompletionHandler to PlatformSpeechSynthesizer::speak
> Source/WebKit/UIProcess/WebPageProxy.cpp:8837 > + m_synthesizer->cancel();
It seems like m_synthesizer can be nullptr in a lot of cases. Should these null check? Or at least assert?
> Source/WebKit/UIProcess/WebPageProxy.cpp:8842 > + m_synthesizer->pause();
Ditto ->pause(WTFMove(completionHandler))
> Source/WebKit/UIProcess/WebPageProxy.messages.in:555 > + SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync
I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList)
> Source/WebKit/UIProcess/WebPageProxy.messages.in:556 > + SpeechSynthesisSpeak(String text, String lang, float volume, float rate, float pitch, MonotonicTime startTime, String voiceURI, String voiceName, String voiceLang, bool localService, bool defaultVoice)
If these were Async() you could pass a Lambda on the calling side and it would automatically be called...
> Source/WebKit/WebProcess/com.apple.WebProcess.sb.in:-606 > - (global-name "com.apple.speech.speechsynthesisd")
Hooray!
> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:43 > + voices.append(WebCore::PlatformSpeechSynthesisVoice::create(voice.voiceURI, voice.name, voice.lang, voice.localService, voice.defaultLang));
Might be nicer to WTFMove() the parr’s of voice into the new object, since we discard it all when we are done.
> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:52 > + m_page.send(Messages::WebPageProxy::SpeechSynthesisSpeak(utterance.text(), utterance.lang(), utterance.volume(), utterance.rate(), utterance.pitch(), utterance.startTime(), voice->voiceURI(), voice->name(), voice->lang(), voice->localService(), voice->isDefault()));
If you used sendWithAsyncReply you could pass a Lambda to be called when the speak command completed and avoid writing a “didSpeak” response handler.
> Source/WebKit/WebProcess/WebPage/WebPage.cpp:6406 > + corePage()->speechSynthesisClient()->observer()->didStartSpeaking();
These could be done as Lambdas in the speechSynthesisClient if you used sendWithAsyncReply on the sending side.
> Source/WebKit/WebProcess/WebPage/WebPage.messages.in:554 > + DidResumeSpeaking()
I don’t think you need these ‘DidXZ’ implementations if they are the return call of an sendWithAsyncReply.
Sam Weinig
Comment 10
2019-03-12 22:16:57 PDT
Comment on
attachment 364499
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=364499&action=review
>> Source/WebKit/UIProcess/WebPageProxy.messages.in:555 >> + SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync > > I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList)
Adding more LegacySync messages is really not ideal. Have you considered alternatives? Like sending the voice list over on process initialization?
Per Arne Vollan
Comment 11
2019-03-13 10:12:45 PDT
(In reply to Sam Weinig from
comment #10
)
> Comment on
attachment 364499
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=364499&action=review
> > >> Source/WebKit/UIProcess/WebPageProxy.messages.in:555 > >> + SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync > > > > I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) > > Adding more LegacySync messages is really not ideal. Have you considered > alternatives? Like sending the voice list over on process initialization?
This is definitely a good idea, although I am a little concerned about the performance of process startup when doing this.
Per Arne Vollan
Comment 12
2019-03-13 16:27:00 PDT
(In reply to Per Arne Vollan from
comment #11
)
> (In reply to Sam Weinig from
comment #10
) > > Comment on
attachment 364499
[details]
> > Patch > > > > View in context: > >
https://bugs.webkit.org/attachment.cgi?id=364499&action=review
> > > > >> Source/WebKit/UIProcess/WebPageProxy.messages.in:555 > > >> + SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync > > > > > > I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) > > > > Adding more LegacySync messages is really not ideal. Have you considered > > alternatives? Like sending the voice list over on process initialization? > > This is definitely a good idea, although I am a little concerned about the > performance of process startup when doing this.
It takes about 150ms to get get the voice list the first time, so I don't think it would be good to add this to the startup time of the UI process.
Per Arne Vollan
Comment 13
2019-03-13 19:15:10 PDT
Created
attachment 364608
[details]
Patch
Per Arne Vollan
Comment 14
2019-03-13 19:21:27 PDT
(In reply to Brent Fulgham from
comment #9
)
> Comment on
attachment 364499
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=364499&action=review
> > Looks great! I think we could remove some of the boilerplate message passing > code by using the “sendWithAsyncReply” features of the IPC layer. > > > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:125 > > + m_speechSynthesisClient->speak(*utterance.platformUtterance()); > > It seems like both of these ‘speech’ implementations should take an object > (not a pointer), or both should take a pointer, but not this mix. > > > Source/WebCore/Modules/speech/SpeechSynthesis.h:105 > > + SpeechSynthesisClient* m_speechSynthesisClient { nullptr }; > > This should either be a WeakPtr or a RefPtr. We should always be leery of > bare pointers. > > > Source/WebCore/platform/PlatformSpeechSynthesizer.h:43 > > +enum class SpeechBoundary : uint8_t { > > Good! > > > Source/WebKit/UIProcess/WebPageProxy.cpp:8808 > > +void WebPageProxy::speechSynthesisVoiceList(Vector<WebSpeechSynthesisVoice>& result) > > If these were Implemented for sendWithAsyncRepky, this would take a > completion handler you could call when done to get the response back to the > WebProcess. > > > Source/WebKit/UIProcess/WebPageProxy.cpp:8811 > > + m_synthesizer = std::make_unique<PlatformSpeechSynthesizer>(this); > > Maybe this should be a helper “ensureSynthesizer()” that returns a > PlatformSpeechSynthesizer& that you can use elsewhere. > > > Source/WebKit/UIProcess/WebPageProxy.cpp:8832 > > + m_synthesizer->speak(m_utterance.get()); > > You could pass the CompletionHandler to PlatformSpeechSynthesizer::speak > > > Source/WebKit/UIProcess/WebPageProxy.cpp:8837 > > + m_synthesizer->cancel(); > > It seems like m_synthesizer can be nullptr in a lot of cases. Should these > null check? Or at least assert? > > > Source/WebKit/UIProcess/WebPageProxy.cpp:8842 > > + m_synthesizer->pause(); > > Ditto ->pause(WTFMove(completionHandler)) > > > Source/WebKit/UIProcess/WebPageProxy.messages.in:555 > > + SpeechSynthesisVoiceList() -> (Vector<WebKit::WebSpeechSynthesisVoice> voiceList) LegacySync > > I think this should be Async (Vector<WebKit::WebSpeechSynthesisVoice> > voiceList) >
I created
https://bugs.webkit.org/show_bug.cgi?id=195723
to follow up on this.
> > Source/WebKit/UIProcess/WebPageProxy.messages.in:556 > > + SpeechSynthesisSpeak(String text, String lang, float volume, float rate, float pitch, MonotonicTime startTime, String voiceURI, String voiceName, String voiceLang, bool localService, bool defaultVoice) > > If these were Async() you could pass a Lambda on the calling side and it > would automatically be called... > > > Source/WebKit/WebProcess/com.apple.WebProcess.sb.in:-606 > > - (global-name "com.apple.speech.speechsynthesisd") > > Hooray! > > > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:43 > > + voices.append(WebCore::PlatformSpeechSynthesisVoice::create(voice.voiceURI, voice.name, voice.lang, voice.localService, voice.defaultLang)); > > Might be nicer to WTFMove() the parr’s of voice into the new object, since > we discard it all when we are done. > > > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:52 > > + m_page.send(Messages::WebPageProxy::SpeechSynthesisSpeak(utterance.text(), utterance.lang(), utterance.volume(), utterance.rate(), utterance.pitch(), utterance.startTime(), voice->voiceURI(), voice->name(), voice->lang(), voice->localService(), voice->isDefault())); > > If you used sendWithAsyncReply you could pass a Lambda to be called when the > speak command completed and avoid writing a “didSpeak” response handler. > > > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6406 > > + corePage()->speechSynthesisClient()->observer()->didStartSpeaking(); > > These could be done as Lambdas in the speechSynthesisClient if you used > sendWithAsyncReply on the sending side. > > > Source/WebKit/WebProcess/WebPage/WebPage.messages.in:554 > > + DidResumeSpeaking() > > I don’t think you need these ‘DidXZ’ implementations if they are the return > call of an sendWithAsyncReply.
Thanks for reviewing! I have updated the patch.
Per Arne Vollan
Comment 15
2019-03-13 20:46:11 PDT
Created
attachment 364622
[details]
Patch
Brent Fulgham
Comment 16
2019-03-13 22:48:03 PDT
Comment on
attachment 364622
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=364622&action=review
I think this is close, but we could do better by removing more boilerplate method implementations. This might mean touching other platform-specific files to thread the completion handler brought.
> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:75 > if (m_voiceList.size())
Nit: I really prefer !m_voiceList.emoty() (or isEmoty?)
> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:84 > + m_platformSpeechSynthesizer = std::make_unique<PlatformSpeechSynthesizer>(this);
You repeat this pattern in a few places. I suggest making a “ensureSpeechSynthesizer” method that does this and returns a reference.
> Source/WebCore/Modules/speech/SpeechSynthesis.cpp:88 > + m_voiceList.append(SpeechSynthesisVoice::create(*voice));
It bugs me that these two m_voiceList loops exist.! Can they be shared somehow? Seems like we could just use a reference to the voiceList from either source.
> Source/WebCore/Modules/speech/SpeechSynthesis.h:45 > + static Ref<SpeechSynthesis> create(WeakPtr<SpeechSynthesisClient>);
Do we ever create a SpeechSynthesis object without a Client? I think we could pass a reference, and do the makeWeakPtr inside the constructor.
> Source/WebKit/UIProcess/WebPageProxy.cpp:8839 > + m_speechSynthesisData->speakingFinishedCompletionHandler = WTFMove(completionHandler);
It seems like we should be able to pass the completion handler to “speak”, below. The implementation of speak would take a completion handler as the final argument, and call it when complete (rather then calling “didFinishSpeaking” (Ditto for the other methods). It might not be necessary to have the SpeechSynhesisData object at all, and just thread the state through the call stack. This also seems to be the only place where the Utterance is used. I wonder if we need to remember it elsewhere. Can we just WTFMove it into the “speak” method?
> Source/WebKit/UIProcess/WebPageProxy.h:2465 > + CompletionHandler<void()> speakingResumedCompletionHandler;
You might not need to store these three handlers, and perhaps not the utterance.
> Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:243 > +void WebPageProxy::didStartSpeaking(WebCore::PlatformSpeechSynthesisUtterance&)
I can tell where these get called, I guess the SpeechSynthesis object calls them at some point? I think these could be discarded if you just passed the completion handler into “speak”, “cancel”, etc., and invoked the handler instead of calling this method.
> Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:248 > +{
Do we need to remember the utterance at all, or even pass the argument? We never seem to use it,,,
> Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:75 > + m_page.corePage()->speechSynthesisClient()->observer()->didPauseSpeaking();
This looks like weird spacing (missing indent)
Per Arne Vollan
Comment 17
2019-03-14 16:02:08 PDT
(In reply to Brent Fulgham from
comment #16
)
> Comment on
attachment 364622
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=364622&action=review
> > I think this is close, but we could do better by removing more boilerplate > method implementations. This might mean touching other platform-specific > files to thread the completion handler brought. > > > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:75 > > if (m_voiceList.size()) > > Nit: I really prefer !m_voiceList.emoty() (or isEmoty?) > > > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:84 > > + m_platformSpeechSynthesizer = std::make_unique<PlatformSpeechSynthesizer>(this); > > You repeat this pattern in a few places. I suggest making a > “ensureSpeechSynthesizer” method that does this and returns a reference. > > > Source/WebCore/Modules/speech/SpeechSynthesis.cpp:88 > > + m_voiceList.append(SpeechSynthesisVoice::create(*voice)); > > It bugs me that these two m_voiceList loops exist.! Can they be shared > somehow? Seems like we could just use a reference to the voiceList from > either source. > > > Source/WebCore/Modules/speech/SpeechSynthesis.h:45 > > + static Ref<SpeechSynthesis> create(WeakPtr<SpeechSynthesisClient>); > > Do we ever create a SpeechSynthesis object without a Client? I think we > could pass a reference, and do the makeWeakPtr inside the constructor. >
Yes, for WebKitLegacy there will be no client.
> > Source/WebKit/UIProcess/WebPageProxy.cpp:8839 > > + m_speechSynthesisData->speakingFinishedCompletionHandler = WTFMove(completionHandler); > > It seems like we should be able to pass the completion handler to “speak”, > below. The implementation of speak would take a completion handler as the > final argument, and call it when complete (rather then calling > “didFinishSpeaking” (Ditto for the other methods). >
I think that's an excellent approach. Since this means changing the existing PlatformSpeechSynthesizer, which will make the scope of the patch bigger, do you think we could do this in a follow-up patch?
> It might not be necessary to have the SpeechSynhesisData object at all, and > just thread the state through the call stack. > > This also seems to be the only place where the Utterance is used. I wonder > if we need to remember it elsewhere. Can we just WTFMove it into the “speak” > method? > > > Source/WebKit/UIProcess/WebPageProxy.h:2465 > > + CompletionHandler<void()> speakingResumedCompletionHandler; > > You might not need to store these three handlers, and perhaps not the > utterance. > > > Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:243 > > +void WebPageProxy::didStartSpeaking(WebCore::PlatformSpeechSynthesisUtterance&) > > I can tell where these get called, I guess the SpeechSynthesis object calls > them at some point? > > I think these could be discarded if you just passed the completion handler > into “speak”, “cancel”, etc., and invoked the handler instead of calling > this method. > > > Source/WebKit/UIProcess/Cocoa/WebPageProxyCocoa.mm:248 > > +{ > > Do we need to remember the utterance at all, or even pass the argument? We > never seem to use it,,, > > > Source/WebKit/WebProcess/WebCoreSupport/WebSpeechSynthesisClient.cpp:75 > > + m_page.corePage()->speechSynthesisClient()->observer()->didPauseSpeaking(); > > This looks like weird spacing (missing indent)
Thanks for reviewing! I will update the patch.
Per Arne Vollan
Comment 18
2019-03-14 16:54:27 PDT
Created
attachment 364719
[details]
Patch
Brent Fulgham
Comment 19
2019-03-14 18:33:50 PDT
Comment on
attachment 364622
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=364622&action=review
>>> Source/WebKit/UIProcess/WebPageProxy.cpp:8839 >>> + m_speechSynthesisData->speakingFinishedCompletionHandler = WTFMove(completionHandler); >> >> It seems like we should be able to pass the completion handler to “speak”, below. The implementation of speak would take a completion handler as the final argument, and call it when complete (rather then calling “didFinishSpeaking” (Ditto for the other methods). >> >> It might not be necessary to have the SpeechSynhesisData object at all, and just thread the state through the call stack. >> >> This also seems to be the only place where the Utterance is used. I wonder if we need to remember it elsewhere. Can we just WTFMove it into the “speak” method? > > I think that's an excellent approach. Since this means changing the existing PlatformSpeechSynthesizer, which will make the scope of the patch bigger, do you think we could do this in a follow-up patch?
Sure!
Brent Fulgham
Comment 20
2019-03-15 10:39:18 PDT
Comment on
attachment 364719
[details]
Patch r=me.
Per Arne Vollan
Comment 21
2019-03-15 10:52:39 PDT
Comment on
attachment 364719
[details]
Patch Thanks for reviewing!
WebKit Commit Bot
Comment 22
2019-03-15 11:18:42 PDT
Comment on
attachment 364719
[details]
Patch Clearing flags on attachment: 364719 Committed
r243002
: <
https://trac.webkit.org/changeset/243002
>
WebKit Commit Bot
Comment 23
2019-03-15 11:18:45 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug