Bug 218815 - speechSynthesis.speak pauses video elements, mutes getUserMedia streams
Summary: speechSynthesis.speak pauses video elements, mutes getUserMedia streams
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc.
Version: Safari 13
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: chris fleizach
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-11-11 11:50 PST by Andrew
Modified: 2022-02-10 20:17 PST
CC List: 8 users

See Also:


Attachments
repro page (10.84 KB, text/html)
2020-11-11 11:50 PST, Andrew
no flags
repro page (sans rtc) (8.41 KB, text/html)
2020-11-18 18:19 PST, Andrew
no flags
patch (1.63 KB, patch)
2020-11-19 11:32 PST, chris fleizach
ews-feeder: commit-queue-
patch (1.64 KB, patch)
2020-11-19 11:34 PST, chris fleizach
eric.carlson: review+
ews-feeder: commit-queue-

Description Andrew 2020-11-11 11:50:00 PST
Created attachment 413846 [details]
repro page

On Safari, speechSynthesis.speak invocations interrupt other active media.
To reproduce:
 1) get a MediaStream from getUserMedia, add it to a video element, play the video element
 2) call speechSynthesis.speak(new SpeechSynthesisUtterance('hello'))
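
The two steps above can be sketched roughly as follows. This is not the attached repro page, only a minimal outline of the same sequence. The names runRepro, ReproEnv and PlayableVideo are hypothetical, and the browser objects are passed in as parameters so the sketch type-checks without DOM typings.

```typescript
// Structural stand-ins for the browser objects (hypothetical names).
interface PlayableVideo {
  srcObject: unknown;
  play(): Promise<void>;
}

interface ReproEnv {
  // In a browser: (c) => navigator.mediaDevices.getUserMedia(c)
  getUserMedia(constraints: { audio: boolean; video: boolean }): Promise<unknown>;
  // In a browser: (text) => speechSynthesis.speak(new SpeechSynthesisUtterance(text))
  speak(text: string): void;
  // In a browser: a <video> element on the page
  video: PlayableVideo;
}

async function runRepro(env: ReproEnv, text: string = "hello"): Promise<void> {
  // Step 1: get a stream, attach it to the video element, start playback.
  const stream = await env.getUserMedia({ audio: true, video: true });
  env.video.srcObject = stream;
  await env.video.play();
  // Step 2: speak. On affected Safari versions this is the call that
  // reportedly interrupts the video element and the captured audio.
  env.speak(text);
}
```

In a real page, runRepro would typically be called from a click handler, since browsers may require a user gesture before media playback or speech is allowed.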

This interrupts both the video element and media stream independently.

 1) (mobile safari) video elements are paused
Calling videoElement.play() resumes playback.
If video playback is resumed before the SpeechSynthesisUtterance ends, the speech is sometimes interrupted (this part is really inconsistent).
It seems like, after resuming playback from a user-gesture handler, future calls to speak no longer pause this video.
If, on the other hand, the video is resumed from a non-user-gesture (e.g. in response to utterance.end event), future calls to speak will re-pause the video.

 2) (mobile safari) audio tracks from getUserMedia are muted (MediaStreamTrack - mute event is raised, muted is true)
The track is never unmuted, and there is no API for user-code to unmute a track. It seems the only option would be to stop the track, and get a new one by getUserMedia.

 3) (desktop safari) audio tracks from getUserMedia are silenced (all samples are 0) while the utterance is playing (mute event is not raised, volume is restored after utterance ends)
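
For symptom 3, where no mute event fires, one way to detect the silence is to inspect the captured samples directly. The helper below is a sketch under assumptions: the name isSilent is hypothetical, and it assumes the samples come from a Web Audio AnalyserNode, which may or may not be how the attached page drives its volume meter.

```typescript
// Returns true only when the buffer holds samples and every one is exactly 0,
// matching the "all samples are 0" behaviour described in symptom 3.
function isSilent(samples: Float32Array): boolean {
  if (samples.length === 0) {
    return false; // an empty buffer is not evidence of silence
  }
  for (let i = 0; i < samples.length; i++) {
    if (samples[i] !== 0) {
      return false;
    }
  }
  return true;
}

// Assumed browser usage (not part of the original report):
//   const buffer = new Float32Array(analyser.fftSize);
//   analyser.getFloatTimeDomainData(buffer);
//   if (isSilent(buffer)) { /* input is silenced while speech plays */ }
```

A real microphone rarely produces exact zeros for a whole buffer, so an exact-zero check is a reasonable signal that the input was silenced rather than merely quiet.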


This may not be a complete list - I haven't checked how Web Speech interacts with other APIs, e.g. Web Audio.

For Mobile Safari, I tested on iOS 14.1
For Desktop Safari, I tested on Safari 13.1.3 (OSX 10.15.7)
Other browsers (Chrome, Firefox) do not exhibit these issues.

It would be great if Safari behaved the same as Chrome or Firefox.
At the bare minimum, I ask that media streams could be unmuted after speech finishes.
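
Until tracks can be unmuted, the stop-and-reacquire workaround mentioned under symptom 2 could look something like this. It is only a sketch: replaceIfMuted and AudioTrackLike are hypothetical names, and I have not verified that a freshly acquired track comes back unmuted on every affected iOS version.

```typescript
// Structural stand-in for MediaStreamTrack (hypothetical name).
interface AudioTrackLike {
  readonly muted: boolean;
  stop(): void;
}

// If the track was muted (e.g. by a speak() call), stop it and ask for a
// replacement; otherwise keep using the track we already have.
async function replaceIfMuted<T extends AudioTrackLike>(
  track: T,
  acquire: () => Promise<T>,
): Promise<T> {
  if (!track.muted) {
    return track;
  }
  track.stop();
  return acquire();
}

// Assumed browser usage, e.g. from the utterance's end handler:
//   track = await replaceIfMuted(track, async () =>
//     (await navigator.mediaDevices.getUserMedia({ audio: true })).getAudioTracks()[0]);
```

Anything consuming the old track (a video element, a peer connection) would still need to be pointed at the replacement, which is part of why a real fix in Safari would be preferable.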

I have attached a page which demonstrates the behavior.
The page includes WebRTC, which serves to demonstrate that only the getUserMedia-created stream is muted.
Comment 1 Andrew 2020-11-11 12:26:36 PST
Tested on Safari 14.0, OSX 10.15.7 - same result, only #3 is an issue
Comment 2 chris fleizach 2020-11-11 16:41:37 PST
not unexpected, but we may be able to do better
Comment 3 Radar WebKit Bug Importer 2020-11-12 20:04:29 PST
<rdar://problem/71355701>
Comment 4 chris fleizach 2020-11-18 16:24:11 PST
Hi, how do I operate this repro page?

I press send video, see my web cam. then I press speak, but I don't see anything else happening. Am I supposed to hear or see a different audio/video track?

my own camera video is not paused while speech synthesizer works
Comment 5 Andrew 2020-11-18 16:41:20 PST
On iPad, the speech pauses the video element and mutes the input stream. There is a button to resume the video element. 
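
A resume button of that kind can be wired up along these lines. This is a sketch, not the code from the attachment, and the names wireResumeButton, ClickTarget and PausableVideo are hypothetical. Resuming from a click matters because, as noted in the description, playback resumed from a user-gesture handler does not seem to get paused again by later speak calls.

```typescript
// Structural stand-ins for a button and a video element (hypothetical names).
interface ClickTarget {
  addEventListener(type: string, listener: () => void): void;
}

interface PausableVideo {
  readonly paused: boolean;
  play(): Promise<void>;
}

function wireResumeButton(button: ClickTarget, video: PausableVideo): void {
  button.addEventListener("click", () => {
    if (video.paused) {
      // play() returns a promise that can reject if playback is blocked;
      // swallow the rejection so it does not surface as an unhandled error.
      video.play().catch(() => undefined);
    }
  });
}
```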

On OS X, the speech only causes the audio input stream to be silent, and only while speech is in progress - you should be able to see that reflected on the volume meter next to the video. 

I can attach a video of it happening if you’d like.
Comment 6 Andrew 2020-11-18 16:48:55 PST
You may need to edit the attachment, it looks like the string I had it speaking (a poop emoji - sorry) got mangled during upload.
Comment 7 chris fleizach 2020-11-18 16:50:54 PST
(In reply to Andrew from comment #6)
> You may need to edit the attachment, it looks like the string I had it
> speaking (a poop emoji - sorry) got mangled during upload.

On my Safari, the remote RTP gives an error:

Unhandled Promise Rejection: AbortError: The operation was aborted.

and doesn't display (this works on Chrome, incidentally)
Comment 8 Andrew 2020-11-18 16:58:24 PST
IIRC Safari is more strict, need to load this kind of page over https:// and not file://

I’m not near a computer, later I will attach a version of the page without RTC.
Comment 9 chris fleizach 2020-11-18 17:05:07 PST
(In reply to Andrew from comment #8)
> IIRC Safari is more strict, need to load this kind of page over https:// and
> not file://
> 
> I’m not near a computer, later I will attach a version of the page without
> RTC.

Thanks a lot!
Comment 10 Andrew 2020-11-18 18:19:39 PST
Created attachment 414526 [details]
repro page (sans rtc)

Revised version of the page:
* no longer non-ascii text
* removed rtc, wasn't required as part of core issue
Comment 11 chris fleizach 2020-11-19 10:01:32 PST
(In reply to Andrew from comment #10)
> Created attachment 414526 [details]
> repro page (sans rtc)
> 
> Revised version of the page:
> * no longer non-ascii text
> * removed rtc, wasn't required as part of core issue

So on the Mac, once I start the Mic and Camera, I see my video and I see the volume bar go up and down as I speak.

When I play the speech synthesis, and I continue to talk, I still see the bar go up and down.

Should I see the bar at 0 while speech synthesis is playing while I'm talking?

Also I'm testing on 11.1
Comment 12 Andrew 2020-11-19 10:05:05 PST
(In reply to chris fleizach from comment #11)
> Should I see the bar at 0 while speech synthesis is playing while I'm
> talking?

Yes, that is what I see.

(In reply to chris fleizach from comment #11)
> Also I'm testing on 11.1

I currently only have Safari 14.0 (macOS 10.15.7).
Comment 13 chris fleizach 2020-11-19 10:12:29 PST
(In reply to Andrew from comment #12)
> (In reply to chris fleizach from comment #11)
> > Should I see the bar at 0 while speech synthesis is playing while I'm
> > talking?
> 
> Yes, that is what I see.
> 
> (In reply to chris fleizach from comment #11)
> > Also I'm testing on 11.1
> 
> I currently only have Safari 14.0 (macOS 10.15.7).

Interesting -- wonder if this was resolved on 11.0. I don't have an easy way to go back to 10.15 without spending a day blowing away my computers.
Comment 14 Andrew 2020-11-19 10:49:31 PST
(In reply to chris fleizach from comment #13)
> Interesting -- wonder if this was resolved on 11.0. I don't have an easy way
> to go back to 10.15 without spending a day blowing away my computers.

That might be good enough for us on macOS, I will try to sell it.

However, the issue is more severe on mobile Safari. We have a much higher usage on iPad than macOS (issue also exists on iPhone I think, but we don't support that platform).
Additionally, this bug carries over to WKWebView, now that the 14.3 beta added getUserMedia.
Comment 15 chris fleizach 2020-11-19 10:59:52 PST
(In reply to Andrew from comment #14)
> (In reply to chris fleizach from comment #13)
> > Interesting -- wonder if this was resolved on 11.0. I don't have an easy way
> > to go back to 10.15 without spending a day blowing away my computers.
> 
> That might be good enough for us on macOS, I will try to sell it.
> 
> However, the issue is more severe on mobile Safari. We have a much higher
> usage on iPad than macOS (issue also exists on iPhone I think, but we don't
> support that platform).
> Additionally, this bug carries over to WKWebView, now that the 14.3 beta
> added getUserMedia.

got it, looking at that next
Comment 16 chris fleizach 2020-11-19 11:32:29 PST
Created attachment 414606 [details]
patch
Comment 17 chris fleizach 2020-11-19 11:34:08 PST
Created attachment 414607 [details]
patch
Comment 18 chris fleizach 2020-11-30 16:58:27 PST
Comment on attachment 414607 [details]
patch

Thanks for the review. I want to try to add a test for this before merging
Comment 19 Yvonne 2021-01-25 01:06:55 PST
Hi, we are using speechSynthesis in combination with a video element and can confirm issue #1. Could see this issue on iOS 13.7 and 14.

Do you have any ETA on this fix?
Comment 20 Anton Mo Eriksson 2021-01-25 01:19:07 PST
I am also interested in this feature and the status.
Comment 21 chris fleizach 2021-01-25 10:35:14 PST
(In reply to Anton Mo Eriksson from comment #20)
> I am also interested in this feature and the status.

It is fixed in iOS 14.4 already. Please give it a test there.
Comment 22 Brent Fulgham 2022-02-10 20:17:17 PST
The fix for this issue was needed outside the WebKit project, therefore this is being resolved as 'Moved'.

This should now be fixed in shipping software.