Bug 231105 - AudioContext stops playing when minimizing or moving the macOS Safari window to the background.
Summary: AudioContext stops playing when minimizing or moving the macOS Safari window to the background.
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Audio
Version: Safari 15
Hardware: Mac (Apple Silicon) macOS 11
Importance: P2 Major
Assignee: youenn fablet
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-10-01 15:02 PDT by Harshit Oberoi
Modified: 2021-12-17 08:27 PST
CC List: 16 users

See Also:


Attachments
Patch (6.37 KB, patch)
2021-10-18 06:42 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
Patch (6.44 KB, patch)
2021-10-20 01:07 PDT, youenn fablet
youennf: review?
Details | Formatted Diff | Diff
Check if the destination is not connected (5.20 KB, patch)
2021-12-10 05:52 PST, Alex Chronopoulos
no flags Details | Formatted Diff | Diff
Combined diff to run the tests (9.11 KB, patch)
2021-12-17 07:35 PST, Alex Chronopoulos
ews-feeder: commit-queue-
Details | Formatted Diff | Diff

Description Harshit Oberoi 2021-10-01 15:02:34 PDT
AudioContext stops playing when minimizing or moving the macOS Safari window to the background.

Problem:

In WebRTC-based web applications that use the AudioContext API, audio stops playing when you minimize the macOS Safari window or move it to the background.


Steps to Reproduce

1. Open any HTTPS website in Safari 14 or 15. In the following example, you will use the MediaDevices API to get your microphone's media stream.
2. Open the console in Web Inspector.
3. Run the following code to bind your first audio input to an audio element via two AudioNodes.
 (async function () {
      // Get your first audio input. Note: enumerateDevices() takes no
      // arguments, so filter for audio inputs explicitly.
      const devices = await navigator.mediaDevices.enumerateDevices();
      const audioInputs = devices.filter((d) => d.kind === 'audioinput');
      const deviceId = audioInputs[0].deviceId;
      const deviceStream = await navigator.mediaDevices.getUserMedia({
        audio: { deviceId: { exact: deviceId } },
      });
      
      // Your audio input => MediaStreamAudioSourceNode => MediaStreamAudioDestinationNode
      const audioContext = new (window.AudioContext || window.webkitAudioContext)();
      const sourceNode = audioContext.createMediaStreamSource(deviceStream);
      const destinationNode = audioContext.createMediaStreamDestination();
      sourceNode.connect(destinationNode);
    
      // MediaStreamAudioDestinationNode => HTMLAudioElement
      const audio = document.createElement('audio');
      document.body.appendChild(audio);
      audio.srcObject = destinationNode.stream;
      audio.play();
    })();
4. Speak into your microphone and ensure that you can hear your voice from the system's speaker.
5. Minimize the Safari window (⌘ Cmd + M). Or you can activate full-screen mode (⌃ Ctrl + ⌘ Cmd + F) and switch to another window.
6. Speak into your microphone and test if you can hear your voice from the system’s speaker.



Actual Results:

The audio stops as soon as you minimize or move the Safari window to the background.


Expected Results:

The audio should continue playing without interruption.


Build: 

* macOS Safari Version 15.0 (16612.1.29.41.4, 16612)
* macOS Safari Version 14.1.2 (16611.3.10.1.3)



Additional Information:

Behavior without AudioContext

The following script binds your first audio input to an audio element without using the AudioContext API. The audio continues to play when you minimize the Safari window.

(async function () {
  // enumerateDevices() takes no arguments; filter for audio inputs explicitly.
  const devices = await navigator.mediaDevices.enumerateDevices();
  const audioInputs = devices.filter((d) => d.kind === 'audioinput');
  const deviceId = audioInputs[0].deviceId;

  const audio = document.createElement('audio');
  document.body.appendChild(audio);
  audio.srcObject = await navigator.mediaDevices.getUserMedia({
    audio: { deviceId: { exact: deviceId } },
  });
  audio.play();
})();

Behavior In Other Browsers:

When running the reproduction steps in Chrome and Firefox, audio continues to play when you minimize the browser window or move it to the background.

*Chrome Version*: *94.0.4606.61*
*Firefox Version*: *91.1.0esr*
Comment 1 Chris Dumez 2021-10-01 15:15:13 PDT
@Jer / Eric: Is this intentional?

I have noticed this in the past and also found this behavior annoying.
Comment 2 Chris Dumez 2021-10-04 09:14:55 PDT
Seems to be reproducible with this demo site as well: https://downloads.scirra.com/labs/bugs/safaripannerquality/
Comment 3 Chris Dumez 2021-10-04 09:33:30 PDT
(In reply to Chris Dumez from comment #2)
> Seems to be reproducible with this demo site as well:
> https://downloads.scirra.com/labs/bugs/safaripannerquality/

Never mind, that one also happens in Chrome so it may be the site doing it.
Comment 4 Chris Dumez 2021-10-04 09:58:59 PDT
The provided JS produces an error on my machine after I allow getUserMedia:
< Promise {status: "pending"}
[Error] Failed to create MediaStream audio source: No CoreAudioCaptureSource device
[Error] Unhandled Promise Rejection: NotReadableError: The I/O read operation failed.
	Console Evaluation (Console Evaluation 3:7)
	asyncFunctionResume
	(anonymous function)
	promiseReactionJobWithoutPromise
	promiseReactionJob

Not sure why. I am still trying to find a repro case.
Comment 5 Radar WebKit Bug Importer 2021-10-05 09:28:20 PDT
<rdar://problem/83889697>
Comment 6 youenn fablet 2021-10-18 06:16:20 PDT
I was able to reproduce.
My understanding is that the AudioContext gets interrupted when the page is backgrounded and uninterrupted when it returns to the foreground.
HTMLMediaElement::shouldOverrideBackgroundPlaybackRestriction has special behavior to override the background interruption in some cases, like when playing an audio track.

We could apply the same rule to AudioContext, given that there is no way the realtime data could be played back later when resuming.

We have somewhat inconsistent behavior: a track created from an AudioContext will allow a media element to keep playing, but since the AudioContext is paused, it will play silence.

I would tend to allow AudioContext to override interruptions when the page is capturing, since the AudioContext could be used to improve the audio data sent to other participants.
Comment 7 youenn fablet 2021-10-18 06:42:56 PDT
Created attachment 441596 [details]
Patch
Comment 8 youenn fablet 2021-10-20 01:07:53 PDT
Created attachment 441856 [details]
Patch
Comment 9 Jer Noble 2021-10-20 11:30:22 PDT
I think this approach is fine, but I'd tailor it a bit.

I'd put the background playback exemption like this:

If the AudioContext has a MediaStreamAudioDestinationNode, and the destinationNode is connected to a mediaElement, and the AudioContext's default destination node has no connected nodes, then the AudioContext should be allowed to play in the background.

Ideally, sites would be able to create a "processing only AudioContext" that would be completely exempt from these background restrictions. We should try to float that idea with the Audio WG.
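The exemption described above can be modeled as a simple predicate over the audio graph. Below is a plain-JS sketch over a toy graph representation (the `nodes`, `feedsMediaElement`, and `defaultDestinationInputs` names are illustrative, not WebKit internals):

```javascript
// Toy model of the proposed rule: allow background playback iff the context
// has a MediaStreamAudioDestinationNode that feeds a media element, AND
// nothing is wired into the context's default destination node.
function shouldAllowBackgroundPlayback(ctx) {
  const hasStreamDestFeedingElement = ctx.nodes.some(
    (n) => n.type === 'MediaStreamAudioDestinationNode' && n.feedsMediaElement
  );
  const defaultDestinationIsSilent = ctx.defaultDestinationInputs === 0;
  return hasStreamDestFeedingElement && defaultDestinationIsSilent;
}

// The reporter's graph: source -> MediaStreamAudioDestinationNode -> <audio>,
// with nothing connected to context.destination.
const reporterGraph = {
  nodes: [{ type: 'MediaStreamAudioDestinationNode', feedsMediaElement: true }],
  defaultDestinationInputs: 0,
};
console.log(shouldAllowBackgroundPlayback(reporterGraph)); // true: exempt

// A context that also renders audibly through the default destination
// stays subject to the normal background interruption.
const audibleGraph = {
  nodes: [{ type: 'MediaStreamAudioDestinationNode', feedsMediaElement: true }],
  defaultDestinationInputs: 1,
};
console.log(shouldAllowBackgroundPlayback(audibleGraph)); // false
```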
Comment 10 Jer Noble 2021-10-20 11:33:43 PDT
(In reply to Jer Noble from comment #9)
> I'd put the background playback exemption like this:
> 
> If the AudioContext has a MediaStreamAudioDestinationNode, and the
> destinationNode is connected to a mediaElement, and the AudioContext's
> default destination node has no connected nodes, then the AudioContext
> should be allowed to play in the background.

Actually, thinking more about this, it should be enough to allow an exemption when a context has a MediaStreamAudioDestinationNode and no nodes connected to its default destination node.
Comment 11 youenn fablet 2021-10-21 01:14:42 PDT
(In reply to Jer Noble from comment #9)
> I think this approach is fine, but I'd tailor it a bit.
> 
> I'd put the background playback exemption like this:
> 
> If the AudioContext has a MediaStreamAudioDestinationNode, and the
> destinationNode is connected to a mediaElement, and the AudioContext's
> default destination node has no connected nodes, then the AudioContext
> should be allowed to play in the background.
> 
> Ideally, sites would be able to create a "processing only AudioContext" that
> would be completely exempt from these background restrictions. We should try
> to float that idea with the Audio WG.

I filed https://github.com/WebAudio/web-audio-api/issues/1551 some time ago.
I guess we can try to resurrect it if web audio is still active.

> > If the AudioContext has a MediaStreamAudioDestinationNode, and the
> > destinationNode is connected to a mediaElement, and the AudioContext's
> > default destination node has no connected nodes, then the AudioContext
> > should be allowed to play in the background.
> 
> Actually thinking more about this, it should be enough to allow an exemption
> when a context has a MediaStreamAudioDestinationNode and no nodes connected
> to its default destination node.

I am not sure what this heuristic is trying to achieve; can you clarify it?
Is it legacy content that might continue to play while it should not?
Given this is restricted to media capture being on, I am not sure this is a real issue.

Web Audio can be used for speech recognition in VC to send text transcripts to remote participants. In that case, there is no MediaStreamAudioDestinationNode and no destination. It would be sad to continue sending audio but not text when backgrounded.

Web Audio can also be used to apply spatial effects to multiple RTCPeerConnection tracks. It makes sense to continue audio rendering in that case, just like what happens for regular MediaStreamTrack rendering through media elements.
In that case, the destination is connected but a MediaStreamAudioDestinationNode is not needed: going through a MediaStreamAudioDestinationNode adds quite a bit of extra work for the UA plus additional buffering/latency.
Comment 12 Jer Noble 2021-10-21 12:03:05 PDT
(In reply to youenn fablet from comment #11)
> (In reply to Jer Noble from comment #9)
> > I think this approach is fine, but I'd tailor it a bit.
> > 
> > I'd put the background playback exemption like this:
> > 
> > If the AudioContext has a MediaStreamAudioDestinationNode, and the
> > destinationNode is connected to a mediaElement, and the AudioContext's
> > default destination node has no connected nodes, then the AudioContext
> > should be allowed to play in the background.
> > 
> > Ideally, sites would be able to create a "processing only AudioContext" that
> > would be completely exempt from these background restrictions. We should try
> > to float that idea with the Audio WG.
> 
> I filed https://github.com/WebAudio/web-audio-api/issues/1551 some time ago.
> I guess we can try to resurrect it if web audio is still active.

Yeah, I don't understand Paul's comment w.r.t. the Autoplay API; that API is simply about detection. I would suggest we re-open or file a new bug.

> > > If the AudioContext has a MediaStreamAudioDestinationNode, and the
> > > destinationNode is connected to a mediaElement, and the AudioContext's
> > > default destination node has no connected nodes, then the AudioContext
> > > should be allowed to play in the background.
> > 
> > Actually thinking more about this, it should be enough to allow an exemption
> > when a context has a MediaStreamAudioDestinationNode and no nodes connected
> > to its default destination node.
> 
> I am not sure what this heuristic is trying to achieve; can you clarify it?
> Is it legacy content that might continue to play while it should not?
> Given this is restricted to media capture being on, I am not sure this is a
> real issue.

An AudioContext which is not outputting to the default destinationNode isn't generating audible sound, and thus shouldn't (hypothetically speaking) be affected by our autoplay rules. (This is tricky of course, since the node graph can be changed at any time by script, but in effect this is no different from allowing silent <video> elements to play and "interrupting" them once they are unmuted/have audio.)

> Web Audio can be used for speech recognition in VC to send text transcripts
> to remote participants. In that case, there is no
> MediaStreamAudioDestinationNode, no destination. It would be sad to continue
> sending audio but not text when backgrounded.

Okay, then maybe the heuristic should be whether _any_ nodes are connected to the destinationNode. That would cover this case and the originator's.

> Web Audio can also be used to do spatial effects on multiple
> RTCPeerConnection tracks. It makes sense to continue audio rendering in that
> case, just like what happens for regular MediaStreamTrack rendering through
> media elements.
> In that case, destination is connected but MediaStreamAudioDestinationNode
> is not needed: going through a MediaStreamAudioDestinationNode adds quite a
> bit of extra work by UA plus additional buffering/latency.

IMO, this is scope creep. Poking holes in our autoplay rules whenever there is capture happening is not good design and it's not sustainable. If we want WebAudio to behave like an <audio> element, that needs to be a deliberate choice and preferably be backed by API.
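The relaxed heuristic floated earlier in this comment (exempt a context when no nodes are connected to its default destination node, since such a context cannot itself produce audible output) reduces to a single check. A toy sketch, with a hypothetical graph shape:

```javascript
// Relaxed exemption: a context with nothing feeding its default destination
// node is processing-only, so it may keep running in the background. This
// covers both the speech-recognition case (no destination connections at all)
// and the original reporter's MediaStreamAudioDestinationNode graph.
function isProcessingOnlyContext(ctx) {
  return ctx.defaultDestinationInputs === 0;
}

// Speech-recognition case: analysis nodes only, nothing wired to destination.
console.log(isProcessingOnlyContext({ defaultDestinationInputs: 0 })); // true
// Spatialized playback through the default destination: not exempt, and
// stays subject to the usual background-interruption rules.
console.log(isProcessingOnlyContext({ defaultDestinationInputs: 2 })); // false
```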
Comment 13 youenn fablet 2021-10-21 12:58:28 PDT
> IMO, this is scope creep. Poking holes in our autoplay rules whenever there
> is capture happening is not good design and it's not sustainable. If we want
> WebAudio to behave like an <audio> element, that needs to be a deliberate
> choice and preferably be backed by API.

I am not sure what you mean by 'that needs to be a deliberate choice and preferably be backed by API'. This seems to indicate that the proposed change of behavior would break existing web sites, which I doubt.

Aligning the Web Audio and media element heuristics seems good to me; they are both audio rendering APIs (although they may do more).
I do not see sufficient differences between the two APIs that would warrant allowing one to continue generating audio in the background while disallowing the other (except for the break-existing-content argument).

Your idea to align the no-destination Web Audio behavior with the muted media element goes in the same direction: apply the same heuristics to both APIs. I'll take a stab at it.

One thing that I do see as different is the audio category (ambient vs. media playback), but in the case we are discussing, the audio category will be play-and-record regardless of which audio rendering API is used.

In the future, it might be good to let the web page set the audio category (we currently have the issue that audio rendering starts with the category set to media playback and later switches to play-and-record). If we do that, it seems to me Web Audio and the audio element might become even closer.
Comment 14 Jer Noble 2021-10-22 22:15:07 PDT
(In reply to youenn fablet from comment #13)
> > IMO, this is scope creep. Poking holes in our autoplay rules whenever there
> > is capture happening is not good design and it's not sustainable. If we want
> > WebAudio to behave like an <audio> element, that needs to be a deliberate
> > choice and preferably be backed by API.
> 
> I am not sure what you mean by 'that needs to be a deliberate choice and
> preferably be backed by API.'. This seems to indicate the proposed change of
> behavior would break existing web sites, which I doubt.

I'm not really worried about breaking existing sites; I'm worried about unexpected audio playback and violated user expectations.

> One thing that I see as different is the audio category (ambient vs. media
> playback) but in the case we are talking, the audio category will be play
> and record whatever the use of audio rendering API.

Currently that's true, only because we currently use a global AVAudioSession for all generated audio, which itself causes a ton of problems, including system audio being interrupted by silent <video> playback. If we switch to a model which uses multiple AVAudioSessions, then WebAudio won't get a "free ride" on the PlayAndRecord behavior, and will be interrupted upon backgrounding.

> Your idea to align no-destination WebAudio behavior with muted media element
> goes int the same direction: apply the same heuristics to both APIs, I'll
> take a stab at it.

When you're experimenting here, perhaps you could try changing the category of a WebAudio context that has a MediaElementAudioSourceNode; perhaps such an AudioContext should be treated exactly the same as the media element that's providing the audio.

> In the future, it might be good to be able to set the audio category by the
> web page (we are having the issue that audio rendering starts with category
> set to media playback and switches later on to play and record). If we do
> that, it seems to me Web Audio and audio element might become even closer.

Yes, this was the idea behind the Audio Focus API; that the page could request a certain "mode" of audio playback. If we implemented such a web API, it might make sense to override and unify all the audio generating APIs into a single mode. And that's what I meant by "backed by API". Pages generally know what they're doing better than we do, and an explicit signal (e.g., "everything on the page is doing video conferencing") would let us make good decisions without risking unexpected behavior.
Comment 15 Alex Chronopoulos 2021-11-11 02:25:49 PST
Writing a quick comment to mention that the same problem appears when we switch tabs. As a side effect, when we return to the first tab the audio is distorted. The following fiddle, which includes reproduction steps, exposes that behavior.

https://jsfiddle.net/achronop/f8ujeom4/
Comment 16 changchc 2021-11-12 07:56:21 PST
Hi Youenn, 

I was wondering when the patch will be released.

Thanks,
Comment 17 mattwindwer 2021-11-14 10:54:17 PST
We noticed that starting in iOS 15, with Web Audio in Safari, audio stops playing after the device is locked, or in some cases when Safari is backgrounded and resumed or an incoming call/notification arrives. Sometimes manually resuming the AudioContext fixes it, but not always. Is this the bug that is filed for that issue?
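For context, the "manually resuming the audio context" workaround mentioned here is usually wired to visibility changes. A hedged sketch (the pure decision helper is illustrative, and per this report the resume does not always recover audio):

```javascript
// Decide whether a resume attempt is warranted: only when the page has come
// back to the foreground and the context is not already running.
function shouldAttemptResume(contextState, documentVisible) {
  return documentVisible && contextState !== 'running';
}

// Browser wiring (only runnable inside a page with an AudioContext):
// document.addEventListener('visibilitychange', () => {
//   if (shouldAttemptResume(audioContext.state, !document.hidden)) {
//     // resume() can reject while an interruption is still in effect
//     audioContext.resume().catch(() => {});
//   }
// });

console.log(shouldAttemptResume('suspended', true));  // true
console.log(shouldAttemptResume('running', true));    // false
console.log(shouldAttemptResume('suspended', false)); // false
```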
Comment 18 Ben 2021-11-29 14:06:17 PST
I am seeing the same issue on iOS 15.1.1 as Matt Windwer (Safari and Chrome).

The WebAudio destination is either killed or muted when the screen is locked or the page goes to the background.

I attempted to output WebAudio to an HTML audio element using a MediaStreamAudioDestinationNode, in the hope that using the audio element as the destination would resolve the bug. Unfortunately, this bug seems to kill off the WebAudio part of the pipeline regardless of the fact that the audio is ultimately served through an audio element.

Interestingly, as soon as the screen is woken and unlocked, audio resumes immediately, which suggests that this bug isn't killing off the entire process.

Regarding your discussions above, please also consider audio-only and receive-only applications through WebRTC. I am effectively using WebRTC for a live radio/communications application where WebAudio features (volume/pan) and background audio are essential and intended.

Thanks.
Comment 19 Alex Chronopoulos 2021-12-10 05:52:04 PST
Created attachment 446715 [details]
Check if the destination is not connected

This is a patch on top of the existing one. It addresses the case where the AudioContext keeps running in the background only when the destination node is not used. I think this is the most open approach in terms of giving many use cases the opportunity to take advantage of it. However, I know this is one of the cases you are discussing in this bug. I was hoping that we could restart that conversation and the effort to address this issue. It is blocking a very important use case, and we receive many complaints about it. By the way, if you have decided on an approach, I would be happy to work in that direction.
Comment 20 Alex Chronopoulos 2021-12-17 07:35:53 PST
Created attachment 447455 [details]
Combined diff to run the tests

The patch I added above is incremental on top of youennf's. I combined both into this diff to run the tests.