I have a client who has a very specific request for the app that requires two AVPlayers to be in sync. One video is for some content and the other one is for a presenter speaking about the content. Using a AVMutableComposition to combine them into one video is not an option because the presenter video has to be able to respond to user generated events (e.g. they want to have a feature to show/hide the presenter) and I don't believe there is a way to have that kind of control over a specific AVMutableCompositionTrack.
So, I'm left with figuring out how to ensure that two AVPlayers stay in sync and I was wondering if anyone has had experience with this or suggestions for other tools to accomplish this.
Thanks
The following methods are the ones to use
- (void)setRate:(float)rate
time:(CMTime)itemTime
atHostTime:(CMTime)hostClockTime;
- (void)prerollAtRate:(float)rate
completionHandler:(void (^)(BOOL finished))completionHandler;
Caveats
Important This method is not currently supported for HTTP Live
Streaming or when automaticallyWaitsToMinimizeStalling is YES. For
clients linked against iOS 10.0 and later or macOS 10.12 and later,
invoking this method when automaticallyWaitsToMinimizeStalling is YES
will raise an NSInvalidArgument exception.
This is an expected behavior since "live" is "present" and cannot be seek forward and setting the rate to less than 1.0 it will cause to extra buffering the stream (second point is a guess).
Documentation
https://developer.apple.com/documentation/avfoundation/avplayer/1386591-setrate?language=objc
https://developer.apple.com/documentation/avfoundation/avplayer/1389712-prerollatrate?language=objc
As a side note consider that HLS streams are not truly live streams, the "present moment" could vary several seconds among clientes consuming the stream, the opposite of WebRTC for example where the delay between publishers and consumers is kinda of warranted for 1 second max.
Related
I have an application in which there is a set of about 50 sounds, which range in length from about 300 ms to about 4 seconds. Various combinations of sounds need to be played at precise times (up to 10 of them can be triggered at once). Some sounds need to be repeated at intervals as short as 100 ms.
I've implemented this is as a two dimensional array of AVAudioPlayers, all of which are loaded with sounds at application launch. There are several players for each sound, to accommodate rapidly repeating sounds. The players for a particular sound are reused in strict rotation. When a new sound is scheduled, the oldest player for that sound is stopped and its current time is set to 0, so the sound will repeat from the start, the next time it's scheduled using player.play(atTime:). There's a thread that schedules new sets of sounds about 300 ms before they are to be played.
It all works quite nicely, up to a point that varies with the device. Eventually, as sounds are played more rapidly, and/or more simultaneous sounds are scheduled, some sounds will refuse to play.
I'm contemplating switching to AVAudioEngine and AVAudioPlayerNodes, using a mixer node. Does anyone know if that approach is likely to handle more simultaneous sounds? My guess is that both approaches translate into a rather similar set of CoreAudio functions, but I haven't actually written the code to test that hypothesis - before I do that, I'm hoping that someone else may have explored this issue before me. I've been deep into CoreAudio before, and I'm hoping to be able to use these handy high-level functions instead!
Also, does anyone know of a way to trigger a closure when a sound initiates? The documented functionality allows for a callback closure, but the only way I've been able to trigger events when the sounds start, is to create a high quality of service queue for DispatchQueue. Unfortunately, depending on the system load, queued events may be executed at times that vary from the scheduled times by up to about 50 ms, which is not quite as precise as I'd prefer to be.
Using AVAudioEngine with AVAudioPlayersNodes provides much better performance, albeit at the cost of a bit of code complexity. I was able to easily increase the playback rate by a factor of five, with better buffer control.
The main drawback in switching to this approach was that Apple's documentation is less than stellar. A few additions to Apple's documentation would have made this task a LOT easier:
Mixer nodes are documented as being able to convert sample rates and channel counts, so I attempted to configure audioEngine.mainMixerNode to convert mono buffers to the output node's settings. Setting the main mixer node's output to the output node's format appeared to be accepted, but threw opaque errors at run time that complained about channel count mismatches.
It appears that the main mixer node is not actually a fully functional mixer node. To get this to work, I had to insert another mixer node that performed the channel conversion, and connect it to the main mixer node. If Apple's documentation had actually mentioned this, it would have saved me a lot of experimentation.
Also, just scheduling a buffer does not cause anything to play. You need to call play() on the player node before anything will happen. Apple's documentation is confusing here - it says that calling play() with no arguments will cause playback to occur immediately, which wasn't what I wanted. It took some experimentation to determine that play() just tells the player node to wake up, and that scheduled buffers will actually be played at the scheduled time, rather than immediately.
It would have been enormously helpful if Apple had provided more than the auto-generated class documentation. A bit of human-generated documentation would have saved me an awful lot of frustrating experimentation.
Chris Adamson's well-written "Learning Core Audio" was very helpful when I was working with Core Audio - it's a shame that the newer AVAudioEngine functionality isn't documented nearly as well.
I am trying to implement adaptive bit rate with AVPlayer but i don't know how to switch between a low/high stream. I am a bit confused and have few questions:
Is it the sole responsibility of the server to implement HLS on its side OR the client also has to do something about it OR the client handles it automatically?
I am getting the following URLs from server, can someone tell me how to switch between the them based on network speed and what other steps are involved?
{
"VideoStreamUrl": "http://50.7.149.74:1935/pitvlive/aplus3.stream/playlist.m3u8?",
"VideoStreamUrlLow": "http://50.7.149.74:1935/pitvlive/aplus3_240p.stream/playlist.m3u8?",
"VideoStreamUrlHD": null
}
AVPlayer supports HLS natively from the framework so you shouldnt need to do anything to support this.
The framework will automatically switch between low and high streams according to the current available bandwidth, so you dont actually need to pick a stream.
I'm able to mute/unmute midi audio tracks with ease using MusicPlayer's MusicTrackSetProperty(t, kSequenceTrackProperty_MuteStatus...) method. But, I haven't wrapped my wits around how to enable/disable specific midi channels within the track. Is there a mute/unmute or disable/enable property for channels within a track?
Would something like this done on the track level, or should I be manipulating the midi synth audio unit in some fashion?
Creating an endpoint does me no good, because I only get a copy of events sent to the synth, not a callback that I can see for filtering what's going to the synth. So, I'm thinking there's probably something that can be tweaked in the audio unit graph, but what exactly?
Someone might suggest opening the midi file with the kMusicSequenceLoadSMF_ChannelsToTracks flag and then simply unmute the track corresponding to the channel and mute the rest. I tried doing that, but I actually get /less/ tracks when opening the midi file that way then without the kMusicSequenceLoadSMF_ChannelsToTracks flag. Odd. Maybe I should understand why that's the case, huh? Here's what I have for a midi file: 16 tracks each containing 6 channels of midi. Without the kMusicSequenceLoadSMF_ChannelsToTracks, I get 16 tracks, with the kMusicSequenceLoadSMF_ChannelsToTracks flag, 12. Shouldn't it be 16*6 tracks?
Thank your for your help. Best to you. /Jay
You're on the right track. To my knowledge, kMusicSequenceLoadSMF_ChannelsToTracks will coalesce common channels. So if given two tracks containing notes from three channels each, let's say track1 has notes on channels 1,2,and 3. And track2 has notes on channels 3,4,and 5. Then using the kMusicSequenceLoadSMF_ChannelsToTracks flag will coalesce the notes using channel 3 from track1 and track2 to a new track. The total number of tracks would be 5 using that method. That's probably the way to go unless you can prove otherwise. Otherwise, if you really need to pick things apart the endpoint is a valid approach. You just need to send the midi events manually instead of making a connection (pointing a track to a synth). In your callback you are supposed to parse the midi and call MusicDeviceMIDIEvent to trigger the synth directly. You could do your filtering there.
I'm working with a third party API that behaves as follows:
I have to connect to its URL and make my request, which involves POSTing request data;
the remote server then sends back, "chunk" at a time, the corresponding WAV data (which I receive in my NSURLConnectionDataDelegate's didReceiveData callback).
By "chunk" for argument's sake, we mean some arbitrary "next portion" of the data, with no guarantee that it corresponds to any meaningful division of the audio (e.g. it may not be aligned to a specific multiple of audio frames, the number of bytes in each chunk is just some arbitrary number that can be different for each chunk, etc).
Now-- correct me if I'm wrong, I can't simply use an AVAudioPlayer because I need to POST to my URL, so I need to pull back the data "manually" via an NSURLConnection.
So... given the above, what is then the most painless way for me to play back that audio as it comes down the wire? (I appreciate that I could concatenate all the arrays of bytes and then pass the whole thing to an AVAudioPlayer at the end-- only that this will delay the start of playback as I have to wait for all the data.)
I will give a bird's eye view to the solution. I think that this will help you a great deal in the direction to find a concrete, coded solution.
iOS provides a zoo of audio APIs and several of them can be used to play audio. Which one of them you choose depends on your particular requirements. As you wrote already, the AVAudioPlayer class is not suitable for your case, because with this one, you need to know all the audio data in the moment you start playing audio. Obviously, this is not the case for streaming, so we have to look for an alternative.
A good tradeoff between ease of use and versatility are the Audio Queue Services, which I recommend for you. Another alternative would be Audio Units, but they are a low level C API and therefor less intuitive to use and they have many pitfalls. So stick to Audio Queues.
Audio Queues allow you to define callback functions which are called from the API when it needs more audio data for playback - similarly to the callback of your network code, which gets called when there is data available.
Now the difficulty is how to connect two callbacks, one which supplies data and one which requests data. For this, you have to use a buffer. More specifically, a queue (don't confuse this queue with the Audio Queue stuff. Audio Queue Services is the name of an API. On the other hand, the queue I'm talking about next is a container object). For clarity, I will call this one buffer-queue.
To fill data into the buffer-queue you will use the network callback function, which supplies data to you from the network. And data will be taken out of the buffer-queue by the audio callback function, which is called by the Audio Queue Services when it needs more data.
You have to find a buffer-queue implementation which supports concurrent access (aka it is thread safe), because it will be accessed from two different threads, the audio thread and the network thread.
Alternatively to finding an already thread safe buffer-queue implementation, you can take care of the thread safety on your own, e.g. by executing all code dealing with the buffer-queue on a certain dispatch queue (3rd kind of queue here; yes, Apple and IT love them).
Now, what happens if either
The audio callback is called and your buffer-queue is empty, or
The network callback is called and your buffer-queue is already full?
In both cases, the respective callback function can't proceed normally. The audio callback function can't supply audio data if there is none available and the network callback function can't store incoming data if the buffer-queue is full.
In these cases, I would first try out blocking further execution until more data is available or respectively space is available to store data. On the network side, this will most likely work. On the audio side, this might cause problems. If it causes problems on the audio side, you have an easy solution: if you have no data, simply supply silence as data. That means that you need to supply zero-frames to the Audio Queue Services, which it will play as silence to fill the gap until more data is available from the network.
This is the concept that all streaming players use when suddenly the audio stops and it tells you "buffering" next to some kind of spinning icon indicating that you have to wait and nobody knows for how long.
I would like to write a web application that allows me to sync audio playback of an MP3 down to ~50ms, or close enough that the human ear can't detect the difference.
The idea would be that two or more smartphones could each be paired to a bluetooth speaker, and two or more speakers would play the same audio at the exact same time.
How would you suggest I go about setting this up, both client-side and server-side? I'm planning to use Rails/Ruby for backend, and iOS/obj c for mobile dev.
I had though of the idea of syncing to a global/atomic clock on the server, and having the server provide instructions to clients on when to start playing/jump in to an already playing track. My concern is that, if I want to stream the audio, that it will be impossible to load a song into memory and start playback accurately on the millisecond level.
Thoughts?
The jitter in internet packet delivery will be too large, so forget about syncing over the internet. However you could check the accuracy of NTP which is still used (I guess, I know that older UNIX's used it) by the OS when you switch on automatic date/time in Settings, but my guess is that it won't be good enough either. But perhaps the OS may also use other time sources like GPS; I'm don't know how iOS does it but accuracy within 20ms is not to be expected. You could create experimental app to check it out.
So, what's left is a sync closer to home, meaning between the devices directly. Of course you need to make sure that all devices haves loaded (enough of) the song, and have preloaded it in AVAudioPlayer or whatever you're using, to be able to start playing immediately. (It may actually not be the best idea to use higher level 'AVAudioPlayer` API's as it may give higher delays, and what more important higher jitter, than lower level API's.)
Here are three ideas (one device needs to be master triggering the start play, the others are slaves that are waiting for the trigger):
Use an audio trigger pulse, like a high tone of a defined length and frequency. Then use FFT to recognise this tone.
Connect the devices via GameKit Bluetooth and transmit the trigger on these connections.
Use the iPhone 4+ flash as trigger: flash in a certain pattern. This would require you to sample the video data which is quite doable and can be very fast.
I'm going with a solution that uses an atomic clock for synchronization, and an external service that allows server instructions/messages to be sent to all devices in close sync.