iOS: Playing RTP packets using audio unit

I am developing a VoIP app and need to play data from RTP packets, which the server sends every 20 ms.
I have a buffer that accumulates samples from the RTP packets, and the audio unit's render callback reads data from this buffer.
The problem is that I cannot synchronise the audio unit with the RTP stream. The preferred IO buffer duration cannot be set to exactly 20 ms, and the number of frames requested by the render callback cannot be set to match the number of samples in a packet.
As a result, there are two possible situations (depending on sample rate and IO buffer duration):
a) the audio unit reads from my buffer faster than it is filled from RTP packets; in this case the buffer periodically does not contain the requested number of samples and I get distorted sound;
b) the buffer is filled faster than the audio unit reads from it; in this case the buffer periodically overflows and samples from new RTP packets are lost.
What should I do to avoid this issue?

If you have control over the packet rate, this is typically done via a "leaky bucket" algorithm. A circular FIFO/buffer can hold the "bucket" of incoming data, and a certain amount of padding needs to be kept in the FIFO/buffer to cover variations in network rate and latency. If the bucket gets too full, you ask the packet sender to slow down, etc.
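A minimal sketch of such a FIFO in plain C (not tied to any particular RTP or Audio Unit API; the 16-bit mono format, 160-sample packet size, RING_CAPACITY, and TARGET_FILL padding are assumptions for illustration, and real code would use atomics or a lock rather than volatile):

```c
/* Single-producer/single-consumer ring buffer ("bucket") sketch.
 * Assumptions: 16-bit mono samples, 8 kHz, 20 ms packets (160 samples);
 * the network thread writes, the render callback reads. */
#include <stdint.h>
#include <string.h>

#define RING_CAPACITY   4096          /* samples; power of two simplifies wrap */
#define TARGET_FILL     (160 * 3)     /* keep ~3 packets of padding against jitter */

typedef struct {
    int16_t data[RING_CAPACITY];
    volatile uint32_t write_pos;      /* total samples ever written */
    volatile uint32_t read_pos;       /* total samples ever read    */
} ring_t;

static uint32_t ring_fill(const ring_t *r) {
    return r->write_pos - r->read_pos;
}

/* Network thread: called once per RTP packet. Returns samples dropped. */
static uint32_t ring_write(ring_t *r, const int16_t *samples, uint32_t n) {
    uint32_t space = RING_CAPACITY - ring_fill(r);
    uint32_t to_copy = (n < space) ? n : space;   /* overflow: drop the excess */
    for (uint32_t i = 0; i < to_copy; i++)
        r->data[(r->write_pos + i) % RING_CAPACITY] = samples[i];
    r->write_pos += to_copy;
    return n - to_copy;
}

/* Render callback: always delivers n samples, padding with silence on
 * underrun so playback timing is preserved. Returns samples padded. */
static uint32_t ring_read(ring_t *r, int16_t *out, uint32_t n) {
    uint32_t avail = ring_fill(r);
    uint32_t to_copy = (n < avail) ? n : avail;
    for (uint32_t i = 0; i < to_copy; i++)
        out[i] = r->data[(r->read_pos + i) % RING_CAPACITY];
    r->read_pos += to_copy;
    memset(out + to_copy, 0, (n - to_copy) * sizeof(int16_t));
    return n - to_copy;
}
```

The idea is to hold playback until the fill level reaches something like TARGET_FILL, and to treat a fill level that stays persistently above or below that mark as the signal to ask the sender to slow down or speed up, or to stretch/shrink locally as the next paragraph describes.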
On the audio playback end, various audio concealment methods (PSOLA time-pitch modification, etc.) can be used to slightly stretch or shrink the data to fit, if adequate buffer fill thresholds are exceeded.

If you are receiving audio
Try having the client automatically request, periodically (e.g. every second), that the server send audio at a certain bitrate, chosen based on the buffer size and connection speed.
For example, request chunks of around 300 kbit if there are, say, 20 chunks in the buffer and a 15,000 kbit/s connection, and increase/decrease the requested bitrate dynamically as necessary.
If you are sending audio
Do the same, but in reverse. Have the server request periodically that the client changes the audio bitrate.
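A rough sketch of the kind of adjustment logic this describes (the water marks, step sizes, and headroom factor are all hypothetical; the actual request would go over whatever control channel your protocol provides):

```c
/* Hypothetical periodic rate-control check, run e.g. once per second.
 * buffered_chunks: chunks waiting to be played on the receiving side.
 * link_kbps:       measured connection throughput.
 * Returns the bitrate (kbit/s) to request from the sender. */
static int choose_bitrate_kbps(int buffered_chunks, int link_kbps, int current_kbps)
{
    const int low_water  = 5;    /* too little margin: underrun risk    */
    const int high_water = 40;   /* too much margin: latency creeps up  */

    int requested = current_kbps;
    if (buffered_chunks < low_water)
        requested = current_kbps * 3 / 4;         /* back off 25% */
    else if (buffered_chunks > high_water)
        requested = current_kbps * 5 / 4;         /* ramp up 25%  */

    /* Never ask for more than the link can realistically carry. */
    int ceiling = link_kbps * 8 / 10;             /* leave 20% headroom */
    if (requested > ceiling)
        requested = ceiling;
    return requested;
}
```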

Related

Reading frames faster than rtsp stream fps

I have an algorithm that reads in frames from a live RTSP stream (from a camera connected to my computer). I can read and process frames at a faster rate than the incoming stream (20 fps vs 6 fps).
I run cv2.VideoCapture on the stream and then, in a while loop, read frames using stream.read(). I just wanted to check: if I read frames faster than the stream can deliver them, does that mean stream.read() will sometimes return false? Or will it "wait" for the next frame to come in?

Fast-fourier transform for low-frequency signals in the WebAudio API / javascript

I have a single-channel wave coming in at an 8000 Hz sampling rate.
I need to analyze frequencies that are between 5 Hz and 300 Hz in real-time, with emphasis on signals from 10 to 60 Hz.
My initial thought is to feed the 8000 Hz samples into a buffer, collecting about 32000 of them, and then run a Fourier transform with a 32000-sample window on it.
The reasoning here is that for lower-frequency signals, you need a larger window size (right?)
However, if I'm trying to display this signal in real time, it seems like the AnalyserNode might not be a good choice here. I know the WebAudio API would let me get the raw data, but ideally the AnalyserNode would be able to run a new FFT based on the previous 32000 samples, even if only a small number of samples has become available since the last update. At this point, it seems like the FFT data only updates once every four seconds.
Do I have to create a special "running bin" so that the display updates more frequently than once every 4 seconds? Or, what's the smallest window size I can use to still get reasonable values in this range? Is 32000 a large enough window size?
I am using the WebAudio API analyser node in javascript, but if I have to get the raw data, I'm also willing to change libraries to another one in javascript.
Using an AnalyserNode, you can call getFloatFrequencyData as often as you like. This will return the FFT of the last fftSize samples, smoothed together over time. For full details, see the AnalyserNode Interface section of the WebAudio spec.
Also, the WebAudio spec allows you to construct an AudioContext with a user-selectable sample rate. You could set your sample rate to 8000 Hz. Then your FFTs can have finer resolution with less complexity.
However, I don't think any browser has implemented this capability yet.
An alternative would be to get a supported audio card that allows a sample rate of 8000 Hz and set up your system to use it as the default audio output device; then the audio context will have a sample rate of 8000 Hz.
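As for the window-size question, it comes down to simple arithmetic: an N-point FFT at sample rate fs gives bins spaced fs/N apart (and AnalyserNode fftSize values are powers of two, so 32768 is the closest size to the 32000 samples mentioned above). A quick check, as a plain C calculation with no audio APIs involved:

```c
#include <stdio.h>

int main(void)
{
    /* Bin spacing of an N-point FFT is sampleRate / N (in Hz). */
    double rates[] = { 44100.0, 8000.0 };
    int    sizes[] = { 2048, 8192, 32768 };

    for (int r = 0; r < 2; r++)
        for (int s = 0; s < 3; s++)
            printf("fs = %5.0f Hz, N = %5d  ->  %.3f Hz per bin\n",
                   rates[r], sizes[s], rates[r] / sizes[s]);
    return 0;
}
```

At 8000 Hz even a 2048-point FFT gives roughly 3.9 Hz per bin, which already separates 10 Hz from 60 Hz; 32768 points gives about 0.24 Hz per bin but needs roughly 4 seconds of signal per window, matching the once-every-four-seconds behaviour described above. At the default 44100 Hz context, the same 32768-point window only reaches about 1.35 Hz per bin.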

ALSA: snd_pcm_readi() and real-time threads

I've got a dedicated thread that captures audio from ALSA through snd_pcm_readi(). Periodically I get a short read, meaning snd_pcm_readi() returns a positive integer lower than my buffer size, and there's an audible 'pop' in my audio stream. Setting the thread priority to real-time gives a tangible benefit, with far fewer short reads, but it doesn't solve the problem.
Now the question: before going down the bumpy road of a real-time patched Linux kernel, is there something else I can do to squeeze out some more performance? Is calling snd_pcm_readi() in a dedicated thread the best way to pull audio out of ALSA?
For playback, the buffer size determines the latency.
For capture, it does not; only the period size determines how long you must wait until recorded samples are reported to be available.
So to prevent overruns, make the buffer as large as possible (e.g., by calling snd_pcm_hw_params_set_buffer_size_max() after setting the other parameters).
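A sketch of that capture setup (error handling trimmed; the S16_LE format, channel count, and 256-frame period are placeholder choices to adjust for your hardware):

```c
#include <alsa/asoundlib.h>

/* Open a capture PCM with a small period and the largest buffer the
 * hardware allows, so an overrun needs a long stall to actually happen. */
static snd_pcm_t *open_capture(const char *device, unsigned int rate)
{
    snd_pcm_t *pcm = NULL;
    snd_pcm_hw_params_t *hw;
    snd_pcm_uframes_t period = 256;      /* determines capture latency     */
    snd_pcm_uframes_t buffer_max = 0;    /* filled in by ALSA below        */
    int dir = 0;

    if (snd_pcm_open(&pcm, device, SND_PCM_STREAM_CAPTURE, 0) < 0)
        return NULL;

    snd_pcm_hw_params_alloca(&hw);
    snd_pcm_hw_params_any(pcm, hw);
    snd_pcm_hw_params_set_access(pcm, hw, SND_PCM_ACCESS_RW_INTERLEAVED);
    snd_pcm_hw_params_set_format(pcm, hw, SND_PCM_FORMAT_S16_LE);
    snd_pcm_hw_params_set_channels(pcm, hw, 2);
    snd_pcm_hw_params_set_rate_near(pcm, hw, &rate, &dir);
    snd_pcm_hw_params_set_period_size_near(pcm, hw, &period, &dir);

    /* The key call from the answer above: grab the biggest buffer we can,
     * after the other parameters have been set. */
    snd_pcm_hw_params_set_buffer_size_max(pcm, hw, &buffer_max);

    if (snd_pcm_hw_params(pcm, hw) < 0) {
        snd_pcm_close(pcm);
        return NULL;
    }
    return pcm;
}
```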

MP3 radio Stream buffer underrun detection

Any pointers on detecting, via a script on Linux, that an MP3 radio stream is breaking up? I am having issues with my radio station: when the internet connection slows down, the stream on the client side stops, buffers, and then plays again.
There are a few ways to do this.
Method 1: Assume constant bitrate
If you know that you will have a constant bitrate, you can measure that bitrate over time on the server and determine when it slows below a threshold. Note that this isn't the most accurate method, and won't always work. Not all streams use a constant bitrate. But, this method is as easy as counting bytes received over the wire.
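For this method, the check can be as simple as piping the stream through a small counter and flagging any second in which fewer bytes arrive than the nominal bitrate implies. A sketch, assuming a 128 kbit/s stream fed in on stdin (e.g. via curl):

```c
/* Read the stream on stdin, count bytes per second, and warn when the
 * measured rate drops below a threshold. Assumes a nominally constant
 * bitrate (128 kbit/s here); usage: curl -s <stream-url> | ./ratemon */
#include <stdio.h>
#include <time.h>

int main(void)
{
    const double threshold_bps = 128000.0 * 0.9;   /* 10% tolerance */
    char buf[4096];
    size_t n;
    long bytes_this_second = 0;
    time_t window = time(NULL);

    while ((n = fread(buf, 1, sizeof buf, stdin)) > 0) {
        bytes_this_second += (long)n;
        time_t now = time(NULL);
        if (now != window) {
            double bps = bytes_this_second * 8.0 / (double)(now - window);
            if (bps < threshold_bps)
                fprintf(stderr, "%ld: only %.0f bit/s received\n",
                        (long)now, bps);
            bytes_this_second = 0;
            window = now;
        }
    }
    return 0;
}
```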
Method 2: Playback on server
You can run a headless player on the server (via cvlc or similar) and track when it has buffer underruns. This will work at any bitrate and will give you a decent idea of what's happening on the clients. This sort of player setup also enables utility functions like silence detection. The downside is that it takes a little bit of CPU to decode, and a bit more effort to automate.
Method 3 (preferred): Log output buffer on source
Your source encoder will have a buffer on its output, data waiting to be sent to the server. When this buffer grows over a particular threshold, log it. This means that output over the network stalled for whatever reason. This method gets the appropriate data right from the source, and ensures you don't have to worry about clock synchronization issues that can occur over time in your monitoring of audio streams. (44.1 kHz to your encoder might be 44.101 kHz to a player.) This method might require modifying your source client.
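If the source client sends over a plain TCP socket you control, one Linux-specific way to observe that backlog is to ask the kernel how many bytes are still queued on the socket and log when the number crosses a threshold. A hedged sketch; whether this is practical depends on how your source client is built:

```c
/* Log when the unsent backlog on the source's TCP socket to the
 * streaming server exceeds a threshold (Linux-specific SIOCOUTQ).
 * sockfd is assumed to be the already-connected socket the source
 * client uses to send encoded audio to the server. */
#include <stdio.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/sockios.h>

static void check_backlog(int sockfd, int threshold_bytes)
{
    int pending = 0;
    if (ioctl(sockfd, SIOCOUTQ, &pending) == 0 && pending > threshold_bytes)
        fprintf(stderr, "%ld: %d bytes stalled in output buffer\n",
                (long)time(NULL), pending);
}
```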

Core Audio Large (Compressed) File Playback and Memory Footprint

So, I have set up a multichannel mixer and a Remote I/O unit to mix/play several buffers of PCM data that I read from audio files.
For short sound effects in my game, I load the whole file into a memory buffer using ExtAudioFileRead().
For my background music, let's say I have a 3-minute compressed audio file. Assuming it's encoded as MP3 at 128 kbps (44,100 Hz stereo), that gives around 1 MB per minute, or 3 MB total; uncompressed in memory, I believe it's around ten times that. I could use the exact same method as for small files; I believe ExtAudioFileRead() takes care of the decoding, using the (single) hardware decoder when available, but I'd rather not read the whole buffer at once, and instead 'stream' it at regular intervals from disk.
The first thing that comes to mind is going one step below to the (non-"extended") Audio File Services API and using AudioFileReadPackets(), like so:
1. Prepare two buffers A and B, each big enough to hold (say) 5 seconds of audio. During playback, start reading from one buffer and switch to the other one when reaching the end (i.e., they make up the two halves of a ring buffer).
2. Read the first 5 seconds of audio from the file into buffer A.
3. Read the next 5 seconds of audio from the file into buffer B.
4. Begin playback (from buffer A).
5. Once the play head enters buffer B, load the next 5 seconds of audio into buffer A.
6. Once the play head enters buffer A again, load the next 5 seconds of audio into buffer B.
7. Go to step 5.
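In rough C, the swap logic I have in mind would look something like this (untested sketch; fill_half() is a stand-in for the actual AudioFileReadPackets() call plus whatever format conversion is needed):

```c
/* Two halves of a ring; each holds ~5 seconds of audio.
 * The render callback advances play_frame; a loader (timer or
 * secondary thread) refills whichever half is not being played. */
enum { HALF_A = 0, HALF_B = 1 };

typedef struct {
    float  *half[2];            /* the two 5-second buffers           */
    long    frames_per_half;    /* e.g. 5 * sampleRate                */
    long    play_frame;         /* advanced by the render callback    */
    long    next_file_frame;    /* where the next disk read starts    */
    int     last_filled;        /* which half was refilled last       */
} stream_state;

/* Hypothetical: reads frames_per_half frames starting at next_file_frame
 * into half[which], e.g. via AudioFileReadPackets() and a converter. */
void fill_half(stream_state *s, int which);

static void loader_tick(stream_state *s)
{
    /* Which half is the play head currently in? */
    int playing = (int)((s->play_frame / s->frames_per_half) % 2);

    /* Steps 5-7 above: as soon as playback crosses into one half,
     * refill the other one with the next chunk from disk. */
    int idle = (playing == HALF_A) ? HALF_B : HALF_A;
    if (s->last_filled != idle) {
        fill_half(s, idle);
        s->next_file_frame += s->frames_per_half;
        s->last_filled = idle;
    }
}
```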
Is this the right approach, or is there a better way?
I'd suggest using the high-level AVAudioPlayer class to do simple background playback of an audio file. See:
https://developer.apple.com/library/ios/documentation/AVFoundation/Reference/AVAudioPlayerClassReference/Chapters/Reference.html#//apple_ref/doc/uid/TP40008067
If you require finer-grained control and lower latency, check out Apple's AUAudioFilePlayer; see AudioUnitProperties.h for a discussion. This is an Audio Unit that abstracts the complexities of streaming an audio file from disk. That said, it's still fairly complicated to set up and use, so definitely try AVAudioPlayer first.
