MP3 radio stream buffer underrun detection

Any pointers on how to detect, via a script on Linux, that an MP3 radio stream is breaking up? I'm having issues with my radio station: when the internet connection slows down, the stream on the client side stops, buffers, and then plays again.

There are a few ways to do this.
Method 1: Assume constant bitrate
If you know that you will have a constant bitrate, you can measure that bitrate over time on the server and determine when it slows below a threshold. Note that this isn't the most accurate method, and won't always work. Not all streams use a constant bitrate. But, this method is as easy as counting bytes received over the wire.
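A minimal sketch of that counting approach, assuming you can pipe the raw stream into it (the URL and the 128 kbps nominal bitrate below are placeholders):

    /* Bitrate watchdog: reads the raw stream from stdin and logs when the
     * measured byte rate drops below a threshold. Pipe the stream in, e.g.:
     *   curl -sN http://example.com/stream.mp3 | ./bitrate_watchdog
     * (placeholder URL; 128 kbps nominal bitrate is an assumption) */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define NOMINAL_BPS (128000 / 8)            /* 128 kbps -> 16000 bytes/s */
    #define THRESHOLD   (NOMINAL_BPS * 9 / 10)  /* warn below 90% of nominal */

    int main(void) {
        char buf[4096];
        long count = 0;
        time_t window_start = time(NULL);

        for (;;) {
            ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
            if (n <= 0) break;                  /* stream ended or read error */
            count += n;

            time_t now = time(NULL);
            if (now - window_start >= 1) {      /* one-second measuring window */
                long bps = count / (now - window_start);
                if (bps < THRESHOLD)
                    fprintf(stderr, "%ld: slow stream: %ld bytes/s\n",
                            (long)now, bps);
                count = 0;
                window_start = now;
            }
        }
        return 0;
    }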
Method 2: Playback on server
You can run a headless player on the server (via cvlc or similar) and track when it has buffer underruns. This will work at any bitrate and will give you a decent idea of what's happening on the clients. This sort of player setup also enables utility functions like silence detection. The downside is that it takes a little bit of CPU to decode, and a bit more effort to automate.
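The log-watching part of such a setup could look like the sketch below. The exact underrun message varies by player and version, so both the command line and the pattern are placeholders to adjust against your own player's verbose output:

    /* Spawn a headless player and count log lines that look like buffer
     * underruns. Command line and pattern are placeholders; inspect your
     * player's verbose output and adjust both. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *cmd =
            "cvlc -vv http://example.com/stream.mp3 2>&1";  /* placeholder URL */
        const char *pattern = "underrun";  /* placeholder log text to match */

        FILE *p = popen(cmd, "r");
        if (!p) { perror("popen"); return 1; }

        char line[1024];
        long hits = 0;
        while (fgets(line, sizeof line, p)) {
            if (strstr(line, pattern))
                fprintf(stderr, "possible underrun #%ld: %s", ++hits, line);
        }
        pclose(p);
        return 0;
    }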
Method 3 (preferred): Log output buffer on source
Your source encoder will have a buffer on its output, data waiting to be sent to the server. When this buffer grows over a particular threshold, log it. This means that output over the network stalled for whatever reason. This method gets the appropriate data right from the source, and ensures you don't have to worry about clock synchronization issues that can occur over time in your monitoring of audio streams. (44.1 kHz to your encoder might be 44.101 kHz to a player.) This method might require modifying your source client.
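A sketch of the bookkeeping, with hypothetical hooks standing in for your source client's encode and network-send paths:

    /* Log when the encoder's output backlog exceeds a threshold.
     * enqueue_encoded() and on_sent() are hypothetical stand-ins for
     * your source client's encode and send paths. */
    #include <stdio.h>
    #include <time.h>

    #define BACKLOG_THRESHOLD (64 * 1024)   /* bytes; tune to your bitrate */

    static size_t backlog = 0;              /* encoded bytes waiting to go out */

    void enqueue_encoded(size_t nbytes) {   /* encoder produced nbytes */
        backlog += nbytes;
        if (backlog > BACKLOG_THRESHOLD)
            fprintf(stderr, "%ld: output stalled, %zu bytes queued\n",
                    (long)time(NULL), backlog);
    }

    void on_sent(size_t nbytes) {           /* send() accepted nbytes */
        backlog -= nbytes;
    }

    int main(void) {                        /* tiny demo of the bookkeeping */
        enqueue_encoded(80 * 1024);         /* encoder got ahead of the network */
        on_sent(80 * 1024);                 /* backlog drained */
        return 0;
    }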

Related

ALSA: snd_pcm_readi() and real-time threads

I've got a dedicated thread that captures audio from ALSA through snd_pcm_readi(). Periodically I get a short read, meaning snd_pcm_readi() returns a positive integer lower than my buffer size, and there's an audible 'pop' in my audio stream. I then set the thread priority to real-time, which gives a tangible benefit (far fewer short reads), but it doesn't solve the problem entirely.
Now the question: before going down the bumpy road of a real-time-patched Linux kernel, is there something else I can do to squeeze out more performance? Is calling snd_pcm_readi() in a dedicated thread the best way to pull audio out of ALSA?
For playback, the buffer size determines the latency.
For capture, it does not; only the period size determines how long you must wait until recorded samples are reported to be available.
So to prevent overruns, make the buffer as large as possible (e.g., by calling snd_pcm_hw_params_set_buffer_size_max() after setting the other parameters).
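For example, a capture-setup sketch along those lines (device name, format, rate, and period size below are assumptions):

    /* ALSA capture setup that maximizes the buffer to reduce overruns.
     * build: cc alsa_buf.c -lasound */
    #include <stdio.h>
    #include <alsa/asoundlib.h>

    int main(void) {
        snd_pcm_t *pcm;
        snd_pcm_hw_params_t *hw;
        unsigned int rate = 44100;
        snd_pcm_uframes_t period = 256, buffer = 0;

        if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_CAPTURE, 0) < 0)
            return 1;

        snd_pcm_hw_params_alloca(&hw);
        snd_pcm_hw_params_any(pcm, hw);
        snd_pcm_hw_params_set_access(pcm, hw, SND_PCM_ACCESS_RW_INTERLEAVED);
        snd_pcm_hw_params_set_format(pcm, hw, SND_PCM_FORMAT_S16_LE);
        snd_pcm_hw_params_set_channels(pcm, hw, 2);
        snd_pcm_hw_params_set_rate_near(pcm, hw, &rate, NULL);
        snd_pcm_hw_params_set_period_size_near(pcm, hw, &period, NULL);

        /* Last, after the other parameters have constrained the config:
         * ask for the largest buffer the hardware will give us. */
        snd_pcm_hw_params_set_buffer_size_max(pcm, hw, &buffer);

        if (snd_pcm_hw_params(pcm, hw) < 0)
            return 1;

        printf("period %lu frames, buffer %lu frames\n",
               (unsigned long)period, (unsigned long)buffer);
        snd_pcm_close(pcm);
        return 0;
    }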

iOS: Playing RTP packets using audio unit

I am developing a VoIP app and need to play data from RTP packets, which the server sends every 20 ms.
I have a buffer which accumulates samples from the RTP packets; the audio unit's render callback reads data from this buffer.
The problem is that I cannot synchronise the audio unit with the RTP stream. The preferred IO buffer duration cannot be set to exactly 20 ms, and the number of frames requested by the render callback cannot be set to match a packet's number of samples.
As a result, there are two possible situations (depending on sample rate and IO buffer duration):
a) the audio unit reads from my buffer faster than it is filled from RTP packets; in this case the buffer periodically doesn't contain the requested number of samples and I get distorted sound;
b) the buffer is filled faster than the audio unit reads from it; in this case the buffer periodically overflows and samples from new RTP packets are lost.
What should I do to avoid this issue?
If you have control over the packet rate, this is typically done via a "leaky bucket" algorithm. A circular FIFO/buffer can hold the "bucket" of incoming data, and a certain amount of padding needs to be kept in the FIFO/buffer to cover variations in network rate and latency. If the bucket gets too full, you ask the packet sender to slow down, etc.
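A sketch of such a FIFO with low/high watermarks (the sizes are illustrative and the rate-control messages are placeholder hooks; in a real app, push and pop run on different threads and need lock-free or locked access):

    /* Jitter-buffer sketch: circular FIFO of frames with watermarks. */
    #include <stdint.h>
    #include <stdio.h>

    #define FIFO_FRAMES 8192    /* total capacity */
    #define LOW_WATER   2048    /* below this: ask the sender to speed up */
    #define HIGH_WATER  6144    /* above this: ask the sender to slow down */

    typedef struct {
        int16_t data[FIFO_FRAMES];
        size_t head, tail, fill;
    } Fifo;

    /* Network thread: called for each RTP payload. */
    void fifo_push(Fifo *f, const int16_t *frames, size_t n) {
        for (size_t i = 0; i < n && f->fill < FIFO_FRAMES; i++) {
            f->data[f->head] = frames[i];
            f->head = (f->head + 1) % FIFO_FRAMES;
            f->fill++;
        }
        if (f->fill > HIGH_WATER)
            puts("bucket too full: request a slower packet rate");   /* hook */
    }

    /* Render callback: returns frames actually delivered; the caller
     * should zero-fill the remainder to conceal any gap. */
    size_t fifo_pop(Fifo *f, int16_t *out, size_t n) {
        size_t got = 0;
        while (got < n && f->fill > 0) {
            out[got++] = f->data[f->tail];
            f->tail = (f->tail + 1) % FIFO_FRAMES;
            f->fill--;
        }
        if (f->fill < LOW_WATER)
            puts("bucket running dry: request a faster packet rate"); /* hook */
        return got;
    }

    int main(void) {                 /* tiny demo */
        static Fifo f;               /* zero-initialized */
        int16_t in[160] = {0}, out[256];
        fifo_push(&f, in, 160);      /* e.g., one 20 ms packet at 8 kHz */
        printf("delivered %zu of 256 frames\n", fifo_pop(&f, out, 256));
        return 0;
    }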
On the audio playback end, various audio concealment methods (PSOLA time-pitch modification, etc.) can be used to slightly stretch or shrink the data to fit, if adequate buffer fill thresholds are exceeded.
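Proper PSOLA is beyond the scope here, but as a crude illustration of the stretch/shrink idea, a naive linear-interpolation resampler can fit n_in frames into n_out frames (unlike PSOLA, this shifts pitch slightly):

    /* Fit n_in input frames into n_out output frames by linear
     * interpolation. Only illustrates "make the data fit"; real
     * concealment (PSOLA etc.) preserves pitch. */
    #include <stddef.h>

    void stretch_linear(const float *in, size_t n_in,
                        float *out, size_t n_out) {
        if (n_in < 2 || n_out < 2) {            /* degenerate sizes */
            for (size_t i = 0; i < n_out; i++) out[i] = n_in ? in[0] : 0.0f;
            return;
        }
        for (size_t i = 0; i < n_out; i++) {
            double src = (double)i * (n_in - 1) / (n_out - 1);
            size_t j = (size_t)src;             /* integer part */
            double frac = src - (double)j;      /* fractional part */
            size_t k = (j + 1 < n_in) ? j + 1 : j;
            out[i] = (float)((1.0 - frac) * in[j] + frac * in[k]);
        }
    }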
If you are receiving audio
Try having the client periodically (e.g., every second) request that the server send audio at a bitrate chosen from the buffer size and connection speed.
For example, request 300 kbit audio segments if there are, say, 20 segments in the buffer on a 15,000 kbit/s connection, and increase/decrease the segment bitrate dynamically as necessary.
If you are sending audio
Do the same, but in reverse. Have the server request periodically that the client changes the audio bitrate.
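As a back-of-the-envelope sketch of the receiving-side selection (all constants and the bitrate ladder are made up for illustration):

    /* Pick a segment bitrate from buffer fill and measured link speed.
     * All numbers are illustrative; call periodically (e.g., once a second)
     * and send the result to the peer as the requested bitrate. */
    #include <stdio.h>

    int choose_bitrate_kbps(int buffered_segments, int link_kbps) {
        int cap = link_kbps / 2;        /* never ask for more than ~half the link */
        if (buffered_segments < 10)      cap /= 2;           /* running low: back off */
        else if (buffered_segments > 30) cap = cap * 3 / 2;  /* healthy: go higher */

        /* Clamp to the encodings actually offered (placeholder ladder). */
        static const int ladder[] = { 64, 96, 128, 192, 256, 320 };
        int best = ladder[0];
        for (size_t i = 0; i < sizeof ladder / sizeof ladder[0]; i++)
            if (ladder[i] <= cap) best = ladder[i];
        return best;
    }

    int main(void) {
        /* e.g., 20 segments buffered on a 15,000 kbit/s link */
        printf("request %d kbps\n", choose_bitrate_kbps(20, 15000));
        return 0;
    }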

Core Audio Large (Compressed) File Playback and Memory Footprint

So, I have set up a multichannel mixer and a Remote I/O unit to mix/play several buffers of PCM data that I read from audio files.
For short sound effects in my game, I load the whole file into a memory buffer using ExtAudioFileRead().
For my background music, let's say I have a 3-minute compressed audio file. Assuming it's encoded as MP3 @ 128 kbps (44,100 Hz stereo), that gives around 1 MB per minute, or 3 MB total. Uncompressed in memory it's around ten times that: 16-bit stereo PCM at 44.1 kHz is about 176 KB per second, roughly 10 MB per minute, or 30 MB for the whole track. I could use the exact same method as for small files; I believe ExtAudioFileRead() takes care of the decoding, using the (single) hardware decoder when available, but I'd rather not read the whole buffer at once, and instead 'stream' it from disk at regular intervals.
The first thing that comes to mind is going one step below to the (non-"extended") Audio File Services API and using AudioFileReadPackets(), like so (a rough skeleton follows the list):
1. Prepare two buffers, A and B, each big enough to hold (say) 5 seconds of audio. During playback, read from one buffer and switch to the other on reaching its end (i.e., they make up the two halves of a ring buffer).
2. Read the first 5 seconds of audio from the file into buffer A.
3. Read the next 5 seconds of audio from the file into buffer B.
4. Begin playback (from buffer A).
5. Once the play head enters buffer B, load the next 5 seconds of audio into buffer A.
6. Once the play head enters buffer A again, load the next 5 seconds of audio into buffer B.
7. Go to step 5.
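A skeleton of that ping-pong scheme (read_next_chunk() is a hypothetical stand-in for the AudioFileReadPackets() plumbing, and in real code the refill must be kicked off on another thread, never from the render callback itself):

    /* Ping-pong buffer skeleton for streaming a file in ~5-second chunks. */
    #include <stddef.h>
    #include <stdio.h>

    #define CHUNK_FRAMES (5 * 44100)        /* ~5 s of mono at 44.1 kHz */

    static float bufA[CHUNK_FRAMES], bufB[CHUNK_FRAMES];
    static int playing = 0;                 /* 0 = playing from A, 1 = from B */
    static size_t pos = 0;                  /* play position in current half */

    static void read_next_chunk(float *dst, size_t frames) {
        /* Placeholder for the real file read (AudioFileReadPackets()). */
        for (size_t i = 0; i < frames; i++) dst[i] = 0.0f;
    }

    static void prime(void) {               /* steps 2-3: fill both halves */
        read_next_chunk(bufA, CHUNK_FRAMES);
        read_next_chunk(bufB, CHUNK_FRAMES);
    }

    static float next_sample(void) {        /* steps 4-7, one frame at a time */
        float *cur = playing ? bufB : bufA;
        float s = cur[pos++];
        if (pos == CHUNK_FRAMES) {          /* play head crosses halves */
            playing = !playing;
            pos = 0;
            read_next_chunk(cur, CHUNK_FRAMES); /* refill the drained half */
        }
        return s;
    }

    int main(void) {
        prime();
        for (long i = 0; i < 3L * CHUNK_FRAMES; i++)
            (void)next_sample();            /* simulate three chunk spans */
        puts("streamed three chunks via ping-pong refills");
        return 0;
    }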
Is this the right approach, or is there a better way?
I'd suggest using the high-level AVAudioPlayer class to do simple background playback of an audio file. See:
https://developer.apple.com/library/ios/documentation/AVFoundation/Reference/AVAudioPlayerClassReference/Chapters/Reference.html#//apple_ref/doc/uid/TP40008067
If you require finer-grained control and lower latency, check out Apple's AUAudioFilePlayer. See AudioUnitProperties.h for a discussion. This is an Audio Unit that abstracts the complexities of streaming an audio file from disk. That said, it's still pretty complicated to set up and use, so definitely try AVAudioPlayer first.

Time between callback calls?

I have a lab project that mainly uses PyAudio, and to further understand how it works I made some measurements, in this case of the time between callbacks (using callback mode).
I timed it, and got an interesting result
(chunk size 256, sample rate 44.1 kHz): 0.0099701; 0.0000365; 0.0000201; 0.0201579
This pattern goes on and on.
Between two longer calls there are two shorter calls, and sometimes the longer call is shorter (mind you, I don't do anything else in the program other than time the callbacks).
If we average this out we get our desired callback time:
1/44100 * 256 (roughly 5.8ms)
So can someone explain what exactly happens here under the hood?
What happens under the hood in PortAudio is dependent on a number of factors, including:
Which native audio API PortAudio is talking to
What buffer size and latency parameters you passed to Pa_OpenStream()
The capabilities of the audio hardware and its drivers, including its supported buffer sizes, buffering model and timing characteristics.
Under some circumstances PortAudio will request larger buffers from the native audio API and then invoke the PortAudio user callback multiple times in quick succession. This can happen if you have selected a small callback buffer size and a long latency.
Another scenario is that the native audio API doesn't support the buffer size that you requested for your callback size (framesPerBuffer parameter to Pa_OpenStream()). In this case PortAudio will be forced to use a driver-supported buffer size and then "adapt" between that buffer size and your callback buffer size. This adaption process can cause irregular timing.
Yet another possibility is that the native audio API uses a large ring buffer. Each time PortAudio polls the native host API, it will work to fill the native ring buffer by calling your callback as many times as needed. In this case irregular timing is related to the polling rate.
The above are not the only possibilities.
One likely explanation of what is happening in your case is that PortAudio is calling your callback 3 times in fast succession (a guess would be that the native buffer size is 3x your callback buffer size), for one of the reasons above.
Another possibility is that the native audio subsystem is signalling PortAudio irregularly. This can happen if a system layer below PortAudio is doing similar kinds of buffering to what I described above. I have seen this happen with DirectSound on Windows 7, for example. ASIO4ALL drivers will exhibit ±1 ms jitter (which is not what you're seeing).
You can try reducing the requested stream latency to 0 and see if that changes the result. This will force double-buffering, which may or may not produce stable output. Another thing to try is to use the paFramesPerBufferUnspecified parameter, which will cause the callback to be called with the native buffer size -- then you can observe whether there is greater periodicity, what that buffer size is, and also whether the buffer size varies from callback to callback.
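For instance, a small experiment along these lines logs the interval and frame count per callback (output device defaults and the 44.1 kHz stereo format are assumptions, and printf in a callback is not real-time safe, but it's fine for a quick test):

    /* Log per-callback frame counts and intervals, letting the native
     * buffer size show through via paFramesPerBufferUnspecified.
     * build: cc timing.c -lportaudio */
    #include <stdio.h>
    #include <portaudio.h>

    static PaTime last = 0;

    static int cb(const void *in, void *out, unsigned long frames,
                  const PaStreamCallbackTimeInfo *t,
                  PaStreamCallbackFlags flags, void *user) {
        (void)in; (void)flags; (void)user;
        if (last > 0)
            printf("%lu frames, %.3f ms since last callback\n",
                   frames, (t->currentTime - last) * 1000.0);
        last = t->currentTime;
        float *o = out;                              /* output silence */
        for (unsigned long i = 0; i < frames * 2; i++) o[i] = 0.0f;
        return paContinue;
    }

    int main(void) {
        PaStream *s;
        Pa_Initialize();
        Pa_OpenDefaultStream(&s, 0, 2, paFloat32, 44100.0,
                             paFramesPerBufferUnspecified, cb, NULL);
        Pa_StartStream(s);
        Pa_Sleep(2000);                              /* observe for 2 s */
        Pa_StopStream(s);
        Pa_CloseStream(s);
        Pa_Terminate();
        return 0;
    }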
You didn't say which operating system and host API you're targeting, so it's hard to give more specific details than the above.
The internal buffering models used by the various PortAudio host API backends are described in some detail on the PortAudio wiki.
To answer a related question: why is it like this? Aside from the cases where it is a function of the lower layers of the native audio subsystem, or the buffer adaption process, it is often a result of specifying a large suggested latency to Pa_OpenStream(). Some PortAudio host APIs will relax the buffer periodicity if the specified latency is very high, in order to reduce system load that would be caused by high-frequency timer callbacks.

Low jitter audio on iOS

I'd like to load a small audio clip, like a beep, into memory, and schedule playback after x seconds with very low jitter. My application ideally needs less than ±1 ms, but ±5 ms could still be useful. The time is synchronized to a remote application without a microphone. My question is what kind of jitter I can expect from the audio APIs, and whether they are all equal in this regard.
I'm not familiar with the audio APIs, but from the latency discussions I've seen the number 5.8 ms mentioned for RemoteIO audio units. Does this mean ±3 ms would be the best precision possible?
You would need to set this process (or thread) as real-time to have any guarantee of low delay; otherwise you can get jitter on the order of seconds, because the operating system can decide to run some background job.
Once you have it running as real-time, you may achieve lower delay.
Check Apple's documentation on whether you can make the process real-time (via its scheduling options). You may need extra permissions and kernel-level support in your app to do it properly, so that you can get a guaranteed 1 ms delay for an audio app.
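On iOS/macOS the usual mechanism is the Mach time-constraint thread policy. A sketch (the period/computation/constraint values are assumptions to tune for your workload, and this is a strong scheduling hint rather than a hard guarantee):

    /* Promote the current thread to the real-time (time-constraint) class. */
    #include <mach/mach.h>
    #include <mach/mach_time.h>
    #include <mach/thread_policy.h>
    #include <stdio.h>

    static uint64_t ns_to_abs(uint64_t ns) {
        mach_timebase_info_data_t tb;
        mach_timebase_info(&tb);
        return ns * tb.denom / tb.numer;    /* nanoseconds -> Mach abs time */
    }

    int make_thread_realtime(void) {
        thread_time_constraint_policy_data_t policy;
        policy.period      = (uint32_t)ns_to_abs(5800000); /* ~5.8 ms cycle */
        policy.computation = (uint32_t)ns_to_abs(1000000); /* ~1 ms of work */
        policy.constraint  = (uint32_t)ns_to_abs(2000000); /* finish in ~2 ms */
        policy.preemptible = 1;

        kern_return_t kr = thread_policy_set(
            mach_thread_self(), THREAD_TIME_CONSTRAINT_POLICY,
            (thread_policy_t)&policy, THREAD_TIME_CONSTRAINT_POLICY_COUNT);
        if (kr != KERN_SUCCESS) {
            fprintf(stderr, "thread_policy_set failed: %d\n", (int)kr);
            return -1;
        }
        return 0;
    }

Note that Core Audio's own render threads already run at real-time priority, so this matters mainly for your own scheduling and timing threads.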
