I'm trying to develop an app that plays streaming PCM data from a server.
I have used AudioQueue, but it does not work well.
The PCM data format (from the server):
Sample rate = 48000, number of channels = 2, bits per sample = 16
Also, the server does not stream a fixed number of bytes to the client;
the chunk sizes vary (e.g. 30848, 128, 2764, ... bytes).
My source code:
Here is where I set up the ASBD structure and create the Audio Queue object
(language: Swift):
// Create ASBD structure & set properties.
var streamFormat = AudioStreamBasicDescription()
streamFormat.mSampleRate = 48000
streamFormat.mFormatID = kAudioFormatLinearPCM
streamFormat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked
streamFormat.mFramesPerPacket = 1
streamFormat.mChannelsPerFrame = 2
streamFormat.mBitsPerChannel = 16
streamFormat.mBytesPerFrame = (streamFormat.mBitsPerChannel / 8) * streamFormat.mChannelsPerFrame
streamFormat.mBytesPerPacket = streamFormat.mBytesPerFrame
streamFormat.mReserved = 0
// Create AudioQueue for playing PCM streaming data.
var err = AudioQueueNewOutput(&streamFormat, self.queueCallbackProc, nil, nil, nil, 0, &aq)
...
I set up the ASBD structure and created the AudioQueue object as shown above.
The AudioQueue plays the streamed PCM data well for a few seconds,
but then the sound starts cutting in and out, even though the data is still streaming and I am still enqueueing AudioQueue buffers. What can I do?
Please give me any ideas.
You need to do (at least) two things:
You need to buffer data to handle the latency jitter and the mismatch between the packet sizes arriving from the server and the amounts the audio queue callback requests. A typical solution is a circular FIFO/buffer, pre-filled with enough data to ride out the worst-case network jitter (you will need to statistically analyze this number). The audio queue callback can then just copy the requested amount of data out of the circular FIFO while the network code tries to keep it filled; see the sketch after these two points.
You may also need some way to conceal errors when the two rates are not sufficiently identical: e.g. some way to duplicate or synthesize extra sound when the network rate is too slow, and some way to leave out samples when the network rate is too high, with both trying to be as inaudible as possible and without producing loud clicks at rate discontinuities or network drop-outs.
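Here is a minimal sketch of such a FIFO in plain C11. The pcm_fifo type and function names are mine, not from any framework; a production version would also avoid the modulo in the audio callback and tune the capacity to the measured jitter.

#include <stdint.h>
#include <string.h>
#include <stdatomic.h>

#define FIFO_CAPACITY (256 * 1024) // bytes; size for worst-case network jitter

typedef struct {
uint8_t data[FIFO_CAPACITY];
_Atomic size_t head; // total bytes written (network thread only)
_Atomic size_t tail; // total bytes read (audio callback only)
} pcm_fifo;

// Network thread: append received bytes; returns the count actually stored.
static size_t fifo_write(pcm_fifo *f, const uint8_t *src, size_t n) {
size_t head = atomic_load(&f->head);
size_t space = FIFO_CAPACITY - (head - atomic_load(&f->tail));
if (n > space) n = space; // drop excess rather than overwrite unread data
for (size_t i = 0; i < n; i++)
f->data[(head + i) % FIFO_CAPACITY] = src[i];
atomic_store(&f->head, head + n);
return n;
}

// Audio queue callback: fill exactly n bytes, padding with silence on underrun.
static void fifo_read(pcm_fifo *f, uint8_t *dst, size_t n) {
size_t tail = atomic_load(&f->tail);
size_t avail = atomic_load(&f->head) - tail;
size_t take = (n < avail) ? n : avail;
for (size_t i = 0; i < take; i++)
dst[i] = f->data[(tail + i) % FIFO_CAPACITY];
memset(dst + take, 0, n - take); // zeros = silence for signed 16-bit PCM
atomic_store(&f->tail, tail + take);
}

Only start the AudioQueue once fifo_write has accumulated your chosen pre-fill amount; after that, each output-buffer callback calls fifo_read for the number of bytes the buffer needs.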
Related
I need to record PCM audio every 10 milliseconds, without playback, in Swift.
I have tried this code but I can't figure out how to stop playback while recording.
RecordAudio GitHub Repo
And a second question: how can I properly get PCM data out of the circular buffer for an encode/decode process? When I convert the recorded audio data to signed bytes, unsigned bytes, or anything else, the converted data sometimes ends up corrupted. What is the best practice for this kind of process?
In the RecordAudio sample code, the audio format is specified as Float (32-bit floats). When doing a float-to-integer conversion, you have to make sure your scale and offset produce values in the legal range for the destination type: e.g. check that -1.0 to 1.0 maps to 0 to 255 (unsigned byte), and that out-of-range values are clipped to legal values. Also pay attention to the number of samples you convert, as an Audio Unit callback can vary the frameCount sent (the number of samples delivered). You most likely won't get exactly 10 ms in any single RemoteIO callback, but may have to observe a circular buffer filled by multiple callbacks, or split a larger buffer.
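As a sketch of that conversion in C (the function name is mine):

#include <stdint.h>
#include <math.h>

// Convert count float samples in -1.0..1.0 to unsigned 8-bit PCM,
// clipping out-of-range input instead of letting it wrap around.
static void floatToUInt8(const float *in, uint8_t *out, int count) {
for (int i = 0; i < count; i++) {
float x = in[i];
if (x > 1.0f) x = 1.0f; // clip, don't wrap
if (x < -1.0f) x = -1.0f;
// Scale -1.0..1.0 to 0..255; the 127.5 offset centers silence mid-scale.
out[i] = (uint8_t)lrintf(x * 127.5f + 127.5f);
}
}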
When RemoteIO is running in play-and-record mode, you can usually silence playback by zeroing the bufferList buffers (after copying, analyzing, or otherwise using the data in the buffers) before returning from the Audio Unit callback.
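For example, the tail end of the callback might use a helper like this (a sketch; silenceOutput is my name, not an API):

#include <AudioToolbox/AudioToolbox.h>
#include <string.h>

// After copying or analyzing the input, overwrite the output buffers with
// zeros (digital silence for linear PCM) so nothing audible is played.
static void silenceOutput(AudioBufferList *ioData) {
for (UInt32 i = 0; i < ioData->mNumberBuffers; i++)
memset(ioData->mBuffers[i].mData, 0, ioData->mBuffers[i].mDataByteSize);
}

Call it on ioData just before returning noErr from the render callback.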
I downloaded the SpeakHere example and changed the parameters like below:
#define kBufferDurationSeconds 0.020
void AQRecorder::SetupAudioFormat(UInt32 inFormatID)
{
memset(&mRecordFormat, 0, sizeof(mRecordFormat));
mRecordFormat.mFormatID = kAudioFormatLinearPCM;
mRecordFormat.mSampleRate = 8000.0;
mRecordFormat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
mRecordFormat.mBitsPerChannel = 16;
mRecordFormat.mFramesPerPacket = mRecordFormat.mChannelsPerFrame = 1;
mRecordFormat.mBytesPerFrame = (mRecordFormat.mBitsPerChannel / 8) * mRecordFormat.mChannelsPerFrame;
mRecordFormat.mBytesPerPacket = mRecordFormat.mBytesPerFrame;
}
But I found that the callback function AQRecorder::MyInputBufferHandler() was not being called every 20 ms. Instead, it is called four times at roughly 1 ms intervals, then once after about 500 ms, then four more times at 1 ms intervals, over and over again, even though I set kBufferDurationSeconds = 0.02.
What causes this problem? Please help me.
In iOS, the Audio Session setPreferredIOBufferDuration API (did you even use an OS buffer-duration call?) is only a request expressing the app's preference. The OS is free to choose a different, but compatible, buffer duration according to what iOS thinks is best (for battery life, compatibility with other apps, etc.).
Audio Queues run on top of Audio Units. If the RemoteIO Audio Unit is using 500 ms buffers, it will cut them up into 4 smaller Audio Queue buffers and pass those smaller buffers to the Audio Queue API in a quick burst.
If you use the Audio Unit API instead of the Audio Queue API, and send the Audio Session API a setPreferredIOBufferDuration request, you may be able to request and get shorter, more evenly spaced audio buffers.
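For example, using the same (now-deprecated) AudioSession C API that appears later in these answers, the request looks roughly like this; AVAudioSession's setPreferredIOBufferDuration:error: is the modern equivalent.

#include <AudioToolbox/AudioToolbox.h>

// Request (not command) ~5 ms I/O buffers. Assumes AudioSessionInitialize()
// has already been called; the OS may round this value or ignore it.
static OSStatus requestShortBuffers(void) {
Float32 preferred = 0.005f; // seconds
return AudioSessionSetProperty(
kAudioSessionProperty_PreferredHardwareIOBufferDuration,
sizeof(preferred), &preferred);
}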
I'm trying to make an accurate timer to analyze an input. I'd like to be able to measure 1% deviation in signals of ~200 ms.
My understanding is that using an AudioUnit should be able to get <1 ms resolution.
I tried implementing the code from Stefan Popp's example
After updating a few things to get it to work on Xcode 6.3, I have the example working; however:
While I do eventually want to capture audio, I thought there should be some way to get a notification, like NSTimer, so I tried AudioUnitAddRenderNotify, but it does exactly what it says it should: it's tied to the render, not just an arbitrary timer. Is there some way to get a callback triggered without having to record or play?
When I examine mSampleTime, I find that the interval between slices does match inNumberFrames (512), which works out to 11.6 ms. I see the same interval for both record and play. I need more resolution than that.
I tried playing with kAudioSessionProperty_PreferredHardwareIOBufferDuration but all the examples I could find use the deprecated AudioSessions, so I tried to convert to AudioUnits:
Float32 preferredBufferSize = .001; // in seconds
status = AudioUnitSetProperty(audioUnit, kAudioSessionProperty_PreferredHardwareIOBufferDuration, kAudioUnitScope_Output, kOutputBus, &preferredBufferSize, sizeof(preferredBufferSize));
But I get OSStatus -10879, kAudioUnitErr_InvalidProperty.
Then I tried kAudioUnitProperty_MaximumFramesPerSlice with values of 128 and 256, but inNumberFrames is always 512.
UInt32 maxFrames = 128;
status = AudioUnitSetProperty(audioUnit, kAudioUnitProperty_MaximumFramesPerSlice, kAudioUnitScope_Global, 0, &maxFrames, sizeof(maxFrames));
[EDIT]
I am trying to compare the timing of an input (user's choice of MIDI or microphone) to when it should be. Specifically, is the instrument being played before or after the beat/metronome and by how much? This is for musicians, not a game, so precision is expected.
[EDIT]
The answers seem reactive to events; i.e. they let me see precisely when something happened, but I don't see how to make something happen accurately. My fault for not being clear. My app needs to be the metronome as well: it must play a click and flash a dot exactly on the beat, so that I can then analyze the user's actions to compare timing. But if I can't play the beat accurately, the rest falls apart. Am I supposed to record audio, even if I don't want it, just to get inTimeStamp from the callback?
[EDIT]
Currently my metronome is:
- (void) setupAudio
{
// audioPlayer is an instance variable, so syncFired: can reach it
NSString *path = [NSString stringWithFormat:@"%@/click.mp3", [[NSBundle mainBundle] resourcePath]];
NSURL *soundUrl = [NSURL fileURLWithPath:path];
audioPlayer = [[AVAudioPlayer alloc] initWithContentsOfURL:soundUrl error:nil];
[audioPlayer prepareToPlay];
CADisplayLink *syncTimer;
syncTimer = [CADisplayLink displayLinkWithTarget:self selector:@selector(syncFired:)];
syncTimer.frameInterval = 30;
[syncTimer addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSDefaultRunLoopMode];
}
-(void)syncFired:(CADisplayLink *)displayLink
{
[audioPlayer play];
}
You should be using a circular buffer, and performing your analysis on the signal in chunks that match your desired frame count, on your own timer. To do this you set up a render callback, then feed your circular buffer the input audio in the callback. Then you set up your own timer which pulls from the tail of the buffer and does your analysis. This way you could be feeding the buffer 1024 frames every 0.023 seconds, while your analysis timer fires, say, every 0.000725 seconds and analyzes 32 samples. Here is a related question about circular buffers.
EDIT
To get precision timing using a ring buffer, you could also store the timestamp corresponding to the audio buffer. I use TPCircularBuffer for doing just that. TPCircularBufferPrepareEmptyAudioBufferList, TPCircularBufferProduceAudioBufferList, and TPCircularBufferNextBufferList will copy and retrieve the audio buffer and timestamp to and from a ring buffer. Then when you are doing your analysis, there will be a timestamp corresponding to each buffer, eliminating the need to do all of your work in the render thread, and allowing you to pick and choose your analysis window.
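For illustration, the produce/consume pair might look roughly like this; the exact signatures vary between TPCircularBuffer versions, so treat this as a sketch and check TPCircularBuffer+AudioBufferList.h.

#include <string.h>
#include "TPCircularBuffer+AudioBufferList.h" // Michael Tyson's TPCircularBuffer

// Render thread: stash one callback's audio together with its timestamp.
static void stashInput(TPCircularBuffer *rb, const AudioTimeStamp *ts,
const AudioBufferList *abl, int bytesPerBuffer) {
AudioBufferList *dst = TPCircularBufferPrepareEmptyAudioBufferList(
rb, abl->mNumberBuffers, bytesPerBuffer, ts);
if (!dst) return; // ring buffer full; drop this slice
for (UInt32 i = 0; i < abl->mNumberBuffers; i++)
memcpy(dst->mBuffers[i].mData, abl->mBuffers[i].mData, bytesPerBuffer);
TPCircularBufferProduceAudioBufferList(rb, ts);
}

// Analysis thread: the returned list's samples were captured at *tsOut.
static AudioBufferList *nextSlice(TPCircularBuffer *rb, AudioTimeStamp *tsOut) {
return TPCircularBufferNextBufferList(rb, tsOut);
}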
If you are using something like cross-correlation and/or a peak detector to find a matched sample vector within an audio sample buffer (or a ring buffer containing samples), then you should be able to count samples between sharp events to within one sample (1/44100, or 0.0226757 milliseconds, at a 44.1 kHz sample rate), plus or minus some time-estimation error. For events more than one Audio Unit buffer apart, you can add up the number of samples in the intervening buffers to get a more precise time interval than the (much coarser) buffer timing alone.
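As a sketch of the bookkeeping (a bare threshold stands in here for a real cross-correlator or peak detector, and all names are mine):

#include <math.h>

static long long gSamplesSeen = 0; // total samples processed across callbacks

// Scan one callback's samples; returns the absolute sample index of the
// first sample over threshold, or -1 if this buffer contains no event.
static long long findEvent(const float *samples, int frameCount, float threshold) {
long long found = -1;
for (int i = 0; i < frameCount; i++)
if (found < 0 && fabsf(samples[i]) > threshold)
found = gSamplesSeen + i;
gSamplesSeen += frameCount;
return found;
}

The interval between two events is then (event2 - event1) / 44100.0 seconds at a 44.1 kHz sample rate, no matter how many buffers fell in between.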
However, note that there is a latency or delay between every sample buffer and the speaker audio going out, as well as between microphone sound reception and the buffer callbacks. That has to be measured: e.g. you can measure the round-trip time between sending a sample buffer out and when the input-buffer autocorrelation estimation function gets it back. This is how long it takes the hardware to buffer the data, convert it (analog to digital and vice versa), and pass it along. That latency might be in the area of 2 to 6 times 5.8 milliseconds with appropriate Audio Session settings, but may differ across iOS devices.
Yes, the most accurate way to measure audio is to capture the audio and look at the data in the actual sampled audio stream.
I'm writing an app that visualizes music. So far I have an audio file from the iPod library converted to PCM and placed inside the app's directory. Now I am trying to perform an FFT on that PCM file to give me frequency and dB over time. Here is code I found which uses Apple's Accelerate framework to perform the FFT:
int bufferFrames = 1024;
int bufferlog2 = round(log2(bufferFrames));
FFTSetup fftSetup = vDSP_create_fftsetup(bufferlog2, kFFTRadix2);
float outReal[bufferFrames / 2];
float outImaginary[bufferFrames / 2];
COMPLEX_SPLIT out = { .realp = outReal, .imagp = outImaginary };
vDSP_ctoz((COMPLEX *)data, 2, &out, 1, bufferFrames / 2);
vDSP_fft_zrip(fftSetup, &out, 1, bufferlog2, FFT_FORWARD);
Now I don't understand how to feed this the PCM file. I believe 'data' is an array of COMPLEX values holding the portion of the audio that the FFT will be applied to. How do I build such a data structure from the PCM file?
I found some Java code that might be useful, but I'm not sure how to convert it to C. Also, what is audioData, and how do I fill it from the PCM file?
Complex[] complexData = new Complex[audioData.length];
for (int i = 0; i < complexData.length; i++) {
complexData[i] = new Complex(audioData[i], 0);
}
The "Apple Accelerate framework" is fine, I have used it on a project. Just read the docs carefully. Basically you will need to read the PCM data into memory and perhaps massage the format, then call the FFT on it.
To massage the data loop through your PCM data, sample by sample and copy as appropriate to another array.
The docs are here, you want the vDFT_fft* functions.
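For instance, assuming the converted file is headerless mono 16-bit PCM (an assumption; adjust for your actual layout, and deinterleave first if it is stereo), filling the 'data' array used by the snippet above could look like this:

#include <Accelerate/Accelerate.h>
#include <stdint.h>
#include <stdio.h>

#define BUFFER_FRAMES 1024

// Read one 1024-frame chunk of raw 16-bit PCM and convert it to floats.
// Returns 1 on success, 0 when not enough samples remain.
static int readChunk(FILE *pcmFile, float *data /* BUFFER_FRAMES floats */) {
int16_t raw[BUFFER_FRAMES];
if (fread(raw, sizeof(int16_t), BUFFER_FRAMES, pcmFile) < BUFFER_FRAMES)
return 0;
vDSP_vflt16(raw, 1, data, 1, BUFFER_FRAMES); // SInt16 -> float
float scale = 1.0f / 32768.0f; // normalize to -1.0 .. 1.0
vDSP_vsmul(data, 1, &scale, data, 1, BUFFER_FRAMES);
return 1;
}

Each chunk returned by readChunk is exactly the 'data' that vDSP_ctoz packs into the split-complex buffer before vDSP_fft_zrip.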
Mike Ash has a good writeup on his Friday Q&A site.
Audio data is the result of voltage samples taken periodically from the audio signal, and it comes in many different formats. If you are going to work with this, you will have to spend some time learning the basics. Also, if you are going to run an FFT on audio data, you need to learn about sampling frequency and "aliasing", sometimes called "fold-back".
I was trying to set up an audio unit to render the music (instead of an Audio Queue, which was too opaque for my purposes). iOS doesn't have the kAudioDevicePropertyBufferFrameSize property, so how can I derive this value to set up the buffer size of my I/O unit?
I found this post interesting: it asks about the possibility of using a combination of the kAudioSessionProperty_CurrentHardwareIOBufferDuration and kAudioSessionProperty_CurrentHardwareOutputLatency audio session properties to determine that value, but there is no answer. Any ideas?
You can use the kAudioSessionProperty_CurrentHardwareIOBufferDuration property, which represents the buffer size in seconds. Multiply this by the sample rate you get from kAudioSessionProperty_CurrentHardwareSampleRate to get the number of samples you should buffer.
The resulting buffer size should be a multiple of 2. I believe either 512 or 4096 are what you're likely to get, but you should always base it off of the values returned from AudioSessionGetProperty.
Example:
Float64 sampleRate;
UInt32 propSize = sizeof(Float64);
AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareSampleRate,
&propSize,
&sampleRate);
Float32 bufferDuration;
propSize = sizeof(Float32);
AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,
&propSize,
&bufferDuration);
UInt32 bufferLengthInFrames = sampleRate * bufferDuration;
The next step is to find out the input stream format of the unit you're sending audio to. Based on your description, I'm assuming that you're programmatically generating audio to send to the speakers. This code assumes unit is an AudioUnit you're sending audio to, whether that's the RemoteIO or something like an effect Audio Unit.
AudioStreamBasicDescription inputASBD;
UInt32 propSize = sizeof(AudioStreamBasicDescription);
AudioUnitGetProperty(unit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Input,
0,
&inputASBD,
&propSize);
After this, inputASBD.mFormatFlags will be a bit field corresponding to the audio stream format that unit is expecting. The two most likely sets of flags are named kAudioFormatFlagsCanonical and kAudioFormatFlagsAudioUnitCanonical. These two have corresponding sample types AudioSampleType and AudioUnitSampleType that you can base your size calculation off of.
As an aside, AudioSampleType typically represents samples coming from the mic or destined for the speakers, whereas AudioUnitSampleType is usually for samples that are intended to be processed (by an audio unit, for example). At the moment on iOS, AudioSampleType is an SInt16 and AudioUnitSampleType is a fixed-point 8.24 number stored in an SInt32 container. Here's a post on the Core Audio mailing list explaining this design choice.
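As a sketch, converting one of those 8.24 samples to a float for processing looks like this (1 << 24 is the bit pattern representing 1.0 in 8.24 fixed point; the helper name is mine):

#include <CoreAudio/CoreAudioTypes.h>

// Convert one 8.24 fixed-point sample (stored in an SInt32) to a
// float in roughly the -1.0 .. 1.0 range.
static inline float fixed824ToFloat(SInt32 s) {
return (float)s / (float)(1 << 24);
}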
The reason I hold back from saying something like "just use Float32, it'll work" is because the actual bit representation of the stream is subject to change if Apple feels like it.
The audio unit itself decides on the actual buffer size, so the app's audio-unit callback has to be able to handle any reasonable size it is given. You can suggest a preference and then poll the kAudioSessionProperty_CurrentHardwareIOBufferDuration property, but note that this value can change while your app is running (especially during screen lock or call interruptions, etc.), outside of the app's control.