iOS audio queue - how to meter the audio level in a buffer?

I'm working on an app that should do some audio signal processing. I need to measure the audio level in each of the buffers I get (through the callback function). I've been searching the web for some time, and I found that there is a built-in property for current level metering:
AudioQueueGetProperty(recordState->queue,kAudioQueueProperty_CurrentLevelMeter,meters,&dlen);
This property gets me the average or peak audio level, but it's not synchronised to the current buffer.
I figured I need to calculate the audio level from the buffer data myself, so I came up with this:
double calcAudioRMS (SInt16 * audioData, int numOfSamples)
{
    double RMS, adPercent;
    RMS = 0;
    for (int i = 0; i < numOfSamples; i++)
    {
        adPercent = audioData[i] / 32768.0f;
        RMS += adPercent * adPercent;
    }
    RMS = sqrt(RMS / numOfSamples);
    return RMS;
}
This function gets the audio data (cast to SInt16) and the number of samples in the current buffer. The numbers I get are indeed between 0 and 1, but they seem to be rather random and low compared to the numbers I got from the built-in audio level metering.
The recording audio format is:
format->mSampleRate = 8000.0;
format->mFormatID = kAudioFormatLinearPCM;
format->mFramesPerPacket = 1;
format->mChannelsPerFrame = 1;
format->mBytesPerFrame = 2;
format->mBytesPerPacket = 2;
format->mBitsPerChannel = 16;
format->mReserved = 0;
format->mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
My question is how to get the right values from the buffer. Is there a built-in function/property for this, or should I calculate the audio level myself, and if so, how?
Thanks in advance.

Your calculation for RMS power is correct. I'd be inclined to say that you have fewer samples than Apple does, or something similar, and that would explain the difference. You can check by inputting a loud sine wave and verifying that both Apple's meter and your code report an RMS power of 1/sqrt(2).
Unless there's a good reason, I would use Apple's power calculations. I've used them, and they seem good to me. Additionally, you generally don't want raw RMS power; you want RMS power in decibels, or you can use the kAudioQueueProperty_CurrentLevelMeterDB property instead. (Which you want depends on whether you're trying to build an audio meter or truly display the audio power.)
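For reference, here is a minimal sketch (not from the original answer) of how the question's calcAudioRMS could be called from an Audio Queue input callback and converted to decibels. The callback and variable names are illustrative, assuming the 16-bit mono PCM format shown above:

#include <stdio.h>
#include <math.h>
#include <AudioToolbox/AudioToolbox.h>

// Hypothetical input callback; the names are illustrative, not from the question.
static void HandleInputBuffer(void *inUserData,
                              AudioQueueRef inAQ,
                              AudioQueueBufferRef inBuffer,
                              const AudioTimeStamp *inStartTime,
                              UInt32 inNumPackets,
                              const AudioStreamPacketDescription *inPacketDesc)
{
    // With 16-bit mono PCM, the sample count is the byte size divided by sizeof(SInt16).
    int numOfSamples = inBuffer->mAudioDataByteSize / sizeof(SInt16);
    double rms = calcAudioRMS((SInt16 *)inBuffer->mAudioData, numOfSamples);

    // Convert the [0, 1] RMS amplitude to dBFS (0 dB = full scale; silence tends toward -inf).
    double db = 20.0 * log10(rms > 0.0 ? rms : 1e-10);
    printf("RMS: %f  level: %.1f dBFS\n", rms, db);

    // Hand the buffer back to the queue so recording can continue.
    AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
}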

Related

Spectrogram from AVAudioPCMBuffer using Accelerate framework in Swift

I'm trying to generate a spectrogram from an AVAudioPCMBuffer in Swift. I install a tap on an AVAudioMixerNode and receive a callback with the audio buffer. I'd like to convert the signal in the buffer to a [Float:Float] dictionary where the key represents the frequency and the value represents the magnitude of the audio on the corresponding frequency.
I tried using Apple's Accelerate framework but the results I get seem dubious. I'm sure it's just in the way I'm converting the signal.
I looked at this blog post amongst other things for a reference.
Here is what I have:
self.audioEngine.mainMixerNode.installTapOnBus(0, bufferSize: 1024, format: nil, block: { buffer, when in
    let bufferSize: Int = Int(buffer.frameLength)

    // Set up the transform
    let log2n = UInt(round(log2(Double(bufferSize))))
    let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))

    // Create the complex split value to hold the output of the transform
    var realp = [Float](count: bufferSize/2, repeatedValue: 0)
    var imagp = [Float](count: bufferSize/2, repeatedValue: 0)
    var output = DSPSplitComplex(realp: &realp, imagp: &imagp)

    // Now I need to convert the signal from the buffer to complex values; this is what I'm struggling to grasp.
    // The complexValue should be UnsafePointer<DSPComplex>. How do I generate it from the buffer's floatChannelData?
    vDSP_ctoz(complexValue, 2, &output, 1, UInt(bufferSize / 2))

    // Do the forward fast Fourier transform
    vDSP_fft_zrip(fftSetup, &output, 1, log2n, Int32(FFT_FORWARD))

    // Convert the complex output to magnitude
    var fft = [Float](count: Int(bufferSize / 2), repeatedValue: 0.0)
    vDSP_zvmags(&output, 1, &fft, 1, vDSP_Length(bufferSize / 2))

    // Release the setup
    vDSP_destroy_fftsetup(fftSetup)

    // TODO: Convert fft to a [Float: Float] dictionary of frequency vs magnitude. How?
})
My questions are:
1. How do I convert buffer.floatChannelData to UnsafePointer<DSPComplex> to pass to the vDSP_ctoz function? Is there a different/better way to do it, maybe even bypassing vDSP_ctoz?
2. Is this different if the buffer contains audio from multiple channels? How is it different when the buffer's audio channel data is or isn't interleaved?
3. How do I convert the indices in the fft array to frequencies in Hz?
4. Anything else I may be doing wrong?
Update
Thanks everyone for suggestions. I ended up filling the complex array as suggested in the accepted answer. When I plot the values and play a 440 Hz tone on a tuning fork it registers exactly where it should.
Here is the code to fill the array:
var channelSamples: [[DSPComplex]] = []
for var i = 0; i < channelCount; ++i {
    channelSamples.append([])
    let firstSample = buffer.format.interleaved ? i : i * bufferSize
    for var j = firstSample; j < bufferSize; j += buffer.stride * 2 {
        channelSamples[i].append(DSPComplex(real: buffer.floatChannelData.memory[j],
                                            imag: buffer.floatChannelData.memory[j + buffer.stride]))
    }
}
The channelSamples array then holds a separate array of samples for each channel.
To calculate the magnitude I used this:
var spectrum = [Float]()
for var i = 0; i < bufferSize / 2; ++i {
    let imag = out.imagp[i]
    let real = out.realp[i]
    let magnitude = sqrt(pow(real, 2) + pow(imag, 2))
    spectrum.append(magnitude)
}
1. Hacky way: you can just cast the float array to UnsafePointer<DSPComplex>, since in that layout the real and imaginary values sit one after the other.
2. It depends on whether the audio is interleaved or not. If it's interleaved (most cases), the left and right channels are in the same array with a stride of 2.
3. The lowest frequency in your case is the frequency of one period of 1024 samples. At 44100 Hz, 1024 samples is ~23 ms, so the lowest frequency of the spectrum is 1/(1024/44100) ≈ 43 Hz. Each subsequent bin is another ~43 Hz higher (~86 Hz, ~129 Hz, and so on).
4. You have installed a callback handler on an audio bus. This is likely run with real-time thread priority and very frequently. You should not do anything that has the potential to block (it will likely result in priority inversion and glitchy audio):
- Don't allocate memory: realp and imagp, created with [Float](...), are shorthand for Array<Float> and are likely allocated on the heap. Pre-allocate these.
- Don't call lengthy operations such as vDSP_create_fftsetup(), which also allocates memory and initialises it. Again, you can allocate this once, outside of your callback.
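Since vDSP is a plain C API, here is a minimal C-style sketch (not from the original answer) of points 3 and 4: the FFT setup and working buffers are allocated once up front, the per-buffer work only reuses them, and each bin index maps to a frequency of index * sampleRate / N. The names (setupFFT, processBuffer) are illustrative:

#include <Accelerate/Accelerate.h>

#define N      1024   // samples per buffer (power of two)
#define LOG2_N 10

// Allocated once, before the tap is installed; reused by every callback.
static FFTSetup        fftSetup;
static float           realp[N / 2], imagp[N / 2], magnitudes[N / 2];
static DSPSplitComplex splitComplex = { realp, imagp };

void setupFFT(void)
{
    fftSetup = vDSP_create_fftsetup(LOG2_N, kFFTRadix2);   // heavy allocation, do it once
}

// Called with N mono float samples; does no allocation of its own.
void processBuffer(const float *samples, double sampleRate)
{
    // Pack even/odd samples into the split-complex layout expected by the real FFT.
    vDSP_ctoz((const DSPComplex *)samples, 2, &splitComplex, 1, N / 2);
    vDSP_fft_zrip(fftSetup, &splitComplex, 1, LOG2_N, FFT_FORWARD);
    vDSP_zvmags(&splitComplex, 1, magnitudes, 1, N / 2);

    // magnitudes[i] is the squared magnitude of the bin centred at i * binWidthHz.
    double binWidthHz = sampleRate / N;   // ~43 Hz per bin at 44100 Hz with N = 1024
    (void)binWidthHz;
}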

How do I increase the size of EZAudio EZMicrophone?

I would like to use the EZAudio framework to do realtime microphone signal FFT processing, along with some other processing in order to determine the peak frequency.
The problem is, the EZMicrophone class only appears to work with 512 samples at a time; however, my signal requires an FFT of 8192 or even 16384 samples. There doesn't appear to be a way to change the buffer size in EZMicrophone, but I've read posts that recommend creating an array of my target size, appending the microphone buffer to it, and then doing the FFT once it's full.
When I do this, though, I get large chunks of memory with no data, or discontinuities between the segments of copied memory. I think it may have something to do with the timing or order in which the microphone delegate is called, or memory being overwritten on different threads... I'm grasping at straws here. Am I correct in assuming that this code is executed every time the microphone buffer is filled with a new 512 samples?
Can anyone suggest what I may be doing wrong? I've been stuck on this for a long time.
Here is the post I've been using as a reference:
EZAudio: How do you separate the buffersize from the FFT window size(desire higher frequency bin resolution).
// Global variables which are bad but I'm just trying to make things work
float tempBuf[512];
float fftBuf[8192];
int samplesRemaining = 8192;
int samplestoCopy = 512;
int FFTLEN = 8192;
int fftBufIndex = 0;
#pragma mark - EZMicrophoneDelegate
- (void)microphone:(EZMicrophone *)microphone
  hasAudioReceived:(float **)buffer
    withBufferSize:(UInt32)bufferSize
withNumberOfChannels:(UInt32)numberOfChannels {
    // Copy the microphone buffer so it won't be changed
    memcpy(tempBuf, buffer[0], bufferSize);
    dispatch_async(dispatch_get_main_queue(), ^{
        // Set up the FFT if it's not already set up
        if (!_isFFTSetup) {
            [self createFFTWithBufferSize:FFTLEN withAudioData:fftBuf];
            _isFFTSetup = YES;
        }
        int samplesRemaining = FFTLEN;
        memcpy(fftBuf + fftBufIndex, tempBuf, samplestoCopy * sizeof(float));
        fftBufIndex += samplestoCopy;
        samplesRemaining -= samplestoCopy;
        if (fftBufIndex == FFTLEN)
        {
            fftBufIndex = 0;
            samplesRemaining = FFTLEN;
            [self updateFFTWithBufferSize:FFTLEN withAudioData:fftBuf];
        }
    });
}
You likely have threading issues because you are trying to do work in blocks that takes much, much longer than the time between audio callbacks. Your code is being called again before prior calls have finished (with the FFT setup or with clearing the FFT buffer).
Try doing the FFT setup outside the callback before starting the recording, only copy to a circular buffer or FIFO inside the callback, and do the FFT in code async to the callback (not locked in the same block as the circular buffer copy).
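A minimal sketch of that pattern (not EZAudio's API; the names fifo, kFFTLen, and performFFT are illustrative) might look like the following. It ignores the case where a new window fills before the previous FFT finishes, and a stricter real-time design would replace the dispatch_async with a lock-free ring buffer drained from another thread:

#include <string.h>
#include <dispatch/dispatch.h>

#define kFFTLen 8192

static float fifo[kFFTLen];       // accumulates 512-sample callbacks until a full window exists
static int   fifoFill = 0;
static float fftInput[kFFTLen];   // snapshot handed to the FFT

// Called on the audio thread: copy only, no allocation, no FFT.
void audioCallback(const float *samples, int count)
{
    int toCopy = count;
    if (fifoFill + toCopy > kFFTLen)
        toCopy = kFFTLen - fifoFill;
    memcpy(fifo + fifoFill, samples, toCopy * sizeof(float));
    fifoFill += toCopy;

    if (fifoFill == kFFTLen) {
        memcpy(fftInput, fifo, sizeof(fftInput));
        fifoFill = 0;
        dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
            // performFFT(fftInput, kFFTLen);   // hypothetical FFT routine, run off the audio thread
        });
    }
}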

Varispeed with Libsndfile, Libsamplerate and Portaudio in C

I'm working on an audio visualizer in C with OpenGL, libsamplerate, PortAudio, and libsndfile. I'm having difficulty using src_process correctly within my overall design. My goal is to use src_process to achieve vinyl-like varispeed in real time within the visualizer. Right now my implementation changes the pitch of the audio without changing the speed, and it does so with a lot of distortion, apparently from missing frames: when I lower the speed with the src_ratio it almost sounds granular, like chopped-up samples. I keep experimenting with my buffering chunks, but 9 times out of 10 I get a libsamplerate error saying my input and output arrays are overlapping. I've also been looking at the speed-change example that came with libsamplerate and I can't find where I went wrong. Any help would be appreciated.
Here's the code I believe is relevant. Let me know if I can be more specific; this semester was my first experience with C and programming.
#define FRAMES_PER_BUFFER 1024
#define ITEMS_PER_BUFFER (FRAMES_PER_BUFFER * 2)
float src_inBuffer[ITEMS_PER_BUFFER];
float src_outBuffer[ITEMS_PER_BUFFER];
void initialize_SRC_DATA()
{
    data.src_ratio = 1;                                      //Sets Default Playback Speed
    /*---------------*/
    data.src_data.data_in = data.src_inBuffer;               //Point to SRC inBuffer
    data.src_data.data_out = data.src_outBuffer;             //Point to SRC OutBuffer
    data.src_data.input_frames = 0;                          //Start with Zero to Force Load
    data.src_data.output_frames = ITEMS_PER_BUFFER
                                  / data.sfinfo1.channels;   //Number of Frames to Write Out
    data.src_data.src_ratio = data.src_ratio;                //Sets Default Playback Speed
}
/* Open audio stream */
err = Pa_OpenStream( &g_stream,
NULL,
&outputParameters,
data.sfinfo1.samplerate,
FRAMES_PER_BUFFER,
paNoFlag,
paCallback,
&data );
/* Read FramesPerBuffer Amount of Data from inFile into buffer[] */
numberOfFrames = sf_readf_float(data->inFile, data->src_inBuffer, framesPerBuffer);

/* Looping of inFile if EOF is Reached */
if (numberOfFrames < framesPerBuffer)
{
    sf_seek(data->inFile, 0, SEEK_SET);
    numberOfFrames = sf_readf_float(data->inFile,
                                    data->src_inBuffer + (numberOfFrames * data->sfinfo1.channels),
                                    framesPerBuffer - numberOfFrames);
}

/* Inform SRC Data How Many Input Frames To Process */
data->src_data.end_of_input = 0;
data->src_data.input_frames = numberOfFrames;

/* Perform SRC Modulation, Processed Samples are in src_outBuffer[] */
if ((data->src_error = src_process(data->src_state, &data->src_data))) {
    printf("\nError : %s\n\n", src_strerror(data->src_error));
    exit(1);
}

/* Write Processed SRC Data to Audio Out and Visual Out */
for (i = 0; i < framesPerBuffer * data->sfinfo1.channels; i++)
{
    // gl_audioBuffer[i] = data->src_outBuffer[i] * data->amplitude;
    out[i] = data->src_outBuffer[i] * data->amplitude;
}
I figured out a solution that works well enough for me, and I'll explain it as best I can for anyone else with a similar issue. The way the API works is that you give it a certain number of input frames and it produces a certain number of output frames. For an SRC ratio of 0.5, if you want 512 output frames per loop you have to feed in 512 / 0.5 = 1024 input frames; when src_process runs, it compresses those 1024 frames into 512, speeding up the playback. I don't fully understand why this solved my issue, but the problem was that with a ratio of, say, 0.7, the required input count becomes a non-integer, which doesn't work with integer array indices, so samples go missing at the end of each block unless framesPerBuffer divides evenly by the SRC ratio. What I did was read 2 extra frames whenever framesPerBuffer % src_ratio != 0, and that seemed to fix 99% of the glitches.
/* This if Statement Ensures Smooth VariSpeed Output */
if (fmod((double)framesPerBuffer, data->src_data.src_ratio) == 0)
{
    numInFrames = framesPerBuffer;
}
else
{
    numInFrames = (framesPerBuffer / data->src_data.src_ratio) + 2;
}
/* Read FramesPerBuffer Amount of Data from inFile into buffer[] */
numberOfFrames = sf_readf_float(data->inFile, data->src_inBuffer, numInFrames);
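The same relationship can be written without the special case by always rounding the required input frame count up. This is just a sketch of the arithmetic described above, not code from the original post:

#include <math.h>

/* For a desired number of output frames, libsamplerate consumes roughly
   outputFrames / srcRatio input frames; rounding up avoids starving the
   converter when the division is not exact. */
long inputFramesNeeded(long outputFrames, double srcRatio)
{
    return (long)ceil((double)outputFrames / srcRatio);
}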

How do I interpret an AudioBuffer and get the power?

I am trying to make a volume-meter for my app, which will show while recording a video. I have found a lot of support for such meters for iOS, but mostly for AVAudioPlayer, which is no option for me. I am using AVCaptureSession to record, and will then end up with the delegate method shown below:
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    CMFormatDescriptionRef formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer);
    CFRetain(sampleBuffer);
    CFRetain(formatDescription);

    if (connection == audioConnection)
    {
        CMBlockBufferRef blockBuffer;
        AudioBufferList audioBufferList;
        CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer,
                                                                NULL,
                                                                &audioBufferList,
                                                                sizeof(AudioBufferList),
                                                                NULL,
                                                                NULL,
                                                                kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
                                                                &blockBuffer);
        SInt16 *data = audioBufferList.mBuffers[0].mData;
    }
    // Releases etc.
}
(Only showing relevant code)
From what I understand, I receive a 'sample buffer' containing either audio or video. Once I've verified that the connection is indeed audio, I 'extract' the audioBufferList from the buffer, and I'm left with a list of one (or more?) audioBuffers. The actual data is, as I understand it, represented as SInt16, i.e. a 16-bit signed integer, which as far as I know has a range from -32,768 to 32,767. However, if I simply print out the received values, I get A LOT of bouncing numbers. In "silence" I get values bouncing rapidly between -200 and 200, and when there's noise I get values from -4,000 to 13,000, all over the place.
From what I've read, the value 0 represents silence. However, I do not understand the difference between negative and positive values, nor do I know whether they are able to reach all the way up/down to ±32,768.
I believe I need a percentage of how 'loud' it is, but have been unable to find anything.
I have read a couple of tutorials and references on the matter, but nothing makes sense to me. I followed one guide by doing this (appended to the code above, inside the if):
float accumulator = 0;
for (int i = 0; i < audioBufferList.mBuffers[0].mDataByteSize; i++)
    accumulator += data[i] * data[i];
float power = accumulator / audioBufferList.mBuffers[0].mDataByteSize;
float decibels = log10f(power);
NSLog(@"%f", decibels);
Apparently, this code was supposed to produce values between -1 and +1, but that did not happen. I am now getting values around 6.194681 in silence, and 7.773492 for some noise. This feels like the correct 'range', but in the 'wrong place'. I can't simply subtract 7 from the number and assume I'm between -1 and +1. There should be some logic and science behind how this should work, but I do not know enough about how digital audio works.
Does anyone know the logic behind this? Is 0 always silence while -32,768 and 32,767 are loud noises? Can I then simply multiply all negative values by -1 to always get positive values, and then find out what percentage they are at (between 0 and 32,767)? Somehow, I don't believe this will work, as I guess there is a reason for the negative values... I'm not completely sure what to try.
The code in your question is wrong in several ways. It tries to copy the approach from the article below, but you haven't properly converted the float-based code in the article to 16-bit integer math. You're also looping over the wrong number of values (your loop bound is the byte count, not the sample count), so you end up pulling in garbage data. So this is all kinds of wrong.
https://www.mikeash.com/pyblog/friday-qa-2012-10-12-obtaining-and-interpreting-audio-data.html
The code in the article is correct. Here's what it is, expanded a bit. This is only looking at the first buffer in a 32-bit float buffer list.
float accumulator = 0;
AudioBuffer buffer = bufferList->mBuffers[0];
float * data = (float *)buffer.mData;
UInt32 numSamples = buffer.mDataByteSize / sizeof(float);
for (UInt32 i = 0; i < numSamples; i++) {
    accumulator += data[i] * data[i];
}
float power = accumulator / (float)numSamples;
float decibels = 10 * log10f(power);
As the article says, the result here is decibels using a 0 dB reference, i.e. 0.0 is the maximum value. This is the same thing that AVAudioPlayer's averagePowerForChannel returns, for example.
To use this in your 16-bit integer context, you'd need to a) loop appropriately through each 16-bit sample, b) convert the data[i] value from a 16-bit integer to a floating point value in the [-1.0, 1.0] range before squaring and adding to the accumulator.
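A minimal sketch of that adaptation (not from the original answer), assuming 16-bit signed integer samples in the first buffer:

#include <math.h>
#include <AudioToolbox/AudioToolbox.h>

// Same power calculation, adapted for 16-bit signed integer samples.
float powerInDecibels(AudioBuffer buffer)
{
    SInt16 *samples = (SInt16 *)buffer.mData;
    UInt32 numSamples = buffer.mDataByteSize / sizeof(SInt16);   // bytes -> samples
    if (numSamples == 0) return -INFINITY;

    float accumulator = 0;
    for (UInt32 i = 0; i < numSamples; i++) {
        float sample = samples[i] / 32768.0f;   // scale into [-1.0, 1.0] before squaring
        accumulator += sample * sample;
    }
    float power = accumulator / (float)numSamples;
    return 10 * log10f(power);                  // 0 dB = full scale; silence tends toward -inf
}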

FFT and accelerometer data: why am I getting this output?

I have read various posts here at StackOverflow regarding the execution of FFT on accelerometer data, but none of them helped me understand my problem.
I am executing this FFT implementation on my accelerometer data array in the following way:
int length = data.size();
double[] re = new double[256];
double[] im = new double[256];
for (int i = 0; i < length; i++) {
    re[i] = data[i];
}

FFT fft = new FFT(256);
fft.fft(re, im);

float outputData[] = new float[256];
for (int i = 0; i < 128; i++) {
    outputData[i] = (float) Math.sqrt(re[i] * re[i] + im[i] * im[i]);
}
I plotted the contents of outputData (left), and also used R to perform the FFT on my data (right).
What am I doing wrong here? I am using the same code for executing the FFT that I see in other places.
EDIT: Following the advice of @PaulR to apply a windowing function, and the link provided by @BjornRoche (http://baumdevblog.blogspot.com.br/2010/11/butterworth-lowpass-filter-coefficients.html), I was able to solve my problem. The solution is pretty much what is described in that link. This is my graph now: http://imgur.com/wGs43
The low frequency artefacts are probably due to a lack of windowing. Try applying a window function.
The overall shift is probably due to different scaling factors in the two different FFT implementations - my guess is that you are seeing a shift of 24 dB which corresponds to a difference in scaling by a factor of 256.
Because all of your data on the left is above 0, from a frequency-analysis point of view it has a large DC component. After the FFT, that DC component shows up as a very large value in bin 0. For your case, you only need to remove the DC component and keep the part of the signal above 0 Hz (the AC signal).
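Both suggestions, removing the DC component and applying a window, amount to a small amount of preprocessing before the FFT. Here is a minimal sketch (written in C for consistency with the rest of this page; the function name is illustrative):

#include <math.h>

// Subtract the mean (DC offset) and apply a Hann window in place before the FFT.
void prepareForFFT(double *samples, int n)
{
    double mean = 0.0;
    for (int i = 0; i < n; i++)
        mean += samples[i];
    mean /= n;

    for (int i = 0; i < n; i++) {
        double w = 0.5 * (1.0 - cos(2.0 * M_PI * i / (n - 1)));   // Hann window
        samples[i] = (samples[i] - mean) * w;
    }
}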
