I'm trying to generate a spectrogram from an AVAudioPCMBuffer in Swift. I install a tap on an AVAudioMixerNode and receive a callback with the audio buffer. I'd like to convert the signal in the buffer to a [Float:Float] dictionary where the key represents the frequency and the value represents the magnitude of the audio on the corresponding frequency.
I tried using Apple's Accelerate framework but the results I get seem dubious. I'm sure it's just in the way I'm converting the signal.
I looked at this blog post amongst other things for a reference.
Here is what I have:
self.audioEngine.mainMixerNode.installTapOnBus(0, bufferSize: 1024, format: nil, block: { buffer, when in
let bufferSize: Int = Int(buffer.frameLength)
// Set up the transform
let log2n = UInt(round(log2(Double(bufferSize))))
let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))
// Create the complex split value to hold the output of the transform
var realp = [Float](count: bufferSize/2, repeatedValue: 0)
var imagp = [Float](count: bufferSize/2, repeatedValue: 0)
var output = DSPSplitComplex(realp: &realp, imagp: &imagp)
// Now I need to convert the signal from the buffer to complex value, this is what I'm struggling to grasp.
// The complexValue should be UnsafePointer<DSPComplex>. How do I generate it from the buffer's floatChannelData?
vDSP_ctoz(complexValue, 2, &output, 1, UInt(bufferSize / 2))
// Do the fast Fourier forward transform
vDSP_fft_zrip(fftSetup, &output, 1, log2n, Int32(FFT_FORWARD))
// Convert the complex output to magnitude
var fft = [Float](count:Int(bufferSize / 2), repeatedValue:0.0)
vDSP_zvmags(&output, 1, &fft, 1, vDSP_Length(bufferSize / 2))
// Release the setup
vDSP_destroy_fftsetup(fftSetup)
// TODO: Convert fft to [Float:Float] dictionary of frequency vs magnitude. How?
})
My questions are:
1. How do I convert the buffer.floatChannelData to UnsafePointer<DSPComplex> to pass to the vDSP_ctoz function? Is there a different/better way to do it, maybe even bypassing vDSP_ctoz?
2. Is this different if the buffer contains audio from multiple channels? How is it different when the buffer audio channel data is or isn't interleaved?
3. How do I convert the indices in the fft array to frequencies in Hz?
4. Anything else I may be doing wrong?
Update
Thanks everyone for suggestions. I ended up filling the complex array as suggested in the accepted answer. When I plot the values and play a 440 Hz tone on a tuning fork it registers exactly where it should.
Here is the code to fill the array:
var channelSamples: [[DSPComplex]] = []
for var i=0; i<channelCount; ++i {
channelSamples.append([])
let firstSample = buffer.format.interleaved ? i : i*bufferSize
for var j=firstSample; j<bufferSize; j+=buffer.stride*2 {
channelSamples[i].append(DSPComplex(real: buffer.floatChannelData.memory[j], imag: buffer.floatChannelData.memory[j+buffer.stride]))
}
}
The channelSamples array then holds a separate array of samples for each channel.
To calculate the magnitude I used this:
var spectrum = [Float]()
for var i=0; i<bufferSize/2; ++i {
let imag = out.imagp[i]
let real = out.realp[i]
let magnitude = sqrt(pow(real,2)+pow(imag,2))
spectrum.append(magnitude)
}
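A vectorized alternative to this loop (a sketch, assuming the same out split complex and bufferSize as above) is vDSP_zvabs, which computes all the complex magnitudes in one call:
var spectrum = [Float](count: bufferSize / 2, repeatedValue: 0.0)
// vDSP_zvabs computes sqrt(realp[i]^2 + imagp[i]^2) for each element in one pass
vDSP_zvabs(&out, 1, &spectrum, 1, vDSP_Length(bufferSize / 2))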
1: Hacky way: you can just cast a float array in which real and imaginary values alternate one after another.
2: It depends on whether the audio is interleaved or not. If it's interleaved (most cases), the left and right channels are in the array with a STRIDE of 2.
3: The lowest frequency in your case is the frequency of a period of 1024 samples. At a 44100 Hz sample rate that's ~23 ms, so the lowest non-zero frequency of the spectrum is 1/(1024/44100) (~43 Hz). The bins above it are linearly spaced multiples of that: the next one is twice it (~86 Hz), then ~129 Hz, and so on, i.e. bin k corresponds to k * 44100 / 1024 Hz (see the dictionary sketch after this list).
4: You have installed a callback handler on an audio bus. This likely runs frequently and with real-time thread priority. You should not do anything that has the potential to block (it will likely result in priority inversion and glitchy audio):
Allocate memory (realp and imagp: [Float](count:..., repeatedValue:...) is shorthand for Array<Float>, and is likely allocated on the heap). Pre-allocate these instead.
Call lengthy operations such as vDSP_create_fftsetup(), which also allocates memory and initialises it. Again, you can allocate this once outside of your function (see the pre-allocation sketch below).
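To then build the [Float:Float] dictionary of frequency vs magnitude that the question asks for, a minimal sketch (my illustration, assuming the spectrum array and bufferSize from the update above, and reading the sample rate from the buffer rather than hard-coding 44100):
let sampleRate = Float(buffer.format.sampleRate)
var frequencyMagnitudes = [Float: Float]()
for i in 0..<spectrum.count {
    // bin i corresponds to i * sampleRate / bufferSize Hz
    frequencyMagnitudes[Float(i) * sampleRate / Float(bufferSize)] = spectrum[i]
}
And a minimal sketch of the pre-allocation point (my arrangement; the names match the question's code):
// Create these once, e.g. next to the audioEngine property, and reuse them in every tap callback
let bufferSize = 1024
let log2n = UInt(round(log2(Double(bufferSize))))
let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))
var realp = [Float](count: bufferSize / 2, repeatedValue: 0)
var imagp = [Float](count: bufferSize / 2, repeatedValue: 0)
// ...install the tap; inside the block only the vDSP calls then run on the real-time thread...
// Call vDSP_destroy_fftsetup(fftSetup) once when tearing the engine down.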
Related
I'm trying to make a simple 3D modeling tool.
Part of that is moving a vertex (or vertices) to transform the model.
I used a dynamic vertex buffer because I thought it would need frequent updates,
but performance is very poor on a high-polygon model even though I change just one vertex.
Are there other methods, or am I doing this the wrong way?
Here is my D3D11_BUFFER_DESC:
Usage = D3D11_USAGE_DYNAMIC;
CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
BindFlags = D3D11_BIND_VERTEX_BUFFER;
ByteWidth = sizeof(ST_Vertex) * _nVertexCount;
D3D11_SUBRESOURCE_DATA d3dBufferData;
d3dBufferData.pSysMem = pVerticesInfo;
hr = pd3dDevice->CreateBuffer(&descBuffer, &d3dBufferData, &_pVertexBuffer);
And my update function:
D3D11_MAPPED_SUBRESOURCE d3dMappedResource;
pImmediateContext->Map(_pVertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &d3dMappedResource);
ST_Vertex* pBuffer = (ST_Vertex*)d3dMappedResource.pData;
for (int i = 0; i < vIndice.size(); ++i)
{
pBuffer[vIndice[i]].xfPosition.x = pVerticesInfo[vIndice[i]].xfPosition.x;
pBuffer[vIndice[i]].xfPosition.y = pVerticesInfo[vIndice[i]].xfPosition.y;
pBuffer[vIndice[i]].xfPosition.z = pVerticesInfo[vIndice[i]].xfPosition.z;
}
pImmediateContext->Unmap(_pVertexBuffer, 0);
As mentioned in the previous answer, you are updating your whole buffer every time, which will be slow depending on model size.
The solution is indeed to implement partial updates. There are two cases: either you want to update a single vertex, or you want to update
arbitrary indices (for example, you want to move N vertices in one go, at different locations, like vertices 1, 20, and 23).
The first solution is rather simple. First, create your buffer with the following description:
Usage = D3D11_USAGE_DEFAULT;
CPUAccessFlags = 0;
BindFlags = D3D11_BIND_VERTEX_BUFFER;
ByteWidth = sizeof(ST_Vertex) * _nVertexCount;
D3D11_SUBRESOURCE_DATA d3dBufferData;
d3dBufferData.pSysMem = pVerticesInfo;
hr = pd3dDevice->CreateBuffer(&descBuffer, &d3dBufferData, &_pVertexBuffer);
This makes sure your vertex buffer is GPU-visible only.
Next create a second dynamic buffer with the size of a single vertex (you do not need any bind flags in that case, as it will be used only for copies):
_pCopyVertexBuffer
Usage = D3D11_USAGE_DYNAMIC; //Staging works as well
CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
BindFlags = 0;
ByteWidth = sizeof(ST_Vertex);
// No initial data is needed here: pass NULL for pInitialData
// (a D3D11_SUBRESOURCE_DATA with a NULL pSysMem would make CreateBuffer fail)
hr = pd3dDevice->CreateBuffer(&descBuffer, NULL, &_pCopyVertexBuffer);
When you move a vertex, copy the changed vertex into the copy buffer:
ST_Vertex changedVertex;
D3D11_MAPPED_SUBRESOURCE d3dMappedResource;
// Map the small dynamic copy buffer; the DEFAULT-usage vertex buffer itself cannot be mapped
pImmediateContext->Map(_pCopyVertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &d3dMappedResource);
ST_Vertex* pBuffer = (ST_Vertex*)d3dMappedResource.pData;
pBuffer->xfPosition.x = changedVertex.xfPosition.x;
pBuffer->xfPosition.y = changedVertex.xfPosition.y;
pBuffer->xfPosition.z = changedVertex.xfPosition.z;
pImmediateContext->Unmap(_pCopyVertexBuffer, 0);
Since you use D3D11_MAP_WRITE_DISCARD, make sure to write all attributes there (not only position).
Now once you're done, you can use ID3D11DeviceContext::CopySubresourceRegion to copy only the modified vertex into its location in the real vertex buffer.
I assume that vertexID is the index of the modified vertex:
pd3DeviceContext->CopySubresourceRegion(_pVertexBuffer,
0, //must be 0
vertexID * sizeof(ST_Vertex), //location of the vertex in your gpu vertex buffer
0, //must be 0
0, //must be 0
_pCopyVertexBuffer,
0, //must be 0
NULL //in this case we copy the full content of _pCopyVertexBuffer, so we can set to null
);
Now if you want to update a list of vertices, things get more complicated and you have several options:
-First, you apply this single-vertex technique in a loop. This will work quite well if your changeset is small.
-If your changeset is very big (close to the full vertex count), you can probably rewrite the whole buffer instead.
-An intermediate technique is to use a compute shader to perform the updates (that's the one I normally use, as it's the most flexible version).
Posting all the C++ binding code would be way too long, but here is the concept:
your vertex buffer must have BindFlags = D3D11_BIND_VERTEX_BUFFER | D3D11_BIND_UNORDERED_ACCESS; //this allows writes from compute
you need to create an ID3D11UnorderedAccessView for this buffer (so the shader can write to it)
you need the following misc flag: D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS //this allows writing to it as a RWByteAddressBuffer
you then create two dynamic structured buffers (I prefer those over byte-address buffers, but vertex buffer plus structured is not allowed in dx11, so for the write target you need raw instead)
the first structured buffer has a stride of ST_Vertex (this is your changeset)
the second structured buffer has a stride of 4 (uint; these are the indices)
both structured buffers get an arbitrary element count (normally I use 1024 or 2048), which will be the maximum number of vertices you can update in a single pass
for both structured buffers you need an ID3D11ShaderResourceView (shader-visible, read-only)
The update process is then the following:
write the modified vertices and locations into the structured buffers (using map discard; it's ok if you copy less than the full size)
attach both structured buffers for read
attach ID3D11UnorderedAccessView for write
set your compute shader
call dispatch
detach ID3D11UnorderedAccessView for write (this is VERY important)
This is a sample compute shader (I assume your vertex is position-only, for simplicity):
cbuffer cbUpdateCount : register(b0)
{
uint updateCount;
};
RWByteAddressBuffer RWVertexPositionBuffer : register(u0);
StructuredBuffer<float3> ModifiedVertexBuffer : register(t0);
StructuredBuffer<uint> ModifiedVertexIndicesBuffer : register(t1);
//this is the stride of your vertex buffer, since here we use float3 it is 12 bytes
#define WRITE_STRIDE 12
[numthreads(64, 1, 1)]
void CS( uint3 tid : SV_DispatchThreadID )
{
//make sure you do not go past the element count, as we run 64 threads at a time
if (tid.x >= updateCount) { return; }
uint readIndex = tid.x;
uint writeIndex = ModifiedVertexIndicesBuffer[readIndex];
float3 vertex = ModifiedVertexBuffer[readIndex];
//byte address buffers do not understand floats; asuint is a binary cast.
RWVertexPositionBuffer.Store3(writeIndex * WRITE_STRIDE, asuint(vertex));
}
For the purposes of this question I'm going to assume you already have a mechanism for selecting a vertex from a list of vertices based upon ray casting or some other picking method and a mechanism for creating a displacement vector detailing how the vertex was moved in model space.
The method you have for updating the buffer is sufficient for anything less than a few hundred vertices, but on large scale models it becomes extremely slow. This is because you're updating everything, rather than the individual vertices you modified.
To fix this, you should only update the vertices you have changed, and to do that you need to create a change set.
In concept, a change set is nothing more than a set of changes made to the data - a list of the vertices that need to be updated. Since we already know which vertices were modified (otherwise we couldn't have manipulated them), we can map in the GPU buffer, go to that vertex specifically, and copy just those vertices into the GPU buffer.
In your vertex modification method, record the index of the vertex that was modified by the user:
//Modify the vertex coordinates based on mouse displacement
pVerticesInfo[SelectedVertexIndex].xfPosition.x += DisplacementVector.x;
pVerticesInfo[SelectedVertexIndex].xfPosition.y += DisplacementVector.y;
pVerticesInfo[SelectedVertexIndex].xfPosition.z += DisplacementVector.z;
//Add the changed vertex to the list of changes.
changedVertices.add(SelectedVertexIndex);
//And update the GPU buffer
UpdateD3DBuffer();
In UpdateD3DBuffer(), do the following:
D3D11_MAPPED_SUBRESOURCE d3dMappedResource;
// Dynamic buffers only allow WRITE_DISCARD or WRITE_NO_OVERWRITE; NO_OVERWRITE keeps the untouched vertices intact
pImmediateContext->Map(_pVertexBuffer, 0, D3D11_MAP_WRITE_NO_OVERWRITE, 0, &d3dMappedResource);
ST_Vertex* pBuffer = (ST_Vertex*)d3dMappedResource.pData;
for (int i = 0; i < changedVertices.size(); ++i)
{
pBuffer[changedVertices[i]].xfPosition.x = pVerticesInfo[changedVertices[i]].xfPosition.x;
pBuffer[changedVertices[i]].xfPosition.y = pVerticesInfo[changedVertices[i]].xfPosition.y;
pBuffer[changedVertices[i]].xfPosition.z = pVerticesInfo[changedVertices[i]].xfPosition.z;
}
pImmediateContext->Unmap(_pVertexBuffer, 0);
changedVertices.clear();
This has the effect of only updating the vertices that have changed, rather than all vertices in the model.
This also allows for some more complex manipulations. You can select multiple vertices and move them all as a group, select a whole face and move all the connected vertices, or move entire regions of the model relatively easily, assuming your picking method is capable of handling this.
In addition, if you record the change sets with enough information (the affected vertices and the displacement vector), you can fairly easily implement an undo function by simply reversing the displacement vector and reapplying it to the selected change set.
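One way to picture such a change-set record (my illustration, sketched in Swift; a real tool would store ST_Vertex-style data, but the shape of the idea is the same):
import simd

struct ChangeSet {
    let indices: [Int]              // the affected vertices
    let displacement: SIMD3<Float>  // how the group was moved
}

// Undo: reverse the displacement vector and reapply it to the same vertices
func undo(_ change: ChangeSet, vertices: inout [SIMD3<Float>]) {
    for i in change.indices {
        vertices[i] -= change.displacement
    }
}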
I am trying to make a volume-meter for my app, which will show while recording a video. I have found a lot of support for such meters for iOS, but mostly for AVAudioPlayer, which is no option for me. I am using AVCaptureSession to record, and will then end up with the delegate method shown below:
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
CMFormatDescriptionRef formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer);
CFRetain(sampleBuffer);
CFRetain(formatDescription);
if(connection == audioConnection)
{
CMBlockBufferRef blockBuffer;
AudioBufferList audioBufferList;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer,
NULL, &audioBufferList, sizeof(AudioBufferList), NULL, NULL,
kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
&blockBuffer);
SInt16 *data = audioBufferList.mBuffers[0].mData;
}
//Releases etc..
}
(Only showing relevant code)
Of what I understand, I receive a 'sample buffer', containing either audio or video. Once I've verified that the connection indeed is audio, then I 'extract' the audioBufferList from the buffer, and I am sitting here left with a list of one (or more?) audioBuffers. The actual data is, as I understand, represented as SInt16, or '16 bits signed integer', which as far as I understand has a range from -32,768 to 32,767. However, if I simply print out this received value, I get A LOT of bouncing numbers. When in "silence" I get values bouncing rapidly between -200 and 200, and when there's noise I get values from -4,000 to 13,000, completely out of order.
As I've understood from reading, the value 0 represents silence. However, I do not understand the difference between negative and positive values, and I do not know whether they are able to reach all the way up/down to the -32,768 and +32,767 extremes.
I believe I need a percentage of how 'loud' it is, but have been unable to find anything.
I have read a couple of tutorials and references on the matter, but nothing makes sense to me. I followed one guide by doing this (appending to the code above, inside the if):
float accumulator = 0;
for(int i = 0; i < audioBufferList.mBuffers[0].mDataByteSize; i++)
accumulator += data[i] * data[i];
float power = accumulator / audioBufferList.mBuffers[0].mDataByteSize;
float decibels = log10f(power);
NSLog(@"%f", decibels);
Apparently, this code was supposed to align from -1 to +1, but that did not happen. I am now getting values around 6.194681 in silence, and 7.773492 for some noise. This feels like the correct 'range', but in the 'wrong place'. I can't simply subtract 7 from the number and assume I'm between -1 and +1. There should be some logic and science behind how this should work, but I do not know enough about how digital audio works.
Does anyone know the logic behind this? Is 0 always silence while -32,768 and 32,767 are loud noises? Can I then simply multiply all negative values by -1 to always get positive values, and then find out what percentage they are at (between 0 and 32,767)? Somehow, I don't believe this will work, as I guess there is a reason for the negative values. I'm not completely sure what to try.
The code in your question is wrong in several ways. It tries to copy the approach from the article below, but you haven't properly handled the conversion from the article's float-based code to 16-bit integer math. You're also looping over the wrong number of values (the byte count rather than the sample count) and will end up pulling in garbage data. So this is all kinds of wrong.
https://www.mikeash.com/pyblog/friday-qa-2012-10-12-obtaining-and-interpreting-audio-data.html
The code in the article is correct. Here's what it is, expanded a bit. This is only looking at the first buffer in a 32-bit float buffer list.
float accumulator = 0;
AudioBuffer buffer = bufferList->mBuffers[0];
float * data = (float *)buffer.mData;
UInt32 numSamples = buffer.mDataByteSize / sizeof(float);
for (UInt32 i = 0; i < numSamples; i++) {
accumulator += data[i] * data[i];
}
float power = accumulator / (float)numSamples;
float decibels = 10 * log10f(power);
As the article says, the result here is decibels relative to a 0 dB reference, i.e. 0.0 is the maximum value. This is the same thing that AVAudioPlayer's averagePowerForChannel returns, for example.
To use this in your 16-bit integer context, you'd need to a) loop appropriately through each 16-bit sample, and b) convert each data[i] value from a 16-bit integer to a floating-point value in the [-1.0, 1.0] range before squaring and adding it to the accumulator.
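A sketch of that adaptation (my own illustration, written in Swift; samples stands for the SInt16 pointer the question calls data):
let byteSize = Int(audioBufferList.mBuffers[0].mDataByteSize)
let numSamples = byteSize / 2                 // (a) two bytes per 16-bit sample: loop over samples, not bytes
var accumulator: Float = 0
for i in 0..<numSamples {
    let sample = Float(samples[i]) / 32768.0  // (b) normalize the 16-bit value into [-1.0, 1.0]
    accumulator += sample * sample
}
let power = accumulator / Float(numSamples)
let decibels = 10 * log10(power)              // 0 dB is full scale, as with averagePowerForChannel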
I am trying to display a spectrum analyser for iOS and am stuck after two weeks. I have read pretty much every post about FFT and the Accelerate Frameworks on here and have downloaded the aurioTouch2 example from Apple.
I think I understand the mechanism of FFT (did it in Uni 20 years ago) and am a fairly experienced iOS programmer but I have hit a wall.
I am using AudioUnit to play mp3, m4a, and wav files and have that working beautifully. I have attached a Render Callback to the AUGraph and I can plot Waveforms to the music. The waveform goes with the music nicely.
When I take the data from the Render Callback, which is in Float form in the range 0..1, and attempt to pass it through the FFT code (either my own or aurioTouch2's FFTBufferManager.mm), I get something that's not completely wrong, but is not correct either. For instance, this is a 440 Hz sine wave:
The peak value is -6.1306, followed by -24, -31, -35, and the values towards the end are around -63.
Animated gif for "Black Betty":
Animated gif for "Black Betty
The format I receive from the Render callback:
AudioStreamBasicDescription outputFileFormat;
outputFileFormat.mSampleRate = 44100;
outputFileFormat.mFormatID = kAudioFormatLinearPCM;
outputFileFormat.mFormatFlags = kAudioFormatFlagsNativeFloatPacked | kAudioFormatFlagIsNonInterleaved;
outputFileFormat.mBitsPerChannel = 32;
outputFileFormat.mChannelsPerFrame = 2;
outputFileFormat.mFramesPerPacket = 1;
outputFileFormat.mBytesPerFrame = outputFileFormat.mBitsPerChannel / 8;
outputFileFormat.mBytesPerPacket = outputFileFormat.mBytesPerFrame;
Looking at the aurioTouch2 example, it looks like they receive their data in a signed int format and then run an AudioConverter to convert it to Float. Their format is hard to decipher but uses a macro:
drawFormat.SetAUCanonical(2, false);
drawFormat.mSampleRate = 44100;
XThrowIfError(AudioConverterNew(&thruFormat, &drawFormat, &audioConverter), "couldn't setup AudioConverter");
In their render callback they copy the data out of the AudioBufferList into mAudioBuffer (Float32*) and pass it to the CalculateFFT method, which calls vDSP_ctoz:
//Generate a split complex vector from the real data
vDSP_ctoz((COMPLEX *)mAudioBuffer, 2, &mDspSplitComplex, 1, mFFTLength);
I think this is where my problem is. What format does vDSP_ctoz expect? It is cast as a (COMPLEX*), but I cannot find anything in the aurioTouch2 code which puts the mAudioBuffer data into the (COMPLEX*) format. So it must be coming from the Render Callback in this format?
typedef struct DSPComplex {
float real;
float imag;
} DSPComplex;
typedef DSPComplex COMPLEX;
If I don't have the format correct at this point (or understand the format) then there is no point in debugging the rest of it.
Any help would be greatly appreciated.
Code from AurioTouch2 that I am using:
Boolean FFTBufferManager::ComputeFFTFloat(Float32 *outFFTData)
{
if (HasNewAudioData())
{
// Added after Hotpaw2 comment.
UInt32 windowSize = mFFTLength;
Float32 *window = (float *) malloc(windowSize * sizeof(float));
memset(window, 0, windowSize * sizeof(float));
vDSP_hann_window(window, windowSize, 0);
vDSP_vmul( mAudioBuffer, 1, window, 1, mAudioBuffer, 1, mFFTLength);
// Added after Hotpaw2 comment.
DSPComplex *audioBufferComplex = new DSPComplex[mFFTLength];
for (int i=0; i < mFFTLength; i++)
{
audioBufferComplex[i].real = mAudioBuffer[i];
audioBufferComplex[i].imag = 0.0f;
}
//Generate a split complex vector from the real data
vDSP_ctoz((COMPLEX *)audioBufferComplex, 2, &mDspSplitComplex, 1, mFFTLength);
//Take the fft and scale appropriately
vDSP_fft_zrip(mSpectrumAnalysis, &mDspSplitComplex, 1, mLog2N, kFFTDirection_Forward);
vDSP_vsmul(mDspSplitComplex.realp, 1, &mFFTNormFactor, mDspSplitComplex.realp, 1, mFFTLength);
vDSP_vsmul(mDspSplitComplex.imagp, 1, &mFFTNormFactor, mDspSplitComplex.imagp, 1, mFFTLength);
//Zero out the nyquist value
mDspSplitComplex.imagp[0] = 0.0;
//Convert the fft data to dB
vDSP_zvmags(&mDspSplitComplex, 1, outFFTData, 1, mFFTLength);
//In order to avoid taking log10 of zero, an adjusting factor is added in to make the minimum value equal -128dB
vDSP_vsadd( outFFTData, 1, &mAdjust0DB, outFFTData, 1, mFFTLength);
Float32 one = 1;
vDSP_vdbcon(outFFTData, 1, &one, outFFTData, 1, mFFTLength, 0);
delete[] audioBufferComplex; // allocated with new[], so it must be released with delete[], not free()
free(window);
OSAtomicDecrement32Barrier(&mHasAudioData);
OSAtomicIncrement32Barrier(&mNeedsAudioData);
mAudioBufferCurrentIndex = 0;
return true;
}
else if (mNeedsAudioData == 0)
OSAtomicIncrement32Barrier(&mNeedsAudioData);
return false;
}
After reading the answer below I tried adding this to the top of the method:
DSPComplex *audioBufferComplex = new DSPComplex[mFFTLength];
for (int i=0; i < mFFTLength; i++)
{
audioBufferComplex[i].real = mAudioBuffer[i];
audioBufferComplex[i].imag = 0.0f;
}
//Generate a split complex vector from the real data
vDSP_ctoz((COMPLEX *)audioBufferComplex, 2, &mDspSplitComplex, 1, mFFTLength);
And the result I got was this:
I am now rendering the last 5 results; they are the faded ones behind.
After adding the Hann window:
It now looks a lot better after applying the Hann window (thanks hotpaw2). I'm not worried about the mirror image.
My main problem now is that with a real song it doesn't look like other spectrum analysers. Everything is always pushed high on the left no matter what music I push through it. After applying the window it does seem to follow the beat a lot better, though.
The AU render callback only returns the real part of the complex input required. To use a complex FFT, you need to fill an equal number of imaginary components with zeros yourself, and copy over the elements of the real part, if needed.
I'm working on an app that should do some audio signal processing. I need to measure the audio level in each one of the buffers I get (through the Callback function). I've been searching the web for some time, and I found that there is a build-in property called Current level metering:
AudioQueueGetProperty(recordState->queue,kAudioQueueProperty_CurrentLevelMeter,meters,&dlen);
This property gets me the average or peak audio level, but it's not synchronised to the current buffer.
I figured out I need to calculate the audio level from the buffer data by myself, so I had this:
double calcAudioRMS (SInt16 * audioData, int numOfSamples)
{
double RMS, adPercent;
RMS = 0;
for (int i=0; i<numOfSamples; i++)
{
adPercent=audioData[i]/32768.0f;
RMS += adPercent*adPercent;
}
RMS = sqrt(RMS / numOfSamples);
return RMS;
}
This function gets the audio data (cast to SInt16) and the number of samples in the current buffer. The numbers I get are indeed between 0 and 1, but they seem rather random and low compared to the numbers I got from the built-in audio level metering.
The recording audio format is:
format->mSampleRate = 8000.0;
format->mFormatID = kAudioFormatLinearPCM;
format->mFramesPerPacket = 1;
format->mChannelsPerFrame = 1;
format->mBytesPerFrame = 2;
format->mBytesPerPacket = 2;
format->mBitsPerChannel = 16;
format->mReserved = 0;
format->mFormatFlags = kLinearPCMFormatFlagIsSignedInteger |kLinearPCMFormatFlagIsPacked;
My question is how to get the right values from the buffer? Is there a built-in function/property for this? Or should I calculate the audio level myself, and how do I do that?
Thanks in advance.
Your calculation for RMS power is correct. I'd be inclined to say that you have a smaller number of samples than Apple does, or something similar, and that would explain the difference. You can check by inputting a loud sine wave and verifying that both Apple and you calculate an RMS power of 1/sqrt(2) (a full-scale sine has an RMS amplitude of 1/sqrt(2) ≈ 0.707).
Unless there's a good reason, I would use Apple's power calculations. I've used them, and they seem good to me. Additionally, you generally don't want raw RMS power, you want RMS power as decibels; for that, use the kAudioQueueProperty_CurrentLevelMeterDB constant, or do the conversion yourself as in the sketch below. (This depends on whether you're trying to build an audio meter or truly display the audio power.)
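The usual conversion from your own RMS value to decibels looks like this (my illustration, in Swift; 20 * log10 because RMS is an amplitude, not a power):
let rms: Float = 0.25                        // e.g. the value returned by calcAudioRMS
// Full scale (rms == 1.0) maps to 0 dB; quieter signals go negative
let decibels = 20 * log10(max(rms, 1e-9))    // the small floor avoids log10(0) during silence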
I have read various posts here at StackOverflow regarding the execution of FFT on accelerometer data, but none of them helped me understand my problem.
I am executing this FFT implementation on my accelerometer data array in the following way:
int length = data.size();
double[] re = new double[256];
double[] im = new double[256];
for (int i = 0; i < length; i++) {
re[i] = data[i]; // fill the real part; im[] stays zero
}
FFT fft = new FFT(256);
fft.fft(re, im);
float outputData[] = new float[256];
for (int i = 0; i < 128; i++) {
outputData[i] = (float) Math.sqrt(re[i] * re[i]
+ im[i] * im[i]);
}
I plotted the contents of outputData (left), and also used R to perform the FFT on my data (right).
What am I doing wrong here? I am using the same code for executing the FFT that I see in other places.
EDIT: Following the advice of @PaulR to apply a windowing function, and the link provided by @BjornRoche (http://baumdevblog.blogspot.com.br/2010/11/butterworth-lowpass-filter-coefficients.html), I was able to solve my problem. The solution is pretty much what is described at that link. This is my graph now: http://imgur.com/wGs43
The low frequency artefacts are probably due to a lack of windowing. Try applying a window function.
The overall shift is probably due to different scaling factors in the two different FFT implementations - my guess is that you are seeing a shift of 24 dB which corresponds to a difference in scaling by a factor of 256.
Because all your data on the left are above 0, from a frequency-analysis point of view the signal has a large DC component. After the FFT, that DC component is extracted into bin 0, and it is very large. For your scenario you only need to cut off the DC component and keep the part of the spectrum above 0 Hz (the AC signal); that makes sense.
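A minimal sketch of that DC removal (my illustration, in Swift; samples is a hypothetical [Float] holding the accelerometer data): subtract the mean before the FFT, or simply skip bin 0 when plotting.
var mean: Float = 0
for s in samples { mean += s }
mean /= Float(samples.count)
// Subtracting the DC offset keeps bin 0 of the FFT from dominating the plot
let centered = samples.map { $0 - mean }
// Alternatively: run the FFT on the raw data and just ignore outputData[0], the DC bin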