Performing FFT on PCM file to generate a spectrogram - iOS

I'm writing an app that visualizes music. So far I have an audio file from the iPod library converted to PCM and placed inside the app's directory. Now I am trying to perform an FFT on that PCM file to give me frequency and dB over time. Here is code I found which uses Apple's Accelerate framework to perform the FFT:
int bufferFrames = 1024;
int bufferlog2 = round(log2(bufferFrames));   // log2(1024) = 10
FFTSetup fftSetup = vDSP_create_fftsetup(bufferlog2, kFFTRadix2);
float outReal[bufferFrames / 2];
float outImaginary[bufferFrames / 2];
COMPLEX_SPLIT out = { .realp = outReal, .imagp = outImaginary };
// pack the real input into split-complex form, then run the in-place real FFT
vDSP_ctoz((COMPLEX *)data, 2, &out, 1, bufferFrames / 2);
vDSP_fft_zrip(fftSetup, &out, 1, bufferlog2, FFT_FORWARD);
Now I don't understand how to feed this the PCM file. I believe 'data' is an array of COMPLEX values holding the portion of the audio that the FFT will be applied to. How do I build such a data structure from the PCM file?
I found some Java code that might be useful, but I'm not sure how to convert it to C. Also, what is audioData, and how do I fill it from the PCM file?
Complex[] complexData = new Complex[audioData.length];
for (int i = 0; i < complexData.length; i++) {
    complexData[i] = new Complex(audioData[i], 0);
}

The "Apple Accelerate framework" is fine, I have used it on a project. Just read the docs carefully. Basically you will need to read the PCM data into memory and perhaps massage the format, then call the FFT on it.
To massage the data loop through your PCM data, sample by sample and copy as appropriate to another array.
The docs are here, you want the vDFT_fft* functions.
Mike Ash has a good writeup Friday Q&A site.
Audio data is the result of voltage samples taken periodically of the audio. The forma has many different forms. If you are going to work with this you are going to have to spend some time learning the basics. Also if you are going to run a FFT on audio data you need to learn about sampling frequency and "aliasing" sometimes called "fold back".
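For concreteness, here is a minimal sketch of that massaging step, under the assumption that the file is raw 16-bit mono PCM (if it is a WAV, skip its header first) and that a hypothetical "music.pcm" path points at it. It reads one 1024-sample frame, converts it to floats (that float array is the 'data' in your snippet), and runs the real FFT from the question; windowing and looping over the whole file are left out:
#include <Accelerate/Accelerate.h>
#include <stdio.h>

int main(void) {
    const int bufferFrames = 1024;
    const int bufferLog2 = 10;                      // log2(1024)
    FFTSetup fftSetup = vDSP_create_fftsetup(bufferLog2, kFFTRadix2);

    // Read one frame of raw 16-bit mono PCM ("music.pcm" is a placeholder path).
    FILE *pcm = fopen("music.pcm", "rb");
    short samples[1024];
    fread(samples, sizeof(short), bufferFrames, pcm);   // error handling omitted

    // Convert the integer samples to floats; this is the 'data' from the question.
    float data[1024];
    vDSP_vflt16(samples, 1, data, 1, bufferFrames);

    float outReal[512];
    float outImaginary[512];
    DSPSplitComplex out = { .realp = outReal, .imagp = outImaginary };

    // Pack the real samples into split-complex form and run the in-place real FFT.
    vDSP_ctoz((DSPComplex *)data, 2, &out, 1, bufferFrames / 2);
    vDSP_fft_zrip(fftSetup, &out, 1, bufferLog2, FFT_FORWARD);

    // Squared magnitude per frequency bin -> one column of the spectrogram.
    float magnitudes[512];
    vDSP_zvmags(&out, 1, magnitudes, 1, bufferFrames / 2);

    fclose(pcm);
    vDSP_destroy_fftsetup(fftSetup);
    return 0;
}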

Related

Record and send audio data to c++ function

I need to send audio data in real time in PCM format: 8 kHz, 16-bit, mono.
The audio must be sent as an array of chars with a length:
(<#char *data#>, <#int len#>).
I'm a beginner in audio processing and can't really work out how to accomplish that. My best attempt was to convert to the iLBC format, but it didn't work. Is there any sample showing how to record and convert audio to some format? I have already read Learning Core Audio by Chris Adamson and Kevin Avila, but I didn't find a solution that works.
Simply, what I need is:
(record) -> (convert?) -> send(char *data, int length);
Because I need to send the data as arrays of chars, I can't use a player.
EDIT:
I managed to make everything work with recording and with reading buffers. What I can't manage is:
if (ref[i]->mAudioDataByteSize != 0) {
    char *data = (char *)ref[i]->mAudioData;
    sendData(mHandle, data, ref[i]->mAudioDataByteSize);
}
This is not really a beginner task. The solutions are to use the RemoteIO Audio Unit, the Audio Queue API, or an AVAudioEngine installTapOnBus block. These will give you near real-time (depending on the buffer size) buffers of audio samples (Int16s, Floats, etc.) that you can convert, compress, pack into other data types or arrays, etc., usually via a callback function or block that you provide, which is invoked with each incoming buffer of recorded audio samples. A rough sketch of the Audio Queue route is below.
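If you go the Audio Queue route, a minimal C sketch looks something like the following, assuming the 8 kHz / 16-bit / mono format from the question and a hypothetical sendData(handle, bytes, length) function like the one you already have (error checking and audio session setup omitted):
#include <AudioToolbox/AudioToolbox.h>

static void inputCallback(void *userData,
                          AudioQueueRef queue,
                          AudioQueueBufferRef buffer,
                          const AudioTimeStamp *startTime,
                          UInt32 numPackets,
                          const AudioStreamPacketDescription *packetDesc) {
    if (buffer->mAudioDataByteSize != 0) {
        // The recorded samples are already the char*/length pair you want to send.
        // sendData(gHandle, (char *)buffer->mAudioData, buffer->mAudioDataByteSize); // hypothetical
    }
    AudioQueueEnqueueBuffer(queue, buffer, 0, NULL);   // re-enqueue for the next fill
}

void startRecording(void) {
    AudioStreamBasicDescription fmt = {0};
    fmt.mSampleRate       = 8000;
    fmt.mFormatID         = kAudioFormatLinearPCM;
    fmt.mFormatFlags      = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
    fmt.mChannelsPerFrame = 1;
    fmt.mBitsPerChannel   = 16;
    fmt.mBytesPerFrame    = 2;
    fmt.mBytesPerPacket   = 2;
    fmt.mFramesPerPacket  = 1;

    AudioQueueRef queue;
    AudioQueueNewInput(&fmt, inputCallback, NULL, NULL, NULL, 0, &queue);

    for (int i = 0; i < 3; ++i) {                      // a few buffers keep the queue fed
        AudioQueueBufferRef buf;
        AudioQueueAllocateBuffer(queue, 4096, &buf);
        AudioQueueEnqueueBuffer(queue, buf, 0, NULL);
    }
    AudioQueueStart(queue, NULL);
}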

Split audio track into segments by BPM and analyse each segment using Superpowered iOS

I have been using the Superpowered iOS library to analyse audio and extract BPM, loudness and pitch data. I'm working on an iOS Swift 3.0 project and have been able to get the C classes to work with Swift using bridging headers for Obj-C.
The problem I am running into is that while I can create a decoder object, extract audio from the Music Library and store it as a .WAV, I am unable to create a decoder object for just a snippet of the extracted audio and get the analyser class to return data for it.
My approach has been to create a decoder object as follows:
var decodeAttempt = decoder!.open(self.originalFilePath, metaOnly: false, offset: offsetBytes, length: lengthBytes, stemsIndex: 0)
'offsetBytes' and 'lengthBytes' are, I think, positions within the audio file. As I have already decompressed the audio, stored it as WAV and am then providing it to the decoder here, I am calculating the offset and length using the PCM WAV formula of 44100 x 2 x 16 / 8 = 176400 bytes per second, and then using that to specify a start point and length in bytes. I'm not sure this is the correct way to do it, as the decoder returns 'Unknown file format'.
Any ideas or even alternative suggestions of how to achieve the title of this question? Thanks in advance!
The offset and length parameters of the SuperpoweredDecoder are there because of the Android APK file format, where bundled audio files are simply concatenated to the package.
Although a WAV file is as "uncompressed" as it can be, there is a header at the beginning, so offset and length are not a good fit for this purpose, especially as the header is present only at the beginning and decoding is not possible without it.
You mention that you can extract audio to PCM (and save to WAV). Then you have the answer in your hand: just submit different extracted portions to different instances of the SuperpoweredOfflineAnalyzer.
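As a rough illustration of the splitting itself (this deliberately does not touch the Superpowered API, and the helper names are made up for the example), here is a sketch that turns a BPM estimate plus the 44.1 kHz / 16-bit / stereo format from the question into per-beat frame and byte ranges; those are the portions you would extract and hand to one analyzer instance each:
#include <stdio.h>

typedef struct { long startFrame; long frameCount; } Segment;

// One beat lasts 60/BPM seconds, i.e. sampleRate * 60 / BPM frames.
Segment segmentForBeat(double bpm, double sampleRate, int beatIndex) {
    double framesPerBeat = sampleRate * 60.0 / bpm;     // e.g. 120 BPM -> 22050 frames
    Segment s;
    s.startFrame = (long)(beatIndex * framesPerBeat);
    s.frameCount = (long)framesPerBeat;
    return s;
}

int main(void) {
    const double sampleRate = 44100.0;
    const int bytesPerFrame = 2 * 16 / 8;               // stereo, 16-bit -> 4 bytes per frame
    Segment s = segmentForBeat(128.0, sampleRate, 3);   // 4th beat of a 128 BPM track
    printf("frames [%ld, %ld), bytes [%ld, %ld)\n",
           s.startFrame, s.startFrame + s.frameCount,
           s.startFrame * bytesPerFrame,
           (s.startFrame + s.frameCount) * bytesPerFrame);
    return 0;
}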

How to play streaming PCM data from server in iOS?

I'm trying to develop an app that plays streaming PCM data from a server.
I have used AudioQueue, but it does not work well.
PCM data format (from the server):
Sample rate = 48000, number of channels = 2, bits per sample = 16
Also, the server does not stream a fixed number of bytes to the client.
(It streams a variable number of bytes, e.g. 30848, 128, 2764, ... bytes.)
My source code:
Here is the ASBD structure I have set up and how I create the Audio Queue object
(language: Swift)
// Create ASBD structure & set properties.
var streamFormat = AudioStreamBasicDescription()
streamFormat.mSampleRate = 48000
streamFormat.mFormatID = kAudioFormatLinearPCM
streamFormat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked
streamFormat.mFramesPerPacket = 1
streamFormat.mChannelsPerFrame = 2
streamFormat.mBitsPerChannel = 16
streamFormat.mBytesPerFrame = (streamFormat.mBitsPerChannel / 8) * streamFormat.mChannelsPerFrame
streamFormat.mBytesPerPacket = streamFormat.mBytesPerFrame
streamFormat.mReserved = 0
// Create AudioQueue for playing PCM streaming data.
var err = AudioQueueNewOutput(&streamFormat, self.queueCallbackProc, nil, nil, nil, 0, &aq)
...
I set up the ASBD structure and created the AudioQueue object as shown above.
The AudioQueue plays the streamed PCM data very well for a few seconds,
but soon the sound starts cutting in and out. What can I do?
(The stream is still arriving, and I am still enqueueing AudioQueue buffers.)
Please give me any ideas.
You need to do (at least) two things:
You need to buffer data to handle the latency jitter and the mismatch in packet sizes between the data arriving from the server and the data that the audio queue callback requests. A typical solution involves a circular FIFO/buffer, pre-filled with enough data to handle the worst-case network jitter (you will need to analyze this amount statistically). The audio queue callback can then just copy the requested amount of data out of the circular FIFO, while the network code tries to keep it filled.
You may also need some way to conceal errors when the two rates are not sufficiently identical: e.g. some way to duplicate or synthesize extra sound when the network rate is too slow, and some way to drop samples when it is too fast, both done as inaudibly as possible and without producing loud clicks at rate discontinuities or network drop-outs. A minimal FIFO sketch follows below.
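Here is a minimal single-producer/single-consumer ring-buffer sketch in C for the first point, assuming 16-bit interleaved samples: the network code calls fifoWrite() with whatever variable-sized packet just arrived, the AudioQueue output callback calls fifoRead() for exactly the amount it needs, and underruns are zero-filled as a crude form of concealment (overflow handling and thread-safety refinements are omitted):
#include <stdint.h>
#include <string.h>

#define FIFO_CAPACITY (48000 * 2 * 4)   // ~4 seconds of 48 kHz stereo Int16 samples

typedef struct {
    int16_t data[FIFO_CAPACITY];
    volatile size_t readPos;
    volatile size_t writePos;
} SampleFifo;

static size_t fifoAvailable(const SampleFifo *f) {
    return (f->writePos + FIFO_CAPACITY - f->readPos) % FIFO_CAPACITY;
}

// Network thread: append however many samples just arrived (variable packet sizes are fine).
static void fifoWrite(SampleFifo *f, const int16_t *src, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        f->data[f->writePos] = src[i];
        f->writePos = (f->writePos + 1) % FIFO_CAPACITY;
    }
}

// Audio callback: copy exactly `count` samples; zero-fill on underrun to avoid loud clicks.
static void fifoRead(SampleFifo *f, int16_t *dst, size_t count) {
    size_t have = fifoAvailable(f);
    size_t n = (count < have) ? count : have;
    for (size_t i = 0; i < n; ++i) {
        dst[i] = f->data[f->readPos];
        f->readPos = (f->readPos + 1) % FIFO_CAPACITY;
    }
    if (n < count)
        memset(dst + n, 0, (count - n) * sizeof(int16_t));
}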

kAudioDevicePropertyBufferFrameSize replacement for iOS

I was trying to set up an audio unit to render the music (instead of an Audio Queue, which was too opaque for my purposes). iOS doesn't have the kAudioDevicePropertyBufferFrameSize property; any idea how I can derive this value to set up the buffer size of my IO unit?
I found this post interesting: it asks about the possibility of using a combination of the kAudioSessionProperty_CurrentHardwareIOBufferDuration and kAudioSessionProperty_CurrentHardwareOutputLatency audio session properties to determine that value, but there is no answer. Any ideas?
You can use the kAudioSessionProperty_CurrentHardwareIOBufferDuration property, which represents the buffer size in seconds. Multiply this by the sample rate you get from kAudioSessionProperty_CurrentHardwareSampleRate to get the number of samples you should buffer.
The resulting buffer size should be a power of 2. I believe either 512 or 4096 is what you're likely to get, but you should always base it on the values returned from AudioSessionGetProperty.
Example:
Float64 sampleRate;
UInt32 propSize = sizeof(Float64);
AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareSampleRate,
                        &propSize,
                        &sampleRate);

Float32 bufferDuration;
propSize = sizeof(Float32);
AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,
                        &propSize,
                        &bufferDuration);

UInt32 bufferLengthInFrames = sampleRate * bufferDuration;
The next step is to find out the input stream format of the unit you're sending audio to. Based on your description, I'm assuming that you're programmatically generating audio to send to the speakers. This code assumes unit is an AudioUnit you're sending audio to, whether that's the RemoteIO or something like an effect Audio Unit.
AudioStreamBasicDescription inputASBD;
UInt32 propSize = sizeof(AudioStreamBasicDescription);
AudioUnitGetProperty(unit,
                     kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Input,
                     0,
                     &inputASBD,
                     &propSize);
After this, inputASBD.mFormatFlags will be a bit field corresponding to the audio stream format that unit is expecting. The two most likely sets of flags are named kAudioFormatFlagsCanonical and kAudioFormatFlagsAudioUnitCanonical. These two have corresponding sample types AudioSampleType and AudioUnitSampleType that you can base your size calculation off of.
As an aside, AudioSampleType typically represents samples coming from the mic or destined for the speakers, whereas AudioUnitSampleType is usually for samples that are intended to be processed (by an audio unit, for example). At the moment on iOS, AudioSampleType is an SInt16 and AudioUnitSampleType is a fixed-point 8.24 number stored in an SInt32 container. Here's a post on the Core Audio mailing list explaining this design choice.
The reason I hold back from saying something like "just use Float32, it'll work" is because the actual bit representation of the stream is subject to change if Apple feels like it.
The audio unit itself decides on the actual buffer size, so the app's audio unit callback has to be able to handle any reasonable size given to it. You can suggest and poll the kAudioSessionProperty_CurrentHardwareIOBufferDuration property, but note that this value can change while your app is running (especially during screen lock or call interruptions, etc.), outside of what the app can control. A sketch of suggesting and re-reading the value is below.
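For completeness, here is a sketch of suggesting a preferred duration and then reading back what the hardware actually granted, using the same (now deprecated) AudioSession C API as the code above; the 5 ms figure is just an example value:
#include <AudioToolbox/AudioToolbox.h>

void suggestBufferDuration(void) {
    // Assumes AudioSessionInitialize(...) has already been called elsewhere.
    Float32 preferredDuration = 0.005f;   // ask for ~5 ms; treat this as a hint only
    AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration,
                            sizeof(preferredDuration),
                            &preferredDuration);

    Float32 actualDuration = 0;
    UInt32 size = sizeof(actualDuration);
    AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,
                            &size,
                            &actualDuration);
    // actualDuration is roughly what drives the audio unit callback, and it can
    // still change at runtime (screen lock, interruptions, etc.).
}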

Recording, modifying and playing audio on iOS

EDIT: In the end I did exactly as I explain below: AVRecorder for recording the speech and OpenAL for the pitch shift and playback. It worked out quite well.
I have a question regarding recording, modifying and playing back audio. I asked a similar question before (Record, modify pitch and play back audio in real time on iOS), but I now have more information and could do with some further advice, please.
So firstly this is what I am trying to do (on a separate thread to the main thread):
monitor the iphone mic
check for sound greater than a certain volume
if above threshold start recording e.g. person starts talking
continue to record until volume drops below threshold e.g. person stops talking
modify pitch of recorded sound.
playback sound
I was thinking of using AVRecorder to monitor and record the sound; there is a good tutorial here: http://mobileorchard.com/tutorial-detecting-when-a-user-blows-into-the-mic/
and I was thinking of using OpenAL to modify the pitch of the recorded audio.
So my question is: is my thinking correct in the list of points above? Am I missing something, or is there a better/easier way to do it? Can I avoid mixing audio libraries and just use AVFoundation to change the pitch too?
You can either use AVRecorder or something lower-level like the real-time IO (RemoteIO) audio unit.
The concept of 'volume' is pretty vague. You might want to look at the difference between calculating peak and RMS values, and understand how to integrate an RMS value over a given time (say 300 ms, which is what a VU meter uses).
Basically you sum the squares of the sample values. The RMS is the square root of that mean, which you convert to dBFS with 20 * log10f(sqrtf(sum / num_samples)); you can skip the sqrt and do it in one step as 10 * log10f(sum / num_samples).
You'll need to do a lot of adjusting of integration times and thresholds to get it to behave the way you want.
For pitch shifting, I think OpenAL will do the trick; the technique behind it is called band-limited interpolation: https://ccrma.stanford.edu/~jos/resample/Theory_Ideal_Bandlimited_Interpolation.html
This example shows an RMS calculation as a running average. The circular buffer maintains a history of the squares, which eliminates the need to re-sum them on every call. I haven't run it, so treat it as pseudo code ;)
Example:
#include <cmath>
#include <cstring>

class VUMeter
{
protected:
    // samples per second
    float _sampleRate;
    // the integration time in seconds (a VU meter uses 300ms)
    float _integrationTime;
    // these maintain a circular buffer which contains
    // the 'squares' of the audio samples
    int _integrationBufferLength;
    float *_integrationBuffer;
    float *_integrationBufferEnd;
    float *_cursor;
    // this is a sort of accumulator to make a running
    // average more efficient
    float _sum;

public:
    VUMeter()
        : _sampleRate(48000.0f)
        , _integrationTime(0.3f)
        , _sum(0.0f)
    {
        // create a buffer of values to be integrated,
        // e.g. 300ms @ 48kHz is 14400 samples
        _integrationBufferLength = (int) (_integrationTime * _sampleRate);
        _integrationBuffer = new float[_integrationBufferLength];
        memset(_integrationBuffer, 0, _integrationBufferLength * sizeof(float));
        // set the pointers for our circular buffer
        _integrationBufferEnd = _integrationBuffer + _integrationBufferLength;
        _cursor = _integrationBuffer;
    }

    ~VUMeter()
    {
        delete[] _integrationBuffer;
    }

    float getRms(float *audio, int samples)
    {
        // process the samples; this part accumulates the 'squares'
        for (int i = 0; i < samples; ++i)
        {
            // get the input sample
            float s = audio[i];
            // remove the oldest value from the sum
            _sum -= *_cursor;
            // calculate the square and write it into the buffer
            float square = s * s;
            *_cursor = square;
            // add it to the sum
            _sum += square;
            // increment the buffer cursor and wrap
            ++_cursor;
            if (_cursor == _integrationBufferEnd)
                _cursor = _integrationBuffer;
        }
        // convert the mean square to dB
        // (10 * log10 of the mean square == 20 * log10 of the RMS)
        return 10 * log10f(_sum / _integrationBufferLength);
    }
};
OpenAL resampling changes the pitch and the duration inversely: e.g. a sound resampled to a higher pitch plays for a shorter amount of time, and thus faster. A minimal OpenAL pitch sketch is below.
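For reference, a minimal sketch of OpenAL's AL_PITCH control looks like this; pcmData/pcmBytes are placeholders for your recorded 16-bit mono samples at 44.1 kHz, and error checking and teardown are omitted:
#include <OpenAL/al.h>
#include <OpenAL/alc.h>

void playPitched(const void *pcmData, ALsizei pcmBytes, float pitch) {
    // Open the default device and make a context current.
    ALCdevice  *device  = alcOpenDevice(NULL);
    ALCcontext *context = alcCreateContext(device, NULL);
    alcMakeContextCurrent(context);

    // Upload the recorded PCM into an OpenAL buffer.
    ALuint buffer, source;
    alGenBuffers(1, &buffer);
    alBufferData(buffer, AL_FORMAT_MONO16, pcmData, pcmBytes, 44100);

    // Attach it to a source and shift the pitch; e.g. 1.5f plays higher and shorter.
    alGenSources(1, &source);
    alSourcei(source, AL_BUFFER, (ALint)buffer);
    alSourcef(source, AL_PITCH, pitch);
    alSourcePlay(source);
}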
