How can I get my iPhone to listen for sound frequencies above a certain threshold? - ios

I'm interested in getting my iOS app to turn on the microphone and only listen for frequencies above 17000 hz. If it hears something in that range, I'd like the app to call a method.
I was able to find a repository that detects frequency: https://github.com/krafter/DetectingAudioFrequency
And here is a post breaking down FFT:
Get Hz frequency from audio stream on iPhone
Using these examples, I've been able to get the phone to react to the strongest frequency it hears, but I'm more interested in just reacting to the above 17000 hz frequencies.

The fact that I wrote that code helps me answering this question but the answer probably only applies to this code.
You can easily limit the frequencies you listen to just by trimming that output array to a piece that contains only the range you need.
In details: To be simple - array[0..255] contains your audio in frequency domain. For example you sample rate was 44100 when you did FFT.
Then maximum frequency you can encode is 22050. (Nyquist theorem).
That is array[0] contains value for 22050/256=86.13 Hz. Array[1] contains value for 86.13*2 = 172.26 Hz, array[2] contains value for 86.13*3 = 258.39 Hz. And so on. Your full range is distributed across those 256 values. (and yes, precision suffers)
So if you only need to listen to some range, let's say above 17000Hz, you just take a piece of that array and ignore the rest. In this case you take 17000/86.13=197 to 255 subarray and you have it. Only 17000-22050 range.
In my repo you modify strongestFrequencyHZ function like that:
static Float32 strongestFrequencyHZ(Float32 *buffer, FFTHelperRef *fftHelper, UInt32 frameSize, Float32 *freqValue) {
Float32 *fftData = computeFFT(fftHelper, buffer, frameSize);
fftData[0] = 0.0;
unsigned long length = frameSize/2.0;
Float32 max = 0;
unsigned long maxIndex = 0;
Float32 freqLimit = 17000; //HZ
Float32 freqsPerIndex = NyquistMaxFreq/length;
unsigned long lowestLimitIndex = (unsigned long) freqLimit/freqsPerIndex;
unsigned long newLen = length-lowestLimitIndex;
Float32 *newData = fftData+lowestLimitIndex; //address arithmetic
max = vectorMaxValueACC32_index(newData, newLen, 1, &maxIndex);
if (freqValue!=NULL) { *freqValue = max; }
Float32 HZ = frequencyHerzValue(lowestLimitIndex+maxIndex, length, NyquistMaxFreq);
return HZ;
}
I did some address arithmetic in there so it looks kind of complicated. You can just take that fftData array and do the regular stuff.
Other things to keep in mind:
Finding strongest freq. is easy. You just find maximum in that array. That's it. But in you case you need to monitor the range and find when it went from regular weak noise to some strong signal. In other words when stuff peaks, and this is not so trivial, but possible. You can probably just set some limit above which the signal becomes detected, although this not the best option.
I would rather be optimistic about this cause in real life you can't see much noise at 18000Hz around you. The only thing a can remember of are some old TVs that produce that high pitched sound when they're on.

Related

Can dispatch overhead be more expensive than an actual thread work?

Recently I'm thinking about possibilities for hard optimization. I mean those kind of optimization when you sometimes hardcode the loop from 3 iterations just to get something.
So one thought came to my mind. Imagine we have a buffer of 1024 elements. We want to multiply every single element of it by 2. And we create a simple kernel, where we pass a buffer, outBuffer, their size (to check if we outside of the bounds) and [[thread_position_in_grid]]. Then we just do a simple multiplaction and write that number to another buffer.
It will look a bit like that:
kernel void multiplyBy2(constant float* in [[buffer(0)]],
device float* out [[buffer(1)]],
constant Uniforms& uniforms [[buffer(2)]],
uint gid [[thread_position_in_grid]])
{
if (gid >= uniforms.buffer_size) { return; }
out[gid] = in[gid] * 2.0;
}
The thing I'm concerned about is If the actual thread work still worth the overhead that is produced by it's dispatching?
Would it be more effective to, for example, dispatch 4 times less threads, that do something like that
out[gid * 4 + 0] = in[gid + 0] * 2.0;
out[gid * 4 + 1] = in[gid + 1] * 2.0;
out[gid * 4 + 2] = in[gid + 2] * 2.0;
out[gid * 4 + 3] = in[gid + 3] * 2.0;
So that thread can work a little bit longer? Or it is better to make threads as thin as possible?
Yes, and this is true not merely in contrived examples, but in some real-world scenarios too.
For extremely simple kernels like yours, the dispatch overhead can swamp the work to be done, but there's another factor that may have an even bigger effect on performance: sharing fetched data and intermediate results.
If you have a kernel that, for example, reads the 3x3 neighborhood of a pixel from an input texture and writes the average to an output texture, you could share the fetched texture data and partial sums between adjacent pixels by operating on more than one pixel in your kernel function and reducing the total number of threads you dispatch.
Perhaps this sates your curiosity. For any practical application, Scott Hunter is right that you should profile on all target devices before and after optimizing.

Select an integer number of periods

Suppose we have sinusoidal with frequency 100Hz and sampling frequency of 1000Hz. It means that our signal has 100 periods in a second and we are taking 1000 samples in a second. Therefore, in order to select a complete period I'll have to take fs/f=10 samples. Right?
What if the sampling period is not a multiple of the frequency of the signal (like 550Hz)? Do I have to find the minimum multiple M of f and fs, and than take M samples?
My goal is to select an integer number of periods in order to be able to replicate them without changes.
You have f periods a second, and fs samples a second.
If you take M samples, it would cover M/fs part of a second, or P = f * (M/fs) periods. You want this number to be integer.
So you need to take M = fs / gcd(f, fs) samples.
For your example P = 1000 / gcd(100, 1000) = 1000 / 100 = 10.
If you have 60 Hz frequency and 80 Hz sampling frequency, it gives P = 80 / gcd(60, 80) = 80 / 20 = 4 -- 4 samples will cover 4 * 1/80 = 1/20 part of a second, and that will be 3 periods.
If you have 113 Hz frequency and 512 Hz sampling frequency, you are out of luck, since gcd(113, 512) = 1 and you'll need 512 samples, covering the whole second and 113 periods.
In general, an arbitrary frequency will not have an integer number of periods. Irrational frequencies will never even repeat ever. So some means other than concatenation of buffers one period in length will be needed to synthesize exactly periodic waveforms of arbitrary frequencies. Approximation by interpolation for fractional phase offsets is one possibility.

Detecting a low frequency tone in an audio file

I know this question been asked hundred times... But I am getting frustrated with my result so I wanted to ask again. Before I dive deep into fft, I need to figure this simple task out.
I need to detect a 20 hz tone in an audiofile. I insert the 20hz tone myself like in the picture. (It can be any frequency as long as listener can't hear it so I thought I should choose a frequency around 20hz to 50 hz)
info about the audiofile.
afinfo 1.m4a
File: 1.m4a
File type ID: adts
Num Tracks: 1
----
Data format: 1 ch, 22050 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Mono
estimated duration: 8.634043 sec
audio bytes: 42416
audio packets: 219
bit rate: 33364 bits per second
packet size upper bound: 768
maximum packet size: 319
audio data file offset: 0
optimized
format list:
[ 0] format: 1 ch, 22050 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Mono
----
I followed this three tutorials and I came up with a working code that reads audio buffer and gives me fft doubles.
http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html
https://github.com/alexbw/iPhoneFFT
How do I obtain the frequencies of each value in an FFT?
I read the data as follows
// If there's more packets, read them
inCompleteAQBuffer->mAudioDataByteSize = numBytes;
CheckError(AudioQueueEnqueueBuffer(inAQ,
inCompleteAQBuffer,
(sound->packetDescs?nPackets:0),
sound->packetDescs),
"couldn't enqueue buffer");
sound->packetPosition += nPackets;
int numFrequencies=2048;
int kNumFFTWindows=10;
SInt16 *testBuffer = (SInt16*)inCompleteAQBuffer->mAudioData; //Read data from buffer...!
OouraFFT *myFFT = [[OouraFFT alloc] initForSignalsOfLength:numFrequencies*2 andNumWindows:kNumFFTWindows];
for(long i=0; i<myFFT.dataLength; i++)
{
myFFT.inputData[i] = (double)testBuffer[i];
}
[myFFT calculateWelchPeriodogramWithNewSignalSegment];
for (int i=0;i<myFFT.dataLength/2;i++) {
NSLog(#"the spectrum data %d is %f ",i,myFFT.spectrumData[i]);
}
and my out out log something like
Everything checks out for 4096 samples of data
Set up all values, about to init window type 2
the spectrum data 0 is 42449.823771
the spectrum data 1 is 39561.024361
.
.
.
.
the spectrum data 2047 is -42859933071799162597786649755206634193030992632381393031503716729604050285238471034480950745056828418192654328314899253768124076782117157451993697900895932215179138987660717342012863875797337184571512678648234639360.000000
I know I am not calculating the magnitude yet but how can I detect that sound has 20 hz in it? Do I need to learn Goertzel algorithm?
There are many ways to convey information which gets inserted into then retrieved from some preexisting wave pattern. The information going in can vary things like the amplitude (amplitude modulation) or freq (frequency modulation), etc. Do you have a strategy here ? Note that the density of information you wish to convey can be influenced by such factors as the modulating frequency (higher frequencies naturally can convey more information as it can resolve changes more times per second).
Another approach is possible if both the sender and receiver have the source audio (reference). In this case the receiver could do a diff between reference and actual received audio to resolve out the transmitted extra information. A variation on this would be to have the sender send ~~same~~ audio twice, first send the reference untouched audio followed by a modulated version of this same reference audio that way the receiver just does a diff between these two audibly ~~same~~ clips to resolve out the embedded audio.
Going back to your original question ... if the sender and receiver have an agreement ... say for some time period X the reference pure 20 Hz tone is sent followed by another period X that 20 Hz tone is modulated by your input information to either alter its amplitude or frequency ... then just repeat this pattern ... on receiving side they just do a diff between each such pair of time periods to resolve your modulated information ... for this to work the source audio cannot have any tones below some freq say 100 Hz (you remove such frequency band if needed) just to eliminate interference from source audio ... you have not mentioned what kind of data you wish to transmit ... if its voice you first would need to stretch it out in effect lowering its frequency range from the 1 kHz range down to your low 20 Hz range ... once result of diff is available on receiving side you then squeeze this curve to restore it back to normal voice range of 1kHz ... maybe more work than you have time for but this might just work ... real AM/FM radio uses modulation to send voice over mega Hz carrier frequencies so it can work

Extract Treble and Bass from audio in iOS

I'm looking for a way to get the treble and bass data from a song for some incrementation of time (say 0.1 seconds) and in the range of 0.0 to 1.0. I've googled around but haven't been able to find anything remotely close to what I'm looking for. Ultimately I want to be able to represent the treble and bass level while the song is playing.
Thanks!
Its reasonably easy. You need to perform an FFT and then sum up the bins that interest you. A lot of how you select will depend on the sampling rate of your audio.
You then need to choose an appropriate FFT order to get good information in the frequency bins returned.
So if you do an order 8 FFT you will need 256 samples. This will return you 128 complex pairs.
Next you need to convert these to magnitude. This is actually quite simple. if you are using std::complex you can simply perform a std::abs on the complex number and you will have its magnitude (sqrt( r^2 + i^2 )).
Interestingly at this point there is something called Parseval's theorem. This theorem states that after performinng a fourier transform the sum of the bins returned is equal to the sum of mean squares of the input signal.
This means that to get the amplitude of a specific set of bins you can simply add them together divide by the number of them and then sqrt to get the RMS amplitude value of those bins.
So where does this leave you?
Well from here you need to figure out which bins you are adding together.
A treble tone is defined as above 2000Hz.
A bass tone is below 300Hz (if my memory serves me correctly).
Mids are between 300Hz and 2kHz.
Now suppose your sample rate is 8kHz. The Nyquist rate says that the highest frequency you can represent in 8kHz sampling is 4kHz. Each bin thus represents 4000/128 or 31.25Hz.
So if the first 10 bins (Up to 312.5Hz) are used for Bass frequencies. Bin 10 to Bin 63 represent the mids. Finally bin 64 to 127 is the trebles.
You can then calculate the RMS value as described above and you have the RMS values.
RMS values can be converted to dBFS values by performing 20.0f * log10f( rmsVal );. This will return you a value from 0dB (max amplitude) down to -infinity dB (min amplitude). Be aware amplitudes do not range from -1 to 1.
To help you along, here is a bit of my C++ based FFT class for iPhone (which uses vDSP under the hood):
MacOSFFT::MacOSFFT( unsigned int fftOrder ) :
BaseFFT( fftOrder )
{
mFFTSetup = (void*)vDSP_create_fftsetup( mFFTOrder, 0 );
mImagBuffer.resize( 1 << mFFTOrder );
mRealBufferOut.resize( 1 << mFFTOrder );
mImagBufferOut.resize( 1 << mFFTOrder );
}
MacOSFFT::~MacOSFFT()
{
vDSP_destroy_fftsetup( (FFTSetup)mFFTSetup );
}
bool MacOSFFT::ForwardFFT( std::vector< std::complex< float > >& outVec, const std::vector< float >& inVec )
{
return ForwardFFT( &outVec.front(), &inVec.front(), inVec.size() );
}
bool MacOSFFT::ForwardFFT( std::complex< float >* pOut, const float* pIn, unsigned int num )
{
// Bring in a pre-allocated imaginary buffer that is initialised to 0.
DSPSplitComplex dspscIn;
dspscIn.realp = (float*)pIn;
dspscIn.imagp = &mImagBuffer.front();
DSPSplitComplex dspscOut;
dspscOut.realp = &mRealBufferOut.front();
dspscOut.imagp = &mImagBufferOut.front();
vDSP_fft_zop( (FFTSetup)mFFTSetup, &dspscIn, 1, &dspscOut, 1, mFFTOrder, kFFTDirection_Forward );
vDSP_ztoc( &dspscOut, 1, (DSPComplex*)pOut, 1, num );
return true;
}
It seems that you're looking for Fast Fourier Transform sample code.
It is quite a large topic to cover in an answer.
The tools you will need are already build in iOS: vDSP API
This should help you: vDSP Programming Guide
And there is also a FFT Sample Code available
You might also want to check out iPhoneFFT. Though that code is slighlty
outdated it can help you understand processes "under-the-hood".
Refer to auriotouch2 example from Apple - it has everything from frequency analysis to UI representation of what you want.

Understanding Overlap and Add for Filtering

I am trying to implement the overlap and add method in oder to apply a filter in a real time context. However, it seems that there is something I am doing wrong, as the resulting output has a larger error than I would expect. For comparing the accuracy of my computations I created a file, that I am processing in one chunk. I am comparing this with the output of the overlap and add process and take the resulting comparison as an indicator for the accuracy of the computation. So here is my process of doing Overlap and add:
I take a chunk of length L from my input signal
I pad the chunk with zeros to length L*2
I transform that signal into frequency domain
I multiply the signal in frequency domain with my filter response of length L*2 in frequency domain (the filter response is actually created by interpolating control points in the UI - so this is not transformed from time domain. However using length L*2 in frequency domain should be similar to using a ffted time domain signal of length L padded to L*2)
Then I transform the resulting signal back to time domain and add it to the output stream with an overlap of L
Is there anything wrong with that procedure? After reading a lot of different papers and books I've gotten pretty unsure which is the right way to deal with that.
Here is some more data from the tests I have been running:
I created a signal, which consists of three cosine waves
I used this filter function in the time domain for filtering. (It's symmetric, as it is applied to the whole output of the FFT, which also is symmetric for real input signals)
The output of the IFFT looks like this: It can be seen that low frequencies are attenuated more than frequency in the mid range.
For the overlap add/save and the windowed processing I divided the input signal into 8 chunks of 256 samples. After reassembling them they look like that. (sample 490 - 540)
Output Signal overlap and add:
output signal overlap and save:
output signal using STFT with Hanning window:
It can be seen that the overlap add/save processes differ from the STFT version at the point where chunks are put together (sample 511). This is the main error which leads to different results when comparing windowed process and overlap add/save. However the STFT is closer to the output signal, which has been processed in one chunk.
I am pretty much stuck at this point since a few days. What is wrong here?
Here is my source
// overlap and add
// init Buffers
for (UInt32 j = 0; j<samples; j++){
output[j] = 0.0;
}
// process multiple chunks of data
for (UInt32 i = 0; i < (float)div * 2; i++){
for (UInt32 j = 0; j < chunklength/2; j++){
// copy input data to the first half ofcurrent buffer
inBuffer[j] = input[(int)((float)i * chunklength / 2 + j)];
// pad second half with zeros
inBuffer[j + chunklength/2] = 0.0;
}
// clear buffers
for (UInt32 j = 0; j < chunklength; j++){
outBuffer[j][0] = 0.0;
outBuffer[j][8] = 0.0;
FFTBuffer[j][0] = 0.0;
FFTBuffer[j][9] = 0.0;
}
FFT(inBuffer, FFTBuffer, chunklength);
// processing
for(UInt32 j = 0; j < chunklength; j++){
// multiply with filter
FFTBuffer[j][0] *= multiplier[j];
FFTBuffer[j][10] *= multiplier[j];
}
// Inverse Transform
IFFT((const double**)FFTBuffer, outBuffer, chunklength);
for (UInt32 j = 0; j < chunklength; j++){
// copy to output
if ((int)((float)i * chunklength / 2 + j) < samples){
output[(int)((float)i * chunklength / 2 + j)] += outBuffer[j][0];
}
}
}
After the suggestion below, I tried the following:
IFFTed my Filter. This looks like this:
set the second half to zero:
FFTed the signal and compared the magnitudes to the old filter (blue):
After trying to do overlap and add with this filter, the results have obviously gotten worse instead of better. In order to make sure my FFT works correctly, I tried to IFFT and FFT the filter without setting the second half zero. The result is identical to the orignal filter. So the problem shouldn't be the FFTing. I suppose that this is more of some general understanding of the overlap and add method. But I still can't figure out what is going wrong...
One thing to check is the length of the impulse response of your filter. It must be shorter than the length of zero padding used before the fast convolution FFT, or you will get wrap around errors.
I think the problem might be in the windowing approach that you are using. You simply add zeros to the chunks so there is no actual overlap. In the overlap and add method, you need to damp the edges of the window. What this means is that where you add zeros to the chunk you instead have add weighted input signal and the weight in your case should be 0.5 since only two windows overlap.
Rest of the procedure seems OK. You then simply take FTs, multiply and take inverse FTS and finally add up all the chunks to get the final signal which should be exactly the same if you filtered the whole signal at once.

Resources