From the question: How do I obtain the frequencies of each value in an FFT?
I have a similar question. I understand the answers to the previous question, but I would like further clarification on frequency. Is frequency the same as the index?
Let's go for an example: let's assume we have an array (1X200) of data in MATLAB. When you apply 'abs(fft)' for that array it gives the same size array as the result (1X200). So, does this mean this array contains magnitude? Does this mean the indices of these magnitudes are the frequencies? Like 1, 2, 3, 4...200? Or, if this assumption is wrong, please tell me how to find the frequency from the magnitude.
Instead of using the FFT directly you can use MATLAB's periodogram function, which takes care of a lot of the housekeeping for you, and which will plot the X (frequency axis) correctly if you supply the sample rate. See e.g. this answer.
For clarification though, the index of the FFT corresponds to frequency, and the magnitude of the complex value at each frequency (index) tells you the amplitude of the signal at that frequency.
Related
First time posting, thanks for the great community!
I am using AudioKit and trying to add frequency weighting filters to the microphone input and so I am trying to understand the values that are coming out of the AudioKit AKFFTTap.
Currently I am trying to just print the FFT buffer converted into dB values
for i in 0..<self.bufferSize {
let db = 20 * log10((self.fft?.fftData[Int(i)])!)
print(db)
}
I was expecting values ranging in the range of about -128 to 0, but I am getting strange values of nearly -200dB and when I blow on the microphone to peg out the readings it only reaches about -60. Am I not approaching this correctly? I was assuming that the values being output from the EZAudioFFT engine would be plain amplitude values and that the normal dB conversion math would work. Anyone have any ideas?
Thanks in advance for any discussion about this issue!
You need to add all of the values from self.fft?.fftData (consider changing negative values to positive before adding) and then change that to decibels
The values in the array correspond to the values of the bins in the FFT. Having a single bin contain a magnitude value close to 1 would mean that a great amount of energy is in that narrow frequency band e.g. a very loud sinusoid (a signal with a single frequency).
Normal sounds, such as the one caused by you blowing on the mic, spread their energy across the entire spectrum, that is, in many bins instead of just one. For this reason, usually the magnitudes get lower as the FFT size increases.
Magnitude of -40dB on a single bin is quite loud. If you try to play a tone, you should see a clear peak in one of the bins.
I am using fft function from std.numeric
Complex!double[] resultfft = fft(timeDomainAmplitudeVal);
The parameter timeDomainAmplitudeVal is audio amplitude data. Sample rate 44100 hz and there is 131072(2^16) samples
I am seeing that resultfft has the same size as timeDomainAmplitudeVal(131072) which does not fits my project(also makes no sense) . I need to be able to divide FFT to N equally spaced frequencies. And I need this N to be defined by me .
Is there anyway to implement this with std.numeric.fft or can you have any advices for fft library?
Ps: I will be glad to hear if some DSP libraries exist also
That's just how Fourier transforms work in the practical number-crunching world. Give S samples of signal, get S amplitudes. (Ignoring issues with complex numbers and symmetries.)
If you want N amplitudes, you'll have to interpolate the S-points amplitudes you get from FFT. Your biggest decision is to choose between linear, cubic, truncated sinc, etc.
Altnernative: resample the original audio signal to have your desired N samples in the same overall time interval. Then FFT it.
take a look at pfft, a fast FFT written in D.
http://jerro.github.io/pfft/doc/pfft.pfft.html
or numpy & Pyd
http://docs.scipy.org/doc/numpy/reference/routines.fft.html
http://pyd.dsource.org/
HTH
This is absolutely normal that the FFT gives the same data length.
Here some C++ code to perform windows FFT analysis with overlap and optional "zero-phase" ordering. http://pastebin.com/4YKgbed1
What do FFT coefficients mean?
Question: "OK so I've done the FFT and I'm said I can recover the original signal. Now, what are these coefficients."
Answer: "You can think of coefficient i as representing the phase and amplitude of frequencies from SR*i/(2*N) to SR*(i+1)/(2*N). This is a helpful metaphor. But a more accurate view is that coefficient i is the contribution of a sine of frequency SR*i/(2*N) in a reconstruction of the original input chunk."
I am using libSVM.
Say my feature values are in the following format:
instance1 : f11, f12, f13, f14
instance2 : f21, f22, f23, f24
instance3 : f31, f32, f33, f34
instance4 : f41, f42, f43, f44
..............................
instanceN : fN1, fN2, fN3, fN4
I think there are two scaling can be applied.
scale each instance vector such that each vector has zero mean and unit variance.
( (f11, f12, f13, f14) - mean((f11, f12, f13, f14) ). /std((f11, f12, f13, f14) )
scale each colum of the above matrix to a range. for example [-1, 1]
According to my experiments with RBF kernel (libSVM) I found that the second scaling (2) improves the results by about 10%. I did not understand the reason why (2) gives me a improved results.
Could anybody explain me what is the reason for applying scaling and why the second option gives me improved results?
The standard thing to do is to make each dimension (or attribute, or column (in your example)) have zero mean and unit variance.
This brings each dimension of the SVM into the same magnitude. From http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf:
The main advantage of scaling is to avoid attributes in greater numeric
ranges dominating those in smaller numeric ranges. Another advantage is to avoid
numerical diculties during the calculation. Because kernel values usually depend on
the inner products of feature vectors, e.g. the linear kernel and the polynomial ker-
nel, large attribute values might cause numerical problems. We recommend linearly
scaling each attribute to the range [-1,+1] or [0,1].
I believe that it comes down to your original data a lot.
If your original data has SOME extreme values for some columns, then in my opinion you lose some definition when scaling linearly, for example in the range [-1,1].
Let's say that you have a column where 90% of values are between 100-500 and in the remaining 10% the values are as low as -2000 and as high as +2500.
If you scale this data linearly, then you'll have:
-2000 -> -1 ## <- The min in your scaled data
+2500 -> +1 ## <- The max in your scaled data
100 -> -0.06666666666666665
234 -> -0.007111111111111068
500 -> 0.11111111111111116
You could argue that the discernibility between what was originally 100 and 500 is smaller in the scaled data in comparison to what it was in the original data.
At the end, I believe it very much comes down to the specifics of your data and I believe the 10% improved performance is very coincidental, you will certainly not see a difference of this magnitude in every dataset you try both scaling methods on.
At the same time, in the paper in the link listed in the other answer, you can clearly see that the authors recommend data to be scaled linearly.
I hope someone finds this useful!
The accepted answer speaks of "Standard Scaling", which is not efficient for high-dimensional data stored in sparse matrices (text data is a use-case); in such cases, you may resort to "Max Scaling" and its variants, which works with sparse matrices.
I'm able to read a wav files and its values. I need to find peaks and pits positions and their values. First time, i tried to smooth it by (i-1 + i + i +1) / 3 formula then searching on array as array[i-1] > array[i] & direction == 'up' --> pits style solution but because of noise and other reasons of future calculations of project, I'm tring to find better working area. Since couple days, I'm researching FFT. As my understanding, fft translates the audio files to series of sines and cosines. After fft operation the given values is a0's and a1's for a0 + ak * cos(k*x) + bk * sin(k*x) which k++ and x++ as this picture
http://zone.ni.com/images/reference/en-XX/help/371361E-01/loc_eps_sigadd3freqcomp.gif
My question is, does fft helps to me find peaks and pits on audio? Does anybody has a experience for this kind of problems?
It depends on exactly what you are trying to do, which you haven't really made clear. "finding the peaks and pits" is one thing, but since there might be various reasons for doing this there might be various methods. You already tried the straightforward thing of actually looking for the local maximum and minima, it sounds like. Here are some tips:
you do not need the FFT.
audio data usually swings above and below zero (there are exceptions, including 8-bit wavs, which are unsigned, but these are exceptions), so you must be aware of positive and negative values. Generally, large positive and large negative values carry large amounts of energy, though, so you want to count those as the same.
due to #2, if you want to average, you might want to take the average of the absolute value, or more commonly, the average of the square. Once you find the average of the squares, take the square root of that value and this gives the RMS, which is related to the power of the signal, so you might do something like this is you are trying to indicate signal loudness, intensity or approximate an analog meter. The average of absolutes may be more robust against extreme values, but is less commonly used.
another approach is to simply look for the peak of the absolute value over some number of samples, this is commonly done when drawing waveforms, and for digital "peak" meters. It makes less sense to look at the minimum absolute.
Once you've done something like the above, yes you may want to compute the log of the value you've found in order to display the signal in dB, but make sure you use the right formula. 10 * log_10( amplitude ) is not it. Rule of thumb: usually when computing logs from amplitude you will see a 20, not a 10. If you want to compute dBFS (the amount of "headroom" before clipping, which is the standard measurement for digital meters), the formula is -20 * log_10( |amplitude| ), where amplitude is normalize to +/- 1. Watch out for amplitude = 0, which gives an infinite headroom in dB.
If I understand you correctly, you just want to estimate the relative loudness/quietness of an audio digital sample at a given point.
For this estimation, you don't need to use FFT. However your method of averaging the signal does not produce the appropiate picture neither.
The digital signal is the value of the audio wave at a given moment. You need to find the overall amplitude of the signal at that given moment. You can somewhat see it as the local maximum value for a given interval around the moment you want to calculate. You may have a moving max for the signal and get your amplitude estimation.
At a 16 bit sound sample, the sound signal value can go from 0 up to 32767. At a 44.1 kHz sample rate, you can find peaks and pits of around 0.01 secs by finding the max value of 441 samples around a given t moment.
max=1;
for (i=0; i<441; i++) if (array[t*44100+i]>max) max=array[t*44100+i];
then for representing it on a 0 to 1 scale you (not really 0, because we used a minimum of 1)
amplitude = max / 32767;
or you might represent it in relative dB logarithmic scale (here you see why we used 1 for the minimum value)
dB = 20 * log10(amplitude);
all you need to do is take dy/dx, which can getapproximately by just scanning through the wave and and subtracting the previous value from the current one and look at where it goes to zero or changes from positive to negative
in this code I made it really brief and unintelligent for sake of brevity, of course you could handle cases of dy being zero better, find the 'centre' of a long section of a flat peak, that kind of thing. But if all you need is basic peaks and troughs, this will find them.
lastY=0;
bool goingup=true;
for( i=0; i < wave.length; i++ ) {
y = wave[i];
dy = y - lastY;
bool stillgoingup = (dy>0);
if( goingup != direction ) {
// changed direction - note value of i(place) and 'y'(height)
stillgoingup = goingup;
}
}
According to "Introduction to Neural Networks with Java By Jeff Heaton", the input to the Kohonen neural network must be the values between -1 and 1.
It is possible to normalize inputs where the range is known beforehand:
For instance RGB (125, 125, 125) where the range is know as values between 0 and 255:
1. Divide by 255: (125/255) = 0.5 >> (0.5,0.5,0.5)
2. Multiply by two and subtract one: ((0.5*2)-1)=0 >> (0,0,0)
The question is how can we normalize the input where the range is unknown like our height or weight.
Also, some other papers mention that the input must be normalized to the values between 0 and 1. Which is the proper way, "-1 and 1" or "0 and 1"?
You can always use a squashing function to map an infinite interval to a finite interval. E.g. you can use tanh.
You might want to use tanh(x * l) with a manually chosen l though, in order not to put too many objects in the same region. So if you have a good guess that the maximal values of your data are +/- 500, you might want to use tanh(x / 1000) as a mapping where x is the value of your object It might even make sense to subtract your guess of the mean from x, yielding tanh((x - mean) / max).
From what I know about Kohonen SOM, they specific normalization does not really matter.
Well, it might through specific choices for the value of parameters of the learning algorithm, but the most important thing is that the different dimensions of your input points have to be of the same magnitude.
Imagine that each data point is not a pixel with the three RGB components but a vector with statistical data for a country, e.g. area, population, ....
It is important for the convergence of the learning part that all these numbers are of the same magnitude.
Therefore, it does not really matter if you don't know the exact range, you just have to know approximately the characteristic amplitude of your data.
For weight and size, I'm sure that if you divide them respectively by 200kg and 3 meters all your data points will fall in the ]0 1] interval. You could even use 50kg and 1 meter the important thing is that all coordinates would be of order 1.
Finally, you could a consider running some linear analysis tools like POD on the data that would give you automatically a way to normalize your data and a subspace for the initialization of your map.
Hope this helps.