FFT normalization - signal-processing

I know this question has been asked ad nauseam but somehow I can't make it work properly. I created a single, sine wave of 440 Hz having a unit amplitude. Now, after the FFT, the bin at 440 Hz has a distinct peak but the value just isn't right. I'd expect to see 0 dB since I'm dealing with a unit amplitude sine wave. Instead, the power calculated is well above 0 dB. The formula I'm using is simply
for (int i = 0; i < N/2; i++)
{
mag = sqrt((Real[i]*Real[i] + Img[i]*Img[i])/(N*0.54)); //0.54 correction for a Hamming Window
Mag[i] = 10 * log(mag) ;
}
I should probably point out that the total energy in the time domain is equal to the energy in the frequency domain (Parseval's theorem), so I know that my FFT routine is fine.
Any help is much appreciated.

I've been struggling with this again for work. It seems that a lot of software routines / books are a bit sloppy on the normalization of the FFT.
The best summary I have is: Energy needs to be conserved - which is Parseval's theorem. Also when coding this in Python, you can easily loose an element and not know it. Note that numpy.arrays indexing is not inclusive of the last element.
a = [1,2,3,4,5,6]
a[1:-1] = [2,3,4,5]
a[-1] = 6
Here's my code to normalize the FFT properly:
# FFT normalization to conserve power
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal
sample_rate = 500.0e6
time_step = 1/sample_rate
carrier_freq = 100.0e6
# number of digital samples to simulate
num_samples = 2**18 # 262144
t_stop = num_samples*time_step
t = np.arange(0, t_stop, time_step)
# let the signal be a voltage waveform,
# so there is no zero padding
carrier_I = np.sin(2*np.pi*carrier_freq*t)
#######################################################
# FFT using Welch method
# windows = np.ones(nfft) - no windowing
# if windows = 'hamming', etc.. this function will
# normalize to an equivalent noise bandwidth (ENBW)
#######################################################
nfft = num_samples # fft size same as signal size
f,Pxx_den = scipy.signal.welch(carrier_I, fs = 1/time_step,\
window = np.ones(nfft),\
nperseg = nfft,\
scaling='density')
#######################################################
# FFT comparison
#######################################################
integration_time = nfft*time_step
power_time_domain = sum((np.abs(carrier_I)**2)*time_step)/integration_time
print 'power time domain = %f' % power_time_domain
# Take FFT. Note that the factor of 1/nfft is sometimes omitted in some
# references and software packages.
# By proving Parseval's theorem (conservation of energy) we can find out the
# proper normalization.
signal = carrier_I
xdft = scipy.fftpack.fft(signal, nfft)/nfft
# fft coefficients need to be scaled by fft size
# equivalent to scaling over frequency bins
# total power in frequency domain should equal total power in time domain
power_freq_domain = sum(np.abs(xdft)**2)
print 'power frequency domain = %f' % power_freq_domain
# Energy is conserved
# FFT symmetry
plt.figure(0)
plt.subplot(2,1,1)
plt.plot(np.abs(xdft)) # symmetric in amplitude
plt.title('magnitude')
plt.subplot(2,1,2)
plt.plot(np.angle(xdft)) # pi phase shift out of phase
plt.title('phase')
plt.show()
xdft_short = xdft[0:nfft/2+1] # take only positive frequency terms, other half identical
# xdft[0] is the dc term
# xdft[nfft/2] is the Nyquist term, note that Python 2.X indexing does NOT
# include the last element, therefore we need to use 0:nfft/2+1 to have an array
# that is from 0 to nfft/2
# xdft[nfft/2-x] = conjugate(xdft[nfft/2+x])
Pxx = (np.abs(xdft_short))**2 # power ~ voltage squared, power in each bin.
Pxx_density = Pxx / (sample_rate/nfft) # power is energy over -fs/2 to fs/2, with nfft bins
Pxx_density[1:-1] = 2*Pxx_density[1:-1] # conserve power since we threw away 1/2 the spectrum
# note that DC (0 frequency) and Nyquist term only appear once, we don't double those.
# Note that Python 2.X array indexing is not inclusive of the last element.
Pxx_density_dB = 10*np.log10(Pxx_density)
freq = np.linspace(0,sample_rate/2,nfft/2+1)
# frequency range of the fft spans from DC (0 Hz) to
# Nyquist (Fs/2).
# the resolution of the FFT is 1/t_stop
# dft of size nfft will give nfft points at frequencies
# (1/stop) to (nfft/2)*(1/t_stop)
plt.figure(1)
plt.plot(freq,Pxx_density_dB,'^')
plt.figure(1)
plt.plot(f, 10.0*np.log10(Pxx_den))
plt.xlabel('Freq (Hz)'),plt.ylabel('dBm/Hz'),#plt.show()
plt.ylim([-200, 0])
plt.show()

Many common (but not all) FFT libraries scale the FFT result of a unit amplitude sinusoid by the length of the FFT. This maintains Parsevals equality since a longer sinusoid represents more total energy than a shorter one of the same amplitude.
If you don't want to scale by the FFT length when using one of these libraries, then divide by the length before computing the magnitude in dB.

Normalization can be done in many different ways - depending on window, number of samples, etc.
Common trick: take FFT of known signal and normalize by the value of the peak. Say in the above example your peak is 123 - if you want it to be 1, then divide it ( and all results obtained with this algorithm) by 123.

Related

Misconceptions about the Shannon-Nyquist theorem

I am a student working with time-series data which we feed into a neural network for classification (my task is to build and train this NN).
We're told to use a band-pass filter of 10 Hz to 150 Hz since anything outside that is not interesting.
After applying the band-pass, I've also down-sampled the data to 300 samples per second (originally it was 768 Hz). My understanding of the Shannon Nyquist sampling theorem is that, after applying the band-pass, any information in the data will be perfectly preserved at this sample-rate.
However, I got into a discussion with my supervisor who claimed that 300 Hz might not be sufficient even if the signal was band-limited. She says that it is only the minimum sample rate, not necessarily the best sample rate.
My understanding of the sampling theorem makes me think the supervisor is obviously wrong, but I don't want to argue with my supervisor, especially in case I'm actually the one who has misunderstood.
Can anyone help to confirm my understanding or provide some clarification? And how should I take this up with my supervisor (if at all).
The Nyquist-Shannon theorem states that the sampling frequency should at-least be twice of bandwidth, i.e.,
fs > 2B
So, this is the minimal criteria. If the sampling frequency is less than 2B then there will be aliasing. There is no upper limit on sampling frequency, but more the sampling frequency, the better will be the reconstruction.
So, I think your supervisor is right in saying that it is the minimal condition and not the best one.
Actually, you and your supervisor are both wrong. The minimum sampling rate required to faithfully represent a real-valued time series whose spectrum lies between 10 Hz and 150 Hz is 140 Hz, not 300 Hz. I'll explain this, and then I'll explain some of the context that shows why you might want to "oversample", as it is referred to (spoiler alert: Bailian-Low Theorem). The supervisor is mixing folklore into the discussion, and when folklore is not properly-contexted, it tends to telephone tag into fakelore. (That's a common failing even in the peer-reviewed literature, by the way). And there's a lot of fakelore, here, that needs to be defogged.
For the following, I will use the following conventions.
There's no math layout on Stack Overflow (except what we already have with UTF-8), so ...
a^b denotes a raised to the power b.
∫_I (⋯x⋯) dx denotes an integral of (⋯x⋯) taken over all x ∈ I, with the default I = ℝ.
The support supp φ (or supp_x φ(x) to make the "x" explicit) of a function φ(x) is the smallest closed set containing all the x-es for which φ(x) ≠ 0. For regularly-behaving (e.g. continuously differentiable) functions that means a union of closed intervals and/or half-rays or the whole real line, itself. This figures centrally in the Shannon-Nyquist sampling theorem, as its main condition is that a spectrum have bounded support; i.e. a "finite bandwidth".
For the Fourier transform I will use the version that has the 2π up in the exponent, and for added convenience, I will use the convention 1^x = e^{2πix} = cos(2πx) + i sin(2πx) (which I refer to as the Ramanujan Convention, as it is the convention I frequently used in my previous life oops I mean which Ramanujan secretly used in his life to make the math a whole lot simpler).
The set ℤ = {⋯, -2, -1, 0, +1, +2, ⋯ } is the integers, and 1^{x+z} = 1^x for all z∈ℤ - making 1^x the archetype of a periodic function whose period is 1.
Thus, the Fourier transform f̂(ν) of a function f(t) and its inverse are given by:
f̂(ν) = ∫ f(t) 1^{-νt} dt, f(t) = ∫ f̂(ν) 1^{+νt} dν.
The spectrum of the time series given by the function f(t) is the function f̂(ν) of the cyclic frequency ν, which is what is measured in Hertz (Hz.); t, itself, being measured in seconds. A common convention is to use the angular frequency ω = 2πν, instead, but that muddies the picture.
The most important example, with respect to the issue at hand, is the Fourier transform χ̂_Ω of the interval function given by χ_Ω(t) = 1 if t ∈ [-½Ω,+½Ω] and χ_Ω(t) = 0 else:
χ̂_Ω(t) = ∫_[-½Ω,+½Ω] 1^ν dν
= {1^{+½Ω} - 1^{-½Ω}}/{2πi}
= {2i sin πΩ}/{2πi}
= Ω sinc πΩ
which is where the function sinc x = (sin πx)/(πx) comes into play.
The cardinal form of the sampling theorem is that a function f(t) can be sampled over an equally-spaced sampled domain T ≡ { kΔt: k ∈ ℤ }, if its spectrum is bounded by supp f̂ ⊆ [-½Ω,+½Ω] ⊆ [-1/(2Δt),+1/(2Δt)], with the sampling given as
f(t) = ∑_{t'∈T} f(t') Ω sinc(Ω(t - t')) Δt.
So, this generally applies to [over-]sampling with redundancy factors 1/(ΩΔt) ≥ 1. In the special case where the sampling is tight with ΩΔt = 1, then it reduces to the form
f(t) = ∑_{t'∈T} f(t') sinc({t - t'}/Δt).
In our case, supp f̂ = [10 Hz., 150 Hz.] so the tightest fits are with 1/Δt = Ω = 300 Hz.
This generalizes to equally-spaced sampled domains of the form T ≡ { t₀ + kΔt: k ∈ ℤ } without any modification.
But it also generalizes to frequency intervals supp f̂ = [ν₋,ν₊] of width Ω = ν₊ - ν₋ and center ν₀ = ½ (ν₋ + ν₊) to the following form:
f(t) = ∑_{t'∈T} f(t') 1^{ν₀(t - t')} Ω sinc(Ω(t - t')) Δt.
In your case, you have ν₋ = 10 Hz., ν₊ = 150 Hz., Ω = 140 Hz., ν₀ = 80 Hz. with the condition Δt ≤ 1/140 second, a sampling rate of at least 140 Hz. with
f(t) = (140 Δt) ∑_{t'∈T} f(t') 1^{80(t - t')} sinc(140(t - t')).
where t and Δt are in seconds.
There is a larger context to all of this. One of the main places where this can be used is for transforms devised from an overlapping set of windowed filters in the frequency domain - a typical case in point being transforms for the time-scale plane, like the S-transform or the continuous wavelet transform.
Since you want the filters to be smoothly-windowed functions, without sharp corners, then in order for them to provide a complete set that adds up to a finite non-zero value over all of the frequency spectrum (so that they can all be normalized, in tandem, by dividing out by this sum), then their respective supports have to overlap.
(Edit: Generalized this example to cover both equally-spaced and logarithmic-spaced intervals.)
One example of such a set would be filters that have end-point frequencies taken from the set
Π = { p₀ (α + 1)ⁿ + β {(α + 1)ⁿ - 1} / α: n ∈ {0,1,2,⋯} }
So, for interval n (counting from n = 0), you would have ν₋ = p_n and ν₊ = p_{n+1}, where the members of Π are enumerated
p_n = p₀ (α + 1)ⁿ + β {(α + 1)ⁿ - 1} / α,
Δp_n = p_{n+1} - p_n = α p_n + β = (α p₀ + β)(α + 1)ⁿ,
n ∈ {0,1,2,⋯}
The center frequency of interval n would then be ν₀ = p_n + ½ Δp₀ (α + 1)ⁿ and the width would be Ω = Δp₀ (α + 1)ⁿ, but the actual support for the filter would overlap into a good part of the neighboring intervals, so that when you add up the filters that cover a given frequency ν the sum doesn't drop down to 0 as ν approaches any of the boundary points. (In the limiting case α → 0, this produces an equally-spaced frequency domain, suitable for an equalizer, while in the case β → 0, it produces a logarithmic scale with base α + 1, where octaves are equally-spaced.)
The other main place where you may apply this is to time-frequency analysis and spectrograms. Here, the role of a function f and its Fourier transform f̂ are reversed and the role of the frequency bandwidth Ω is now played by the (reciprocal) time bandwidth 1/Ω. You want to break up a time series, given by a function f(t) into overlapping segments f̃(q,λ) = g(λ)* f(q + λ), with smooth windowing given by the functions g(λ) with bounded support supp g ⊆ [-½ 1/Ω, +½ 1/Ω], and with interval spacing Δq much larger than the time sampling Δt (the ratio Δq/Δt is called the "hop" factor). The analogous role of Δt is played, here, by the frequency interval in the spectrogram Δp = Ω, which is now constant.
Edit: (Fixed the numbers for the Audacity example)
The minimum sampling rate for both supp_λ g and supp_λ f(q,λ) is Δq = 1/Ω = 1/Δp, and the corresponding redundancy factor is 1/(ΔpΔq). Audacity, for instance, uses a redundancy factor of 2 for its spectrograms. A typical value for Δp might be 44100/2048 Hz., while the time-sampling rate is Δt = 1/(2×3×5×7)² second (corresponding to 1/Δt = 44100 Hz.). With a redundancy factor of 2, Δq would be 1024/44100 second and the hop factor would be Δq/Δt = 1024.
If you try to fit the sampling windows, in either case, to the actual support of the band-limited (or time-limited) function, then the windows won't overlap and the only way to keep their sum from dropping to 0 on the boundary points would be for the windowing functions to have sharp corners on the boundaries, which would wreak havoc on their corresponding Fourier transforms.
The Balian-Low Theorem makes the actual statement on the matter.
https://encyclopediaofmath.org/wiki/Balian-Low_theorem
And a shout-out to someone I've been talking with, recently, about DSP-related matters and his monograph, which provides an excellent introductory reference to a lot of the issues discussed here.
A Friendly Guide To Wavelets
Gerald Kaiser
Birkhauser 1994
He said it's part of a trilogy, another installment of which is forthcoming.

How can count outlier and inlier points after applying RANSAC?

I have gone through the code below and would like to know how can I count the outlier points and inlier points after using RANSAC? could you point to a good code how it can be done?
Second question, which feature matching algorithm is better: BFMatcher.knnMatch() with Test ratio or bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True) with shortest distance? any reference for this comparison?
**# BFMatcher with default params
bf = cv.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
# Apply ratio test
good_matches = []
for m,n in matches:
if m.distance < 0.75*n.distance:
good_matches.append([m])
# Draw matches
img3=cv.drawMatchesKnn(img1,kp1,img2,kp2,good_matches,None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
cv.imwrite('matches.jpg', img3)
# Select good matched keypoints
ref_matched_kpts = np.float32([kp1[m[0].queryIdx].pt for m in good_matches])
sensed_matched_kpts = np.float32([kp2[m[0].trainIdx].pt for m in good_matches])
# Compute homography
H, status = cv.findHomography(sensed_matched_kpts, ref_matched_kpts, cv.RANSAC,5.0)**
Count number of outliers and inliers
# number of detected outliers: len(status) - np.sum(status)
# number of detected inliers: np.sum(status)
# Inlier Ratio, number of inlier/number of matches: float(np.sum(status)) / float(len(status))
Feature Matching Algorithm
I would say that if you are using the sparse feature-based algorithm (SIFT or SURF), BFMatcher.knnMatch() with Test ratio is preferred. While the bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True) is used for binary-based algorithm (ORB, FAST, etc). My suggestion would be try both algorithms on your project to investigate which one is better.

cost becoming NaN after certain iterations

I am trying to do a multiclass classification problem (containing 3 labels) with softmax regression.
This is my first rough implementation with gradient descent and back propagation (without using regularization and any advanced optimization algorithm) containing only 1 layer.
Also when learning-rate is big (>0.003) cost becomes NaN, on decreasing learning-rate the cost function works fine.
Can anyone explain what I'm doing wrong??
# X is (13,177) dimensional
# y is (3,177) dimensional with label 0/1
m = X.shape[1] # 177
W = np.random.randn(3,X.shape[0])*0.01 # (3,13)
b = 0
cost = 0
alpha = 0.0001 # seems too small to me but for bigger values cost becomes NaN
for i in range(100):
Z = np.dot(W,X) + b
t = np.exp(Z)
add = np.sum(t,axis=0)
A = t/add
loss = -np.multiply(y,np.log(A))
cost += np.sum(loss)/m
print('cost after iteration',i+1,'is',cost)
dZ = A-y
dW = np.dot(dZ,X.T)/m
db = np.sum(dZ)/m
W = W - alpha*dW
b = b - alpha*db
This is what I get :
cost after iteration 1 is 6.661713420377916
cost after iteration 2 is 23.58974203186562
cost after iteration 3 is 52.75811642877174
.............................................................
...............*upto 100 iterations*.................
.............................................................
cost after iteration 99 is 1413.555298639879
cost after iteration 100 is 1429.6533630169406
Well after some time i figured it out.
First of all the cost was increasing due to this :
cost += np.sum(loss)/m
Here plus sign is not needed as it will add all the previous cost computed on every epoch which is not what we want. This implementation is generally required during mini-batch gradient descent for computing cost over each epoch.
Secondly the learning rate is too big for this problem that's why cost was overshooting the minimum value and becoming NaN.
I looked in my code and find out that my features were of very different range (one was from -1 to 1 and other was -5000 to 5000) which was limiting my algorithm to use greater values for learning rate.
So I applied feature scaling :
var = np.var(X, axis=1)
X = X/var
Now learning rate can be much bigger (<=0.001).

Importance weighted autoencoder doing worse than VAE

I've been implementing VAE and IWAE models on the caltech silhouettes dataset and am having an issue where the VAE outperforms IWAE by a modest margin (test LL ~120 for VAE, ~133 for IWAE!). I don't believe this should be the case, according to both theory and experiments produced here.
I'm hoping someone can find some issue in how I'm implementing that's causing this to be the case.
The network I'm using to approximate q and p is the same as that detailed in the appendix of the paper above. The calculation part of the model is below:
data_k_vec = data.repeat_interleave(K,0) # Generate K samples (in my case K=50 is producing this behavior)
mu, log_std = model.encode(data_k_vec)
z = model.reparameterize(mu, log_std) # z = mu + torch.exp(log_std)*epsilon (epsilon ~ N(0,1))
decoded = model.decode(z) # this is the sigmoid output of the model
log_prior_z = torch.sum(-0.5 * z ** 2, 1)-.5*z.shape[1]*T.log(torch.tensor(2*np.pi))
log_q_z = compute_log_probability_gaussian(z, mu, log_std) # Definitions below
log_p_x = compute_log_probability_bernoulli(decoded,data_k_vec)
if model_type == 'iwae':
log_w_matrix = (log_prior_z + log_p_x - log_q_z).view(-1, K)
elif model_type =='vae':
log_w_matrix = (log_prior_z + log_p_x - log_q_z).view(-1, 1)*1/K
log_w_minus_max = log_w_matrix - torch.max(log_w_matrix, 1, keepdim=True)[0]
ws_matrix = torch.exp(log_w_minus_max)
ws_norm = ws_matrix / torch.sum(ws_matrix, 1, keepdim=True)
ws_sum_per_datapoint = torch.sum(log_w_matrix * ws_norm, 1)
loss = -torch.sum(ws_sum_per_datapoint) # value of loss that gets returned to training function. loss.backward() will get called on this value
Here are the likelihood functions. I had to fuss with the bernoulli LL in order to not get nan during training
def compute_log_probability_gaussian(obs, mu, logstd, axis=1):
return torch.sum(-0.5 * ((obs-mu) / torch.exp(logstd)) ** 2 - logstd, axis)-.5*obs.shape[1]*T.log(torch.tensor(2*np.pi))
def compute_log_probability_bernoulli(theta, obs, axis=1): # Add 1e-18 to avoid nan appearances in training
return torch.sum(obs*torch.log(theta+1e-18) + (1-obs)*torch.log(1-theta+1e-18), axis)
In this code there's a "shortcut" being used in that the row-wise importance weights are being calculated in the model_type=='iwae' case for the K=50 samples in each row, while in the model_type=='vae' case the importance weights are being calculated for the single value left in each row, so that it just ends up calculating a weight of 1. Maybe this is the issue?
Any and all help is huge - I thought that addressing the nan issue would permanently get me out of the weeds but now I have this new problem.
EDIT:
Should add that the training scheme is the same as that in the paper linked above. That is, for each of i=0....7 rounds train for 2**i epochs with a learning rate of 1e-4 * 10**(-i/7)
The K-sample importance weighted ELBO is
$$ \textrm{IW-ELBO}(x,K) = \log \sum_{k=1}^K \frac{p(x \vert z_k) p(z_k)}{q(z_k;x)}$$
For the IWAE there are K samples originating from each datapoint x, so you want to have the same latent statistics mu_z, Sigma_z obtained through the amortized inference network, but sample multiple z K times for each x.
So its computationally wasteful to compute the forward pass for data_k_vec = data.repeat_interleave(K,0), you should compute the forward pass once for each original datapoint, then repeat the statistics output by the inference network for sampling:
mu = torch.repeat_interleave(mu,K,0)
log_std = torch.repeat_interleave(log_std,K,0)
Then sample z_k. And now repeat your datapoints data_k_vec = data.repeat_interleave(K,0), and use the resulting tensor to efficiently evaluate the conditional p(x |z_k) for each importance sample z_k.
Note you may also want to use the logsumexp operation when calculating the IW-ELBO for numerical stability. I can't quite figure out what's going on with the log_w_matrix calculation in your post, but this is what I would do:
log_pz = ...
log_qzCx = ....
log_pxCz = ...
log_iw = log_pxCz + log_pz - log_qzCx
log_iw = log_iw.reshape(-1, K)
iwelbo = torch.logsumexp(log_iw, dim=1) - np.log(K)
EDIT: Actually after thinking about it a bit and using the score function identity, you can interpret the IWAE gradient as an importance weighted estimate of the standard single-sample gradient, so the method in the OP for calculation of the importance weights is equivalent (if a bit wasteful), provided you place a stop_gradient operator around the normalized importance weights, which you call w_norm. So I the main problem is the absence of this stop_gradient operator.

FFT - Calculating exact frequency between frequency bins

I am using a nice FFT library I found online to see if I can write a pitch-detection program. So far, I have been able to successfully let the library do FFT calculation on a test audio signal containing a few sine waves including one at 440Hz (I'm using 16384 samples as the size and the sample rate at 44100Hz).
The FFT output looks like:
433.356Hz - Real: 590.644 - Imag: -27.9856 - MAG: 16529.5
436.047Hz - Real: 683.921 - Imag: 51.2798 - MAG: 35071.4
438.739Hz - Real: 4615.24 - Imag: 1170.8 - MAG: 5.40352e+006
441.431Hz - Real: -3861.97 - Imag: 2111.13 - MAG: 8.15315e+006
444.122Hz - Real: -653.75 - Imag: 341.107 - MAG: 222999
446.814Hz - Real: -564.629 - Imag: 186.592 - MAG: 105355
As you can see, the 441.431Hz and 438.739Hz bins both show equally high magnitude outputs (the right-most numbers following "MAG:"), so it's obvious that the target frequency 440Hz falls somewhere between. Increasing the resolution might be one way to close in, but that would add to the calculation time.
How do I calculate the exact frequency that falls between two frequency bins?
UPDATE:
I tried out Barry Quinn's "Second Estimator" discussed on the DSPGuru website and got excellent results. The following shows the result for 440Hz square wave - now I'm only off by 0.003Hz!
Here is the code I used. I simply adapted this example I found, which was for Swift. Thank you everyone for your very valuable input, this has been a great learning journey :)
To calculate the "true" frequency, once I used parabola fit algorithm. It worked very well for my use case.
This is the way I followed in order to find the fundamental frequency:
Calculate DFT (WOLA).
Find peaks in your DFT bins.
Find Harmonic Product Spectrum. Not the most reliable nor precise, but this is a very easy way of finding your fundamental frequency candidates.
Based on peaks and HPS, use parabola fit algorithm to find fundamental pitch frequency (and amplitude if needed).
For example, HPS says the fundamental (strongest) pitch is concentrated in bin x of your DFT; if bin x belongs to the peak y, then parabola fit frequency is taken from the peak y and that is the pitch you were looking for.
If you are not looking for fundamental pitch, but exact frequency in any bin, just apply parabola fit for that bin.
Some code to get you started:
struct Peak
{
float freq ; // Peak frequency calculated by parabola fit algorithm.
float amplitude; // True amplitude.
float strength ; // Peak strength when compared to neighbouring bins.
uint16_t startPos ; // Peak starting position (DFT bin).
uint16_t maxPos ; // Peak location (DFT bin).
uint16_t stopPos ; // Peak stop position (DFT bin).
};
void calculateTrueFrequency( Peak & peak, float const bins, uint32_t const fs, DFT_Magnitudes mags )
{
// Parabola fit:
float a = mags[ peak.maxPos - 1 ];
float b = mags[ peak.maxPos ];
float c = mags[ peak.maxPos + 1 ];
float p = 0.5f * ( a - c ) / ( a - 2.0f * b + c );
float bin = convert<float>( peak.maxPos ) + p;
peak.freq = convert<float>( fs ) * bin / bins / 2;
peak.amplitude = b - 0.25f + ( a - c ) * p;
}
Sinc interpolation can be used to accurately interpolate (or reconstruct) the spectrum between FFT result bins. A zero-padded FFT will produce a similar interpolated spectrum. You can use a high quality interpolator (such as a windowed Sinc kernel) with successive approximation to estimate the actual spectral peak to whatever resolution the S/N allows. This reconstruction might not work near the DC or Fs/2 FFT result bins unless you include the effects of the the spectrum's conjugate image in the interpolation kernel.
See: https://ccrma.stanford.edu/~jos/Interpolation/Ideal_Bandlimited_Sinc_Interpolation.html and https://en.wikipedia.org/wiki/Whittaker%E2%80%93Shannon_interpolation_formula for details about time domain reconstruction, but the same interpolation method works in either domain, frequency or time, for bandlimited or time limited signals respectively.
If you require a less accurate estimator with far less computational overhead, parabolic interpolation (and other similar curve fitting estimators) might work. See: https://www.dsprelated.com/freebooks/sasp/Quadratic_Interpolation_Spectral_Peaks.html and https://mgasior.web.cern.ch/mgasior/pap/FFT_resol_note.pdf for details for parabolic, and http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.555.2873&rep=rep1&type=pdf for other curve fitting peak estimators.

Resources