non-uniform sampling frequency for time series dataset - machine-learning

I have a dataset with a non-uniform sampling frequency, and I need to predict the data at the next time stamp.
I would like the interval between samples to be at most 2 seconds. The data looks like the following:
gps coordinates - geolife

Related

Sinusoids with frequencies that are random variables - What does the FFT impulse look like?

I'm currently working on a program in C++ in which I am computing the time varying FFT of a wav file. I have a question regarding plotting the results of an FFT.
Say for example I have a 70 Hz signal that is produced by some instrument with certain harmonics. Even though I say this signal is 70 Hz, it's a real signal and I assume will have some randomness in which that 70Hz signal varies. Say I sample it for 1 second at a sample rate of 20kHz. I realize the sample period probably doesn't need to be 1 second, but bear with me.
Because I now have 20000 samples, when I compute the FFT I will have 20000 (or 19999) frequency bins. Let's also assume that my sample rate, in conjunction with some windowing techniques, minimizes spectral leakage.
My question then: Will the FFT still produce a relatively ideal impulse at 70 Hz? Or will there 'appear to be' spectral leakage caused by the randomness of the original signal? In other words, what does the FFT of a sinusoid whose frequency is a random variable look like?
Some of the more common modulation schemes will add sidebands that carry the information in the modulation. Depending on the amount and type of modulation with respect to the length of the FFT, the sidebands can either appear separate from the FFT peak, or just "fatten" a single peak.
Your spectrum will appear broadened, and this happens in the real world. Look, e.g., at the Voigt profile, which is a Lorentzian (the result of an ideal exponential decay) convolved with a Gaussian of a certain width, the width being determined by stochastic fluctuations, e.g. the Doppler effect on molecules in a gas that is being probed by a narrow-band laser.
You will not get an 'ideal' frequency peak either way. The limit for the resolution of the FFT is one frequency bin (the frequency resolution being given by the inverse of the time vector length), but even that (as #xvan pointed out) is in general broadened by the window function. If your window is nonexistent, i.e. it is in fact a square window of the length of the time vector, then you'll get spectral peaks that are convolved with a sinc function, and thus broadened.
The best way to visualize this is to make a long vector and plot a spectrogram (often shown for audio signals) with enough resolution that you can see the individual variation. The FFT of the overall signal is then the projection of the moving peaks onto the vertical (frequency) axis of the spectrogram. The FFT of a given time vector does not have any time resolution, but sums up all frequencies that occur during the time you FFT. So the spectrogram (often people simply use the STFT, the short-time Fourier transform) has at any given time the 'full' resolution, i.e. the narrow lineshape that you expect. The FFT of the full time vector shows the sum of all your lineshapes and therefore appears broadened.
To sum it up there are two separate effects:
a) broadening from the window function (as the commenters 1 and 2 pointed out)
b) broadening from the effect of frequency fluctuation that you are trying to simulate and that happens in real life (e.g. you sitting on a swing while receiving a radio signal).
Finally, note the significance of #xvan's comment: phi = phi(t). If the phase angle is time dependent, then its derivative is not zero. dphi/dt is a shift in angular frequency, so with f0 in Hz your instantaneous frequency becomes f0 + (1/(2*pi)) * dphi/dt.
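As a rough illustration of effect (b), here is a minimal Python sketch (standard library only) that compares the spectrum of a steady 70 Hz sinusoid with one whose phase wobbles. The wobble is a slow, deterministic sinusoidal phase modulation standing in for the random fluctuation in the question. The values of N, FS, BETA and FM are assumptions, chosen so that every component lands exactly on a bin and window leakage (effect a) stays out of the picture.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2, evaluated directly (O(N^2), fine for small N)."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        mags.append(abs(acc))
    return mags

def bins_above(mags, fraction=0.1):
    """Count bins whose magnitude exceeds `fraction` of the largest bin."""
    peak = max(mags)
    return sum(1 for m in mags if m > fraction * peak)

N = 512       # number of samples (assumed)
FS = 512.0    # sample rate in Hz (assumed), so the bin spacing is 1 Hz
F0 = 70.0     # nominal frequency, exactly on a bin
BETA = 5.0    # peak phase deviation in radians (assumed)
FM = 1.0      # rate of the wobble in Hz (assumed)

def phi(t):
    """Time-dependent phase; instantaneous frequency is F0 + BETA*FM*cos(2*pi*FM*t)."""
    return BETA * math.sin(2 * math.pi * FM * t)

steady = [math.sin(2 * math.pi * F0 * i / FS) for i in range(N)]
wobbly = [math.sin(2 * math.pi * F0 * i / FS + phi(i / FS)) for i in range(N)]

steady_bins = bins_above(dft_magnitudes(steady))
wobbly_bins = bins_above(dft_magnitudes(wobbly))

print("significant bins, steady:", steady_bins)
print("significant bins, wobbly:", wobbly_bins)
```

The steady tone occupies a single bin, while the wobbling tone spreads its energy over a cluster of bins around 70 Hz, which is the broadening described above.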

How to determine periodicity from FFT?

Let's say I have some data that corresponds to the average temperature in a city measured every minute for around 1 year. How can I determine if there are cyclical patterns in the data using an FFT?
I know how it works for sound... I do an FFT of a sound wave, and the magnitude is shown on the Y axis and the frequency in Hertz on the X axis, because the sampling frequency is in Hertz. But in my previous example the sampling frequency would be... 1 sample every minute, right? So how should I change it to something meaningful? Would I get cycles/minute instead of cycles/second? And what would cycles/minute mean here?
I think your interpretation is correct - you are just scaling to different units. Once you've found the spectral peak you might find it more useful to take the reciprocal to express the value in minutes/cycle (i.e. the length of the periodic cycle). Effectively this is thinking in terms of wavelength rather than frequency.
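The unit conversion can be sketched in a few lines of Python (standard library only). The temperature series here is hypothetical: four days of one-minute samples with a clean daily cycle, just to show how a peak bin index turns into cycles/minute and then minutes/cycle. Only the lowest few bins are evaluated to keep the direct DFT cheap, which is an assumption about where the peak sits.

```python
import cmath
import math

SAMPLE_INTERVAL_MIN = 1.0   # one sample per minute, as in the question
MINUTES_PER_DAY = 1440
N = 4 * MINUTES_PER_DAY     # four days of hypothetical data

# Hypothetical temperatures: 15 degree mean with a 5 degree daily swing.
temps = [15.0 + 5.0 * math.sin(2 * math.pi * i / MINUTES_PER_DAY) for i in range(N)]

def dft_bin_magnitude(x, k):
    """Magnitude of a single DFT bin k, evaluated directly."""
    n = len(x)
    return abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))

# Skip bin 0 (the mean) and look for the strongest of the low-frequency bins.
candidate_bins = range(1, 21)
peak_bin = max(candidate_bins, key=lambda k: dft_bin_magnitude(temps, k))

# Bin k corresponds to k cycles over the whole record of N * interval minutes.
freq_cycles_per_min = peak_bin / (N * SAMPLE_INTERVAL_MIN)
period_min = 1.0 / freq_cycles_per_min

print("peak bin:", peak_bin)
print("frequency (cycles/minute):", freq_cycles_per_min)
print("period (minutes/cycle):", period_min)
```

For this synthetic series the peak lands on the bin for four cycles per record, i.e. a period of about 1440 minutes (one day).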

time aligning time signals with different sampling rates

I am trying to time align two signals. My problem, however, is that they have been sampled at different rates: one at 50 Hz and the other at 100 Hz. Will my initial approach of cross-correlation work, or do I now need to either model these signals or interpolate the one sampled at 50 Hz? I feel this may be a hefty task, as this is real-life data and my model will have a certain amount of error.
You can just re-sample the 50 Hz data to 100 Hz. There are plenty of libraries and sample code out there for doing this. The basic algorithm for 2x up-sampling is:
insert a 0 sample between each actual sample
apply a low pass filter (25 Hz cut-off)
Alternatively, if you're not interested in the higher frequency components then you can down-sample the 100 Hz data to 50 Hz:
apply a low pass filter (25 Hz cut-off)
delete every other sample
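Both recipes above can be sketched in plain Python. This is a minimal illustration rather than production resampling code: the low-pass filter is a Hamming-windowed sinc FIR with an assumed length of 31 taps, and the function names are made up for this example. The factor of 2 after zero insertion restores the amplitude that zero-stuffing halves.

```python
import math

def lowpass_taps(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass FIR taps (odd length), unity gain at DC."""
    fc = cutoff_hz / fs_hz                  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]

def fir_filter(x, taps):
    """Apply the FIR filter with its delay removed (output i is centred on input i)."""
    half = (len(taps) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i + half - j
            if 0 <= idx < len(x):           # samples outside the record count as zero
                acc += h * x[idx]
        out.append(acc)
    return out

def upsample_2x(x, num_taps=31):
    """50 Hz -> 100 Hz: insert zeros, then low-pass at 25 Hz (at the 100 Hz rate)."""
    stuffed = []
    for v in x:
        stuffed.extend([v, 0.0])
    taps = lowpass_taps(num_taps, cutoff_hz=25.0, fs_hz=100.0)
    return [2.0 * y for y in fir_filter(stuffed, taps)]

def downsample_2x(x, num_taps=31):
    """100 Hz -> 50 Hz: low-pass at 25 Hz, then delete every other sample."""
    taps = lowpass_taps(num_taps, cutoff_hz=25.0, fs_hz=100.0)
    return fir_filter(x, taps)[::2]

# Hypothetical test signal: a 5 Hz sine observed at both rates.
slow = [math.sin(2 * math.pi * 5.0 * i / 50.0) for i in range(100)]
fast = [math.sin(2 * math.pi * 5.0 * i / 100.0) for i in range(200)]

up = upsample_2x(slow)      # should track `fast` away from the edges
down = downsample_2x(fast)  # should track `slow` away from the edges
print(len(up), len(down))
```

Once both signals share a rate, the cross-correlation approach from the question applies directly. The first and last few samples are affected by the filter running off the ends of the record, so it is safer to ignore them when aligning.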

fast fourier transform apply window and overlap

This may be a naive question, but I didn't find exact details in searching.
In an FFT with overlapping windows, after we've applied window functions to overlapping sequences of the data set and got the FFT results, how do we combine those FFT results for the overlapping sequences?
Do we just add them together, treating those frequency domain results as non-overlapping parts?
Are the magnitudes of these complex results the frequency magnitudes?
Thank you.
For each FFT you typically calculate the magnitude of each complex output bin - this gives you a spectrum (magnitude versus frequency) for one window. The sequence of magnitude spectra for all time windows is effectively a 3D data set or graph - magnitude versus frequency versus time - which is typically plotted as a spectrogram, waterfall or time varying 2D spectrum.
In the specific case where the data is statistically stationary and you just want to reduce the variance you can average the successive magnitude spectra - this is called ensemble averaging. Normally though for time-varying signals such as speech or music you would not want to do this.
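A minimal Python sketch of both ideas (standard library only): build the sequence of magnitude spectra from overlapping Hann-windowed frames, and, for the stationary case, average them bin by bin. The frame length of 64, the 50% overlap and the test tone are assumptions for illustration, and the function names are made up for this example.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 for one frame (direct evaluation)."""
    n = len(frame)
    return [
        abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(x, frame_len, hop):
    """One magnitude spectrum per overlapping, windowed frame (time-ordered list)."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + i] * window[i] for i in range(frame_len)]
        frames.append(magnitude_spectrum(frame))
    return frames

def average_spectrum(frames):
    """Average of the magnitude spectra, bin by bin (only sensible for stationary data)."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]

# Hypothetical stationary signal: 8 cycles per 64-sample frame.
signal = [math.sin(2 * math.pi * 8 * i / 64) for i in range(512)]

frames = spectrogram(signal, frame_len=64, hop=32)   # 50% overlap
averaged = average_spectrum(frames)
peak_bin = averaged.index(max(averaged))

print("frames:", len(frames), "bins per frame:", len(frames[0]))
print("peak bin of averaged spectrum:", peak_bin)
```

Note that the complex FFT outputs themselves are not added together here. Each frame is reduced to magnitudes first, and the frames are kept side by side for a spectrogram or averaged only when the signal is stationary.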

Effects for bad sampling in frequency formula

Is there any formula to calculate the frequency (or frequencies) of a signal that is badly sampled?
For example, what's the output for an analog signal with F = 22 kHz when it's sampled at 25 kHz, or 10 kHz?
EDIT:
In this example, the sampled signal (on the right) has a different frequency than the original one, because it was badly sampled (Fs is less than 2*F).
My question is: is there any formula to know the frequency of this 20 kHz signal, sampled at 30 kHz?
There is no special formula for the frequency of a 20 kHz signal sampled at 30 kHz, but the frequency of an undersampled signal is reflected about the Nyquist frequency. In your example, a 30 kHz sampling rate means the Nyquist frequency is 15 kHz, which is not enough to record the original 20 kHz signal correctly. The signal lies 5 kHz above the Nyquist frequency, and after reflection about the Nyquist frequency that 5 kHz excess appears at 15 - 5 = 10 kHz. So the sampled signal will have a frequency of 10 kHz in your case.
Unless the bandwidth of the signal is less than half the sampling rate, you lose information during sampling and generally can't distinguish frequencies after that due to aliasing.
See Undersampling for more details about sampling at rates lower than twice the maximum signal frequency.
There's no simple formula that can give you the spectral content of a signal or the main frequency. In general you need to calculate a Discrete Fourier Transform of the sampled signal to find that out. If you're interested in whether or not there's a specific frequency, or how strong it is, you can calculate DFT at that frequency. The Goertzel algorithm can be an option.
EDIT: a signal at frequency f such that fsample/2 <= f < fsample will alias to f* = fsample - f, hence a 20 kHz sine wave sampled at 30 kHz will appear as a 10 kHz sine wave.
In general, frequencies above fsample/2 can be observed in the sampled signal, but their frequency is ambiguous. That is, a frequency component with frequency f cannot be distinguished from other components with frequencies N*fsample + f and N*fsample - f for nonzero integers N. This ambiguity is called aliasing.
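For a single real sinusoid, the folding rule can be written as a small Python helper. This is a sketch under that assumption (one pure tone, ideal constant-rate sampling), and the function name is made up for this example. It subtracts the nearest multiple of the sampling rate and takes the absolute value, which lands the result in the range 0 to fsample/2.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Apparent frequency (0..fs/2) of a real sinusoid at f_hz sampled at fs_hz."""
    nearest_multiple = math.floor(f_hz / fs_hz + 0.5)   # N such that N*fs is closest to f
    return abs(f_hz - nearest_multiple * fs_hz)

print(alias_frequency(20_000, 30_000))  # 10000, the case discussed above
print(alias_frequency(22_000, 25_000))  # 3000
print(alias_frequency(22_000, 10_000))  # 2000
```

The helper only tells you where a known tone will show up. Going the other way, from the sampled data back to the original frequency, is not possible without extra knowledge, which is exactly the ambiguity described above.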
Assuming a constant sampling rate, any sampling will alias together spectral content from below and above the sampling rate. If you have frequency content on both sides of the sampling rate that you don't want combined, you will have to filter one or the other frequency band out before the sampling, or you will have a problem. For instance a low-pass filter which only passes signals below Fs/2, or a bandpass filter that only passes signals strictly between n*Fs/2 and (n+1)*Fs/2 for some integer n, might be appropriate.
Note that the accuracy of the sampling rate must be higher (lower jitter) for n > 0. Lack of this lower jitter would be an example of bad sampling that would add random phase noise.
