Log transformation on differenced time series - time-series

I have a time series that becomes mean-stationary after differencing (though the variance is not constant). Can a log transformation be applied on differenced series to ensure weak stationarity?

Related

Metrics for time series segments

I am looking for a metric to evaluate a time series segmentation or classification task.
Suppose we have an ML model which tries at each time to classify a point as TRUE or FALSE.
All consecutive points with the same label (TRUE or FALSE) are grouped into segments or intervals we classify with the same label. So the whole series is a succession of TRUE and FALSE intervals.
We have a ground truth of the data already labeled and want to evaluate the model at segment level, with some tolerance about the intersection between the classified interval and labeled one (two intervals are considered as matching if they intersect each other for 70% of their total/union duration, IoU).
Is there any implementation of such metric in Python ?
Thank you
I have already computed and measured the IoUs of labeled Vs classified segments.

What is the implication if Holt-WInter Additive model outperforms Linear Exponential Smoothing when no seasonality is actually observed

I'm forecasting a time series using exponential smoothing method.
There is an obvious additive trend but no seasonality observed based on ACF plot and seasonal plot. Initially a Linear Exponential Smoothing (LES) model is considered the best.
However, when I tried to fit by Holt-Winter's additive model with seasonality (no matter 6,12,18..etc), it actually gave me a lower MSE than LES.
What is the possible implication behind? It looks awkward to fit a model with seasonality which is in fact not observable or explainable? Many thanks.
Time series
ACF plot

What is derivative of FFT outputs with respect to time?

I am quite new to Digital Signal Processing. I am trying to implement an anti-cogging algorithm in my PMSM control algorithm. I follow this [documentation].
I collected velocity data according to the angle. And I translated velocity data to the frequency domain with FFT. But last step, Acceleration Based Waveform Analysis, a calculated derivative of FFT outputs with respect to time. Outputs are frequency domain how could I calculate derivative of FFT outputs with respect to time, and why does it do this calculation?
"derivative of FFT outputs with respect to time" doesn't make any sense, so even though the notation used in the paper seems to say just that, I think we can understand it better by reading the text that says "the accelerations are
found by taking the time derivative of the FFT fitted speeds".
That makes perfect sense: The FFT of the velocity array is used to interpolate between samples, providing velocity as a continuous function of position. That continuous function is then differentiated at the appropriate position (j in the paper) to find the acceleration at every position i. I didn't read closely enough to find out how i and j are related.
In implementation, every FFT output for frequency f would be multiplied by fi (that is, the frequency times sqrt(-1), not i the position) to produce the FFT of the acceleration function, and then the FFT basis functions would be evaluated in their continuous form (using Math.sin and Math.cos) to produce an acceleration at any desired point.

Spectral Analysis - converting a specific frequency into the time domain

I have time series data that I want to decompose into its intrinsic frequencies. After running FFT on the series, I am left with a periodogram with the main frequencies.
Is there any way to extract a specific frequency and convert it back into the original time-domain?
For example, if I find a 7-day frequency, I can understand that this is a weekly cycle, however, is there any way to understand on what day the peaks occur?
Use a complex FFT instead of a scalar periodogram. The magnitude of a complex FFT shows the peaks, but the complex inverse (complex IFFT) of a single complex FFT peak (might be more than one sample in width) will retain the phase information and result in the proper time relationship of a particular cyclic component to the original time series.
Note: The input to the complex IFFT needs to retain conjugate symmetry for the result to be real instead of complex (except for rounding/quantization noise).

What FFT descriptors should be used as feature to implement classification or clustering algorithm?

I have some geographical trajectories sampled to analyze, and I calculated the histogram of data in spatial and temporal dimension, which yielded a time domain based feature for each spatial element. I want to perform a discrete FFT to transform the time domain based feature into frequency domain based feature (which I think maybe more robust), and then do some classification or clustering algorithms.
But I'm not sure using what descriptor as frequency domain based feature, since there are amplitude spectrum, power spectrum and phase spectrum of a signal and I've read some references but still got confused about the significance. And what distance (similarity) function should be used as measurement when performing learning algorithms on frequency domain based feature vector(Euclidean distance? Cosine distance? Gaussian function? Chi-kernel or something else?)
Hope someone give me a clue or some material that I can refer to, thanks~
Edit
Thanks to #DrKoch, I chose a spatial element with the largest L-1 norm and plotted its log power spectrum in python and it did show some prominent peaks, below is my code and the figure
import numpy as np
import matplotlib.pyplot as plt
sp = np.fft.fft(signal)
freq = np.fft.fftfreq(signal.shape[-1], d = 1.) # time sloth of histogram is 1 hour
plt.plot(freq, np.log10(np.abs(sp) ** 2))
plt.show()
And I have several trivial questions to ask to make sure I totally understand your suggestion:
In your second suggestion, you said "ignore all these values."
Do you mean the horizontal line represent the threshold and all values below it should be assigned to value zero?
"you may search for the two, three largest peaks and use their location and probably widths as 'Features' for further classification."
I'm a little bit confused about the meaning of "location" and "width", does "location" refer to the log value of power spectrum (y-axis) and "width" refer to the frequency (x-axis)? If so, how to combine them together as a feature vector and compare two feature vector of "a similar frequency and a similar widths" ?
Edit
I replaced np.fft.fft with np.fft.rfft to calculate the positive part and plot both power spectrum and log power spectrum.
code:
f, axarr = plt.subplot(2, sharex = True)
axarr[0].plot(freq, np.abs(sp) ** 2)
axarr[1].plot(freq, np.log10(np.abs(sp) ** 2))
plt.show()
figure:
Please correct me if I'm wrong:
I think I should keep the last four peaks in first figure with power = np.abs(sp) ** 2 and power[power < threshold] = 0 because the log power spectrum reduces the difference among each component. And then use the log spectrum of new power as feature vector to feed classifiers.
I also see some reference suggest applying a window function (e.g. Hamming window) before doing fft to avoid spectral leakage. My raw data is sampled every 5 ~ 15 seconds and I've applied a histogram on sampling time, is that method equivalent to apply a window function or I still need apply it on the histogram data?
Generally you should extract just a small number of "Features" out of the complete FFT spectrum.
First: Use the log power spec.
Complex numbers and Phase are useless in these circumstances, because they depend on where you start/stop your data acquisiton (among many other things)
Second: you will see a "Noise Level" e.g. most values are below a certain threshold, ignore all these values.
Third: If you are lucky, e.g. your data has some harmonic content (cycles, repetitions) you will see a few prominent Peaks.
If there are clear peaks, it is even easier to detect the noise: Everything between the peaks should be considered noise.
Now you may search for the two, three largest peaks and use their location and probably widths as "Features" for further classification.
Location is the x-value of the peak i.e. the 'frequency'. It says something how "fast" your cycles are in the input data.
If your cycles don't have constant frequency during the measuring intervall (or you use a window before caclculating the FFT), the peak will be broader than one bin. So this widths of the peak says something about the 'stability' of your cycles.
Based on this: Two patterns are similar if the biggest peaks of both hava a similar frequency and a similar widths, and so on.
EDIT
Very intersiting to see a logarithmic power spectrum of one of your examples.
Now its clear that your input contains a single harmonic (periodic, oscillating) component with a frequency (repetition rate, cycle-duration) of about f0=0.04.
(This is relative frquency, proprtional to the your sampling frequency, the inverse of the time beetween individual measurment points)
Its is not a pute sine-wave, but some "interesting" waveform. Such waveforms produce peaks at 1*f0, 2*f0, 3*f0 and so on.
(So using an FFT for further analysis turns out to be very good idea)
At this point you should produce spectra of several measurements and see what makes a similar measurement and how differ different measurements. What are the "important" features to distinguish your mesurements? Thinks to look out for:
Absolute amplitude: Height of the prominent (leftmost, highest) peaks.
Pitch (Main cycle rate, speed of changes): this is position of first peak, distance between consecutive peaks.
Exact Waveform: Relative amplitude of the first few peaks.
If your most important feature is absoulute amplitude, you're better off with calculating the RMS (root mean square) level of our input signal.
If pitch is important, you're better off with calculationg the ACF (auto-correlation function) of your input signal.
Don't focus on the leftmost peaks, these come from the high frequency components in your input and tend to vary as much as the noise floor.
Windows
For a high quality analyis it is importnat to apply a window to the input data before applying the FFT. This reduces the infulens of the "jump" between the end of your input vector ant the beginning of your input vector, because the FFT considers the input as a single cycle.
There are several popular windows which mark different choices of an unavoidable trade-off: Precision of a single peak vs. level of sidelobes:
You chose a "rectangular window" (equivalent to no window at all, just start/stop your measurement). This gives excellent precission of your peaks which now have a width of just one sample. Your sidelobes (the small peaks left and right of your main peaks) are at -21dB, very tolerable given your input data. In your case this is an excellent choice.
A Hanning window is a single cosine wave. It makes your peaks slightly broader but reduces side-lobe levels.
The Hammimg-Window (cosine-wave, slightly raised above 0.0) produces even broader peaks, but supresses side-lobes by -42 dB. This is a good choice if you expect further weak (but important) components between your main peaks or generally if you have complicated signals like speech, music and so on.
Edit: Scaling
Correct scaling of a spectrum is a complicated thing, because the values of the FFT lines depend on may things like sampling rate, lenght of FFT, window, and even implementation details of the FFT algorithm (there exist several different accepted conventions).
After all, the FFT should show the underlying conservation of energy. The RMS of the input signal should be the same as the RMS (Energy) of the spectrum.
On the other hand: if used for classification it is enough to maintain relative amplitudes. As long as the paramaters mentioned above do not change, the result can be used for classification without further scaling.

Resources