Error Calculation for Hierarchical Time Series Forecasting - time-series

I have hierarchical time series data and I generate forecasts at the lowest level in the hierarchy. I am looking for a method to calculate the error of time series forecasts on each level in the hierarchy.
Let's assume that the hierarchy is as follows:
City - Region - Country - Continent
I have forecasts for each city and I calculate the MAE of my forecasts at this level. How should I calculate the error at the Region level? Should I take the weighted average of the MAEs calculated at city level, or should I first calculate the forecasts at Region level (by summing up the city-level forecasts) and then calculate the MAE?
I hope my question is understandable. Thank you
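To make the two options concrete, here is a minimal sketch on made-up numbers (three hypothetical cities in one region; the volume-based weights are my assumption, not something fixed by the question). It shows that the two calculations measure different things: city-level errors with opposite signs cancel when the forecasts are summed, so the regional MAE can be much smaller than any average of the city MAEs.

```python
import numpy as np

# Hypothetical data: rows are time periods, columns are three cities in one region.
actual = np.array([[10.0, 20.0, 30.0],
                   [12.0, 18.0, 33.0],
                   [11.0, 22.0, 29.0]])
forecast = np.array([[12.0, 18.0, 30.0],
                     [10.0, 20.0, 33.0],
                     [13.0, 20.0, 29.0]])

# Option 1: MAE per city, then a weighted average
# (weights = each city's share of actual volume, an assumed choice).
city_mae = np.abs(actual - forecast).mean(axis=0)
weights = actual.sum(axis=0) / actual.sum()
weighted_city_mae = float(np.sum(weights * city_mae))

# Option 2: aggregate actuals and forecasts to the region first,
# then compute the MAE on the regional series.
region_actual = actual.sum(axis=1)
region_forecast = forecast.sum(axis=1)
region_mae = float(np.abs(region_actual - region_forecast).mean())

print("city MAEs:", city_mae)
print("weighted average of city MAEs:", weighted_city_mae)
print("MAE of the summed regional forecast:", region_mae)
```

In this toy example the over- and under-forecasts offset each other exactly, so the regional MAE is zero while the weighted city MAE is not. By the triangle inequality the regional MAE can never exceed the sum of the city MAEs.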


How can I perform clustering on a dataset including time series and discrete point variables?

I am trying to perform clustering on a dataset including time series (e.g. sensor recording over a few seconds) and discrete valued variables (e.g. age). I have already tried PCA to combine the original variables and then standard clustering which effectively solves the problem of having time series and discrete valued variables. I would now like to perform time-series clustering using dynamic time warping (DTW) distance but I am not sure how I can incorporate the discrete valued variables.
My first attempt was to calculate DTW distance for the time-series variables, Euclidean distance for the discrete variables and then combine these distances into a single similarity matrix. The issue is that, because of the way DTW is calculated (sum of all the Euclidean distances between optimal matched points in two time series), the scale of the DTW distance is much larger than that of the discrete variables, even after standardising the variables. If I then apply clustering on the resulting distance matrix, the discrete variables would be pretty meaningless, which is not the case in the real world.
I am trying to find similar examples in the literature and cases in all the Stacks but I've not been very lucky. I thought about:
scaling the DTW distance by the length of the series but that can be a bit tricky with time series with different lengths and on initial attempts, it seems it shrinks the distance in the time series variables a lot.
converting the discrete variable into a time series of constant values but I am not sure this is a great idea either.
Does anyone know of any examples or has anyone got any clever ideas?
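For reference, the "combine the distances" attempt described above can be sketched as follows. This is only an illustration under assumptions that are not in the question: the DTW is a plain textbook implementation with absolute difference as the local cost, each distance matrix is rescaled to [0, 1] by its own maximum before mixing, and `w` is a hypothetical weight that would need tuning.

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def pairwise(items, dist):
    """Symmetric pairwise distance matrix for a list of items."""
    n = len(items)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = dist(items[i], items[j])
    return out

def combine(d_series, d_static, w=0.5):
    """Rescale each matrix by its maximum, then take a weighted sum (assumed scheme)."""
    return w * d_series / d_series.max() + (1.0 - w) * d_static / d_static.max()

# Hypothetical records: one sensor trace (of varying length) and one age per subject.
series = [np.array([0.0, 1.0, 2.0, 3.0]),
          np.array([0.0, 0.0, 1.0, 2.0, 3.0]),
          np.array([5.0, 5.0, 5.0])]
age = [30.0, 60.0, 31.0]

d_series = pairwise(series, dtw_distance)
d_static = pairwise(age, lambda x, y: abs(x - y))
combined = combine(d_series, d_static, w=0.5)
print(combined)
```

The combined matrix can then be passed to any clustering method that accepts precomputed distances. Rescaling per matrix keeps either block from dominating purely because of its units, but it does not say how much each block should matter; that remains a modelling choice.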
You should be able to leverage any generic stock ticker analysis to get what you want. Here is a link that shows a simple time series analysis of stock data, as well as a few clustering exercises.

Hard time finding SARIMA parameters from ACF and PACF

I'm a beginner in time series analysis.
I need help finding the SARIMA(p,d,q,P,D,Q,S) parameters.
This is my dataset. The sample time is 1 hour and the season is 24 hours.
Using the adfuller test I get p = 6.202463523469663e-16, so I conclude the series is stationary.
Therefore d=0 and D=0.
Plotting ACF and PACF:
Using this post:
I learned to "start counting how many “lollipop” are above or below the confidence interval before the next one enter the blue area."
So looking at the PACF I can see maybe 5 before one falls inside the confidence interval. Therefore the non-seasonal p=5 (AR).
But I am having a hard time finding the q (MA) parameter from the ACF.
"To estimate the amount of MA terms, this time you will look at ACF plot. The same logic is applied here: how much lollipops are above or below the confidence interval before the next lollipop enters the blue area?"
But in the ACF plot not a single lollipop is inside the blue area.
Any tips?
There are many different rules of thumb and everyone has their own views. I would say that in your case you probably do not need the MA component at all. The lollipop rule refers to ACF/PACF plots that have a sharp cut-off after a certain lag, for example in your PACF after the second or third lag. Your ACF is trailing off, which can be an indicator for not using the MA component. You do not necessarily have to use it, and sometimes the data is not suited to an MA model. A good tip is to always check what pmdarima’s auto_arima() function returns for your data.
Looking at your autocorrelation plot you can clearly see the seasonality. Just because the ADF test tells you the series is stationary does not mean it necessarily is. You should at least check whether your model works better with seasonal differencing (D).
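To see what "trailing off" versus "sharp cut-off" looks like in numbers, here is a small NumPy-only sketch on simulated data (the AR and MA coefficients of 0.8 and the series length are arbitrary choices of mine, not taken from the question). A pure AR(1) process gives an ACF that decays gradually, while a pure MA(1) process gives an ACF that drops inside the confidence band right after lag 1.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

rng = np.random.default_rng(0)
n = 5000
eps = rng.standard_normal(n)

# AR(1): x_t = 0.8 * x_{t-1} + eps_t  -> ACF decays gradually ("trails off").
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.8 * ar[t - 1] + eps[t]

# MA(1): x_t = eps_t + 0.8 * eps_{t-1}  -> ACF cuts off after lag 1.
ma = eps.copy()
ma[1:] += 0.8 * eps[:-1]

band = 1.96 / np.sqrt(n)  # approximate 95% band under white noise
ar_acf = acf(ar, 5)
ma_acf = acf(ma, 5)

print("band:", round(band, 3))
print("AR(1) ACF:", np.round(ar_acf, 3))
print("MA(1) ACF:", np.round(ma_acf, 3))
```

If the ACF of your series looks like the first pattern (every lollipop outside the band, shrinking slowly), that is consistent with the advice above to start without an MA term.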

Quantify stationary seasonality

I want to quantify seasonal variation so that I can determine whether one dataset has more seasonal variation than another.
I am analyzing weekday variation in sales for a store (Store A). I have data between 1995 and 1999 and between 2005 and 2009.
My aim is to identify and compare the daily seasonality in 1995-1999 and 2004-2009.
I have worked with seasonality before, but I have never used any method to quantify seasonality.
I have identified the seasonal components using the decompose() function in R.
I run two separate models, one for 1995-1999 and one for 2004-2009.
I use additive models because the seasonality does not vary within these periods.
I report the results as seasonal index.
It is easy to see (Figure below) that there was less seasonality in 2005-2009 (dotted line) compared to 1995-1999 (solid).
However, I would like to be able to quantify the difference in seasonality.
Is it correct to use a simple Coefficient of variation (CV)? CV in 1995-1999 = 0.15. CV in 2005-2009 = 0.5.
strength of seasonality
small vs. large seasonality
I have read about the strength of seasonality and wonder what it really indicates. What is meant by strength of seasonality? The feat_stl() function in R produces seasonal_strength. But is this really an indicator of how much seasonality a seasonal pattern holds? Is strength = "how much"?
Is the total area under/above the seasonal line not a better measure of increasing/declining seasonality? The blue line obviously shows much more seasonal variation than the red line. If you measure the area below/above the lines, these areas also clearly show this.
Is measuring the total area above/below the line a workable way to quantify seasonal variation?
I understand that it can be more complex if the seasonal pattern is very fluctuating because that is also part of seasonal variation.
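To compare the two candidate measures side by side, here is a rough sketch on simulated daily data with a weekly pattern. Several things here are assumptions of mine: the decomposition is a simple moving-average one (closer to decompose() than to STL), the strength formula is the commonly quoted max(0, 1 - Var(remainder) / Var(seasonal + remainder)), which I believe is what feat_stl() reports, and the "area" is taken as the sum of absolute seasonal index values over one cycle.

```python
import numpy as np

def weekly_decompose(y, period=7):
    """Simple additive decomposition: moving-average trend plus mean seasonal index."""
    y = np.asarray(y, dtype=float)
    kernel = np.ones(period) / period            # odd period -> plain centred average
    trend = np.convolve(y, kernel, mode="same")
    half = period // 2
    valid = slice(half, len(y) - half)           # drop edge points with partial windows
    detrended = (y - trend)[valid]
    pos = np.arange(len(y))[valid] % period      # position within the weekly cycle
    seasonal_idx = np.array([detrended[pos == k].mean() for k in range(period)])
    seasonal_idx -= seasonal_idx.mean()
    seasonal = seasonal_idx[pos]
    remainder = detrended - seasonal
    return seasonal_idx, seasonal, remainder

def seasonal_strength(seasonal, remainder):
    """Share of the detrended variance explained by the seasonal component."""
    return max(0.0, 1.0 - np.var(remainder) / np.var(seasonal + remainder))

def seasonal_area(seasonal_idx):
    """Sum of absolute seasonal index values over one cycle (an amplitude measure)."""
    return float(np.abs(seasonal_idx).sum())

rng = np.random.default_rng(0)
t = np.arange(7 * 200)
pattern = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
strong = 100 + 2.0 * pattern[t % 7] + rng.standard_normal(len(t))
weak = 100 + 0.5 * pattern[t % 7] + rng.standard_normal(len(t))

idx_s, seas_s, rem_s = weekly_decompose(strong)
idx_w, seas_w, rem_w = weekly_decompose(weak)
strength_strong, strength_weak = seasonal_strength(seas_s, rem_s), seasonal_strength(seas_w, rem_w)
area_strong, area_weak = seasonal_area(idx_s), seasonal_area(idx_w)

print("strength:", round(strength_strong, 3), round(strength_weak, 3))
print("area:", round(area_strong, 2), round(area_weak, 2))
```

The two measures answer slightly different questions. The area (or any amplitude measure) says how large the seasonal swing is in the units of the data, whereas strength says how large it is relative to the noise, so a small but very regular pattern can still have high strength.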

How to determine periodicity from FFT?

Let's say I have some data that corresponds to the average temperature in a city measured every minute for around 1 year. How can I determine if there are cyclical patterns in the data using an FFT?
I know how it works for sound... I do an FFT of a sound wave and now the magnitude is shown on the Y-axis and the frequency in Hertz is shown on the X-axis because the sampling frequency is in Hertz. But in my previous example the sampling frequency would be... 1 sample every minute, right? So how should I change it to something meaningful? Would I get cycles/minute instead of cycles per second? And what would cycles/minute mean here?
I think your interpretation is correct: you are just scaling to different units. Once you've found the spectral peak, you might find it more useful to take the reciprocal to express the value in minutes/cycle (i.e., the length of the periodic cycle). Effectively this is thinking in terms of wavelength (period) rather than frequency.
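As a minimal sketch of that recipe with NumPy, assuming a synthetic signal (30 days of per-minute readings with a clean daily cycle plus noise, all made up for illustration): passing d=1.0 to rfftfreq tells NumPy that the sample spacing is one minute, so the frequency axis comes out in cycles per minute, and the reciprocal of the peak frequency is the period in minutes.

```python
import numpy as np

# Hypothetical signal: 30 days of per-minute temperatures with a daily cycle.
minutes_per_day = 24 * 60
n = 30 * minutes_per_day
t = np.arange(n)                                   # time in minutes
rng = np.random.default_rng(0)
temp = 15 + 5 * np.sin(2 * np.pi * t / minutes_per_day) + rng.normal(0, 1, n)

# Remove the mean so the zero-frequency bin does not dominate the spectrum.
spectrum = np.abs(np.fft.rfft(temp - temp.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)                  # d = 1 minute -> cycles per minute

peak = int(np.argmax(spectrum[1:])) + 1            # skip the zero-frequency bin
period_minutes = 1.0 / freqs[peak]
print("peak frequency (cycles/minute):", freqs[peak])
print("period (hours):", period_minutes / 60)      # close to 24 for this signal
```

With real data you would usually look at the few largest peaks rather than just one, since a year of temperatures would be expected to show both a daily and an annual cycle.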

Selecting an appropriate similarity metric & assessing the validity of a k-means clustering model

I have implemented k-means clustering for determining the clusters in 300 objects. Each of my objects has about 30 dimensions. The distance is calculated using the Euclidean metric.
I need to know:
How would I determine if my algorithm works correctly? I can't have a graph which will give some idea about the correctness of my algorithm.
Is Euclidean distance the correct method for calculating distances? What if I have 100 dimensions instead of 30?
The two questions in the OP are separate topics (i.e., no overlap in the answers), so I'll try to answer them one at a time, starting with item 1 on the list.
How would I determine if my [clustering] algorithm works correctly?
k-means, like other unsupervised ML techniques, lacks a good selection of diagnostic tests to answer questions like "are the cluster assignments returned by k-means more meaningful for k=3 or k=5?"
Still, there is one widely accepted test that yields intuitive results and that is straightforward to apply. This diagnostic metric is just this ratio:
inter-centroidal separation / intra-cluster variance
As the value of this ratio increases, the quality of your clustering result increases.
This is intuitive. The first of these metrics is just how far apart each cluster is from the others (measured between the cluster centers).
But inter-centroidal separation alone doesn't tell the whole story, because two clustering algorithms could return results having the same inter-centroidal separation though one is clearly better, because the clusters are "tighter" (i.e., smaller radii); in other words, the cluster edges have more separation. The second metric--intra-cluster variance--accounts for this. This is just the mean variance, calculated per cluster.
In sum, the ratio of inter-centroidal separation to intra-cluster variance is a quick, consistent, and reliable technique for comparing results from different clustering algorithms, or to compare the results from the same algorithm run under different variable parameters--e.g., number of iterations, choice of distance metric, number of centroids (value of k).
The desired result is tight (small) clusters, each one far away from the others.
The calculation is simple:
For inter-centroidal separation:
calculate the pair-wise distance between cluster centers; then
calculate the median of those distances.
For intra-cluster variance:
for each cluster, calculate the distance of every data point in a given cluster from its cluster center; next
(for each cluster) calculate the variance of the sequence of distances from the step above; then
average these variance values.
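The steps above can be sketched in a few lines of NumPy. The function name `separation_ratio` and the two-blob test data are hypothetical and only there for illustration; the cluster labels and centers would normally come from your own k-means run.

```python
import numpy as np

def separation_ratio(X, labels, centers):
    """Median inter-centroid distance divided by mean intra-cluster variance."""
    k = len(centers)
    # Inter-centroidal separation: median of the pairwise distances between centers.
    pair_d = [np.linalg.norm(centers[i] - centers[j])
              for i in range(k) for j in range(i + 1, k)]
    separation = np.median(pair_d)
    # Intra-cluster variance: variance of point-to-center distances per cluster,
    # then averaged over clusters.
    variances = []
    for c in range(k):
        d = np.linalg.norm(X[labels == c] - centers[c], axis=1)
        variances.append(d.var())
    return float(separation / np.mean(variances))

def make_blobs(spread, seed=0):
    """Hypothetical test data: two 2-D Gaussian blobs with a given spread."""
    rng = np.random.default_rng(seed)
    true_centers = np.array([[0.0, 0.0], [10.0, 10.0]])
    X = np.vstack([c + spread * rng.standard_normal((100, 2)) for c in true_centers])
    labels = np.repeat([0, 1], 100)
    centers = np.array([X[labels == c].mean(axis=0) for c in range(2)])
    return X, labels, centers

tight_ratio = separation_ratio(*make_blobs(spread=0.5))
loose_ratio = separation_ratio(*make_blobs(spread=2.0))
print("tight clusters:", round(tight_ratio, 2))
print("loose clusters:", round(loose_ratio, 2))
```

The tighter blobs give the larger ratio, as expected. Note that the ratio is not unit-free (a distance divided by a variance), so it is only meaningful for comparing runs on the same data.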
That's my answer to the first question. Here's the second question:
Is Euclidean distance the correct method for calculating distances? What if I have 100 dimensions instead of 30?
First, the easy question--is Euclidean distance a valid metric as dimensions/features increase?
Euclidean distance is perfectly scalable--works for two dimensions or two thousand. For any pair of data points:
subtract their feature vectors element-wise,
square each item in that result vector,
sum that result,
take the square root of that scalar.
Nowhere in this sequence of calculations is scale implicated.
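Those four steps translate directly into code. This is just a sketch to show that nothing in the calculation depends on the number of dimensions; in practice `np.linalg.norm(a - b)` does the same thing in one call.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two feature vectors of any (equal) length."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)  # subtract element-wise
    squared = diff ** 2                                              # square each item
    total = squared.sum()                                            # sum that result
    return float(np.sqrt(total))                                     # square root of the scalar

d_2 = euclidean([0, 0], [3, 4])                      # two dimensions
d_2000 = euclidean(np.zeros(2000), np.ones(2000))    # two thousand dimensions
print(d_2, d_2000)
```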
But whether Euclidean distance is the appropriate similarity metric for your problem depends on your data. For instance, is it purely numeric (continuous)? Or does it have discrete (categorical) variables as well (e.g., gender: M/F)? If one of your dimensions is "current location" and of the 200 users, 100 have the value "San Francisco" and the other 100 have "Boston", you can't really say that, on average, your users are from somewhere in Kansas, but that's sort of what Euclidean distance would do.
In any event, since we don't know anything about your data, I'll just give you a simple flow diagram so that you can apply it to your data and identify an appropriate similarity metric.
To identify an appropriate similarity metric given your data:
Euclidean distance is good when the dimensions are comparable and on the same scale. If one dimension represents length and another the weight of an item, Euclidean distance should be replaced with a weighted distance.
Project the data to 2D and plot it; this is a good way to see visually whether it works.
Or you may use a sanity check, such as finding the cluster centers and checking that none of the items in a cluster is too far away from its center.
Can't you just try sum |xi - yi| instead of (xi - yi)^2 in your code, and see if it makes much difference?
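A quick way to run that experiment, sketched here on random stand-in data (300 points in 30 dimensions, with the first three points used as hypothetical centers), is to compute the nearest-center assignment under both distances and see how often they agree. One caveat I would add: if you change the distance in the assignment step but keep updating centers with the mean, the result is strictly no longer standard k-means.

```python
import numpy as np

def assign(X, centers, p):
    """Assign each row of X to its nearest center under the L_p distance (p=1 or p=2)."""
    diff = np.abs(X[:, None, :] - centers[None, :, :])
    dist = (diff ** p).sum(axis=2) ** (1.0 / p)
    return dist.argmin(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 30))      # stand-in for the 300 objects with 30 dimensions
centers = X[:3].copy()              # hypothetical centers, just for the comparison

labels_l2 = assign(X, centers, p=2)  # Euclidean
labels_l1 = assign(X, centers, p=1)  # Manhattan, i.e. sum |xi - yi|
agreement = float((labels_l1 == labels_l2).mean())
print("fraction of points assigned to the same center:", agreement)
```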
I can't have a graph which will give some idea about the correctness of my algorithm.
A couple of possibilities:
look at some points midway between 2 clusters in detail
vary k a bit, see what happens (what is your k ?)
to map 30d down to 2d; see the plots under
also SO questions/tagged/pca
By the way, scipy.spatial.cKDTree
can easily give you say 3 nearest neighbors of each point,
in p=2 (Euclidean) or p=1 (Manhattan, L1), to look at.
It's fast up to ~ 20d, and with early cutoff works even in 128d.
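For example, a sketch of the nearest-neighbour lookup on random stand-in data (300 points, 30 dimensions). I ask for k=4 because each point's nearest neighbour in its own tree is the point itself, which leaves 3 real neighbours.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 30))      # stand-in data, not the OP's

tree = cKDTree(X)
# k=4 because the closest match for each query point is the point itself.
dist_l2, idx_l2 = tree.query(X, k=4, p=2)   # Euclidean
dist_l1, idx_l1 = tree.query(X, k=4, p=1)   # Manhattan, L1

neighbours_l2 = idx_l2[:, 1:]
neighbours_l1 = idx_l1[:, 1:]
same = float((neighbours_l2 == neighbours_l1).all(axis=1).mean())
print("points with identical 3 nearest neighbours under L1 and L2:", same)
```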
Added: I like Cosine distance in high dimensions; see euclidean-distance-is-usually-not-good-for-sparse-data for why.
Euclidean distance is the intuitive and "normal" distance between continuous variables. It can be inappropriate if the data is too noisy or has a non-Gaussian distribution.
You might want to try the Manhattan distance (or cityblock), which is robust to that (bear in mind that robustness always comes at a cost: a bit of the information is lost, in this case).
There are many further distance metrics for specific problems (for example Bray-Curtis distance for count data). You might want to try some of the distances implemented in pdist from the Python module scipy.spatial.distance.
