similarity difference of time series data with different time interval - time-series

Hi, I'm currently trying to compute a similarity score between two time series.
The metric I use now is KL-divergence, but I want to know if there are any others. There are many candidates (e.g. dynamic time warping, Fréchet distance). However, I can't apply them because my time series start at different times. For example, stock_data_1 starts on Monday and stock_data_2 starts on Friday. Is there a metric that can compare two time series with different starting points and detect differences in distribution in a way that is specific to time series data?
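For reference, here is a minimal sketch of the kind of KL-divergence score mentioned above. It assumes the score is computed on histograms of the raw values over shared bins; the helper name kl_divergence, the bin count, the smoothing constant, and the synthetic data are all illustrative choices, not the exact setup in use.

```python
import numpy as np

def kl_divergence(x, y, bins=20, eps=1e-9):
    """KL(P || Q) between value histograms of two series on shared bin edges."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Shared edges so both histograms describe the same value ranges.
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    # Smooth with eps to avoid log(0), then normalise to probabilities.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic stand-ins for the two series (assumption, not real stock data).
rng = np.random.default_rng(0)
stock_data_1 = rng.normal(0.0, 1.0, 500)
stock_data_2 = rng.normal(0.5, 1.0, 500)

print(kl_divergence(stock_data_1, stock_data_1))
print(kl_divergence(stock_data_1, stock_data_2))
```

A histogram-based score like this ignores the order of the observations, so the starting day does not affect it, but it also cannot see any temporal structure.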
Thanks in advance.

Related

Dynamic Multi-step Time Series Forecasting

I am using an LSTM to forecast the future temperature of a specific device, so my data only has Date and Temperature.
I want to create a method that forecasts N time steps into the future, where N is configurable (I will take it from the user). The main constraint is that I don't want to use any loops in this method because of time restrictions.

Converting Multiple Time-series Signals into One Spectrogram

Is it possible to combine multivariate time series signals (from the same event) and convert them to the frequency domain to produce a spectrogram? I would like to convert these signals to the frequency domain so that I can train a convolutional neural network to classify events.
So far, I've only seen examples using a single time series, not multidimensional data, such as the one pictured below.
[Figure: Time Series to Spectrogram]
As an example, let's assume (in the figure below) this is the data I collected in multiple time series for 1 day in the year. I've collected similar data for 30 other days. I want to combine the signals in a way to create a frequency spectrogram.
[Figure: Multivariate]
Can this be done? What are some ways to perform this operation?
Can you please provide the code or GitHub link for that single time series conversion?

Impute time series using similar time series

I have a lot of data: about one year of thermostat recordings, where each thermostat reports the hourly mean temperature in its household. But a lot of data is missing, because the thermostat was only installed in the middle of the year, or it was taken offline for a week, and so on. Many of these thermostat series are very similar to each other, though. What I want to do is impute the missing data using similar time series.
So let's say house A only started in July, but from then on it is very similar to household B. I would then want to use the data from household B to predict what the data would have been before July in house A.
I was thinking about training a recurrent neural network to do this, but I am not sure what is out there, and when I search for papers they almost exclusively work on data sets spanning multiple years and impute using data from previous years. I do not have such data, so that is not an option.
Does anyone have a clue how to tackle this problem, or a reference I could use that solves a similar problem?
As I understand it, you want to impute the data using cross-sectional data rather than time series information.
There are actually quite a lot of imputation packages that can do this for you in R (if you are using R).
You'd need equally spaced data: one value per hour, with NA wherever a value is not present. Ideally you then have multiple time series of equal length.
Then you merge these time series on the timestamp (hour).
Afterwards you can apply an imputation package such as mice, missForest, or imputeR with basically one line of code. These packages use the correlations between the different time series to estimate the missing values.
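If you are not using R, the idea can be illustrated with a much simpler stand-in. The sketch below is not what mice or missForest do internally; it only assumes two series already aligned on the same hourly timestamps, fits a straight line between them on the hours where both have values, and uses that line to fill the gaps. The function name impute_from_neighbor and the toy temperatures are made up for illustration.

```python
import numpy as np

def impute_from_neighbor(target, neighbor):
    """Fill NaNs in `target` from `neighbor` via a linear fit on shared hours."""
    target = np.asarray(target, dtype=float)
    neighbor = np.asarray(neighbor, dtype=float)
    # Hours where both households have a reading.
    both = ~np.isnan(target) & ~np.isnan(neighbor)
    # Least-squares line: target ~ slope * neighbor + intercept.
    slope, intercept = np.polyfit(neighbor[both], target[both], deg=1)
    # Only fill hours where the neighbor actually has a value.
    fill = np.isnan(target) & ~np.isnan(neighbor)
    result = target.copy()
    result[fill] = slope * neighbor[fill] + intercept
    return result

# Hourly means aligned on the same timestamps; NaN marks a missing reading.
house_b = np.array([18.0, 19.0, 20.0, 21.0, 22.0, 23.0])
house_a = np.array([np.nan, np.nan, 21.0, 22.0, 23.0, 24.0])

filled_a = impute_from_neighbor(house_a, house_b)
print(filled_a)
```

With many households, a dedicated multivariate imputation package is the better tool, since it can draw on all the series at once instead of a single hand-picked neighbor.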

After clustering on a subset of time series, how can I associate the remaining time series with already created clusters?

I would like to know if there is a way to associate time series with existing clusters?
In practice, I took a subset of the time series, extracted some features from each, and applied k-means to group similar ones, which gave 6 clusters.
Is it possible to insert the remaining time series directly into one of the clusters already created in which there are similar time series?
You can get the centroid of each cluster, compute the distance from the new time series to each centroid, and assign that time series to the nearest cluster.
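A minimal sketch of that nearest-centroid step, assuming Euclidean distance and that the new series go through exactly the same feature extraction as the original subset. The function name and the toy centroids are hypothetical; in practice the centroids come from your fitted k-means.

```python
import numpy as np

def assign_to_nearest_centroid(features, centroids):
    """Return, for each feature vector, the index of the closest centroid."""
    features = np.atleast_2d(np.asarray(features, dtype=float))
    centroids = np.asarray(centroids, dtype=float)
    # Pairwise Euclidean distances, shape (n_series, n_clusters).
    diffs = features[:, None, :] - centroids[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return distances.argmin(axis=1)

# Toy centroids from an earlier clustering run (illustrative values only).
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
# Features extracted from the remaining time series.
new_features = np.array([[0.5, -0.2], [9.0, 1.0], [4.0, 6.0]])

labels = assign_to_nearest_centroid(new_features, centroids)
print(labels)
```

If the clusters were built with scikit-learn's KMeans, calling predict on the fitted model performs the same nearest-centroid assignment.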

Before clustering, should I do an analysis on the time series?

I have a question. I have a lot of different items (26,000 different articles of a company) and I have the quantity sold in each of the 52 weeks of 2017. I need to build a forecasting model for the future, so I decided to cluster the items.
The goal is to group items that sold in similar quantities during 2017; then, for the new collection, I classify each item into a cluster and build a specific forecasting model for those items. It's my first time using machine learning, so I need help.
Do I need to do a correlation analysis before I cluster?
I could create a correlation-based metric and pass it to my clustering function as the distance metric.
Clustering time series on the raw values is unlikely to yield useful results.
Time series data is about trends, not actual values.
Try transforming your data to reflect trends and then do the clustering.
For example, suppose your data is 5, 10, 45, 23.
Transform it to 0, 1, 1, 0 (1 means the value increased from the previous one). By doing so, you can cluster the items that increase or decrease together.
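The transformation described above could look like this in Python. It is only a sketch: the function name to_trend is made up, and treating the first point (which has no predecessor) as 0 is a convention taken from the example.

```python
def to_trend(values):
    """Map a series to 0/1 flags: 1 where the value rose versus the previous one."""
    if not values:
        return []
    trend = [0]  # the first point has no predecessor, so it is marked 0
    for prev, curr in zip(values, values[1:]):
        trend.append(1 if curr > prev else 0)
    return trend

print(to_trend([5, 10, 45, 23]))  # [0, 1, 1, 0]
```

Each item's 52 weekly quantities would be passed through to_trend before being handed to the clustering step.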
This is just an opinion; you will have to try out various transformations and see what works on your data. https://datascience.stackexchange.com/ is a relevant place to ask such questions.
