ARIMA predictions look shifted to the right? - time-series

I am using statsmodels ARIMA(1,2,1) to predict the monthly demand for a product. The predictions look like they are shifted to the right by one month. I wonder if the ARIMA results' predict method returns something other than the monthly demand predictions (maybe differences), or whether there is anything else I might be doing wrong. I have also attached the ACF and PACF plots.
Thanks

Related

Time Series Data: Trend and Multi-Seasonality, SARIMA and TBATS predictions not working

I have ~2.6K hours of sales data with a positive linear trend as well as daily and weekly seasonality. See plotted data. I have tried to model the data using SARIMA and TBATS in python. In both cases, I cannot get the predictions to work as I intend.
For SARIMA, the in-sample predictions look great, but when I try to forecast into the future, it looks completely wrong. See here for the in-sample SARIMA predictions, and see here for how poor the out-of-sample SARIMA predictions are.
For TBATS, the predicted values match the daily and weekly patterns but miss the positive trend, despite my forcing use_trend = True. See the TBATS model prediction here.
I have no idea what I'm doing wrong and have been stuck on this for days! Any advice greatly appreciated.

I have around 2000 data points for multiple locations for time series forecasting. Can I apply an LSTM model to it?

I am new to machine learning and am therefore trying to figure out whether my dataset is enough to run an LSTM model.
I am trying to do time series forecasting on daily road traffic data. Currently, I have daily data (2012-2019) for 20 different locations, so essentially ~2800 data points for each location. Is that a good dataset to start with?
Any recommendations on how I can tweak the data or transform it to help with my dataset?
Please help! Thank you!!
Consider that your dataset has ~2800 × 20 examples. You can always run an LSTM/RNN model on this much data, but you should check whether it outperforms baseline models like the autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA).
Also, if the data is in the format:
Example_1: Day_1: x, Day_2: y, ..., Day_n: xx, etc.
then rather than feeding all of Day_1 ... Day_n as features to predict Day_n+1, you can always increase your dataset by using Day_1 to predict Day_2, and so on.
Check this LINK. Something I worked on which might help.
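The "use Day_1 to predict Day_2" idea above is the standard sliding-window construction. A minimal sketch with numpy (the window length of 30 is a hypothetical choice, not something from the question):

```python
import numpy as np

def make_windows(series, window=30):
    """Turn one series into (samples, window) inputs and next-day targets."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

daily = np.arange(2800, dtype=float)   # stand-in for one location's series
X, y = make_windows(daily, window=30)
# 2800 points become 2770 supervised examples instead of a single sequence
```

Each row of `X` is then one training example for the LSTM, with `y` as the next-day target.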

How to build a linear regression model for daily predictions

I need to create a prediction model that predicts the quantity of an item per day.
This is how my data looks in the DB:
item id | date | quantity
1000 | 2020-02-03 | 5
What I did is convert the date to:
year number
number of the week in the year
weekday number
I trained this model on a dataset of 100,000 items with RegressionFastForest, RegressionFastTree, LbfgsPoissonRegression, and FastTreeTweedie,
but the results are not so good (an RMSE score of 3.5-4).
Am I doing this wrong?
I am using ML.NET, if it matters.
Thanks
There are several techniques for time series forecasting, but the main point is this: we don't seek a dependence of the value on the date. Instead, we seek a dependence of value[i] on value[i-1].
The most common techniques are the family of ARIMA models and recurrent neural networks, and I would recommend reading about them. But if you don't have much time, there is something that can help: auto ARIMA models.
Implementations of auto ARIMA exist in at least Python and R. The Python package was originally called pyramid but has since been renamed to pmdarima:
from pmdarima import auto_arima
model = auto_arima(y)
where y is your time series.
P.S. Even though it is called an auto model (meaning the algorithm chooses the best hyperparameters by itself), you should still understand what p, q, P, Q and S mean.
There are several problems with directly applying linear regression to your data.
1) If item id is an index of sorts and does not reflect physical properties of the item, then it is a categorical feature. Use one-hot encoding to replace it with regression-friendly labels.
2) If you assume that your data may have a cyclical dependence on the time of day/week/month, use the sin and cos of those features. This will not work for the year, as it is not periodic. Here is a good guide with examples in Python.
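The sin/cos trick in point 2 can be sketched with pandas (the date range here is a stand-in matching the sample row in the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("2020-02-03", periods=14, freq="D")})

# cyclical encoding: weekday 6 (Sunday) lands next to weekday 0 (Monday),
# which a raw 0-6 integer feature would not capture
weekday = df["date"].dt.dayofweek
df["weekday_sin"] = np.sin(2 * np.pi * weekday / 7)
df["weekday_cos"] = np.cos(2 * np.pi * weekday / 7)

# same idea for week-of-year (period ~52); the year itself stays linear
week = df["date"].dt.isocalendar().week.astype(float)
df["week_sin"] = np.sin(2 * np.pi * week / 52)
df["week_cos"] = np.cos(2 * np.pi * week / 52)
```

Each (sin, cos) pair places the feature on a circle, so distances between encoded values respect the wrap-around.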
Good luck!
P.S. I usually use LogisticRegression on sparse one-hot representations of categorical features as a benchmark. It will not be as good as a state-of-the-art NN solution, but it gives me a clue what the benchmark looks like.

Interpretation of ACF & PACF plots

First, apologies in case the question is pretty basic.
Can anyone help me interpret the ACF/PACF plots to identify the values of AR and MA in ARIMA model?
My data set is network traffic in an office, which means it has a seasonality of 168 points (hourly aggregation). This is because traffic on the same day of each week is similar (e.g. all Mondays see heavy traffic).
[ACF and PACF plots]
If your data is non-stationary, the differenced ACF and PACF plots are the ones you should look at. Judging from the graphs you provided, the differenced ACF shows a significant lag at 1, and it is positive in value, so consider adding an AR(1) term to your model; that is, for ARIMA use p=1 and q=0, because there is no significant negative correlation at lags 1 and above.
As per my understanding, AR(p)=2 and MA(q)=1.
Please read this blog
https://arauto.readthedocs.io/en/latest/how_to_choose_terms.html

Prediction Algorithm for Basketball Stats

I'm working on a project where I need to predict future stats based on past stats of basketball players. I would like to be able to predict next season's statistics based on the statistics of the past three seasons (if there are three previous seasons to choose from). Does anyone have a suggestion for a good prediction algorithm I could use? The data is continuous and there can be anywhere between 5-14 dimensions (age, minutes, points, etc.)
Thanks!
Note: I'd really like to use the program Weka to do this.
Out of the box, random forest would likely give you a strong baseline, so I would start with that.
You can also try linear regression, which is a simple yet relatively effective method, but depending on the data it might require a bit more tweaking (for example, transforming some of the input and/or output variables).
Gradient boosting regression is another strong predictor, but it typically also needs more tweaking to work well.
All of these algorithms have Weka implementations.
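Outside Weka, the same three baselines are easy to compare with scikit-learn. A sketch on synthetic player-season features (the feature layout and target here are made up for illustration; the real data would be the 5-14 stats per season):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
# stand-in features: e.g. age, minutes, points for three past seasons
X = rng.normal(size=(200, 9))
# stand-in target: next-season points, loosely tied to past averages
y = X[:, :3].mean(axis=1) * 10 + rng.normal(0, 1.0, 200)

for model in (RandomForestRegressor(n_estimators=200, random_state=0),
              LinearRegression()):
    # 5-fold cross-validated mean absolute error for each baseline
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_absolute_error")
    print(type(model).__name__, -scores.mean())
```

Comparing cross-validated errors like this gives a quick sense of which family is worth tuning further.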
There obviously isn't one correct answer, but for anyone looking to do something similar, I'll better describe my problem and the solution that I've found. I created a csv file where each row is a different season, and each column contains a different attribute. For each attribute that I would like to predict, I have the stats for the current season and then another column for the stats for the previous season. The first (rookie) season will have 0 for all 'previous season' columns. With this data set, I loaded it into Weka and used a Multilayer Perceptron with the test-option set to Cross-Validation. I set the number of folds to somewhere between 80-90% of the number of seasons available.
Finally, to predict the next season's statistics, you add one more row to the end and input the last-season values with "?" in the columns that you would like to predict. If anyone would like a deeper example, I'd be glad to provide one.
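The table layout described above (one row per season, a current-season column plus a previous-season column, with 0 for rookies) can be built with pandas before exporting to CSV for Weka. The player names and numbers below are made-up illustrations:

```python
import pandas as pd

# one row per player-season, as in the scheme described above
seasons = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B"],
    "year":   [2018, 2019, 2020, 2019, 2020],
    "points": [10.0, 12.5, 15.0, 8.0, 9.5],
})
seasons = seasons.sort_values(["player", "year"])

# previous-season column; rookies get 0, matching the scheme above
seasons["points_prev"] = (seasons.groupby("player")["points"]
                          .shift(1).fillna(0.0))
```

The same `groupby(...).shift(1)` pattern extends to every attribute you want a 'previous season' column for, and the result can be written out with `seasons.to_csv(...)`.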
I also think that if you truly want an accurate prediction, you have to look at player movement: if a player moves to a team with a losing record, do they get more minutes and a larger role, which would inflate their stats, or do they move to a winning team for a lesser role, where they could see a decrease in stats?
