First, apologies in case the question is pretty basic.
Can anyone help me interpret the ACF/PACF plots to identify the AR and MA orders of an ARIMA model?
My data set is network traffic in an office, aggregated hourly, which means that it has a seasonality of 168 points (7 days × 24 hours). This is because the traffic on the same weekday is similar (e.g. every Monday sees heavy traffic).
[Figure: ACF and PACF plots]
If your data was non-stationary, the differenced ACF and PACF plots are the ones you should look at. Judging from the graphs you provided, the differenced ACF shows a significant, positive spike at lag 1, so consider adding an AR(1) term to your model. That is, for ARIMA, use p = 1 and q = 0, because there is no significant negative correlation at lags 1 and above.
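As a rough illustration of what "differenced" means here, the sketch below applies a seasonal difference at lag 168 (one week of hourly points, as in the question) followed by an ordinary first difference. The helper name `difference` and the toy series are made up for illustration; they are not the asker's data.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Toy hourly series (hypothetical): a weekly pattern with period 168
# plus a slow upward drift, three weeks long.
season = 168
series = [(t % season) + 0.5 * t for t in range(3 * season)]

# Seasonal difference: compares each hour with the same hour one week earlier,
# which removes the repeating weekly pattern.
seasonal_diff = difference(series, lag=season)

# First difference of the result: removes what is left of the drift.
stationary = difference(seasonal_diff, lag=1)

print(len(series), len(seasonal_diff), len(stationary))
```

Each differencing step shortens the series by its lag, so keep enough history (several full weeks) before taking a lag-168 difference. The ACF and PACF would then be read off `stationary`, not off the raw series.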
As per my understanding, p = 2 (AR) and q = 1 (MA).
Please read this blog
https://arauto.readthedocs.io/en/latest/how_to_choose_terms.html
Related
Can we predict the percentage growth in sales of an item, given the change in discount from the previous year (a positive or negative number) as a predictor variable? There seems to be no correlation between the two. How can this problem be solved using machine learning?
You are on the wrong track with this question.
Correlation is a statistics topic rather than a machine learning one. Please check Pearson’s correlation coefficient or Spearman’s rank correlation coefficient to measure the correlation between the discount changes and the sales growth.
In machine learning we seldom compare two percentage series; instead, we compare the actual sales and discount values. A simple ML approach here is linear regression. Most ML is applied to multi-dimensional data, whereas your case has one x and one y (a single input column and a single output). Please look up related material online and solve it with Excel or Python code.
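To make the two coefficients concrete, here is a minimal dependency-free sketch. The function names and the numbers in `discount_change` and `sales_growth` are invented for illustration; with real data you would more likely call a statistics library.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical yearly figures, for illustration only.
discount_change = [-5.0, -2.0, 0.0, 3.0, 5.0, 8.0]
sales_growth = [-1.0, 0.5, 0.0, 2.5, 2.0, 4.5]

print(f"Pearson  r   = {pearson(discount_change, sales_growth):.3f}")
print(f"Spearman rho = {spearman(discount_change, sales_growth):.3f}")
```

Pearson measures linear association, while Spearman only asks whether the relationship is monotonic. If both are close to zero, a one-variable linear regression is unlikely to find much either.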
I am using statsmodels ARIMA (1,2,1) to predict the monthly demand for a product. The predictions look like they are shifted to the right by one month. I wonder if the statsmodels.ARIMA.Residuals.predict returns something different than the monthly demand predictions (maybe differences). Or is there anything else I might be doing wrong? I have also attached the acf and pacf plots.
Thanks
I'm currently performing a time series analysis using the ARIMA model. I cannot read the PACF and ACF graphs in order to determine the p and q values. Any help would be appreciated.
Thanks
Autocorrelation shows the correlation of past observations (lags) with the time series, which is the correlation of the time series with itself. If you have a time series y(t), then you calculate the correlation of y(t) and y(t-1), y(t) and y(t-2), and so on.
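The calculation described above can be sketched in a few lines. This is a plain-Python version of the usual sample autocorrelation estimator; the function name `acf` and the toy series are my own choices for illustration.

```python
def acf(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag.

    Lag k correlates y(t) with y(t-k), using the overall mean and the
    overall sum of squared deviations as the normaliser.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

# Toy series (hypothetical): strictly alternating values, so neighbouring
# points are negatively related and points two steps apart are positively related.
series = [1.0, -1.0] * 50
for lag, value in enumerate(acf(series, max_lag=3)):
    print(f"lag {lag}: {value:+.2f}")
```

Lag 0 is always 1, since a series correlates perfectly with itself; the bars in an ACF plot are the values from lag 1 onwards.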
The problem with the autocorrelation is that so-called intermediary effects (indirect correlations) are also included. If y(t) and y(t-1) correlate, and y(t-1) and y(t-2) also correlate, this influences the correlation of y(t) and y(t-2) indirectly. You can find a more detailed explanation here:
https://otexts.com/fpp2/non-seasonal-arima.html
Partial autocorrelation also shows the correlation of a time series and its lags, but intermediary effects are removed. That means in the PACF you can only see how y(t) is influenced directly by y(t-1), y(t-2), and so on. Maybe also have a look here:
https://towardsdatascience.com/time-series-from-scratch-autocorrelation-and-partial-autocorrelation-explained-1dd641e3076f
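To see how removing the intermediary effects works in practice, here is a small sketch that turns autocorrelations into partial autocorrelations with the Durbin-Levinson recursion. The function name `pacf_from_acf` and the example input are assumptions for illustration; the input is the theoretical ACF of a process where only lag 1 has a direct effect, r(k) = 0.6^k.

```python
def pacf_from_acf(r):
    """Partial autocorrelations for lags 1..len(r)-1 via Durbin-Levinson.

    `r` holds autocorrelations with r[0] == 1 and r[k] the value at lag k.
    """
    pacf = []
    prev = []  # coefficients of the order-(k-1) autoregression
    for k in range(1, len(r)):
        if k == 1:
            phi_kk = r[1]
            current = [phi_kk]
        else:
            num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf

# ACF that decays geometrically: every lag looks correlated ...
r = [0.6 ** k for k in range(5)]
print("ACF :", [round(v, 3) for v in r[1:]])
# ... but the PACF shows that only lag 1 acts directly.
print("PACF:", [round(v, 3) for v in pacf_from_acf(r)])
```

The ACF values at lags 2, 3 and 4 are non-zero purely because of the chain through lag 1. Once that chain is accounted for, the partial autocorrelations beyond lag 1 drop to (numerically) zero.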
Now, how do you read ACF and PACF plots? The plot shows you the correlation at each lag. A correlation coefficient ranges from -1, meaning a perfect negative relationship, to +1, meaning a perfect positive relationship; 0 means no relationship at all. In your case, y(t) and y(t-1) (lag 1) correlate with a coefficient of around -0.55, meaning a medium-strong negative relationship. y(t) and y(t-8) (lag 8) correlate with a coefficient of +0.3, meaning a weak positive relationship. The confidence limit shows you whether the correlation is statistically significant. Basically, this means that every bar that crosses the line is a “true” correlation that is unlikely to be random, so you can use these correlations. On the other hand, y(t) and y(t-2) (lag 2) have a very weak correlation that seems to be more or less random. You cannot use this relationship.
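As a sketch of the "crosses the line" check: a common approximation for the 95% confidence limit is ±1.96/sqrt(n), where n is the number of observations. The values at lags 1 and 8 below are the ones read off the plot above; the value at lag 2 and the series length `n_obs=100` are placeholders I made up, since neither is stated, and plotting libraries may draw slightly different limits.

```python
from math import sqrt

def significant_lags(correlations, n_obs, z=1.96):
    """Return (lags whose |correlation| exceeds z / sqrt(n_obs), the limit)."""
    bound = z / sqrt(n_obs)
    lags = [lag for lag in sorted(correlations)
            if abs(correlations[lag]) > bound]
    return lags, bound

# lag -> correlation; lag 2 is a hypothetical "very weak" value.
correlations = {1: -0.55, 2: 0.05, 8: 0.30}

lags, bound = significant_lags(correlations, n_obs=100)  # n_obs is assumed
print(f"confidence limit: +/-{bound:.3f}")
print("significant lags:", lags)
```

Note that the limit shrinks as the series gets longer, so a correlation of 0.3 can be clearly significant for a long series and not significant for a very short one.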
In general, significant correlations in the PACF point to autoregressive (AR) terms, while significant correlations in the ACF point to moving-average (MA) terms. Going by the PACF, you could start from an ARIMA(p, d, 0) model. I would recommend using the first, third and fourth lag, maybe also the fifth lag, since these have at least medium-strong, significant correlations. This means an ARIMA([1, 3, 4, 5], d, 0) model.
But you also need the ACF plot to find the best ARIMA order, especially the q value.
First difference the series once, and then read the number of significant lags from the PACF. For the current plot, p is 8, which is quite high.
For decent generalization, how many images per class are needed for fine-tuning a ResNet-50 model for ASL hand sign classification (24 classes)? I have around 600 images per class and the model is overfitting very badly.
I can't give you a number, but I can give you a method to find it yourself. The technique is to plot a graph called a "learning curve", where the x-axis is the number of training samples and the y-axis is the score. You start at 1 training sample and increase to 600. You plot two curves: the training error and the test error. You can then see how much influence more data, without any other change, will have on the result.
More details, and the following image, are in my master's thesis, section 2.5.4:
In this example you can see that, with up to 20 training samples, each new example improves the test score a lot (the green curve goes down a lot). But after that, just throwing more data at the problem will not help much.
The curve will look different in your case, but the principle should be the same.
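Here is a toy version of the procedure, just to show the mechanics. Everything in it is a stand-in: the data are synthetic one-dimensional points, and the "model" is a nearest-centroid classifier rather than a fine-tuned ResNet-50. It starts at 2 samples instead of 1 so that both classes are present.

```python
import random

def make_data(n, rng):
    """Alternating labels 0/1; class k is drawn from a normal with mean 3*k."""
    return [(rng.gauss(3.0 * (i % 2), 1.0), i % 2) for i in range(n)]

def fit_centroids(samples):
    """Toy model: the mean feature value of each class."""
    centroids = {}
    for label in (0, 1):
        values = [x for x, y in samples if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids

def error_rate(centroids, samples):
    """Fraction of samples whose nearest centroid has the wrong label."""
    wrong = sum(
        1 for x, y in samples
        if min(centroids, key=lambda c: abs(x - centroids[c])) != y
    )
    return wrong / len(samples)

def learning_curve(train, test, sizes):
    """For each size n, train on the first n samples and record both errors."""
    curve = []
    for n in sizes:
        model = fit_centroids(train[:n])
        curve.append((n, error_rate(model, train[:n]), error_rate(model, test)))
    return curve

rng = random.Random(0)
train, test = make_data(600, rng), make_data(1000, rng)
curve = learning_curve(train, test, sizes=[2, 5, 10, 20, 50, 100, 300, 600])
for n, train_err, test_err in curve:
    print(f"{n:4d} samples  train error {train_err:.3f}  test error {test_err:.3f}")
```

For the real problem you would replace the toy model with your fine-tuning run at each subset size and plot the two error columns. If the test error has already flattened well before 600 images per class, more data alone is unlikely to fix the overfitting.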
Other analysis
Look at chapters 2.5 and 2.6 of my master's thesis. I especially recommend having a look at the confusion matrix and confusion matrix ordering. This will give you an idea of which classes are confused. Maybe the classes are just inherently difficult to distinguish? Maybe one can add more features? Maybe there are labeling errors? Have a look at chapter 2.5 for more of those "maybes".
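A confusion matrix is easy to build by hand. The sketch below uses made-up labels for three hand signs and also lists the most frequent mix-ups; the helper names are hypothetical.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]

def most_confused(true_labels, predicted_labels, top=3):
    """Most frequent (true, predicted) pairs among the misclassifications."""
    errors = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    return errors.most_common(top)

# Invented predictions for three signs, for illustration only.
classes = ["A", "M", "N"]
y_true = ["A", "A", "A", "M", "M", "M", "N", "N", "N", "N"]
y_pred = ["A", "A", "A", "M", "N", "N", "N", "N", "M", "N"]

for label, row in zip(classes, confusion_matrix(y_true, y_pred, classes)):
    print(label, row)
print("most confused:", most_confused(y_true, y_pred, top=2))
```

Large off-diagonal counts concentrated in a few pairs suggest those classes are hard to tell apart or mislabeled, which is a different problem from simply having too few images.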
I am new to time series analysis. I am trying to find the trend of a short (1 day) temperature time series and have tried different approximations. The sampling interval is 2 minutes. The data were collected at different stations, and I will compare the trends to see whether they are similar or not.
I am facing three challenges in doing this:
Q1 - How can I extract the pattern?
Q2 - How can I quantify the trend, since I will compare trends that belong to two different places?
Q3 - When can I say two trends are similar or not similar?
Q1 - How can I extract the pattern?
You would start by performing time series analysis on both your data sets. You will need a statistical library to do the tests and comparisons.
If you can use Python, pandas is a good option.
In R, the forecast package is great. Start by running ets on both data sets.
Q2 - How can I quantify the trend, since I will compare trends that belong to two different places?
The idea behind quantifying trend is to start by looking for a (linear) trend line. All stats packages can assist with this. For example, if you are assuming a linear trend, then the trend line is the one that minimizes the sum of squared deviations from your data points.
The Wikipedia article on trend estimation is quite accessible.
Also, keep in mind that trend can be linear, exponential or damped. Different trending parameters can be tried to take care of these.
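For the linear case, the least-squares trend line can be computed directly. This is a minimal sketch assuming evenly spaced samples every 2 minutes, as in the question; the function name `linear_trend` and the two station series are invented for illustration.

```python
def linear_trend(values, step_minutes=2.0):
    """Least-squares line through (time in hours, value).

    Returns (slope per hour, intercept at time 0).
    """
    n = len(values)
    times = [i * step_minutes / 60.0 for i in range(n)]
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    slope = (
        sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
        / sum((t - mean_t) ** 2 for t in times)
    )
    return slope, mean_v - slope * mean_t

# Hypothetical one-day series (720 samples at 2-minute spacing) for two stations.
hours = [i * 2.0 / 60.0 for i in range(720)]
station_a = [12.0 + 1.5 * h for h in hours]
station_b = [9.0 + 1.4 * h for h in hours]

for name, values in (("A", station_a), ("B", station_b)):
    slope, intercept = linear_trend(values)
    print(f"station {name}: {slope:.2f} degrees per hour, start {intercept:.1f}")
```

The slope gives one number per station in comparable units (degrees per hour), which is a starting point for comparing the two places. A full day of temperature usually rises and falls, so a single straight line may only be sensible over part of the day.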
Q3 - When can I say two trends are similar or not similar?
Run ARIMA on both data sets. The basic idea here is to see whether the same set of parameters (which make up the ARIMA model) can describe both of your temperature time series. If you run auto.arima() from the forecast package (R), it will select the parameters p, d, q for your data, which is a great convenience.
Another thought is to perform a 2-sample t-test of both your series and check the p-value for significance. (Caveat: I am not a statistician, so I am not sure if there is any theory against doing this for time series.)
While researching I came across the Granger Test – where the basic idea is to see if one time series can help in forecasting another. Seems very applicable to your case.
So these are just a few things to get you started. Hope that helps.