auto-arima not providing the seasonal factors - time-series

I have weekly data and the frequency is 'W-sun' for my data.
Frequency description of the data
The data is seasonal :
Seasonal Graph of the data
However, when I try to get P, D and Q from auto-arima, I do not get the seasonal factors. Please let me know where I am going wrong.
Auto- arima

Related

Facing difficulty in understanding Trend, Seasonality and Residual graph?

I am forecasting new COVID cases and I want to see the trend and seasonality present in the dataset.
I tried to see the trend and seasonality using seasonal_decompose.
from statsmodels.tsa.seasonal import seasonal_decompose
# decomposition
decomposition = seasonal_decompose(x=df.new_cases,
                                   model='multiplicative')
decomposition.plot()
and I got this:
I am not able to understand the seasonal graph. What is it trying to show? Does it mean my dataset doesn't have any seasonality?
And what does the Resid graph indicate?
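As a rough illustration of what the three panels contain, here is a simplified sketch of classical additive decomposition in plain Python. It is not statsmodels' implementation; the function name decompose_additive and the synthetic series are made up for the example, and the sketch only handles an odd period (such as 7 for daily data with a weekly cycle). With model='multiplicative' the components are divided out rather than subtracted, but the idea is the same.

```python
def decompose_additive(y, period):
    """Simplified classical additive decomposition (odd period only)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    n = len(y)

    # Trend: centered moving average; undefined at both edges.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(y[t - half:t + half + 1]) / period

    # Seasonal: average detrended value for each position in the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    centre = sum(means) / period
    pattern = [m - centre for m in means]  # force the pattern to average zero

    # The seasonal component is the same pattern repeated over the whole series.
    seasonal = [pattern[t % period] for t in range(n)]

    # Residual: whatever trend and seasonal do not explain.
    resid = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
             for t in range(n)]
    return trend, seasonal, resid


# Synthetic example (made-up numbers): linear trend plus a weekly pattern.
weekly = [0, 5, 3, -2, -4, -1, -1]
y = [100 + 2 * t + weekly[t % 7] for t in range(35)]
trend, seasonal, resid = decompose_additive(y, period=7)
```

Because the seasonal panel is one short pattern repeated across the whole range, it always looks periodic. Whether seasonality matters depends on how large that pattern is compared with the trend and the residual, and a residual that still shows structure suggests the decomposition has not captured everything.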

Time series prediction using GP - training data

I am trying to implement time series forecasting using genetic programming. I am creating random trees (Ramped Half-n-Half) with s-expressions and evaluating each expression using RMSE to calculate the fitness. My problem is the training process. If I want to predict gold prices and the training data looked like this:
date open high low close
28/01/2008 90.959999 91.889999 90.75 91.75
29/01/2008 91.360001 91.720001 90.809998 91.150002
30/01/2008 90.709999 92.580002 90.449997 92.059998
31/01/2008 90.919998 91.660004 90.739998 91.400002
01/02/2008 91.75 91.870003 89.220001 89.349998
04/02/2008 88.510002 89.519997 88.050003 89.099998
05/02/2008 87.900002 88.690002 87.300003 87.68
06/02/2008 89 89.650002 88.75 88.949997
07/02/2008 88.949997 89.940002 88.809998 89.849998
08/02/2008 90 91 89.989998 91
As I understand, this data is nonlinear so my questions are:
1- Do I need to make any changes to this data like exponential smoothing? and why?
2- When looping the current population and evaluating the fitness of each expression on the training data, should I calculate the RMSE on just part of this data or all of it?
3- When the algorithm finishes and I get an expression with the best (lowest) fitness, does this mean that when I apply any row from the training data, the output should be the price of the next day?
I've read some research papers about this and noticed that some of them divide the training data when calculating the fitness and some of them apply exponential smoothing. However, I found them a bit difficult to read and understand, and most implementations I've found are in either Python or R, neither of which I am familiar with.
I appreciate any help on this.
Thank you.
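For questions 2 and 3, one common setup (an assumption here, not the only option) is to pair each day's row with the next day's close, score every candidate expression by RMSE on an earlier portion of those pairs, and keep the later portion aside to check the winning expression. The sketch below uses the ten rows shown above; the helper names make_pairs and rmse, the 70/30 split, and the two candidate expressions are hypothetical stand-ins for evolved s-expression trees.

```python
import math

# (open, high, low, close) for the ten trading days listed in the question.
rows = [
    (90.959999, 91.889999, 90.75, 91.75),
    (91.360001, 91.720001, 90.809998, 91.150002),
    (90.709999, 92.580002, 90.449997, 92.059998),
    (90.919998, 91.660004, 90.739998, 91.400002),
    (91.75, 91.870003, 89.220001, 89.349998),
    (88.510002, 89.519997, 88.050003, 89.099998),
    (87.900002, 88.690002, 87.300003, 87.68),
    (89.0, 89.650002, 88.75, 88.949997),
    (88.949997, 89.940002, 88.809998, 89.849998),
    (90.0, 91.0, 89.989998, 91.0),
]


def make_pairs(rows):
    """Pair each day's inputs with the NEXT day's close (the target)."""
    return [(rows[t], rows[t + 1][3]) for t in range(len(rows) - 1)]


def rmse(expr, pairs):
    """Fitness of one candidate expression: lower is better."""
    squared = [(expr(*inputs) - target) ** 2 for inputs, target in pairs]
    return math.sqrt(sum(squared) / len(squared))


pairs = make_pairs(rows)
split = int(len(pairs) * 0.7)          # keep time order, no shuffling
train, validation = pairs[:split], pairs[split:]

# Hypothetical candidates standing in for evolved trees.
candidates = {
    "close": lambda o, h, l, c: c,
    "midpoint": lambda o, h, l, c: (h + l) / 2,
}

train_fitness = {name: rmse(expr, train) for name, expr in candidates.items()}
best_name = min(train_fitness, key=train_fitness.get)
validation_error = rmse(candidates[best_name], validation)
```

With targets built this way, feeding a row into the best expression gives its estimate of the following day's close. A low training RMSE does not guarantee the same accuracy on the held-out rows, which is why the validation error is computed separately.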

Trend and Seasonality from time series

How can we extract trend and seasonality from a time series the way SARIMAX does internally?
I need this to understand how much importance (feature importance) the trend, seasonality, AR component, MA component and exogenous variables have for the forecast.
You can do it this way:
from statsmodels.tsa.seasonal import seasonal_decompose
# decomposition
decomposition = seasonal_decompose(x=df.y, model='multiplicative')
decomposition.plot()
# df is the dataframe and y is the name of the column whose trend and
# seasonality you want to see.
# model can be 'additive' or 'multiplicative'.

ARIMA model producing a straight line prediction

I did some experiments with the ARIMA model on 2 datasets
Airline passengers data
USD vs Indian rupee data
I am getting a normal zig-zag prediction on Airline passengers data
ARIMA order=(2,1,2)
Model Results
But on USD vs Indian rupee data, I am getting prediction as a straight line
ARIMA order=(2,1,2)
Model Results
SARIMAX order=(2,1,2), seasonal_order=(0,0,1,30)
Model Results
I tried different parameters but for USD vs Indian rupee data I am always getting a straight line prediction.
One more doubt: I have read that the ARIMA model does not support time series with a seasonal component (for that we have SARIMA). Then why does the ARIMA model produce predictions with a cycle for the Airline passengers data?
Having gone through a similar issue recently, I would recommend the following:
Visualize the seasonal decomposition of the data to make sure that seasonality exists in your data. Make sure that the dataframe's index has a frequency set. You can enforce a frequency on a pandas dataframe with the following:
dh = df.asfreq('W')  # weekly frequency; fill the resulting NaNs with an appropriate method
Here is a sample code to do seasonal decomposition:
import matplotlib.pyplot as plt
import statsmodels.api as sm
# whether to use additive or multiplicative depends on the data
decomposition = sm.tsa.seasonal_decompose(dh['value'], model='additive',
                                          extrapolate_trend='freq')
fig = decomposition.plot()
plt.show()
The plot will show whether seasonality exists in your data. Please feel free to go through this amazing document regarding seasonal decomposition. Decomposition
If you're sure that the seasonal period of the data is 30, then you should be able to get a good result with the pmdarima package. The package is very effective at finding suitable p, d and q values for your model. Here is the link to it: pmdarima
example code pmdarima
If you're unsure about seasonality, please consult with a domain expert about the seasonal effects of your data or try experimenting with different seasonal components in your model and estimate the error.
Check the stationarity of the data with a Dickey-Fuller test before training the model. pmdarima can estimate the differencing order d with the following:
from pmdarima.arima import ndiffs
kpss_diff = ndiffs(dh['value'].values, alpha=0.05, test='kpss', max_d=12)
adf_diff = ndiffs(dh['value'].values, alpha=0.05, test='adf', max_d=12)
n_diffs = max(adf_diff, kpss_diff)
You may also find d with the help of the document I provided here. If the answer isn't helpful, please provide the data source for exchange rate. I will try to explain the process flow with a sample code.
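On the straight-line forecast itself: for a non-seasonal ARIMA model with d=1 and no drift, the forecast differences shrink toward zero as the horizon grows, so the forecast level flattens out. The sketch below shows this for the simplest case, ARIMA(1,1,0), in plain Python. The function name and all numbers (last value 74.0, last difference 0.2, coefficient 0.5) are made up for illustration and are not fitted to the exchange-rate data.

```python
def forecast_arima_110(last_value, last_diff, phi, steps):
    """Multi-step forecast of ARIMA(1,1,0) without drift.

    The differenced series follows AR(1): diff[t] = phi * diff[t-1] + noise,
    and future noise is replaced by its expected value, zero.
    """
    forecasts = []
    level, diff = last_value, last_diff
    for _ in range(steps):
        diff = phi * diff        # forecast difference decays geometrically
        level = level + diff     # integrate back to the original scale
        forecasts.append(level)
    return forecasts


# Hypothetical numbers, chosen only to show the shape of the forecast.
path = forecast_arima_110(last_value=74.0, last_diff=0.2, phi=0.5, steps=30)

# Closed-form limit of the forecast level as the horizon grows.
limit = 74.0 + 0.2 * 0.5 / (1 - 0.5)
```

After a few steps the path is practically constant at the limit. If the fitted coefficients are small, as is plausible for a series that behaves much like a random walk, the flat part starts almost immediately, which would look like a straight line on the plot.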

MAP estimation (predictive distribution)

Yes, this is an old exam question but unfortunately I can't find any solution.
Suppose that you are a part of a team that has trained n temperature prediction models. The models use readings from a set of sensors that measure weather conditions on a given day and predict the temperature for the following day. The i-th model is fully determined by a vector of parameters w_i and estimates the conditional probability P(y | x, w_i) of observing temperature y if the sensor state is x. Furthermore, based on historical data your team has a prior belief over the models given by P(w_i) = 2i/(n(n+1)) for i ∈ {1, ..., n}.
Given that the measured sensor state today is x*, please write down the predictive distribution P(y* | x*) for the temperature tomorrow using MAP estimation.
We didn't really cover predictive distributions. So for a MAP estimate we would want to estimate the parameters wi on the basis of the observed sensor data x. But how can I fit y in here?
Any hints appreciated :-)
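One possible reading, offered as a sketch rather than the official solution: the question gives no likelihood term beyond the stated belief, so assume P(w_i) is the distribution to maximise over the n models. Since 2i/(n(n+1)) increases with i, the MAP model is the n-th one, and the MAP predictive distribution plugs that single model in. The full Bayesian predictive distribution is shown for comparison only.

```latex
% MAP estimate: pick the single most probable model
\begin{align*}
w_{\mathrm{MAP}} &= \arg\max_{w_i} P(w_i)
                  = \arg\max_{i \in \{1,\dots,n\}} \frac{2i}{n(n+1)}
                  = w_n, \\
P(y^* \mid x^*) &\approx P(y^* \mid x^*, w_{\mathrm{MAP}})
                 = P(y^* \mid x^*, w_n).
\end{align*}

% For comparison: the full Bayesian predictive averages over all models
\[
P(y^* \mid x^*) = \sum_{i=1}^{n} P(y^* \mid x^*, w_i)\,\frac{2i}{n(n+1)}.
\]
```

So y enters through the model's own conditional P(y | x, w_i): MAP estimation only decides which w_i to use, and the prediction for y* is then read off from that model at x*.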
