statsmodels ExponentialSmoothing forecasts constant values - time-series

I am very new to time series analysis and am currently comparing exponential smoothing and ARIMA forecasts on daily sales data with statsmodels. The data looks like this:
[image: plot of the daily sales data]
My code is below:
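import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA

# Daily PeriodIndex so the forecasts get proper dates
train.index = train.index.to_period('D')
# Exponential smoothing model (no trend or seasonal argument is passed)
expo = ExponentialSmoothing(train, initialization_method='estimated', seasonal_periods=23).fit()
test.index = test.index.to_period('D')
ytrue = test.iloc[:, 9]
eres = expo.forecast(83)   # 83-step out-of-sample forecast
#rmae = np.sqrt(mae(ytrue, eres))

# ARIMA(8, 0, 20) for comparison
model = ARIMA(train, order=(8, 0, 20)).fit()
arm = model.forecast(83)

# Align both forecasts with the test dates and plot them against the actual sales
eres.index = test.index
arm.index = test.index
test['expo'] = eres
test['arima'] = arm
test[['DAILY_UNITS', 'expo', 'arima']].plot()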
The resulting plot shows that the exponential smoothing model always forecasts a constant value. I have tried changing some of the parameters, but that only shifts the constant level rather than making the forecast vary. Can someone help me with this? I am very confused right now.
[image: plot of DAILY_UNITS with the expo and arima forecasts]
It seems that my exponential smoothing model has no problem with in-sample prediction but struggles with out-of-sample prediction (forecasting).
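A likely explanation: `ExponentialSmoothing` is called without `trend` or `seasonal`, so statsmodels fits simple exponential smoothing, and `seasonal_periods=23` is ignored because `seasonal` is not set. Simple exponential smoothing forecasts every future step with the last smoothed level, which is a flat line; it can still look good in-sample because each fitted value is a one-step prediction that uses the newest observation. The NumPy sketch below is a hand-written illustration (not statsmodels' code) contrasting that flat forecast with an additive Holt-Winters forecast. The smoothing parameters are fixed by hand instead of estimated, the series is synthetic, and the weekly period m = 7 is an assumption for daily data. In statsmodels the corresponding change would be passing `trend='add'` and/or `seasonal='add'` (together with `seasonal_periods`) to `ExponentialSmoothing`.

```python
import numpy as np

def ses_forecast(y, alpha, h):
    """Simple exponential smoothing: every h-step forecast equals the last level."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return np.full(h, level)  # flat forecast, the same value for every horizon

def hw_additive_forecast(y, alpha, beta, gamma, m, h):
    """Additive Holt-Winters (level + trend + period-m seasonality), one textbook form."""
    y = np.asarray(y, dtype=float)
    # Crude initialisation from the first two seasons (needs len(y) >= 2 * m)
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] - level)  # season[t] is the seasonal term at time t
    for t in range(m, len(y)):
        s_old = season[t - m]
        prev_level = level
        level = alpha * (y[t] - s_old) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season.append(gamma * (y[t] - level) + (1 - gamma) * s_old)
    n = len(y)
    # h-step forecast: extrapolate the trend and reuse the latest seasonal terms
    return np.array([level + k * trend + season[n - m + (k - 1) % m]
                     for k in range(1, h + 1)])

# Synthetic daily series with a trend and a weekly (m = 7) cycle
t = np.arange(70)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 7)

flat = ses_forecast(y, alpha=0.5, h=14)
varying = hw_additive_forecast(y, alpha=0.3, beta=0.1, gamma=0.3, m=7, h=14)
print(np.round(flat[:3], 2), np.round(varying[:3], 2))
```

Changing `alpha` in the first model only moves the flat line up or down, which matches the behaviour described in the question; only adding trend or seasonal components makes the forecast path vary.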


Creating a new dataset of hidden state probabilities using a HMM results in different shapes after each run

I'm trying to create a new dataset of hidden state probabilities using a hidden Markov model. Everything works fine, except that on each run hidden_states_train and hidden_states_test can come out with different numbers of columns (sometimes they match), which gives different column counts after the column stack, i.e. a feature mismatch. For example: New dataset size (15261, 197) (5087, 194), New dataset size (15261, 197) (5087, 197), etc.
I can't figure out why this happens each time I run the code. I tried giving the same number of samples to both X_train_st and X_test_st, but it keeps happening. If I use a smaller range for n_comp, e.g. for n_comp in range(1,6), the shapes often match.
Can someone shed some light on what's going on and suggest a possible fix, please?
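import numpy as np
from hmmlearn.hmm import GaussianHMM
from tensorflow.keras.utils import to_categorical

newX = X_train_st
newXtest = X_test_st
for n_comp in range(1, 16):
    print("fitting to HMM and decoding %d ..." % n_comp, end="")
    # Fit an HMM with n_comp hidden states on the training features
    modelHMM = GaussianHMM(n_components=n_comp, covariance_type="diag").fit(X_train_st)
    # Decode the most likely state sequences and one-hot encode them
    hidden_states_train = to_categorical(modelHMM.predict(X_train_st))
    hidden_states_test = to_categorical(modelHMM.predict(X_test_st))
    print("done")
    # Append the one-hot state columns to the feature matrices
    newX = np.column_stack((newX, hidden_states_train))
    newXtest = np.column_stack((newXtest, hidden_states_test))
print('New dataset size', newX.shape, newXtest.shape)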
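A likely cause: when `num_classes` is not given, Keras' `to_categorical` infers the number of columns as the largest label plus one, and `GaussianHMM` starts from a random initialisation unless `random_state` is fixed. On some runs the highest-numbered state is never decoded on the test set (or on the training set), so that one-hot matrix comes out narrower, and the missing columns accumulate over the loop; with fewer components this is less likely, which fits what happens with range(1,6). The NumPy sketch below reproduces the effect; `one_hot_inferred`, `one_hot_fixed` and the state sequences are hypothetical stand-ins for `to_categorical` and the decoded states. In the original loop, the corresponding change would be `to_categorical(..., num_classes=n_comp)` in both calls, optionally with `GaussianHMM(..., random_state=0)` so that runs are repeatable.

```python
import numpy as np

def one_hot_inferred(states):
    # Column count inferred from the data (max label + 1), like to_categorical without num_classes
    return np.eye(states.max() + 1)[states]

def one_hot_fixed(states, n_states):
    # Always n_states columns, one per possible hidden state
    return np.eye(n_states)[states]

n_comp = 5
train_states = np.array([0, 1, 2, 3, 4, 2, 1])  # every state is decoded
test_states = np.array([0, 1, 1, 2, 0])         # states 3 and 4 never decoded

print(one_hot_inferred(train_states).shape, one_hot_inferred(test_states).shape)
print(one_hot_fixed(train_states, n_comp).shape, one_hot_fixed(test_states, n_comp).shape)
```

The first line shows (7, 5) (5, 3): the column counts disagree. Fixing the number of classes gives (7, 5) (5, 5), so train and test always get the same number of state columns.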

How to include p-values only for certain comparisons using the stat_pvalue_manual function?

I am trying to visualise a Friedman test followed by pairwise comparisons using a boxplot with p-values.
Here is an example of how it should look:
[example graph downloaded from the internet][1]
However, since there are far too many significant comparisons in my case, my graph currently looks like this:
[my graph][2]
[1]: https://i.stack.imgur.com/DO6Vz.png
[2]: https://i.stack.imgur.com/94OXK.png
Here is the code I used to generate the graph with p-values:
library(ggpubr)   # ggboxplot(), stat_pvalue_manual()
library(rstatix)  # add_xy_position(), get_test_label(), get_pwc_label()
library(scales)   # trans_breaks(), trans_format(), math_format()

pwc_IFX_plot <- pwc_IFX %>% add_xy_position(x = "Variant")
ggboxplot(IFX_variant, x = "Variant", y = "Concentration", add = "point") +
  stat_pvalue_manual(pwc_IFX_plot, hide.ns = TRUE) +
  labs(
    subtitle = get_test_label(res.fried_IFX, detailed = TRUE),
    caption = get_pwc_label(pwc_IFX)
  ) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))
I would like to show only the comparisons of each group with my control group, rather than all the between-group comparisons.
Thank you for your time.
Any suggestions would be highly appreciated!

Stata timeseries rolling forecast

I'm new to Stata and have a question about its command language. I want to use my ARIMA model to forecast, i.e. use x[t], x[t-1], ... to produce an estimate xhat[t+1], then roll forward one time step to make the next forecast, re-estimating the model every N time steps.
I can duplicate code, something like the following, for T, T+1, T+2, etc.:
arima x if t<=T, arima(2,0,2)
predict xhat
to produce a series of xhats to compare with the in-sample x observations. There must be a more natural way to do this in the command language. Any suggestions or pointers would be much appreciated.
Posting a working solution provided by Stata tech support:
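webuse dfex
tsset month
generate int id = _n

capture program drop forecarima
program forecarima, rclass
    syntax [if]
    tempvar yhat
    * Fit ARIMA(1,1,0) on the current estimation window
    arima unemp `if', arima(1,1,0)
    * Last period used in estimation, and the observation number of period T+1
    local T = e(tmax)
    local T1 = `T' + 1
    summarize id if month == `T1'
    local h = r(max)
    * Predict unemp in levels, forecasting dynamically from period T
    predict `yhat', y dynamic(`T')
    return scalar y = unemp[`h']
    return scalar yhat = `yhat'[`h']
end

* Recursive (expanding) window starting with 400 observations; results saved to disk
rolling unemp = r(y) unemp_hat = r(yhat), window(400) recursive ///
    saving(results,replace): forecarima
use results,clear
browse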
This produces output with both the prediction and the observed value available. The dates are off by one step, but that is easier to handle in post-processing.
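The rolling scheme described in the question (fit on data up to t, forecast t+1, step forward, re-estimate every N steps) can also be illustrated outside Stata. The Python sketch below is only an illustration of that loop: it swaps the ARIMA(2,0,2) for a least-squares AR(1) with intercept to stay short and self-contained, and `fit_ar1`, `rolling_one_step` and `refit_every` are hypothetical names. The test series is a noise-free AR(1), so the one-step forecasts should reproduce it exactly. Note that the Stata `rolling ... recursive` solution above re-estimates at every step, which corresponds to `refit_every=1` here.

```python
import numpy as np

def fit_ar1(x):
    """Least-squares AR(1) with intercept: x[t] = c + phi * x[t-1] + e[t]."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    c, phi = np.linalg.lstsq(X, x[1:], rcond=None)[0]
    return c, phi

def rolling_one_step(x, start, refit_every):
    """One-step-ahead forecasts of x[start:], re-estimating every refit_every steps."""
    x = np.asarray(x, dtype=float)
    forecasts = []
    for i, t in enumerate(range(start, len(x))):
        if i % refit_every == 0:
            c, phi = fit_ar1(x[:t])          # expanding window: only data before t
        forecasts.append(c + phi * x[t - 1])  # forecast of x[t] from x[t-1]
    return np.array(forecasts)

# Noise-free AR(1): x[t] = 1 + 0.5 * x[t-1], starting from 0
x = np.empty(30)
x[0] = 0.0
for k in range(1, 30):
    x[k] = 1 + 0.5 * x[k - 1]

f = rolling_one_step(x, start=10, refit_every=5)
print(np.round(f[:3], 4), np.round(x[10:13], 4))
```

Between refits the last estimated coefficients are reused with the newest observation, which is the "rebuild every N steps" idea; setting `refit_every=1` gives a fully recursive scheme.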

ARIMA Python - predict function does not take date as input

I have time series input data with irregular timestamps, as shown below (data_series_final is the data series):
Price Date
9654 28.04.2013
10040 01.01.2014
10381 01.01.2017
10040 04.07.2016
11011 02.04.2018
10381 05.01.2018
10849 05.02.2018
11011 05.03.2018
11602 07.05.2018
Case 1: I did not regularize the intervals and went straight to making the data stationary with the log-and-difference method:
import numpy as np

ds_log = np.log(data_series_final)
data_diff_log = ds_log.diff(periods=1)
data_diff_log = data_diff_log.dropna()
data_diff_log_train = data_diff_log[0:5]
data_diff_log_test = data_diff_log[5:]
The data looked fairly stationary, and the fitted ARIMA model followed the training data closely. But when I ran ARIMA's predict function and passed dates as the start and end parameters, I got the errors listed below the code:
model_arima = ARIMA(data_diff_log_train, order=(0,0,2))
model_arima_fit = model_arima.fit()
predict_arima=[]
predict_arima = model_arima_fit.forecast(steps=2)[0]
predictions = model_arima_fit.predict(start='2020-01-01', end='2021-01-01')
KeyError: '2020-01-01'
TypeError: an integer is required
KeyError: 'The start argument could not be matched to a location related to the index of the data.'
Case 2: I tried to regularize the time intervals by upsampling and interpolating the data:
# Upsample to a daily frequency and interpolate the gaps linearly
data_upsampled = data_series_final.resample('D', convention='start').asfreq()
data_upsampled = data_upsampled.interpolate(method='linear')
#data_upsampled = data_series_final.resample('D', fill_method = 'ffill')
But I could not make the data stationary because of peaks in the values. I also got the same error as above and was still unable to pass datetimes as parameters.
Can someone please help?
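The `KeyError` most likely comes from the index: statsmodels can only turn a date string such as '2020-01-01' into a position when the series has a `DatetimeIndex` (or `PeriodIndex`) with a regular frequency, and the irregular dates here have none. Case 2 is a step in that direction. A minimal pandas sketch, assuming the Date column is stored as day.month.year strings exactly as shown: parse the dates, sort them (the sample is not in chronological order), put the series on a daily grid and interpolate. The `raw`, `series` and `daily` names are only for illustration.

```python
import pandas as pd

# The nine observations from the question (dates are day.month.year)
raw = pd.DataFrame({
    "Price": [9654, 10040, 10381, 10040, 11011, 10381, 10849, 11011, 11602],
    "Date": ["28.04.2013", "01.01.2014", "01.01.2017", "04.07.2016", "02.04.2018",
             "05.01.2018", "05.02.2018", "05.03.2018", "07.05.2018"],
})

# Parse the dates explicitly, index by them, and sort chronologically
series = (raw.assign(Date=pd.to_datetime(raw["Date"], format="%d.%m.%Y"))
             .set_index("Date")["Price"]
             .sort_index())

# Put the series on a regular daily grid and fill the gaps linearly
daily = series.resample("D").asfreq().interpolate(method="linear")

print(daily.index.freqstr, daily.index.min(), daily.index.max(), len(daily))
```

With a model fitted on a series indexed like `daily` (or on its log-differences), `predict(start='2020-01-01', end='2021-01-01')` should be able to map the dates, including ones beyond the end of the sample. Whether interpolating five years of sparse prices into daily values is statistically sensible is a separate question.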

ValueError: Found input variables with inconsistent numbers of samples : [1, 14048]

I am trying to run Multinomial Naive Bayes and am receiving the error in the title. Sample training data is given below; the test data has exactly the same format.
from sklearn.naive_bayes import MultinomialNB

def main():
    text_train, targets_train = read_data('train')
    text_test, targets_test = read_data('test')
    classifier1 = MultinomialNB()
    classifier1.fit(text_train, targets_train)
    prediction1 = classifier1.predict(text_test)
Sample Data:
Train:
category, text
Family, I love you Mom
University, I hate this course
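I have run into this error myself, and most often the cause is that the input data X must be a 2-D array of shape (n_samples, n_features). For example, if you build a regression model with the following code, fit() fails with a ValueError about the input shape:
import numpy as np
from sklearn import linear_model

a = np.array([1,2,3]).T   # still 1-D: .T has no effect on a 1-D array
b = np.array([4,5,6]).T
regr = linear_model.LinearRegression()
regr.fit(a, b)            # raises ValueError because X is not 2-D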
To fix it, add an extra pair of brackets so that the transpose produces column vectors of shape (3, 1):
a = np.array([[1,2,3]]).T
b = np.array([[4,5,6]]).T
With that change, the code runs normally.
This is just from my own experience, so treat it as a reference rather than a definitive answer.
(I am a student from China who is learning English and Python.)
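For the original question, note that scikit-learn lists the number of samples found in each input in order, so `[1, 14048]` means X was seen as a single row while there are 14048 labels. `MultinomialNB` expects X to be a numeric matrix with one row per document, so raw text has to be vectorised first. Without seeing `read_data`, this is a guess at the cause; the sketch below shows one common approach, scikit-learn's `CountVectorizer`, applied to the two sample rows from the question (the test sentence is made up).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts_train = ["I love you Mom", "I hate this course"]
targets_train = ["Family", "University"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(texts_train)  # sparse matrix: one row per document
clf = MultinomialNB().fit(X_train, targets_train)

X_test = vectorizer.transform(["love you Mom"])  # reuse the vocabulary fitted on train
print(X_train.shape, clf.predict(X_test))
```

Calling `transform` (not `fit_transform`) on the test texts keeps the same columns as the training matrix, so the number of features also matches at prediction time.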
