ARIMA Python - predict function does not take date as input - machine-learning

I have time series input data where the time intervals are irregular, as shown below (data_series_final is the data series):
Price Date
9654 28.04.2013
10040 01.01.2014
10381 01.01.2017
10040 04.07.2016
11011 02.04.2018
10381 05.01.2018
10849 05.02.2018
11011 05.03.2018
11602 07.05.2018
Case 1: I did not regularize the intervals and went on to make the data stationary using the log-and-difference method:
ds_log = np.log(data_series_final)
data_diff_log = ds_log.diff(periods=1)
data_diff_log = data_diff_log.dropna()
data_diff_log_train = data_diff_log[0:5]
data_diff_log_test = data_diff_log[5:]
The data looked fairly stationary, and the fitted ARIMA model tracked the training data. But when I ran the predict function on the fitted model and passed dates as the start and end parameters, I got the following errors:
model_arima = ARIMA(data_diff_log_train, order=(0,0,2))
model_arima_fit = model_arima.fit()
predict_arima=[]
predict_arima = model_arima_fit.forecast(steps=2)[0]
predictions = model_arima_fit.predict(start='2020-01-01', end='2021-01-01')
KeyError: '2020-01-01'
TypeError: an integer is required
KeyError: 'The start argument could not be matched to a location related to the index of the data.'
Case 2: I tried to regularize the time interval by padding the data:
# Upsampling and interpolating the data
data_upsampled = data_series_final.resample('D', convention='start').asfreq()
data_upsampled = data_upsampled.interpolate(method='linear')
# data_upsampled = data_series_final.resample('D', fill_method='ffill')
But I could not make the data stationary because there were peaks in the values. I also got the same error as above and was not able to pass a datetime as a parameter.
Can someone please help?
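One possible direction, offered as a sketch rather than a confirmed fix: statsmodels can generally map a date string passed to predict onto a position only when the series carries a DatetimeIndex with a regular frequency. The snippet below shows just the index preparation with pandas, using the prices from the question. The month-start ('MS') grid and the linear interpolation are assumptions made for illustration, and the names raw, series and regular are hypothetical.

```python
import pandas as pd

# Prices and dates copied from the question (day.month.year format).
raw = {
    "28.04.2013": 9654,
    "01.01.2014": 10040,
    "01.01.2017": 10381,
    "04.07.2016": 10040,
    "02.04.2018": 11011,
    "05.01.2018": 10381,
    "05.02.2018": 10849,
    "05.03.2018": 11011,
    "07.05.2018": 11602,
}

# Parse the dates explicitly and sort, since the rows are not in time order.
dates = pd.to_datetime(list(raw.keys()), format="%d.%m.%Y")
series = pd.Series(list(raw.values()), index=dates, dtype="float64").sort_index()

# Assumption: a month-start grid is an acceptable regular frequency here.
# Months without an observation become NaN and are filled by interpolation.
regular = series.resample("MS").mean().interpolate(method="linear")

print(regular.index.freqstr)  # frequency now attached to the index
print(regular.tail())
```

A series prepared this way could then be passed to ARIMA, and date strings on this monthly grid should be resolvable by predict. Note that most of the resulting points are interpolated, so a forecast as far out as '2020-01-01' would rest on very little real data.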

Related

statsmodels ExponentialSmoothing forecasts constant values

I am very new to time series analysis and am currently comparing exponential smoothing and ARIMA forecasts on daily sales data with statsmodels. The data looks like this:
[image: the daily sales data]
My code is below:
train.index = train.index.to_period('D')
expo = ExponentialSmoothing(train, initialization_method = 'estimated', seasonal_periods = 23).fit()
test.index = test.index.to_period('D')
ytrue = test.iloc[:, 9]
eres = expo.forecast(83)
#rmae = np.sqrt(mae(ytrue, eres))
from statsmodels.tsa.arima.model import ARIMA
model=ARIMA(train,order=(8,0,20)).fit()
arm = model.forecast(83)
eres.index = test.index
arm.index = test.index
test['expo'] = eres
test['arima'] = arm
test[['DAILY_UNITS', 'expo', 'arima']].plot()
The generated plot shows that the exponential smoothing model always forecasts a constant. I have tried changing some of the parameters, but they only change the constant level rather than making the forecast vary. Can someone help me with this? I am very confused right now.
[image: the plot generated by the code above]
It seems that my exponential smoothing model has no problem with in-sample prediction but struggles with out-of-sample prediction (forecasting).
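A possible explanation, hedged because the question does not show the fitted parameters: when neither trend nor seasonal is set, ExponentialSmoothing reduces to simple exponential smoothing, whose out-of-sample forecast is the last smoothed level repeated at every horizon (seasonal_periods alone does not add a seasonal component). The pure-Python sketch below reproduces that behaviour. The function name, the data and alpha=0.5 are made up for illustration.

```python
def simple_exp_smoothing_forecast(values, alpha, steps):
    """Hypothetical helper: smooth `values`, then forecast `steps` ahead."""
    level = values[0]  # assumption: initialise the level with the first observation
    for y in values[1:]:
        # Standard level update: blend the new observation with the old level.
        level = alpha * y + (1 - alpha) * level
    # With no trend or seasonal component, every horizon gets the same level.
    return [level] * steps


sales = [12.0, 15.0, 11.0, 18.0, 14.0]  # made-up daily units
forecast = simple_exp_smoothing_forecast(sales, alpha=0.5, steps=4)
print(forecast)
```

If this is the cause, passing trend='add' and/or seasonal='add' (together with seasonal_periods) to ExponentialSmoothing would let the forecast vary. Whether those components suit the data is a separate question.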

Predict type="response" to type="terms" conversion in R

Can someone please help me with the math? I need to convert the output of my GLM from response to terms to understand the math.
Let's say I am using gender (female(1), male(0)) to predict the college admission rate (0 to 1).
model <- glm(admission_rate ~ gender, data = data,family = quasipoisson(link="log"))
Model coefficients are
intercept 0.24918
genderFemale -0.23229
Now when I run
predict.glm(model, data = data, type = "response")
the values I get follow the equation y = 0.24918 + (-0.23229) * 1 for female and y = 0.24918 for male. Since the GLM uses a log link, we exponentiate each value, and the results are the fitted values produced by type = "response":
female = 1.017
male = 1.283
I have tried many things to convert these into the fitted values produced by type = "terms", but could not get them to match.
The fitted values produced by type = "terms" should be
female = 0.152984
male = -0.07
constant = 0.096198
If you can explain the math behind it, I would really appreciate it!
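A sketch of the arithmetic, assuming R's behaviour that type = "terms" returns each term on the linear-predictor scale, centred on the mean of its model-matrix column, with the remainder reported as the "constant". The share of females in the data is not given in the question, so p_female below is backed out from the reported constant and should be read as an inference, not a known value.

```python
intercept = 0.24918      # from the question
beta_female = -0.23229   # from the question
constant = 0.096198      # "constant" reported for type = "terms"

# constant = intercept + beta_female * mean(genderFemale), so solve for the mean.
p_female = (constant - intercept) / beta_female

# Centred term contribution: beta * (x - mean(x)), with x = 1 or x = 0.
term_x1 = beta_female * (1 - p_female)
term_x0 = beta_female * (0 - p_female)

print(round(p_female, 4), round(term_x1, 6), round(term_x0, 6))

# Adding the constant back recovers the linear predictor (the log of the response).
print(round(term_x1 + constant, 5), round(term_x0 + constant, 5))
```

Under this coding the x = 0 group gets roughly 0.153 and the x = 1 group roughly -0.079, so the labels on the values quoted in the question may be swapped relative to genderFemale. Exponentiating term + constant gives back the type = "response" values.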

Stata timeseries rolling forecast

I'm new to Stata and have a question about its command language. I want to use my ARIMA model to forecast, i.e., use x[t], x[t-1], ... to produce an estimate xhat[t+1], then roll forward one time step to make the next forecast, rebuilding the model every N time steps.
I can duplicate code, something like the following for T, T+1, T+2, etc.:
arima x if t<=T, arima(2,0,2)
predict xhat
to produce a series of xhats to compare with the in-sample x observations. There must be a more natural way to do this in the command language. Any suggestions or pointers would be very much appreciated.
Posting a working solution provided by Stata tech support:
webuse dfex
tsset month
generate int id = _n
capture program drop forecarima
program forecarima, rclass
    syntax [if]
    tempvar yhat
    arima unemp `if', arima(1,1,0)
    local T = e(tmax)
    local T1 = `T' + 1
    summarize id if month == `T1'
    local h = r(max)
    predict `yhat', y dynamic(`T')
    return scalar y = unemp[`h']
    return scalar yhat = `yhat'[`h']
end
rolling unemp = r(y) unemp_hat = r(yhat), window(400) recursive ///
saving(results,replace): forecarima
use results,clear
browse
This provides output with both the prediction and the observed value available. The dates are off by one step, but that is more easily fixed in post-processing.
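For readers who find the scheme easier to follow in a general-purpose language, the recursive (expanding-window) loop can be sketched in Python. This is not a translation of the Stata program: a least-squares AR(1) fit stands in for arima, and fit_ar1, rolling_forecast, refit_every and the toy series are all hypothetical.

```python
def fit_ar1(history):
    """Least-squares fit of x[t] = c + phi * x[t-1] (stand-in for a real ARIMA fit)."""
    x_prev, x_next = history[:-1], history[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    phi = cov / var if var else 0.0
    return mean_next - phi * mean_prev, phi


def rolling_forecast(series, min_train, refit_every=1):
    """One-step-ahead forecasts, refitting on all data before t every N steps."""
    results, params = [], None
    for t in range(min_train, len(series)):
        if params is None or (t - min_train) % refit_every == 0:
            params = fit_ar1(series[:t])  # expanding window: everything before t
        c, phi = params
        results.append((t, series[t], c + phi * series[t - 1]))
    return results


series = [float(i) for i in range(1, 11)]  # toy data: 1, 2, ..., 10
results = rolling_forecast(series, min_train=3)
for t, observed, predicted in results:
    print(t, observed, predicted)
```

Each tuple pairs the observation at time t with the forecast made from data up to t-1, so the observed and predicted values are already aligned.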

ValueError: Found input variables with inconsistent numbers of samples : [1, 14048]

I am trying to run Multinomial Naive Bayes and am receiving the error below. Sample training data is given; the test data has exactly the same format.
def main():
    text_train, targets_train = read_data('train')
    text_test, targets_test = read_data('test')
    classifier1 = MultinomialNB()
    classifier1.fit(text_train, targets_train)
    prediction1 = classifier1.predict(text_test)
Sample Data:
Train:
category, text
Family, I love you Mom
University, I hate this course
I run into this error from time to time, and the most common cause I have found is that the input data must be a 2-D array, for instance when you build a regression model. Code like the following triggers the error:
a = np.array([1,2,3]).T  # still 1-D, shape (3,): .T does nothing to a 1-D array
b = np.array([4,5,6]).T
regr = linear_model.LinearRegression()
regr.fit(a, b)
Adding an extra pair of brackets fixes it:
a = np.array([[1,2,3]]).T  # shape (3, 1)
b = np.array([[4,5,6]]).T
After that, the code runs normally.
This is based only on my own experience, so treat it as a pointer rather than a definitive answer.
I am a student from China learning English and Python.
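The shape difference the answer relies on can be checked directly with NumPy. The snippet is a sketch that only inspects array shapes; reshape(-1, 1) is shown as a common alternative and is not part of the original answer.

```python
import numpy as np

a_1d = np.array([1, 2, 3]).T                 # transposing a 1-D array leaves it 1-D
a_2d = np.array([[1, 2, 3]]).T               # one row transposed into one column
a_alt = np.array([1, 2, 3]).reshape(-1, 1)   # same column shape via reshape

print(a_1d.shape, a_2d.shape, a_alt.shape)
```

For the text data in the question, the features would presumably also need to be turned into a numeric matrix with one row per document before fitting. How read_data builds text_train is not shown, so this is only a guess at the cause.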

torch backward through gModule

I have a graph as follows, where the input x has two paths to reach y. They are combined with a gModule that uses cMulTable. Now if I do gModule:backward(x,y), I get a table of two values. Do they correspond to the error derivative derived from the two paths?
But since path2 contains other nn layers, I suppose I need to compute the derivatives along this path step by step. But why do I get a table of two values for dy/dx?
To make things clearer, code to test this is as follows:
input1 = nn.Identity()()
input2 = nn.Identity()()
score = nn.CAddTable()({nn.Linear(3, 5)(input1),nn.Linear(3, 5)(input2)})
g = nn.gModule({input1, input2}, {score}) -- gModule
mlp = nn.Linear(3,3) -- path2 layer
x = torch.rand(3,3)
x_p = mlp:forward(x)
result = g:forward({x,x_p})
error = torch.rand(result:size())
gradient1 = g:backward(x, error) -- this is a table of 2 tensors
gradient2 = g:backward(x_p, error) -- this is also a table of 2 tensors
So what is wrong with my steps?
P.S. Perhaps I have found the reason: g:backward({x,x_p}, error) returns the same table. So I guess the two values stand for dy/dx and dy/dx_p respectively.
I think you simply made a mistake constructing your gModule. gradInput of every nn.Module has to have exactly the same structure as its input - that is the way backprop works.
Here's an example of how to create a module like yours using nngraph:
require 'torch'
require 'nn'
require 'nngraph'
function CreateModule(input_size)
    local input = nn.Identity()() -- network input
    local nn_module_1 = nn.Linear(input_size, 100)(input)
    local nn_module_2 = nn.Linear(100, input_size)(nn_module_1)
    local output = nn.CMulTable()({input, nn_module_2})
    -- pack a graph into a convenient module with standard API (:forward(), :backward())
    return nn.gModule({input}, {output})
end
input = torch.rand(30)
my_module = CreateModule(input:size(1))
output = my_module:forward(input)
criterion_err = torch.rand(output:size())
gradInput = my_module:backward(input, criterion_err)
print(gradInput)
UPDATE
As I said, gradInput of every nn.Module has to have exactly the same structure as its input. So, if you define your module as nn.gModule({input1, input2}, {score}), your gradInput (the result of the backward pass) will be a table of gradients w.r.t. input1 and input2, which in your case are x and x_p.
The only remaining question is: why on Earth don't you get an error when you call:
gradient1 = g:backward(x, error)
gradient2 = g:backward(x_p, error)
An exception should be raised, because the first argument must be not a tensor but a table of two tensors. Well, most (perhaps all) torch modules don't use the input argument when computing :backward(input, gradOutput); they usually store a copy of the input from the last :forward(input) call. In fact, this argument is so useless that modules don't even bother to verify it.
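Both points in the answer can be illustrated outside Torch with a toy module in Python. This is a hypothetical stand-in, not nngraph code: AddTable mimics an element-wise addition of two inputs, returns one gradient per input, and never looks at the input argument passed to backward.

```python
class AddTable:
    """Toy stand-in for a module that adds two inputs element-wise."""

    def forward(self, inputs):
        a, b = inputs
        self.n_inputs = len(inputs)  # remembered for the backward pass
        return [x + y for x, y in zip(a, b)]

    def backward(self, inputs, grad_output):
        # `inputs` is deliberately ignored, like the behaviour the answer describes.
        # d(a + b)/da = d(a + b)/db = 1, so each input receives grad_output unchanged.
        return [list(grad_output) for _ in range(self.n_inputs)]


module = AddTable()
output = module.forward(([1.0, 2.0], [3.0, 4.0]))
grad_input = module.backward("not even a table", [0.1, 0.2])
print(output, grad_input)
```

The result has two entries because the module has two inputs, regardless of what is passed as the first argument to backward.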
