Why is my ARIMA model giving NaNs in predictions?

I'm trying to predict the weekly sales of a store from the famous Walmart dataset, a simple time-series prediction exercise. I fit the model and run the predict command, but the predictions are all NaNs.
The data is stationary.
I've taken a rolling mean over 10 weeks, detrended the sales numbers using this rolling mean, and differenced the data.
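Roughly, the preprocessing looks like this (a sketch; sales is assumed to be the raw weekly-sales Series, and the variable names are mine):
# 10-week rolling mean
rolling_mean = sales.rolling(window=10).mean()
# detrend by subtracting the rolling mean, then difference once
detrended = sales - rolling_mean
rmdetdiff = detrended.diff().dropna().to_frame('weekly_sales')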
Auto-ARIMA selects an ARIMA(1,0,2) model.
I fit the model and predict, but the index of the predictions starts almost 10 years in the past! The same happens with a SARIMAX model.
Please help me solve this issue!
Attaching my code below:
from statsmodels.tsa.arima.model import ARIMA  # import assumed; not shown in the original post

train = rmdetdiff.iloc[:110]['weekly_sales']
test = rmdetdiff.iloc[110:]['weekly_sales']
model1 = ARIMA(train, order=(1,0,2))
model1_fit = model1.fit()
# forecast over the test period
start = len(train)
end = len(train) + len(test) - 1
rmdetdiff['arima_pred'] = model1_fit.predict(start=start, end=end, dynamic=True)
rmdetdiff[['arima_pred','weekly_sales']].plot(legend=True)
Here is the plot after the predictions (all the blank space before weekly_sales appears after the model1.fit command):
The shape of rmdetdiff is (133, 1).

Related

Is there a way to train a neural network (LSTM model) with multiple datasets in order to do time series forecasting?

I want to do predictions with an LSTM model, but the dataset isn't a single file; it's composed of multiple files (for example, 3 Excel files).
I've already checked that if you want to deal with a time-series forecasting problem you have to prepare your data as (number of samples, number of time steps, number of features), and this works well when I implement it for a single Excel file.
The problem is training with the three Excel files at the same time: in this case the input tensor for the LSTM layer has the shape (n_files, n_samples, n_timesteps, n_features), with dim = 4. This doesn't work because LSTM layers only admit input tensors with dim = 3.
My files have the same amount of data. It's collected from a device, with 1 value per minute for the duration of the experiment. All the experiments have the same duration too.
I've tried concatenating the files into one and choosing the batch_size as the number of samples in one Excel file (because I can't mix the different experiments), but it doesn't produce a good result (at least not as good as the results from predicting with 1 experiment).
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    model = keras.Sequential([
        layers.Masking(mask_value=0.0, input_shape=[1, 1]),
        layers.LSTM(50, return_sequences=True),
        layers.Dense(1)
    ])
    optimizer = tf.keras.optimizers.Adam(0.001)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'RootMeanSquaredError'])
    return model

model_pred = build_model()
model_pred.fit(Xchopped_train, ychopped_train, batch_size=252,
               epochs=500, verbose=1)
Where Xchopped_train and ychopped_train are the concatenated data from the 3 Excel files.
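For reference, the concatenation can be done by folding the file axis into the sample axis (a sketch; the array names X_all and y_all are placeholders, not from the post):
import numpy as np

# X_all: (n_files, n_samples, n_timesteps, n_features), e.g. (3, 252, 1, 1)
n_files, n_samples, n_timesteps, n_features = X_all.shape
# fold the file axis into the sample axis -> a dim-3 tensor the LSTM accepts
Xchopped_train = X_all.reshape(n_files * n_samples, n_timesteps, n_features)
ychopped_train = y_all.reshape(n_files * n_samples, -1)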
Another thing I've tried is to train the model within a loop, changing the Excel file each iteration:
for i in range(len(Xtrain)):
    model_pred.fit(Xtrain[i], Ytrain[i], epochs=167, verbose=1)
Where Xtrain is (3, 252, 1, 1) and the first index refers to the number of the Excel file.
And by far this is my best approach, but it feels like this isn't a good solution, since I don't know what's happening with the NN weights or which loss function is being minimized...
Is there a more efficient way to do this? Thanks!

Is this LSTM underfitting?

I am trying to create a model that predicts whether it will rain in the next 5 days (multi-step) or not, so I don't need the precipitation value, just a "yes" or "no". I've been testing some different tools/algorithms, and I guess the big challenge here is dealing with the zero-skewed data.
The dataset consists of hourly data with columns such as precipitation, temperature, pressure, wind speed, and humidity. It has around 1 million rows. There is no requirement to use a multivariate approach.
Rain occurs mostly in months 1, 2, 3, 11, and 12.
So I tried using a univariate LSTM on the data, and the hourly sampling gave me the best results. I used the following architecture:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(150, return_sequences=True, input_shape=(1, look_back)))
model.add(LSTM(50, return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(trainX, trainY, epochs=15, batch_size=4096, validation_data=(testX, testY), shuffle=False)
I'm using a look_back value of 24*60, i.e. 60 days (about 2 months) of hourly data.
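The windows are built in the usual sliding-window fashion, roughly as follows (a sketch; create_dataset and train_series are my names, not from the post):
import numpy as np

def create_dataset(series, look_back):
    # inputs: the previous look_back hourly values; target: the next value
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    # reshape to (samples, 1, look_back) to match input_shape=(1, look_back)
    return np.array(X).reshape(-1, 1, look_back), np.array(y)

trainX, trainY = create_dataset(train_series, look_back=24 * 60)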
Train/Validation Loss:
https://i.stack.imgur.com/CjDbR.png
Final result:
https://i.stack.imgur.com/p6SnD.png
So I read that this train/validation loss means the model is underfitting. Is it? What could I do to prevent this?
Before using an LSTM I tried Prophet, which gave really bad results, and I also tried auto-ARIMA, but it couldn't handle a yearly seasonality (365 days).
In case of underfitting, what you can do is increase the learning rate, increase the training duration, or increase the amount of training data.
It is also worth having some external metric such as the F1 score, because the loss isn't a good metric for human evaluation.
Just looking at your example, I would start by experimenting a bit with the loss function: it seems like your target is binary, so it would be wiser to use a binary loss instead of a regression loss.
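Concretely, that would mean something like the following (a sketch, not the original answer's code; it keeps the question's layer sizes but swaps the output head and the loss):
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(150, return_sequences=True, input_shape=(1, look_back)))
model.add(LSTM(50))
# sigmoid output for a yes/no rain label in [0, 1]
model.add(Dense(1, activation='sigmoid'))
# binary cross-entropy instead of a regression loss
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# trainY must then contain 0/1 labels (rain in the next 5 days or not)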

Reproduce ARIMA Forecast (Pandas)

I am quite new to the ARIMA model, and I have a question about how to analyze the chart of the ACF (autocorrelation function) as a function of the lag. Is it correct to take into account the ACF value of 0.5, which corresponds to a lag of about 450, and then set the ARIMA order from these values?
This is my graph:
And this is my simple code for the ARIMA model:
from pandas import DataFrame
from matplotlib import pyplot
from statsmodels.tsa.arima_model import ARIMA

# fit model
model = ARIMA(df['valore'], order=(400,1,0))
model_fit = model.fit(disp=0)
print(model_fit.summary())
# plot residual errors
residuals = DataFrame(model_fit.resid)
residuals.plot()
pyplot.show()
residuals.plot(kind='kde')
pyplot.show()
print(residuals.describe())
Thanks!
P.S. my page in jupyter format and the data (csv) can be found at: github
In theory it is possible to include an order of 400 in an ARIMA model. In practice, that value is astronomically high for an ARIMA model (anything higher than 3 or 4 is considered unusual). I would double-check your data and also double-check how you are calculating the ACF.
Additionally, the p order of an ARIMA(p,d,q) model is usually determined using the PACF, not the ACF. You use the ACF to determine q.
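For example, with statsmodels you would inspect both correlograms before choosing p and q (a sketch; df['valore'] is taken from the question):
from matplotlib import pyplot
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# difference once (the d=1 in ARIMA(p,1,q)) before inspecting the correlograms
diffed = df['valore'].diff().dropna()
plot_acf(diffed, lags=40)    # the lag where the ACF cuts off suggests q
plot_pacf(diffed, lags=40)   # the lag where the PACF cuts off suggests p
pyplot.show()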

Non-linear multivariate time-series response prediction using RNN

I am trying to predict the hygrothermal response of a wall, given the interior and exterior climate. Based on a literature review, I believe this should be possible with an RNN, but I have not been able to achieve good accuracy.
The dataset has 12 input features (time-series of exterior and interior climate data) and 10 output features (time-series of the hygrothermal response), both containing hourly values for 10 years. The data was created with hygrothermal simulation software; there is no missing data.
Dataset features:
Dataset targets:
Unlike most time-series prediction problems, I want to predict the response for the full length of the input time-series at each time step, rather than the subsequent values of a time-series (e.g. financial time-series prediction). I have not been able to find similar prediction problems (in this or other fields), so if you know of one, references are very welcome.
I think this should be possible with an RNN, so I am currently using the LSTM from Keras. Before training, I preprocess my data in the following way:
Discard the first year of data, as the first time steps of the hygrothermal response of the wall are influenced by the initial temperature and relative humidity.
Split into training and test sets. The training set contains the first 8 years of data; the test set contains the remaining 2 years.
Normalise the training set (zero mean, unit variance) using StandardScaler from sklearn. Normalise the test set analogously, using the mean and variance of the training set.
This results in: X_train.shape = (1, 61320, 12), y_train.shape = (1, 61320, 10), X_test.shape = (1, 17520, 12), y_test.shape = (1, 17520, 10)
As these are long time-series, I use a stateful LSTM and cut the time-series as explained here, using the stateful_cut() function. I only have 1 sample, so batch_size is 1. For T_after_cut I have tried 24 and 120 (24*5); 24 appears to give better results. This results in X_train.shape = (2555, 24, 12), y_train.shape = (2555, 24, 10), X_test.shape = (730, 24, 12), y_test.shape = (730, 24, 10).
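The cut itself amounts to a reshape, since 61320 = 2555 * 24; here is a sketch of what stateful_cut() boils down to in this case (not the original function):
import numpy as np

def stateful_cut(arr, T_after_cut):
    # arr: (1, T, features); split the long series into consecutive
    # chunks of length T_after_cut -> (T // T_after_cut, T_after_cut, features)
    _, T, features = arr.shape
    return arr.reshape(T // T_after_cut, T_after_cut, features)

X_train = stateful_cut(X_train, 24)   # (1, 61320, 12) -> (2555, 24, 12)
y_train = stateful_cut(y_train, 24)   # (1, 61320, 10) -> (2555, 24, 10)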
Next, I build and train the LSTM model as follows:
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed
from keras.optimizers import Adam

model = Sequential()
model.add(LSTM(128,
               batch_input_shape=(batch_size, T_after_cut, features),
               return_sequences=True,
               stateful=True,
               ))
model.add(TimeDistributed(Dense(targets)))
model.compile(loss='mean_squared_error', optimizer=Adam())
model.fit(X_train, y_train, epochs=100, batch_size=batch_size, verbose=2, shuffle=False)
Unfortunately, I don't get accurate prediction results, not even for the training set; thus the model has high bias.
The prediction results of the LSTM model for all targets:
How can I improve my model? I have already tried the following:
Not discarding the first year of the dataset -> no significant difference
Differencing the input time-series (subtracting the previous value from the current value) -> slightly worse results
Up to four stacked LSTM layers, all with the same hyperparameters -> no significant difference in results, but longer training time
A Dropout layer after the LSTM layer (though this is usually used to reduce variance, and my model has high bias) -> slightly better results, but the difference might not be statistically significant
Am I doing something wrong with the stateful LSTM? Do I need to try different RNN models? Should I preprocess the data differently?
Furthermore, training is very slow: about 4 hours for the model above. Hence I am reluctant to do an extensive hyperparameter grid search...
In the end, I managed to solve this in the following way:
Use more samples to train instead of only 1 (I used 18 samples to train and 6 to test)
Keep the first year of data, as the output time-series for all samples have the same 'starting point' and the model needs this information to learn
Standardise both input and output features (zero mean, unit variance); I found this improved prediction accuracy and training speed (see the short sketch after this list)
Use a stateful LSTM as described here, but with the states reset after each epoch (see below for code). I used batch_size = 6 and T_after_cut = 1460. If T_after_cut is longer, training is slower; if T_after_cut is shorter, accuracy decreases slightly. If more samples are available, I think using a larger batch_size will be faster.
Use CuDNNLSTM instead of LSTM; this sped up training by about 4x!
I found that more units resulted in higher accuracy and faster convergence (shorter training time). Also, I found that the GRU is as accurate as the LSTM, though it converged faster for the same number of units.
Monitor the validation loss during training and use early stopping
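The standardisation mentioned above can be done by temporarily flattening the time axis so StandardScaler sees one row per time step (a sketch, not the author's code; fit the scalers on training data only):
import numpy as np
from sklearn.preprocessing import StandardScaler

def fit_scale(train_3d):
    # (samples, timesteps, features) -> (samples * timesteps, features)
    s, t, f = train_3d.shape
    scaler = StandardScaler().fit(train_3d.reshape(-1, f))
    return scaler, scaler.transform(train_3d.reshape(-1, f)).reshape(s, t, f)

x_scaler, X_dev = fit_scale(X_dev)   # standardise the inputs
y_scaler, y_dev = fit_scale(y_dev)   # standardise the outputs as well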
The LSTM model is built and trained as follows:
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
from keras.callbacks import Callback, EarlyStopping

def define_reset_states_batch(nb_cuts):
    class ResetStatesCallback(Callback):
        def __init__(self):
            self.counter = 0

        def on_batch_begin(self, batch, logs={}):
            # reset states when nb_cuts batches are completed,
            # i.e. at the start of every new sequence
            if self.counter % nb_cuts == 0:
                self.model.reset_states()
            self.counter += 1

        def on_epoch_end(self, epoch, logs={}):
            # reset states after each epoch
            self.model.reset_states()
    return ResetStatesCallback

model = Sequential()
model.add(layers.CuDNNLSTM(256, batch_input_shape=(batch_size, T_after_cut, features),
                           return_sequences=True,
                           stateful=True))
model.add(layers.TimeDistributed(layers.Dense(targets, activation='linear')))
optimizer = RMSprop(lr=0.002)
model.compile(loss='mean_squared_error', optimizer=optimizer)

earlyStopping = EarlyStopping(monitor='val_loss', min_delta=0.005, patience=15, verbose=1, mode='auto')
ResetStatesCallback = define_reset_states_batch(nb_cuts)
model.fit(X_dev, y_dev, epochs=n_epochs, batch_size=n_batch, verbose=1, shuffle=False,
          validation_data=(X_eval, y_eval),
          callbacks=[ResetStatesCallback(), earlyStopping])
This gave me very satisfying accuracy (R² over 0.98):
This figure shows the temperature (left) and relative humidity (right) in the wall over 2 years (data not used in training); the prediction is in red and the true output in black. The residuals show that the error is very small, and that the LSTM learns to capture the long-term dependencies needed to predict the relative humidity.

Time Series Ahead Prediction in Neural Network (N-Point-Ahead Prediction), Large-Scale Iterative Training

(N=90)-point-ahead prediction using a neural network:
I am trying to predict 3 minutes ahead, i.e. 180 points ahead. Because I compressed my time-series data by taking the mean of every 2 points as one, I have to do (N=90)-step-ahead prediction.
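(For illustration, the compression step is just a pairwise mean; in Python, for example:)
import numpy as np
# average every 2 consecutive points, halving the series length
# (x is the raw 1-D series, assumed to have even length)
compressed = x.reshape(-1, 2).mean(axis=1)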
My time-series data is given in seconds. The values are between 30 and 90. They usually move from 30 to 90 and from 90 to 30, as seen in the example below.
My data can be downloaded from: https://www.dropbox.com/s/uq4uix8067ti4i3/17HourTrace.mat
I am having trouble implementing a neural network to predict N points ahead. My only feature is the previous values. I used an Elman recurrent neural network and also newff.
In my scenario, I need to predict 90 points ahead. Here is how I separated my input and target data manually:
For example:
data_in = [1,2,3,4,5,6,7,8,9,10]; % imagine 1:10 only defines the array index values
N = 90; % predicted seconds ahead

  P(:,:)        T(:)      or, with a step of 2:   P(:,:)         T(:)
  [1,2,3,4,5]   [5+N]                             [1,3,5,7,9]    [9+N]
  [2,3,4,5,6]   [6+N]                             [2,4,6,8,10]   [10+N]
  ...
and so on, until the end of the data is reached.
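(In Python, for illustration, building such input/target pairs could look like this; make_windows and its parameters are mine, not from the post:)
import numpy as np

def make_windows(series, width=5, lag=1, N=90):
    # each input row is `width` points taken every `lag` steps;
    # the target is the value N points after the last input point
    span = (width - 1) * lag
    P, T = [], []
    for i in range(len(series) - span - N):
        P.append(series[i : i + span + 1 : lag])
        T.append(series[i + span + N])
    return np.array(P), np.array(T)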
I have 100 input points and 90 output points in the Elman recurrent neural network. What could be the most efficient hidden-node size?
input_layer_size = 90;
NodeNum1 = 90;
net = newelm(threshold, [NodeNum1, prediction_ahead], {'tansig', 'purelin'});
net.trainParam.lr = 0.1;
net.trainParam.goal = 1e-3;
At the beginning of my training I filter the data with a Kalman filter, normalise it into the range [0,1], and after that I shuffle the data.
1) I wasn't able to train on my complete data. First I tried to train on the complete data M, which is around 900,000 points; this didn't give me a solution.
2) Secondly, I tried iterative training. But in each iteration the newly added data is merged with the already-trained data. After 20,000 trained points the accuracy starts to decrease. The first 1000 trained points fit perfectly in training. But when I start to iteratively merge the new data and continue training, the training accuracy drops very rapidly, from 90 to 20.
For example:
P = P_test(1:1000); T = T_test(1:1000); counter = 1;
while(1)
    net = train(net, P, T, [], []); % train until the minimum error is reached
    [normTrainOutput] = sim(net, P, [], []);
    P = [P P(counter*1000:counter*2000)]; % iteratively merge in a new portion of the training data
    counter = counter + 1;
end
This approach is very slow, and after a point it won't give any good results.
My third approach was also iterative training; it was similar to the previous one, but in each iteration I only train on a 1000-point portion of the data, without merging it with the previously trained data. For example, I train the first 1000 points until the error is minimal (>95% accuracy). After it has been trained, when I do the same for the second 1000-point portion, it overwrites the weights and the predictor mainly behaves like the latest trained portion of the data.
P = P_test(1:1000); T = T_test(1:1000); counter = 1;
while(1)
    net = train(net, P, T, [], []); % I also tried adapt()
    [normTrainOutput] = sim(net, P, [], []);
    P = [P(counter*1000:counter*2000)]; % iteratively, only a 1000-point portion of the data is used
    counter = counter + 1;
end
Trained data: this figure is a snapshot of my training set; the blue line is the original time series and the red line the values predicted by the trained neural network. The MSE is around 50.
Tested data: in the picture below you can see my prediction for the test data with the neural network, which was trained with 20,000 input points while keeping the MSE below 50 on the training set. It is able to catch a few patterns, but mostly it doesn't give really good accuracy.
I wasn't able to succeed with any of these approaches. In each iteration, I also observe that a slight change in alpha completely overwrites the already-trained weights and makes the network focus on the currently trained portion of the data.
I haven't been able to come up with a solution to this problem. In iterative training, should I keep the learning rate and the number of epochs small?
And I couldn't find an efficient way to predict 90 points ahead in a time series. Any suggestions on what I should do to predict N points ahead, or any tutorial or link with more information, would be welcome.
What is the best way to do iterative training? With my second approach, when I reach 15,000 trained points, the training accuracy suddenly starts to drop. Should I change alpha at run time?
Any suggestions, or pointers to things I am doing wrong, would be very much appreciated.
I also implemented a recurrent neural network, but when training on large data I faced the same problems. Is it possible to do adaptive (online) learning in recurrent neural networks (newelm)? The weights don't update themselves, and I didn't see any improvement.
If yes, how is that possible, and which functions should I use?
net = newelm(threshold, [6, 8, 90], {'tansig', 'tansig', 'purelin'});
net.trainFcn = 'trains';
batch_size = 10;
while(1)
    net = train(net, Pt(:, k:k+batch_size), Tt(:, k:k+batch_size));
end
Have a look at Echo State Networks (ESNs) or other forms of reservoir computing. They are well suited to time-series prediction, very easy to use, and converge fast. You don't need to worry about the structure of the network at all (every neuron in the middle layer has random weights which do not change); you only learn the output weights.
If I understood the problem correctly, with Echo State Networks I would just train the network to predict the next point AND 90 points ahead. This can be done by simply forcing the desired output in the output neurons and then performing ridge regression to learn the output weights.
When running the network after having trained it, at every step n it would output the next point (n+1), which you would feed back to the network as input (to continue the iteration), and the point 90 steps ahead (n+90), which you can do whatever you want with; e.g. you could also feed it back to the network so that it affects the next outputs.
Sorry if the answer is not very clear. It's hard to explain how reservoir computing works in a short answer, but if you read the article in the link, you will find the principles easy to understand.
If you do decide to use ESNs, also read this paper to understand the most important property of ESNs and to really know what you're doing.
EDIT: Depending on how "predictable" your system is, predicting 90 points ahead may still be very difficult. For example, if you're trying to predict a chaotic system, noise will introduce very large errors when predicting far ahead.
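To make this concrete, here is a minimal echo state network sketch in Python/numpy (my illustration of the idea, not code from this answer; the reservoir size, spectral radius, and ridge penalty are arbitrary choices):
import numpy as np

def train_esn(u, y, n_res=500, rho=0.9, ridge=1e-6, seed=0):
    # u: (T,) input series; y: (T, 2) targets, e.g. [next point, point 90 ahead]
    rng = np.random.default_rng(seed)
    Win = rng.uniform(-0.5, 0.5, (n_res, 1))          # fixed input weights
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))        # fixed reservoir weights
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius to rho
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t in range(len(u)):                           # drive the reservoir once over the data
        x = np.tanh(Win[:, 0] * u[t] + W @ x)
        states[t] = x
    # only the output weights are learned, by ridge regression
    Wout = np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                           states.T @ y)
    return Win, W, Wout
At prediction time you feed the (n+1) output back in as the next input, exactly as described above.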
Using fuzzy logic with membership functions to predict the future data could also be an efficient method.
