LSTM Predictions - machine-learning

I'm working on a LSTM model, I found some examples and I was confused about the output.
Here, I'm trying to predict the next 24 hours, should I put 1 or 24 on the Dense layer? is this section correct ?
I've been following this video
reg = Sequential()
reg.add(LSTM(units = 5, activation='relu', input_shape=(24,1)))
reg.add(Dense(24)) #Predicting the next 24h
Thank you.

A dense layer of 1, means you will get one output. So if you are predicting the next hour, you use 1 dense layer. However, keep in mind that if you want to predict the next 24 hours there are two ways to do that. You can iteratively predict 1 hour 24 times by feeding your new prediction into your next time sequence. Or you can predict 24 hours all at once by using a dense layer with 24 outputs.
Example
[1,2,3,4,5] is my sequence and I want to predict the 10th value.
I can predict the 6th value. Then shift my next time sequence so I end up with [2,3,4,5,6]. And keep doing that to predict the 7th, 8th, 9th, and 10th,
Alternatively, I can use [1,2,3,4,5] to try and predict [6,7,8,9,10] in one step.

Related

why is my arima model giving NaNs in predictions?

I'm trying to predict weekly sales of a store from the famous walmart dataset - a simple time series prediction exercise. I fit the model and run predict command, but the predictions are all NaNs.
The data is stationary.
I've taken rolling mean over 10 weeks, detrended the sales numbers using this rolling mean, differenced the data.
Auto arima detects a ARIMA (1,0,2) model.
I fit the model and predict, but the predictions are all NaNs. Also, the predicted NaNs start with index of almost 10 years ago! Same happens for A sarimax model.
Please help me solve this issue!
Attaching my code below:
train = rmdetdiff.iloc[:110]['weekly_sales']
test = rmdetdiff.iloc[110:]['weekly_sales']
model1 = ARIMA(train, order=(1,0,2))
model1_fit = model1.fit()
start = len(train)
end = len(train)+len(test)-1
rmdetdiff['arima_pred'] = model1_fit.predict(start=start, end=end, dynamic=True)
rmdetdiff[['arima_pred','weekly_sales']].plot(legend=True)
Here is the plot after predictions: (all the blank space before the weekly_sales is created after the model1.fit command.)
shape of rmdetdiff is (133,1)

Is this LSTM underfitting?

I am trying to create a model that predicts if it will rain in the next 5 days (multi-step) or not, so I dont need the precipitation value, just a "yes" or "no". I've been testing with some different tools/algorithms and I guess the big challenge here is dealing with the zero skewed data.
The dataset consists of hourly data that has columns such as precipitation, temperature, pressure, wind speed, humidity. It has around 1 milion rows. There is no requisite to use a multivariate approach.
Rain occurs mostly on months 1,2,3,11 and 12.
So I tried using a univariate LSTM on the data, and with hourly sample I had the best results. I used the following architecture:
model=Sequential()
model.add(LSTM(150,return_sequences=True,input_shape=(1,look_back)))
model.add(LSTM(50,return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(trainX, trainY, epochs=15, batch_size=4096, validation_data=(testX, testY), shuffle=False)
I'm using a lookback value of 24*60, which should mean 2 months.
Train/Validation Loss:
https://i.stack.imgur.com/CjDbR.png
Final result:
https://i.stack.imgur.com/p6SnD.png
So I read that this train/validation loss means the model is underfitting, is it? What could I do to prevent this?
Before using LSTM I tried using Prophet, which rendered really bad results and tried used autoarima, but it couldn't handle a yearly seasonality (365 days).
In case of underfitting what you can do is icreasing the learning rate, increasing training duration and number of training data.
It is also worth having some external metric such as the F1 score because loss isn't a good metrics for human evaluation.
Just looking at your example I would start with experimenting a bit with the loss function, it seems like your data is binary so it would be wiser to use a binary loss instead of a regression loss

Can someone explain batch-size and timestep for a regression model using RNN?

I am working on a regression model, which has 50 datapoints per hour. I am having a hard time deciding on the difference between batch size and time-step. From my understanding, batch size is used to decide how many datapoints do we want to consider before making a prediction. The larger the value, the longer it takes for the model to converge. If that is the case I am clear on definition of batch size. So if my model isn't taking very long, can I just use the maximum? Would that maximum be the test datasize?
How about timesteps then? For a model where you measure let's say temperature every minute till 30 hours, what would the timestep be?
I would appreciate it if someone who knows about regression using RNN could answer my doubts.
Given :
import numpy as np
x = np.array([[[1], [0], [1]]])
print(x.shape)
Output:
(1, 3, 1)
This is for m samples s timesteps with e measurements per timesteps.
(m, s, e)
In any case the number of data points is the size of the array so:
m * s * e
The number of data point per sample:
s * e
If you meaure temperature every second for an hour on one sample.
(1, 3600, 1)
If you measure let say temperature and humidity.
(1, 3600, 2)
Let say you do that simultaneously for 2 samples (in place A and B).
(2, 3600, 2)
Batch size is just not related at all.
For each epoch it just means how many samples you want to run at once. For 100 samples batch size 50 you have two weight update per epoch for instance.

Non-linear multivariate time-series response prediction using RNN

I am trying to predict the hygrothermal response of a wall, given the interior and exterior climate. Based on literature research, I believe this should be possible with RNN but I have not been able to get good accuracy.
The dataset has 12 input features (time-series of exterior and interior climate data) and 10 output features (time-series of hygrothermal response), both containing hourly values for 10 years. This data was created with hygrothermal simulation software, there is no missing data.
Dataset features:
Dataset targets:
Unlike most time-series prediction problems, I want to predict the response for the full length of the input features time-series at each time-step, rather than the subsequent values of a time-series (eg financial time-series prediction). I have not been able to find similar prediction problems (in similar or other fields), so if you know of one, references are very welcome.
I think this should be possible with RNN, so I am currently using LSTM from Keras. Before training, I preprocess my data the following way:
Discard first year of data, as the first time steps of the hygrothermal response of the wall is influenced by the initial temperature and relative humidity.
Split into training and testing set. Training set contains the first 8 years of data, the test set contains the remaining 2 years.
Normalise training set (zero mean, unit variance) using StandardScaler from Sklearn. Normalise test set analogously using mean an variance from training set.
This results in: X_train.shape = (1, 61320, 12), y_train.shape = (1, 61320, 10), X_test.shape = (1, 17520, 12), y_test.shape = (1, 17520, 10)
As these are long time-series, I use stateful LSTM and cut the time-series as explained here, using the stateful_cut() function. I only have 1 sample, so batch_size is 1. For T_after_cut I have tried 24 and 120 (24*5); 24 appears to give better results. This results in X_train.shape = (2555, 24, 12), y_train.shape = (2555, 24, 10), X_test.shape = (730, 24, 12), y_test.shape = (730, 24, 10).
Next, I build and train the LSTM model as follows:
model = Sequential()
model.add(LSTM(128,
batch_input_shape=(batch_size,T_after_cut,features),
return_sequences=True,
stateful=True,
))
model.addTimeDistributed(Dense(targets)))
model.compile(loss='mean_squared_error', optimizer=Adam())
model.fit(X_train, y_train, epochs=100, batch_size=batch=batch_size, verbose=2, shuffle=False)
Unfortunately, I don't get accurate prediction results; not even for the training set, thus the model has high bias.
The prediction results of the LSTM model for all targets
How can I improve my model? I have already tried the following:
Not discarding the first year of the dataset -> no significant difference
Differentiating the input features time-series (subtract previous value from current value) -> slightly worse results
Up to four stacked LSTM layers, all with the same hyperparameters -> no significant difference in results but longer training time
Dropout layer after LSTM layer (though this is usually used to reduce variance and my model has high bias) -> slightly better results, but difference might not be statistically significant
Am I doing something wrong with the stateful LSTM? Do I need to try different RNN models? Should I preprocess the data differently?
Furthermore, training is very slow: about 4 hours for the model above. Hence I am reluctant to do an extensive hyperparameter gridsearch...
In the end, I managed to solve this the following way:
Using more samples to train instead of only 1 (I used 18 samples to train and 6 to test)
Keep the first year of data, as the output time-series for all samples have the same 'starting point' and the model needs this information to learn
Standardise both input and output features (zero mean, unit variance). I found this improved prediction accuracy and training speed
Use stateful LSTM as described here, but add reset states after epoch (see below for code). I used batch_size = 6 and T_after_cut = 1460. If T_after_cut is longer, training is slower; if T_after_cut is shorter, accuracy decreases slightly. If more samples are available, I think using a larger batch_size will be faster.
use CuDNNLSTM instead of LSTM, this speed up the training time x4!
I found that more units resulted in higher accuracy and faster convergence (shorter training time). Also I found that the GRU is as accurate as the LSTM tough converged faster for the same number of units.
Monitor validation loss during training and use early stopping
The LSTM model is build and trained as follows:
def define_reset_states_batch(nb_cuts):
class ResetStatesCallback(Callback):
def __init__(self):
self.counter = 0
def on_batch_begin(self, batch, logs={}):
# reset states when nb_cuts batches are completed
if self.counter % nb_cuts == 0:
self.model.reset_states()
self.counter += 1
def on_epoch_end(self, epoch, logs={}):
# reset states after each epoch
self.model.reset_states()
return(ResetStatesCallback)
model = Sequential()
model.add(layers.CuDNNLSTM(256, batch_input_shape=(batch_size,T_after_cut ,features),
return_sequences=True,
stateful=True))
model.add(layers.TimeDistributed(layers.Dense(targets, activation='linear')))
optimizer = RMSprop(lr=0.002)
model.compile(loss='mean_squared_error', optimizer=optimizer)
earlyStopping = EarlyStopping(monitor='val_loss', min_delta=0.005, patience=15, verbose=1, mode='auto')
ResetStatesCallback = define_reset_states_batch(nb_cuts)
model.fit(X_dev, y_dev, epochs=n_epochs, batch_size=n_batch, verbose=1, shuffle=False, validation_data=(X_eval,y_eval), callbacks=[ResetStatesCallback(), earlyStopping])
This gave me very statisfying accuracy (R2 over 0.98):
This figure shows the temperature (left) and relative humidity (right) in the wall over 2 years (data not used in training), prediction in red and true output in black. The residuals show that the error is very small and that the LSTM learns to capture the long-term dependencies to predict the relative humidity.

Keras LSTM: Injecting already-known *future* values into prediction

I've built an LSTM In Keras with the goal of predicting future values of a time-series from a high-dimensional, time-index input.
However, there's a unique requirement: for certain time points in the future, we know with certainty what some values of the input series will be. For example:
model = SomeLSTM()
trained_model = model.train(train_data)
known_data = [(24, {feature: 2, val: 7.0}), (25, {feature: 2, val: 8.0})]
predictions = trained_model(look_ahead=48, known_data=known_data)
Which would train the model up to time t (the end of training), and predict forward 48 time periods from time t, but substituting known_data values for feature 2 at times 24 and 25.
How exactly can I explicitly inject this into the LSTM at some time?
For reference, here's the model:
model = Sequential()
model.add(LSTM(hidden, input_shape=(look_back, num_features)))
model.add(Dropout(dropout))
model.add(Dense(look_ahead))
model.add(Activation('linear'))
This may be a result of my un-intuitive grasp of LSTMs, and I'd appreciate any clarification. I've dived into the Keras source code, and my first guess is to inject it right into the LSTM state variable, but I'm unsure how to do that at time t (or even if that is correct.)
I think a clean way of doing this is to introduce 2*look_ahead new features, where for each 0 <= i < look_ahead 2*i-th feature is an indicator whether the value of the i-th time step is known and (2*i+1)-th is the value itself (0 if not known). Accordingly, you can generate training data with these features to make your model take into account these known values.
I am not exactly sure what you are trying to do, but maybe create your own layer to go at the end that sets the data to the known values, similar to how dropout sets random values to zero. As a side note, I have had better results with pooling than dropout, so maybe try switching that out and training it. Here is a good guide on how to do it. https://www.tutorialspoint.com/keras/keras_customized_layer.htm

Resources