Is there a way to train a neuronal network (LSTM model) with multiple datasets in order to do time series forecasting? - time-series

I want to do predictions with a LSTM model, but the dataset isn't a single file, it's composed with multiple files (for example 3 Excels).
I've already checked that if you want to deal with a time series forecasting problem you have to prepare your data like (number of samples, number of time steps, number of features) and it works well if I implement this for a single Excel.
The problem consists in training with the three Excels at the same time, in this case the input tensor for the LSTM layer has the shape: (n_files, n_samples, n_timesteps, n_features), with dim = 4. This doesn't work because LSTM layers only admits input tensors with dim = 3.
My files have the same amount of data. It's collected from a device and the data has 1 value for each minute along the duration of the experiment. All the experiments have the same duration too.
I've tried to concatenate the files in order to have 1, and choosing the batch_size as the number of samples in one Excel (because I can't mix the different experiments) but it doesn't produce a good result (at least as good as the results from predicting with 1 experiment).
def build_model():
model = keras.Sequential([
layers.Masking(mask_value = 0.0, input_shape=[1,1]),
layers.LSTM(50, return_sequences=True),
layers.Dense(1)
])
optimizer = tf.keras.optimizers.Adam(0.001)
model.compile(loss='mse',
optimizer=optimizer,
metrics=['mae','RootMeanSquaredError'])
return model
model_pred = build_model()
model_pred.fit(Xchopped_train, ychopped_train, batch_size = 252,
epochs=500, verbose=1)
Where Xchopped_train and ychopped_train are the concatenated data from the 3 Excel.
Another thing I've tried is to train the model within a loop, and changing the Excel:
for i in range(len(Xtrain)):
model_pred.fit(Xtrain[i], Ytrain[i], epochs=167, verbose=1)
Where Xtrain is (3,252,1,1) and the first index refers to the number of Excel.
And by far this is my best approach but it feels like this isn't a good solution since I don't know what's happening with the NN weights or which loss function is minimizing...
Is there a more efficient way to do this? Thanks!

Related

Is there a way to increase the variance of model's prediction?

I created a randomly generated(using numpy, between range 30 and 60) Data of about 12000 points (to
generate an artificial time-series data for more than a year in Time).
Now I am trying to fit that data points in an LSTM model and forecast
based upon that.
The LSTM model i applied,(here data is a single series so n_features = 1, and steps-in and out are for sequence-generation function for time-series, i took both equal to 5. Also the for the activation functions i tried all with both relu, both tanh and 1st tanh & 2nd relu (as shown here))
X, y = split_sequences(data, n_steps_in, n_steps_out)
n_features = X.shape[2]
model = Sequential()
model.add(LSTM(200, activation='tanh', input_shape=(n_steps_in,
n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(200, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
opt = keras.optimizers.Adam(learning_rate=0.05)
model.compile(optimizer=opt, loss='mse')
model.fit(X, y, epochs= n, batch_size=10, verbose=1,
workers=4, use_multiprocessing = True, initial_epoch = 0)
I also tried smoothening of the data-points as they are randomly
distributed (in the predefined boundaries).
and then applied the model on the smoothed data, but still i am getting similar results.
for e.g., In this image showing both the smoothed-training data and the forecasted-prediction from the model
plt.plot(Training_data, 'g')
plt.plot(Pred_Forecasts,'r')
Every time the models are giving straight lines in prediction.
and which is obvious since it is a set of random numbers so model tends to get to a mean value between the upper and lower limits of the data, but still is there any way to generate a somewhat real looking model.
P.S-1 - I have also tried applying different models like prophet, sarima, arima.
But i think i need to find a way to increase the Variance of the prediction, which i am unable to find.
PS-2 - Sorry for the long question i am new to deep-learning so i tried to explain more.

PyTorch: Predicting future values with LSTM

I'm currently working on building an LSTM model to forecast time-series data using PyTorch. I used lag features to pass the previous n steps as inputs to train the network. I split the data into three sets, i.e., train-validation-test split, and used the first two to train the model. My validation function takes the data from the validation data set and calculates the predicted valued by passing it to the LSTM model using DataLoaders and TensorDataset classes. Initially, I've got pretty good results with R2 values in the region of 0.85-0.95.
However, I have an uneasy feeling about whether this validation function is also suitable for testing my model's performance. Because the function now takes the actual X values, i.e., time-lag features, from the DataLoader to predict y^ values, i.e., predicted target values, instead of using the predicted y^ values as features in the next prediction. This situation seems far from reality where the model has no clue of the real values of the previous time steps, especially if you forecast time-series data for longer time periods, say 3-6 months.
I'm currently a bit puzzled about tackling this issue and defining a function to predict future values relying on the model's values rather than the actual values in the test set. I have the following function predict, which makes a one-step prediction, but I haven't really figured out how to predict the whole test dataset using DataLoader.
def predict(self, x):
# convert row to data
x = x.to(device)
# make prediction
yhat = self.model(x)
# retrieve numpy array
yhat = yhat.to(device).detach().numpy()
return yhat
You can find how I split and load my datasets, my constructor for the LSTM model, and the validation function below. If you need more information, please do not hesitate to reach out to me.
Splitting and Loading Datasets
def create_tensor_datasets(X_train_arr, X_val_arr, X_test_arr, y_train_arr, y_val_arr, y_test_arr):
train_features = torch.Tensor(X_train_arr)
train_targets = torch.Tensor(y_train_arr)
val_features = torch.Tensor(X_val_arr)
val_targets = torch.Tensor(y_val_arr)
test_features = torch.Tensor(X_test_arr)
test_targets = torch.Tensor(y_test_arr)
train = TensorDataset(train_features, train_targets)
val = TensorDataset(val_features, val_targets)
test = TensorDataset(test_features, test_targets)
return train, val, test
def load_tensor_datasets(train, val, test, batch_size=64, shuffle=False, drop_last=True):
train_loader = DataLoader(train, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
val_loader = DataLoader(val, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
test_loader = DataLoader(test, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
return train_loader, val_loader, test_loader
Class LSTM
class LSTMModel(nn.Module):
def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
super(LSTMModel, self).__init__()
self.hidden_dim = hidden_dim
self.layer_dim = layer_dim
self.lstm = nn.LSTM(
input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
)
self.fc = nn.Linear(hidden_dim, output_dim)
def forward(self, x, future=False):
h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
out = out[:, -1, :]
out = self.fc(out)
return out
Validation (defined within a trainer class)
def validation(self, val_loader, batch_size, n_features):
with torch.no_grad():
predictions = []
values = []
for x_val, y_val in val_loader:
x_val = x_val.view([batch_size, -1, n_features]).to(device)
y_val = y_val.to(device)
self.model.eval()
yhat = self.model(x_val)
predictions.append(yhat.cpu().detach().numpy())
values.append(y_val.cpu().detach().numpy())
return predictions, values
I've finally found a way to forecast values based on predicted values from the earlier observations. As expected, the predictions were rather accurate in the short-term, slightly becoming worse in the long term. It is not so surprising that the future predictions digress over time, as they no longer depend on the actual values. Reflecting on my results and the discussions I had on the topic, here are my take-aways:
In real-life cases, the real values can be retrieved and fed into the model at each step of the prediction -be it weekly, daily, or hourly- so that the next step can be predicted with the actual values from the previous step. So, testing the performance based on the actual values from the test set may somewhat reflect the real performance of the model that is maintained regularly.
However, for predicting future values in the long term, forecasting, if you will, you need to make either multiple one-step predictions or multi-step predictions that span over the time period you wish to forecast.
Making multiple one-step predictions based on the values predicted the model yields plausible results in the short term. As the forecasting period increases, the predictions become less accurate and therefore less fit for the purpose of forecasting.
To make multiple one-step predictions and update the input after each prediction, we have to work our way through the dataset one by one, as if we are going through a for-loop over the test set. Not surprisingly, this makes us lose all the computational advantages that matrix operations and mini-batch training provide us.
An alternative could be predicting sequences of values, instead of predicting the next value only, say using RNNs with multi-dimensional output with many-to-many or seq-to-seq structure. They are likely to be more difficult to train and less flexible to make predictions for different time periods. An encoder-decoder structure may prove useful for solving this, though I have not implemented it by myself.
You can find the code for my function that forecasts the next n_steps based on the last row of the dataset X (time-lag features) and y (target value). To iterate over each row in my dataset, I would set batch_size to 1 and n_features to the number of lagged observations.
def forecast(self, X, y, batch_size=1, n_features=1, n_steps=100):
predictions = []
X = torch.roll(X, shifts=1, dims=2)
X[..., -1, 0] = y.item(0)
with torch.no_grad():
self.model.eval()
for _ in range(n_steps):
X = X.view([batch_size, -1, n_features]).to(device)
yhat = self.model(X)
yhat = yhat.to(device).detach().numpy()
X = torch.roll(X, shifts=1, dims=2)
X[..., -1, 0] = yhat.item(0)
predictions.append(yhat)
return predictions
The following line shifts values in the second dimension of the tensor by one so that a tensor [[[x1, x2, x3, ... , xn ]]] becomes [[[xn, x1, x2, ... , x(n-1)]]].
X = torch.roll(X, shifts=1, dims=2)
And, the line below selects the first element from the last dimension of the 3d tensor and sets that item to the predicted value stored in the NumPy ndarray (yhat), [[xn+1]]. Then, the new input tensor becomes [[[x(n+1), x1, x2, ... , x(n-1)]]]
X[..., -1, 0] = yhat.item(0)
Recently, I've decided to put together the things I had learned and the things I would have liked to know earlier. If you'd like to have a look, you can find the links down below. I hope you'll find it useful. Feel free to comment or reach out to me if you agree or disagree with any of the remarks I made above.
Building RNN, LSTM, and GRU for time series using PyTorch
Predicting future values with RNN, LSTM, and GRU using PyTorch

Non-linear multivariate time-series response prediction using RNN

I am trying to predict the hygrothermal response of a wall, given the interior and exterior climate. Based on literature research, I believe this should be possible with RNN but I have not been able to get good accuracy.
The dataset has 12 input features (time-series of exterior and interior climate data) and 10 output features (time-series of hygrothermal response), both containing hourly values for 10 years. This data was created with hygrothermal simulation software, there is no missing data.
Dataset features:
Dataset targets:
Unlike most time-series prediction problems, I want to predict the response for the full length of the input features time-series at each time-step, rather than the subsequent values of a time-series (eg financial time-series prediction). I have not been able to find similar prediction problems (in similar or other fields), so if you know of one, references are very welcome.
I think this should be possible with RNN, so I am currently using LSTM from Keras. Before training, I preprocess my data the following way:
Discard first year of data, as the first time steps of the hygrothermal response of the wall is influenced by the initial temperature and relative humidity.
Split into training and testing set. Training set contains the first 8 years of data, the test set contains the remaining 2 years.
Normalise training set (zero mean, unit variance) using StandardScaler from Sklearn. Normalise test set analogously using mean an variance from training set.
This results in: X_train.shape = (1, 61320, 12), y_train.shape = (1, 61320, 10), X_test.shape = (1, 17520, 12), y_test.shape = (1, 17520, 10)
As these are long time-series, I use stateful LSTM and cut the time-series as explained here, using the stateful_cut() function. I only have 1 sample, so batch_size is 1. For T_after_cut I have tried 24 and 120 (24*5); 24 appears to give better results. This results in X_train.shape = (2555, 24, 12), y_train.shape = (2555, 24, 10), X_test.shape = (730, 24, 12), y_test.shape = (730, 24, 10).
Next, I build and train the LSTM model as follows:
model = Sequential()
model.add(LSTM(128,
batch_input_shape=(batch_size,T_after_cut,features),
return_sequences=True,
stateful=True,
))
model.addTimeDistributed(Dense(targets)))
model.compile(loss='mean_squared_error', optimizer=Adam())
model.fit(X_train, y_train, epochs=100, batch_size=batch=batch_size, verbose=2, shuffle=False)
Unfortunately, I don't get accurate prediction results; not even for the training set, thus the model has high bias.
The prediction results of the LSTM model for all targets
How can I improve my model? I have already tried the following:
Not discarding the first year of the dataset -> no significant difference
Differentiating the input features time-series (subtract previous value from current value) -> slightly worse results
Up to four stacked LSTM layers, all with the same hyperparameters -> no significant difference in results but longer training time
Dropout layer after LSTM layer (though this is usually used to reduce variance and my model has high bias) -> slightly better results, but difference might not be statistically significant
Am I doing something wrong with the stateful LSTM? Do I need to try different RNN models? Should I preprocess the data differently?
Furthermore, training is very slow: about 4 hours for the model above. Hence I am reluctant to do an extensive hyperparameter gridsearch...
In the end, I managed to solve this the following way:
Using more samples to train instead of only 1 (I used 18 samples to train and 6 to test)
Keep the first year of data, as the output time-series for all samples have the same 'starting point' and the model needs this information to learn
Standardise both input and output features (zero mean, unit variance). I found this improved prediction accuracy and training speed
Use stateful LSTM as described here, but add reset states after epoch (see below for code). I used batch_size = 6 and T_after_cut = 1460. If T_after_cut is longer, training is slower; if T_after_cut is shorter, accuracy decreases slightly. If more samples are available, I think using a larger batch_size will be faster.
use CuDNNLSTM instead of LSTM, this speed up the training time x4!
I found that more units resulted in higher accuracy and faster convergence (shorter training time). Also I found that the GRU is as accurate as the LSTM tough converged faster for the same number of units.
Monitor validation loss during training and use early stopping
The LSTM model is build and trained as follows:
def define_reset_states_batch(nb_cuts):
class ResetStatesCallback(Callback):
def __init__(self):
self.counter = 0
def on_batch_begin(self, batch, logs={}):
# reset states when nb_cuts batches are completed
if self.counter % nb_cuts == 0:
self.model.reset_states()
self.counter += 1
def on_epoch_end(self, epoch, logs={}):
# reset states after each epoch
self.model.reset_states()
return(ResetStatesCallback)
model = Sequential()
model.add(layers.CuDNNLSTM(256, batch_input_shape=(batch_size,T_after_cut ,features),
return_sequences=True,
stateful=True))
model.add(layers.TimeDistributed(layers.Dense(targets, activation='linear')))
optimizer = RMSprop(lr=0.002)
model.compile(loss='mean_squared_error', optimizer=optimizer)
earlyStopping = EarlyStopping(monitor='val_loss', min_delta=0.005, patience=15, verbose=1, mode='auto')
ResetStatesCallback = define_reset_states_batch(nb_cuts)
model.fit(X_dev, y_dev, epochs=n_epochs, batch_size=n_batch, verbose=1, shuffle=False, validation_data=(X_eval,y_eval), callbacks=[ResetStatesCallback(), earlyStopping])
This gave me very statisfying accuracy (R2 over 0.98):
This figure shows the temperature (left) and relative humidity (right) in the wall over 2 years (data not used in training), prediction in red and true output in black. The residuals show that the error is very small and that the LSTM learns to capture the long-term dependencies to predict the relative humidity.

Gensim word embedding training with initial values

I have a dataset with documents separated into different years, and my objective is to train an embedding model for each year's data, while at the same time, the same word appearing in different years will have similar vector representations. Like this: for word 'compute', its vector in year 1 is
[0.22, 0.33, 0.20]
and in year 2 it's something around:
[0.20, 0.35, 0.18]
Is there a way to accomplish this? For example, train the model of year 2 with both initial values (if the word is trained already in year 1, modify its vector) and randomness (if this is a new word for the corpus).
I think the easiest solution is to save the embeddings after training on the first data set, then load the trained model and continue training for the second data set. This way you shouldn't expect the embeddings to drift away from the saved state much (unless your data sets are very different).
It would also make sense to create a single vocabulary from all documents: vocabulary words that aren't present in a particular document will get some random representation, but still it will be a working word2vec model.
Example from the documentation:
>>> model = Word2Vec(sentences, size=100, window=5, min_count=5, workers=4)
>>> model.save(fname)
>>> model = Word2Vec.load(fname) # continue training with the loaded model

neural network produces similar pattern for all inputs

I am attempting to train an ANN on time series data in Keras. I have three vectors of data that are broken into scrolling window sequences (i.e. for vector l).
np.array([l[i:i+window_size] for i in range( len(l) - window_size)])
The target vector is similarly windowed so the neural net output is a prediction of the target vector for the next window_size number of time steps. All the data is normalized with a min-max scaler. It is fed into the neural network as a shape=(nb_samples, window_size, 3). Here is a plot of the 3 input vectors.
The only output I've managed to muster from the ANN is the following plot. Target vector in blue, predictions in red (plot is zoomed in to make the prediction pattern legible). Prediction vectors are plotted at window_size intervals so each one of the repeated patterns is one prediction from the net.
I've tried many different model architectures, number of epochs, activation functions, short and fat networks, skinny, tall. This is my current one (it's a little out there).
Conv1D(64,4, input_shape=(None,3)) ->
Conv1d(32,4) ->
Dropout(24) ->
LSTM(32) ->
Dense(window_size)
But nothing I try will affect the neural net from outputting this repeated pattern. I must be misunderstanding something about time-series or LSTMs in Keras. But I'm very lost at this point so any help is greatly appreciated. I've attached the full code at this repository.
https://github.com/jaybutera/dat-toy
I played with your code a little and I think I have a few suggestions for getting you on the right track. The code doesn't seem to match your graphs exactly, but I assume you've tweaked it a bit since then. Anyway, there are two main problems:
The biggest problem is in your data preparation step. You basically have the data shapes backwards, in that you have a single timestep of input for X and a timeseries for Y. Your input shape is (18830, 1, 8), when what you really want is (18830, 30, 8) so that the full 30 timesteps are fed into the LSTM. Otherwise the LSTM is only operating on one timestep and isn't really useful. To fix this, I changed the line in common.py from
X = X.reshape(X.shape[0], 1, X.shape[1])
to
X = windowfy(X, winsize)
Similarly, the output data should probably be only 1 value, from what I've gathered of your goals from the plotting function. There are certainly some situations where you want to predict a whole timeseries, but I don't know if that's what you want in this case. I changed Y_train to use fuels instead of fuels_w so that it only had to predict one step of the timeseries.
Training for 100 epochs might be way too much for this simple network architecture. In some cases when I ran it, it looked like there was some overfitting going on. Observing the decrease of loss in the network, it seems like maybe only 3-4 epochs are needed.
Here is the graph of predictions after 3 training epochs with the adjustments I mentioned. It's not a great prediction, but it looks like it's on the right track now at least. Good luck to you!
EDIT: Example predicting multiple output timesteps:
from sklearn import datasets, preprocessing
import numpy as np
from scipy import stats
from keras import models, layers
INPUT_WINDOW = 10
OUTPUT_WINDOW = 5 # Predict 5 steps of the output variable.
# Randomly generate some regression data (not true sequential data; samples are independent).
np.random.seed(11798)
X, y = datasets.make_regression(n_samples=1000, n_features=4, noise=.1)
# Rescale 0-1 and convert into windowed sequences.
X = preprocessing.MinMaxScaler().fit_transform(X)
y = preprocessing.MinMaxScaler().fit_transform(y.reshape(-1, 1))
X = np.array([X[i:i + INPUT_WINDOW] for i in range(len(X) - INPUT_WINDOW)])
y = np.array([y[i:i + OUTPUT_WINDOW] for i in range(INPUT_WINDOW - OUTPUT_WINDOW,
len(y) - OUTPUT_WINDOW)])
print(np.shape(X)) # (990, 10, 4) - Ten timesteps of four features
print(np.shape(y)) # (990, 5, 1) - Five timesteps of one features
# Construct a simple model predicting output sequences.
m = models.Sequential()
m.add(layers.LSTM(20, activation='relu', return_sequences=True, input_shape=(INPUT_WINDOW, 4)))
m.add(layers.LSTM(20, activation='relu'))
m.add(layers.RepeatVector(OUTPUT_WINDOW))
m.add(layers.LSTM(20, activation='relu', return_sequences=True))
m.add(layers.wrappers.TimeDistributed(layers.Dense(1, activation='sigmoid')))
print(m.summary())
m.compile(optimizer='adam', loss='mse')
m.fit(X[:800], y[:800], batch_size=10, epochs=60) # Train on first 800 sequences.
preds = m.predict(X[800:], batch_size=10) # Predict the remaining sequences.
print('Prediction:\n' + str(preds[0]))
print('Actual:\n' + str(y[800]))
# Correlation should be around r = .98, essentially perfect.
print('Correlation: ' + str(stats.pearsonr(y[800:].flatten(), preds.flatten())[0]))

Resources