Finding "look_back" & "look_ahead" hyper-parameters for Seq2Seq models - time-series

For Seq2Seq deep learning architectures (e.g., LSTM/GRU) applied to multivariate, multistep time-series forecasting, it's important to convert the data to a 3D shape: (batch_size, look_back, number_features). Here, look_back sets how many past data points/samples to consider, each with number_features features from your training dataset. Similarly, look_ahead defines the number of future steps you want your model to forecast.
I have written a function to help achieve this:
import numpy as np

def split_series_multivariate(data, n_past, n_future):
    '''
    Create training and testing splits required by Seq2Seq
    architecture(s) for multivariate, multistep and multivariate
    output time-series modeling.
    '''
    X, y = list(), list()
    for window_start in range(len(data)):
        past_end = window_start + n_past
        future_end = past_end + n_future
        if future_end > len(data):
            break
        # slice past and future parts of the window-
        past, future = data[window_start: past_end, :], data[past_end: future_end, :]
        # past, future = data[window_start: past_end, :], data[past_end: future_end, 4]
        X.append(past)
        y.append(future)
    return np.array(X), np.array(y)
But look_back and look_ahead are hyper-parameters that need to be tuned for a given dataset.
# Define hyper-parameters for Seq2Seq modeling:
# look-back window size-
n_past = 30
# number of future steps to predict for-
n_future = 10
# number of features used-
n_features = 8
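For reference, a minimal usage sketch of the function above on a hypothetical training array of shape (n_samples, n_features):

import numpy as np

# hypothetical training array: 1000 time steps, 8 features
train_data = np.random.rand(1000, n_features)

X_train, y_train = split_series_multivariate(train_data, n_past, n_future)

print(X_train.shape)  # (961, 30, 8) -> (samples, look_back, n_features)
print(y_train.shape)  # (961, 10, 8) -> (samples, look_ahead, n_features)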
What is the best practice for choosing/finding look_back and look_ahead hyper-parameters?

Related

LSTM time series forecast for multiple time series (single model vs. individual model performance)

I would like to use a deep (stacked) LSTM model to forecast the disk utilisation for all my clusters. But what I am experiencing is that an individual model representing a single cluster's time series gives more accurate performance than a single model representing all the time series. The single model seems to average out the forecasts, even though I include the cluster-id as a feature passed to the model.
My training sequence logic goes as given below, where my time series has ['utilisation', 'clusterID'] as two features.
from numpy import array

def create_dataset(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix-1:out_end_ix, 0]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
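For illustration, a small hypothetical call on a toy two-feature series (the names and sizes below are made up) shows the resulting shapes:

import numpy as np

# toy series: 100 time steps of [utilisation, clusterID]
toy_sequences = np.column_stack([np.random.rand(100), np.ones(100)])

X, y = create_dataset(toy_sequences, n_steps_in=12, n_steps_out=3)
print(X.shape)  # (87, 12, 2) -> (samples, look_back, features)
print(y.shape)  # (87, 3)     -> (samples, n_steps_out target values)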
- **Model**
trainX = trainX.reshape((trainX.shape[0], trainX.shape[1], features))
trainY = trainY.reshape((trainY.shape[0], trainY.shape[1]))
input = Input(batch_input_shape=(batch_size, look_back, features), name='input', dtype='float32')
optimizer = keras.optimizers.RMSprop(learning_rate=0.0001, rho=0.9, epsilon=None, decay=0.0)
model = Sequential()
model.add(Bidirectional(LSTM(800, return_sequences=True, return_state=False, input_shape=(look_back, features), stateful=stateful)))
model.add(LSTM(800, return_sequences=True, return_state=False, stateful=stateful))
model.add(attention(return_sequences=False))
model.add(Dense(units=out_num))
model.compile(metrics=[rmse], run_eagerly=True, loss='mean_squared_error', optimizer='adam')
history = model.fit(trainX, trainY, batch_size=5000, epochs=100, verbose=2, validation_split=0.1)
- **and prediction logic as follows**
testX, testY = create_dataset(predictList.values,look_back,out_num)
reshaped_test=np.reshape(testX[0],(1,look_back,2))
futureStepPredict = model.predict(reshaped_test)
I was expecting that an LSTM model could be trained on multiple time series at once and still give individual forecasts, based on the cluster-id provided as input, with almost the same accuracy as a per-cluster model. Is this a wrong expectation?

PyTorch: Predicting future values with LSTM

I'm currently working on building an LSTM model to forecast time-series data using PyTorch. I used lag features to pass the previous n steps as inputs to train the network. I split the data into three sets, i.e., a train-validation-test split, and used the first two to train the model. My validation function takes data from the validation set and calculates the predicted values by passing it to the LSTM model using the DataLoader and TensorDataset classes. Initially, I got pretty good results, with R2 values in the region of 0.85-0.95.
However, I have an uneasy feeling about whether this validation function is also suitable for testing my model's performance. Because the function now takes the actual X values, i.e., time-lag features, from the DataLoader to predict y^ values, i.e., predicted target values, instead of using the predicted y^ values as features in the next prediction. This situation seems far from reality where the model has no clue of the real values of the previous time steps, especially if you forecast time-series data for longer time periods, say 3-6 months.
I'm currently a bit puzzled about tackling this issue and defining a function to predict future values relying on the model's values rather than the actual values in the test set. I have the following function predict, which makes a one-step prediction, but I haven't really figured out how to predict the whole test dataset using DataLoader.
def predict(self, x):
    # convert row to data
    x = x.to(device)
    # make prediction
    yhat = self.model(x)
    # retrieve numpy array
    yhat = yhat.to(device).detach().numpy()
    return yhat
You can find how I split and load my datasets, my constructor for the LSTM model, and the validation function below. If you need more information, please do not hesitate to reach out to me.
Splitting and Loading Datasets
import torch
from torch.utils.data import TensorDataset, DataLoader

def create_tensor_datasets(X_train_arr, X_val_arr, X_test_arr, y_train_arr, y_val_arr, y_test_arr):
    train_features = torch.Tensor(X_train_arr)
    train_targets = torch.Tensor(y_train_arr)
    val_features = torch.Tensor(X_val_arr)
    val_targets = torch.Tensor(y_val_arr)
    test_features = torch.Tensor(X_test_arr)
    test_targets = torch.Tensor(y_test_arr)
    train = TensorDataset(train_features, train_targets)
    val = TensorDataset(val_features, val_targets)
    test = TensorDataset(test_features, test_targets)
    return train, val, test

def load_tensor_datasets(train, val, test, batch_size=64, shuffle=False, drop_last=True):
    train_loader = DataLoader(train, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
    val_loader = DataLoader(val, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
    test_loader = DataLoader(test, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
    return train_loader, val_loader, test_loader
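As a small usage sketch (the arrays below are random placeholders standing in for the scaled, lag-feature splits; the shapes are assumptions):

import numpy as np

X_train_arr, y_train_arr = np.random.rand(640, 16), np.random.rand(640, 1)
X_val_arr, y_val_arr = np.random.rand(128, 16), np.random.rand(128, 1)
X_test_arr, y_test_arr = np.random.rand(128, 16), np.random.rand(128, 1)

train, val, test = create_tensor_datasets(X_train_arr, X_val_arr, X_test_arr,
                                          y_train_arr, y_val_arr, y_test_arr)
train_loader, val_loader, test_loader = load_tensor_datasets(train, val, test, batch_size=64)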
Class LSTM
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
        super(LSTMModel, self).__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.lstm = nn.LSTM(
            input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
        )
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, future=False):
        # initialise hidden and cell states with zeros for each batch
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
        # keep only the output of the last time step
        out = out[:, -1, :]
        out = self.fc(out)
        return out
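For reference, a minimal instantiation sketch on dummy data (the dimensions are assumptions, not the settings used in the original post):

model = LSTMModel(input_dim=16, hidden_dim=64, layer_dim=2, output_dim=1, dropout_prob=0.2)

# one dummy batch of shape (batch_size, seq_len, input_dim)
dummy = torch.randn(8, 10, 16)
print(model(dummy).shape)  # torch.Size([8, 1])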
Validation (defined within a trainer class)
def validation(self, val_loader, batch_size, n_features):
    with torch.no_grad():
        predictions = []
        values = []
        for x_val, y_val in val_loader:
            x_val = x_val.view([batch_size, -1, n_features]).to(device)
            y_val = y_val.to(device)
            self.model.eval()
            yhat = self.model(x_val)
            predictions.append(yhat.cpu().detach().numpy())
            values.append(y_val.cpu().detach().numpy())
    return predictions, values
I've finally found a way to forecast values based on predicted values from the earlier observations. As expected, the predictions were rather accurate in the short-term, slightly becoming worse in the long term. It is not so surprising that the future predictions digress over time, as they no longer depend on the actual values. Reflecting on my results and the discussions I had on the topic, here are my take-aways:
In real-life cases, the real values can be retrieved and fed into the model at each step of the prediction (be it weekly, daily, or hourly) so that the next step can be predicted with the actual values from the previous step. So, testing the performance based on the actual values from the test set may somewhat reflect the real performance of a model that is maintained regularly.
However, for predicting future values in the long term, forecasting, if you will, you need to make either multiple one-step predictions or multi-step predictions that span over the time period you wish to forecast.
Making multiple one-step predictions based on the values predicted by the model yields plausible results in the short term. As the forecasting period increases, the predictions become less accurate and therefore less fit for the purpose of forecasting.
To make multiple one-step predictions and update the input after each prediction, we have to work our way through the dataset one by one, as if we are going through a for-loop over the test set. Not surprisingly, this makes us lose all the computational advantages that matrix operations and mini-batch training provide us.
An alternative could be predicting sequences of values instead of only the next value, say using RNNs with multi-dimensional output in a many-to-many or seq-to-seq structure. They are likely to be more difficult to train and less flexible when making predictions for different time periods. An encoder-decoder structure may prove useful for solving this, though I have not implemented it myself; a rough sketch of the idea is given below.
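As a rough illustration of that idea only (my own sketch, not code from the original post; the layer sizes and horizon are arbitrary), a minimal PyTorch encoder-decoder could look like this:

import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    # Encode the input window, then unroll the decoder for `horizon`
    # steps, feeding its own predictions back in as inputs.
    def __init__(self, n_features=1, hidden_dim=64, horizon=5):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(1, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x: (batch, seq_len, n_features); assume the target is feature 0
        _, (h, c) = self.encoder(x)
        step_input = x[:, -1:, :1]                  # (batch, 1, 1)
        outputs = []
        for _ in range(self.horizon):
            dec_out, (h, c) = self.decoder(step_input, (h, c))
            pred = self.head(dec_out)               # (batch, 1, 1)
            outputs.append(pred)
            step_input = pred                       # feed prediction back in
        return torch.cat(outputs, dim=1)            # (batch, horizon, 1)

# quick shape check with dummy data
model = Seq2SeqForecaster(n_features=3, hidden_dim=32, horizon=5)
print(model(torch.randn(8, 20, 3)).shape)  # torch.Size([8, 5, 1])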
You can find the code for my function that forecasts the next n_steps based on the last row of the dataset X (time-lag features) and y (target value). To iterate over each row in my dataset, I would set batch_size to 1 and n_features to the number of lagged observations.
def forecast(self, X, y, batch_size=1, n_features=1, n_steps=100):
    predictions = []
    X = torch.roll(X, shifts=1, dims=2)
    X[..., -1, 0] = y.item(0)
    with torch.no_grad():
        self.model.eval()
        for _ in range(n_steps):
            X = X.view([batch_size, -1, n_features]).to(device)
            yhat = self.model(X)
            yhat = yhat.to(device).detach().numpy()
            X = torch.roll(X, shifts=1, dims=2)
            X[..., -1, 0] = yhat.item(0)
            predictions.append(yhat)
    return predictions
The following line shifts the values along the last dimension of the tensor (dims=2) by one, so that a tensor [[[x1, x2, x3, ... , xn]]] becomes [[[xn, x1, x2, ... , x(n-1)]]].
X = torch.roll(X, shifts=1, dims=2)
And the line below selects the first element along the last dimension of the 3D tensor and sets it to the predicted value stored in the NumPy ndarray yhat, [[x(n+1)]]. The new input tensor then becomes [[[x(n+1), x1, x2, ... , x(n-1)]]].
X[..., -1, 0] = yhat.item(0)
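A tiny standalone example (with made-up numbers) shows what these two lines do to the lag window:

import torch

X = torch.tensor([[[1., 2., 3., 4.]]])   # shape (1, 1, 4)
X = torch.roll(X, shifts=1, dims=2)      # -> tensor([[[4., 1., 2., 3.]]])
X[..., -1, 0] = 5.0                      # overwrite the rolled-in slot with the new prediction
print(X)                                 # tensor([[[5., 1., 2., 3.]]])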
Recently, I've decided to put together the things I had learned and the things I would have liked to know earlier. If you'd like to have a look, you can find the links down below. I hope you'll find it useful. Feel free to comment or reach out to me if you agree or disagree with any of the remarks I made above.
Building RNN, LSTM, and GRU for time series using PyTorch
Predicting future values with RNN, LSTM, and GRU using PyTorch

Have RMSE in Random Survival Forest in R program

I need the RMSE for three models so I can compare them and say which one is better than the others. The models I need to run are a survival decision tree, a random survival forest, and bagging. I have run my models, but in the end I only get some predictions. The random survival forest results are shown below. What should I do to obtain the RMSE?
library(survival)
library(randomForestSRC)

dataset <- data.frame(data)
dataset
n.sample = round(0.5 * nrow(dataset))
dataset1 = sample(1:nrow(dataset), n.sample)
train = data[dataset1, ]
test = data[-dataset1, ]
set.seed(1369)
rsf0 = rfsrc(Surv(time, status) ~ ., train, importance = TRUE, forest = T,
             ensemble = "oob", mtry = NULL, block.size = 1, splitrule = "logrank")
print(rsf0)
Results:
Sample size: 821
Number of deaths: 209
Number of trees: 1000
Forest terminal node size: 15
Average no. of terminal nodes: 38.62
No. of variables tried at each split: 4
Total no. of variables: 14
Resampling used to grow trees: swor
Resample size used to grow trees: 519
Analysis: RSF
Family: surv
Splitting rule: logrank random
Number of random split points: 10
Error rate: 36.15%
I think you slightly misunderstand what survival analysis models are usually used for. Normally we want to predict the distribution of the survival time and not the survival time itself. The RMSE can only be used when the actual survival time is predicted. In your example, the models you discuss make a distribution prediction.
So firstly I've cleaned up your code slightly and added an example dataset to make it reproducible:
library(survival)
library(randomForestSRC)
# use the rats dataset to make the example reproducible
dataset <- data.frame(survival::rats)
dataset$sex <- factor(dataset$sex)
# note that you need to set.seed before you use `sample`
set.seed(1369)
# again specifying train/test split but this time as two separate sets of integers
train = sample(nrow(dataset), 0.5 * nrow(dataset))
test = setdiff(seq(nrow(dataset)), train)
# train the random forest model on the training data
rsf0 = rfsrc(Surv(time,status)~., dataset[train, ], importance=TRUE, forest=T,
ensemble="oob", mtry=NULL, block.size=1, splitrule="logrank")
# now make predictions
predictions = predict(rsf0, newdata = dataset[-train, ])
# view the predicted survival probabilities
predictions$survival
With these probabilities, you have to make a decision about how to convert them to survival time predictions, and then you have to manually compute the RMSE after first removing all censored observations. Common conversions to survival time are to take the mean of the predicted individual distributions or the median.
As an alternative, and plugging my own package here, you could use {mlr3proba} which does this for you:
# load required packages
library(mlr3); library(mlr3proba);library(mlr3extralearners); library(mlr3pipelines)
# use the rats dataset to make the example reproducible
dataset <- data.frame(survival::rats)
dataset$sex <- factor(dataset$sex)
# note that you need to set.seed before you use `sample`
set.seed(1369)
# again specifying train/test split but this time as two separate sets of integers
train = sample(nrow(dataset), 0.5 * nrow(dataset))
test = setdiff(seq(nrow(dataset)), train)
# select the random forest model and use the `crankcompositor` to automatically
# create survival time predictions
learn = ppl("crankcompositor", lrn("surv.rfsrc"), response = TRUE, graph_learner = TRUE)
# create a task which stores your dataset
task = TaskSurv$new("data", backend = dataset, time = "time", event = "status")
# train your learner on training data
learn$train(task, row_ids = train)
# make predictions on test data
predictions = learn$predict(task, row_ids = test)
# view your survival time predictions
predictions$response
# calculate RMSE
predictions$score(msr("surv.rmse"))
This second option is more complicated if you're not used to R6, but I suspect that in your use-case it will benefit you as you can also compare multiple models at the same time with this.

neural network produces similar pattern for all inputs

I am attempting to train an ANN on time series data in Keras. I have three vectors of data that are broken into rolling-window sequences (e.g., for a vector l):
np.array([l[i:i+window_size] for i in range( len(l) - window_size)])
The target vector is similarly windowed, so the neural net output is a prediction of the target vector for the next window_size time steps. All the data is normalized with a min-max scaler. It is fed into the neural network with shape (nb_samples, window_size, 3). Here is a plot of the 3 input vectors.
The only output I've managed to muster from the ANN is the following plot. Target vector in blue, predictions in red (plot is zoomed in to make the prediction pattern legible). Prediction vectors are plotted at window_size intervals so each one of the repeated patterns is one prediction from the net.
I've tried many different model architectures, number of epochs, activation functions, short and fat networks, skinny, tall. This is my current one (it's a little out there).
Conv1D(64, 4, input_shape=(None, 3)) ->
Conv1D(32, 4) ->
Dropout(24) ->
LSTM(32) ->
Dense(window_size)
But nothing I try will affect the neural net from outputting this repeated pattern. I must be misunderstanding something about time-series or LSTMs in Keras. But I'm very lost at this point so any help is greatly appreciated. I've attached the full code at this repository.
https://github.com/jaybutera/dat-toy
I played with your code a little and I think I have a few suggestions for getting you on the right track. The code doesn't seem to match your graphs exactly, but I assume you've tweaked it a bit since then. Anyway, there are two main problems:
The biggest problem is in your data preparation step. You basically have the data shapes backwards, in that you have a single timestep of input for X and a timeseries for Y. Your input shape is (18830, 1, 8), when what you really want is (18830, 30, 8) so that the full 30 timesteps are fed into the LSTM. Otherwise the LSTM is only operating on one timestep and isn't really useful. To fix this, I changed the line in common.py from
X = X.reshape(X.shape[0], 1, X.shape[1])
to
X = windowfy(X, winsize)
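A minimal sketch of what such a windowing helper might look like (the actual windowfy in the repository may differ):

import numpy as np

def windowfy(data, winsize):
    # stack overlapping windows: (n_samples, n_features) -> (n_windows, winsize, n_features)
    return np.array([data[i:i + winsize] for i in range(len(data) - winsize)])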
Similarly, the output data should probably be only 1 value, from what I've gathered of your goals from the plotting function. There are certainly some situations where you want to predict a whole timeseries, but I don't know if that's what you want in this case. I changed Y_train to use fuels instead of fuels_w so that it only had to predict one step of the timeseries.
Training for 100 epochs might be way too much for this simple network architecture. In some cases when I ran it, it looked like there was some overfitting going on. Observing the decrease of loss in the network, it seems like maybe only 3-4 epochs are needed.
Here is the graph of predictions after 3 training epochs with the adjustments I mentioned. It's not a great prediction, but it looks like it's on the right track now at least. Good luck to you!
EDIT: Example predicting multiple output timesteps:
from sklearn import datasets, preprocessing
import numpy as np
from scipy import stats
from keras import models, layers
INPUT_WINDOW = 10
OUTPUT_WINDOW = 5 # Predict 5 steps of the output variable.
# Randomly generate some regression data (not true sequential data; samples are independent).
np.random.seed(11798)
X, y = datasets.make_regression(n_samples=1000, n_features=4, noise=.1)
# Rescale 0-1 and convert into windowed sequences.
X = preprocessing.MinMaxScaler().fit_transform(X)
y = preprocessing.MinMaxScaler().fit_transform(y.reshape(-1, 1))
X = np.array([X[i:i + INPUT_WINDOW] for i in range(len(X) - INPUT_WINDOW)])
y = np.array([y[i:i + OUTPUT_WINDOW] for i in range(INPUT_WINDOW - OUTPUT_WINDOW,
                                                    len(y) - OUTPUT_WINDOW)])
print(np.shape(X))  # (990, 10, 4) - Ten timesteps of four features
print(np.shape(y))  # (990, 5, 1) - Five timesteps of one feature
# Construct a simple model predicting output sequences.
m = models.Sequential()
m.add(layers.LSTM(20, activation='relu', return_sequences=True, input_shape=(INPUT_WINDOW, 4)))
m.add(layers.LSTM(20, activation='relu'))
m.add(layers.RepeatVector(OUTPUT_WINDOW))
m.add(layers.LSTM(20, activation='relu', return_sequences=True))
m.add(layers.wrappers.TimeDistributed(layers.Dense(1, activation='sigmoid')))
print(m.summary())
m.compile(optimizer='adam', loss='mse')
m.fit(X[:800], y[:800], batch_size=10, epochs=60) # Train on first 800 sequences.
preds = m.predict(X[800:], batch_size=10) # Predict the remaining sequences.
print('Prediction:\n' + str(preds[0]))
print('Actual:\n' + str(y[800]))
# Correlation should be around r = .98, essentially perfect.
print('Correlation: ' + str(stats.pearsonr(y[800:].flatten(), preds.flatten())[0]))

Feature Vectors in Radial Basis Function Network

I am trying to use an RBFNN for point-cloud-to-surface reconstruction, but I can't understand what my feature vectors in the RBFNN would be.
Can anyone please help me understand this?
A goal to get to this: (image of the target reconstructed surface)
From inputs like this: (image of the input point cloud)
An RBF network essentially involves fitting data with a linear combination of functions that obey a set of core properties, chief among these being radial symmetry. The parameters of each of these functions are learned by incremental adjustment based on errors generated through repeated presentation of the inputs.
If I understand (it's been a very long time since I used one of these networks), your question pertains to preprocessing of the data in the point cloud. I believe that each of the points in your point cloud should serve as one input. If I understand properly, the features are your three dimensions, and as such each point can already be considered a "feature vector."
You have other choices that remain, namely the number of radial basis neurons in your hidden layer, and the radial basis functions to use (a Gaussian is a popular first choice). The training of the network and the surface reconstruction can be done in a number of ways but I believe this is beyond the scope of the question.
I don't know if it will help, but here's a simple python implementation of an RBF network performing function approximation, with one-dimensional inputs:
import numpy as np
import matplotlib.pyplot as plt

def fit_me(x):
    return (x - 2) * (2 * x + 1) / (1 + x**2)

def rbf(x, mu, sigma=1.5):
    return np.exp(-(x - mu)**2 / (2 * sigma**2))

# Core parameters including number of training
# and testing points, minimum and maximum x values
# for training and testing points, and the number
# of rbf (hidden) nodes to use
num_points = 100   # number of inputs (each 1D)
num_rbfs = 20      # number of centers (np.linspace needs an integer count)
x_min = -5
x_max = 10

# Training data, evenly spaced points
x_train = np.linspace(x_min, x_max, num_points)
y_train = fit_me(x_train)

# Testing data, more evenly spaced points
x_test = np.linspace(x_min, x_max, num_points * 3)
y_test = fit_me(x_test)

# Centers of each of the rbf nodes
centers = np.linspace(-5, 10, num_rbfs)

# Everything is in place to train the network
# and attempt to approximate the function 'fit_me'.
# Start by creating a matrix G in which each row
# corresponds to an x value within the domain and each
# column i contains the values of rbf_i(x).
center_cols, x_rows = np.meshgrid(centers, x_train)
G = rbf(center_cols, x_rows)

plt.plot(G)
plt.title('Radial Basis Functions')
plt.show()

# Simple training in this case: use pseudoinverse to get weights
weights = np.dot(np.linalg.pinv(G), y_train)

# To test, create meshgrid for test points
center_cols, x_rows = np.meshgrid(centers, x_test)
G_test = rbf(center_cols, x_rows)

# apply weights to G_test
y_predict = np.dot(G_test, weights)

plt.plot(y_predict)
plt.title('Predicted function')
plt.show()

error = y_predict - y_test
plt.plot(error)
plt.title('Function approximation error')
plt.show()
First, you can explore the way in which inputs are provided to the network and how the RBF nodes are used. This should extend to 2D inputs in a straightforward way, though training may get a bit more involved.
To do proper surface reconstruction you'll likely need a representation of the surface that is altogether different from the representation of the function learned here. I'm not sure how to take this last step.
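To make the 2D-input extension mentioned above concrete, here is a rough sketch along the same lines (the target function, grid of centers, and sigma below are arbitrary choices, not taken from the code above):

import numpy as np

def rbf2d(points, centers, sigma=1.0):
    # points: (N, 2), centers: (M, 2) -> design matrix G of shape (N, M)
    diff = points[:, None, :] - centers[None, :, :]
    dist_sq = np.sum(diff**2, axis=-1)
    return np.exp(-dist_sq / (2 * sigma**2))

# arbitrary smooth "height field" standing in for surface-like data
def height(p):
    return np.sin(p[:, 0]) * np.cos(p[:, 1])

rng = np.random.default_rng(0)
pts = rng.uniform(-3, 3, size=(500, 2))      # scattered (x, y) samples
z = height(pts)                              # observed heights

# grid of RBF centers over the same domain
gx, gy = np.meshgrid(np.linspace(-3, 3, 10), np.linspace(-3, 3, 10))
centers = np.column_stack([gx.ravel(), gy.ravel()])   # (100, 2)

G = rbf2d(pts, centers)
weights = np.linalg.pinv(G) @ z              # same pseudoinverse training step as above

# evaluate the fitted height field on new points
test_pts = rng.uniform(-3, 3, size=(200, 2))
z_pred = rbf2d(test_pts, centers) @ weights
print(np.mean((z_pred - height(test_pts))**2))   # mean squared error (should be small)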
