Multiple output MLP - machine-learning

I have an MLP model and my goal is to predict 2 variables (so ideally I should have 2 neurons in my final layer). However, the first variable (let's call it var1) has sub-values (up to 12) and the second variable (var2) has just a single value. Does it make sense to have 13 neurons in my final layer and, during backprop, compute 2 losses (one w.r.t. var1 and the second w.r.t. var2), then sum them up?
For better intuition, my scenario is a bit complex, so I'll use a house-prediction analogy.
Imagine we're trying to predict the price of houses and the number of rooms in each house (just for the sake of intuition); we would have just 2 neurons in the final layer. However, let's say we want to be more specific (and we have enough data) and predict the prices of houses in 12 different states alongside the number of rooms. Then we'd have 13 neurons in our final layer (12 for the prices and 1 for the number of rooms).
Does this architecture make sense?
Does it make sense then to compute our loss w.r.t. the 2 variables independently and sum them up?
Something like this:

output_dims = 13  # 12 state prices + 1 room count
input_dims = 491776
mse_loss = nn.MSELoss()
model = MLP(input_dims, output_dims)

output = model(x)                            # we get a tensor with 13 values per sample
l1 = mse_loss(output[0][:-1], true_prices)   # prices
l2 = mse_loss(output[0][-1], true_rooms)     # number of rooms
loss = l1 + l2
optimizer.zero_grad()
loss.backward()
optimizer.step()
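For what it's worth, here is a minimal end-to-end sketch of that idea with batched indexing and an optional weighting between the two losses. The network, shapes, and hyperparameters are placeholders of my own (the input size is shrunk from the question's 491776 so the toy example runs quickly), not a claim about the right architecture:

import torch
import torch.nn as nn

input_dims, output_dims = 1024, 13           # 12 state prices + 1 room count
model = nn.Sequential(
    nn.Linear(input_dims, 256),
    nn.ReLU(),
    nn.Linear(256, output_dims),
)
mse_loss = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, input_dims)               # a batch of 8 samples
true_prices = torch.randn(8, 12)             # 12 price targets per sample
true_rooms = torch.randn(8)                  # 1 room-count target per sample

output = model(x)                            # shape (8, 13)
l1 = mse_loss(output[:, :-1], true_prices)   # price loss over the whole batch
l2 = mse_loss(output[:, -1], true_rooms)     # room-count loss
loss = l1 + 0.5 * l2                         # the 0.5 weight is arbitrary; tune it if one loss dominates
optimizer.zero_grad()
loss.backward()
optimizer.step()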

Related

MLP Time Series Forecasting Model not learning the correct distribution

The goal is to build a model that can rank the stocks based on their Target Return.
The dataset I am using is structured the following way:
Date        Stock_Id  Volume  Open  Close  High  Low   Target
2022-01-04  8341      103422  2734  2742   2755  2730  0.0007304601899196
The data is chronological, and all the stocks are grouped by day. There are only 2000 stocks.
Using the Open, High, Low, Close, and Volume features, I was able to generate about 65 new features using the talib library.
The Neural Net takes in all 74 total features and is meant to predict the Target number of each stock. The Target represents the rate of change of the stock between t+2 and t+1.
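For concreteness, a target defined that way might be computed roughly like this (a sketch only; I am assuming it is derived from the Close price and that the data sits in a pandas DataFrame laid out like the sample row above, which the question does not state):

import pandas as pd

df = df.sort_values(["Stock_Id", "Date"])
close = df.groupby("Stock_Id")["Close"]
# rate of change between t+2 and t+1, aligned to row t
df["Target"] = (close.shift(-2) - close.shift(-1)) / close.shift(-1)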
What I've done:
I have normalized the data across time: for any given Stock_Id, I take all of its time-domain data and use it to compute the mean and std that normalize that stock (a short sketch follows below).
The dataloader indexes into the dataset by day; this acts as a mini-batch, since a (2000, 74) tensor is returned when __getitem__ is called.
The train dataloader is set to shuffle, so the indexing is random.
The loss function is meant to optimize for the highest returns all while keeping the standard deviation of those returns in check.
Because of the non-chronological order of the dataloader and the fact that I need all outputs to compute the mean and std, the training loop uses optim.SGD but effectively performs full-batch gradient descent.
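As a sketch of the per-stock normalization described above (assuming a pandas DataFrame laid out like the sample row earlier; feature_cols would also include the talib-generated columns):

feature_cols = ["Volume", "Open", "Close", "High", "Low"]
grouped = df.groupby("Stock_Id")[feature_cols]
# z-score each stock's features using that stock's own mean and std across time
df[feature_cols] = (df[feature_cols] - grouped.transform("mean")) / grouped.transform("std")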
During training, the loss converges very well, but the predictions are very off.
After training, I use the test set to view the accuracy of the prediction, comparing the distribution of the predictions with the distribution of the true targets.
The Target percentages mostly range between -15 and 15, whereas the predictions fall in a much narrower range, about -0.07 to -0.05.
I have tried running the model with different learning rates, changing the hidden layer sizes in the network, multiplying the targets by 100 to represent them as percentages, and changing the timeperiod argument in many of the talib features. But I always get a model that poorly represents the data.
More Info:
The data is from 1300 trading days with each day containing data for 2000 stocks.
I have adjusted the Open, Close, High, Low prices before normalizing.
I have tried different neural network sizes: an input layer of length 74 and an output of 1, with an arbitrary number of layers and sizes in between.
This is what the training loop looks like.
loss_lst = []
for epoch in range(50):  # loop over the dataset multiple times
    optimizer.zero_grad()
    running_error = torch.tensor(0.0)
    running_return = []
    running_loss = torch.tensor(0.0)
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels, date = data
        inputs = inputs.cuda().squeeze()
        labels = labels.cuda().squeeze()
        # forward pass
        outputs = net(inputs).squeeze()
        error = error_criterion(outputs, labels)  # error between prediction and label
        running_error += error.cpu()
        day_return = return_criterion(outputs, labels).cpu()
        running_return.append(day_return)  # daily returns used to compute the std for the loss function
        del outputs
        del inputs
        del labels
    avg_error = running_error / len(trainloader)
    std_return = torch.stack(running_return).cuda().std()
    loss = (mean_hp * avg_error) + (std_hp * std_return)
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
    loss_lst.append(loss.item())
    print(f'[{epoch + 1}] loss: {running_loss:.10f}')
    # loss_lst.append(sum(running_loss)/len(trainloader))
    running_loss = 0.0
print('Finished Training')
Let me know if any additional clarifications are needed.

PyTorch: Predicting future values with LSTM

I'm currently working on building an LSTM model to forecast time-series data using PyTorch. I used lag features to pass the previous n steps as inputs to train the network. I split the data into three sets, i.e., a train-validation-test split, and used the first two to train the model. My validation function takes the data from the validation data set and calculates the predicted values by passing it to the LSTM model using the DataLoader and TensorDataset classes. Initially, I got pretty good results with R2 values in the region of 0.85-0.95.
However, I have an uneasy feeling about whether this validation function is also suitable for testing my model's performance, because it takes the actual X values, i.e., time-lag features, from the DataLoader to predict the y^ values, i.e., predicted target values, instead of using the predicted y^ values as features for the next prediction. This situation seems far from reality, where the model has no clue of the real values of the previous time steps, especially if you forecast time-series data for longer time periods, say 3-6 months.
I'm currently a bit puzzled about tackling this issue and defining a function to predict future values relying on the model's values rather than the actual values in the test set. I have the following function predict, which makes a one-step prediction, but I haven't really figured out how to predict the whole test dataset using DataLoader.
def predict(self, x):
    # convert row to data
    x = x.to(device)
    # make prediction
    yhat = self.model(x)
    # retrieve numpy array (move back to the CPU first)
    yhat = yhat.detach().cpu().numpy()
    return yhat
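To run that one-step predict over the whole test set, a minimal sketch along the lines of the validation function below could look like this (predict_test_set is my own name; batch_size and n_features are the same arguments used elsewhere):

def predict_test_set(self, test_loader, batch_size, n_features):
    # one-step predictions over the test set, still using the true lag features as inputs
    self.model.eval()
    predictions, values = [], []
    with torch.no_grad():
        for x_test, y_test in test_loader:
            x_test = x_test.view([batch_size, -1, n_features])
            predictions.append(self.predict(x_test))  # predict() moves the batch to the device
            values.append(y_test.numpy())
    return predictions, values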
You can find how I split and load my datasets, my constructor for the LSTM model, and the validation function below. If you need more information, please do not hesitate to reach out to me.
Splitting and Loading Datasets
def create_tensor_datasets(X_train_arr, X_val_arr, X_test_arr, y_train_arr, y_val_arr, y_test_arr):
    train_features = torch.Tensor(X_train_arr)
    train_targets = torch.Tensor(y_train_arr)
    val_features = torch.Tensor(X_val_arr)
    val_targets = torch.Tensor(y_val_arr)
    test_features = torch.Tensor(X_test_arr)
    test_targets = torch.Tensor(y_test_arr)
    train = TensorDataset(train_features, train_targets)
    val = TensorDataset(val_features, val_targets)
    test = TensorDataset(test_features, test_targets)
    return train, val, test

def load_tensor_datasets(train, val, test, batch_size=64, shuffle=False, drop_last=True):
    train_loader = DataLoader(train, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
    val_loader = DataLoader(val, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
    test_loader = DataLoader(test, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
    return train_loader, val_loader, test_loader
Class LSTM
class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
        super(LSTMModel, self).__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.lstm = nn.LSTM(
            input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
        )
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, future=False):
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
        out = out[:, -1, :]
        out = self.fc(out)
        return out
Validation (defined within a trainer class)
def validation(self, val_loader, batch_size, n_features):
    with torch.no_grad():
        predictions = []
        values = []
        for x_val, y_val in val_loader:
            x_val = x_val.view([batch_size, -1, n_features]).to(device)
            y_val = y_val.to(device)
            self.model.eval()
            yhat = self.model(x_val)
            predictions.append(yhat.cpu().detach().numpy())
            values.append(y_val.cpu().detach().numpy())
    return predictions, values
I've finally found a way to forecast values based on the predicted values from the earlier observations. As expected, the predictions were rather accurate in the short term, gradually becoming worse in the long term. It is not so surprising that the future predictions drift over time, as they no longer depend on the actual values. Reflecting on my results and the discussions I had on the topic, here are my take-aways:
In real-life cases, the real values can be retrieved and fed into the model at each step of the prediction (be it weekly, daily, or hourly), so that the next step can be predicted with the actual values from the previous step. So, testing the performance based on the actual values from the test set may somewhat reflect the real performance of a model that is maintained regularly.
However, for predicting future values in the long term, forecasting, if you will, you need to make either multiple one-step predictions or multi-step predictions that span over the time period you wish to forecast.
Making multiple one-step predictions based on the values predicted by the model yields plausible results in the short term. As the forecasting period increases, the predictions become less accurate and therefore less fit for the purpose of forecasting.
To make multiple one-step predictions and update the input after each prediction, we have to work our way through the dataset one by one, as if we are going through a for-loop over the test set. Not surprisingly, this makes us lose all the computational advantages that matrix operations and mini-batch training provide us.
An alternative could be predicting sequences of values, instead of predicting the next value only, say using RNNs with multi-dimensional output with many-to-many or seq-to-seq structure. They are likely to be more difficult to train and less flexible to make predictions for different time periods. An encoder-decoder structure may prove useful for solving this, though I have not implemented it by myself.
You can find the code for my function that forecasts the next n_steps based on the last row of the dataset X (time-lag features) and y (target value). To iterate over each row in my dataset, I would set batch_size to 1 and n_features to the number of lagged observations.
def forecast(self, X, y, batch_size=1, n_features=1, n_steps=100):
    predictions = []
    X = torch.roll(X, shifts=1, dims=2)
    X[..., -1, 0] = y.item(0)
    with torch.no_grad():
        self.model.eval()
        for _ in range(n_steps):
            X = X.view([batch_size, -1, n_features]).to(device)
            yhat = self.model(X)
            yhat = yhat.detach().cpu().numpy()
            X = torch.roll(X, shifts=1, dims=2)
            X[..., -1, 0] = yhat.item(0)
            predictions.append(yhat)
    return predictions
The following line shifts the values in the last dimension of the tensor by one, so that a tensor [[[x1, x2, x3, ..., xn]]] becomes [[[xn, x1, x2, ..., x(n-1)]]].
X = torch.roll(X, shifts=1, dims=2)
And the line below selects the first element of the last dimension of the 3-D tensor and sets that item to the predicted value stored in the NumPy ndarray (yhat), [[x(n+1)]]. Then the new input tensor becomes [[[x(n+1), x1, x2, ..., x(n-1)]]].
X[..., -1, 0] = yhat.item(0)
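A toy illustration of those two lines, using a made-up tensor of shape (1, 1, 4) and a dummy prediction:

import torch

X = torch.tensor([[[1., 2., 3., 4.]]])   # [[[x1, x2, x3, x4]]]
X = torch.roll(X, shifts=1, dims=2)      # -> [[[4., 1., 2., 3.]]]
X[..., -1, 0] = 5.0                      # pretend 5.0 is yhat -> [[[5., 1., 2., 3.]]]
print(X)                                 # tensor([[[5., 1., 2., 3.]]])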
Recently, I've decided to put together the things I had learned and the things I would have liked to know earlier. If you'd like to have a look, you can find the links down below. I hope you'll find it useful. Feel free to comment or reach out to me if you agree or disagree with any of the remarks I made above.
Building RNN, LSTM, and GRU for time series using PyTorch
Predicting future values with RNN, LSTM, and GRU using PyTorch

Predicting sequence of grid coordinates with PyTorch

I have a similar open question here on Cross Validated (though not implementation focused, which I intend this question to be, so I think they are both valid).
I'm working on a project that uses sensors to monitor a person's GPS location. The coordinates will then be converted to a simple grid representation. What I want to try and do is, after recording a user's routes, train a neural network to predict the next coordinates, i.e. take the example below where a user repeats only two routes over time, Home->A and Home->B.
I want to train an RNN/LSTM with sequences of varying lengths e.g. (14,3), (13,3), (12,3), (11,3), (10,3), (9,3), (8,3), (7,3), (6,3), (5,3), (4,3), (3,3), (2,3), (1,3) and then also predict with sequences of varying lengths e.g. for this example route if I called
route = [(14,3), (13,3), (12,3), (11,3), (10,3)]  # pseudocode
pred = model.predict(route)
pred should give me (9,3), or ideally even a longer prediction, e.g. (9,3), (8,3), (7,3), (6,3), (5,3), (4,3), (3,3), (2,3), (1,3).
How do I feed such training sequences to the init and forward operations identified below?
self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
out, hidden = self.rnn(x, hidden)
Also, should the entire route be a tensor or each set of coordinates within the route a tensor?
I'm not very experienced with RNNs, but I'll give it a try.
A few things to pay attention to before we start:
1. Your data is not normalized.
2. The output prediction you want (even after normalization) is not bounded to the [-1, 1] range, and therefore you cannot have tanh or ReLU activations acting on the output predictions.
To address your problem, I propose a recurrent net that given a current state (2D coordinate) predicts the next state (2D coordinates). Note that since this is a recurrent net, there is also a hidden state associated with each location. At first, the hidden state is zero, but as the net sees more steps, it updates its hidden state.
I propose a simple net to address your problem. It has a single RNN layer with 8 hidden states, and a fully connected layer on top to output the prediction.
class MyRnn(nn.Module):
    def __init__(self, in_d=2, out_d=2, hidden_d=8, num_hidden=1):
        super(MyRnn, self).__init__()
        self.rnn = nn.RNN(input_size=in_d, hidden_size=hidden_d, num_layers=num_hidden)
        self.fc = nn.Linear(hidden_d, out_d)

    def forward(self, x, h0):
        r, h = self.rnn(x, h0)
        y = self.fc(r)  # no activation on the output
        return y, h
You can use your two sequences as training data, each sequence is a tensor of shape Tx1x2 where T is the sequence length, and each entry is two dimensional (x-y).
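For example, building such a tensor from one of the routes in the question might look like this (toy values; the normalization mentioned further below is omitted):

import torch

route = [(14, 3), (13, 3), (12, 3), (11, 3), (10, 3), (9, 3)]   # one recorded route as (x, y) cells
seq = torch.tensor(route, dtype=torch.float32).unsqueeze(1)     # shape (T, 1, 2): T steps, batch of 1, 2 coords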
To predict (during training):
rnn = MyRnn()
pred, out_h = rnn(seq[:-1, ...], torch.zeros(1, 1, 8)) # given time t predict t+1
err = criterion(pred, seq[1:, ...]) # compare prediction to t+1
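A minimal training loop around that snippet might look like the following; the criterion, optimizer, epoch count, and the sequence names seq_a/seq_b (the two recorded routes) are my own placeholders, not part of the original answer:

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=1e-2)

for epoch in range(500):
    for seq in [seq_a, seq_b]:                        # each of shape (T, 1, 2)
        optimizer.zero_grad()
        pred, _ = rnn(seq[:-1, ...], torch.zeros(1, 1, 8))
        err = criterion(pred, seq[1:, ...])
        err.backward()
        optimizer.step()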
Once the model is trained, you can show it the first k steps and continue to predict the next steps:
rnn.eval()
with torch.no_grad():
    pred, h = rnn(s[:k, ...], torch.zeros(1, 1, 8, dtype=torch.float))
    # pred[-1, ...] is the predicted next step
    prev = pred[-1:, ...]
    for j in range(k+1, s.shape[0]):
        pred, h = rnn(prev, h)  # note how we keep track of the hidden state; it is no longer initialized to zero
        prev = pred
I put everything together in a colab notebook so you can play with it.
For simplicity, I ignored the data normalization here, but you can find it in the colab notebook.
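If you want a rough idea of what such normalization can look like without opening the notebook, a simple option is to scale the grid coordinates into [-1, 1] (this is my own sketch with an assumed grid size, not the notebook's exact code):

grid_w, grid_h = 20, 20                                      # assumed grid size
scale = torch.tensor([grid_w, grid_h], dtype=torch.float32)

def normalize(seq):
    # map (x, y) grid cells into [-1, 1] so the RNN sees well-scaled inputs
    return seq / scale * 2.0 - 1.0

def denormalize(seq):
    return (seq + 1.0) / 2.0 * scale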
What's next?
These types of predictions are prone to error accumulation. This should be addressed during training, by shifting the inputs from the ground truth "clean" sequences to the actual predicted sequences, so the model will be able to compensate for its errors.
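One common way to do that is a scheduled-sampling-style training step: with some probability, feed the model its own previous prediction instead of the ground-truth coordinate. The sketch below is my own illustration (the step-by-step unrolling and teacher_forcing_prob are not from the original answer):

import random
import torch

def train_step(rnn, seq, optimizer, criterion, teacher_forcing_prob=0.5):
    # seq: (T, 1, 2); unroll one step at a time, sometimes feeding the model its own prediction
    optimizer.zero_grad()
    h = torch.zeros(1, 1, 8)
    inp = seq[:1, ...]
    loss = 0.0
    for t in range(1, seq.shape[0]):
        pred, h = rnn(inp, h)
        loss = loss + criterion(pred, seq[t:t+1, ...])
        if random.random() < teacher_forcing_prob:
            inp = seq[t:t+1, ...]     # keep using the ground truth
        else:
            inp = pred.detach()       # let the model see its own (possibly wrong) prediction
    loss.backward()
    optimizer.step()
    return loss.item()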

How to predict result of sigmoid function in Machine Learning

I am taking the Coursera Machine Learning course and I am a bit confused about the sigmoid function.
I implemented the sigmoid function like:
g = 1 ./ (1+e.^(-z));
and wrote a function to predict the result and it looks like
p = sigmoid(X*theta) >= 0.5
The question says:
"For a student with an Exam 1 score of 45 and an Exam 2 score of 85, you should expect to see an admission probability of 0.776."
But I am not sure how those two x values have to be plugged into the function that I made.
If theta is 0.218, how do exam scores of 45 and 85 give us a probability of 0.776? Can someone explain?
Thanks
The probability is given by the sigmoid function,
p = sigmoid(X*theta)
# Since there are two inputs, the model will have 2 weights and a bias.
p = sigmoid(0.45*w1+0.85*w2+b)
# The actual output is given by
y = 0.776
# Loss function
loss = (p-y)^2
# Find the weights by minimizing the loss function using gradient descent.
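Note that in the course exercise theta is not a single number but a vector with one entry per feature plus an intercept (X gets a leading column of ones). A small sketch of how the two exam scores plug in, written in Python for brevity; the theta values are placeholders standing in for whatever your optimizer returns:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-25.16, 0.206, 0.201])   # [intercept, weight for exam 1, weight for exam 2]; placeholder values
x = np.array([1.0, 45.0, 85.0])            # the leading 1 multiplies the intercept
p = sigmoid(x @ theta)                     # admission probability for this student
print(p)                                   # roughly 0.77 with values like these; the exercise's fitted theta gives 0.776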

MLP giving inaccurate results

I tried to build a simple MLP with 2 hidden layers and 3 output classes.
What I have done in the model is:
Input images are 120x120 RGB images, flattened to size 3 * 120 * 120.
2 hidden layers of size 100.
ReLU activation is used.
Output layer has 3 neurons.
Code
def model(input, weights, biases):
    l_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    l_1 = tf.nn.relu(l_1)
    l_2 = tf.add(tf.matmul(l_1, weights['h2']), biases['b2'])
    l_2 = tf.nn.relu(l_2)
    out = tf.matmul(l_2, weights['out']) + biases['out']
    return out
Optimizer
pred = model(input_batch, weights, biases)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.GradientDescentOptimizer(rate).minimize(cost)
The model however does not work. The accuracy is only equal to that of a random model.
The example followed is this one:
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/multilayer_perceptron.py
You have a copy-paste typo in def model: the first argument is named input, while the next line uses x.
Another trick to use when you suspect that a model is not being trained is to run it on the same batch again and again. If the implementation is correct and the model is being trained, it will soon learn that batch by heart, yielding 100% accuracy. If it does not, that is an indicator that something is wrong in your implementation.
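A rough sketch of that sanity check in the same TF1 style as the question; x and y are assumed to be the input/label placeholders, x_batch/y_batch a single fixed batch, and pred, cost, optimizer the graph nodes defined earlier:

correct = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(200):
        _, c, acc = sess.run([optimizer, cost, accuracy],
                             feed_dict={x: x_batch, y: y_batch})
    print('loss %.4f, accuracy on the memorized batch %.2f' % (c, acc))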
