I'm using an LSTM model to predict BABA stock price using this dataset: "/kaggle/input/price-volume-data-for-all-us-stocks-etfs/Data/Stocks/baba.us.txt".
I'm not sure why my model is not learning and the y_test_prediction is so different from the actual y_test. I really appreciate your help as I'm beginning to learn machine learning. Thank you!
I have scaled the data with MinMaxScaler before splitting it. This is how I split the data:
x_train, y_train, x_test, y_test = [], [], [], []
lags = 3
for t in range(len(train_data)-lags-1):
    x_train.append(train_data[t:(t+lags),:])
    y_train.append(train_data[(t+lags),:])
for t in range(len(test_data)-lags-1):
    x_test.append(test_data[t:(t+lags),:])
    y_test.append(test_data[(t+lags),:])

x_train = torch.FloatTensor(np.array(x_train))
y_train = torch.FloatTensor(np.array(y_train))
x_test = torch.FloatTensor(np.array(x_test))
y_test = torch.FloatTensor(np.array(y_test))

x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
This is my LSTM model:
input_dim = 1
hidden_layer_dim = 32
num_layers = 1
output_dim = 1

class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_layer_dim, num_layers, output_dim):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.hidden_layer_dim = hidden_layer_dim
        self.num_layers = num_layers
        self.output_dim = output_dim
        self.lstm = nn.LSTM(input_dim, hidden_layer_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_layer_dim, output_dim)

    def forward(self, x):
        # initial hidden state & cell state as zeros
        h0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_layer_dim))
        c0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_layer_dim))
        # lstm output with hidden and cell state
        output, (hn, cn) = self.lstm(x, (h0, c0))
        # get hidden state to be passed to dense layer
        hn = hn.view(-1, self.hidden_layer_dim)
        output = self.fc(hn)
        return output
This is my training:
num_epochs = 100
learning_rate = 0.01

model = LSTM(input_dim, hidden_layer_dim, num_layers, output_dim)
loss = torch.nn.MSELoss()  # mean-squared error for regression
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
hist = np.zeros(num_epochs)

# train model
for epoch in range(num_epochs):
    outputs = model(x_train)
    optimizer.zero_grad()
    # get loss function
    loss_fn = loss(outputs, y_train.view(1, -1))
    hist[epoch] = loss_fn.item()
    loss_fn.backward()
    optimizer.step()
    if epoch % 10 == 0:
        print("Epoch: %d, loss: %1.5f" % (epoch, hist[epoch]))
This is the training loss and the prediction vs. actual result:
[plot: training loss]
[plot: prediction vs. actual]
You are initialising the hidden and cell states every time forward is called, which might cause errors with backprop. You do not even have to initialise them; PyTorch takes care of that for you (they default to zeros). You can check this implementation for the details. Also, as a side note, you might want to take a look at PyTorch DataLoaders (just an easier way to make splits and batches).
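For illustration, here is a minimal sketch (not the asker's code) of what forward could look like without any explicit state initialisation, taking the last time step of the LSTM output instead of reshaping the hidden state; the attribute names follow the question's model:

def forward(self, x):
    # nn.LSTM defaults the hidden and cell states to zeros when none are passed
    output, (hn, cn) = self.lstm(x)   # output: (batch, seq_len, hidden_layer_dim)
    last_step = output[:, -1, :]      # features of the last time step
    return self.fc(last_step)         # (batch, output_dim)

With this, the Variable wrappers and the h0/c0 tensors can be dropped entirely.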
Related
I have created a simple PyTorch classification model with a sample dataset generated using sklearn's make_classification. Even after training for thousands of epochs, the accuracy of the model hovers between 30 and 40 percent. During training the loss value fluctuates wildly. I am wondering why this model is not learning, and whether it's due to some logical error in the code.
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_features=15, n_classes=5, n_informative=4)
DEVICE = torch.device('cuda')
epochs = 5000

class CustomDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.from_numpy(X)
        self.y = torch.from_numpy(y)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, index):
        X = self.X[index]
        y = self.y[index]
        return (X, y)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(15, 10)
        self.l2 = nn.Linear(10, 5)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.l1(x)
        x = self.relu(x)
        x = self.l2(x)
        x = self.relu(x)
        return x

model = Model().double().to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
train_data = CustomDataset(X_train, y_train)
test_data = CustomDataset(X_test, y_test)
trainloader = DataLoader(train_data, batch_size=32, shuffle=True)
testloader = DataLoader(test_data, batch_size=32, shuffle=True)

for i in range(epochs):
    for (x, y) in trainloader:
        x = x.to(DEVICE)
        y = y.to(DEVICE)
        optimizer.zero_grad()
        output = model(x)
        loss = loss_function(output, y)
        loss.backward()
        optimizer.step()
    if i % 200 == 0:
        print("epoch: ", i, " Loss: ", loss.item())

correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for x, y in testloader:
        # calculate outputs by running x through the network
        outputs = model(x.to(DEVICE)).to(DEVICE)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)
        total += y.size(0)
        correct += (predicted == y.to(DEVICE)).sum().item()

print(f'Accuracy of the network on the test data: {100 * correct // total} %')
EDIT
I tried to over-fit my model with only 10 samples (batch_size=5) using X, y = make_classification(n_samples=10, n_features=15, n_classes=5, n_informative=4), but the accuracy decreased to 15-20%. I then normalized the input data to values between 0 and 1, which pushed the accuracy a bit higher, but not over 50 percent. Any idea why this might be happening?
You should not be using a ReLU activation on your output layer. For multi-class classification, the usual choices are a softmax activation on the final layer, or feeding the raw logits directly to the loss function without an explicit softmax layer.
Try removing the ReLU activation from the final layer.
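For example, a minimal sketch of the suggested change, returning raw logits so that nn.CrossEntropyLoss can apply log-softmax internally:

def forward(self, x):
    x = self.relu(self.l1(x))
    # no activation on the output layer: return raw logits for nn.CrossEntropyLoss
    return self.l2(x)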
After training the network I noticed that the accuracy goes up and down. Initially I thought it was caused by the learning rate, but it is set to quite a small value. Please check the attached screenshot.
[screenshot: accuracy plot]
My network (in PyTorch) looks as follows:
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.fc1 = nn.Linear(17*17*64, 512)
        self.fc2 = nn.Linear(512, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.view(out.size(0), -1)
        out = self.relu(self.fc1(out))
        out = self.fc2(out)
        out = torch.sigmoid(out)
        return out
I am using RMSprop as the optimizer and BCELoss as the criterion. The learning rate is set to 0.001.
Here is the training process:
epochs = 15
itr = 1
p_itr = 100
model.train()
total_loss = 0
loss_list = []
acc_list = []

for epoch in range(epochs):
    for samples, labels in train_loader:
        samples, labels = samples.to(device), labels.to(device)
        optimizer.zero_grad()
        output = model(samples)
        labels = labels.unsqueeze(-1)
        labels = labels.float()
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        scheduler.step()
        if itr % p_itr == 0:
            pred = torch.argmax(output, dim=1)
            correct = pred.eq(labels)
            acc = torch.mean(correct.float())
            print('[Epoch {}/{}] Iteration {} -> Train Loss: {:.4f}, Accuracy: {:.3f}'.format(epoch+1, epochs, itr, total_loss/p_itr, acc))
            loss_list.append(total_loss/p_itr)
            acc_list.append(acc)
            total_loss = 0
        itr += 1
My dataset is quite small: 2000 training and 1000 validation examples (binary classification, 0/1). I wanted to do an 80/20 split but I was asked to keep it like that. I was thinking that the architecture might be too complex for such a small dataset.
Any hints on what may cause such jumps in the training process?
This line of your code is wrong: pred = torch.argmax(output, dim=1)
That line is meant for multi-class classification with cross-entropy loss.
Your task is binary classification, so the pred values are wrong. Change it to:
if itr % p_itr == 0:
    pred = torch.round(output)
    ....
You can also try Adam, SGD, or RMSprop to find the optimizer that helps your model converge faster.
Also change the forward() function:
def forward(self, x):
    out = self.layer1(x)
    out = self.layer2(out)
    out = self.layer3(out)
    out = out.view(out.size(0), -1)
    out = self.relu(self.fc1(out))
    out = self.fc2(out)
    return self.sigmoid(out)  # your forward is OK as-is, but this is cleaner
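To illustrate the difference, here is a small hypothetical example (made-up tensors standing in for output and labels from the training loop) of how the rounded sigmoid output maps to binary predictions and accuracy:

import torch

output = torch.tensor([[0.1], [0.7], [0.4], [0.9]])  # sigmoid outputs from the model
labels = torch.tensor([[0.], [1.], [1.], [1.]])
pred = torch.round(output)                 # 0/1 predictions, same shape as labels
acc = pred.eq(labels).float().mean()       # fraction of correct predictions
print(acc.item())                          # 0.75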
I am attempting to train EEG data through a transformer network. The input dimensions are 50x16684x60 (seq x batch x features) and the output is 16684x2. Right now I am simply trying to run a basic transformer, and I keep getting an error telling me
RuntimeError: the feature number of src and tgt must be equal to d_model
Why would the source and target feature numbers need to be equal? Is it even possible to run such a dataset through a transformer?
Here is my basic model:
input_size = 60  # seq x batch x features
hidden_size = 32
num_classes = 2
learning_rate = 0.001
batch_size = 64
num_epochs = 2
sequence_length = 50
num_layers = 2
dropout = 0.5

class Transformer(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(Transformer, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.transformer = nn.Transformer(60, 2)
        self.fc = nn.Linear(hidden_size * sequence_length, num_classes)

    def forward(self, x, y):
        # forward propagation
        out, _ = self.transformer(x, y)
        out = out.reshape(out.shape[0], -1)
        out = self.fc(out)
        return out

model = Transformer(input_size, hidden_size, num_layers, num_classes)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    for index in tqdm(range(16684)):
        X, y = (X_train[index], Y_train[index])
        print(X.shape, y.shape)
        output = model(X, y)
        loss = criterion(output, y)
        model.zero_grad()
        loss.backward()
        optimizer.step()
        if index % 500 == 0:
            print(f"Epoch {epoch}, Batch: {index}, Loss: {loss}")
nn.Transformer is a sequence-to-sequence model: you feed it the input sequence (src) and the desired output sequence (tgt), and the decoder attends over the encoder's output while processing the target. Both sequences live in the same d_model-dimensional embedding space, so their feature sizes must both equal d_model.
If the feature sizes aren't the same, the decoder can't match its states against the encoder output and the model can't be run or trained.
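As a rough illustration of the shape constraint (random tensors, not the asker's EEG data, with d_model=60 and the default sequence-first layout):

import torch
import torch.nn as nn

d_model = 60
transformer = nn.Transformer(d_model=d_model, nhead=2)

src = torch.randn(50, 8, d_model)   # (src_seq_len, batch, d_model)
tgt = torch.randn(10, 8, d_model)   # (tgt_seq_len, batch, d_model) -- same feature size
out = transformer(src, tgt)         # (tgt_seq_len, batch, d_model)
print(out.shape)                    # torch.Size([10, 8, 60])

For classification, a common alternative is to use only nn.TransformerEncoder on the input sequence and put a linear classifier on top, so no target sequence is needed at all.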
I'm completely new to PyTorch and tried out some models. I wanted to make an easy prediction rnn of stock market prices and found the following code:
I load the dataset with pandas, then split it into training and test data and load it into a PyTorch DataLoader for later use in the training process. The model is defined in the GRU class, but the actual problem seems to be the optimisation. I think the problem could be gradient explosion. I thought about adding gradient clipping, but the GRU design should actually prevent gradient explosion, or am I wrong? What could cause the loss to be NaN right away (already in the first epoch)?
from sklearn.preprocessing import MinMaxScaler
import time
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

batch_size = 200
input_dim = 1
hidden_dim = 32
num_layers = 2
output_dim = 1
num_epochs = 10

nvda = pd.read_csv('dataset/stocks/NVDA.csv')
price = nvda[['Close']]
scaler = MinMaxScaler(feature_range=(-1, 1))
price['Close'] = scaler.fit_transform(price['Close'].values.reshape(-1, 1))

def split_data(stock, lookback):
    data_raw = stock.to_numpy()  # convert to numpy array
    data = []
    # create all possible sequences of length seq_len
    for index in range(len(data_raw) - lookback):
        data.append(data_raw[index: index + lookback])
    data = np.array(data)
    test_set_size = int(np.round(0.2 * data.shape[0]))
    train_set_size = data.shape[0] - (test_set_size)
    x_train = data[:train_set_size, :-1, :]
    y_train = data[:train_set_size, -1, :]
    x_test = data[train_set_size:, :-1]
    y_test = data[train_set_size:, -1, :]
    return [x_train, y_train, x_test, y_test]

lookback = 20  # choose sequence length
x_train, y_train, x_test, y_test = split_data(price, lookback)

train_data = TensorDataset(torch.from_numpy(x_train).float(), torch.from_numpy(y_train).float())
train_data = DataLoader(train_data, shuffle=True, batch_size=batch_size, drop_last=True)
test_data = TensorDataset(torch.from_numpy(x_test).float(), torch.from_numpy(y_test).float())
test_data = DataLoader(test_data, shuffle=True, batch_size=batch_size, drop_last=True)

class GRU(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super(GRU, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.gru = nn.GRU(input_dim, hidden_dim, num_layers, batch_first=True, dropout=0.2)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x, h):
        out, h = self.gru(x, h)
        out = self.fc(self.relu(out[:, -1]))
        return out, h

    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        hidden = weight.new(self.num_layers, batch_size, self.hidden_dim).zero_()
        return hidden

model = GRU(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim, num_layers=num_layers)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0000000001)

model.train()
start_time = time.time()
h = model.init_hidden(batch_size)

for epoch in range(1, num_epochs+1):
    for x, y in train_data:
        h = h.data
        model.zero_grad()
        y_train_pred, h = model(x, h)
        loss = criterion(y_train_pred, y)
        print("Epoch ", epoch, "MSE: ", loss.item())
        loss.backward()
        optimizer.step()

training_time = time.time() - start_time
print("Training time: {}".format(training_time))
This is the dataset which I used.
Not sure if this is the cause, but did you preprocess and clean the data? Maybe some values are missing or there is something strange about it. I checked the data here: https://ca.finance.yahoo.com/quote/NVDA/history?p=NVDA and it seems that every couple of rows there is some inconsistency. Like I said, I don't know if that's the cause, but it may be.
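If you want to rule that out quickly, here is a small sketch (reusing the CSV path from the question) that checks the Close column for missing or non-numeric values before scaling:

import pandas as pd

nvda = pd.read_csv('dataset/stocks/NVDA.csv')
close = pd.to_numeric(nvda['Close'], errors='coerce')   # non-numeric entries become NaN
print("rows:", len(close))
print("missing or invalid Close values:", close.isna().sum())
# a single NaN that survives into the training tensors is enough to make MSELoss return NaN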
I am really new to PyTorch and am just trying to use my own dataset to build a simple linear regression model. I am only using the numeric values as inputs.
I have imported the data from the CSV
dataset = pd.read_csv('mlb_games_overview.csv')
I have split the data into four parts X_train, X_test, y_train, y_test
X = dataset.drop(['date', 'team', 'runs', 'win'], 1)
y = dataset['win']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=True)
I have converted the data to PyTorch tensors
X_train = torch.from_numpy(np.array(X_train))
X_test = torch.from_numpy(np.array(X_test))
y_train = torch.from_numpy(np.array(y_train))
y_test = torch.from_numpy(np.array(y_test))
I have created a LinearRegressionModel
class LinearRegressionModel(torch.nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred
I have initialized the optimizer and the loss function
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
Now when I start training, I get a runtime size-mismatch error:
EPOCHS = 500
for epoch in range(EPOCHS):
    pred_y = model(X_train)  # RUNTIME ERROR HERE
    loss = criterion(pred_y, y_train)
    optimizer.zero_grad()  # zero out gradients to update parameters correctly
    loss.backward()  # backpropagation
    optimizer.step()  # update weights
    print('epoch {}, loss {}'.format(epoch, loss.data[0]))
Error Log:
RuntimeError Traceback (most recent call last)
<ipython-input-40-c0474231d515> in <module>
1 EPOCHS = 500
2 for epoch in range(EPOCHS):
----> 3 pred_y = model(X_train)
4 loss = criterion(pred_y, y_train)
5 optimizer.zero_grad() # zero out gradients to update parameters correctly
RuntimeError: size mismatch, m1: [3540 x 8], m2: [1 x 1] at
C:\w\1\s\windows\pytorch\aten\src\TH/generic/THTensorMath.cpp:752
In your Linear Regression model, you have:
self.linear = torch.nn.Linear(1, 1)
But your training data (X_train) has shape 3540 x 8, which means you have 8 features representing each input example. So you should define the linear layer as follows.
self.linear = torch.nn.Linear(8, 1)
A linear layer in PyTorch has two parameters, W and b. If you set in_features to 8 and out_features to 1, then the shape of the W matrix will be 1 x 8 and the length of the b vector will be 1.
Since your training data has shape 3540 x 8, you can then perform the following operation:
linear_out = X_train · Wᵀ + b    # (3540 x 8)(8 x 1) + (1,) → 3540 x 1
I hope this clarifies your confusion.
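To make the shapes concrete, here is a small hypothetical example with random data standing in for the 3540 x 8 training matrix:

import torch

X_train = torch.randn(3540, 8)                  # 3540 examples, 8 features each
linear = torch.nn.Linear(8, 1)                  # W: (1, 8), b: (1,)
print(linear.weight.shape, linear.bias.shape)   # torch.Size([1, 8]) torch.Size([1])
out = linear(X_train)                           # computes X_train @ W.T + b
print(out.shape)                                # torch.Size([3540, 1])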