Size Mismatch using pytorch when trying to train data

I am really new to PyTorch and am trying to fit a simple linear regression model on my own dataset. I am only using the numeric columns as inputs.
I have imported the data from the CSV
dataset = pd.read_csv('mlb_games_overview.csv')
I have split the data into four parts X_train, X_test, y_train, y_test
X = dataset.drop(['date', 'team', 'runs', 'win'], axis=1)
y = dataset['win']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=True)
I have converted the data to pytorch tensors
X_train = torch.from_numpy(np.array(X_train))
X_test = torch.from_numpy(np.array(X_test))
y_train = torch.from_numpy(np.array(y_train))
y_test = torch.from_numpy(np.array(y_test))
I have created a LinearRegressionModel
class LinearRegressionModel(torch.nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)
    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred
I have initialized the optimizer and the loss function
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
Now when I start to train the data I get the runtime error mismatch
EPOCHS = 500
for epoch in range(EPOCHS):
    pred_y = model(X_train) # RUNTIME ERROR HERE
    loss = criterion(pred_y, y_train)
    optimizer.zero_grad() # zero out gradients to update parameters correctly
    loss.backward() # backpropagation
    optimizer.step() # update weights
    print('epoch {}, loss {}'.format(epoch, loss.item()))
Error Log:
RuntimeError Traceback (most recent call last)
<ipython-input-40-c0474231d515> in <module>
1 EPOCHS = 500
2 for epoch in range(EPOCHS):
----> 3 pred_y = model(X_train)
4 loss = criterion(pred_y, y_train)
5 optimizer.zero_grad() # zero out gradients to update parameters correctly
RuntimeError: size mismatch, m1: [3540 x 8], m2: [1 x 1] at
C:\w\1\s\windows\pytorch\aten\src\TH/generic/THTensorMath.cpp:752

In your Linear Regression model, you have:
self.linear = torch.nn.Linear(1, 1)
But your training data (X_train) has shape 3540 x 8, which means each input example has 8 features. So you should define the linear layer as follows:
self.linear = torch.nn.Linear(8, 1)
A linear layer in PyTorch has two parameters, W and b. If you set in_features to 8 and out_features to 1, W has shape 1 x 8 and b has length 1.
Since your training data has shape 3540 x 8, the layer can then compute
linear_out = X_train W^T + b
which has shape 3540 x 1: one prediction per training example.
I hope this clears up the confusion.
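As a minimal sketch of the fix, the snippet below builds the layer from the number of columns in the input instead of hard-coding it. The random tensors are stand-ins for the real CSV data, and the `in_features` constructor argument is an illustrative addition, not part of the original model.

```python
import torch

torch.manual_seed(0)

# Stand-in for the real data: 3540 examples with 8 numeric features each.
X_train = torch.randn(3540, 8)                     # float32 by default
y_train = torch.randint(0, 2, (3540, 1)).float()   # hypothetical 0/1 "win" column

class LinearRegressionModel(torch.nn.Module):
    def __init__(self, in_features):
        super().__init__()
        # in_features must equal the number of columns in X_train.
        self.linear = torch.nn.Linear(in_features, 1)

    def forward(self, x):
        return self.linear(x)

model = LinearRegressionModel(in_features=X_train.shape[1])
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

print(model.linear.weight.shape)  # W is 1 x 8
print(model.linear.bias.shape)    # b has length 1

for epoch in range(5):
    pred_y = model(X_train)            # shape 3540 x 1
    loss = criterion(pred_y, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

With tensors built through torch.from_numpy, the dtype follows the NumPy array (typically float64 or int64), while the layer's weights are float32. Calling .float() on the inputs and targets, and reshaping the target to N x 1, is likely to be needed as well.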

Related

Input/Target size mismatch when training a downstream BERT for classification (huggingface pretrained)

I am training a BERT model on a downstream task: classifying movie genres. I am using a HuggingFace pretrained model (aleph-bert, since the data is in Hebrew).
When training, I get the following error:
ValueError: Expected input batch_size (3744) to match target batch_size (16).
This is my notebook:
https://colab.research.google.com/drive/1mqIUPnLOu_H-URn5tzE6gGySsW3oAcRY?usp=sharing
The error happens in the compute_loss function, while performing the cross_entropy step.
My batch size is 16, but for some reason the BERT output has a different size.
The relevant code:
def data_prep_for_genre(genre):
    X = movies_df['overview']
    y = movies_df[genre].rename('labels', inplace=True).astype(float)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    X_train = tokenizer(X_train.to_list(), truncation=True)
    X_test = tokenizer(X_test.to_list(), truncation=True)
    train_dataset = TextData(X_train, y_train.to_list())
    test_dataset = TextData(X_test, y_test.to_list())
    # define model:
    model = BertForTokenClassification.from_pretrained("onlplab/alephbert-base", num_labels=2)
    return model, train_dataset, test_dataset
class MyTrainer(Trainer):
    def compute_metrics(pred):
        labels = pred.label_ids
        preds = pred.predictions.argmax(-1)
        precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
        acc = accuracy_score(labels, preds)
        return {
            'accuracy': acc,
            'f1': f1,
            'precision': precision,
            'recall': recall
        }
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=10,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    warmup_steps=50,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10
)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
for genre in GENRE_SET:
    model, train_dataset, test_dataset = data_prep_for_genre(genre)
    trainer = MyTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        # eval_dataset=test_dataset,
        data_collator=data_collator
    )
    trainer.train()
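One possible reading of the numbers in the error, offered as an assumption rather than a confirmed diagnosis: 3744 = 16 x 234, which is what a token-classification head would produce if it emitted one row of logits per token for 16 sequences padded to 234 tokens. The sketch below reproduces that shape arithmetic with plain tensors (no BERT weights are loaded). The padded length of 234 is inferred from the error message, not taken from the notebook.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

batch_size, seq_len, num_labels = 16, 234, 2   # seq_len is an assumed padded length

# A token-classification head emits one row of logits per token...
token_logits = torch.randn(batch_size, seq_len, num_labels)
# ...and the loss flattens them to (batch_size * seq_len, num_labels).
flat_logits = token_logits.view(-1, num_labels)
labels = torch.randint(0, num_labels, (batch_size,))   # one label per movie

print(flat_logits.shape[0])   # 16 * 234 rows of logits

try:
    F.cross_entropy(flat_logits, labels)
    mismatch_raised = False
except (ValueError, RuntimeError) as err:
    mismatch_raised = True
    print(err)

# A sequence-classification head emits one row of logits per example instead,
# so the batch dimensions line up with one label per example.
sequence_logits = torch.randn(batch_size, num_labels)
loss = F.cross_entropy(sequence_logits, labels)
```

If that reading is right, a model class with one output per sequence, such as BertForSequenceClassification in the transformers library, would be the closer fit for one genre label per overview.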

Why is my pytorch classification model not learning?

I have created a simple PyTorch classification model with sample data generated using sklearn's make_classification. Even after training for thousands of epochs, the accuracy of the model hovers between 30 and 40 percent. During training, the loss value fluctuates widely. I am wondering why this model is not learning, and whether it is due to some logical error in the code.
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X,y = make_classification(n_features=15,n_classes=5,n_informative=4)
DEVICE = torch.device('cuda')
epochs = 5000
class CustomDataset(Dataset):
    def __init__(self,X,y):
        self.X = torch.from_numpy(X)
        self.y = torch.from_numpy(y)
    def __len__(self):
        return len(self.X)
    def __getitem__(self, index):
        X = self.X[index]
        y = self.y[index]
        return (X,y)
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(15,10)
        self.l2 = nn.Linear(10,5)
        self.relu = nn.ReLU()
    def forward(self,x):
        x = self.l1(x)
        x = self.relu(x)
        x = self.l2(x)
        x = self.relu(x)
        return x
model = Model().double().to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
train_data = CustomDataset(X_train,y_train)
test_data = CustomDataset(X_test,y_test)
trainloader = DataLoader(train_data, batch_size=32, shuffle=True)
testloader = DataLoader(test_data, batch_size=32, shuffle=True)
for i in range(epochs):
    for (x,y) in trainloader:
        x = x.to(DEVICE)
        y = y.to(DEVICE)
        optimizer.zero_grad()
        output = model(x)
        loss = loss_function(output,y)
        loss.backward()
        optimizer.step()
    if i%200==0:
        print("epoch: ",i," Loss: ",loss.item())
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for x, y in testloader:
        # calculate outputs by running x through the network
        outputs = model(x.to(DEVICE)).to(DEVICE)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)
        total += y.size(0)
        correct += (predicted == y.to(DEVICE)).sum().item()
print(f'Accuracy of the network on the test data: {100 * correct // total} %')
EDIT
I tried to overfit my model on only 10 samples (batch_size=5), using X,y = make_classification(n_samples=10,n_features=15,n_classes=5,n_informative=4), but the accuracy dropped to 15-20%. I then normalized the input data to values between 0 and 1, which pushed the accuracy a bit higher, but not over 50 percent. Any idea why this might be happening?

You should not apply a ReLU activation to your output layer. For multi-class classification, the final layer usually either has a softmax activation or passes its raw logits straight to the loss function. nn.CrossEntropyLoss expects the latter, since it applies log-softmax internally.
Try removing the ReLU activation from the final layer.
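A minimal sketch of the suggested change, keeping the layer sizes from the question: the hidden layer keeps its ReLU and the output layer returns raw logits. The random batch is a stand-in for the make_classification data, and the model runs on the CPU here for simplicity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(15, 10)
        self.l2 = nn.Linear(10, 5)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.l1(x))
        # No activation here: nn.CrossEntropyLoss applies log-softmax itself.
        return self.l2(x)

model = Model()
x = torch.randn(32, 15)            # hypothetical batch of 32 examples
y = torch.randint(0, 5, (32,))     # class indices in [0, 5)
logits = model(x)
loss = nn.CrossEntropyLoss()(logits, y)
```

At prediction time, taking the argmax over the logits gives the same class as taking it over the softmax probabilities, so the evaluation loop from the question can stay as it is.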

Pytorch LSTM Prediction not learning

I'm using an LSTM model to predict the BABA stock price using this dataset: "/kaggle/input/price-volume-data-for-all-us-stocks-etfs/Data/Stocks/baba.us.txt".
I'm not sure why my model is not learning and why y_test_prediction is so different from the actual y_test. I would really appreciate your help, as I'm just beginning to learn machine learning. Thank you!
I scaled the data with MinMaxScaler before splitting it. This is how I split the data:
x_train, y_train, x_test, y_test = [], [], [], []
lags = 3
for t in range(len(train_data)-lags-1):
    x_train.append(train_data[t:(t+lags),:])
    y_train.append(train_data[(t+lags),:])
for t in range(len(test_data)-lags-1):
    x_test.append(test_data[t:(t+lags),:])
    y_test.append(test_data[(t+lags),:])
x_train = torch.FloatTensor(np.array(x_train))
y_train = torch.FloatTensor(np.array(y_train))
x_test = torch.FloatTensor(np.array(x_test))
y_test = torch.FloatTensor(np.array(y_test))
x_train = np.reshape(x_train,(x_train.shape[0],x_train.shape[1],1))
x_test = np.reshape(x_test,(x_test.shape[0],x_test.shape[1],1))
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
This is my LSTM model:
input_dim = 1
hidden_layer_dim = 32
num_layers = 1
output_dim = 1
class LSTM(nn.Module):
    def __init__(self, input_dim,hidden_layer_dim, num_layers, output_dim ):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.hidden_layer_dim = hidden_layer_dim
        self.num_layers = num_layers
        self.output_dim = output_dim
        self.lstm = nn.LSTM(input_dim, hidden_layer_dim,num_layers,batch_first = True)
        self.fc = nn.Linear(hidden_layer_dim, output_dim)
    def forward(self, x):
        # initial hidden state & cell state as zeros
        h0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_layer_dim))
        c0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_layer_dim))
        # lstm output with hidden and cell state
        output, (hn, cn) = self.lstm(x, (h0,c0))
        # get hidden state to be passed to dense layer
        hn = hn.view(-1, self.hidden_layer_dim)
        output = self.fc(hn)
        return output
This is my training:
num_epochs = 100
learning_rate = 0.01
model = LSTM(input_dim,hidden_layer_dim, num_layers, output_dim)
loss = torch.nn.MSELoss() # mean-squared error for regression
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
hist = np.zeros(num_epochs)
# train model
for epoch in range(num_epochs):
    outputs = model(x_train)
    optimizer.zero_grad()
    #get loss function
    loss_fn = loss(outputs, y_train.view(1,-1))
    hist[epoch] = loss_fn.item()
    loss_fn.backward()
    optimizer.step()
    if epoch %10==0:
        print("Epoch: %d, loss: %1.5f" % (epoch, hist[epoch]))
This is the training loss and the prediction vs actual:
[Figures: training loss; prediction vs actual]

You are initialising the hidden and cell states yourself every time forward is called, which might cause problems with backprop. You do not have to initialise them at all: if you leave them out, PyTorch starts from zeros for you. You can check this implementation for the details. Also, as a side note, you might want to take a look at PyTorch dataloaders (an easier way to make splits and batches).
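A minimal sketch of the model without the manual state initialisation, using the sizes from the question. The random tensors are stand-ins for the scaled price windows, and indexing hn[-1] in place of the original hn.view(...) is a swapped-in choice that gives the same result when num_layers is 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_layer_dim, num_layers, output_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_layer_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_layer_dim, output_dim)

    def forward(self, x):
        # With no (h0, c0) argument, nn.LSTM starts from zero states.
        output, (hn, cn) = self.lstm(x)
        # hn[-1] is the last layer's final hidden state, shape (batch, hidden).
        return self.fc(hn[-1])

model = LSTM(input_dim=1, hidden_layer_dim=32, num_layers=1, output_dim=1)
x = torch.randn(64, 3, 1)   # hypothetical batch: 64 windows of 3 lags, 1 feature
y = torch.randn(64, 1)      # hypothetical targets, same shape as the outputs
outputs = model(x)
loss = nn.MSELoss()(outputs, y)
```

A separate observation, not made in the answer above: the targets in this sketch have the same N x 1 shape as the outputs. Passing a 1 x N target to nn.MSELoss against an N x 1 output broadcasts the two to N x N, which may be worth checking in the training loop from the question.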

making predictions using classification models with multiple independent variables in hand

I am trying to do a simple classification using logistic regression. I scale the values with a standard scaler and fit the model. How can I make a single prediction after that? I am getting the same result, 0, for every input I try. The predictions I get from single inputs do not resemble the predictions made on the test dataset. Can someone please give me a hand?
dataset = pd.read_csv("Social_Network_Ads.csv")
x = dataset.iloc[:, 2:4].values
y = dataset.iloc[:, 4].values
print(dataset)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
classifier = LogisticRegression()
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
x_values = [36, 36000]
x_values = np.array(x_values).reshape(1, -1)
x_values = scaler.transform(x_values)
pred = classifier.predict(x_values)
print("single prediction: ", pred)
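For comparison, here is a hedged sketch of the single-prediction pattern on synthetic data. The age and salary columns and the labelling rule are made up for illustration and do not come from Social_Network_Ads.csv, and predict_one is a hypothetical helper. The snippet in the question already follows the same reshape-then-scale pattern, so looking at predict_proba may help show whether a 0 is a confident prediction or a borderline one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the age and salary columns (not the real CSV).
age = rng.uniform(18, 60, size=400)
salary = rng.uniform(15000, 150000, size=400)
x = np.column_stack([age, salary])
y = (salary > 80000).astype(int)    # made-up rule, purely for illustration

scaler = StandardScaler()
classifier = LogisticRegression()
classifier.fit(scaler.fit_transform(x), y)

def predict_one(age_value, salary_value):
    # A single sample must be 2-D (1 row, 2 columns) and scaled like the training data.
    sample = scaler.transform(np.array([[age_value, salary_value]]))
    return classifier.predict(sample)[0], classifier.predict_proba(sample)[0]

pred_low, proba_low = predict_one(36, 36000)
pred_high, proba_high = predict_one(36, 140000)
print(pred_low, proba_low)
print(pred_high, proba_high)
```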

sklearn and weka kNN predictions exactly same for all except for one data point

I wrote kNN code using sklearn and compared its predictions with those of WEKA's kNN. The comparison used the 10 test-set predictions: a single one differs by more than 1.5, while all the others are exactly the same. So I am not sure whether my code is working correctly. Here is my code:
df = pd.read_csv('xxxx.csv')
X = df.drop(['Name', 'activity'], axis=1)
y = df['activity']
Xstd = StandardScaler().fit_transform(X)
x_train, x_test, y_train, y_test = train_test_split(Xstd, y, test_size=0.2,
                                                    shuffle=False, random_state=None)
print(x_train.shape, x_test.shape)
X_train_trans = x_train
X_test_trans = x_test
for i in range(2, 3):
    knn_regressor = KNeighborsRegressor(n_neighbors=i, algorithm='brute',
                                        weights='uniform', metric='euclidean', n_jobs=1, p=2)
    CV_pred_train = cross_val_predict(knn_regressor, X_train_trans, y_train,
                                      n_jobs=-1, verbose=0, cv=LeaveOneOut())
    print("LOO Q2: ", metrics.r2_score(y_train, CV_pred_train).round(2))
    # Train Test predictions
    knn_regressor.fit(X_train_trans, y_train)
    train_r2 = knn_regressor.score(X_train_trans, y_train)
    y_train_pred = knn_regressor.predict(X_train_trans).round(3)
    train_r2_1 = metrics.r2_score(y_train, y_train_pred)
    y_test_pred = knn_regressor.predict(X_test_trans).round(3)
    train_r = stats.pearsonr(y_train, y_train_pred)
    abs_error_train = (y_train - y_train_pred)
    train_predictions = pd.DataFrame({'Actual': y_train, 'Predcited':
                                      y_train_pred, "error": abs_error_train.round(3)})
    MAE_train = metrics.mean_absolute_error(y_train, y_train_pred)
    abs_error_test = (y_test_pred - y_test)
    test_predictions = pd.DataFrame({'Actual': y_test, 'predcited':
                                     y_test_pred, 'error': abs_error_test.round(3)})
    test_r = stats.pearsonr(y_test, y_test_pred)
    test_r2 = metrics.r2_score(y_test, y_test_pred)
    MAE_test = metrics.mean_absolute_error(y_test, y_test_pred).round(3)
    print(test_predictions)
The training-set statistics are almost the same in both sklearn and WEKA kNN.
The sklearn predictions are:
Actual     predcited   error
6.00       5.285       -0.715
5.44       5.135       -0.305
6.92       6.995        0.075
7.28       7.005       -0.275
5.96       6.440        0.480
7.96       7.150       -0.810
7.30       6.660       -0.640
6.68       7.200        0.520
***4.60    6.950        2.350***
and the WEKA predictions are:
actual     predicted   error
6          5.285       -0.715
5.44       5.135       -0.305
6.92       6.995        0.075
7.28       7.005       -0.275
5.96       6.44         0.48
7.96       7.15        -0.81
7.3        6.66        -0.64
6.68       7.2          0.52
***4.6     5.285        0.685***
The parameters used in both algorithms are: k = 2, brute force for the distance calculation, and the Euclidean metric.
Any suggestions as to what causes the difference?
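One way to investigate, sketched here on synthetic data rather than the real file, is to ask sklearn which training points it picked for the odd test row and how close the next candidate is. If the second and third distances are nearly equal, a difference in tie-breaking or floating-point rounding between the two libraries becomes a plausible explanation, though that is an assumption to check, not a confirmed cause. The arrays below are stand-ins for the scaled features and the 'activity' column.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 3))       # synthetic stand-in for the scaled features
y_train = rng.normal(loc=6.5, size=40)   # synthetic stand-in for 'activity'
X_test = rng.normal(size=(1, 3))         # the one test point under inspection

knn = KNeighborsRegressor(n_neighbors=2, algorithm='brute',
                          weights='uniform', metric='euclidean')
knn.fit(X_train, y_train)

# Ask for one more neighbor than k to see how close the runner-up is.
dist, idx = knn.kneighbors(X_test, n_neighbors=3)
print(dist[0])            # distances to the 3 nearest training points
print(y_train[idx[0]])    # their target values

# With uniform weights the prediction is the mean of the k=2 nearest targets.
manual_pred = y_train[idx[0][:2]].mean()
print(manual_pred, knn.predict(X_test)[0])
```

Comparing the neighbor indices reported here with the instances WEKA uses for the same row would show directly whether the two libraries disagree on the neighbors or on something else.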
