How can I implement validation loss in my training loop? - machine-learning

I have a regression model and I have:
input_data = torch.Tensor(features)
target_data = torch.Tensor(target)
The feature values are x_values and y_values. I did not split them into train and test sets. My training loop is shown below; I can only see the training loss, but I would like to add val_loss and compare the two. How can I implement it?
# Training the models
losses = []
for epoch in range(3000):
    # Forward pass
    output = model(input_data)
    # Compute loss
    loss = criterion(output, target_data)

    # Backward pass and update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print loss
    if epoch % 10 == 0:
        losses.append(float(loss.item()))
        print(f'Epoch {epoch}, Loss: {loss.item()}')
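
One possible way to do this is to hold out part of the data for validation. Below is a minimal sketch, assuming features and target are the full arrays from above and that model, criterion and optimizer are already defined; the 80/20 split and the variable names (train_x, val_x, ...) are illustrative assumptions, not the original code.

import torch

# Sketch: hold out the last 20% of the data for validation (split ratio is an
# illustrative assumption) and evaluate on it every epoch without updating weights.
n = len(features)
split = int(0.8 * n)
train_x, train_y = torch.Tensor(features[:split]), torch.Tensor(target[:split])
val_x, val_y = torch.Tensor(features[split:]), torch.Tensor(target[split:])

losses, val_losses = [], []
for epoch in range(3000):
    # Forward pass and update on the training split
    model.train()
    output = model(train_x)
    loss = criterion(output, train_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Validation pass: no gradient tracking, no parameter updates
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(val_x), val_y)

    if epoch % 10 == 0:
        losses.append(loss.item())
        val_losses.append(val_loss.item())
        print(f'Epoch {epoch}, Loss: {loss.item()}, Val Loss: {val_loss.item()}')

If the data are ordered, shuffle them before splitting (for example with sklearn's train_test_split) so the validation set is representative.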

Related

Does model evaluation during training affect final accuracy in PyTorch?

During a simple PyTorch training loop, a strange effect was observed: whether or not the evaluation function is called seems to affect the final performance of the model.
We train on CIFAR10 using a very simple MLP model and Adam for 10 training epochs.
We try two main loops:
1. After the end of each training epoch, we measure the accuracy on the validation set.
2. We compute the validation only once, at the end of all training.
The difference in code is shown below:
Main Loop 1:
# Main Loop 1
num_epochs = 10
print(f"num_epochs: {num_epochs}")
for epoch in range(num_epochs):  # loop over the dataset multiple times
    print(f"\nStart Epoch {epoch}")
    model.train()
    train_loss, train_accuracy = training_epoch(trainloader, optimizer, model, criterion)
    print(f"Training Loss: {train_loss:.3f} - Training Accuracy: {train_accuracy:.3f}")

    model.eval()
    with torch.no_grad():
        val_loss, val_accuracy = val_epoch(testloader, model, criterion)
    print(f"Val Loss: {val_loss:.3f} - Val Accuracy: {val_accuracy:.3f}")
print('Finished Training')
Main Loop 2:
# Main Loop 2
num_epochs = 10
print(f"num_epochs: {num_epochs}")
for epoch in range(num_epochs):  # loop over the dataset multiple times
    print(f"\nStart Epoch {epoch}")
    model.train()
    train_loss, train_accuracy = training_epoch(trainloader, optimizer, model, criterion)
    print(f"Training Loss: {train_loss:.3f} - Training Accuracy: {train_accuracy:.3f}")

# Validation is performed only once, after all training epochs
model.eval()
with torch.no_grad():
    val_loss, val_accuracy = val_epoch(testloader, model, criterion)
print(f"Val Loss: {val_loss:.3f} - Val Accuracy: {val_accuracy:.3f}")
print('Finished Training')
Though there shouldn't be any difference, the final performance of the model changes:
Val Loss: 1.526 - Val Accuracy: 0.523
Val Loss: 1.501 - Val Accuracy: 0.528
Of course for reproducibility, we set all seeds. Moreover, this effect can already be observed at the beginning of the second training epoch.
I share the entire code as a Colab notebook:
https://colab.research.google.com/drive/1BODeKHZmcT8lH3r2bxYVHNR2KOpT9O9Y?usp=sharing
The observed difference is most likely due to variance from the stochasticity of the optimization algorithm; the evaluation you perform has no effect on the model's weights.
Also, in the notebook you linked, you re-initialize SimpleMLP for each of the two experiments. Since the module's weights are instantiated randomly, the two runs will naturally yield different results.
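For what it's worth, here is a minimal sketch of how to make both experiments start from identical weights rather than from two fresh random initializations; SimpleMLP stands in for the model class used in the notebook, and the seed value is arbitrary.

import random
import numpy as np
import torch

def set_seeds(seed=0):
    # Seed every RNG the training loop may touch
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seeds(0)
model = SimpleMLP()  # assumed model class from the notebook
initial_state = {k: v.clone() for k, v in model.state_dict().items()}

# Before running each of the two experiments, restore the same starting weights
model.load_state_dict(initial_state)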

PyTorch Neural Net not learning

I am new to PyTorch and I'm trying to build a simple neural net for classification. The problem is that the network doesn't learn at all. I tried various learning rates ranging from 0.3 to 1e-8, and I also tried training it for a longer duration. My data is small, with only 120 training examples, and the batch size is 16. Here is the code:
Define network
model = nn.Sequential(nn.Linear(4999, 1000),
                      nn.ReLU(),
                      nn.Linear(1000, 200),
                      nn.ReLU(),
                      nn.Linear(200, 1),
                      nn.Sigmoid())
Loss and optimizer
import torch.optim as optim
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.BCELoss(reduction="mean")
Training
num_epochs = 100
for epoch in range(num_epochs):
    cumulative_loss = 0
    for i, data in enumerate(batch_gen(X_train, y_train, batch_size=16)):
        inputs, labels = data
        inputs = torch.from_numpy(inputs).float()
        labels = torch.from_numpy(labels).float()

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        cumulative_loss += loss.item()
        if i % 5 == 0 and i != 0:
            print(f"epoch {epoch} batch {i} => Loss: {cumulative_loss/5}")
print("Finished Training!!")
Any help is appreciated!
The reason your loss doesn't seem to decrease every epoch is that you're not printing it every epoch; you're actually printing it every 5th batch, and the loss does not decrease much from batch to batch.
Try the following instead; here, the loss is printed once per epoch.
num_epochs = 100
for epoch in range(num_epochs):
    cumulative_loss = 0
    for i, data in enumerate(batch_gen(X_train, y_train, batch_size=16)):
        inputs, labels = data
        inputs = torch.from_numpy(inputs).float()
        labels = torch.from_numpy(labels).float()

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        cumulative_loss += loss.item()
    print(f"epoch {epoch} => Loss: {cumulative_loss}")
print("Finished Training!!")
One reason your loss doesn't decrease could be that your neural net isn't deep enough to learn anything, so try adding more layers (note that the output size of each layer must match the input size of the next):
model = nn.Sequential(nn.Linear(4999, 3000),
                      nn.ReLU(),
                      nn.Linear(3000, 2000),
                      nn.ReLU(),
                      nn.Linear(2000, 1000),
                      nn.ReLU(),
                      nn.Linear(1000, 500),
                      nn.ReLU(),
                      nn.Linear(500, 250),
                      nn.ReLU(),
                      nn.Linear(250, 1),
                      nn.Sigmoid())
Also, I just noticed you're passing data with very high dimensionality: you have 4999 features/columns and only 120 training examples/rows. Getting a model to converge with so little data is next to impossible, considering how high-dimensional it is.
I'd suggest you try finding more rows, or perform dimensionality reduction on your input data (e.g. PCA) to reduce the feature space to maybe 50-100 or fewer features, and then try again; a sketch follows below. Chances are that your model still won't converge, but it's worth a try.
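Here is a minimal sketch of the PCA idea, assuming X_train is a NumPy array of shape (120, 4999); the choice of 50 components and the smaller network are illustrative assumptions.

from sklearn.decomposition import PCA
import torch.nn as nn

# Reduce the 4999-dimensional features to 50 components (an assumed choice)
pca = PCA(n_components=50)
X_train_reduced = pca.fit_transform(X_train)      # shape: (120, 50)
# Use pca.transform (not fit_transform) on any validation/test data.

# A much smaller network to match the reduced input dimension
model = nn.Sequential(nn.Linear(50, 32),
                      nn.ReLU(),
                      nn.Linear(32, 1),
                      nn.Sigmoid())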

How is the training accuracy in Keras determined for every epoch?

I am training a model in Keras as follows:
model.fit(Xtrn, ytrn, batch_size=16, epochs=50, verbose=1, shuffle=True,
          callbacks=[model_checkpoint], validation_data=(Xval, yval))
The fit output shows a total of 8000 training samples. As shown in the model.fit call, the batch size is 16, so each epoch runs 500 batches (8000 / 16 = 500).
Let's take the training accuracy printed in the output for Epoch 1/50, which in this case is 0.9381. I would like to know how this training accuracy of 0.9381 is derived.
Is it the mean training accuracy, taken as the average over the 500 batches trained in that epoch?
OR
Is it the best (or max) training accuracy out of those 500 batches?
Take a look at the BaseLogger in Keras, where a running mean is computed.
For each epoch, the reported accuracy is the batch-size-weighted average over all batches seen so far in that epoch; a small numeric illustration follows the code below.
class BaseLogger(Callback):
    """Callback that accumulates epoch averages of metrics.

    This callback is automatically applied to every Keras model.
    """

    def on_epoch_begin(self, epoch, logs=None):
        self.seen = 0
        self.totals = {}

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        batch_size = logs.get('size', 0)
        self.seen += batch_size

        for k, v in logs.items():
            if k in self.totals:
                self.totals[k] += v * batch_size
            else:
                self.totals[k] = v * batch_size

    def on_epoch_end(self, epoch, logs=None):
        if logs is not None:
            for k in self.params['metrics']:
                if k in self.totals:
                    # Make value available to next callbacks.
                    logs[k] = self.totals[k] / self.seen
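As a small numeric illustration of that weighted average (the per-batch accuracies and batch sizes below are made up):

# Made-up per-batch accuracies and batch sizes for one epoch
batch_accs = [0.90, 0.95, 0.93]
batch_sizes = [16, 16, 8]          # the last batch may be smaller

total = sum(acc * size for acc, size in zip(batch_accs, batch_sizes))
seen = sum(batch_sizes)
epoch_acc = total / seen           # size-weighted mean, as BaseLogger computes
print(epoch_acc)                   # 0.926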

What does the 'training loss' mean in machine learning?

I found the following sample code on the TensorFlow website:
input_fn = tf.contrib.learn.io.numpy_input_fn({"x":x_train}, y_train, batch_size=4, num_epochs=1000)
eval_input_fn = tf.contrib.learn.io.numpy_input_fn({"x":x_eval}, y_eval, batch_size=4, num_epochs=1000)
# We can invoke 1000 training steps by invoking the method and passing the
# training data set.
estimator.fit(input_fn=input_fn, steps=1000)
# Here we evaluate how well our model did.
train_loss = estimator.evaluate(input_fn=input_fn)
eval_loss = estimator.evaluate(input_fn=eval_input_fn)
print("train loss: %r"% train_loss)
print("eval loss: %r"% eval_loss)
Would you let me know what the 'training loss' means?
Training loss is the loss on the training data. A loss function takes the correct output and the model's output and computes the error between them. The loss is then used to adjust the weights based on how large the error was and which elements contributed to it the most.
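For example, with a mean-squared-error loss, a common choice for regression (the numbers below are made up):

import torch
import torch.nn as nn

targets = torch.tensor([1.0, 0.0, 2.0])        # correct outputs
predictions = torch.tensor([0.8, 0.1, 2.5])    # model outputs

criterion = nn.MSELoss()
loss = criterion(predictions, targets)
# Mean of squared errors: (0.2**2 + 0.1**2 + 0.5**2) / 3 = 0.1
print(loss.item())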

Training neural network with keras (observed loss is too low)

I am training a neural network with Keras that takes 2000 x 1 arrays as input; all the input values are "0" or "1", and it generates a single output of either 0 or 1.
Here is my model:
def mdl_normal(sq_len, broker_num):
    model = Sequential()
    model.add(Dense(sq_len * (broker_num + 1), input_dim=(sq_len * (broker_num + 1)), activation='relu'))
    model.add(Dense(800, activation='relu'))
    model.add(Dense(400, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='SGD')
    return model
However, I am getting the following while training:
Epoch 384/600 0s - loss: 1.4224e-04 - val_loss: 2.6322
The training loss is extremely low, and I am wondering whether I am doing something wrong. Can someone explain what the loss means here?
Thanks!
Louis
