When training in Caffe, there are Train and Test net outputs for each iteration. I know this is the loss. However, is this the average loss over my batch or the total loss? And is it the same for both classification and regression?
For example, if I were to have a batch of 100 training examples and my loss over that iteration is 100, does that mean that the average loss per example is 1?
Train loss is the loss averaged over the last training batch. That means that if you have 100 training examples in your mini-batch and the reported loss for that iteration is 100, then the average loss per example is 100, not 1; the reported number is already a per-example average.
Test loss is also an averaged loss, but over all the test batches. You specify the test batch size and the number of test iterations (test_iter). Caffe takes test_iter such mini-batches, evaluates the loss on each, and reports the averaged value. If test_iter x batch_size == testset_size, you get an average over the full test set.
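As a quick sanity check of the arithmetic (plain Python, not Caffe itself; all the loss values below are made up for illustration):

```python
# Hypothetical per-example losses for one mini-batch of 100 examples.
batch_losses = [1.0] * 100

# What Caffe reports as the train loss is the batch average,
# not the sum over the batch.
train_loss = sum(batch_losses) / len(batch_losses)  # 1.0, not 100.0

# Test loss: average of the per-batch averages over test_iter batches.
test_iter = 5
per_batch_avgs = [0.9, 1.1, 1.0, 0.8, 1.2]  # one averaged loss per test batch
test_loss = sum(per_batch_avgs) / test_iter  # 1.0

# If test_iter * test_batch_size == len(test_set), this equals the
# average loss over the full test set.
```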
Related
I am trying to do transfer learning with the ResNet50V2 model using a triplet loss function. I have kept include_top=False and input shape (160, 160, 3) with ImageNet weights. The last 3 layers of my model are shown in the image below, with 6 million trainable parameters.
During training, I can see the loss value decreasing from 7.6 to 0.8, but the accuracy does not improve. When I replace the model with VGG16 and train its last 3 layers, the accuracy improves from 50% to 90% while the loss decreases from 6.0 to 0.5.
Where am I going wrong? Is there anything specific I should look at while training a ResNet model? How should I train the ResNet model?
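For reference, a minimal sketch of the setup described above (Keras/TensorFlow; the embedding size, the L2 normalization, and the use of tensorflow_addons' TripletSemiHardLoss are my assumptions, not taken from the question):

```python
import tensorflow as tf
import tensorflow_addons as tfa  # provides TripletSemiHardLoss

# Frozen ResNet50V2 backbone, as in the question.
base = tf.keras.applications.ResNet50V2(
    include_top=False, weights="imagenet", input_shape=(160, 160, 3))
base.trainable = False  # freeze; unfreeze top blocks later if needed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256),                      # embedding size: a guess
    tf.keras.layers.Lambda(                          # L2-normalize embeddings;
        lambda x: tf.math.l2_normalize(x, axis=1)),  # triplet loss assumes this
])

# Semi-hard triplet mining on (image, class label) batches.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss=tfa.losses.TripletSemiHardLoss())
```

One thing worth checking in a case like this: ResNet50V2 expects inputs preprocessed with tf.keras.applications.resnet_v2.preprocess_input (scaling to [-1, 1]), which differs from VGG16's preprocessing, so reusing a VGG16 input pipeline unchanged can produce exactly this kind of symptom.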
My neural network training in PyTorch is getting very weird.
I am training on a known dataset that came split into train and validation sets.
I'm shuffling the data during training and doing data augmentation on the fly.
I have these results:
Train accuracy starts at 80% and increases
Train loss decreases and stays stable
Validation accuracy starts at 30% but increases slowly
Validation loss increases
I have the following graphs to show:
How can you explain that the validation loss increases while the validation accuracy also increases?
How can there be such a big difference in accuracy between the validation and training sets? 90% versus 40%?
Update:
I balanced the data set.
It is binary classification. The dataset now has 1700 examples from class 1 and 1200 examples from class 2: in total, 600 for validation and 2300 for training.
I still see similar behavior:
Can it be because I froze the weights in part of the network?
Can it be because of hyperparameters like the learning rate?
I found the solution:
I had different data augmentation for the training set and the validation set. Matching them also increased the validation accuracy!
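To illustrate what "matching" means here (torchvision-style; the exact transforms and the ImageNet normalization stats are placeholders), the point is that validation gets the same resizing and normalization as training, just without the random augmentation:

```python
from torchvision import transforms

# Shared deterministic preprocessing: both pipelines must resize and
# normalize the same way, or validation images come from a different
# input distribution than the one the network was trained on.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet stats,
                                 std=[0.229, 0.224, 0.225])   # an assumption

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),   # random augmentation, train only
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

val_tf = transforms.Compose([
    transforms.Resize(256),              # deterministic counterpart
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,                           # identical normalization
])
```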
If the training set is very large in comparison to the validation set, you are more likely to overfit and learn the training data, which would make generalizing the model very difficult. I see your training accuracy is at 0.98 and your validation accuracy increases at a very slow rate, which would imply that you have overfit your training data.
Try reducing the number of samples in your training set to improve how well your model generalizes to unseen data.
Let me answer your second question first. High accuracy on training data and low accuracy on validation/test data indicates the model might not generalize well to real cases; that is what the validation process is all about. You need to fine-tune or even rebuild your model.
With regard to the first question, validation loss does not necessarily correspond to validation accuracy. Accuracy only checks whether the highest-scoring class is the correct one, while a loss function such as cross-entropy measures the difference between the full predicted probability distribution and the target, so the loss can increase while accuracy still improves.
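A toy example of that decoupling (PyTorch; the logits are invented): both predictions below are correct by argmax, so accuracy is identical, but the cross-entropy loss differs a lot:

```python
import torch
import torch.nn.functional as F

target = torch.tensor([0])

confident = torch.tensor([[4.0, -2.0, -2.0]])   # logits, strongly class 0
hesitant  = torch.tensor([[0.2,  0.1,  0.0]])   # logits, barely class 0

# Same accuracy: argmax picks class 0 in both cases.
print(confident.argmax(1), hesitant.argmax(1))   # tensor([0]) tensor([0])

# Very different loss: cross-entropy sees the whole distribution.
print(F.cross_entropy(confident, target).item())  # ~0.005
print(F.cross_entropy(hesitant,  target).item())  # ~1.0
```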
Which evaluation metric should I use for a classification problem? On what factors should I decide?
1. Accuracy
2. F1 Score
3. AUC ROC Score
4. Log Loss
Accuracy is a great metric when you are working with a balanced dataset. It's the number of correct predictions over the total number of predictions.
F1 score is the harmonic mean of precision and recall, so it's a great metric when you want to balance the two; it also works well for imbalanced datasets.
The ROC AUC score is the area under the ROC curve, which plots the true positive rate against the false positive rate across classification thresholds; it measures how well the model ranks positives above negatives. I really like using this evaluation metric; it works well for both balanced and imbalanced datasets.
Log loss is the logarithmic loss of the predictions, based on the cross-entropy between the predicted probabilities and the true labels. I have never used this metric before.
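For what it's worth, all four are one-liners in scikit-learn (the function names are real; the toy labels and probabilities below are made up):

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, log_loss

y_true = [0, 0, 1, 1, 1]                   # toy binary labels
y_prob = [0.1, 0.4, 0.35, 0.8, 0.9]        # predicted P(class 1)
y_pred = [int(p >= 0.5) for p in y_prob]   # thresholded hard labels

print(accuracy_score(y_true, y_pred))   # needs hard labels
print(f1_score(y_true, y_pred))         # needs hard labels
print(roc_auc_score(y_true, y_prob))    # needs scores; threshold-free
print(log_loss(y_true, y_prob))         # needs probabilities
```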
I have implemented a Variational Autoencoder model in PyTorch that is trained on SMILES strings (string representations of molecular structures).
While training the autoencoder to output the same string as the input, the loss does not decrease between epochs.
I have tried the following with no success:
1) Adding 3 more GRU layers to the decoder to increase learning capability of the model.
2) Increasing the latent vector size from 292 to 350.
3) Increasing and decreasing the learning rate.
4) Changing the optimizer from Adam to SGD.
5) Training the model for up to 50 epochs.
6) Increasing and decreasing the batch size.
The following is the link to my code.
https://colab.research.google.com/drive/1LctSm_Emnn5sHpw_Hon8xL5fF4bmKRw5
The following is an equivalent Keras model (same architecture) that is able to train successfully.
https://colab.research.google.com/drive/170Peseik03CFYpWPNyD8B8mxUGxTQx67
I am developing a simple autoencoder, and to find the right hyperparameters I run a grid search on a small subset of the dataset. Can the number of epochs found this way be reused when training on the full, larger dataset? Does the number of epochs depend on the size of the dataset? E.g., would I need many more epochs for a large dataset and fewer for a small one?
In general, yes: the number of epochs will change if the dataset is bigger.
The number of epochs should not be decided a priori. You should run the training, monitor the training and validation losses over time, and stop training when the validation loss reaches a plateau or starts increasing. This technique is called "early stopping" and is good practice in machine learning.
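A minimal early-stopping skeleton (plain Python; the patience value and the train_one_epoch/evaluate helpers are placeholders for whatever your training code provides):

```python
import copy

patience = 10          # epochs to wait for an improvement; a typical guess
best_loss = float("inf")
best_state = None
epochs_without_improvement = 0

for epoch in range(1000):                   # generous cap, rarely reached
    train_one_epoch(model, train_loader)    # placeholder: your training step
    val_loss = evaluate(model, val_loader)  # placeholder: your validation step

    if val_loss < best_loss:
        best_loss = val_loss
        best_state = copy.deepcopy(model.state_dict())  # remember best model
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:  # plateau or rising loss
            break                                   # stop early

model.load_state_dict(best_state)   # roll back to the best checkpoint
```

With this loop the "right" number of epochs falls out of the validation curve for whatever dataset you are training on, so you don't need to transfer it from the grid-search subset at all.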