LSTM higher validation accuracy at higher validation loss ( Python: Keras ) - machine-learning

When training my LSTM ( using the Keras library in Python ) the validation loss keeps increasing, although it eventually does obtain a higher validation accuracy. Which leads me to 2 questions:
How/Why does it obtain a (significantly) higher validation accuracy at a (significantly) higher validation loss?
Is it problematic that the validation loss increases? ( because it eventually does obtain a good validation accuracy either way )
This is an example history log of my LSTM for which this applies:
As visible when comparing epoch 0 with epoch ~430:
52% val accuracy at 1.1 val loss vs. 61% val accuracy at 1.8 val loss
For the loss function I'm using tf.keras.losses.CategoricalCrossentropy and I'm using the SGD optimizer at a high learning rate of 50-60% ( as it obtained the best validation accuracy with it ).
Initially I thought it may be overfitting, but then I don't understand how the validation accuracy does eventually get quite a lot higher at almost 2 times as high of a validation loss.
Any insights would be much appreciated.
EDIT: Another example of a different run, less fluctuating validation accuracy but still significantly higher validation accuracy as the validation loss increases:
In this run I used a low instead of high dropout.

As you stated, "at a high learning rate of 50-60%", this might be the reason why graphs are oscillating. Lowering the learning rate or adding regularization should solve the oscillating problem.
More generally,
Cross Entropy loss is not a bounded loss, so having very badly outliers would make it explode.
Accuracy can go higher which means your model is able to learn the rest of the dataset except the outliers.
Validation set has too many outliers that causing the oscillation of the loss values.
To conclude if you are overfitting or not, you should inspect validation set for outliers.

Related

Train Accuracy increases, Train loss is stable, Validation loss Increases, Validation Accuracy is low and increases

My neural network trainign in pytorch is getting very wierd.
I am training a known dataset that came splitted into train and validation.
I'm shuffeling the data during training and do data augmentation on the fly.
I have those results:
Train accuracy start at 80% and increases
Train loss decreases and stays stable
Validation accuracy start at 30% but increases slowly
Validation loss increases
I have the following graphs to show:
How can you explain that the validation loss increases and the validation accuracy increases?
How can be such a big difference of accuracy between validation and training sets? 90% and 40%?
Update:
I balanced the data set.
It is binary classification. It now has now 1700 examples from class 1, 1200 examples from class 2. Total 600 for validation and 2300 for training.
I still see similar behavior:
**Can it be becuase I froze the weights in part of the network?
**Can it be becuase the hyperparametrs like lr?
I found the solution:
I had different data augmentation for training set and validation set. Matching them also increased the validation accuracy!
If the training set is very large in comparison to the validation set, you are more likely to overfit and learn the training data, which would make generalizing the model very difficult. I see your training accuracy is at 0.98 and your validation accuracy increases at a very slow rate, which would imply that you have overfit your training data.
Try reducing the number of samples in your training set to improve how well your model generalizes to unseen data.
Let me answer your 2nd question first. High accuracy on training data and low accuracy on val/test data indicates the model might not generalize well to infer real cases. That is what the validation process is all about. You need to finetune or even rebuild your model.
With regard to the first question, val loss might not necessarily correspond to the val accuracy. The model makes the prediction based on its model, and loss function calculates the difference between probablities of matrix and the target if you are using CrossEntropy function.

Is some overfitting in a convolutional network alright?

I am using resnet50 to classify pictures of flowers from a Kaggle dataset. I would like to clarify some things about my results.
epoch train_loss valid_loss error_rate time
0 0.205352 0.226580 0.077546 02:01
1 0.148942 0.205224 0.074074 02:01
These are the last two epochs of training. As you can see, the second epoch shows some overfitting because the train_loss is a good margin lower than the validation loss. Despite the overfitting, the error_rate and the validation loss decreased. I am wondering whether the model had actually improved in spite of the overfitting. Is it better to use the model from epoch 0 or epoch 1 for unseen data? Thank you!
Sadly, "overfitting" is a much abused term nowadays, used to mean almost everything linked to suboptimal performance; nevertheless, and practically speaking, overfitting means something very specific: its telltale signature is when your validation loss starts increasing, while your training loss continues decreasing, i.e.:
(Image adapted from Wikipedia entry on overfitting)
It's clear than nothing of the sorts happens in your case; the "margin" between your training and validation loss is another story altogether (it is called generalization gap), and does not signify overfitting.
Thus, in principle, you have absolutely no reason at all to choose a model with higher validation loss (i.e. your first one) instead of one with a lower validation loss (your second one).

Is this overfitting?

I have a CNN that is performing very well (96% accuracy, 1.~ loss) on training data but poorly (50% accuracy, 3.5 loss) on testing data.
The telltale signature of overfitting is when your validation loss starts increasing, while your training loss continues decreasing, i.e.:
(Image adapted from Wikipedia entry on overfitting)
Here are some other plots indicating overfitting (source):
See also the SO thread How to know if underfitting or overfitting is occuring?.
Clearly, your loss plot does exhibit such behavior, so yes, you are indeed overfitting.
On the contrary, the plot you have linked to in a comment:
does not exhibit such behavior, hence here you are not actually overfitting (you just have reached a saturation point, beyond which your validation error is not further improving).
96% accuracy suggests you have a really close fit to your training data. 50% accuracy on testing data shows that your model cannot account for the noise/variability of the data being studied. This looks like textbook overfitting.
You seem to be calling your validation data your test data. Maybe you can better partition your data?

Test accuracy is greater than train accuracy what to do?

I am using the random forest.My test accuracy is 70% on the other hand train accuracy is 34% ? what to do ? How can I solve this problem.
Test accuracy should not be higher than train since the model is optimized for the latter. Ways in which this behavior might happen:
you did not use the same source dataset for test. You should do a proper train/test split in which both of them have the same underlying distribution. Most likely you provided a completely different (and more agreeable) dataset for test
an unreasonably high degree of regularization was applied. Even so there would need to be some element of "test data distribution is not the same as that of train" for the observed behavior to occur.
The other answers are correct in most cases. But I'd like to offer another perspective. There are specific training regimes that could cause the training data to be harder for the model to learn - for instance, adversarial training or adding Gaussian noise to the training examples. In these cases, the benign test accuracy could be higher than train accuracy, because benign examples are easier to evaluate. This isn't always a problem, however!
If this applies to you, and the gap between train and test accuracies is larger than you'd like (~30%, as in your question, is a pretty big gap), then this indicates that your model is underfitting to the harder patterns, so you'll need to increase the expressibility of your model. In the case of random forests, this might mean training the trees to a higher depth.
First you should check the data that is used for training. I think there is some problem with the data, the data may not be properly pre-processed.
Also, in this case, you should try more epochs. Plot the learning curve to analyze when the model is going to converge.
You should check the following:
Both training and validation accuracy scores should increase and loss should decrease.
If there is something wrong in step 1 after any particular epoch, then train your model until that epoch only, because your model is over-fitting after that.

Training Loss and Validation Loss in Deep Learning

Would you please guide me how to interpret the following results?
1) loss < validation_loss
2) loss > validation_loss
It seems that the training loss always should be less than validation loss. But, both of these cases happen when training a model.
Really a fundamental question in machine learning.
If validation loss >> training loss you can call it overfitting.
If validation loss > training loss you can call it some overfitting.
If validation loss < training loss you can call it some underfitting.
If validation loss << training loss you can call it underfitting.
Your aim is to make the validation loss as low as possible.
Some overfitting is nearly always a good thing. All that matters in the end is: is the validation loss as low as you can get it.
This often occurs when the training loss is quite a bit lower.
Also check how to prevent overfitting.
In machine learning and deep learning there are basically three cases
1) Underfitting
This is the only case where loss > validation_loss, but only slightly, if loss is far higher than validation_loss, please post your code and data so that we can have a look at
2) Overfitting
loss << validation_loss
This means that your model is fitting very nicely the training data but not at all the validation data, in other words it's not generalizing correctly to unseen data
3) Perfect fitting
loss == validation_loss
If both values end up to be roughly the same and also if the values are converging (plot the loss over time) then chances are very high that you are doing it right
1) Your model performs better on the training data than on the unknown validation data. A bit of overfitting is normal, but higher amounts need to be regulated with techniques like dropout to ensure generalization.
2) Your model performs better on the validation data. This can happen when you use augmentation on the training data, making it harder to predict in comparison to the unmodified validation samples. It can also happen when your training loss is calculated as a moving average over 1 epoch, whereas the validation loss is calculated after the learning phase of the same epoch.
Aurélien Geron made a good Twitter thread about this phenomenon. Summary:
Regularization is typically only applied during training, not validation and testing. For example, if you're using dropout, the model has fewer features available to it during training.
Training loss is measured after each batch, while the validation loss is measured after each epoch, so on average the training loss is measured ½ an epoch earlier. This means that the validation loss has the benefit of extra gradient updates.
the val set can be easier than the training set. For example, data augmentations often distort or occlude parts of the image. This can also happen if you get unlucky during sampling (val set has too many easy classes, or too many easy examples), or if your val set is too small. Or, the train set leaked into the val set.
If your validation loss is less than your training loss, you have not correctly split the training data. This correctly indicates that the distribution of the training and validation sets is different. It should ideally be the same. MOROVER, Good Fit: In the ideal case, the training and validation losses both drop and stabilize at specified points, indicating an optimal fit, i.e. a model that does neither overfit or underfit.

Resources