I am using a WGAN-GP for medical image generation, and during training I notice that the generator loss diverges to a high value while the discriminator loss stays more or less the same as training continues (see the loss plot). I am not sure why that would happen.
I am using this code for the WGAN-GP. The only change I made to this implementation was reducing the number of filters in the discriminator to 32 (i.e. ngf/4).
When training my LSTM (using the Keras library in Python), the validation loss keeps increasing, although the model eventually reaches a higher validation accuracy. This leads me to two questions:
How/why does it obtain a (significantly) higher validation accuracy at a (significantly) higher validation loss?
Is it problematic that the validation loss increases (given that the model eventually reaches a good validation accuracy either way)?
This is an example history log of my LSTM for which this applies:
As visible when comparing epoch 0 with epoch ~430:
52% val accuracy at 1.1 val loss vs. 61% val accuracy at 1.8 val loss
For the loss function I'm using tf.keras.losses.CategoricalCrossentropy, and I'm using the SGD optimizer at a high learning rate of 50-60% (i.e. 0.5-0.6), as it obtained the best validation accuracy.
Initially I thought it might be overfitting, but then I don't understand how the validation accuracy can end up considerably higher while the validation loss is almost twice as high.
Any insights would be much appreciated.
EDIT: Another example from a different run, with less fluctuating validation accuracy, but still a significantly higher validation accuracy as the validation loss increases:
In this run I used low dropout instead of high dropout.
As you stated, "at a high learning rate of 50-60%", this might be why the graphs are oscillating. Lowering the learning rate or adding regularization should reduce the oscillation.
More generally,
Cross-entropy is not a bounded loss, so a few very badly predicted outliers can make it explode.
Accuracy can still go up, which means your model is able to learn the rest of the dataset, just not the outliers.
The validation set has too many outliers, which are causing the oscillation of the loss values.
To determine whether you are overfitting, you should inspect the validation set for outliers.
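To see how accuracy and cross-entropy can rise together, here is a minimal plain-Python sketch with made-up predictions for five samples (the numbers are illustrative, not taken from the question). It computes the mean negative log-probability of the true class, which is what categorical cross-entropy reduces to for one-hot labels; the small-epsilon clipping Keras applies is ignored here. The "later" model is right more often, but its single confident mistake costs more loss than all of its gains.

```python
import math

def cross_entropy(probs, labels):
    """Mean categorical cross-entropy; labels are class indices (one-hot targets)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    hits = sum(max(range(len(p)), key=lambda k: p[k]) == y for p, y in zip(probs, labels))
    return hits / len(labels)

labels = [0, 1, 0, 1, 0]

# Early in training: hesitant predictions, 3 of 5 correct.
early = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.6, 0.4], [0.4, 0.6]]

# Later in training: very confident predictions, 4 of 5 correct,
# but the one mistake puts only 0.01 on the true class.
late = [[0.99, 0.01], [0.01, 0.99], [0.99, 0.01], [0.01, 0.99], [0.01, 0.99]]

for name, probs in [("early", early), ("late", late)]:
    print(name, accuracy(probs, labels), round(cross_entropy(probs, labels), 3))
# early 0.6 0.673
# late 0.8 0.929
```

Sorting validation samples by their individual loss is a quick way to find the outliers the answer suggests inspecting.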
My neural network training in PyTorch is behaving very strangely.
I am training on a well-known dataset that came already split into training and validation sets.
I'm shuffling the data during training and doing data augmentation on the fly.
These are my results:
Train accuracy starts at 80% and increases
Train loss decreases and stays stable
Validation accuracy starts at 30% but increases slowly
Validation loss increases
I have the following graphs:
How can both the validation loss and the validation accuracy increase?
How can there be such a big difference in accuracy between the training and validation sets (90% vs. 40%)?
I balanced the dataset.
It is binary classification. It now has 1700 examples from class 1 and 1200 examples from class 2, split into 600 for validation and 2300 for training.
I still see similar behavior:
Can it be because I froze the weights in part of the network?
Can it be because of hyperparameters like the learning rate?
I found the solution:
I had different data augmentation for the training set and the validation set. Matching them also increased the validation accuracy!
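One common way to keep the two pipelines consistent is to share every deterministic preprocessing step and apply random augmentation to training images only. Below is a minimal plain-Python sketch, assuming images are nested lists of pixel values; `normalize`, `random_hflip`, `MEAN`, and `STD` are hypothetical names, not taken from the original code. In PyTorch the same idea is usually expressed as two torchvision `transforms.Compose` pipelines that both end in the same `transforms.Normalize` step.

```python
import random

# Hypothetical dataset statistics; in practice compute them on the training set only.
MEAN, STD = 0.5, 0.25

def normalize(image):
    """Deterministic preprocessing: must be identical for training and validation."""
    return [[(px - MEAN) / STD for px in row] for row in image]

def random_hflip(image, rng, p=0.5):
    """Random augmentation: flip left-right with probability p (training only)."""
    return [row[::-1] for row in image] if rng.random() < p else image

def train_transform(image, rng):
    # Random augmentation first, then the shared deterministic step.
    return normalize(random_hflip(image, rng))

def val_transform(image):
    # Only the shared deterministic step, so validation results are reproducible.
    return normalize(image)

img = [[0.5, 0.75]]
print(val_transform(img))
print(train_transform(img, random.Random(0)))
```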
If the training set is very large in comparison to the validation set, you are more likely to overfit and learn the training data, which would make it very difficult for the model to generalize. I see your training accuracy is at 0.98 and your validation accuracy increases at a very slow rate, which would imply that you have overfit your training data.
Try reducing the number of samples in your training set to improve how well your model generalizes to unseen data.
Let me answer your second question first. High accuracy on the training data and low accuracy on the validation/test data indicates that the model may not generalize well to real cases. Detecting that is what the validation process is all about. You need to fine-tune or even rebuild your model.
With regard to the first question, the validation loss does not necessarily correspond to the validation accuracy. Accuracy only checks whether the model's predicted class is correct, whereas the loss function, if you are using CrossEntropy, measures the difference between the predicted probability matrix and the target, so it also depends on how confident each prediction is.
I am doing binary classification with Keras, using loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam, and keras.layers.Dense(1, activation=tf.nn.sigmoid) as the final layer.
As far as I know, the loss value is used to evaluate the model during the training phase. However, when I use Keras model evaluation on my test dataset (e.g. m_recall.evaluate(testData, testLabel)), there are also loss values, accompanied by accuracy values, as in the output below:
test size: (1889, 18525)
1889/1889 [==============================] - 1s 345us/step
m_acc: [0.5690245978371045, 0.9523557437797776]
1889/1889 [==============================] - 1s 352us/step
m_recall: [0.24519687695911097, 0.9359449444150344]
1889/1889 [==============================] - 1s 350us/step
m_f1: [0.502442331737344, 0.9216516675489677]
1889/1889 [==============================] - 1s 360us/step
metric name: ['loss', 'acc']
What is the meaning/usage of the loss during testing? Why is it so high (e.g. 0.5690 for m_acc)? The accuracy seems fine to me (e.g. 0.9523 for m_acc), but I am also concerned about the loss: does it mean my model performs badly?
m_acc, m_recall, etc. are just the names I gave my models (they were trained with different scoring metrics in GridSearchCV).
I just realized that loss values are not percentages, so how are they calculated? And with the current values, are they good enough, or do I need to optimize them further?
Suggestions for further reading are appreciated too!
When defining a machine learning model, we want a way to measure its performance so that we can compare it with other models, choose the best one, and make sure that it is good enough. Therefore, we define metrics like accuracy (in the context of classification), which is the proportion of samples correctly classified by the model, to measure how our model performs and whether it is good enough for our task.
Although these metrics are easy for us to interpret, the problem is that they cannot be used directly by the learning process to tune the parameters of the model (accuracy, for example, changes in discrete jumps as the parameters vary, so its gradient is zero almost everywhere and gives a gradient-based optimizer nothing to follow). Instead, we define other measures, usually called loss functions or objective functions, that can be used directly by the training process (i.e. optimization). These functions are usually defined so that low values are expected to coincide with high accuracy. That's why machine learning algorithms commonly minimize a loss function in the expectation that accuracy will increase. In other words, the models learn indirectly by optimizing the loss function. The loss values are important during training: if they are not decreasing, or are fluctuating, there is likely a problem somewhere that needs to be fixed.
As a result, what we are ultimately (i.e. when testing a model) concerned about is the value of metrics (like accuracy) we have initially defined and we don't care about the final value of loss functions. That's why you don't hear things like "the loss value of a [specific model] on the ImageNet dataset is 8.732"! That does not tell you anything whether the model is great, good, bad or terrible. Rather, you would hear that "this model performs with 87% accuracy on the ImageNet dataset".
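To make "how are they calculated?" concrete, here is a minimal plain-Python sketch of binary cross-entropy on sigmoid outputs, roughly what Keras's `binary_crossentropy` loss computes. The clipping constant `eps=1e-7` is an assumption chosen to match Keras's default fuzz factor, and the example labels and predictions are made up. It also gives a reference point: a model that always predicts 0.5 scores ln 2 (about 0.693), and a few confident mistakes can push the loss up even when accuracy is high.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over samples.

    y_true: 0/1 labels; y_pred: sigmoid outputs in [0, 1].
    Predictions are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# An uninformative model that always outputs 0.5 scores ln(2), about 0.693.
print(binary_cross_entropy([1, 0], [0.5, 0.5]))

# 19 confident correct predictions plus 1 confident mistake: 95% accuracy,
# yet the single mistake contributes most of the loss.
print(binary_cross_entropy([1] * 20, [0.99] * 19 + [0.001]))
```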
Could you please guide me on how to interpret the following results?
1) loss < validation_loss
2) loss > validation_loss
It seems that the training loss should always be less than the validation loss, but both of these cases happen when training a model.
Really a fundamental question in machine learning.
If validation loss >> training loss you can call it overfitting.
If validation loss > training loss you can call it some overfitting.
If validation loss < training loss you can call it some underfitting.
If validation loss << training loss you can call it underfitting.
Your aim is to make the validation loss as low as possible.
Some overfitting is nearly always a good thing. All that matters in the end is whether the validation loss is as low as you can get it.
This often occurs when the training loss is quite a bit lower.
Also check how to prevent overfitting.
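Since the goal is the lowest possible validation loss, a common practical approach is to keep the weights from the epoch with the lowest validation loss and stop once it has not improved for a few epochs. The plain-Python sketch below shows that logic on a list of per-epoch validation losses; the function name and the example losses are hypothetical. In Keras, the built-in `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)` callback implements the same idea.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation losses.

    Training stops after `patience` consecutive epochs without an improvement
    of more than `min_delta` over the best validation loss seen so far.
    """
    best_epoch, best_loss, wait = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, wait = epoch, loss, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Validation loss improves, then starts rising as overfitting sets in.
history = [1.0, 0.8, 0.7, 0.75, 0.72, 0.9, 0.95]
print(early_stopping(history, patience=2))  # (2, 4)
```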
In machine learning and deep learning there are basically three cases:
1) Underfitting
This is the only case where loss > validation_loss, and then only slightly. If loss is far higher than validation_loss, please post your code and data so that we can have a look.
2) Overfitting
loss << validation_loss
This means that your model fits the training data very well but not the validation data; in other words, it is not generalizing correctly to unseen data.
3) Perfect fitting
loss == validation_loss
If both values end up roughly the same and are converging (plot the loss over time), then chances are very high that you are doing it right.
1) Your model performs better on the training data than on the unseen validation data. A bit of overfitting is normal, but larger amounts need to be controlled with regularization techniques like dropout to ensure generalization.
2) Your model performs better on the validation data. This can happen when you use augmentation on the training data, making it harder to predict in comparison to the unmodified validation samples. It can also happen when your training loss is calculated as a moving average over 1 epoch, whereas the validation loss is calculated after the learning phase of the same epoch.
Aurélien Geron made a good Twitter thread about this phenomenon. Summary:
Regularization is typically only applied during training, not validation and testing. For example, if you're using dropout, the model has fewer features available to it during training.
Training loss is measured after each batch, while the validation loss is measured after each epoch, so on average the training loss is measured ½ an epoch earlier. This means that the validation loss has the benefit of extra gradient updates.
The validation set can be easier than the training set. For example, data augmentations often distort or occlude parts of the image. This can also happen if you get unlucky during sampling (the validation set has too many easy classes or too many easy examples), or if your validation set is too small. Or the training set may have leaked into the validation set.
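The half-epoch lag in the second point can be illustrated with a toy simulation in plain Python. It assumes a hypothetical, smoothly decreasing loss curve, and that the reported training loss is the mean of per-batch losses over the epoch (each computed in the forward pass before that batch's update), while the validation loss is computed once with the end-of-epoch weights. Training and validation data are assumed to be identically distributed, so any gap here comes purely from timing.

```python
import math

def loss_at(step, scale=2.0, decay=200.0):
    """Hypothetical loss of the current weights after `step` gradient updates."""
    return scale * math.exp(-step / decay)

def epoch_report(epoch, batches_per_epoch=100):
    """Return (reported training loss, validation loss) for one epoch."""
    start = epoch * batches_per_epoch
    # Training loss: running mean of per-batch losses, each measured before its update.
    train = sum(loss_at(start + b) for b in range(batches_per_epoch)) / batches_per_epoch
    # Validation loss: measured once, with the weights at the end of the epoch.
    val = loss_at(start + batches_per_epoch)
    return train, val

for epoch in range(3):
    train, val = epoch_report(epoch)
    print(f"epoch {epoch}: train {train:.3f}  val {val:.3f}")
```

Shifting the training curve left by half an epoch when plotting is a simple way to compare the two curves on a more equal footing.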
If your validation loss is less than your training loss, you have not correctly split the training data. This indicates that the distributions of the training and validation sets are different; ideally they should be the same. Moreover, in the ideal case (a good fit), the training and validation losses both drop and stabilize at some point, indicating an optimal fit, i.e. a model that neither overfits nor underfits.
I implemented ResNet for CIFAR-10 following this paper: https://arxiv.org/pdf/1512.03385.pdf
But my accuracy is significantly different from the accuracy reported in the paper:
Mine: 86%
Paper: 94%
What's my mistake?
Your question is a bit too generic, but my opinion is that the network is overfitting the training set: as you can see, the training loss is quite low, but after epoch 50 the validation loss does not improve anymore.
I didn't read the paper in depth, so I don't know how they solved the problem, but increasing regularization might help. The following link will point you in the right direction: http://cs231n.github.io/neural-networks-3/
Below I copied the summary from that page:
To train a Neural Network:
Gradient check your implementation with a small batch of data and be aware of the pitfalls.
As a sanity check, make sure your initial loss is reasonable, and that you can achieve 100% training accuracy on a very small portion of the data.
During training, monitor the loss, the training/validation accuracy, and if you’re feeling fancier, the magnitude of updates in relation to parameter values (it should be ~1e-3), and when dealing with ConvNets, the first-layer weights.
The two recommended updates to use are either SGD+Nesterov Momentum or Adam.
Decay your learning rate over the period of the training. For example, halve the learning rate after a fixed number of epochs, or whenever the validation accuracy tops off.
Search for good hyperparameters with random search (not grid search). Stage your search from coarse (wide hyperparameter ranges, training only for 1-5 epochs) to fine (narrower ranges, training for many more epochs).
Form model ensembles for extra performance
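The "halve the learning rate after a fixed number of epochs" advice is a step-decay schedule. Here is a minimal plain-Python sketch, where `drop_every` and `factor` are illustrative values rather than recommendations. PyTorch provides the same behavior through `torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma)`, and Keras's `ReduceLROnPlateau` callback covers the "whenever the validation accuracy tops off" variant.

```python
def step_decay(initial_lr, epoch, drop_every=10, factor=0.5):
    """Learning rate for `epoch`: multiply by `factor` every `drop_every` epochs."""
    return initial_lr * factor ** (epoch // drop_every)

for epoch in (0, 9, 10, 25, 40):
    print(epoch, step_decay(0.1, epoch))
```

For reference, the ResNet paper's CIFAR-10 schedule starts at 0.1 and divides the learning rate by 10 at 32k and 48k iterations, i.e. a factor of 0.1 applied at two fixed points rather than at regular intervals.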
I would argue that the difference in data preprocessing accounts for the difference in performance. He et al. use padding and random crops, which in essence increases the number of training samples and decreases the generalization error. Also, as the previous poster said, you are missing regularization such as weight decay.
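The ResNet paper describes this augmentation as padding 4 pixels on each side and sampling a random 32x32 crop from the padded image or its horizontal flip. Below is a minimal plain-Python sketch for a single-channel image stored as nested lists (a simplifying assumption; real CIFAR-10 images have 3 channels), with zero padding assumed. In PyTorch one would typically use torchvision's `transforms.RandomCrop(32, padding=4)` together with `transforms.RandomHorizontalFlip()`, applied to the training set only.

```python
import random

def pad_and_random_crop(image, pad=4, rng=random):
    """Zero-pad an H x W image by `pad` pixels per side, crop a random H x W
    window, and flip it horizontally with probability 0.5."""
    h, w = len(image), len(image[0])
    padded = [[0] * (w + 2 * pad) for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in image]
    padded += [[0] * (w + 2 * pad) for _ in range(pad)]
    # randint is inclusive, so the crop window always fits inside the padded image.
    top = rng.randint(0, 2 * pad)
    left = rng.randint(0, 2 * pad)
    crop = [row[left:left + w] for row in padded[top:top + h]]
    if rng.random() < 0.5:
        crop = [row[::-1] for row in crop]
    return crop

img = [[1] * 32 for _ in range(32)]
out = pad_and_random_crop(img, rng=random.Random(0))
print(len(out), len(out[0]))
```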
You should take another look at the paper and make sure you implement everything like they did.