If the validation isnt converging, my model is overfitting right? - machine-learning

I have this graph, and my validation loss isn't converging.
am I right that this is a case of overfitting?
graph picture

It seems to me that this would be overfitting/high variance (big difference between loss of training set and validation set).
One solution you could try would be to add some form of regularisation (L1, L2, dropout). You could also try increasing the size of your training set to get more variations in your training examples.
Good luck !

Related

How is Machine Learning Overfitting works

How noise in the data, target complexity and size of the training set are related to over-fitting?
I am guessing that you are a beginner, suppose you have dataset with lots of features(as in columns). you create a model and test it on your training and test dataset, you notice that it gives you an accuracy of 100 percent on your training set and 60-70 on your test set, this is an example of Overfitting. it is because you have chosen a lot of features which were not related to predicting the outcome.
you can remove it by dropping those irrelevant columns(which are called as noise), apply K-fold cross validation on your data.
this video might help you get a better understanding
https://www.youtube.com/watch?v=Anq4PgdASsc

Test accuracy is greater than train accuracy what to do?

I am using the random forest.My test accuracy is 70% on the other hand train accuracy is 34% ? what to do ? How can I solve this problem.
Test accuracy should not be higher than train since the model is optimized for the latter. Ways in which this behavior might happen:
you did not use the same source dataset for test. You should do a proper train/test split in which both of them have the same underlying distribution. Most likely you provided a completely different (and more agreeable) dataset for test
an unreasonably high degree of regularization was applied. Even so there would need to be some element of "test data distribution is not the same as that of train" for the observed behavior to occur.
The other answers are correct in most cases. But I'd like to offer another perspective. There are specific training regimes that could cause the training data to be harder for the model to learn - for instance, adversarial training or adding Gaussian noise to the training examples. In these cases, the benign test accuracy could be higher than train accuracy, because benign examples are easier to evaluate. This isn't always a problem, however!
If this applies to you, and the gap between train and test accuracies is larger than you'd like (~30%, as in your question, is a pretty big gap), then this indicates that your model is underfitting to the harder patterns, so you'll need to increase the expressibility of your model. In the case of random forests, this might mean training the trees to a higher depth.
First you should check the data that is used for training. I think there is some problem with the data, the data may not be properly pre-processed.
Also, in this case, you should try more epochs. Plot the learning curve to analyze when the model is going to converge.
You should check the following:
Both training and validation accuracy scores should increase and loss should decrease.
If there is something wrong in step 1 after any particular epoch, then train your model until that epoch only, because your model is over-fitting after that.

Training Loss and Validation Loss in Deep Learning

Would you please guide me how to interpret the following results?
1) loss < validation_loss
2) loss > validation_loss
It seems that the training loss always should be less than validation loss. But, both of these cases happen when training a model.
Really a fundamental question in machine learning.
If validation loss >> training loss you can call it overfitting.
If validation loss > training loss you can call it some overfitting.
If validation loss < training loss you can call it some underfitting.
If validation loss << training loss you can call it underfitting.
Your aim is to make the validation loss as low as possible.
Some overfitting is nearly always a good thing. All that matters in the end is: is the validation loss as low as you can get it.
This often occurs when the training loss is quite a bit lower.
Also check how to prevent overfitting.
In machine learning and deep learning there are basically three cases
1) Underfitting
This is the only case where loss > validation_loss, but only slightly, if loss is far higher than validation_loss, please post your code and data so that we can have a look at
2) Overfitting
loss << validation_loss
This means that your model is fitting very nicely the training data but not at all the validation data, in other words it's not generalizing correctly to unseen data
3) Perfect fitting
loss == validation_loss
If both values end up to be roughly the same and also if the values are converging (plot the loss over time) then chances are very high that you are doing it right
1) Your model performs better on the training data than on the unknown validation data. A bit of overfitting is normal, but higher amounts need to be regulated with techniques like dropout to ensure generalization.
2) Your model performs better on the validation data. This can happen when you use augmentation on the training data, making it harder to predict in comparison to the unmodified validation samples. It can also happen when your training loss is calculated as a moving average over 1 epoch, whereas the validation loss is calculated after the learning phase of the same epoch.
Aurélien Geron made a good Twitter thread about this phenomenon. Summary:
Regularization is typically only applied during training, not validation and testing. For example, if you're using dropout, the model has fewer features available to it during training.
Training loss is measured after each batch, while the validation loss is measured after each epoch, so on average the training loss is measured ½ an epoch earlier. This means that the validation loss has the benefit of extra gradient updates.
the val set can be easier than the training set. For example, data augmentations often distort or occlude parts of the image. This can also happen if you get unlucky during sampling (val set has too many easy classes, or too many easy examples), or if your val set is too small. Or, the train set leaked into the val set.
If your validation loss is less than your training loss, you have not correctly split the training data. This correctly indicates that the distribution of the training and validation sets is different. It should ideally be the same. MOROVER, Good Fit: In the ideal case, the training and validation losses both drop and stabilize at specified points, indicating an optimal fit, i.e. a model that does neither overfit or underfit.

Should a neural network be able to have a perfect train accuracy?

The title says it all: Should a neural network be able to have a perfect train accuracy? Mine saturates at ~0.9 accuracy and I am wondering if that indicates a problem with my network or the training data.
Training instances: ~4500 sequences with an average length of 10 elements.
Network: Bi-directional vanilla RNN with a softmax layer on top.
Perfect accuracy on training data is usually a sign of a phenomenon called overfitting (https://en.wikipedia.org/wiki/Overfitting) and the model may generalize poorly to unseen data. So, no, probably this alone is not an indication that there is something wrong (you could still be overfitting but it is not possible to tell from the information in your question).
You should check the accuracy of the NN on the validation set (data your network has not seen during training) and judge its generalizability. usually it's an iterative process where you train many networks with different configurations in parallel and see which one performs best on the validation set. Also see cross validation (https://en.wikipedia.org/wiki/Cross-validation_(statistics))
If you have low measurement noise, a model may still not get zero training error. This could be for many reasons including that the model is not flexible enough to capture the true underlying function (which can be a complicated, high-dimensional, non-linear function). You can try increasing the number of hidden layers and nodes but you have to be careful about the same things like overfitting and only judge based on evaluation through cross validation.
You can definitely get a 100% accuracy on training datasets by increasing model complexity but I would be wary of that.
You cannot expect your model to be better on your test set than on your training set. This means if your training accuracy is lower than the desired accuracy, you have to change something. Most likely you have to increase the number of parameters of your model.
The reason why you might be ok with not having a perfect training accuracy is (1) the problem of overfitting (2) training time. The more complex your model is, the more likely is overfitting.
You might want to have a look at Structural Risc Minimization:
(source: svms.org)

SGD model "overconfidence"

I'm working on binary classification problem using Apache Mahout. The algorithm I use is OnlineLogisticRegression and the model which I currently have strongly tends to produce predictions which are either 1 or 0 without any middle values.
Please suggest a way to tune or tweak the algorithm to make it produce more intermediate values in predictions.
Thanks in advance!
What is the test error rate of the classifier? If it's near zero then being confident is a feature, not a bug.
If the test error rate is high (or at least not low), then the classifier might be overfitting the training set: measure the difference between of the training error and the test error. In that case, increasing regularization as rrenaud suggested might help.
If your classifier is not overfitting, then there might be an issue with the probability calibration. Logistic Regression models (e.g. using the logit link function) should yield good enough probability calibrations (if the problem is approximately linearly separable and the label not too noisy). You can check the calibration of the probabilities with a plot as explained in this paper. If this is really a calibration issue, then implementing a custom calibration based on Platt scaling or isotonic regression might help fix the issue.
From reading the Mahout AbstractOnlineLogisticRegression docs, it looks like you can control the regularization parameter lambda. Increasing lambda should mean your weights are closer to 0, and hence your predictions are more hedged.

Resources