How is the validation accuracy in Keras determined for every epoch? - machine-learning

This is in relation to a related post here.
Is the validation data evaluated on the model that produced the 0.9381 training accuracy, or is the validation data also split across the 500 steps per epoch, with the mean validation accuracy taken across all steps?

Your training accuracy is evaluated after every batch.
The validation accuracy is calculated at the end of the epoch.
If you want to test it, you can create a custom callback (https://keras.io/callbacks/). The on_batch_end method fires after every training batch and on_epoch_end fires once the validation data has been evaluated. If you save the accuracy inside the callback and plot it, you will see the evolution.
You can see below, for example, the evolution of the accuracy of 4 RNN cells after every batch over one epoch. As the result was extremely noisy, I've added a sliding average. The star is the validation score at the end of the epoch.
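For example, a minimal sketch of such a callback with tf.keras (assuming the model is compiled with metrics=['accuracy']; older Keras versions report the metric under 'acc' instead):

import tensorflow as tf

class AccuracyHistory(tf.keras.callbacks.Callback):
    # records training accuracy after every batch and validation
    # accuracy at the end of every epoch
    def __init__(self):
        super().__init__()
        self.batch_acc = []   # one value per training batch
        self.val_acc = []     # one value per epoch

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.batch_acc.append(logs.get("accuracy", logs.get("acc")))

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.val_acc.append(logs.get("val_accuracy", logs.get("val_acc")))

# usage (model, x_train, x_val and so on stand in for your own objects):
# history = AccuracyHistory()
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=1, callbacks=[history])
# plotting history.batch_acc and history.val_acc shows the evolution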

Related

The dilemma of overfitting in NN training

My question is in continuation to the one asked by another user: What is the difference between train, validation and test set, in neural networks?
Learning is terminated when the minimum MSE is reached, judged by the validation and training set performance (easy to do using the nntool box in Matlab). Then, using the trained net, if the performance on the unseen test set is slightly poorer than on the training set, we have an overfitting problem. I always encounter this case, even though I select the model for which the validation and training set performance during learning is nearly the same. How come the test set performance is still worse than the training set performance?
Training data = the data we use to train our model.
Validation data = the data we use to evaluate our model after every epoch (or at run time), so that we can stop training early if we see over-fitting or some other problem. Suppose I am running 1000 epochs and at epoch 500 I see that my model gives 90% accuracy on the training data but only 70% on the validation data. I can see the model is over-fitting, so I can stop training manually before the 1000 epochs complete, tune my model further, and then watch its behaviour again.
Testing data = once training is finished (all 1000 epochs), I predict on my test data and check the accuracy there; say it gives 86%.
So my training accuracy is 90%, validation accuracy is 87% and testing accuracy is 86%. These can differ because the samples in the training, validation and testing sets are totally different: we have 70% of the samples in the training set, 10% in the validation set and 20% in the testing set. On validation my model gets 8 images wrong while on testing it gets 18 images wrong, simply because every image has different pixels from every other; a small difference like this is normal in real-life projects.
The testing set also contains more images than the validation set, which may be another reason: the more images there are, the more chances for wrong predictions. For example, at 90% accuracy my model predicts 90 out of 100 images correctly, but if I increase the sample to 1000 images it may predict 850, 800 or 900 out of 1000 correctly.
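As a concrete illustration, a minimal sketch of that 70/10/20 split with scikit-learn's train_test_split (dummy arrays stand in for your own features and labels):

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(1000).reshape(-1, 1), np.arange(1000) % 2   # dummy data

# carve off 20% as the testing set, then 12.5% of the remainder
# (0.125 * 0.80 = 0.10 of the total) as the validation set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.125, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # 700 100 200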

How can I voluntarily overfit my model for text classification

I would like to show an example of a model that overfits a test set and does not generalize well on future data.
I split the news dataset into 3 sets:
train set length: 11314
test set length: 5500
future set length: 2031
I am using a text dataset and building a CountVectorizer.
I am creating a grid search (without cross-validation); each loop tests some parameters of the vectorizer ('min_df', 'max_df') and some parameters of my LogisticRegression model ('C', 'fit_intercept', 'tol', ...).
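Roughly, the loop looks like this (a sketch with a tiny toy corpus and hypothetical, much smaller grids in place of the real news data and parameter ranges):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid

texts_train = ["cheap pills now", "meeting at noon", "win money fast", "lunch tomorrow"]
y_train = [1, 0, 1, 0]
texts_test = ["free money pills", "see you at lunch"]
y_test = [1, 0]

vec_grid = ParameterGrid({"binary": [True, False], "min_df": [1], "max_df": [0.5, 1.0]})
clf_grid = ParameterGrid({"C": [0.1, 1.0, 10.0], "fit_intercept": [True, False]})

best = (None, None, -1.0)
for vec_params in vec_grid:
    vec = CountVectorizer(**vec_params)
    X_tr = vec.fit_transform(texts_train)
    X_te = vec.transform(texts_test)
    for clf_params in clf_grid:
        clf = LogisticRegression(max_iter=1000, **clf_params).fit(X_tr, y_train)
        score = clf.score(X_te, y_test)        # note: model selection on the test set
        if score > best[2]:
            best = (vec_params, clf_params, score)

print(best)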
The best result I get is:
({'binary': False, 'max_df': 1.0, 'min_df': 1},
{'C': 0.1, 'fit_intercept': True, 'tol': 0.0001},
test set score: 0.64018181818181819,
training set score: 0.92902598550468451)
but now if I run it on the future set I will get a score similar to the test set:
clf.score(X_future, y_future): 0.6509108813392418
How can I demonstrate a case where I overfitted my test set so it does not generalize well to future data?
You have a model trained on some data (the "train set").
Performing a classification task on these data, you get a score of 92%.
Then you take new data not seen during training, such as the "test set" or the "future set".
Performing a classification task on either of these unseen datasets, you get a score of 65%.
This is exactly the definition of a model which is overfitting: it has a very high variance, a big difference in the performance between seen and unseen data.
By the way, taking into account your specific case, some parameter choices which could cause overfitting are the following:
min_df = 0 (no lower bound on document frequency, so every term is kept)
a high C value for logistic regression (which means low regularization)
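For instance, a sketch of a pipeline that combines those choices (a hypothetical setup, not the exact grid above; min_df=1 is used here, which likewise keeps every term that appears at least once):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# min_df=1 keeps even terms that appear in a single document, and a large C
# turns regularization almost off; both make it easy to memorize the training texts
overfit_clf = make_pipeline(
    CountVectorizer(min_df=1, max_df=1.0),
    LogisticRegression(C=1000.0, max_iter=1000),
)
# overfit_clf.fit(texts_train, y_train)   # fit on your own training texts and labels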
I wrote a comment on alsora's answer, but I think I really should expand on it as an actual answer.
As I said, there is no way to "over-fit" the test set, because over-fitting implies something negative. A theoretical model that fits the test set at 92% but fits the training set at only 65% would actually be a very good model (assuming your sets are balanced).
I think what you are referring to as your "test set" might actually be a validation set, and your "future set" is actually the test set. Let's clarify.
You have a set of 18,845 examples. You divide them into 3 sets.
Training set: the examples the model gets to look at and learn from. Every time your model makes a guess on this set, you tell it whether it was right or wrong, and it adjusts accordingly.
Validation set: after every epoch (one pass through the training set), you check the model on these examples, which it has never seen before. You compare the training loss and accuracy to the validation loss and accuracy. If the training accuracy is clearly higher than the validation accuracy (or the training loss clearly lower than the validation loss), your model is over-fitting and training should stop. You can either stop it early (early stopping; a minimal sketch follows below) or add dropout. You should not give feedback to your model based on examples from the validation set. As long as you follow that rule and your validation set is well mixed, you can't over-fit this data.
Testing set: used to assess the accuracy of your model once training has completed. This is the number that matters, because it is based on examples your model has never seen before. Again, you can't over-fit this data.
Of your 18,845 examples you have 11,314 in the training set, 5,500 in the validation set, and 2,031 in the testing set.
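For example, a minimal early-stopping sketch in Keras (model, x_train, x_val and friends stand in for your own objects):

import tensorflow as tf

# stop once the validation loss has not improved for 3 epochs in a row,
# and roll back to the weights from the best epoch seen so far
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

# model.fit(x_train, y_train,
#           validation_data=(x_val, y_val),
#           epochs=1000, callbacks=[early_stop])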

How are the model weights modified in ML?

I have been reading this interesting link, Linear Regression - SGD,
and I have a question about the statement below.
" The way this optimization algorithm works is that each training instance is shown to the model one at a time. The model makes a prediction for a training instance, the error is calculated and the model is updated in order to reduce the error for the next prediction. This process is repeated for a fixed number of iterations."
Question:
Is my below pseudo code correct?
for each training input:
1) Input to Model
2) Find the prediction
3) Find the error
4) Update Model.
What I don't understand is "This process is repeated for a fixed number of iterations". Does it mean steps 3) and 4) are repeated until the error is minimized?
Correct me if I am wrong.
"This process is repeated for a fixed number of iterations." means that you choose the number of epochs or the number of batches send to you network to train it.
When you train your network you have a training dataset. You give your network (with placeholders) iages and labels associated with these inputs (generally you give samples (input + label) by batches).
It makes a prediction for each input and computes the error (the loss function you uses). And then it tunes weights (and biases) to minimize the loss function (it does what is called a gradient descent).
You should tale a look at Gradient Descent here : http://sebastianruder.com/optimizing-gradient-descent/
You are the one deciding how long you want your network to train by fixing the number of time your whole training set is going to be send to your network (what's called an epoch) or the number of batches.
Hope it helps
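For example, a minimal sketch of that loop for simple linear regression in plain Python (learning y = w*x + b one training instance at a time, matching the pseudo code above):

import random

def sgd_linear_regression(data, lr=0.01, n_epochs=20):
    # data: list of (x, y) pairs; the outer loop is the "fixed number of
    # iterations" (epochs), the inner loop shows one instance at a time
    w, b = 0.0, 0.0
    for epoch in range(n_epochs):        # repeated for a fixed number of iterations
        random.shuffle(data)
        for x, y in data:                # 1) input to model
            prediction = w * x + b       # 2) find the prediction
            error = prediction - y       # 3) find the error
            w -= lr * error * x          # 4) update model
            b -= lr * error
    return w, b

# example: learn y = 2x + 1 from a few points
# w, b = sgd_linear_regression([(x, 2 * x + 1) for x in range(10)])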

What do inconsistent test results mean?

I'm doing some research on CNNs for text classification using TensorFlow. When I run my model I get a very high training accuracy (around 100%). However, on the test split I get inconsistent accuracy results (sometimes 11% and sometimes 90%).
Moreover, I also noticed that the training loss keeps decreasing until it reaches small values like 0.000499564048368, while the test loss does not and sometimes reaches high values like 70. What does this mean? Any ideas?
If you get very high training accuracy and bad testing accuracy, you are almost certainly overfitting. To get a better picture of your model's real accuracy, use cross-validation.
Cross-validation splits the dataset into a training and a validation set, and does this multiple times, changing the training and validation data slightly each time. This is beneficial because it can prevent scenarios where you train your model on one label and it can't accurately identify another one. For example, picture a training set like this:
Feature1, Feature2, Label
x, y, 0
a, y, 0
b, c, 1
If we train the model only on the first two data points, it will not be able to identify the third data point, because the model has not learned anything general enough to cover it.
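For example, a minimal sketch with scikit-learn (a simple classifier and toy dataset are used purely to show the mechanism; with a CNN you would train the network once per fold):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean(), scores.std())   # 5 accuracies, their average and spread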

Overfitting and Data splitting

Let's say that I have a data file like:
Index,product_buying_date,col1,col2
0,2013-01-16,34,Jack
1,2013-01-12,43,Molly
2,2013-01-21,21,Adam
3,2014-01-09,54,Peirce
4,2014-01-17,38,Goldberg
5,2015-01-05,72,Chandler
..
..
2000000,2015-01-27,32,Mike
with some more data, and I have a target variable y. Assume whatever else you need.
Now I am aware that we divide the data into 2 parts, i.e. Train and Test. Then we divide Train 70:30, build the model with the 70% and validate it with the 30%. We tune the parameters so that the model does not overfit, and then predict on the Test data. For example: I divide the 2,000,000 rows into two equal parts. 1,000,000 rows are Train; of those, 30% (300,000 rows) are used for validation and the model is built on the remaining 70% (700,000 rows).
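A sketch of that split with scikit-learn (a dummy index array stands in for the real table):

import numpy as np
from sklearn.model_selection import train_test_split

rows = np.arange(2_000_000)               # stands in for the 2,000,000-row table above

# first split: half Train, half Test
train_rows, test_rows = train_test_split(rows, test_size=0.5, shuffle=True, random_state=0)
# second split: 70% to build the model, 30% to validate it
build_rows, validate_rows = train_test_split(train_rows, test_size=0.3, random_state=0)

print(len(build_rows), len(validate_rows), len(test_rows))   # 700000 300000 1000000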
QUESTION: Is the above logic dependent on how the original data is split?
Generally we shuffle the data and then break it into train, validate and test sets (train + validate = Train; please don't confuse the two here).
But what if the split is alternating? When I divide the data into Train and Test first, I give the even rows to Test and the odd rows to Train. (The data is initially sorted by the 'product_buying_date' column, so when I split it into odd and even rows it gets split uniformly.)
And when I build the model with Train, I overfit it so that I get the maximum AUC on the Test data.
QUESTION: Isn't overfitting helping in this case?
QUESTION: Is the above logic dependent on how the original data is split?
If the dataset is large (hundreds of thousands of rows) you can randomly split the data and you should not have any problem, but if the dataset is small then you can adopt a different approach, such as cross-validation, to generate the splits. Cross-validation means that you make n training/validation splits out of your Training set.
Suppose you have 2000 data points and you split them like this:
1000 - training dataset
1000 - testing dataset
5-fold cross-validation would then mean making five 800/200 training/validation splits out of the training dataset.
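A minimal sketch of those five splits with scikit-learn's KFold:

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(1000)                    # stands in for the 1000 training data points
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in kf.split(X):
    print(len(train_idx), len(val_idx))    # 800 200 on each of the five folds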
QUESTION: Isn't overfitting helping in this case?
The number one rule of machine learning is that you don't touch the test dataset. It is a holy dataset that should not be touched.
If you overfit the test data to get the maximum AUC score, then the validation dataset loses its meaning. The foremost aim of any ML algorithm is to reduce the generalization error, i.e. the algorithm should be able to perform well on unseen data. If you tune your algorithm with the testing data, you won't be able to meet this criterion. In cross-validation you also do not touch your testing set: you select your algorithm, tune its parameters with the validation dataset, and only after that apply the algorithm to the test dataset, which gives your final score.
