So I have seen differing implementations of cross validation.
I'm currently using PyTorch to train a neural network.
My current layout looks like this:
I have 6 discrete Datasets. 5 are used for cross validation.
Network_1 trains on Datasets: 1,2,3,4 computes loss on 5
Network_2 trains on Datasets: 1,2,3,5 computes loss on 4
Network_3 trains on Datasets: 1,2,4,5 computes loss on 3
Network_4 trains on Datasets: 1,3,4,5 computes loss on 2
Network_5 trains on Datasets: 2,3,4,5 computes loss on 1
Then comes epoch 2 and I do the exact same thing again:
Network_1 trains on Datasets: 1,2,3,4 computes loss on 5
Network_2 trains on Datasets: 1,2,3,5 computes loss on 4
Network_3 trains on Datasets: 1,2,4,5 computes loss on 3
Network_4 trains on Datasets: 1,3,4,5 computes loss on 2
Network_5 trains on Datasets: 2,3,4,5 computes loss on 1
For testing on Dataset 6, I should merge the predictions from all 5 networks and take the average of the predicted scores (I still have to do the averaging of the prediction matrices).
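(For the averaging I have something like this in mind, where preds_1 ... preds_5 are placeholder names for the softmax output matrices of the five networks on Dataset 6, each of shape (num_samples, num_classes):)

import torch

# Stack the five prediction matrices and take their element-wise mean,
# then pick the class with the highest averaged probability.
ensemble_pred = torch.stack([preds_1, preds_2, preds_3, preds_4, preds_5]).mean(dim=0)
predicted_classes = ensemble_pred.argmax(dim=1)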
Have i understood cross validation correctly? Is this how it's supposed to work? Will this work properly?
I put effort into not testing on data that I have already trained on, but I'm still not sure I've set this up correctly.
Would greatly appreciate the help :)
You can definitely apply cross validation to a neural network, but because neural networks are computationally demanding models, this is not usually done. To reduce variance, other techniques are ordinarily applied to neural networks instead, such as early stopping or dropout.
That being said, I am not sure you're applying it in the right way. You should train across all the epochs, so that:
Network_1 trains on Datasets: 1,2,3,4 up to the end of training. Then computes loss on 5
Network_2 trains on Datasets: 1,2,3,5 up to the end of training. Then computes loss on 4
Network_3 trains on Datasets: 1,2,4,5 up to the end of training. Then computes loss on 3
Network_4 trains on Datasets: 1,3,4,5 up to the end of training. Then computes loss on 2
Network_5 trains on Datasets: 2,3,4,5 up to the end of training. Then computes loss on 1
Once each network is trained up to the end of training (so across all the epochs) and validated on the left-out dataset (called the validation set), you can average the scores you obtained.
This score (and this is the real point of cross validation) gives you a fair estimate of your model's performance, which should not drop much when you test the model on the test set (the one you left out of training from the beginning).
Cross validation is usually used together with some form of grid search to produce an unbiased evaluation of the different models you want to compare. So if, for example, you want to compare NetworkA and NetworkB, which differ in some hyperparameters, you run cross validation for NetworkA, run it for NetworkB, and then keep the one with the higher cross-validation score as the final model.
As a last step, once you have decided which model is best, you usually retrain it on all the data you have in the train set (i.e. Datasets 1, 2, 3, 4, 5 in your case) and test this final model on the test set (Dataset 6).
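For concreteness, here is a minimal PyTorch-style sketch of that scheme; Net, make_loader, datasets and num_epochs are placeholders, not your actual code:

import torch

# datasets is a list of the five cross-validation Datasets, Net() builds a
# fresh network, make_loader() wraps a dataset in a DataLoader (all assumed).
fold_scores = []
for k in range(5):                                   # one fold per left-out dataset
    train_sets = [d for i, d in enumerate(datasets) if i != k]
    train_loader = make_loader(torch.utils.data.ConcatDataset(train_sets))
    val_loader = make_loader(datasets[k])

    model = Net()                                    # fresh network for each fold
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(num_epochs):                  # train to the end of training first
        for xb, yb in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()

    # only now compute the loss on the left-out (validation) dataset
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(xb), yb).item() for xb, yb in val_loader) / len(val_loader)
    fold_scores.append(val_loss)

cv_score = sum(fold_scores) / len(fold_scores)       # averaged cross-validation score
# Afterwards: retrain one model on all five datasets and evaluate it once on Dataset 6.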
I am new to machine learning. I have built a model that predicts whether a client will subscribe in the following month or not. I got 73.4% accuracy on the training set and 72.8% on the test set. Is that okay, or do I have overfitting?
It's ok.
Overfitting happens when the accuracy on the training set is considerably higher than the accuracy on the test set (more than a marginal difference).
This is what overfitting looks like.
Train accuracy: 99.4%
Test accuracy: 71.4%
You can, however, try to increase the accuracy by using different models and feature engineering.
We call it over-fitting if the accuracy on the training data is abnormally high (e.g. greater than 95%) and the accuracy on the test data is very low (e.g. less than 65%).
In your case, the training and testing accuracies are very similar, so there is no over-fitting.
Try with more test data and check whether the accuracy decreases or not. You can also try to improve the model by the following (a small combined sketch is shown after the list):
Trying different algorithms
Increasing the size of train data
Trying K-fold cross validation
Hyperparameter tuning
Using Regularization methods
Standardizing feature variables
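To illustrate a few of these together, here is a minimal scikit-learn sketch; X and y stand for your feature matrix and subscription labels, and the parameter grid is just an example:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Standardize the features, fit a regularized model, and tune the
# regularization strength C with 5-fold cross validation.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)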
When training in Caffe, there are Train and Test net outputs for each iteration. I know this is the loss. However, is this the average loss over my batch or the total loss? And is this the same for both Classification and Regression?
For example, if I were to have a batch of 100 training examples and my loss over that iteration is 100, does that mean that the average loss per example is 1?
Train loss is the averaged loss over the last training batch. That means that if you have 100 training examples in your mini-batch and the reported loss for that iteration is 100, then the average loss per example is 100 (not 1).
Test loss is also an averaged loss, but over all the test batches. You specify the test batch size and the number of testing iterations (test_iter). Caffe will take test_iter such mini-batches, evaluate the loss for each, and report the averaged value. If test_iter x batch_size == testset_size, you get a value averaged across the full test set.
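As a quick illustration of that last point, with assumed numbers rather than your actual configuration:

# If the TEST-phase batch size is 100 and the test set has 10,000 examples,
# setting test_iter to 10000 / 100 = 100 in the solver makes the reported
# test loss an average over the entire test set.
testset_size = 10000
test_batch_size = 100
test_iter = testset_size // test_batch_size
print(test_iter)  # 100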
I am using convolutional neural networks (via Keras) as my model for facial expression recognition (55 subjects). My dataset is quite hard: around 450k samples with 7 classes. I have balanced my training set per subject and per class label.
I implemented a very simple CNN architecture (with real-time data augmentation):
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout, Activation
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import PReLU

# borderMode, initialization and nb_output are set elsewhere in my script
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode=borderMode, init=initialization, input_shape=(48, 48, 3)))
model.add(BatchNormalization())
model.add(PReLU())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256))
model.add(BatchNormalization())
model.add(PReLU())
model.add(Dropout(0.5))
model.add(Dense(nb_output))
model.add(Activation('softmax'))
After the first epoch, my training loss decreases constantly while my validation loss increases. Could overfitting happen that soon? Or is there a problem with my data being confusing? Should I also balance my testing set?
It could be that the task is easy to solve and after one epoch the model has learned enough to solve it, and training for more epochs just increases overfitting.
But if you have balanced the train set and not the test set, what may be happening is that you are training for one task (expression recognition on evenly distributed data) and then you are testing on a slightly different task, because the test set is not balanced.
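One way to check this is to look at per-class metrics on the test set. A minimal sketch, assuming y_true and y_pred are the true and predicted class indices for your test data (scikit-learn is used here, separate from your Keras code):

from sklearn.metrics import classification_report, confusion_matrix

# Per-class precision/recall shows whether the model only performs well on
# the classes that dominate the unbalanced test set.
print(classification_report(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))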
I'm aware that "accuracy" isn't what is measured against the training set while a neural network is training, but I'd like to know, essentially,
what would happen if I stopped training now and evaluated the training set in terms of accuracy
at various points during training of a TensorFlow network being trained with dropout.
Can this question be answered simply by running with the training data and keep_prob == 1.0, that is with something like
sess.run(accuracy, feed_dict={x: train_x, y_: train_y, keep_prob: 1.0})
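For reference, the loop I have in mind looks roughly like this; next_batch, num_steps, eval_every and batch_size are placeholders for my own input pipeline, and train_step, accuracy, x, y_, keep_prob are ops/placeholders from my graph:

# TF1-style loop: train with dropout, but periodically evaluate training-set
# accuracy with dropout disabled (keep_prob = 1.0).
for step in range(num_steps):
    batch_x, batch_y = next_batch(batch_size)
    sess.run(train_step, feed_dict={x: batch_x, y_: batch_y, keep_prob: 0.5})
    if step % eval_every == 0:
        train_acc = sess.run(accuracy,
                             feed_dict={x: train_x, y_: train_y, keep_prob: 1.0})
        print(step, train_acc)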