So I have seen differing implementations of cross validation.
I'm currently using PyTorch to train a neural network.
My current layout looks like this:
I have 6 discrete Datasets. 5 are used for cross validation.
Network_1 trains on Datasets: 1,2,3,4 computes loss on 5
Network_2 trains on Datasets: 1,2,3,5 computes loss on 4
Network_3 trains on Datasets: 1,2,4,5 computes loss on 3
Network_4 trains on Datasets: 1,3,4,5 computes loss on 2
Network_5 trains on Datasets: 2,3,4,5 computes loss on 1
Then comes epoch 2 and I do the exact same thing again:
Network_1 trains on Datasets: 1,2,3,4 computes loss on 5
Network_2 trains on Datasets: 1,2,3,5 computes loss on 4
Network_3 trains on Datasets: 1,2,4,5 computes loss on 3
Network_4 trains on Datasets: 1,3,4,5 computes loss on 2
Network_5 trains on Datasets: 2,3,4,5 computes loss on 1
For testing on Dataset 6 I would merge the predictions from all 5 networks and take the average score of the predictions (I still have to implement the averaging of the prediction matrices).
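Roughly, the averaging I have in mind would look like this (just a sketch; `networks` and `test_inputs` are placeholders for the 5 trained models and the Dataset 6 inputs):

    import torch

    # Sketch of the averaging step: `networks` holds the 5 trained models,
    # `test_inputs` the inputs of Dataset 6 (both placeholders).
    with torch.no_grad():
        # Each network produces a [num_samples, num_classes] probability matrix.
        prediction_matrices = [torch.softmax(net(test_inputs), dim=1) for net in networks]
        # Average the 5 matrices element-wise and take the most probable class.
        averaged = torch.stack(prediction_matrices).mean(dim=0)
        predicted_classes = averaged.argmax(dim=1)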
Have I understood cross validation correctly? Is this how it's supposed to work? Will this work properly?
I put effort into not testing on data that I have already trained on. I still don't
Would greatly appreciate the help :)
You can definitely apply cross validation to neural networks, but because neural networks are computationally demanding models this is not usually done. To reduce variance, other techniques are ordinarily applied to neural networks instead, such as early stopping or dropout.
That being said, I am not sure you're applying it in the right way. You should train across all the epochs, so that:
Network_1 trains on Datasets: 1,2,3,4 up to the end of training. Then computes loss on 5
Network_2 trains on Datasets: 1,2,3,5 up to the end of training. Then computes loss on 4
Network_3 trains on Datasets: 1,2,4,5 up to the end of training. Then computes loss on 3
Network_4 trains on Datasets: 1,3,4,5 up to the end of training. Then computes loss on 2
Network_5 trains on Datasets: 2,3,4,5 up to the end of training. Then computes loss on 1
Once each network has been trained to the end of training (so across all the epochs) and validated on its left-out dataset (called the validation dataset), you can average the scores you obtained.
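In PyTorch terms (since that is what you are using), the whole scheme looks roughly like this; `make_model`, `train_to_completion` and `evaluate` are placeholders for your own code:

    from torch.utils.data import ConcatDataset, DataLoader

    def cross_validate(make_model, train_to_completion, evaluate, datasets, epochs=30):
        """Sketch of the scheme above: each fold gets a fresh network, trains on the
        other 4 datasets for ALL epochs, and only then is scored on its held-out set."""
        fold_scores = []
        for i, held_out in enumerate(datasets):
            train_data = ConcatDataset([d for j, d in enumerate(datasets) if j != i])
            model = make_model()  # a new, untrained network for every fold
            train_to_completion(model, DataLoader(train_data, batch_size=64, shuffle=True), epochs)
            fold_scores.append(evaluate(model, DataLoader(held_out, batch_size=64)))
        return sum(fold_scores) / len(fold_scores)  # the cross-validation score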
This score (and indeed the real point of cross validation) gives you a fair evaluation of your model, one that should not drop when you test the model on the test set (the one you left out of training from the beginning).
Cross validation is usually used together with some form of grid search to produce an unbiased evaluation of the different models you want to compare. So if, for example, you want to compare NetworkA and NetworkB, which differ with respect to some parameters, you run cross validation for NetworkA, run cross validation for NetworkB, and then take the one with the highest cross validation score as the final model.
As a last step, once you have decided which model is best, you usually retrain it on all the data you have in the train set (i.e. Datasets 1,2,3,4,5 in your case) and test this final model on the test set (Dataset 6).
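Continuing the sketch above with the same placeholder helpers, model selection and the final test would look something like this:

    from torch.utils.data import ConcatDataset, DataLoader

    # Pick the architecture with the best cross-validation score, retrain it from
    # scratch on all of Datasets 1-5, and evaluate it once on Dataset 6.
    # (`make_network_a`, `make_network_b`, `datasets` and `dataset_6` are placeholders.)
    score_a = cross_validate(make_network_a, train_to_completion, evaluate, datasets)
    score_b = cross_validate(make_network_b, train_to_completion, evaluate, datasets)
    make_best = make_network_a if score_a >= score_b else make_network_b

    final_model = make_best()
    train_to_completion(final_model, DataLoader(ConcatDataset(datasets), batch_size=64, shuffle=True), 30)
    test_score = evaluate(final_model, DataLoader(dataset_6, batch_size=64))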
I am trying to do transfer learning with a ResNet50V2 model using the triplet loss function. I have kept include_top = False, input shape = (160,160,3), with ImageNet weights. The last 3 layers of my model are shown in the image below, with 6 million trainable parameters.
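For reference, the base setup is roughly the following (the real last 3 layers are the ones in the image; the head here is only a placeholder):

    import tensorflow as tf
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import ResNet50V2

    # Base as described: ImageNet weights, include_top=False, 160x160x3 input.
    base = ResNet50V2(include_top=False, weights="imagenet", input_shape=(160, 160, 3))

    # Placeholder embedding head; the actual last 3 layers are shown in the image.
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dense(256)(x)
    embeddings = layers.Lambda(lambda t: tf.math.l2_normalize(t, axis=1))(x)

    model = Model(base.input, embeddings)
    # The model is then compiled with a triplet loss implementation.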
During the training process, I could see the loss values decreasing from 7.6 to 0.8, but the accuracy did not improve. However, when I replace the model with VGG16 and train the last 3 layers, the accuracy improves from 50% to 90% and the loss decreases from 6.0 to 0.5.
Where am I going wrong? Is there anything specific I should look at while training a ResNet model? How should I train the ResNet model?
I have not checked the training accuracy and losses after training with both approaches.
If I understood your question correctly, it's a yes.
For example, I programmed my training to pick the best epoch and output that model state for the test set.
Let's say I train for 10 epochs. On every very "first time" (achieved by restarting the kernel) it always chooses epoch 9 or 10 as the best. But if I reuse the model and train for another 10 epochs, it usually chooses an epoch between 0 and 4 as the best, and the results are slightly better than before. That tells me the model is taking the first 10 epochs into account. These results are also consistent with training for 20 epochs in one go, where it chooses an epoch between 10 and 14 as the best.
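To make that concrete, "pick the best epoch" means something like the following (a sketch assuming Keras; `model` and the data variables are placeholders):

    from tensorflow.keras.callbacks import ModelCheckpoint

    # Keep only the weights from the epoch with the lowest validation loss,
    # then restore them before evaluating on the test set.
    # (`model`, `x_train`, `y_train`, `x_val`, `y_val`, `x_test`, `y_test` are placeholders.)
    checkpoint = ModelCheckpoint("best_epoch.weights.h5", monitor="val_loss",
                                 save_best_only=True, save_weights_only=True)
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              epochs=10, callbacks=[checkpoint])
    model.load_weights("best_epoch.weights.h5")
    model.evaluate(x_test, y_test)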
I'm new to the world of deep learning and I would like to clarify something about my first deep learning code, the MNIST example. Maybe I'm completely wrong, BTW, so please take it easy :)
I have split the training data into batches, each with a size of 50, and set the maximum number of epochs to 15 (or until the validation loss starts increasing).
I am getting 93% accuracy just on the 1st epoch. How is that possible if (as far as I know) on the 1st epoch the network has forward- and backpropagated the complete training set just once, so the weights and biases have been adjusted only once?
I thought I would only get good accuracy after many epochs, not just after the 1st adjustment of the weights.
Yes, you can get good accuracy in the first epoch as well. It depends largely on the complexity of the data and the model you build. Sometimes, if the learning rate is on the higher side, it can also happen that you reach a higher training accuracy early on.
Also, regarding the adjusting of weights and biases: this is mini-batch training, and the model updates the weights after every mini-batch. So within one epoch the weights are updated many times, roughly the number of training images divided by the batch size.
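For example, with the standard 60,000-image MNIST training set and the batch size of 50 mentioned in the question, the weights have already been updated 1,200 times by the end of the first epoch:

    # Number of weight updates during one epoch of mini-batch training,
    # assuming the standard 60,000-image MNIST training set and batch size 50.
    train_size = 60_000
    batch_size = 50
    updates_per_epoch = train_size // batch_size
    print(updates_per_epoch)  # 1200 gradient steps in epoch 1 alone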
I'm working on relation classification with the SemEval-2010 Task 8 dataset. The dataset is already split into 8,000 samples for training and 2,717 for testing. In order to be as fair as possible, I only evaluate my model on the test set once, at the very end, to compute its performance (F1-score).
In order to tune my convolutional neural network, I keep 6,400 samples for training and 1,600 for validation. I train the model and after each epoch (roughly 10 minutes of computation) I compute the F1-score of my predictions.
I read the paper http://page.mi.fu-berlin.de/prechelt/Biblio/stop_tricks1997.pdf and stop training when the validation performance has worsened over the last 3 evaluations (similar to the UP criterion in the paper). In the paper, they return the model corresponding to the best performance seen so far.
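Concretely, my stopping rule looks roughly like this (a sketch using PyTorch-style state dicts; `run_one_epoch` and `f1_on_validation` are placeholders for my own training and scoring code):

    import copy

    def train_with_early_stopping(model, run_one_epoch, f1_on_validation,
                                  max_epochs=100, patience=3):
        """Keep the weights from the epoch with the best validation F1 and stop
        after `patience` epochs without improvement (in the spirit of the UP
        criterion from the paper)."""
        best_f1, best_state, worse = -1.0, copy.deepcopy(model.state_dict()), 0
        for _ in range(max_epochs):
            run_one_epoch(model)              # one epoch on the 6,400 training samples
            f1 = f1_on_validation(model)      # F1-score on the 1,600 validation samples
            if f1 > best_f1:
                best_f1, best_state, worse = f1, copy.deepcopy(model.state_dict()), 0
            else:
                worse += 1
                if worse >= patience:
                    break
        model.load_state_dict(best_state)     # restore the best epoch seen so far
        return best_f1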
My question is: to be as accurate as possible, we would want all 8,000 samples for training. Is it correct to then retrain on the full training set up to the epoch that gave the best performance on the validation set, and then do the predictions? Or should we keep the model corresponding to the best validation performance and "waste" the 1,600 samples?