Accuracy on 1st epoch - MNIST Deep Learning example - machine-learning

Im new to the world of Deep Learning and i would like to clarify something on my 1st Deep learning code, the MNIST example. Maybe also i'm completely wrong BTW so please take it easy :)
I have split the training data to batches, each one with a size of 50 and max epochs to 15 (or until the validation loss variable starts increasing).
I am getting 93% accuracy just on the 1st epoch, how is that possible if (as far as i know) on 1st epoch it has forward and backpropogate the complete training set just 1 time, so the training set have only abjust its weights and biases only once?
I thought i would get a fine accuracy after many epochs not just on 1st abjustance of the weights

Yes..you can get a good accuracy in the first epoch as well. It depends more on the complexity of the data and the model you build. sometimes if the learning rate is too high, than also it could so happen you get a higher training accuracy.
Also, coming to the adjusting weights and biases part, it could be a mini-batch training and for every mini-batch, the model updates the weights. So weights could have updated many times which is equal to number of training data images/ sample size

Related

Keras loss changes on epoch end

I am training a model in Keras (tensorflow2 backend) using the imagedatagenerator class to train on batches.
I have noticed that when the second epoch starts the loss value is really smaller than the value at the end of the first epoch.
Here is what I mean:
Keras loss
Note that the starting value in the second epoch is around the value that you are seeing in the screenshot.
Does anyone knows why that happens?
Does keras updates again the weights when all batches are processed ?
The loss is expected to be smaller, but your surprise to the extent of it is understandable.
The reason the second epoch has such a lower loss is because during the first epoch your model makes mistakes and yields great losses - which get better and better.
Keras displays the mean loss over all instances in an epoch.
So if the model made mistakes on the first 90% of the training set in epoch, and then was perfect for the last 10% of the data, the loss would still be very large because it's mean loss.
Then, at the start of 2nd epoch, model is already better at predicting, so the mean loss is lower.

Train Accuracy increases, Train loss is stable, Validation loss Increases, Validation Accuracy is low and increases

My neural network trainign in pytorch is getting very wierd.
I am training a known dataset that came splitted into train and validation.
I'm shuffeling the data during training and do data augmentation on the fly.
I have those results:
Train accuracy start at 80% and increases
Train loss decreases and stays stable
Validation accuracy start at 30% but increases slowly
Validation loss increases
I have the following graphs to show:
How can you explain that the validation loss increases and the validation accuracy increases?
How can be such a big difference of accuracy between validation and training sets? 90% and 40%?
Update:
I balanced the data set.
It is binary classification. It now has now 1700 examples from class 1, 1200 examples from class 2. Total 600 for validation and 2300 for training.
I still see similar behavior:
**Can it be becuase I froze the weights in part of the network?
**Can it be becuase the hyperparametrs like lr?
I found the solution:
I had different data augmentation for training set and validation set. Matching them also increased the validation accuracy!
If the training set is very large in comparison to the validation set, you are more likely to overfit and learn the training data, which would make generalizing the model very difficult. I see your training accuracy is at 0.98 and your validation accuracy increases at a very slow rate, which would imply that you have overfit your training data.
Try reducing the number of samples in your training set to improve how well your model generalizes to unseen data.
Let me answer your 2nd question first. High accuracy on training data and low accuracy on val/test data indicates the model might not generalize well to infer real cases. That is what the validation process is all about. You need to finetune or even rebuild your model.
With regard to the first question, val loss might not necessarily correspond to the val accuracy. The model makes the prediction based on its model, and loss function calculates the difference between probablities of matrix and the target if you are using CrossEntropy function.

Why does my neural network never overfit?

I am training a deep residual network with 10 hidden layers with game data.
Does anyone have an idea why I don't get any overfitting here?
Training and test loss still decreasing after 100 epochs of training.
https://imgur.com/Tf3DIZL
Just a couple of advice:
for deep learning is recommended to do even 90/10 or 95/5 splitting (Andrew Ng)
this small difference between curves means that your learning_rate is not tuned; try to increase it (and, probably, number of epochs if you will implement some kind of 'smart' lr-reduce)
it is also reasonable for DNN to try to overfit with the small amount of data (10-100 rows) and an enormous number of iterations
check for data leakage in the set: weights analysis inside each layer may help you in this

ResNet How to achieve accuracy as in the document?

I implement the ResNet for the cifar 10 in accordance with this document https://arxiv.org/pdf/1512.03385.pdf
But my accuracy is significantly different from the accuracy obtained in the document
My - 86%
Pcs daughter - 94%
What's my mistake?
https://github.com/slavaglaps/ResNet_cifar10
Your question is a little bit too generic, my opinion is that the network is over fitting to the training data set, as you can see the training loss is quite low, but after the epoch 50 the validation loss is not improving anymore.
I didn't read the paper in deep so I don't know how did they solved the problem but increasing regularization might help. The following link will point you in the right direction http://cs231n.github.io/neural-networks-3/
below I copied the summary of the text:
Summary
To train a Neural Network:
Gradient check your implementation with a small batch of data and be aware of the pitfalls.
As a sanity check, make sure your initial loss is reasonable, and that you can achieve 100% training accuracy on a very small portion of
the data
During training, monitor the loss, the training/validation accuracy, and if you’re feeling fancier, the magnitude of updates in relation to
parameter values (it should be ~1e-3), and when dealing with ConvNets,
the first-layer weights.
The two recommended updates to use are either SGD+Nesterov Momentum or Adam.
Decay your learning rate over the period of the training. For example, halve the learning rate after a fixed number of epochs, or
whenever the validation accuracy tops off.
Search for good hyperparameters with random search (not grid search). Stage your search from coarse (wide hyperparameter ranges,
training only for 1-5 epochs), to fine (narrower rangers, training for
many more epochs)
Form model ensembles for extra performance
I would argue that the difference in data pre processing makes the difference in performance. He is using padding and random crops, which in essence increases the amount of training samples and decreases the generalization error. Also as the previous poster said you are missing regularization features, such as the weight decay.
You should take another look at the paper and make sure you implement everything like they did.

Meaning of an Epoch in Neural Networks Training

while I'm reading in how to build ANN in pybrain, they say:
Train the network for some epochs. Usually you would set something
like 5 here,
trainer.trainEpochs( 1 )
I looked for what is that mean , then I conclude that we use an epoch of data to update weights, If I choose to train the data with 5 epochs as pybrain advice, the dataset will be divided into 5 subsets, and the wights will update 5 times as maximum.
I'm familiar with online training where the wights are updated after each sample data or feature vector, My question is how to be sure that 5 epochs will be enough to build a model and setting the weights probably? what is the advantage of this way on online training? Also the term "epoch" is used on online training, does it mean one feature vector?
One epoch consists of one full training cycle on the training set. Once every sample in the set is seen, you start again - marking the beginning of the 2nd epoch.
This has nothing to do with batch or online training per se. Batch means that you update once at the end of the epoch (after every sample is seen, i.e. #epoch updates) and online that you update after each sample (#samples * #epoch updates).
You can't be sure if 5 epochs or 500 is enough for convergence since it will vary from data to data. You can stop training when the error converges or gets lower than a certain threshold. This also goes into the territory of preventing overfitting. You can read up on early stopping and cross-validation regarding that.
sorry for reactivating this thread.
im new to neural nets and im investigating the impact of 'mini-batch' training.
so far, as i understand it, an epoch (as runDOSrun is saying) is a through use of all in the TrainingSet (not DataSet. because DataSet = TrainingSet + ValidationSet). in mini batch training, you can sub divide the TrainingSet into small Sets and update weights inside an epoch. 'hopefully' this would make the network 'converge' faster.
some definitions of neural networks are outdated and, i guess, must be redefined.
The number of epochs is a hyperparameter that defines the number times that the learning algorithm will work through the entire training dataset. One epoch means that each sample in the training dataset has had an opportunity to update the internal model parameters.

Resources