I've implemented a deep CNN and have this training log:
Iter 2300, Minibatch Loss 2535.55078125, Batch Accuracy 0.800000011920929
Test accuracy = 0.7236111164093018
Iter 2400, Minibatch Loss 2402.5517578125, Batch Accuracy 0.699999988079071
Test accuracy = 0.8097222182485794
Iter 2500, Minibatch Loss 1642.6527099609375, Batch Accuracy 0.8999999761581421
Test accuracy = 0.8311110999849107
Iter 2600, Minibatch Loss 4008.334716796875, Batch Accuracy 0.8999999761581421
Test accuracy = 0.8463888929949868
Iter 2700, Minibatch Loss 2555.335205078125, Batch Accuracy 0.800000011920929
Test accuracy = 0.8077777789698706
Iter 2800, Minibatch Loss 1188.008056640625, Batch Accuracy 0.8999999761581421
Test accuracy = 0.8074999981456332
Iter 2900, Minibatch Loss 426.5060119628906, Batch Accuracy 0.8999999761581421
Test accuracy = 0.7513888908757105
Iter 3000, Minibatch Loss 5560.1845703125, Batch Accuracy 0.699999988079071
Test accuracy = 0.8733333349227907
Iter 3100, Minibatch Loss 3904.02490234375, Batch Accuracy 0.8999999761581421
Test accuracy = 0.817222214407391
Iter 3110, Minibatch Loss 9638.71875, Batch Accuracy 0.8333333134651184
Test accuracy = 0.8238888879617057
My question is: should I wait until training finishes for some reason, or can I stop as soon as the test accuracy is at its highest? Here that would be 0.8733333349227907.
You can stop when the test accuracy stops increasing or starts decreasing. This is called early stopping and is straightforward to implement; XGBoost, Keras, and many other libraries offer it as an option: https://keras.io/callbacks/#earlystopping
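For example, in Keras early stopping is just a callback (a minimal sketch; model, Xtrn, ytrn, Xval, yval are placeholders for your own model and data):
from keras.callbacks import EarlyStopping

# Stop once the validation loss has not improved for 5 consecutive epochs.
early_stop = EarlyStopping(monitor='val_loss', patience=5, verbose=1)

model.fit(Xtrn, ytrn,
          epochs=100,
          validation_data=(Xval, yval),
          callbacks=[early_stop])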
Try plotting the intermediate values; it will give you important insight into the training process. Please see http://cs231n.github.io/neural-networks-3/#accuracy.
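For example, if you append the logged values to lists inside your training loop, a quick matplotlib plot shows the curves (a sketch; iters, train_acc and test_acc are lists you would fill yourself):
import matplotlib.pyplot as plt

# iters, train_acc, test_acc are filled during the training loop,
# e.g. iters.append(step); train_acc.append(batch_acc); test_acc.append(acc)
plt.plot(iters, train_acc, label='batch accuracy')
plt.plot(iters, test_acc, label='test accuracy')
plt.xlabel('iteration')
plt.ylabel('accuracy')
plt.legend()
plt.show()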
I am training a model in Keras as follows:
model.fit(Xtrn, ytrn, batch_size=16, epochs=50, verbose=1, shuffle=True,
          callbacks=[model_checkpoint], validation_data=(Xval, yval))
The fitting output looks as follows:
As shown in the model.fit call, I have a batch size of 16 and, as shown in the output, a total of 8000 training samples. So from my understanding, a training step takes place every 16 samples (one batch), which also means the training step is run 500 times within a single epoch (i.e., 8000/16 = 500).
So let's take the training accuracy printed in the output for Epoch 1/50, which in this case is 0.9381. I would like to know how this training accuracy of 0.9381 is derived.
Is it:
the mean training accuracy, taken as the average over the 500 batch updates performed during the epoch?
OR
the best (i.e., maximum) training accuracy out of the 500 batch updates?
Take a look at the BaseLogger in Keras where they're computing a running mean.
For each epoch, the reported accuracy is the average over all batches seen so far in that epoch, weighted by batch size.
class BaseLogger(Callback):
    """Callback that accumulates epoch averages of metrics.

    This callback is automatically applied to every Keras model.
    """

    def on_epoch_begin(self, epoch, logs=None):
        self.seen = 0
        self.totals = {}

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        batch_size = logs.get('size', 0)
        self.seen += batch_size

        for k, v in logs.items():
            if k in self.totals:
                self.totals[k] += v * batch_size
            else:
                self.totals[k] = v * batch_size

    def on_epoch_end(self, epoch, logs=None):
        if logs is not None:
            for k in self.params['metrics']:
                if k in self.totals:
                    # Make value available to next callbacks.
                    logs[k] = self.totals[k] / self.seen
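So the epoch-level metric is a batch-size-weighted average of the per-batch values, not a maximum. A tiny illustration with made-up batch metrics (the numbers are hypothetical):
# Three batches with their sizes and per-batch accuracies (hypothetical values).
batch_sizes = [16, 16, 8]
batch_accs = [0.90, 0.95, 1.00]

totals = sum(a * s for a, s in zip(batch_accs, batch_sizes))
seen = sum(batch_sizes)
epoch_acc = totals / seen  # (0.90*16 + 0.95*16 + 1.00*8) / 40 = 0.94
print(epoch_acc)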
While training a convolutional neural network following this article, the accuracy on the training set increases a lot while the accuracy on the test set plateaus.
Below is an example with 6400 training examples, randomly chosen at each epoch (so some examples might have been seen in previous epochs, some might be new), and the same 6400 test examples.
For a bigger data set (64000 or 100000 training examples), the increase in training accuracy is even more abrupt, reaching 98% by the third epoch.
I also tried using the same 6400 training examples each epoch, just randomly shuffled. As expected, the result is worse.
epoch 3 loss 0.54871 acc 79.01
learning rate 0.1
nr_test_examples 6400
TEST epoch 3 loss 0.60812 acc 68.48
nr_training_examples 6400
tb 91
epoch 4 loss 0.51283 acc 83.52
learning rate 0.1
nr_test_examples 6400
TEST epoch 4 loss 0.60494 acc 68.68
nr_training_examples 6400
tb 91
epoch 5 loss 0.47531 acc 86.91
learning rate 0.05
nr_test_examples 6400
TEST epoch 5 loss 0.59846 acc 68.98
nr_training_examples 6400
tb 91
epoch 6 loss 0.42325 acc 92.17
learning rate 0.05
nr_test_examples 6400
TEST epoch 6 loss 0.60667 acc 68.10
nr_training_examples 6400
tb 91
epoch 7 loss 0.38460 acc 95.84
learning rate 0.05
nr_test_examples 6400
TEST epoch 7 loss 0.59695 acc 69.92
nr_training_examples 6400
tb 91
epoch 8 loss 0.35238 acc 97.58
learning rate 0.05
nr_test_examples 6400
TEST epoch 8 loss 0.60952 acc 68.21
This is my model (I'm using a ReLU activation after each convolution); a rough Keras sketch of the same stack follows the list:
conv 5x5 (1, 64)
max-pooling 2x2
dropout
conv 3x3 (64, 128)
max-pooling 2x2
dropout
conv 3x3 (128, 256)
max-pooling 2x2
dropout
conv 3x3 (256, 128)
dropout
fully_connected(18*18*128, 128)
dropout
output(128, 128)
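Roughly, the same stack in Keras notation (just a sketch for readability; my actual code is plain TensorFlow, and the input size, number of channels, dropout rates, and output activation here are assumptions; with 176x176 single-channel inputs and 'valid' padding the flattened size works out to 18*18*128 as above):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(64, (5, 5), activation='relu', input_shape=(176, 176, 1)),  # assumed input size
    MaxPooling2D((2, 2)),
    Dropout(0.5),                      # dropout rates assumed
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.5),
    Conv2D(256, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.5),
    Conv2D(128, (3, 3), activation='relu'),
    Dropout(0.5),
    Flatten(),                         # 18 * 18 * 128 features
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(128, activation='softmax'),  # 128 output units, softmax assumed
])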
What could be the cause?
I'm using Momentum Optimizer with learning rate decay:
batch = tf.Variable(0, trainable=False)
train_size = 6400
learning_rate = tf.train.exponential_decay(
    0.1,                 # Base learning rate.
    batch * batch_size,  # Current index into the dataset.
    train_size * 5,      # Decay step.
    0.5,                 # Decay rate.
    staircase=True)
# Use simple momentum for the optimization.
optimizer = tf.train.MomentumOptimizer(learning_rate,
                                       0.9).minimize(cost, global_step=batch)
This is very much expected. The problem is called over-fitting: your model starts "memorizing" the training examples without actually learning anything useful for the test set. In fact, this is exactly why we use a test set in the first place: with a complex enough model we can always fit the training data perfectly, even if not meaningfully. The test set is what tells us what the model has actually learned.
It's also useful to have a validation set, which is like a test set but is used to decide when to stop training: when the validation error stops decreasing, you stop training. Why not use the test set for this? The test set is there to tell you how well your model would do in the real world. If you start using information from the test set to make choices about your training process, it's like cheating, and you will be punished by your test error no longer representing your real-world error.
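For example, a validation set can simply be carved out of the training data (a sketch using scikit-learn; X and y stand for your training arrays):
from sklearn.model_selection import train_test_split

# Hold out 20% of the training data as a validation set;
# the test set stays untouched until the very end.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)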
Lastly, convolutional neural networks are notorious for their ability to over-fit. It has been shown that conv-nets can reach zero training error even if you shuffle the labels, and even on random pixels. That means there does not have to be a real pattern for a conv-net to learn to represent it, which in turn means you have to regularize a conv-net, i.e., use things like dropout, batch normalization, and early stopping.
I'll leave a few links if you want to read more:
Over-fitting, validation, early stopping
https://elitedatascience.com/overfitting-in-machine-learning
Conv-nets fitting random labels:
https://arxiv.org/pdf/1611.03530.pdf
(this paper is a bit advanced, but it's interesting to skim through)
P.S. To actually improve your test accuracy, you will need to change your model or train with data augmentation. You might also want to try transfer learning.
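For example, with Keras a data-augmentation pipeline only takes a few lines (a sketch; x_train, y_train, x_val, y_val are placeholders, and the transformation ranges are arbitrary and should be tuned to your data):
from keras.preprocessing.image import ImageDataGenerator

# Randomly perturb the training images on the fly.
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32,
                    epochs=50,
                    validation_data=(x_val, y_val))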
How do you compute the training accuracy for SGD? Do you compute it using the batch data you trained your network with, or using the entire dataset (for each batch optimization iteration)?
I tried computing the training accuracy for each iteration using the batch data I trained my network with, and it almost always gives me 100% training accuracy (sometimes 100%, 90%, 80%; always multiples of 10%, and the very first iteration already gave me 100%). Is this because I am computing the accuracy on the same batch data I trained on for that iteration? Or is my model overfitting so badly that it reaches 100% instantly while the validation accuracy stays low? (This is the main question: whether this is acceptable, or whether there is something wrong with the model.)
Here are the hyperparameters I used.
batch_size = 64
kernel_size = 60 #from 60 #optimal 2
depth = 15 #from 60 #optimal 15
num_hidden = 1000 #from 1000 #optimal 80
learning_rate = 0.0001
training_epochs = 8
total_batches = train_x.shape[0] // batch_size
Calculating the training accuracy on the batch data during the training process is correct. If the accuracy is always a multiple of 10%, then most likely it is because your batch size is 10: for example, if 8 of those training outputs match the labels, your training accuracy is 80% (see the short sketch after this list). If the training accuracy goes up and down, there are two main possibilities:
1. If you print out the accuracy numbers multiple times over one epoch, this is normal, especially at the early stage of training, because the model is predicting on different data samples;
2. If you print out the accuracy once per epoch, and the training accuracy goes up and down during the later stage of training, your learning rate is too big; you need to decrease it over time during training.
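Concretely, the per-batch training accuracy is just the fraction of predictions in that batch that match the labels, so with 10 samples it can only be a multiple of 10% (a NumPy sketch; preds and labels are placeholders for one batch of one-hot predictions and targets):
import numpy as np

# preds and labels: arrays of shape (batch_size, num_classes) for one batch.
correct = np.argmax(preds, axis=1) == np.argmax(labels, axis=1)
batch_accuracy = correct.mean()  # e.g. 8 correct out of 10 -> 0.8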
If these do not answer your question, please provide more details so that we can help.
I am training AlexNet on my own data using Caffe. One issue I see is that the "Train net output" loss and the "iteration loss" are nearly the same during training. Moreover, this loss fluctuates.
like:
...
...Iteration 900, loss 0.649719
... Train net output #0: loss = 0.649719 (* 1 = 0.649719 loss )
... Iteration 900, lr = 0.001
...Iteration 1000, loss 0.892498
... Train net output #0: loss = 0.892498 (* 1 = 0.892498 loss )
... Iteration 1000, lr = 0.001
...Iteration 1100, loss 0.550938
... Train net output #0: loss = 0.550944 (* 1 = 0.550944 loss )
... Iteration 1100, lr = 0.001
...
Should I expect this fluctuation?
As you can see, the difference between the reported losses is not significant. Does this indicate a problem with my training?
My solver is:
net: "/train_val.prototxt"
test_iter: 1999
test_interval: 10441
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 100
max_iter: 208820
momentum: 0.9
weight_decay: 0.0005
snapshot: 10441
snapshot_prefix: "/caffe_alexnet_train"
solver_mode: GPU
Caffe uses the Stochastic Gradient Descent (SGD) method to train the net. In the long run the loss decreases; locally, however, it is perfectly normal for the loss to fluctuate a bit.
The reported "iteration loss" is the weighted sum of all loss layers of your net, averaged over the last average_loss iterations. The reported "Train net output ...", on the other hand, shows each net output for the current iteration only.
In your example, you did not set average_loss in your solver, so average_loss=1 by default. Since you only have one loss output with loss_weight=1, the reported "Train net output ..." and "iteration loss" are the same (up to display precision).
To conclude: your output is perfectly normal.
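If you want the displayed number to fluctuate less, you can set average_loss to a larger value in the solver; its effect is essentially a running mean over the last N iteration losses, which you can also reproduce offline when inspecting a log (a sketch; iteration_losses is a placeholder for values parsed from your log):
import numpy as np

# iteration_losses: per-iteration loss values parsed from the training log.
window = 100  # plays the role of average_loss
smoothed = np.convolve(iteration_losses, np.ones(window) / window, mode='valid')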
I am training on my data set using Caffe. I set (in solver.prototxt):
test_iter: 1000
test_interval: 1000
max_iter: 450000
base_lr: 0.0001
lr_policy: "step"
stepsize: 100000
At the first test, the test accuracy is around 0.02 and the test loss is around 1.6. Then the test accuracy increases and the test loss decreases at every test.
At iteration 32000 the test accuracy is 1 and the test loss is 0.45.
After that, the accuracy decreases and the loss increases.
I think the loss is too large for an accuracy of 1.
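For example (a rough sanity check, assuming a softmax/cross-entropy loss), a test loss of about 0.45 with accuracy 1 would mean the probability assigned to the correct class is on average only around exp(-0.45) ≈ 0.64, i.e., every prediction can be correct yet not very confident:
import numpy as np

# Average correct-class probability corresponding to a cross-entropy loss of 0.45.
print(np.exp(-0.45))  # ~0.64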
How do I know whether the result I got is good or not?
Is there a method I can use to evaluate it?