I have been reading up on the log output Restarting data prefetching from start. Apparently, it means that there is not enough data and the data is prefetched from the start. However, my dataset consists of 10,000 data samples and my batch size is 4. How is it possible that it has to prefetch the data from the start, since my batch size is 4, which means it takes 4 data samples per iteration? Can anyone clarify my understanding?
LOG:
I0409 20:33:35.053406 20072 data_layer.cpp:73] Restarting data prefetching from start.
I0409 20:33:35.053447 20074 data_layer.cpp:73] Restarting data prefetching from start.
I0409 20:33:40.320605 20074 data_layer.cpp:73] Restarting data prefetching from start.
I0409 20:33:40.320598 20072 data_layer.cpp:73] Restarting data prefetching from start.
I0409 20:33:45.591019 20072 data_layer.cpp:73] Restarting data prefetching from start.
I0409 20:33:45.591047 20074 data_layer.cpp:73] Restarting data prefetching from start.
I0409 20:33:49.392580 20034 solver.cpp:398] Test net output #0: loss = nan (* 1 = nan loss)
I0409 20:33:49.780678 20034 solver.cpp:219] Iteration 0 (-4.2039e-45 iter/s, 20.1106s/100 iters), loss = 54.0694
I0409 20:33:49.780731 20034 solver.cpp:238] Train net output #0: loss = 54.0694 (* 1 = 54.0694 loss)
I0409 20:33:49.780750 20034 sgd_solver.cpp:105] Iteration 0, lr = 0.0001
I0409 20:34:18.812854 20034 solver.cpp:219] Iteration 100 (3.44442 iter/s, 29.0325s/100 iters), loss = 21.996
I0409 20:34:18.813213 20034 solver.cpp:238] Train net output #0: loss = 21.996 (* 1 = 21.996 loss)
If you have 10,000 samples and you process them in batches of size 4, then after 10,000/4 = 2,500 iterations you will have processed all your data, and caffe will start reading it again from the beginning.
BTW, going over all the samples once is also referred to as an "epoch".
After every epoch caffe prints to the log:
Restarting data prefetching from start
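The arithmetic is easy to check; here is a minimal sketch using the numbers from the question (10,000 samples, batch size 4):

```python
dataset_size = 10_000  # samples in the question
batch_size = 4

# One epoch = one full pass over the data; after this many
# iterations caffe wraps around and restarts prefetching.
iterations_per_epoch = dataset_size // batch_size
print(iterations_per_epoch)  # 2500
```

So the "Restarting data prefetching" line in the log simply marks an epoch boundary, not a data shortage.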
I'm trying to train a small CNN from scratch to classify images of 10 different animal species. The images have different dimensions, but I'd say around 300x300. Anyway, every image is resized to 224x224 before going into the model.
Here is the network I'm training:
# Convolution 1
self.cnn1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=0)
self.relu1 = nn.ReLU()
# Max pool 1
self.maxpool1 = nn.MaxPool2d(kernel_size=2)
# Convolution 2
self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=0)
self.relu2 = nn.ReLU()
# Max pool 2
self.maxpool2 = nn.MaxPool2d(kernel_size=2)
# Fully connected 1 (32 * 54 * 54: 224 -> conv3 -> 222 -> pool -> 111 -> conv3 -> 109 -> pool -> 54)
self.fc1 = nn.Linear(32 * 54 * 54, 10)
I'm using an SGD optimizer with a fixed learning rate of 0.005 and weight decay of 0.01, together with cross-entropy loss.
The accuracy of the model is good (around 99% after the 43rd epoch). However:
in some epochs I get a nan as the training loss
in some other epochs the accuracy drops significantly (sometimes the two happen in the same epoch); in the next epoch, however, the accuracy returns to its normal level.
If I understood correctly, a nan in the training loss is most often caused by gradient values getting too small (underflow) or too big (overflow). Could this be the case?
Should I try increasing the weight decay to 0.05? Or should I use gradient clipping to avoid exploding gradients? If so, what would be a reasonable bound?
Still I don't understand the second issue.
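For context on the gradient-clipping option: the global-norm rule that torch.nn.utils.clip_grad_norm_ applies can be sketched in plain Python (the helper name and gradient values below are made up for illustration):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale a list of gradient values so their global L2 norm
    is at most max_norm (the rule behind clip_grad_norm_)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)  # norm was 5.0
print(clipped)  # ≈ [0.6, 0.8]
```

In a real PyTorch training loop one would call torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm) between loss.backward() and optimizer.step().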
I am new to PyTorch and I'm trying to build a simple neural net for classification. The problem is that the network doesn't learn at all. I tried various learning rates ranging from 0.3 to 1e-8 and I also tried training for a longer duration. My data is small, with only 120 training examples, and the batch size is 16. Here is the code:
Define network
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4999, 1000),
                      nn.ReLU(),
                      nn.Linear(1000, 200),
                      nn.ReLU(),
                      nn.Linear(200, 1),
                      nn.Sigmoid())
Loss and optimizer
import torch.optim as optim
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.BCELoss(reduction="mean")
Training
num_epochs = 100
for epoch in range(num_epochs):
    cumulative_loss = 0
    for i, data in enumerate(batch_gen(X_train, y_train, batch_size=16)):
        inputs, labels = data
        inputs = torch.from_numpy(inputs).float()
        labels = torch.from_numpy(labels).float()
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        cumulative_loss += loss.item()
        if i % 5 == 0 and i != 0:
            print(f"epoch {epoch} batch {i} => Loss: {cumulative_loss/5}")
print("Finished Training!!")
Any help is appreciated!
The reason your loss doesn't seem to decrease every epoch is that you're not printing it once per epoch: you're actually printing it every 5th batch, and the loss does not decrease much per batch.
Try the following; here the loss is printed once per epoch.
num_epochs = 100
for epoch in range(num_epochs):
    cumulative_loss = 0
    for i, data in enumerate(batch_gen(X_train, y_train, batch_size=16)):
        inputs, labels = data
        inputs = torch.from_numpy(inputs).float()
        labels = torch.from_numpy(labels).float()
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        cumulative_loss += loss.item()
    print(f"epoch {epoch} => Loss: {cumulative_loss}")
print("Finished Training!!")
One reason your loss doesn't decrease could be that your neural net isn't deep enough to learn anything. So, try adding more layers.
model = nn.Sequential(nn.Linear(4999, 3000),
                      nn.ReLU(),
                      nn.Linear(3000, 2000),
                      nn.ReLU(),
                      nn.Linear(2000, 1000),
                      nn.ReLU(),
                      nn.Linear(1000, 250),
                      nn.ReLU(),
                      nn.Linear(250, 1),
                      nn.Sigmoid())
Also, I just noticed you're passing data with very high dimensionality: you have 4999 features/columns but only 120 training examples/rows. Getting a model to converge with so little data is next to impossible, given such high-dimensional inputs.
I'd suggest you try finding more rows, or perform dimensionality reduction on your input data (e.g. PCA) to shrink the feature space (to maybe 50-100 features or fewer), and then try again. Chances are your model still won't converge, but it's worth a try.
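A minimal sketch of the PCA step via NumPy's SVD, using the question's 120 × 4999 shape with random stand-in data (the pca_reduce helper is illustrative; sklearn.decomposition.PCA does the same with less code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4999))  # stand-in for the question's data

def pca_reduce(X, n_components):
    """Project X onto its top n_components principal directions."""
    X_centered = X - X.mean(axis=0)
    # Economy SVD: the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

X_reduced = pca_reduce(X, n_components=50)
print(X_reduced.shape)  # (120, 50)
```

The reduced matrix can then be fed to the network in place of the raw 4999-dimensional rows (with the first nn.Linear resized accordingly).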
While training a convolutional neural network following this article, the accuracy on the training set keeps increasing while the accuracy on the test set levels off.
Below is an example with 6400 training examples, randomly chosen at each epoch (so some examples might be seen at the previous epochs, some might be new), and 6400 same test examples.
For a bigger data set (64,000 or 100,000 training examples), the increase in training accuracy is even more abrupt, reaching 98% by the third epoch.
I also tried using the same 6400 training examples each epoch, just randomly shuffled. As expected, the result is worse.
epoch 3 loss 0.54871 acc 79.01
learning rate 0.1
nr_test_examples 6400
TEST epoch 3 loss 0.60812 acc 68.48
nr_training_examples 6400
tb 91
epoch 4 loss 0.51283 acc 83.52
learning rate 0.1
nr_test_examples 6400
TEST epoch 4 loss 0.60494 acc 68.68
nr_training_examples 6400
tb 91
epoch 5 loss 0.47531 acc 86.91
learning rate 0.05
nr_test_examples 6400
TEST epoch 5 loss 0.59846 acc 68.98
nr_training_examples 6400
tb 91
epoch 6 loss 0.42325 acc 92.17
learning rate 0.05
nr_test_examples 6400
TEST epoch 6 loss 0.60667 acc 68.10
nr_training_examples 6400
tb 91
epoch 7 loss 0.38460 acc 95.84
learning rate 0.05
nr_test_examples 6400
TEST epoch 7 loss 0.59695 acc 69.92
nr_training_examples 6400
tb 91
epoch 8 loss 0.35238 acc 97.58
learning rate 0.05
nr_test_examples 6400
TEST epoch 8 loss 0.60952 acc 68.21
This is my model (I'm using a ReLU activation after each convolution):
conv 5x5 (1, 64)
max-pooling 2x2
dropout
conv 3x3 (64, 128)
max-pooling 2x2
dropout
conv 3x3 (128, 256)
max-pooling 2x2
dropout
conv 3x3 (256, 128)
dropout
fully_connected(18*18*128, 128)
dropout
output(128, 128)
What could be the cause?
I'm using Momentum Optimizer with learning rate decay:
batch = tf.Variable(0, trainable=False)
train_size = 6400

learning_rate = tf.train.exponential_decay(
    0.1,                 # Base learning rate.
    batch * batch_size,  # Current index into the dataset.
    train_size * 5,      # Decay step.
    0.5,                 # Decay rate.
    staircase=True)

# Use simple momentum for the optimization.
optimizer = tf.train.MomentumOptimizer(learning_rate,
                                       0.9).minimize(cost, global_step=batch)
This is very much expected; the problem is called over-fitting. It occurs when your model starts "memorizing" the training examples without learning anything that generalizes to the test set. In fact, this is exactly why we use a test set in the first place: with a complex enough model we can always fit the training data perfectly, even if not meaningfully. The test set tells us what the model has actually learned.
It's also useful to have a validation set, which is like a test set, except you use it to decide when to stop training: when the validation error stops decreasing, you stop. Why not use the test set for this? The test set is there to estimate how well your model would do in the real world. If you start using information from the test set to make choices about your training process, it's like cheating, and you will be punished by your test error no longer representing your real-world error.
Lastly, convolutional neural networks are notorious for their ability to over-fit. It has been shown that conv-nets can reach zero training error even if you shuffle the labels or even randomize the pixels. That means there doesn't have to be a real pattern for a conv-net to learn to represent it. This is why you have to regularize a conv-net, with techniques like dropout, batch normalization, and early stopping.
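As an illustration of one of those regularizers, here is the inverted-dropout rule in plain Python (the dropout helper below is a sketch; torch's nn.Dropout applies the same mask-and-rescale during training):

```python
import random

def dropout(activations, p, training=True):
    """Inverted dropout: zero each unit with probability p and
    rescale the survivors by 1/(1-p) so the expected value is unchanged."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if random.random() < keep else 0.0 for a in activations]

random.seed(0)
out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5)
# Each surviving value is doubled (1 / (1 - 0.5)); the rest are zeroed.
```

At test time (training=False) dropout is a no-op, which is why the rescaling is done during training rather than at inference.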
I'll leave a few links if you want to read more:
Over-fitting, validation, early stopping
https://elitedatascience.com/overfitting-in-machine-learning
Conv-nets fitting random labels:
https://arxiv.org/pdf/1611.03530.pdf
(this paper is a bit advanced, but it's interesting to skim through)
P.S. To actually improve your test accuracy you will need to change your model or train with data augmentation. You might want to try transfer learning as well.
I've implemented a deep CNN and get this log:
Iter 2300, Minibatch Loss 2535.55078125, Batch Accuracy 0.800000011920929
Test accuracy = 0.7236111164093018
Iter 2400, Minibatch Loss 2402.5517578125, Batch Accuracy 0.699999988079071
Test accuracy = 0.8097222182485794
Iter 2500, Minibatch Loss 1642.6527099609375, Batch Accuracy 0.8999999761581421
Test accuracy = 0.8311110999849107
Iter 2600, Minibatch Loss 4008.334716796875, Batch Accuracy 0.8999999761581421
Test accuracy = 0.8463888929949868
Iter 2700, Minibatch Loss 2555.335205078125, Batch Accuracy 0.800000011920929
Test accuracy = 0.8077777789698706
Iter 2800, Minibatch Loss 1188.008056640625, Batch Accuracy 0.8999999761581421
Test accuracy = 0.8074999981456332
Iter 2900, Minibatch Loss 426.5060119628906, Batch Accuracy 0.8999999761581421
Test accuracy = 0.7513888908757105
Iter 3000, Minibatch Loss 5560.1845703125, Batch Accuracy 0.699999988079071
Test accuracy = 0.8733333349227907
Iter 3100, Minibatch Loss 3904.02490234375, Batch Accuracy 0.8999999761581421
Test accuracy = 0.817222214407391
Iter 3110, Minibatch Loss 9638.71875, Batch Accuracy 0.8333333134651184
Test accuracy = 0.8238888879617057
My question is: should I wait for training to finish, or can I stop when the test accuracy is at its highest? Here that is 0.8733333349227907.
You can stop when the test accuracy stops increasing or starts decreasing. This is called early stopping and is straightforward to implement. XGBoost, Keras and many libraries have this functionality as an option: https://keras.io/callbacks/#earlystopping
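A minimal sketch of the patience rule behind those callbacks, on made-up accuracy values (real implementations also restore the best weights; the early_stop_index helper is illustrative):

```python
def early_stop_index(accuracies, patience=2):
    """Return the evaluation index at which training would stop:
    when accuracy has not improved on its best value for
    `patience` consecutive evaluations."""
    best, best_i = float("-inf"), 0
    for i, acc in enumerate(accuracies):
        if acc > best:
            best, best_i = acc, i
        elif i - best_i >= patience:
            return i
    return len(accuracies) - 1

# Accuracy peaks at 0.87, then fails to improve for two checks in a row:
accs = [0.72, 0.81, 0.83, 0.87, 0.81, 0.82]
print(early_stop_index(accs))  # 5  (best value was seen at index 3)
```

One would then keep the snapshot saved at the best index rather than the final weights.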
Try plotting the intermediate values; it will give you important insights into the training process. Please see http://cs231n.github.io/neural-networks-3/#accuracy.
I am training AlexNet on my own data using caffe. One of the issues I see is that the "Train net output" loss and the "iteration loss" are nearly the same during training. Moreover, this loss fluctuates.
like:
...
...Iteration 900, loss 0.649719
... Train net output #0: loss = 0.649719 (* 1 = 0.649719 loss )
... Iteration 900, lr = 0.001
...Iteration 1000, loss 0.892498
... Train net output #0: loss = 0.892498 (* 1 = 0.892498 loss )
... Iteration 1000, lr = 0.001
...Iteration 1100, loss 0.550938
... Train net output #0: loss = 0.550944 (* 1 = 0.550944 loss )
... Iteration 1100, lr = 0.001
...
Should I see this fluctuation?
As you can see, the difference between the reported losses is not significant. Does this indicate a problem with my training?
my solver is:
net: "/train_val.prototxt"
test_iter: 1999
test_interval: 10441
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 100
max_iter: 208820
momentum: 0.9
weight_decay: 0.0005
snapshot: 10441
snapshot_prefix: "/caffe_alexnet_train"
solver_mode: GPU
Caffe uses Stochastic Gradient Descent (SGD) to train the net. In the long run the loss decreases; locally, however, it is perfectly normal for it to fluctuate a bit.
The reported "iteration loss" is the weighted sum of all loss layers of your net, averaged over average_loss iterations. On the other hand, the reported "train net output..." reports each net output from the current iteration only.
In your example, you did not set average_loss in your solver, so average_loss=1 by default. Since you only have one loss output with loss_weight=1, the reported "train net output..." and "iteration loss" are the same (up to display precision).
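A sketch of the difference, with made-up loss values (average_loss is caffe's solver field; the smoothed_losses helper and the numbers are illustrative):

```python
from collections import deque

def smoothed_losses(raw_losses, average_loss=1):
    """Caffe-style display loss: mean of the last `average_loss`
    raw iteration losses. With average_loss=1 it equals the raw loss."""
    window = deque(maxlen=average_loss)
    out = []
    for loss in raw_losses:
        window.append(loss)
        out.append(sum(window) / len(window))
    return out

raw = [0.65, 0.89, 0.55]
print(smoothed_losses(raw, average_loss=1))  # identical to raw
print(smoothed_losses(raw, average_loss=3))  # ≈ [0.65, 0.77, 0.697]
```

Setting average_loss > 1 in the solver would make the displayed "iteration loss" a smoothed curve, while "train net output" would still fluctuate per iteration.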
To conclude: your output is perfectly normal.