When to stop training in caffe? - machine-learning

I am using bvlc_reference_caffenet for training. I am doing both training and testing. Below is an example log of my trained network:
I0430 11:49:08.408740 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:21.221074 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:34.038710 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:46.816813 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:56.630870 23334 solver.cpp:397] Test net output #0: accuracy = 0.932502
I0430 11:49:56.630940 23334 solver.cpp:397] Test net output #1: loss = 0.388662 (* 1 = 0.388662 loss)
I0430 11:49:57.218236 23334 solver.cpp:218] Iteration 71000 (0.319361 iter/s, 62.625s/20 iters), loss = 0.00146191
I0430 11:49:57.218300 23334 solver.cpp:237] Train net output #0: loss = 0.00146191 (* 1 = 0.00146191 loss)
I0430 11:49:57.218308 23334 sgd_solver.cpp:105] Iteration 71000, lr = 0.001
I0430 11:50:09.168726 23334 solver.cpp:218] Iteration 71020 (1.67357 iter/s, 11.9505s/20 iters), loss = 0.000806865
I0430 11:50:09.168778 23334 solver.cpp:237] Train net output #0: loss = 0.000806868 (* 1 = 0.000806868 loss)
I0430 11:50:09.168787 23334 sgd_solver.cpp:105] Iteration 71020, lr = 0.001
I0430 11:50:21.127496 23334 solver.cpp:218] Iteration 71040 (1.67241 iter/s, 11.9588s/20 iters), loss = 0.000182312
I0430 11:50:21.127539 23334 solver.cpp:237] Train net output #0: loss = 0.000182314 (* 1 = 0.000182314 loss)
I0430 11:50:21.127562 23334 sgd_solver.cpp:105] Iteration 71040, lr = 0.001
I0430 11:50:33.248086 23334 solver.cpp:218] Iteration 71060 (1.65009 iter/s, 12.1206s/20 iters), loss = 0.000428604
I0430 11:50:33.248260 23334 solver.cpp:237] Train net output #0: loss = 0.000428607 (* 1 = 0.000428607 loss)
I0430 11:50:33.248272 23334 sgd_solver.cpp:105] Iteration 71060, lr = 0.001
I0430 11:50:45.518955 23334 solver.cpp:218] Iteration 71080 (1.62989 iter/s, 12.2707s/20 iters), loss = 0.00108446
I0430 11:50:45.519006 23334 solver.cpp:237] Train net output #0: loss = 0.00108447 (* 1 = 0.00108447 loss)
I0430 11:50:45.519011 23334 sgd_solver.cpp:105] Iteration 71080, lr = 0.001
I0430 11:50:51.287315 23341 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:50:57.851781 23334 solver.cpp:218] Iteration 71100 (1.62169 iter/s, 12.3328s/20 iters), loss = 0.00150949
I0430 11:50:57.851828 23334 solver.cpp:237] Train net output #0: loss = 0.0015095 (* 1 = 0.0015095 loss)
I0430 11:50:57.851837 23334 sgd_solver.cpp:105] Iteration 71100, lr = 0.001
I0430 11:51:09.912184 23334 solver.cpp:218] Iteration 71120 (1.65832 iter/s, 12.0604s/20 iters), loss = 0.00239335
I0430 11:51:09.912330 23334 solver.cpp:237] Train net output #0: loss = 0.00239335 (* 1 = 0.00239335 loss)
I0430 11:51:09.912340 23334 sgd_solver.cpp:105] Iteration 71120, lr = 0.001
I0430 11:51:21.968586 23334 solver.cpp:218] Iteration 71140 (1.65888 iter/s, 12.0563s/20 iters), loss = 0.00161807
I0430 11:51:21.968646 23334 solver.cpp:237] Train net output #0: loss = 0.00161808 (* 1 = 0.00161808 loss)
I0430 11:51:21.968654 23334 sgd_solver.cpp:105] Iteration 71140, lr = 0.001
What confuses me is the loss. I was going to stop training my network when loss goes below 0.0001 but there are two losses: training loss and test loss. Training loss seems to stay around 0.0001 but test loss is at 0.388 which is way above the threshold I set. Which one do I use to stop my training?

Having such a large gap between train and test performance might indicate that you are overfitting your data.
The purpose of the validation set is to make sure you do not overfit. You should use the performance on the validation set to decide whether to stop training or proceed.

In general, you want to stop training when your validation accuracy hits a plateau. Your data above indicates that you have, indeed, over-trained your model.
Ideally, the training, testing, and validation error should be roughly equal. In practice, this rarely happens.
Note that the loss is not a good metric unless your loss function and loss weights are the same for all phases of evaluation. For instance, GoogLeNet weights the training loss across three output layers, but the validation phase only cares about the final accuracy.
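A common way to implement "stop at the plateau" is to track the validation accuracy reported at each test interval and stop once it has not improved for a while. A minimal sketch (not Caffe-specific; the patience and tolerance values below are arbitrary):
def should_stop(val_accuracies, patience=5, min_delta=1e-4):
    """Stop when the best validation accuracy hasn't improved for `patience` checks."""
    if len(val_accuracies) <= patience:
        return False
    best_before = max(val_accuracies[:-patience])
    recent_best = max(val_accuracies[-patience:])
    return recent_best < best_before + min_delta

# e.g. accuracies parsed from the Caffe log at each test_interval
history = [0.80, 0.88, 0.91, 0.93, 0.932, 0.931, 0.930, 0.932, 0.931, 0.9315]
print(should_stop(history))  # True: no meaningful improvement over the last 5 checks
With Caffe you would then keep the snapshot taken around the best-performing iteration.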

Related

CNN model for RGB images giving 0% accuracy

I am trying to train a CNN model on the CelebA (RGB images) dataset. But when I train the model and check its accuracy, it is 0% or close to 0%. I think the issue is in the ConvNeuralNet class or the hyperparameters, but due to my limited knowledge I'm not sure what I'm missing here. Can someone please help? Thanks.
# Creating a simple network
class ConvNeuralNet(torch.nn.Module):
    def __init__(self, num_classes=10178):
        super(ConvNeuralNet, self).__init__()
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.max_pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv_layer3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.conv_layer4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3)
        self.max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(13312, 128)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        out = self.conv_layer1(x)
        out = self.conv_layer2(out)
        out = self.max_pool1(out)
        out = self.conv_layer3(out)
        out = self.conv_layer4(out)
        out = self.max_pool2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu1(out)
        out = self.fc2(out)
        return F.log_softmax(out, dim=-1)

def trainTorch(torch_model, train_loader, test_loader,
               nb_epochs=NB_EPOCHS, batch_size=BATCH_SIZE, train_end=-1, test_end=-1,
               learning_rate=LEARNING_RATE, optimizer=None):
    train_loss = []
    total = 0
    correct = 0
    step = 0
    for _epoch in range(nb_epochs):
        for xs, ys in train_loader:
            xs, ys = Variable(xs), Variable(ys)
            if torch.cuda.is_available():
                xs, ys = xs.cuda(), ys.cuda()
            optimizer.zero_grad()
            preds = torch_model(xs)
            preds = F.log_softmax(preds, dim=1)
            loss = F.cross_entropy(preds, ys)
            loss.backward()
            train_loss.append(loss.data.item())
            optimizer.step()  # update gradients
            preds_np = preds.cpu().detach().numpy()
            correct += (np.argmax(preds_np, axis=1) == ys.cpu().detach().numpy()).sum()
            total += train_loader.batch_size
            step += 1
            if total % 1000 == 0:
                acc = float(correct) / total
                print('[%s] Training accuracy: %.2f%%' % (step, acc * 100))
                total = 0
                correct = 0

nb_epochs = 8
image_size = 64
batch_size = 64
num_classes = 10178
learning_rate = 0.001
num_epochs = 8
# Device will determine whether to run the training on GPU or CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
trans = transforms.Compose([
    transforms.Resize(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='train', target_type='identity', transform=trans, download="True"),
    batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='test', target_type='identity', transform=trans),
    batch_size=batch_size)

# Training the model
print("Training Model")
# Set optimizer with optimizer
optimizer = torch.optim.SGD(model1.parameters(), lr=learning_rate, weight_decay=0.005, momentum=0.9)
total_step = len(train_loader)
trainTorch(model1, train_loader, test_loader, nb_epochs, batch_size, train_end, test_end, learning_rate, optimizer=optimizer)
Update: I ran the code for a bit to see if it would start converging. One thing to note is that there are over 10,000 classes. With a batch size of 64, it will take more than 150 mini-batches before your model has seen every class in the dataset, so you certainly shouldn't expect the model to start making accurate predictions within a few hundred steps.
When I printed the loss value I noticed it was decreasing very slowly. I changed the learning rate to 0.01 and it started decreasing faster.
Also, your model is very shallow for a face recognition model. You're better off using something like a resnet variant (e.g. resnet-50 or resnet-101 from torchvision) rather than rolling your own model.
Primary changes include
Learning rate increased
Fix the loss function
Remove log_softmax from output of model
Add activation to the conv layers
IMO the comments about softmax are a bit misleading since you don't need to softmax the output of your model if you are using cross_entropy. You also don't need softmax to get the argmax of the prediction since both softmax and log_softmax don't change the relative ordering of the predictions (i.e. both softmax and log are strictly increasing functions).
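As a quick sanity check of that claim (a standalone snippet, not part of the fix below; the tensors here are made up for illustration):
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)             # raw model outputs for a batch of 4
probs = F.softmax(logits, dim=1)
log_probs = F.log_softmax(logits, dim=1)

# softmax / log_softmax preserve the ordering, so the predicted class is unchanged
assert torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))
assert torch.equal(logits.argmax(dim=1), log_probs.argmax(dim=1))

# cross_entropy applies log_softmax internally, so it should be fed the raw logits
targets = torch.randint(0, 10, (4,))
assert torch.allclose(F.cross_entropy(logits, targets), F.nll_loss(log_probs, targets))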
IMO the comment about using average pooling to reduce the input size of the first fc layer is a good one and may improve performance, but you'll need to experiment with that one to find good parameters for it so I left it out of this answer.
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torchvision import datasets, transforms

# Creating a simple network
class ConvNeuralNet(torch.nn.Module):
    def __init__(self, num_classes=10178):
        super(ConvNeuralNet, self).__init__()
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.max_pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv_layer3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.conv_layer4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3)
        self.max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(13312, 128)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        # note the relu activations on the conv layers
        out = F.relu(self.conv_layer1(x))
        out = F.relu(self.conv_layer2(out))
        out = self.max_pool1(out)
        out = F.relu(self.conv_layer3(out))
        out = F.relu(self.conv_layer4(out))
        out = self.max_pool2(out)
        # you may want an adaptive average pool 2d here to reduce size of feature map further
        out = out.reshape(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu1(out)
        out = self.fc2(out)
        # return raw logits, not log-softmax output
        return out

def trainTorch(torch_model, train_loader, test_loader, nb_epochs, batch_size, learning_rate, optimizer):
    train_loss = []
    total = 0
    correct = 0
    step = 0
    for _epoch in range(nb_epochs):
        for xs, ys in train_loader:
            # the Variable interface has been deprecated for years, it is effectively a no-op in modern pytorch
            # see: https://pytorch.org/docs/stable/autograd.html#variable-deprecated
            if torch.cuda.is_available():
                xs, ys = xs.cuda(), ys.cuda()
            optimizer.zero_grad()
            logits = torch_model(xs)
            # don't softmax or log-softmax the inputs to cross_entropy
            loss = F.cross_entropy(logits, ys)
            # The following is equivalent but less numerically stable
            # loss = F.nll_loss(F.log_softmax(logits), ys)
            loss.backward()
            train_loss.append(loss.item())
            optimizer.step()  # update gradients
            logits_np = logits.cpu().detach().numpy()
            correct += (np.argmax(logits_np, axis=1) == ys.cpu().detach().numpy()).sum()
            total += train_loader.batch_size
            step += 1
            if step % 200 == 0:
                acc = float(correct) / total
                avg_loss = sum(train_loss) / len(train_loss)
                print(f'[{step}] Training accuracy: {acc*100:.2f}% Training loss: {avg_loss:.4f}')
                total = 0
                correct = 0
                train_loss = []

nb_epochs = 8
image_size = 64
batch_size = 64
num_classes = 10178
# increased learning rate to 0.01
learning_rate = 0.01
num_epochs = 8
# Device will determine whether to run the training on GPU or CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
trans = transforms.Compose([
    transforms.Resize(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='train', target_type='identity', transform=trans, download=True),
    batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='test', target_type='identity', transform=trans),
    batch_size=batch_size)

model = ConvNeuralNet(num_classes)
if torch.cuda.is_available():
    model.cuda()

# Training the model
print("Training Model")
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay=0.005, momentum=0.9)
total_step = len(train_loader)
trainTorch(model, train_loader, test_loader, nb_epochs, batch_size, learning_rate, optimizer=optimizer)
Output
Training Model
[200] Training accuracy: 0.00% Training loss: 9.2286
[400] Training accuracy: 0.02% Training loss: 9.2286
[600] Training accuracy: 0.04% Training loss: 9.2265
[800] Training accuracy: 0.00% Training loss: 9.2253
[1000] Training accuracy: 0.00% Training loss: 9.2222
[1200] Training accuracy: 0.00% Training loss: 9.2105
[1400] Training accuracy: 0.02% Training loss: 9.1776
[1600] Training accuracy: 0.03% Training loss: 9.1329
[1800] Training accuracy: 0.02% Training loss: 9.1013
[2000] Training accuracy: 0.02% Training loss: 9.0830
[2200] Training accuracy: 0.02% Training loss: 9.0715
[2400] Training accuracy: 0.01% Training loss: 9.0622
[2600] Training accuracy: 0.02% Training loss: 9.0456
[2800] Training accuracy: 0.00% Training loss: 9.0301
[3000] Training accuracy: 0.00% Training loss: 9.0357
[3200] Training accuracy: 0.02% Training loss: 9.0402
[3400] Training accuracy: 0.02% Training loss: 9.0321
[3600] Training accuracy: 0.02% Training loss: 9.0217
[3800] Training accuracy: 0.02% Training loss: 8.9757
[4000] Training accuracy: 0.09% Training loss: 8.9059
[4200] Training accuracy: 0.09% Training loss: 8.8331
[4400] Training accuracy: 0.09% Training loss: 8.7601
[4600] Training accuracy: 0.09% Training loss: 8.7356
[4800] Training accuracy: 0.10% Training loss: 8.6717
[5000] Training accuracy: 0.12% Training loss: 8.6311
[5200] Training accuracy: 0.16% Training loss: 8.5515
[5400] Training accuracy: 0.16% Training loss: 8.4943
[5600] Training accuracy: 0.14% Training loss: 8.4345
[5800] Training accuracy: 0.14% Training loss: 8.4107
[6000] Training accuracy: 0.18% Training loss: 8.3317
[6200] Training accuracy: 0.22% Training loss: 8.2716
[6400] Training accuracy: 0.31% Training loss: 8.1934
[6600] Training accuracy: 0.30% Training loss: 8.1500
[6800] Training accuracy: 0.35% Training loss: 8.0979
[7000] Training accuracy: 0.21% Training loss: 8.0739
[7200] Training accuracy: 0.44% Training loss: 8.0220
[7400] Training accuracy: 0.29% Training loss: 7.9819
From the output we can see the loss is decreasing and the accuracy is starting to increase. It's hard to predict how well this will work or when it will converge, but this is a good start. You'll probably need to use a better model and a learning rate scheduler to get better performance.
For example, just switching to a resnet-50:
import torchvision

model = torchvision.models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)
the model starts converging much faster:
Training Model
[200] Training accuracy: 0.05% Training loss: 9.1942
[400] Training accuracy: 0.05% Training loss: 8.9244
[600] Training accuracy: 0.15% Training loss: 8.5936
[800] Training accuracy: 0.30% Training loss: 8.3147
[1000] Training accuracy: 0.39% Training loss: 8.0745
[1200] Training accuracy: 0.43% Training loss: 7.9146
[1400] Training accuracy: 0.45% Training loss: 7.7706
[1600] Training accuracy: 0.64% Training loss: 7.6551
[1800] Training accuracy: 0.68% Training loss: 7.5784
[2000] Training accuracy: 0.74% Training loss: 7.5327
[2200] Training accuracy: 0.72% Training loss: 7.4689
[2400] Training accuracy: 0.63% Training loss: 7.4378
[2600] Training accuracy: 0.83% Training loss: 7.3789
[2800] Training accuracy: 0.90% Training loss: 7.2812
[3000] Training accuracy: 0.84% Training loss: 7.2771
[3200] Training accuracy: 0.96% Training loss: 7.2536
[3400] Training accuracy: 1.00% Training loss: 7.2538
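If you do add a learning rate scheduler, a minimal sketch could look like the following (it reuses train_loader, test_loader, num_classes, nb_epochs and batch_size from the script above; the step_size and gamma values are illustrative, not tuned):
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)
if torch.cuda.is_available():
    model.cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.005, momentum=0.9)
# decay the learning rate by 10x every 2 epochs (illustrative schedule)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

for epoch in range(nb_epochs):
    # run one epoch at a time so the schedule can advance between epochs
    trainTorch(model, train_loader, test_loader, 1, batch_size, 0.01, optimizer=optimizer)
    scheduler.step()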

Linear regression model accuracy is always 1.0 in tensorflow

Problem:
I am building a model that will predict housing prices. So, first I decided to build a linear regression model in TensorFlow. But when I start training, I see that my accuracy is always 1.
I am new to machine learning. Please, someone, tell me what's going wrong; I can't figure it out. I searched on Google but didn't find any answer that solves my problem.
Here's my code:
df_train = df_train.loc[:, ['OverallQual', 'GrLivArea', 'GarageArea', 'SalePrice']]
df_X = df_train.loc[:, ['OverallQual', 'GrLivArea', 'GarageArea']]
df_Y = df_train.loc[:, ['SalePrice']]
df_yy = get_dummies(df_Y)
print("Shape of df_X: ", df_X.shape)

X_train, X_test, y_train, y_test = train_test_split(df_X, df_yy, test_size=0.15)
X_train = np.asarray(X_train).astype(np.float32)
X_test = np.asarray(X_test).astype(np.float32)
y_train = np.asarray(y_train).astype(np.float32)
y_test = np.asarray(y_test).astype(np.float32)

X = tf.placeholder(tf.float32, [None, num_of_features])
y = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([num_of_features, 1]))
b = tf.Variable(tf.zeros([1]))
prediction = tf.add(tf.matmul(X, W), b)

num_epochs = 20000
# calculating loss
cost = tf.reduce_mean(tf.losses.softmax_cross_entropy(onehot_labels=y, logits=prediction))
optimizer = tf.train.GradientDescentOptimizer(0.00001).minimize(cost)
correct_prediction = tf.equal(tf.argmax(prediction, axis=1), tf.argmax(y, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        if epoch % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={X: X_train, y: y_train})
            print('step %d, training accuracy %g' % (epoch, train_accuracy))
        optimizer.run(feed_dict={X: X_train, y: y_train})
    print('test accuracy %g' % accuracy.eval(feed_dict={X: X_test, y: y_test}))
Output is:
step 0, training accuracy 1
step 100, training accuracy 1
step 200, training accuracy 1
step 300, training accuracy 1
step 400, training accuracy 1
step 500, training accuracy 1
step 600, training accuracy 1
step 700, training accuracy 1
............................
............................
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
test accuracy 1
EDIT:
I changed my cost function to this
cost = tf.reduce_sum(tf.pow(prediction-y, 2))/(2*1241)
But still my output is always 1.
EDIT 2:
In response to lejlot's comment:
Thanks, lejlot. I changed my code to this:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    merged_summary = tf.summary.merge_all()
    writer = tf.summary.FileWriter("/tmp/hpp1")
    writer.add_graph(sess.graph)
    for epoch in range(num_epochs):
        if epoch % 5:
            s = sess.run(merged_summary, feed_dict={X: X_train, y: y_train})
            writer.add_summary(s, epoch)
        sess.run(optimizer, feed_dict={X: X_train, y: y_train})
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: X_train, y: y_train})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c),
                  "W=", sess.run(W), "b=", sess.run(b))
    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: X_train, y: y_train})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')
But the output is all nan
Output:
....................................
Epoch: 19900 cost= nan W= nan b= nan
Epoch: 19950 cost= nan W= nan b= nan
Epoch: 20000 cost= nan W= nan b= nan
Optimization Finished!
Training cost= nan W= nan b= nan
You want to do linear regression, but you are actually doing logistic regression. Take a look at tf.losses.softmax_cross_entropy: it applies a softmax to your prediction, turning it into a probability distribution, i.e. a vector of numbers that sums to 1. Since your output vector has size 1, the softmax always produces [1], which is why the argmax-based accuracy is always 1.
Here are two examples that will help you see the difference: linear regression and logistic regression.
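For reference, a minimal sketch of plain linear regression in the same TF 1.x style: an MSE cost, no softmax and no accuracy metric (regression quality is judged by the error itself). It reuses X_train/y_train from the question, except that y_train here would be the (standardized) sale prices themselves rather than the one-hot df_yy; the features should also be standardized so gradient descent does not blow up to nan:
import tensorflow as tf  # assumes TF 1.x, like the code in the question

num_of_features = 3
X = tf.placeholder(tf.float32, [None, num_of_features])
y = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([num_of_features, 1]))
b = tf.Variable(tf.zeros([1]))
prediction = tf.matmul(X, W) + b

# mean squared error instead of softmax cross-entropy
cost = tf.reduce_mean(tf.square(prediction - y))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(1000):
        _, c = sess.run([optimizer, cost], feed_dict={X: X_train, y: y_train})
        if epoch % 100 == 0:
            print('epoch %d, training MSE %g' % (epoch, c))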

Caffe's test accuracy during validation phase being constant when training a network

I wonder why my test accuracy stays constant at 0.5. I use the CaffeNet network with the only change being the final fully connected layer, where I set num_output: 2.
My training set contains 1000 positive and 1000 negative examples, and my validation set also has 1000 positive and 1000 negative examples. The dataset contains RGB images of people (whole body). I've defined a mean file and a scale value in the data layer. The network is trained to decide whether an image shows a person or not (a binary classifier).
A snippet of my solver information looks like below:
test_iter: 80
test_interval: 10
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 20
display: 10
max_iter: 80
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
The training output is as follows:
I0228 11:49:27.411556 3422 solver.cpp:274] Learning Rate Policy: step
I0228 11:49:27.590368 3422 solver.cpp:331] Iteration 0, Testing net (#0)
I0228 11:53:29.203058 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 11:57:59.969632 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 11:58:26.602972 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 11:58:26.602999 3422 solver.cpp:398] Test net output #1: loss = 0.726503 (* 1 = 0.726503 loss)
I0228 12:00:03.892771 3422 solver.cpp:219] Iteration 0 (-6.49109e-41 iter/s, 636.481s/10 iters), loss = 0.961699
I0228 12:00:03.892915 3422 solver.cpp:238] Train net output #0: loss = 0.961699 (* 1 = 0.961699 loss)
I0228 12:00:03.892925 3422 sgd_solver.cpp:105] Iteration 0, lr = 0.01
I0228 12:04:28.831887 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:13:36.909935 3422 solver.cpp:331] Iteration 10, Testing net (#0)
I0228 12:17:36.894516 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:22:00.724030 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:22:27.375306 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 12:22:27.375334 3422 solver.cpp:398] Test net output #1: loss = 0.698973 (* 1 = 0.698973 loss)
I0228 12:23:56.072116 3422 solver.cpp:219] Iteration 10 (0.00698237 iter/s, 1432.18s/10 iters), loss = 0.696559
I0228 12:23:56.072247 3422 solver.cpp:238] Train net output #0: loss = 0.696558 (* 1 = 0.696558 loss)
I0228 12:23:56.072252 3422 sgd_solver.cpp:105] Iteration 10, lr = 0.01
I0228 12:25:23.664594 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:37:08.202978 3422 solver.cpp:331] Iteration 20, Testing net (#0)
I0228 12:41:05.859966 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:45:28.599306 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:45:55.524168 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 12:45:55.524190 3422 solver.cpp:398] Test net output #1: loss = 0.693187 (* 1 = 0.693187 loss)
I0228 12:45:55.553427 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:47:24.159780 3422 solver.cpp:219] Iteration 20 (0.00710183 iter/s, 1408.09s/10 iters), loss = 0.690313
I0228 12:47:24.159914 3422 solver.cpp:238] Train net output #0: loss = 0.690313 (* 1 = 0.690313 loss)
I0228 12:47:24.159920 3422 sgd_solver.cpp:105] Iteration 20, lr = 0.001
I0228 12:57:31.167225 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:00:23.671567 3422 solver.cpp:331] Iteration 30, Testing net (#0)
I0228 13:04:14.114737 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:08:30.406244 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:08:56.273648 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 13:08:56.273674 3422 solver.cpp:398] Test net output #1: loss = 0.696971 (* 1 = 0.696971 loss)
I0228 13:10:28.487870 3422 solver.cpp:219] Iteration 30 (0.00722373 iter/s, 1384.33s/10 iters), loss = 0.700565
I0228 13:10:28.488041 3422 solver.cpp:238] Train net output #0: loss = 0.700565 (* 1 = 0.700565 loss)
I0228 13:10:28.488049 3422 sgd_solver.cpp:105] Iteration 30, lr = 0.001
I0228 13:17:38.463490 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:23:29.700287 3422 solver.cpp:331] Iteration 40, Testing net (#0)
I0228 13:27:27.217670 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:31:48.651156 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:32:15.021637 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 13:32:15.021661 3422 solver.cpp:398] Test net output #1: loss = 0.694784 (* 1 = 0.694784 loss)
I0228 13:33:43.542735 3422 solver.cpp:219] Iteration 40 (0.00716818 iter/s, 1395.05s/10 iters), loss = 0.700307
I0228 13:33:43.542875 3422 solver.cpp:238] Train net output #0: loss = 0.700307 (* 1 = 0.700307 loss)
I0228 13:33:43.542897 3422 sgd_solver.cpp:105] Iteration 40, lr = 0.0001
I0228 13:36:37.602869 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:46:57.980952 3422 solver.cpp:331] Iteration 50, Testing net (#0)
I0228 13:50:55.125911 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:55:22.078013 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:55:49.644492 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 13:55:49.644516 3422 solver.cpp:398] Test net output #1: loss = 0.693804 (* 1 = 0.693804 loss)
I0228 13:57:19.439967 3422 solver.cpp:219] Iteration 50 (0.00706266 iter/s, 1415.9s/10 iters), loss = 0.685755
I0228 13:57:19.440101 3422 solver.cpp:238] Train net output #0: loss = 0.685755 (* 1 = 0.685755 loss)
I0228 13:57:19.440107 3422 sgd_solver.cpp:105] Iteration 50, lr = 0.0001
I0228 13:57:19.843221 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:09:13.012436 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:10:40.182121 3422 solver.cpp:331] Iteration 60, Testing net (#0)
I0228 14:14:37.148968 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:18:57.929569 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:19:24.183915 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 14:19:24.183939 3422 solver.cpp:398] Test net output #1: loss = 0.693612 (* 1 = 0.693612 loss)
I0228 14:20:51.017705 3422 solver.cpp:219] Iteration 60 (0.00708428 iter/s, 1411.58s/10 iters), loss = 0.693453
I0228 14:20:51.017838 3422 solver.cpp:238] Train net output #0: loss = 0.693453 (* 1 = 0.693453 loss)
I0228 14:20:51.017845 3422 sgd_solver.cpp:105] Iteration 60, lr = 1e-05
I0228 14:29:34.635071 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:34:02.693697 3422 solver.cpp:331] Iteration 70, Testing net (#0)
I0228 14:37:59.742414 3429 data_layer.cpp:73] Restarting data prefetching from start.
I also tried changing the value of test_iter to 40 (instead of the previous 80) after following this link and this one, in case that parameter was related, but it still didn't resolve the issue. I also tried reshuffling the data by regenerating the dataset with a modified create_imagenet.sh script, but the issue remains.
Every time I changed a value in the solver, I also changed the fully connected layer's name. Is this the correct way?
The number of epochs here is ~10. Could that be the culprit? Does this kind of problem fall under overfitting?
Any hints or suggestions are welcome.
EDITED:
I turned on the debug info in the solver and found that the gradients (the diff values) are infinitesimal. Can I deduce that the network is learning very little, or not at all? The log with the debug info is below:
I0228 19:58:37.235631 6771 net.cpp:593] [Forward] Layer pool2, top blob pool2 data: 1.00214
I0228 19:58:37.810919 6771 net.cpp:593] [Forward] Layer norm2, top blob norm2 data: 1.00212
I0228 19:58:42.022397 6771 net.cpp:593] [Forward] Layer conv3, top blob conv3 data: 0.432846
I0228 19:58:42.022722 6771 net.cpp:605] [Forward] Layer conv3, param blob 0 data: 0.00796926
I0228 19:58:42.022725 6771 net.cpp:605] [Forward] Layer conv3, param blob 1 data: 0.000184241
I0228 19:58:42.041185 6771 net.cpp:593] [Forward] Layer relu3, top blob conv3 data: 0.2017
I0228 19:58:45.277812 6771 net.cpp:593] [Forward] Layer conv4, top blob conv4 data: 0.989365
I0228 19:58:45.278079 6771 net.cpp:605] [Forward] Layer conv4, param blob 0 data: 0.00797053
I0228 19:58:45.278082 6771 net.cpp:605] [Forward] Layer conv4, param blob 1 data: 0.99991
I0228 19:58:45.296561 6771 net.cpp:593] [Forward] Layer relu4, top blob conv4 data: 0.989365
I0228 19:58:47.495208 6771 net.cpp:593] [Forward] Layer conv5, top blob conv5 data: 1.52664
I0228 19:58:47.495394 6771 net.cpp:605] [Forward] Layer conv5, param blob 0 data: 0.00804997
I0228 19:58:47.495399 6771 net.cpp:605] [Forward] Layer conv5, param blob 1 data: 0.996736
I0228 19:58:47.507951 6771 net.cpp:593] [Forward] Layer relu5, top blob conv5 data: 0.128866
I0228 19:58:47.562223 6771 net.cpp:593] [Forward] Layer pool5, top blob pool5 data: 0.151769
I0228 19:58:48.269973 6771 net.cpp:593] [Forward] Layer fc6, top blob fc6 data: 0.95253
I0228 19:58:48.280905 6771 net.cpp:605] [Forward] Layer fc6, param blob 0 data: 0.00397552
I0228 19:58:48.280917 6771 net.cpp:605] [Forward] Layer fc6, param blob 1 data: 0.999847
I0228 19:58:48.282137 6771 net.cpp:593] [Forward] Layer relu6, top blob fc6 data: 0.935909
I0228 19:58:48.286769 6771 net.cpp:593] [Forward] Layer drop6, top blob fc6 data: 0.938786
I0228 19:58:48.602710 6771 net.cpp:593] [Forward] Layer fc7, top blob fc7 data: 3.76741
I0228 19:58:48.607655 6771 net.cpp:605] [Forward] Layer fc7, param blob 0 data: 0.00411323
I0228 19:58:48.607664 6771 net.cpp:605] [Forward] Layer fc7, param blob 1 data: 0.997461
I0228 19:58:48.608860 6771 net.cpp:593] [Forward] Layer relu7, top blob fc7 data: 3.41694e-06
I0228 19:58:48.613621 6771 net.cpp:593] [Forward] Layer drop7, top blob fc7 data: 3.15335e-06
I0228 19:58:48.615514 6771 net.cpp:593] [Forward] Layer fc8_new15, top blob fc8_new15 data: 0.0446082
I0228 19:58:48.615520 6771 net.cpp:605] [Forward] Layer fc8_new15, param blob 0 data: 0.0229027
I0228 19:58:48.615522 6771 net.cpp:605] [Forward] Layer fc8_new15, param blob 1 data: 0.0444381
I0228 19:58:48.615579 6771 net.cpp:593] [Forward] Layer loss, top blob loss data: 0.693174
I0228 19:58:48.615586 6771 net.cpp:621] [Backward] Layer loss, bottom blob fc8_new15 diff: 0.00195124
I0228 19:58:48.617902 6771 net.cpp:621] [Backward] Layer fc8_new15, bottom blob fc7 diff: 8.65365e-05
I0228 19:58:48.617914 6771 net.cpp:632] [Backward] Layer fc8_new15, param blob 0 diff: 8.20022e-07
I0228 19:58:48.617916 6771 net.cpp:632] [Backward] Layer fc8_new15, param blob 1 diff: 0.0105705
I0228 19:58:48.619067 6771 net.cpp:621] [Backward] Layer drop7, bottom blob fc7 diff: 8.65526e-05
I0228 19:58:48.620265 6771 net.cpp:621] [Backward] Layer relu7, bottom blob fc7 diff: 1.21017e-09
I0228 19:58:49.261282 6771 net.cpp:621] [Backward] Layer fc7, bottom blob fc6 diff: 2.00745e-08
I0228 19:58:49.266103 6771 net.cpp:632] [Backward] Layer fc7, param blob 0 diff: 1.43563e-07
I0228 19:58:49.266114 6771 net.cpp:632] [Backward] Layer fc7, param blob 1 diff: 9.29627e-08
I0228 19:58:49.267330 6771 net.cpp:621] [Backward] Layer drop6, bottom blob fc6 diff: 1.99176e-08
I0228 19:58:49.268508 6771 net.cpp:621] [Backward] Layer relu6, bottom blob fc6 diff: 1.85305e-08
I0228 19:58:50.779518 6771 net.cpp:621] [Backward] Layer fc6, bottom blob pool5 diff: 8.8138e-09
I0228 19:58:50.790220 6771 net.cpp:632] [Backward] Layer fc6, param blob 0 diff: 3.01911e-07
I0228 19:58:50.790235 6771 net.cpp:632] [Backward] Layer fc6, param blob 1 diff: 1.99256e-06
I0228 19:58:50.813318 6771 net.cpp:621] [Backward] Layer pool5, bottom blob conv5 diff: 1.84585e-09
I0228 19:58:50.826406 6771 net.cpp:621] [Backward] Layer relu5, bottom blob conv5 diff: 3.86034e-10
I0228 19:58:55.093768 6771 net.cpp:621] [Backward] Layer conv5, bottom blob conv4 diff: 5.76684e-10
I0228 19:58:55.093967 6771 net.cpp:632] [Backward] Layer conv5, param blob 0 diff: 1.47824e-06
I0228 19:58:55.093973 6771 net.cpp:632] [Backward] Layer conv5, param blob 1 diff: 1.92951e-06
I0228 19:58:55.114212 6771 net.cpp:621] [Backward] Layer relu4, bottom blob conv4 diff: 5.76684e-10
I0228 19:59:01.392058 6771 net.cpp:621] [Backward] Layer conv4, bottom blob conv3 diff: 2.31243e-10
I0228 19:59:01.392359 6771 net.cpp:632] [Backward] Layer conv4, param blob 0 diff: 1.76617e-07
I0228 19:59:01.392364 6771 net.cpp:632] [Backward] Layer conv4, param blob 1 diff: 8.78101e-07
I0228 19:59:01.412240 6771 net.cpp:621] [Backward] Layer relu3, bottom blob conv3 diff: 8.56331e-11
I0228 19:59:09.734658 6771 net.cpp:621] [Backward] Layer conv3, bottom blob norm2 diff: 7.87699e-11
I0228 19:59:09.735258 6771 net.cpp:632] [Backward] Layer conv3, param blob 0 diff: 1.33159e-07
I0228 19:59:09.735270 6771 net.cpp:632] [Backward] Layer conv3, param blob 1 diff: 1.47704e-07
I0228 19:59:10.390552 6771 net.cpp:621] [Backward] Layer norm2, bottom blob pool2 diff: 7.87615e-11
I0228 19:59:10.452433 6771 net.cpp:621] [Backward] Layer pool2, bottom blob conv2 diff: 1.50474e-11
I0228 19:59:10.516407 6771 net.cpp:621] [Backward] Layer relu2, bottom blob conv2 diff: 1.50474e-11
I0228 19:59:20.241587 6771 net.cpp:621] [Backward] Layer conv2, bottom blob norm1 diff: 2.07819e-11
I0228 19:59:20.241801 6771 net.cpp:632] [Backward] Layer conv2, param blob 0 diff: 3.61894e-09
I0228 19:59:20.241807 6771 net.cpp:632] [Backward] Layer conv2, param blob 1 diff: 1.05108e-07
I0228 19:59:35.405725 6771 net.cpp:621] [Backward] Layer norm1, bottom blob pool1 diff: 2.07819e-11
I0228 19:59:35.494249 6771 net.cpp:621] [Backward] Layer pool1, bottom blob conv1 diff: 4.26e-12
I0228 19:59:35.585350 6771 net.cpp:621] [Backward] Layer relu1, bottom blob conv1 diff: 3.25633e-12
I0228 19:59:38.335880 6771 net.cpp:632] [Backward] Layer conv1, param blob 0 diff: 9.37551e-09
I0228 19:59:38.335896 6771 net.cpp:632] [Backward] Layer conv1, param blob 1 diff: 5.86281e-08
E0228 19:59:38.411557 6771 net.cpp:721] [Backward] All net params (data, diff): L1 norm = (246967, 14.733); L2 norm = (103.38, 0.0470958)
I0228 19:59:38.411592 6771 solver.cpp:219] Iteration 70 (0.00886075 iter/s, 1128.57s/10 iters), loss = 0.693174
I0228 19:59:38.411600 6771 solver.cpp:238] Train net output #0: loss = 0.693174 (* 1 = 0.693174 loss)
I0228 19:59:38.411605 6771 sgd_solver.cpp:105] Iteration 70, lr = 1e-05
I0228 20:05:17.468423 6775 data_layer.cpp:73] Restarting data prefetching from start.
data_layer.cpp:73] Restarting data prefetching from start.
The above message occurs when the .txt file given as input to the data layer reaches the end of the file.
This message can occur frequently when:
You gave the wrong .txt file to the data layer
The format of the .txt file is not what Caffe expects
Very little data is present in the file
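For reference, Caffe's image list file is expected to contain one image path and one integer label per line, separated by whitespace; for a binary classifier it would look something like this (paths are hypothetical):
/data/person/img_0001.jpg 1
/data/person/img_0002.jpg 1
/data/background/img_0001.jpg 0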

Caffe training loss is 0

I'm training an AlexNet .caffemodel with the FaceScrub dataset, following:
Face Detection
Fine-Tuning
The thing is that when I train the model I get this output:
I0302 10:59:50.184250 11346 solver.cpp:331] Iteration 0, Testing net (#0)
I0302 11:09:01.198473 11346 solver.cpp:398] Test net output #0: accuracy = 0.96793
I0302 11:09:01.198635 11346 solver.cpp:398] Test net output #1: loss = 0.354751 (* 1 = 0.354751 loss)
I0302 11:09:12.543730 11346 solver.cpp:219] Iteration 0 (0 iter/s, 562.435s/20 iters), loss = 0.465583
I0302 11:09:12.543861 11346 solver.cpp:238] Train net output #0: loss = 0.465583 (* 1 = 0.465583 loss)
I0302 11:09:12.543902 11346 sgd_solver.cpp:105] Iteration 0, lr = 0.001
I0302 11:14:41.847237 11346 solver.cpp:219] Iteration 20 (0.0607343 iter/s, 329.303s/20 iters), loss = 4.65581e-09
I0302 11:14:41.847409 11346 solver.cpp:238] Train net output #0: loss = 0 (* 1 = 0 loss)
I0302 11:14:41.847447 11346 sgd_solver.cpp:105] Iteration 20, lr = 0.001
I0302 11:18:25.848346 11346 solver.cpp:219] Iteration 40 (0.0892857 iter/s, 224s/20 iters), loss = 4.65581e-09
I0302 11:18:25.848526 11346 solver.cpp:238] Train net output #0: loss = 0 (* 1 = 0 loss)
I0302 11:18:25.848565 11346 sgd_solver.cpp:105] Iteration 40, lr = 0.001
and it continues like that.
The only thing I am suspicious of is that the train_val.prototxt in the Face Detection link uses num_output: 2 in the fc8_flickr layer, so I have the .txt file with all the images in this format:
/media/jose/B430F55030F51A56/faceScrub/download/Steve_Carell/face/a3b1b70acd0fda72c98be121a2af3ea2f4209fe7.jpg 1
/media/jose/B430F55030F51A56/faceScrub/download/Matt_Czuchry/face/98882354bbf3a508b48c6f53a84a68ca6797e617.jpg 1
/media/jose/B430F55030F51A56/faceScrub/download/Linda_Gray/face/ca9356b2382d2595ba8a9ff399dc3efa80873d72.jpg 1
/media/jose/B430F55030F51A56/faceScrub/download/Veronica_Hamel/face/900da3a6a22b25b3974e1f7602686f460126d028.jpg 1
Here 1 is the class for images containing a face. If I remove the 1, it gets stuck at Iteration 0, Testing net (#0).
Any insight on this?

caffe loss is nan or 0

I am training a network and I have changed the learning rate from 0.1 to 0.00001. The output always remains the same. No mean is used for training.
What could be the reasons for such a weird loss?
I1107 15:07:28.381621 12333 solver.cpp:404] Test net output #0: loss = 3.37134e+11 (* 1 = 3.37134e+11 loss)
I1107 15:07:28.549142 12333 solver.cpp:228] Iteration 0, loss = 1.28092e+11
I1107 15:07:28.549201 12333 solver.cpp:244] Train net output #0: loss = 1.28092e+11 (* 1 = 1.28092e+11 loss)
I1107 15:07:28.549211 12333 sgd_solver.cpp:106] Iteration 0, lr = 1e-07
I1107 15:07:59.490077 12333 solver.cpp:228] Iteration 50, loss = -nan
I1107 15:07:59.490170 12333 solver.cpp:244] Train net output #0: loss = 0 (* 1 = 0 loss)
I1107 15:07:59.490176 12333 sgd_solver.cpp:106] Iteration 50, lr = 1e-07
I1107 15:08:29.177093 12333 solver.cpp:228] Iteration 100, loss = -nan
I1107 15:08:29.177119 12333 solver.cpp:244] Train net output #0: loss = 0 (* 1 = 0 loss)
I1107 15:08:29.177125 12333 sgd_solver.cpp:106] Iteration 100, lr = 1e-07
I1107 15:08:59.758381 12333 solver.cpp:228] Iteration 150, loss = -nan
I1107 15:08:59.758513 12333 solver.cpp:244] Train net output #0: loss = 0 (* 1 = 0 loss)
I1107 15:08:59.758545 12333 sgd_solver.cpp:106] Iteration 150, lr = 1e-07
I1107 15:09:30.210208 12333 solver.cpp:228] Iteration 200, loss = -nan
I1107 15:09:30.210304 12333 solver.cpp:244] Train net output #0: loss = 0 (* 1 = 0 loss)
I1107 15:09:30.210310 12333 sgd_solver.cpp:106] Iteration 200, lr = 1e-07
Your loss is not 0, not even close. You start at 3.37e+11 (that is, ~10^11), and it seems that soon after it explodes and you get nan. You need to drastically scale down your loss values. If you are using "EuclideanLoss", you might want to average the loss over the size of the depth map, scale the predicted values to the [-1, 1] range, or use any other scaling method that prevents your loss from exploding.
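As an illustration of the scaling idea (not the original pipeline; the shapes and ranges below are made up), assuming the depth maps are available as numpy arrays, you could map the targets to [-1, 1] before generating the training data:
import numpy as np

def scale_to_unit_range(depth, d_min, d_max):
    """Linearly map a depth map from [d_min, d_max] to [-1, 1]."""
    return 2.0 * (depth - d_min) / (d_max - d_min) - 1.0

# e.g. raw depth in millimetres with values up to ~10^4
raw = np.random.uniform(0.0, 10000.0, size=(480, 640)).astype(np.float32)
scaled = scale_to_unit_range(raw, d_min=0.0, d_max=10000.0)  # now in [-1, 1]
With targets in this range, a Euclidean loss averaged over the map stays small instead of starting around 10^11.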
