I am training a network and I have changed the learning rate from 0.1 down to 0.00001, but the output always remains the same. No mean subtraction is used for training.
What could be the reasons for such a weird loss?
I1107 15:07:28.381621 12333 solver.cpp:404] Test net output #0: loss = 3.37134e+11 (* 1 = 3.37134e+11 loss)
I1107 15:07:28.549142 12333 solver.cpp:228] Iteration 0, loss = 1.28092e+11
I1107 15:07:28.549201 12333 solver.cpp:244] Train net output #0: loss = 1.28092e+11 (* 1 = 1.28092e+11 loss)
I1107 15:07:28.549211 12333 sgd_solver.cpp:106] Iteration 0, lr = 1e-07
I1107 15:07:59.490077 12333 solver.cpp:228] Iteration 50, loss = -nan
I1107 15:07:59.490170 12333 solver.cpp:244] Train net output #0: loss = 0 (* 1 = 0 loss)
I1107 15:07:59.490176 12333 sgd_solver.cpp:106] Iteration 50, lr = 1e-07
I1107 15:08:29.177093 12333 solver.cpp:228] Iteration 100, loss = -nan
I1107 15:08:29.177119 12333 solver.cpp:244] Train net output #0: loss = 0 (* 1 = 0 loss)
I1107 15:08:29.177125 12333 sgd_solver.cpp:106] Iteration 100, lr = 1e-07
I1107 15:08:59.758381 12333 solver.cpp:228] Iteration 150, loss = -nan
I1107 15:08:59.758513 12333 solver.cpp:244] Train net output #0: loss = 0 (* 1 = 0 loss)
I1107 15:08:59.758545 12333 sgd_solver.cpp:106] Iteration 150, lr = 1e-07
I1107 15:09:30.210208 12333 solver.cpp:228] Iteration 200, loss = -nan
I1107 15:09:30.210304 12333 solver.cpp:244] Train net output #0: loss = 0 (* 1 = 0 loss)
I1107 15:09:30.210310 12333 sgd_solver.cpp:106] Iteration 200, lr = 1e-07
Your loss is not 0, not even close. You start at 3.3e+11 (that is ~10^11), and it seems that soon after it explodes and you get nan. You need to drastically scale down your loss values. If you are using "EuclideanLoss" you might want to average the loss by the size of the depth map, scale the predicted values to the [-1, 1] range, or use any other scaling method that will prevent your loss from exploding.
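For example, here is a minimal sketch of scaling the regression targets to [-1, 1] before training. This assumes the depth maps are available as a NumPy array; the file name and shapes below are made up.

import numpy as np

# Hypothetical example: depth maps stored as an array of shape (N, H, W).
depth = np.load("depth_maps.npy").astype(np.float32)   # assumed file name

d_min, d_max = depth.min(), depth.max()
depth_scaled = 2.0 * (depth - d_min) / (d_max - d_min) - 1.0   # now in [-1, 1]

# Alternatively, average the squared error over the pixels of each depth map
# instead of summing it, so the loss does not grow with the map size:
# per-sample loss ~ np.mean((pred - target) ** 2) rather than np.sum(...)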
Related
Problem:
I am building a model that will predict housing prices. So, first I decided to build a linear regression model in TensorFlow. But when I start training, I see that my accuracy is always 1.
I am new to machine learning. Please, someone, tell me what's going wrong; I can't figure it out. I searched on Google but didn't find any answer that solves my problem.
Here's my code
# (imports added for completeness; df_train is assumed to have been loaded earlier)
import numpy as np
import tensorflow as tf
from pandas import get_dummies
from sklearn.model_selection import train_test_split

df_train = df_train.loc[:, ['OverallQual', 'GrLivArea', 'GarageArea', 'SalePrice']]
df_X = df_train.loc[:, ['OverallQual', 'GrLivArea', 'GarageArea']]
df_Y = df_train.loc[:, ['SalePrice']]
df_yy = get_dummies(df_Y)
print("Shape of df_X: ", df_X.shape)
X_train, X_test, y_train, y_test = train_test_split(df_X, df_yy, test_size=0.15)
X_train = np.asarray(X_train).astype(np.float32)
X_test = np.asarray(X_test).astype(np.float32)
y_train = np.asarray(y_train).astype(np.float32)
y_test = np.asarray(y_test).astype(np.float32)
X = tf.placeholder(tf.float32, [None, num_of_features])
y = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([num_of_features, 1]))
b = tf.Variable(tf.zeros([1]))
prediction = tf.add(tf.matmul(X, W), b)
num_epochs = 20000
# calculating loss
cost = tf.reduce_mean(tf.losses.softmax_cross_entropy(onehot_labels=y, logits=prediction))
optimizer = tf.train.GradientDescentOptimizer(0.00001).minimize(cost)
correct_prediction = tf.equal(tf.argmax(prediction, axis=1), tf.argmax(y, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        if epoch % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={X: X_train, y: y_train})
            print('step %d, training accuracy %g' % (epoch, train_accuracy))
        optimizer.run(feed_dict={X: X_train, y: y_train})
    print('test accuracy %g' % accuracy.eval(feed_dict={
        X: X_test, y: y_test}))
Output is:
step 0, training accuracy 1
step 100, training accuracy 1
step 200, training accuracy 1
step 300, training accuracy 1
step 400, training accuracy 1
step 500, training accuracy 1
step 600, training accuracy 1
step 700, training accuracy 1
............................
............................
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
test accuracy 1
EDIT:
I changed my cost function to this
cost = tf.reduce_sum(tf.pow(prediction-y, 2))/(2*1241)
But still my output is always 1.
EDIT 2:
In response to lejlot's comment:
Thanks, lejlot. I changed my accuracy code to this
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    merged_summary = tf.summary.merge_all()
    writer = tf.summary.FileWriter("/tmp/hpp1")
    writer.add_graph(sess.graph)
    for epoch in range(num_epochs):
        if epoch % 5:
            s = sess.run(merged_summary, feed_dict={X: X_train, y: y_train})
            writer.add_summary(s, epoch)
        sess.run(optimizer, feed_dict={X: X_train, y: y_train})
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: X_train, y: y_train})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                  "W=", sess.run(W), "b=", sess.run(b))
    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: X_train, y: y_train})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')
But the output is all nan
Output:
....................................
Epoch: 19900 cost= nan W= nan b= nan
Epoch: 19950 cost= nan W= nan b= nan
Epoch: 20000 cost= nan W= nan b= nan
Optimization Finished!
Training cost= nan W= nan b= nan
You want to use linear regression, but you are actually doing logistic regression. Take a look at tf.losses.softmax_cross_entropy: the softmax it applies turns your logits into a probability distribution, i.e. a vector of numbers that sums to 1. In your case that vector has size 1, so it is always [1], which is why your accuracy is always 1.
Here are two examples that will help you see the difference: linear regression and logistic regression.
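If it helps, here is a rough sketch (TensorFlow 1.x) of what the linear-regression version could look like, reusing the array names from your code. Standardizing the features and the target is my own addition to keep plain gradient descent from producing nan; the learning rate and epoch count are arbitrary.

import tensorflow as tf

# Assumption: X_train, y_train are the float32 arrays from the question,
# with shapes (N, 3) and (N, 1).
X_mean, X_std = X_train.mean(axis=0), X_train.std(axis=0)
y_mean, y_std = y_train.mean(), y_train.std()
X_tr = (X_train - X_mean) / X_std
y_tr = (y_train - y_mean) / y_std

num_of_features = X_tr.shape[1]
X = tf.placeholder(tf.float32, [None, num_of_features])
y = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([num_of_features, 1]))
b = tf.Variable(tf.zeros([1]))
prediction = tf.matmul(X, W) + b

cost = tf.reduce_mean(tf.square(prediction - y))   # mean squared error, no softmax
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(2000):
        _, c = sess.run([optimizer, cost], feed_dict={X: X_tr, y: y_tr})
        if epoch % 100 == 0:
            print('epoch %d, mse %g' % (epoch, c))

# For regression there is no "accuracy"; track the MSE (or RMSE) instead.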
I am using bvlc_reference_caffenet for training. I am doing both training and testing. Below is an example log of my trained network:
I0430 11:49:08.408740 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:21.221074 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:34.038710 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:46.816813 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:56.630870 23334 solver.cpp:397] Test net output #0: accuracy = 0.932502
I0430 11:49:56.630940 23334 solver.cpp:397] Test net output #1: loss = 0.388662 (* 1 = 0.388662 loss)
I0430 11:49:57.218236 23334 solver.cpp:218] Iteration 71000 (0.319361 iter/s, 62.625s/20 iters), loss = 0.00146191
I0430 11:49:57.218300 23334 solver.cpp:237] Train net output #0: loss = 0.00146191 (* 1 = 0.00146191 loss)
I0430 11:49:57.218308 23334 sgd_solver.cpp:105] Iteration 71000, lr = 0.001
I0430 11:50:09.168726 23334 solver.cpp:218] Iteration 71020 (1.67357 iter/s, 11.9505s/20 iters), loss = 0.000806865
I0430 11:50:09.168778 23334 solver.cpp:237] Train net output #0: loss = 0.000806868 (* 1 = 0.000806868 loss)
I0430 11:50:09.168787 23334 sgd_solver.cpp:105] Iteration 71020, lr = 0.001
I0430 11:50:21.127496 23334 solver.cpp:218] Iteration 71040 (1.67241 iter/s, 11.9588s/20 iters), loss = 0.000182312
I0430 11:50:21.127539 23334 solver.cpp:237] Train net output #0: loss = 0.000182314 (* 1 = 0.000182314 loss)
I0430 11:50:21.127562 23334 sgd_solver.cpp:105] Iteration 71040, lr = 0.001
I0430 11:50:33.248086 23334 solver.cpp:218] Iteration 71060 (1.65009 iter/s, 12.1206s/20 iters), loss = 0.000428604
I0430 11:50:33.248260 23334 solver.cpp:237] Train net output #0: loss = 0.000428607 (* 1 = 0.000428607 loss)
I0430 11:50:33.248272 23334 sgd_solver.cpp:105] Iteration 71060, lr = 0.001
I0430 11:50:45.518955 23334 solver.cpp:218] Iteration 71080 (1.62989 iter/s, 12.2707s/20 iters), loss = 0.00108446
I0430 11:50:45.519006 23334 solver.cpp:237] Train net output #0: loss = 0.00108447 (* 1 = 0.00108447 loss)
I0430 11:50:45.519011 23334 sgd_solver.cpp:105] Iteration 71080, lr = 0.001
I0430 11:50:51.287315 23341 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:50:57.851781 23334 solver.cpp:218] Iteration 71100 (1.62169 iter/s, 12.3328s/20 iters), loss = 0.00150949
I0430 11:50:57.851828 23334 solver.cpp:237] Train net output #0: loss = 0.0015095 (* 1 = 0.0015095 loss)
I0430 11:50:57.851837 23334 sgd_solver.cpp:105] Iteration 71100, lr = 0.001
I0430 11:51:09.912184 23334 solver.cpp:218] Iteration 71120 (1.65832 iter/s, 12.0604s/20 iters), loss = 0.00239335
I0430 11:51:09.912330 23334 solver.cpp:237] Train net output #0: loss = 0.00239335 (* 1 = 0.00239335 loss)
I0430 11:51:09.912340 23334 sgd_solver.cpp:105] Iteration 71120, lr = 0.001
I0430 11:51:21.968586 23334 solver.cpp:218] Iteration 71140 (1.65888 iter/s, 12.0563s/20 iters), loss = 0.00161807
I0430 11:51:21.968646 23334 solver.cpp:237] Train net output #0: loss = 0.00161808 (* 1 = 0.00161808 loss)
I0430 11:51:21.968654 23334 sgd_solver.cpp:105] Iteration 71140, lr = 0.001
What confuses me is the loss. I was going to stop training my network when the loss goes below 0.0001, but there are two losses: training loss and test loss. The training loss seems to stay around 0.0001, but the test loss is at 0.388, which is way above the threshold I set. Which one do I use to stop my training?
Having such a large gap between test and train performance might indicate that you are over-fitting your data.
The purpose of the validation set is to make sure you do not overfit. You should use the performance on the validation set to decide whether to stop training or proceed.
In general, you want to stop training when your validation accuracy hits a plateau. Your data above indicates that you have, indeed, over-trained your model.
Ideally, the training, testing, and validation error should be roughly equal. In practice, this rarely happens.
Note that the loss is not a good metric unless your loss function and weights are the same for all phases of evaluation. For instance, GoogLeNet computes the training loss as a weighted sum over three classifier layers, while validation only considers the final accuracy.
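As a rough illustration (plain Python, not Caffe-specific; the patience and threshold values are arbitrary), a plateau-based stopping rule on the validation accuracy could look like this:

def should_stop(val_accuracies, patience=5, min_delta=1e-4):
    """Stop when the best validation accuracy has not improved by at least
    min_delta over the last `patience` evaluations."""
    if len(val_accuracies) <= patience:
        return False
    best_before = max(val_accuracies[:-patience])
    recent_best = max(val_accuracies[-patience:])
    return recent_best < best_before + min_delta

# Usage: append the validation accuracy after every test interval, then stop
# training (or lower the learning rate) once should_stop(history) is True.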
I'm training an AlexNet .caffemodel with the faceScrub dataset. I'm following
Face Detection
Fine-Tuning
The thing is that when I'm training the model I get this output:
I0302 10:59:50.184250 11346 solver.cpp:331] Iteration 0, Testing net (#0)
I0302 11:09:01.198473 11346 solver.cpp:398] Test net output #0: accuracy = 0.96793
I0302 11:09:01.198635 11346 solver.cpp:398] Test net output #1: loss = 0.354751 (* 1 = 0.354751 loss)
I0302 11:09:12.543730 11346 solver.cpp:219] Iteration 0 (0 iter/s, 562.435s/20 iters), loss = 0.465583
I0302 11:09:12.543861 11346 solver.cpp:238] Train net output #0: loss = 0.465583 (* 1 = 0.465583 loss)
I0302 11:09:12.543902 11346 sgd_solver.cpp:105] Iteration 0, lr = 0.001
I0302 11:14:41.847237 11346 solver.cpp:219] Iteration 20 (0.0607343 iter/s, 329.303s/20 iters), loss = 4.65581e-09
I0302 11:14:41.847409 11346 solver.cpp:238] Train net output #0: loss = 0 (* 1 = 0 loss)
I0302 11:14:41.847447 11346 sgd_solver.cpp:105] Iteration 20, lr = 0.001
I0302 11:18:25.848346 11346 solver.cpp:219] Iteration 40 (0.0892857 iter/s, 224s/20 iters), loss = 4.65581e-09
I0302 11:18:25.848526 11346 solver.cpp:238] Train net output #0: loss = 0 (* 1 = 0 loss)
I0302 11:18:25.848565 11346 sgd_solver.cpp:105] Iteration 40, lr = 0.001
and it continues like that.
The only thing I am suspicious of is that the train_val.prototxt in the Face Detection link uses num_output: 2 in the fc8_flickr layer, so I have the .txt file with all the images in this format:
/media/jose/B430F55030F51A56/faceScrub/download/Steve_Carell/face/a3b1b70acd0fda72c98be121a2af3ea2f4209fe7.jpg 1
/media/jose/B430F55030F51A56/faceScrub/download/Matt_Czuchry/face/98882354bbf3a508b48c6f53a84a68ca6797e617.jpg 1
/media/jose/B430F55030F51A56/faceScrub/download/Linda_Gray/face/ca9356b2382d2595ba8a9ff399dc3efa80873d72.jpg 1
/media/jose/B430F55030F51A56/faceScrub/download/Veronica_Hamel/face/900da3a6a22b25b3974e1f7602686f460126d028.jpg 1
with 1 being the class that contains a face. If I remove the 1, it gets stuck at Iteration 0, Testing net (#0).
Any insight on this?
I am basically using CaffeNet to do some sort of image classification with 256 classes. I am feeding the network a list of HDF5 files, but my network doesn't seem to learn: I'm getting accuracy 0 all the time, and the training error and validation error are the same. I would think that if the dataset were not enough to learn from, the training error should be very small and the validation error should be large, right? I also tried different batch sizes and learning rates, with no success. Here are the solver.prototxt and the network prototxt (a CaffeNet architecture). Any suggestion is appreciated.
I1103 12:01:41.822055 108615 solver.cpp:337] Iteration 0, Testing net (#0)
I1103 12:01:41.849742 108615 solver.cpp:404] Test net output #0: accuracy = 0
I1103 12:01:41.849761 108615 solver.cpp:404] Test net output #1: loss = 6.02617 (* 1 = 6.02617 loss)
I1103 12:01:41.869380 108615 solver.cpp:228] Iteration 0, loss = 6.05644
I1103 12:01:41.869398 108615 solver.cpp:244] Train net output #0: loss = 6.05644 (* 1 = 6.05644 loss)
I1103 12:01:41.869413 108615 sgd_solver.cpp:106] Iteration 0, lr = 0.1
I1103 12:01:47.624855 108615 solver.cpp:228] Iteration 500, loss = 87.3365
I1103 12:01:47.624876 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:47.624882 108615 sgd_solver.cpp:106] Iteration 500, lr = 0.1
I1103 12:01:53.290213 108615 solver.cpp:337] Iteration 1000, Testing net (#0)
I1103 12:01:53.299310 108615 solver.cpp:404] Test net output #0: accuracy = 0
I1103 12:01:53.299327 108615 solver.cpp:404] Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:53.314584 108615 solver.cpp:228] Iteration 1000, loss = 87.3365
I1103 12:01:53.314615 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:53.314621 108615 sgd_solver.cpp:106] Iteration 1000, lr = 0.01
I1103 12:01:58.991268 108615 solver.cpp:228] Iteration 1500, loss = 87.3365
I1103 12:01:58.991315 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:58.991322 108615 sgd_solver.cpp:106] Iteration 1500, lr = 0.01
I1103 12:02:04.664419 108615 solver.cpp:337] Iteration 2000, Testing net (#0)
I1103 12:02:04.673518 108615 solver.cpp:404] Test net output #0: accuracy = 0
I1103 12:02:04.673537 108615 solver.cpp:404] Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:04.690434 108615 solver.cpp:228] Iteration 2000, loss = 87.3365
I1103 12:02:04.690469 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:04.690481 108615 sgd_solver.cpp:106] Iteration 2000, lr = 0.001
I1103 12:02:10.373788 108615 solver.cpp:228] Iteration 2500, loss = 87.3365
I1103 12:02:10.373852 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:10.373859 108615 sgd_solver.cpp:106] Iteration 2500, lr = 0.001
I1103 12:02:16.047372 108615 solver.cpp:337] Iteration 3000, Testing net (#0)
I1103 12:02:16.056390 108615 solver.cpp:404] Test net output #0: accuracy = 0
I1103 12:02:16.056407 108615 solver.cpp:404] Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:16.070235 108615 solver.cpp:228] Iteration 3000, loss = 87.3365
I1103 12:02:16.070261 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:16.070267 108615 sgd_solver.cpp:106] Iteration 3000, lr = 0.0001
I1103 12:02:21.755348 108615 solver.cpp:228] Iteration 3500, loss = 87.3365
I1103 12:02:21.755369 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:21.755375 108615 sgd_solver.cpp:106] Iteration 3500, lr = 0.0001
----------------------------------
net: "/A/B/train.prototxt"
test_iter: 10
test_interval: 1000
base_lr: 0.1
lr_policy: "step"
gamma: 0.1
stepsize: 1000
display: 10
max_iter: 4000
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "/A/B/model_"
solver_mode: GPU
--------------------------------------------------
layer {
name: "data"
type: "HDF5Data"
top: "X"
top: "y"
hdf5_data_param{
source:"/Path/to/trainh5list.txt"
batch_size: 1
}
include{phase: TRAIN}
}
layer {
name: "data"
type: "HDF5Data"
top: "X"
top: "y"
hdf5_data_param{
source:"/Path/to/testh5list.txt"
batch_size: 1
}
include{phase: TEST}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "X"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
.
.
.
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 256
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc8"
bottom: "y"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc8"
bottom: "y"
top: "loss"
}
This is the result I get when I train my own model
I0510 20:53:16.677439 3591 solver.cpp:337] Iteration 0, Testing net (#0)
I0510 20:57:20.822933 3591 solver.cpp:404] Test net output #0: accuracy = 3.78788e-05
I0510 20:57:20.823001 3591 solver.cpp:404] Test net output #1: loss = 9.27223 (* 1 = 9.27223 loss)
I0510 20:57:21.423084 3591 solver.cpp:228] Iteration 0, loss = 9.29181
I0510 20:57:21.423110 3591 solver.cpp:244] Train net output #0: loss = 9.29181 (* 1 = 9.29181 loss)
I0510 20:57:21.423120 3591 sgd_solver.cpp:106] Iteration 0, lr = 0.001
I0510 21:06:57.498831 3591 solver.cpp:337] Iteration 1000, Testing net (#0)
I0510 21:10:59.477396 3591 solver.cpp:404] Test net output #0: accuracy = 0.00186553
I0510 21:10:59.477463 3591 solver.cpp:404] Test net output #1: loss = 8.86572 (* 1 = 8.86572 loss)
I0510 21:20:35.828510 3591 solver.cpp:337] Iteration 2000, Testing net (#0)
I0510 21:24:42.838196 3591 solver.cpp:404] Test net output #0: accuracy = 0.00144886
I0510 21:24:42.838245 3591 solver.cpp:404] Test net output #1: loss = 8.83859 (* 1 = 8.83859 loss)
I0510 21:24:43.412120 3591 solver.cpp:228] Iteration 2000, loss = 8.81461
I0510 21:24:43.412145 3591 solver.cpp:244] Train net output #0: loss = 8.81461 (* 1 = 8.81461 loss)
I0510 21:24:43.412150 3591 sgd_solver.cpp:106] Iteration 2000, lr = 0.001
I0510 21:38:50.990823 3591 solver.cpp:337] Iteration 3000, Testing net (#0)
I0510 21:42:52.918418 3591 solver.cpp:404] Test net output #0: accuracy = 0.00140152
I0510 21:42:52.918493 3591 solver.cpp:404] Test net output #1: loss = 8.81789 (* 1 = 8.81789 loss)
I0510 22:00:09.519151 3591 solver.cpp:337] Iteration 4000, Testing net (#0)
I0510 22:09:13.918016 3591 solver.cpp:404] Test net output #0: accuracy = 0.00149621
I0510 22:09:13.918102 3591 solver.cpp:404] Test net output #1: loss = 8.80909 (* 1 = 8.80909 loss)
I0510 22:09:15.127683 3591 solver.cpp:228] Iteration 4000, loss = 8.8597
I0510 22:09:15.127722 3591 solver.cpp:244] Train net output #0: loss = 8.8597 (* 1 = 8.8597 loss)
I0510 22:09:15.127729 3591 sgd_solver.cpp:106] Iteration 4000, lr = 0.001
I0510 22:28:39.320019 3591 solver.cpp:337] Iteration 5000, Testing net (#0)
I0510 22:37:43.847064 3591 solver.cpp:404] Test net output #0: accuracy = 0.00118371
I0510 22:37:43.847173 3591 solver.cpp:404] Test net output #1: loss = 8.80527 (* 1 = 8.80527 loss)
I0510 23:58:17.120088 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_10000.caffemodel
I0510 23:58:17.238307 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_10000.solverstate
I0510 23:58:17.491825 3591 solver.cpp:337] Iteration 10000, Testing net (#0)
I0511 00:02:19.412715 3591 solver.cpp:404] Test net output #0: accuracy = 0.00186553
I0511 00:02:19.412762 3591 solver.cpp:404] Test net output #1: loss = 8.79114 (* 1 = 8.79114 loss)
I0511 00:02:19.986547 3591 solver.cpp:228] Iteration 10000, loss = 8.83457
I0511 00:02:19.986570 3591 solver.cpp:244] Train net output #0: loss = 8.83457 (* 1 = 8.83457 loss)
I0511 00:02:19.986578 3591 sgd_solver.cpp:106] Iteration 10000, lr = 0.001
I0511 00:11:55.546052 3591 solver.cpp:337] Iteration 11000, Testing net (#0)
I0511 00:15:57.490486 3591 solver.cpp:404] Test net output #0: accuracy = 0.00164773
I0511 00:15:57.490532 3591 solver.cpp:404] Test net output #1: loss = 8.78702 (* 1 = 8.78702 loss)
I0511 00:25:33.666496 3591 solver.cpp:337] Iteration 12000, Testing net (#0)
I0511 00:29:35.603062 3591 solver.cpp:404] Test net output #0: accuracy = 0.0016572
I0511 00:29:35.603109 3591 solver.cpp:404] Test net output #1: loss = 8.7848 (* 1 = 8.7848 loss)
I0511 00:29:36.177078 3591 solver.cpp:228] Iteration 12000, loss = 9.00561
I0511 00:29:36.177105 3591 solver.cpp:244] Train net output #0: loss = 9.00561 (* 1 = 9.00561 loss)
I0511 00:29:36.177114 3591 sgd_solver.cpp:106] Iteration 12000, lr = 0.001
I0511 00:39:11.729369 3591 solver.cpp:337] Iteration 13000, Testing net (#0)
I0511 00:43:13.678067 3591 solver.cpp:404] Test net output #0: accuracy = 0.001875
I0511 00:43:13.678113 3591 solver.cpp:404] Test net output #1: loss = 8.78359 (* 1 = 8.78359 loss)
I0511 00:52:49.851985 3591 solver.cpp:337] Iteration 14000, Testing net (#0)
I0511 00:56:51.767343 3591 solver.cpp:404] Test net output #0: accuracy = 0.00154356
I0511 00:56:51.767390 3591 solver.cpp:404] Test net output #1: loss = 8.77998 (* 1 = 8.77998 loss)
I0511 00:56:52.341564 3591 solver.cpp:228] Iteration 14000, loss = 8.83385
I0511 00:56:52.341591 3591 solver.cpp:244] Train net output #0: loss = 8.83385 (* 1 = 8.83385 loss)
I0511 00:56:52.341598 3591 sgd_solver.cpp:106] Iteration 14000, lr = 0.001
I0511 02:14:38.224290 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_20000.caffemodel
I0511 02:14:38.735008 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_20000.solverstate
I0511 02:14:38.805809 3591 solver.cpp:337] Iteration 20000, Testing net (#0)
I0511 02:18:40.681993 3591 solver.cpp:404] Test net output #0: accuracy = 0.00179924
I0511 02:18:40.682086 3591 solver.cpp:404] Test net output #1: loss = 8.78129 (* 1 = 8.78129 loss)
I0511 02:18:41.255969 3591 solver.cpp:228] Iteration 20000, loss = 8.82502
I0511 02:18:41.255995 3591 solver.cpp:244] Train net output #0: loss = 8.82502 (* 1 = 8.82502 loss)
I0511 02:18:41.256001 3591 sgd_solver.cpp:106] Iteration 20000, lr = 0.001
I0511 04:30:58.924096 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_30000.caffemodel
I0511 04:31:00.742739 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_30000.solverstate
I0511 04:31:01.151980 3591 solver.cpp:337] Iteration 30000, Testing net (#0)
I0511 04:35:03.075263 3591 solver.cpp:404] Test net output #0: accuracy = 0.00186553
I0511 04:35:03.075307 3591 solver.cpp:404] Test net output #1: loss = 8.77867 (* 1 = 8.77867 loss)
I0511 04:35:03.649479 3591 solver.cpp:228] Iteration 30000, loss = 8.82915
I0511 04:35:03.649507 3591 solver.cpp:244] Train net output #0: loss = 8.82915 (* 1 = 8.82915 loss)
I0511 04:35:03.649513 3591 sgd_solver.cpp:106] Iteration 30000, lr = 0.001
I0511 07:55:36.848265 3591 solver.cpp:337] Iteration 45000, Testing net (#0)
I0511 07:59:38.834043 3591 solver.cpp:404] Test net output #0: accuracy = 0.00179924
I0511 07:59:38.834095 3591 solver.cpp:404] Test net output #1: loss = 8.77432 (* 1 = 8.77432 loss)
I0511 09:03:48.141854 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_50000.caffemodel
I0511 09:03:49.736464 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_50000.solverstate
I0511 09:03:49.797582 3591 solver.cpp:337] Iteration 50000, Testing net (#0)
I0511 09:07:51.777150 3591 solver.cpp:404] Test net output #0: accuracy = 0.001875
I0511 09:07:51.777207 3591 solver.cpp:404] Test net output #1: loss = 8.77058 (* 1 = 8.77058 loss)
I0511 09:07:52.351323 3591 solver.cpp:228] Iteration 50000, loss = 9.11435
I0511 09:07:52.351351 3591 solver.cpp:244] Train net output #0: loss = 9.11435 (* 1 = 9.11435 loss)
I0511 09:07:52.351357 3591 sgd_solver.cpp:106] Iteration 50000, lr = 0.001
I0511 09:17:28.188742 3591 solver.cpp:337] Iteration 51000, Testing net (#0)
I0511 09:21:30.200623 3591 solver.cpp:404] Test net output #0: accuracy = 0.00186553
I0511 09:21:30.200716 3591 solver.cpp:404] Test net output #1: loss = 8.77026 (* 1 = 8.77026 loss)
I0511 09:31:06.596501 3591 solver.cpp:337] Iteration 52000, Testing net (#0)
I0511 09:35:08.580215 3591 solver.cpp:404] Test net output #0: accuracy = 0.00182765
I0511 09:35:08.580313 3591 solver.cpp:404] Test net output #1: loss = 8.76917 (* 1 = 8.76917 loss)
I0511 09:35:09.154428 3591 solver.cpp:228] Iteration 52000, loss = 8.89758
I0511 09:35:09.154453 3591 solver.cpp:244] Train net output #0: loss = 8.89758 (* 1 = 8.89758 loss)
I0511 09:35:09.154459 3591 sgd_solver.cpp:106] Iteration 52000, lr = 0.001
I0511 09:44:44.906309 3591 solver.cpp:337] Iteration 53000, Testing net (#0)
I0511 09:48:46.866353 3591 solver.cpp:404] Test net output #0: accuracy = 0.00185606
I0511 09:48:46.866430 3591 solver.cpp:404] Test net output #1: loss = 8.7708 (* 1 = 8.7708 loss)
I0511 09:58:23.097244 3591 solver.cpp:337] Iteration 54000, Testing net (#0)
I0511 10:02:25.056555 3591 solver.cpp:404] Test net output #0: accuracy = 0.00192235
I0511 10:02:25.056605 3591 solver.cpp:404] Test net output #1: loss = 8.76884 (* 1 = 8.76884 loss)
I0511 10:02:25.630312 3591 solver.cpp:228] Iteration 54000, loss = 8.90552
I0511 10:02:25.630337 3591 solver.cpp:244] Train net output #0: loss = 8.90552 (* 1 = 8.90552 loss)
I0511 10:02:25.630342 3591 sgd_solver.cpp:106] Iteration 54000, lr = 0.001
I0511 14:44:51.563555 3591 solver.cpp:337] Iteration 75000, Testing net (#0)
I0511 14:48:53.573640 3591 solver.cpp:404] Test net output #0: accuracy = 0.0016572
I0511 14:48:53.573724 3591 solver.cpp:404] Test net output #1: loss = 8.76967 (* 1 = 8.76967 loss)
I0511 14:58:30.080453 3591 solver.cpp:337] Iteration 76000, Testing net (#0)
I0511 15:02:32.076011 3591 solver.cpp:404] Test net output #0: accuracy = 0.001875
I0511 15:02:32.076077 3591 solver.cpp:404] Test net output #1: loss = 8.7695 (* 1 = 8.7695 loss)
I0511 15:02:32.650342 3591 solver.cpp:228] Iteration 76000, loss = 9.0084
I0511 15:02:32.650367 3591 solver.cpp:244] Train net output #0: loss = 9.0084 (* 1 = 9.0084 loss)
I0511 15:02:32.650373 3591 sgd_solver.cpp:106] Iteration 76000, lr = 0.001
I0511 15:12:08.597450 3591 solver.cpp:337] Iteration 77000, Testing net (#0)
I0511 15:16:10.636613 3591 solver.cpp:404] Test net output #0: accuracy = 0.00181818
I0511 15:16:10.636693 3591 solver.cpp:404] Test net output #1: loss = 8.76889 (* 1 = 8.76889 loss)
I0511 15:25:47.167667 3591 solver.cpp:337] Iteration 78000, Testing net (#0)
I0511 15:29:49.204596 3591 solver.cpp:404] Test net output #0: accuracy = 0.00185606
I0511 15:29:49.204649 3591 solver.cpp:404] Test net output #1: loss = 8.77059 (* 1 = 8.77059 loss)
I0511 15:29:49.779094 3591 solver.cpp:228] Iteration 78000, loss = 8.73139
I0511 15:29:49.779119 3591 solver.cpp:244] Train net output #0: loss = 8.73139 (* 1 = 8.73139 loss)
I0511 15:29:49.779124 3591 sgd_solver.cpp:106] Iteration 78000, lr = 0.001
I0511 15:39:25.730358 3591 solver.cpp:337] Iteration 79000, Testing net (#0)
I0511 15:43:27.756417 3591 solver.cpp:404] Test net output #0: accuracy = 0.00192235
I0511 15:43:27.756485 3591 solver.cpp:404] Test net output #1: loss = 8.76846 (* 1 = 8.76846 loss)
I0511 15:53:04.419961 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_80000.caffemodel
I0511 15:53:06.138357 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_80000.solverstate
I0511 15:53:06.519551 3591 solver.cpp:337] Iteration 80000, Testing net (#0)
I0511 15:57:08.719681 3591 solver.cpp:404] Test net output #0: accuracy = 0.00164773
I0511 15:57:08.719737 3591 solver.cpp:404] Test net output #1: loss = 8.77126 (* 1 = 8.77126 loss)
I0511 15:57:09.294163 3591 solver.cpp:228] Iteration 80000, loss = 8.56576
I0511 15:57:09.294188 3591 solver.cpp:244] Train net output #0: loss = 8.56576 (* 1 = 8.56576 loss)
I0511 15:57:09.294193 3591 sgd_solver.cpp:106] Iteration 80000, lr = 0.001
I0511 17:01:19.190099 3591 solver.cpp:337] Iteration 85000, Testing net (#0)
I0511 17:05:21.148668 3591 solver.cpp:404] Test net output #0: accuracy = 0.00185606
I0511 17:05:21.148733 3591 solver.cpp:404] Test net output #1: loss = 8.77196 (* 1 = 8.77196 loss)
I0511 17:14:57.670343 3591 solver.cpp:337] Iteration 86000, Testing net (#0)
I0511 17:18:59.659850 3591 solver.cpp:404] Test net output #0: accuracy = 0.00181818
I0511 17:18:59.659907 3591 solver.cpp:404] Test net output #1: loss = 8.77126 (* 1 = 8.77126 loss)
I0511 17:19:00.234335 3591 solver.cpp:228] Iteration 86000, loss = 8.72875
I0511 17:19:00.234359 3591 solver.cpp:244] Train net output #0: loss = 8.72875 (* 1 = 8.72875 loss)
I0511 17:19:00.234364 3591 sgd_solver.cpp:106] Iteration 86000, lr = 0.001
I0511 17:28:36.196920 3591 solver.cpp:337] Iteration 87000, Testing net (#0)
I0511 17:32:38.181174 3591 solver.cpp:404] Test net output #0: accuracy = 0.00181818
I0511 17:32:38.181231 3591 solver.cpp:404] Test net output #1: loss = 8.771 (* 1 = 8.771 loss)
I0511 17:42:14.658293 3591 solver.cpp:337] Iteration 88000, Testing net (#0)
I0511 17:46:16.614358 3591 solver.cpp:404] Test net output #0: accuracy = 0.00188447
I0511 17:46:16.614415 3591 solver.cpp:404] Test net output #1: loss = 8.76964 (* 1 = 8.76964 loss)
I0511 17:46:17.188212 3591 solver.cpp:228] Iteration 88000, loss = 8.80409
I0511 17:46:17.188233 3591 solver.cpp:244] Train net output #0: loss = 8.80409 (* 1 = 8.80409 loss)
I0511 17:46:17.188240 3591 sgd_solver.cpp:106] Iteration 88000, lr = 0.001
I0511 17:55:53.358322 3591 solver.cpp:337] Iteration 89000, Testing net (#0)
I0511 17:59:55.305763 3591 solver.cpp:404] Test net output #0: accuracy = 0.00186553
I0511 17:59:55.305868 3591 solver.cpp:404] Test net output #1: loss = 8.76909 (* 1 = 8.76909 loss)
I0511 18:09:31.658655 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_90000.caffemodel
I0511 18:09:33.138741 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_90000.solverstate
I0511 18:09:33.691995 3591 solver.cpp:337] Iteration 90000, Testing net (#0)
I0511 18:13:35.626065 3591 solver.cpp:404] Test net output #0: accuracy = 0.00168561
I0511 18:13:35.626148 3591 solver.cpp:404] Test net output #1: loss = 8.76973 (* 1 = 8.76973 loss)
I0511 18:13:36.200448 3591 solver.cpp:228] Iteration 90000, loss = 8.97326
I0511 18:13:36.200469 3591 solver.cpp:244] Train net output #0: loss = 8.97326 (* 1 = 8.97326 loss)
I0511 18:13:36.200474 3591 sgd_solver.cpp:106] Iteration 90000, lr = 0.001
I0511 19:31:23.715662 3591 solver.cpp:337] Iteration 96000, Testing net (#0)
I0511 19:35:25.677780 3591 solver.cpp:404] Test net output #0: accuracy = 0.00188447
I0511 19:35:25.677836 3591 solver.cpp:404] Test net output #1: loss = 8.7695 (* 1 = 8.7695 loss)
I0511 19:35:26.251850 3591 solver.cpp:228] Iteration 96000, loss = 8.74232
I0511 19:35:26.251875 3591 solver.cpp:244] Train net output #0: loss = 8.74232 (* 1 = 8.74232 loss)
I0511 19:35:26.251880 3591 sgd_solver.cpp:106] Iteration 96000, lr = 0.001
I0511 19:45:02.057610 3591 solver.cpp:337] Iteration 97000, Testing net (#0)
I0511 19:49:04.029269 3591 solver.cpp:404] Test net output #0: accuracy = 0.00188447
I0511 19:49:04.029357 3591 solver.cpp:404] Test net output #1: loss = 8.77655 (* 1 = 8.77655 loss)
I0511 19:58:40.265120 3591 solver.cpp:337] Iteration 98000, Testing net (#0)
I0511 20:02:42.182787 3591 solver.cpp:404] Test net output #0: accuracy = 0.00183712
I0511 20:02:42.182859 3591 solver.cpp:404] Test net output #1: loss = 8.77069 (* 1 = 8.77069 loss)
I0511 20:02:42.756922 3591 solver.cpp:228] Iteration 98000, loss = 8.61745
I0511 20:02:42.756944 3591 solver.cpp:244] Train net output #0: loss = 8.61745 (* 1 = 8.61745 loss)
Due to the character limit, I had to delete some rows of the log; however, it doesn't matter.
As you can see, there is no difference between "Iteration 98000" and "Iteration 0". I am really puzzled by this situation.
This is the architecture of my model
name: "NN2"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
mean_file: "/home/jiayi-wei/caffe/examples/NN2/image_train_mean.binaryproto"
}
data_param {
source: "/home/jiayi-wei/caffe/examples/NN2/img_train_lmdb"
batch_size: 30
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mirror: false
mean_file: "/home/jiayi-wei/caffe/examples/NN2/image_train_mean.binaryproto"
}
data_param {
source: "/home/jiayi-wei/caffe/examples/NN2/img_val_lmdb"
batch_size: 11
backend: LMDB
}
}
#first layers
layer {
name: "conv11"
type: "Convolution"
bottom: "data"
top: "conv11"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu11"
type: "ReLU"
bottom: "conv11"
top: "conv11"
}
layer {
name: "conv12"
type: "Convolution"
bottom: "conv11"
top: "conv12"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu12"
type: "ReLU"
bottom: "conv12"
top: "conv12"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv12"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
#second layers
layer {
name: "conv21"
type: "Convolution"
bottom: "pool1"
top: "conv21"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu21"
type: "ReLU"
bottom: "conv21"
top: "conv21"
}
layer {
name: "conv22"
type: "Convolution"
bottom: "conv21"
top: "conv22"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu22"
type: "ReLU"
bottom: "conv22"
top: "conv22"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv22"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
#third layers
layer {
name: "conv31"
type: "Convolution"
bottom: "pool2"
top: "conv31"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu31"
type: "ReLU"
bottom: "conv31"
top: "conv31"
}
layer {
name: "conv32"
type: "Convolution"
bottom: "conv31"
top: "conv32"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu32"
type: "ReLU"
bottom: "conv32"
top: "conv32"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv32"
top: "pool3"
pooling_param {
pool: MAX
pad:1
kernel_size: 2
stride: 2
}
}
#fourth layer
layer {
name: "conv41"
type: "Convolution"
bottom: "pool3"
top: "conv41"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu41"
type: "ReLU"
bottom: "conv41"
top: "conv41"
}
layer {
name: "conv42"
type: "Convolution"
bottom: "conv41"
top: "conv42"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu42"
type: "ReLU"
bottom: "conv42"
top: "conv42"
}
layer {
name: "conv43"
type: "Convolution"
bottom: "conv42"
top: "conv43"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu43"
type: "ReLU"
bottom: "conv43"
top: "conv43"
}
layer {
name: "pool4"
type: "Pooling"
bottom: "conv43"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
#fifth layer
layer {
name: "conv51"
type: "Convolution"
bottom: "pool4"
top: "conv51"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu51"
type: "ReLU"
bottom: "conv51"
top: "conv51"
}
layer {
name: "conv52"
type: "Convolution"
bottom: "conv51"
top: "conv52"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu52"
type: "ReLU"
bottom: "conv52"
top: "conv52"
}
layer {
name: "conv53"
type: "Convolution"
bottom: "conv52"
top: "conv53"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv53"
top: "pool5"
pooling_param {
pool: AVE
pad:1
kernel_size: 2
stride: 2
}
}
#drop_Fc
layer {
name: "dropout"
type: "Dropout"
bottom: "pool5"
top: "pool5"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output:1000
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output:10575
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc7"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "SoftMax"
type: "SoftmaxWithLoss"
bottom: "fc7"
bottom: "label"
top: "SoftMax"
}
Following is my solver; I have already changed base_lr to 0.001.
net: "train_val.prototxt"
test_iter: 10000
test_interval: 1000
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "/home/jiayi-wei/caffe/examples/NN2"
solver_mode: GPU
I have tried changing some parameters, and I have also tried removing one "conv" layer from the block that has three "conv" layers. However, the result always stays the same, as the log above shows.
Please tell me how I can figure out the problem. Thanks!
Your base_lr seems to be high. Start with a base_lr of 0.001 and keep reducing it by a factor of 10 whenever you stop seeing improvement in accuracy for several thousand iterations.
NOTE: This is just a rule of thumb, it may not work in all cases.
From your log, it seems that your model keeps predicting the same label throughout training; in other words, your training has diverged. I advise you to make the following checks.
Check your labels when converting the train/validation lmdb data (a quick spot-check sketch follows below). Also, in your CNN architecture, the Dropout layer is better placed after the InnerProduct layer "fc6" rather than after the Pooling layer "pool5".
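For the label check, a quick spot-check of the training lmdb could look like this (this assumes pycaffe and the lmdb Python package are installed; the path is the one from your data layer):

import lmdb
import caffe
from collections import Counter

env = lmdb.open("/home/jiayi-wei/caffe/examples/NN2/img_train_lmdb", readonly=True)
labels = Counter()
with env.begin() as txn:
    for _, value in txn.cursor():
        datum = caffe.proto.caffe_pb2.Datum()
        datum.ParseFromString(value)
        labels[datum.label] += 1
print("distinct labels:", len(labels))
print("most common:", labels.most_common(5))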
I don't know how you sampled your training data. In principle, if you just use a softmax cost (multinomial cross-entropy loss), you should shuffle your training data when preparing the train/val lmdb data, and use a reasonably large batch size, for example 256, during training.
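A minimal sketch of shuffling the image list before building the lmdb (the file names here are assumptions):

import random

# "train_list.txt" lines are assumed to look like: <image_path> <label>
with open("train_list.txt") as f:
    lines = f.readlines()
random.seed(0)
random.shuffle(lines)
with open("train_list_shuffled.txt", "w") as f:
    f.writelines(lines)
# Then build img_train_lmdb from the shuffled list (convert_imageset also has
# a --shuffle flag, if you use that tool).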
Maybe your learning rate (base_lr) was too large; you could further reduce it from 0.001 to 0.0001. But I noticed that the CASIA WebFace baseline (http://arxiv.org/abs/1411.7923) used a 0.01 learning rate, and the input data scale, activation function, and the depth and width of your model are similar to that work, so the learning rate is less likely to be the cause (though you should check whether the weight initialization method matters much).
Try a smaller convolutional kernel size. Sometimes this helps by reducing the information loss caused by the alignment problem between the convolution kernel and its corresponding input feature map.
By the way, you are training a classification task with 10,575 classes and only about 40 training samples per class, so to some extent the training data is insufficient. So, as in the baseline work, it is better to add a contrastive cost alongside the softmax cost to enhance the model's ability to distinguish between same and different samples.
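For reference, the verification (contrastive) term has roughly this form; the NumPy sketch below is only an illustration of the formula, not the exact implementation from the paper:

import numpy as np

def contrastive_loss(f_i, f_j, same_identity, margin=1.0):
    """f_i, f_j: feature vectors of an image pair; same_identity: bool."""
    d = np.linalg.norm(f_i - f_j)
    if same_identity:
        return 0.5 * d ** 2                      # pull same-identity pairs together
    return 0.5 * max(0.0, margin - d) ** 2       # push different pairs apart, up to the margin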
Reference
Sun Y., Chen Y., Wang X., et al. Deep learning face representation by joint identification-verification. Advances in Neural Information Processing Systems, 2014: 1988–1996.