How to prevent weight update in caffe - machine-learning

Some layers of my net load a pretrained model. I want to fix their parameters and train other layers.
I followed this page and set lr_mult and decay_mult to 0, propagate_down: false, and even base_lr: 0 and weight_decay: 0 in the solver. However, the test loss (using all test images for each test) is still changing slowly at each iter. After thousands of iters the accuracy goes to 0 (from 80% when the pre-trained model was loaded).
Here is a two-layer example in which I just initialize the weights and set the above parameters to 0. I want to freeze all layers in this example, but when training starts, the loss keeps changing...
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.017
mirror: true
crop_size: 32
mean_value: 115
mean_value: 126
mean_value: 130
color: true
contrast: true
brightness: true
}
image_data_param {
source: "/data/zhuhao5/data/cifar100/cifar100_train_replicate.txt"
batch_size: 64
shuffle: true
#pair_size: 3
}
}
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.017
mirror: false
crop_size: 32
mean_value: 115
mean_value: 126
mean_value: 130
}
image_data_param {
source: "/data/zhuhao5/data/cifar100/cifar100_test.txt"
batch_size: 100
shuffle: false
}
}
#-------------- TEACHER --------------------
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
propagate_down: false
top: "conv1"
param {
lr_mult: 0
decay_mult: 0
}
convolution_param {
num_output: 16
bias_term: false
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "msra"
}
}
}
layer {
name: "res2_1a_1_bn"
type: "BatchNorm"
bottom: "conv1"
propagate_down: false
top: "res2_1a_1_bn"
param {
lr_mult: 0
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
}
layer {
name: "res2_1a_1_scale"
type: "Scale"
bottom: "res2_1a_1_bn"
propagate_down: false
top: "res2_1a_1_bn"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "res2_1a_1_relu"
type: "ReLU"
bottom: "res2_1a_1_bn"
propagate_down: false
top: "res2_1a_1_bn"
}
layer {
name: "pool_5"
type: "Pooling"
bottom: "res2_1a_1_bn"
propagate_down: false
top: "pool_5"
pooling_param {
pool: AVE
global_pooling: true
}
}
layer {
name: "fc100"
type: "InnerProduct"
bottom: "pool_5"
propagate_down: false
top: "fc100"
param {
lr_mult: 0
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
inner_product_param {
num_output: 100
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
}
}
#---------------------------------
layer {
name: "tea_soft_loss"
type: "SoftmaxWithLoss"
bottom: "fc100"
bottom: "label"
propagate_down: false
propagate_down: false
top: "tea_soft_loss"
loss_weight: 0
}
##----------- ACCURACY----------------
layer {
name: "teacher_accuracy"
type: "Accuracy"
bottom: "fc100"
bottom: "label"
top: "teacher_accuracy"
accuracy_param {
top_k: 1
}
}
Here is the solver:
test_iter: 100
test_interval: 10
base_lr: 0
momentum: 0
weight_decay: 0
lr_policy: "poly"
power: 1
display: 10000
max_iter: 80000
snapshot: 5000
type: "SGD"
solver_mode: GPU
random_seed: 10086
and log:
I0829 16:31:39.363433 14986 net.cpp:200] teacher_accuracy does not need backward computation.
I0829 16:31:39.363438 14986 net.cpp:200] tea_soft_loss does not need backward computation.
I0829 16:31:39.363442 14986 net.cpp:200] fc100_fc100_0_split does not need backward computation.
I0829 16:31:39.363446 14986 net.cpp:200] fc100 does not need backward computation.
I0829 16:31:39.363451 14986 net.cpp:200] pool_5 does not need backward computation.
I0829 16:31:39.363454 14986 net.cpp:200] res2_1a_1_relu does not need backward computation.
I0829 16:31:39.363458 14986 net.cpp:200] res2_1a_1_scale does not need backward computation.
I0829 16:31:39.363462 14986 net.cpp:200] res2_1a_1_bn does not need backward computation.
I0829 16:31:39.363466 14986 net.cpp:200] conv1 does not need backward computation.
I0829 16:31:39.363471 14986 net.cpp:200] label_data_1_split does not need backward computation.
I0829 16:31:39.363485 14986 net.cpp:200] data does not need backward computation.
I0829 16:31:39.363490 14986 net.cpp:242] This network produces output tea_soft_loss
I0829 16:31:39.363494 14986 net.cpp:242] This network produces output teacher_accuracy
I0829 16:31:39.363507 14986 net.cpp:255] Network initialization done.
I0829 16:31:39.363559 14986 solver.cpp:56] Solver scaffolding done.
I0829 16:31:39.363852 14986 caffe.cpp:248] Starting Optimization
I0829 16:31:39.363862 14986 solver.cpp:272] Solving WRN_22_12_to_WRN_18_4_v5_net
I0829 16:31:39.363865 14986 solver.cpp:273] Learning Rate Policy: poly
I0829 16:31:39.365981 14986 solver.cpp:330] Iteration 0, Testing net (#0)
I0829 16:31:39.366190 14986 blocking_queue.cpp:49] Waiting for data
I0829 16:31:39.742347 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 85.9064
I0829 16:31:39.742437 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0113
I0829 16:31:39.749806 14986 solver.cpp:218] Iteration 0 (0 iter/s, 0.385886s/10000 iters), loss = 0
I0829 16:31:39.749862 14986 solver.cpp:237] Train net output #0: tea_soft_loss = 4.97483
I0829 16:31:39.749877 14986 solver.cpp:237] Train net output #1: teacher_accuracy = 0
I0829 16:31:39.749908 14986 sgd_solver.cpp:105] Iteration 0, lr = 0
I0829 16:31:39.794306 14986 solver.cpp:330] Iteration 10, Testing net (#0)
I0829 16:31:40.171447 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.9119
I0829 16:31:40.171510 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0115
I0829 16:31:40.219133 14986 solver.cpp:330] Iteration 20, Testing net (#0)
I0829 16:31:40.596911 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91862
I0829 16:31:40.596971 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0116
I0829 16:31:40.645246 14986 solver.cpp:330] Iteration 30, Testing net (#0)
I0829 16:31:41.021711 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.92105
I0829 16:31:41.021772 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:41.069464 14986 solver.cpp:330] Iteration 40, Testing net (#0)
I0829 16:31:41.447345 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91916
I0829 16:31:41.447407 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:41.495157 14986 solver.cpp:330] Iteration 50, Testing net (#0)
I0829 16:31:41.905607 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.9208
I0829 16:31:41.905654 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:41.952659 14986 solver.cpp:330] Iteration 60, Testing net (#0)
I0829 16:31:42.327942 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91936
I0829 16:31:42.328025 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:42.374279 14986 solver.cpp:330] Iteration 70, Testing net (#0)
I0829 16:31:42.761359 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91859
I0829 16:31:42.761430 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:42.807821 14986 solver.cpp:330] Iteration 80, Testing net (#0)
I0829 16:31:43.232321 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91668
I0829 16:31:43.232398 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:43.266436 14986 solver.cpp:330] Iteration 90, Testing net (#0)
I0829 16:31:43.514633 14986 blocking_queue.cpp:49] Waiting for data
I0829 16:31:43.638617 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 4.91836
I0829 16:31:43.638684 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0117
I0829 16:31:43.685451 14986 solver.cpp:330] Iteration 100, Testing net (#0)
I wonder what I missed in the updating process in caffe :(

Found the reason.
The BatchNorm layer uses a different use_global_stats setting in the TRAIN and TEST phases.
In my case, I should set use_global_stats: true during training as well.
Also, don't forget the Scale layer.
The revised layers should be
layer {
name: "res2_1a_1_bn"
type: "BatchNorm"
bottom: "conv1"
top: "res2_1a_1_bn"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "res2_1a_1_scale"
type: "Scale"
bottom: "res2_1a_1_bn"
top: "res2_1a_1_bn"
param {
lr_mult: 0
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
bias_term: true
}
}
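To double-check that the layers really stay fixed, a quick pycaffe sanity check helps. This is a minimal sketch assuming the solver and net above; the 'solver.prototxt' path is hypothetical:
import numpy as np
import caffe

solver = caffe.SGDSolver('solver.prototxt')  # hypothetical path to the solver above

# Snapshot blobs that are supposed to stay fixed: the conv weights and the
# BatchNorm running statistics (the statistics are what keep changing when
# use_global_stats is false, even with lr_mult: 0).
conv_before = solver.net.params['conv1'][0].data.copy()
bn_before = [b.data.copy() for b in solver.net.params['res2_1a_1_bn']]

solver.step(1)  # one forward/backward/update cycle

print('conv1 frozen:', np.array_equal(conv_before, solver.net.params['conv1'][0].data))
print('bn frozen:', all(np.array_equal(b, a.data)
                        for b, a in zip(bn_before, solver.net.params['res2_1a_1_bn'])))
Before the fix the BatchNorm blobs report as not frozen even though every lr_mult is 0; with use_global_stats: true they stay unchanged.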

Related

low accuracy in neural network with caffe

In Caffe I created a simple network to classify face images as follows:
myExampleNet.prototxt
name: "myExample"
layer {
name: "example"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/myExample/myExample_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/myExample/myExample_test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "data"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 50
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 155
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
myExampleSolver.prototxt
net: "examples/myExample/myExampleNet.prototxt"
test_iter: 15
test_interval: 500
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 30000
snapshot: 5000
snapshot_prefix: "examples/myExample/myExample"
solver_mode: CPU
I use Caffe's convert_imageset to create the LMDB databases, and my data has about 40000 training and 16000 testing face images: 155 classes, each with about 260 training and 100 test images.
I use this command for training data:
build/tools/convert_imageset -resize_height=100 -resize_width=100 -shuffle examples/myExample/myData/data/ examples/myExample/myData/data/labels_train.txt examples/myExample/myExample_train_lmdb
and this command for test data:
build/tools/convert_imageset -resize_height=100 -resize_width=100 -shuffle examples/myExample/myData/data/ examples/myExample/myData/data/labels_test.txt examples/myExample/myExample_test_lmdb
But after 30000 iterations my loss is high and the accuracy is low:
...
I0127 09:25:55.602881 27305 solver.cpp:310] Iteration 30000, loss = 4.98317
I0127 09:25:55.602917 27305 solver.cpp:330] Iteration 30000, Testing net (#0)
I0127 09:25:55.602926 27305 net.cpp:676] Ignoring source layer example
I0127 09:25:55.827739 27305 solver.cpp:397] Test net output #0: accuracy = 0.0126667
I0127 09:25:55.827764 27305 solver.cpp:397] Test net output #1: loss = 5.02207 (* 1 = 5.02207 loss)
and when I change my dataset to mnist and change the ip2 layer num_output from 155 to 10, the loss is dramatically reduced and accuracy increases!
Which part is wrong?
There is not necessarily anything wrong with your code.
The fact that you get these good results for MNIST does indeed say that you have a model that is 'correct' in the sense that it does not produce coding errors etc., but it is in no way a guarantee that it will perform well in another, different problem.
Keep in mind that, in principle, it is much easier to predict a 10-class problem (like MNIST) than a 155-class one; the baseline (i.e. simple random guessing) accuracy in the first case is about 10%, while for the second it is only ~0.65%. Add to that the fact that your dataset (comparable in size to MNIST) is not any bigger either (are they also color pictures, i.e. 3-channel, in contrast with single-channel MNIST?), and your results may start looking less puzzling and surprising.
Additionally, MNIST has turned out to be notoriously easy to fit (I have been trying myself to build models that do not fit MNIST well, without much success so far), and you soon reach a conclusion that has now become common wisdom in the community: good performance on MNIST does not really say much about a model architecture.
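For concreteness, the random-guessing baselines quoted above are just 1/num_classes:
# Random-guessing baseline accuracy = 1 / number of classes
print(1.0 / 10)    # MNIST, 10 classes:  0.10   (~10%)
print(1.0 / 155)   # 155 classes:        0.00645 (~0.65%)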

Caffe Loss Doesn't Decrease

I am new to Caffe and I've basically made small modifications to the FCN model to train on my own data. I've noticed that after 680 iterations the loss has not changed. I thought maybe it was because I was applying a scale of 1/255 to the pixels, but I've removed that and there was no change.
My data is in LMDB (1 LMDB for training images, 1 LMDB for training labels, 1 for validation and 1 for validation labels) and the labels are 0 and 1 stored as uint8.
Does anyone have any suggestions?
I0830 23:05:45.645638 2989601728 solver.cpp:218] Iteration 0 (0 iter/s, 74.062s/20 iters), loss = 190732
I0830 23:05:45.647449 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0830 23:05:45.647469 2989601728 sgd_solver.cpp:105] Iteration 0, lr = 1e-14
I0830 23:28:42.183948 2989601728 solver.cpp:218] Iteration 20 (0.0145293 iter/s, 1376.53s/20 iters), loss = 190732
I0830 23:28:42.185940 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0830 23:28:42.185962 2989601728 sgd_solver.cpp:105] Iteration 20, lr = 1e-14
I0830 23:51:43.803419 2989601728 solver.cpp:218] Iteration 40 (0.0144758 iter/s, 1381.62s/20 iters), loss = 190732
I0830 23:51:43.817291 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0830 23:51:43.817371 2989601728 sgd_solver.cpp:105] Iteration 40, lr = 1e-14
I0831 00:17:23.955076 2989601728 solver.cpp:218] Iteration 60 (0.0129858 iter/s, 1540.14s/20 iters), loss = 190732
I0831 00:17:23.957161 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 00:17:23.957203 2989601728 sgd_solver.cpp:105] Iteration 60, lr = 1e-14
I0831 00:40:41.079898 2989601728 solver.cpp:218] Iteration 80 (0.0143152 iter/s, 1397.12s/20 iters), loss = 190732
I0831 00:40:41.082603 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 00:40:41.082649 2989601728 sgd_solver.cpp:105] Iteration 80, lr = 1e-14
I0831 01:03:53.159317 2989601728 solver.cpp:218] Iteration 100 (0.014367 iter/s, 1392.08s/20 iters), loss = 190732
I0831 01:03:53.161844 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 01:03:53.161903 2989601728 sgd_solver.cpp:105] Iteration 100, lr = 1e-14
I0831 01:27:03.867575 2989601728 solver.cpp:218] Iteration 120 (0.0143812 iter/s, 1390.71s/20 iters), loss = 190732
I0831 01:27:03.869439 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 01:27:03.869469 2989601728 sgd_solver.cpp:105] Iteration 120, lr = 1e-14
I0831 01:50:10.512094 2989601728 solver.cpp:218] Iteration 140 (0.0144233 iter/s, 1386.64s/20 iters), loss = 190732
I0831 01:50:10.514268 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 01:50:10.514302 2989601728 sgd_solver.cpp:105] Iteration 140, lr = 1e-14
I0831 02:09:50.607455 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 02:09:50.672649 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 02:13:16.209158 2989601728 solver.cpp:218] Iteration 160 (0.0144332 iter/s, 1385.69s/20 iters), loss = 190732
I0831 02:13:16.211565 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 02:13:16.211609 2989601728 sgd_solver.cpp:105] Iteration 160, lr = 1e-14
I0831 02:36:30.536650 2989601728 solver.cpp:218] Iteration 180 (0.0143439 iter/s, 1394.32s/20 iters), loss = 190732
I0831 02:36:30.538833 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 02:36:30.539871 2989601728 sgd_solver.cpp:105] Iteration 180, lr = 1e-14
I0831 02:59:38.813151 2989601728 solver.cpp:218] Iteration 200 (0.0144064 iter/s, 1388.27s/20 iters), loss = 190732
I0831 02:59:38.814018 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 02:59:38.814097 2989601728 sgd_solver.cpp:105] Iteration 200, lr = 1e-14
I0831 03:22:46.534659 2989601728 solver.cpp:218] Iteration 220 (0.0144121 iter/s, 1387.72s/20 iters), loss = 190732
I0831 03:22:46.536751 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 03:22:46.536808 2989601728 sgd_solver.cpp:105] Iteration 220, lr = 1e-14
I0831 03:46:38.997651 2989601728 solver.cpp:218] Iteration 240 (0.013962 iter/s, 1432.46s/20 iters), loss = 190732
I0831 03:46:39.001502 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 03:46:39.001591 2989601728 sgd_solver.cpp:105] Iteration 240, lr = 1e-14
I0831 04:09:49.981889 2989601728 solver.cpp:218] Iteration 260 (0.0143784 iter/s, 1390.98s/20 iters), loss = 190732
I0831 04:09:49.983256 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 04:09:49.983301 2989601728 sgd_solver.cpp:105] Iteration 260, lr = 1e-14
I0831 04:32:59.845221 2989601728 solver.cpp:218] Iteration 280 (0.0143899 iter/s, 1389.86s/20 iters), loss = 190732
I0831 04:32:59.847712 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 04:32:59.847936 2989601728 sgd_solver.cpp:105] Iteration 280, lr = 1e-14
I0831 04:56:07.752025 2989601728 solver.cpp:218] Iteration 300 (0.0144102 iter/s, 1387.9s/20 iters), loss = 190732
I0831 04:56:07.754050 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 04:56:07.754091 2989601728 sgd_solver.cpp:105] Iteration 300, lr = 1e-14
I0831 05:16:57.383947 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 05:16:57.468634 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 05:19:16.101671 2989601728 solver.cpp:218] Iteration 320 (0.0144056 iter/s, 1388.35s/20 iters), loss = 190732
I0831 05:19:16.102998 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 05:19:16.103953 2989601728 sgd_solver.cpp:105] Iteration 320, lr = 1e-14
I0831 05:42:22.554265 2989601728 solver.cpp:218] Iteration 340 (0.0144253 iter/s, 1386.45s/20 iters), loss = 190732
I0831 05:42:22.557201 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 05:42:22.558081 2989601728 sgd_solver.cpp:105] Iteration 340, lr = 1e-14
I0831 06:05:33.816596 2989601728 solver.cpp:218] Iteration 360 (0.0143755 iter/s, 1391.26s/20 iters), loss = 190732
I0831 06:05:33.819310 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 06:05:33.819358 2989601728 sgd_solver.cpp:105] Iteration 360, lr = 1e-14
I0831 06:28:38.358750 2989601728 solver.cpp:218] Iteration 380 (0.0144452 iter/s, 1384.54s/20 iters), loss = 190732
I0831 06:28:38.362834 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 06:28:38.363451 2989601728 sgd_solver.cpp:105] Iteration 380, lr = 1e-14
I0831 06:51:48.489392 2989601728 solver.cpp:218] Iteration 400 (0.0143872 iter/s, 1390.13s/20 iters), loss = 190732
I0831 06:51:48.490061 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 06:51:48.491013 2989601728 sgd_solver.cpp:105] Iteration 400, lr = 1e-14
I0831 07:15:00.156152 2989601728 solver.cpp:218] Iteration 420 (0.0143713 iter/s, 1391.67s/20 iters), loss = 190732
I0831 07:15:00.159214 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 07:15:00.159261 2989601728 sgd_solver.cpp:105] Iteration 420, lr = 1e-14
I0831 07:38:09.862089 2989601728 solver.cpp:218] Iteration 440 (0.0143916 iter/s, 1389.7s/20 iters), loss = 190732
I0831 07:38:09.865105 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 07:38:09.865152 2989601728 sgd_solver.cpp:105] Iteration 440, lr = 1e-14
I0831 08:01:15.438222 2989601728 solver.cpp:218] Iteration 460 (0.0144345 iter/s, 1385.57s/20 iters), loss = 190732
I0831 08:01:15.439589 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 08:01:15.440675 2989601728 sgd_solver.cpp:105] Iteration 460, lr = 1e-14
I0831 08:24:24.188830 2989601728 solver.cpp:218] Iteration 480 (0.0144015 iter/s, 1388.75s/20 iters), loss = 190732
I0831 08:24:24.191907 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 08:24:24.191951 2989601728 sgd_solver.cpp:105] Iteration 480, lr = 1e-14
I0831 08:24:24.514991 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 08:24:24.524113 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 08:47:29.558264 2989601728 solver.cpp:218] Iteration 500 (0.0144366 iter/s, 1385.37s/20 iters), loss = 190732
I0831 08:47:29.562070 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 08:47:29.562104 2989601728 sgd_solver.cpp:105] Iteration 500, lr = 1e-14
I0831 09:10:43.430681 2989601728 solver.cpp:218] Iteration 520 (0.0143486 iter/s, 1393.87s/20 iters), loss = 190732
I0831 09:10:43.432601 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 09:10:43.433498 2989601728 sgd_solver.cpp:105] Iteration 520, lr = 1e-14
I0831 09:33:53.022397 2989601728 solver.cpp:218] Iteration 540 (0.0143927 iter/s, 1389.59s/20 iters), loss = 190732
I0831 09:33:53.024354 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 09:33:53.024405 2989601728 sgd_solver.cpp:105] Iteration 540, lr = 1e-14
I0831 09:56:59.140298 2989601728 solver.cpp:218] Iteration 560 (0.0144288 iter/s, 1386.11s/20 iters), loss = 190732
I0831 09:56:59.142597 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 09:56:59.142642 2989601728 sgd_solver.cpp:105] Iteration 560, lr = 1e-14
I0831 10:20:10.334044 2989601728 solver.cpp:218] Iteration 580 (0.0143762 iter/s, 1391.19s/20 iters), loss = 190732
I0831 10:20:10.336256 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 10:20:10.336287 2989601728 sgd_solver.cpp:105] Iteration 580, lr = 1e-14
I0831 10:43:15.363580 2989601728 solver.cpp:218] Iteration 600 (0.0144402 iter/s, 1385.03s/20 iters), loss = 190732
I0831 10:43:15.365350 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 10:43:15.365380 2989601728 sgd_solver.cpp:105] Iteration 600, lr = 1e-14
I0831 11:06:26.864280 2989601728 solver.cpp:218] Iteration 620 (0.014373 iter/s, 1391.5s/20 iters), loss = 190732
I0831 11:06:26.867431 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 11:06:26.867480 2989601728 sgd_solver.cpp:105] Iteration 620, lr = 1e-14
I0831 11:29:37.275745 2989601728 solver.cpp:218] Iteration 640 (0.0143843 iter/s, 1390.41s/20 iters), loss = 190732
I0831 11:29:37.277166 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 11:29:37.277206 2989601728 sgd_solver.cpp:105] Iteration 640, lr = 1e-14
I0831 11:30:47.900959 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 11:30:47.934394 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 11:53:00.394335 2989601728 solver.cpp:218] Iteration 660 (0.014254 iter/s, 1403.11s/20 iters), loss = 190732
I0831 11:53:00.399102 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 11:53:00.399185 2989601728 sgd_solver.cpp:105] Iteration 660, lr = 1e-14
I0831 12:16:24.352802 2989601728 solver.cpp:218] Iteration 680 (0.0142455 iter/s, 1403.95s/20 iters), loss = 190732
I0831 12:16:24.355890 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 12:16:24.356781 2989601728 sgd_solver.cpp:105] Iteration 680, lr = 1e-14
This is the definition of my network for the training phase:
name: "face-detect"
state {
phase: TRAIN
level: 0
stage: ""
}
layer {
name: "data"
type: "Data"
top: "data"
include {
phase: TRAIN
}
transform_param {
mean_value: 104.006989
mean_value: 116.66877
mean_value: 122.678917
}
data_param {
source: "data/fddb-face-database/train_img_lmdb"
scale: 0.00390625
batch_size: 16
backend: LMDB
}
}
layer {
name: "label"
type: "Data"
top: "label"
include {
phase: TRAIN
}
data_param {
source: "data/fddb-face-database/train_lab_lmdb"
batch_size: 16
backend: LMDB
}
}
layer {
name: "mod1_conv1"
type: "Convolution"
bottom: "data"
top: "mod1_conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "mod1_relu1"
type: "ReLU"
bottom: "mod1_conv1"
top: "mod1_conv1"
}
layer {
name: "mod1_conv2"
type: "Convolution"
bottom: "mod1_conv1"
top: "mod1_conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "mod1_relu2"
type: "ReLU"
bottom: "mod1_conv2"
top: "mod1_conv2"
}
layer {
name: "mod1_pool1"
type: "Pooling"
bottom: "mod1_conv2"
top: "mod1_pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "mod2_conv1"
type: "Convolution"
bottom: "mod1_pool1"
top: "mod2_conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "mod2_relu1"
type: "ReLU"
bottom: "mod2_conv1"
top: "mod2_conv1"
}
layer {
name: "mod2_conv2"
type: "Convolution"
bottom: "mod2_conv1"
top: "mod2_conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "mod2_relu2"
type: "ReLU"
bottom: "mod2_conv2"
top: "mod2_conv2"
}
layer {
name: "mod2_pool1"
type: "Pooling"
bottom: "mod2_conv2"
top: "mod2_pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "mod3_conv1"
type: "Convolution"
bottom: "mod2_pool1"
top: "mod3_conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "mod3_relu1"
type: "ReLU"
bottom: "mod3_conv1"
top: "mod3_conv1"
}
layer {
name: "mod3_conv2"
type: "Convolution"
bottom: "mod3_conv1"
top: "mod3_conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "mod3_relu2"
type: "ReLU"
bottom: "mod3_conv2"
top: "mod3_conv2"
}
layer {
name: "mod3_pool1"
type: "Pooling"
bottom: "mod3_conv2"
top: "mod3_pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "mod4_conv1"
type: "Convolution"
bottom: "mod3_pool1"
top: "mod4_conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "mod4_relu1"
type: "ReLU"
bottom: "mod4_conv1"
top: "mod4_conv1"
}
layer {
name: "mod4_conv2"
type: "Convolution"
bottom: "mod4_conv1"
top: "mod4_conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "mod4_relu2"
type: "ReLU"
bottom: "mod4_conv2"
top: "mod4_conv2"
}
layer {
name: "mod4_pool1"
type: "Pooling"
bottom: "mod4_conv2"
top: "mod4_pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "mod5_conv1"
type: "Convolution"
bottom: "mod4_pool1"
top: "mod5_conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "mod5_relu1"
type: "ReLU"
bottom: "mod5_conv1"
top: "mod5_conv1"
}
layer {
name: "mod5_conv2"
type: "Convolution"
bottom: "mod5_conv1"
top: "mod5_conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "mod5_relu2"
type: "ReLU"
bottom: "mod5_conv2"
top: "mod5_conv2"
}
layer {
name: "mod5_pool1"
type: "Pooling"
bottom: "mod5_conv2"
top: "mod5_pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "mod6_fc1"
type: "Convolution"
bottom: "mod5_pool1"
top: "mod6_fc1"
convolution_param {
num_output: 4096
pad: 0
kernel_size: 1
stride: 1
}
}
layer {
name: "mod6_relu1"
type: "ReLU"
bottom: "mod6_fc1"
top: "mod6_fc1"
}
layer {
name: "mod6_drop1"
type: "Dropout"
bottom: "mod6_fc1"
top: "mod6_fc1"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "mod6_score1"
type: "Convolution"
bottom: "mod6_fc1"
top: "mod6_score1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 2
pad: 0
kernel_size: 1
}
}
layer {
name: "mod6_upscore1"
type: "Deconvolution"
bottom: "mod6_score1"
top: "mod6_upscore1"
param {
lr_mult: 0
}
convolution_param {
num_output: 2
bias_term: false
kernel_size: 2
stride: 2
}
}
layer {
name: "mod6_score2"
type: "Convolution"
bottom: "mod4_pool1"
top: "mod6_score2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 2
pad: 0
kernel_size: 1
}
}
layer {
name: "crop"
type: "Crop"
bottom: "mod6_score2"
bottom: "mod6_upscore1"
top: "mod6_score2c"
}
layer {
name: "mod6_fuse1"
type: "Eltwise"
bottom: "mod6_upscore1"
bottom: "mod6_score2c"
top: "mod6_fuse1"
eltwise_param {
operation: SUM
}
}
layer {
name: "mod6_upfuse1"
type: "Deconvolution"
bottom: "mod6_fuse1"
top: "mod6_upfuse1"
param {
lr_mult: 0
}
convolution_param {
num_output: 2
bias_term: false
kernel_size: 2
stride: 2
}
}
layer {
name: "mod6_score3"
type: "Convolution"
bottom: "mod3_pool1"
top: "mod6_score3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 2
pad: 0
kernel_size: 1
}
}
layer {
name: "crop"
type: "Crop"
bottom: "mod6_score3"
bottom: "mod6_upfuse1"
top: "mod6_score3c"
}
layer {
name: "mod6_fuse2"
type: "Eltwise"
bottom: "mod6_upfuse1"
bottom: "mod6_score3c"
top: "mod6_fuse2"
eltwise_param {
operation: SUM
}
}
layer {
name: "mod6_upfuse2"
type: "Deconvolution"
bottom: "mod6_fuse2"
top: "mod6_upfuse2"
param {
lr_mult: 0
}
convolution_param {
num_output: 2
bias_term: false
kernel_size: 8
stride: 8
}
}
layer {
name: "crop"
type: "Crop"
bottom: "mod6_upfuse2"
bottom: "label"
top: "score"
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score"
bottom: "label"
top: "loss"
loss_param {
normalize: false
}
}
This is my solver.prototxt:
net: "models/face-detect/train_val.prototxt"
test_iter: 736
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-14
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "models/face-detect/snapshot/train"
test_initialization: false
# Uncomment the following to default to CPU mode solving
solver_mode: CPU
Here is how I prepared my LMDB:
import cv2
import lmdb
import numpy as np
import caffe
from scipy.misc import imresize  # assumed source of imresize

def load_image(img_path, size=None):
    # Load image as np.uint8 {0, ..., 255}
    # image shape: [height, width, channel]
    img = cv2.imread(img_path)
    # Resize to stack size
    if size is not None:
        img = imresize(img, size)
    # Switch to BGR from RGB
    img = img[:, :, ::-1]
    # Switch to [channel, height, width]
    img = np.transpose(img, (2, 0, 1))
    return img

def load_label(img_path, size=None):
    img = cv2.imread(img_path, cv2.COLOR_BGR2GRAY)
    if size is not None:
        img = imresize(img, size)
    # Verbose storage to single channel
    img = np.reshape(img, [1, img.shape[0], img.shape[1]])
    return img

def imgs_to_lmdb(img_paths, lmdb_path, dtype='rgb', size=None):
    in_db = lmdb.open(lmdb_path, map_size=int(1e12))
    with in_db.begin(write=True) as in_txn:
        for img_idx, img_path in enumerate(img_paths):
            if dtype == 'rgb':
                img = load_image(img_path, size)
            elif dtype == 'label':
                img = load_label(img_path, size)
            # Store as byte data
            img_dat = caffe.io.array_to_datum(img)
            in_txn.put('{:0>10d}'.format(img_idx), img_dat.SerializeToString())
    in_db.close()
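A minimal usage sketch of the function above (the image/label file lists are hypothetical; the LMDB paths are the ones referenced by the data layers in the network definition):
import glob

train_imgs = sorted(glob.glob('data/fddb-face-database/images/*.jpg'))   # hypothetical layout
train_labs = sorted(glob.glob('data/fddb-face-database/labels/*.png'))   # hypothetical layout

imgs_to_lmdb(train_imgs, 'data/fddb-face-database/train_img_lmdb', dtype='rgb')
imgs_to_lmdb(train_labs, 'data/fddb-face-database/train_lab_lmdb', dtype='label')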
Your base_lr seems to be too small, so your weights won't get updated quickly enough. You should start with a base_lr of 1e-10. The learning rate is multiplied by the loss gradient and used to update the weights. If the learning rate is too small the updates will be very small and convergence will be too slow; too large a learning rate will give you erratic results. There is no magic number to start with, so you have to find the right hyper-parameters for your data and network empirically.
You should also try learning rate decay. My favorite is the constant learning rate decay used in GoogLeNet, in which the learning rate is reduced by 4% every 8 epochs. A decaying learning rate helps convergence because it reduces the update magnitude over time, so the network does not forget what it has already learned. A sketch of that schedule follows below.
After this, always use a momentum-based optimizer like Adam or RMSprop. They greatly reduce the jitter during learning and ensure smooth progress towards the minimum.
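As a rough illustration (not an exact recipe; the training-set size below is only an assumption), the "4% every 8 epochs" schedule maps onto Caffe's step policy like this:
# Back-of-the-envelope translation of "reduce the learning rate by 4% every
# 8 epochs" into Caffe's "step" lr_policy.
train_images = 10000                          # assumed training-set size
batch_size = 16                               # from the data layer above
iters_per_epoch = train_images // batch_size
stepsize = 8 * iters_per_epoch                # decay once every 8 epochs

# Corresponding solver.prototxt fields (sketch):
#   lr_policy: "step"
#   gamma: 0.96        # multiply lr by 0.96, i.e. reduce it by 4%
#   stepsize: <the value computed above>
print('stepsize:', stepsize)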

Caffe training loss is 0

I'm training an AlexNet .caffemodel with the faceScrub dataset. I'm following:
Face Detection
Fine-Tuning
The thing is that when I'm training the model I get this output:
I0302 10:59:50.184250 11346 solver.cpp:331] Iteration 0, Testing net (#0)
I0302 11:09:01.198473 11346 solver.cpp:398] Test net output #0: accuracy = 0.96793
I0302 11:09:01.198635 11346 solver.cpp:398] Test net output #1: loss = 0.354751 (* 1 = 0.354751 loss)
I0302 11:09:12.543730 11346 solver.cpp:219] Iteration 0 (0 iter/s, 562.435s/20 iters), loss = 0.465583
I0302 11:09:12.543861 11346 solver.cpp:238] Train net output #0: loss = 0.465583 (* 1 = 0.465583 loss)
I0302 11:09:12.543902 11346 sgd_solver.cpp:105] Iteration 0, lr = 0.001
I0302 11:14:41.847237 11346 solver.cpp:219] Iteration 20 (0.0607343 iter/s, 329.303s/20 iters), loss = 4.65581e-09
I0302 11:14:41.847409 11346 solver.cpp:238] Train net output #0: loss = 0 (* 1 = 0 loss)
I0302 11:14:41.847447 11346 sgd_solver.cpp:105] Iteration 20, lr = 0.001
I0302 11:18:25.848346 11346 solver.cpp:219] Iteration 40 (0.0892857 iter/s, 224s/20 iters), loss = 4.65581e-09
I0302 11:18:25.848526 11346 solver.cpp:238] Train net output #0: loss = 0 (* 1 = 0 loss)
I0302 11:18:25.848565 11346 sgd_solver.cpp:105] Iteration 40, lr = 0.001
and it continues the same.
The only thing I am suspicious about is that the train_val.prototxt in the Face Detection link uses num_output: 2 in the fc8_flickr layer, so I have the .txt file with all the images in this format:
/media/jose/B430F55030F51A56/faceScrub/download/Steve_Carell/face/a3b1b70acd0fda72c98be121a2af3ea2f4209fe7.jpg 1
/media/jose/B430F55030F51A56/faceScrub/download/Matt_Czuchry/face/98882354bbf3a508b48c6f53a84a68ca6797e617.jpg 1
/media/jose/B430F55030F51A56/faceScrub/download/Linda_Gray/face/ca9356b2382d2595ba8a9ff399dc3efa80873d72.jpg 1
/media/jose/B430F55030F51A56/faceScrub/download/Veronica_Hamel/face/900da3a6a22b25b3974e1f7602686f460126d028.jpg 1
With 1 being the class containing a face. If I remove the 1, it gets stuck at Iteration 0, Testing net (#0).
Any insight on this?

Training error and validation error are the same - zero accuracy during training

I am basically using CaffeNet to do some sort of image classification, with 256 classes. I am feeding the network a list of HDF5 files. But my network doesn't seem to learn; I'm getting accuracy 0 all the time, and the training error and validation error are the same. I would think that if the dataset was not enough to learn from, the training error should be very small and the validation error should be large, right? I also tried different batch sizes and learning rates, with no success. Here are the solver.prototxt and the network prototxt (a CaffeNet architecture). Any suggestion is appreciated.
I1103 12:01:41.822055 108615 solver.cpp:337] Iteration 0, Testing net (#0)
I1103 12:01:41.849742 108615 solver.cpp:404] Test net output #0: accuracy = 0
I1103 12:01:41.849761 108615 solver.cpp:404] Test net output #1: loss = 6.02617 (* 1 = 6.02617 loss)
I1103 12:01:41.869380 108615 solver.cpp:228] Iteration 0, loss = 6.05644
I1103 12:01:41.869398 108615 solver.cpp:244] Train net output #0: loss = 6.05644 (* 1 = 6.05644 loss)
I1103 12:01:41.869413 108615 sgd_solver.cpp:106] Iteration 0, lr = 0.1
I1103 12:01:47.624855 108615 solver.cpp:228] Iteration 500, loss = 87.3365
I1103 12:01:47.624876 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:47.624882 108615 sgd_solver.cpp:106] Iteration 500, lr = 0.1
I1103 12:01:53.290213 108615 solver.cpp:337] Iteration 1000, Testing net (#0)
I1103 12:01:53.299310 108615 solver.cpp:404] Test net output #0: accuracy = 0
I1103 12:01:53.299327 108615 solver.cpp:404] Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:53.314584 108615 solver.cpp:228] Iteration 1000, loss = 87.3365
I1103 12:01:53.314615 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:53.314621 108615 sgd_solver.cpp:106] Iteration 1000, lr = 0.01
I1103 12:01:58.991268 108615 solver.cpp:228] Iteration 1500, loss = 87.3365
I1103 12:01:58.991315 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:58.991322 108615 sgd_solver.cpp:106] Iteration 1500, lr = 0.01
I1103 12:02:04.664419 108615 solver.cpp:337] Iteration 2000, Testing net (#0)
I1103 12:02:04.673518 108615 solver.cpp:404] Test net output #0: accuracy = 0
I1103 12:02:04.673537 108615 solver.cpp:404] Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:04.690434 108615 solver.cpp:228] Iteration 2000, loss = 87.3365
I1103 12:02:04.690469 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:04.690481 108615 sgd_solver.cpp:106] Iteration 2000, lr = 0.001
I1103 12:02:10.373788 108615 solver.cpp:228] Iteration 2500, loss = 87.3365
I1103 12:02:10.373852 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:10.373859 108615 sgd_solver.cpp:106] Iteration 2500, lr = 0.001
I1103 12:02:16.047372 108615 solver.cpp:337] Iteration 3000, Testing net (#0)
I1103 12:02:16.056390 108615 solver.cpp:404] Test net output #0: accuracy = 0
I1103 12:02:16.056407 108615 solver.cpp:404] Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:16.070235 108615 solver.cpp:228] Iteration 3000, loss = 87.3365
I1103 12:02:16.070261 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:16.070267 108615 sgd_solver.cpp:106] Iteration 3000, lr = 0.0001
I1103 12:02:21.755348 108615 solver.cpp:228] Iteration 3500, loss = 87.3365
I1103 12:02:21.755369 108615 solver.cpp:244] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:21.755375 108615 sgd_solver.cpp:106] Iteration 3500, lr = 0.0001
----------------------------------
net: "/A/B/train.prototxt"
test_iter: 10
test_interval: 1000
base_lr: 0.1
lr_policy: "step"
gamma: 0.1
stepsize: 1000
display: 10
max_iter: 4000
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "/A/B/model_"
solver_mode: GPU
--------------------------------------------------
layer {
name: "data"
type: "HDF5Data"
top: "X"
top: "y"
hdf5_data_param{
source:"/Path/to/trainh5list.txt"
batch_size: 1
}
include{phase: TRAIN}
}
layer {
name: "data"
type: "HDF5Data"
top: "X"
top: "y"
hdf5_data_param{
source:"/Path/to/testh5list.txt"
batch_size: 1
}
include{phase: TEST}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "X"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
.
.
.
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 256
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc8"
bottom: "y"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc8"
bottom: "y"
top: "loss"
}

When I train my Caffe model, the loss stays at a large value and the accuracy does not improve.

This is the result I get when I train my own model:
I0510 20:53:16.677439 3591 solver.cpp:337] Iteration 0, Testing net (#0)
I0510 20:57:20.822933 3591 solver.cpp:404] Test net output #0: accuracy = 3.78788e-05
I0510 20:57:20.823001 3591 solver.cpp:404] Test net output #1: loss = 9.27223 (* 1 = 9.27223 loss)
I0510 20:57:21.423084 3591 solver.cpp:228] Iteration 0, loss = 9.29181
I0510 20:57:21.423110 3591 solver.cpp:244] Train net output #0: loss = 9.29181 (* 1 = 9.29181 loss)
I0510 20:57:21.423120 3591 sgd_solver.cpp:106] Iteration 0, lr = 0.001
I0510 21:06:57.498831 3591 solver.cpp:337] Iteration 1000, Testing net (#0)
I0510 21:10:59.477396 3591 solver.cpp:404] Test net output #0: accuracy = 0.00186553
I0510 21:10:59.477463 3591 solver.cpp:404] Test net output #1: loss = 8.86572 (* 1 = 8.86572 loss)
I0510 21:20:35.828510 3591 solver.cpp:337] Iteration 2000, Testing net (#0)
I0510 21:24:42.838196 3591 solver.cpp:404] Test net output #0: accuracy = 0.00144886
I0510 21:24:42.838245 3591 solver.cpp:404] Test net output #1: loss = 8.83859 (* 1 = 8.83859 loss)
I0510 21:24:43.412120 3591 solver.cpp:228] Iteration 2000, loss = 8.81461
I0510 21:24:43.412145 3591 solver.cpp:244] Train net output #0: loss = 8.81461 (* 1 = 8.81461 loss)
I0510 21:24:43.412150 3591 sgd_solver.cpp:106] Iteration 2000, lr = 0.001
I0510 21:38:50.990823 3591 solver.cpp:337] Iteration 3000, Testing net (#0)
I0510 21:42:52.918418 3591 solver.cpp:404] Test net output #0: accuracy = 0.00140152
I0510 21:42:52.918493 3591 solver.cpp:404] Test net output #1: loss = 8.81789 (* 1 = 8.81789 loss)
I0510 22:00:09.519151 3591 solver.cpp:337] Iteration 4000, Testing net (#0)
I0510 22:09:13.918016 3591 solver.cpp:404] Test net output #0: accuracy = 0.00149621
I0510 22:09:13.918102 3591 solver.cpp:404] Test net output #1: loss = 8.80909 (* 1 = 8.80909 loss)
I0510 22:09:15.127683 3591 solver.cpp:228] Iteration 4000, loss = 8.8597
I0510 22:09:15.127722 3591 solver.cpp:244] Train net output #0: loss = 8.8597 (* 1 = 8.8597 loss)
I0510 22:09:15.127729 3591 sgd_solver.cpp:106] Iteration 4000, lr = 0.001
I0510 22:28:39.320019 3591 solver.cpp:337] Iteration 5000, Testing net (#0)
I0510 22:37:43.847064 3591 solver.cpp:404] Test net output #0: accuracy = 0.00118371
I0510 22:37:43.847173 3591 solver.cpp:404] Test net output #1: loss = 8.80527 (* 1 = 8.80527 loss)
I0510 23:58:17.120088 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_10000.caffemodel
I0510 23:58:17.238307 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_10000.solverstate
I0510 23:58:17.491825 3591 solver.cpp:337] Iteration 10000, Testing net (#0)
I0511 00:02:19.412715 3591 solver.cpp:404] Test net output #0: accuracy = 0.00186553
I0511 00:02:19.412762 3591 solver.cpp:404] Test net output #1: loss = 8.79114 (* 1 = 8.79114 loss)
I0511 00:02:19.986547 3591 solver.cpp:228] Iteration 10000, loss = 8.83457
I0511 00:02:19.986570 3591 solver.cpp:244] Train net output #0: loss = 8.83457 (* 1 = 8.83457 loss)
I0511 00:02:19.986578 3591 sgd_solver.cpp:106] Iteration 10000, lr = 0.001
I0511 00:11:55.546052 3591 solver.cpp:337] Iteration 11000, Testing net (#0)
I0511 00:15:57.490486 3591 solver.cpp:404] Test net output #0: accuracy = 0.00164773
I0511 00:15:57.490532 3591 solver.cpp:404] Test net output #1: loss = 8.78702 (* 1 = 8.78702 loss)
I0511 00:25:33.666496 3591 solver.cpp:337] Iteration 12000, Testing net (#0)
I0511 00:29:35.603062 3591 solver.cpp:404] Test net output #0: accuracy = 0.0016572
I0511 00:29:35.603109 3591 solver.cpp:404] Test net output #1: loss = 8.7848 (* 1 = 8.7848 loss)
I0511 00:29:36.177078 3591 solver.cpp:228] Iteration 12000, loss = 9.00561
I0511 00:29:36.177105 3591 solver.cpp:244] Train net output #0: loss = 9.00561 (* 1 = 9.00561 loss)
I0511 00:29:36.177114 3591 sgd_solver.cpp:106] Iteration 12000, lr = 0.001
I0511 00:39:11.729369 3591 solver.cpp:337] Iteration 13000, Testing net (#0)
I0511 00:43:13.678067 3591 solver.cpp:404] Test net output #0: accuracy = 0.001875
I0511 00:43:13.678113 3591 solver.cpp:404] Test net output #1: loss = 8.78359 (* 1 = 8.78359 loss)
I0511 00:52:49.851985 3591 solver.cpp:337] Iteration 14000, Testing net (#0)
I0511 00:56:51.767343 3591 solver.cpp:404] Test net output #0: accuracy = 0.00154356
I0511 00:56:51.767390 3591 solver.cpp:404] Test net output #1: loss = 8.77998 (* 1 = 8.77998 loss)
I0511 00:56:52.341564 3591 solver.cpp:228] Iteration 14000, loss = 8.83385
I0511 00:56:52.341591 3591 solver.cpp:244] Train net output #0: loss = 8.83385 (* 1 = 8.83385 loss)
I0511 00:56:52.341598 3591 sgd_solver.cpp:106] Iteration 14000, lr = 0.001
I0511 02:14:38.224290 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_20000.caffemodel
I0511 02:14:38.735008 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_20000.solverstate
I0511 02:14:38.805809 3591 solver.cpp:337] Iteration 20000, Testing net (#0)
I0511 02:18:40.681993 3591 solver.cpp:404] Test net output #0: accuracy = 0.00179924
I0511 02:18:40.682086 3591 solver.cpp:404] Test net output #1: loss = 8.78129 (* 1 = 8.78129 loss)
I0511 02:18:41.255969 3591 solver.cpp:228] Iteration 20000, loss = 8.82502
I0511 02:18:41.255995 3591 solver.cpp:244] Train net output #0: loss = 8.82502 (* 1 = 8.82502 loss)
I0511 02:18:41.256001 3591 sgd_solver.cpp:106] Iteration 20000, lr = 0.001
I0511 04:30:58.924096 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_30000.caffemodel
I0511 04:31:00.742739 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_30000.solverstate
I0511 04:31:01.151980 3591 solver.cpp:337] Iteration 30000, Testing net (#0)
I0511 04:35:03.075263 3591 solver.cpp:404] Test net output #0: accuracy = 0.00186553
I0511 04:35:03.075307 3591 solver.cpp:404] Test net output #1: loss = 8.77867 (* 1 = 8.77867 loss)
I0511 04:35:03.649479 3591 solver.cpp:228] Iteration 30000, loss = 8.82915
I0511 04:35:03.649507 3591 solver.cpp:244] Train net output #0: loss = 8.82915 (* 1 = 8.82915 loss)
I0511 04:35:03.649513 3591 sgd_solver.cpp:106] Iteration 30000, lr = 0.001
I0511 07:55:36.848265 3591 solver.cpp:337] Iteration 45000, Testing net (#0)
I0511 07:59:38.834043 3591 solver.cpp:404] Test net output #0: accuracy = 0.00179924
I0511 07:59:38.834095 3591 solver.cpp:404] Test net output #1: loss = 8.77432 (* 1 = 8.77432 loss)
I0511 09:03:48.141854 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_50000.caffemodel
I0511 09:03:49.736464 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_50000.solverstate
I0511 09:03:49.797582 3591 solver.cpp:337] Iteration 50000, Testing net (#0)
I0511 09:07:51.777150 3591 solver.cpp:404] Test net output #0: accuracy = 0.001875
I0511 09:07:51.777207 3591 solver.cpp:404] Test net output #1: loss = 8.77058 (* 1 = 8.77058 loss)
I0511 09:07:52.351323 3591 solver.cpp:228] Iteration 50000, loss = 9.11435
I0511 09:07:52.351351 3591 solver.cpp:244] Train net output #0: loss = 9.11435 (* 1 = 9.11435 loss)
I0511 09:07:52.351357 3591 sgd_solver.cpp:106] Iteration 50000, lr = 0.001
I0511 09:17:28.188742 3591 solver.cpp:337] Iteration 51000, Testing net (#0)
I0511 09:21:30.200623 3591 solver.cpp:404] Test net output #0: accuracy = 0.00186553
I0511 09:21:30.200716 3591 solver.cpp:404] Test net output #1: loss = 8.77026 (* 1 = 8.77026 loss)
I0511 09:31:06.596501 3591 solver.cpp:337] Iteration 52000, Testing net (#0)
I0511 09:35:08.580215 3591 solver.cpp:404] Test net output #0: accuracy = 0.00182765
I0511 09:35:08.580313 3591 solver.cpp:404] Test net output #1: loss = 8.76917 (* 1 = 8.76917 loss)
I0511 09:35:09.154428 3591 solver.cpp:228] Iteration 52000, loss = 8.89758
I0511 09:35:09.154453 3591 solver.cpp:244] Train net output #0: loss = 8.89758 (* 1 = 8.89758 loss)
I0511 09:35:09.154459 3591 sgd_solver.cpp:106] Iteration 52000, lr = 0.001
I0511 09:44:44.906309 3591 solver.cpp:337] Iteration 53000, Testing net (#0)
I0511 09:48:46.866353 3591 solver.cpp:404] Test net output #0: accuracy = 0.00185606
I0511 09:48:46.866430 3591 solver.cpp:404] Test net output #1: loss = 8.7708 (* 1 = 8.7708 loss)
I0511 09:58:23.097244 3591 solver.cpp:337] Iteration 54000, Testing net (#0)
I0511 10:02:25.056555 3591 solver.cpp:404] Test net output #0: accuracy = 0.00192235
I0511 10:02:25.056605 3591 solver.cpp:404] Test net output #1: loss = 8.76884 (* 1 = 8.76884 loss)
I0511 10:02:25.630312 3591 solver.cpp:228] Iteration 54000, loss = 8.90552
I0511 10:02:25.630337 3591 solver.cpp:244] Train net output #0: loss = 8.90552 (* 1 = 8.90552 loss)
I0511 10:02:25.630342 3591 sgd_solver.cpp:106] Iteration 54000, lr = 0.001
I0511 14:44:51.563555 3591 solver.cpp:337] Iteration 75000, Testing net (#0)
I0511 14:48:53.573640 3591 solver.cpp:404] Test net output #0: accuracy = 0.0016572
I0511 14:48:53.573724 3591 solver.cpp:404] Test net output #1: loss = 8.76967 (* 1 = 8.76967 loss)
I0511 14:58:30.080453 3591 solver.cpp:337] Iteration 76000, Testing net (#0)
I0511 15:02:32.076011 3591 solver.cpp:404] Test net output #0: accuracy = 0.001875
I0511 15:02:32.076077 3591 solver.cpp:404] Test net output #1: loss = 8.7695 (* 1 = 8.7695 loss)
I0511 15:02:32.650342 3591 solver.cpp:228] Iteration 76000, loss = 9.0084
I0511 15:02:32.650367 3591 solver.cpp:244] Train net output #0: loss = 9.0084 (* 1 = 9.0084 loss)
I0511 15:02:32.650373 3591 sgd_solver.cpp:106] Iteration 76000, lr = 0.001
I0511 15:12:08.597450 3591 solver.cpp:337] Iteration 77000, Testing net (#0)
I0511 15:16:10.636613 3591 solver.cpp:404] Test net output #0: accuracy = 0.00181818
I0511 15:16:10.636693 3591 solver.cpp:404] Test net output #1: loss = 8.76889 (* 1 = 8.76889 loss)
I0511 15:25:47.167667 3591 solver.cpp:337] Iteration 78000, Testing net (#0)
I0511 15:29:49.204596 3591 solver.cpp:404] Test net output #0: accuracy = 0.00185606
I0511 15:29:49.204649 3591 solver.cpp:404] Test net output #1: loss = 8.77059 (* 1 = 8.77059 loss)
I0511 15:29:49.779094 3591 solver.cpp:228] Iteration 78000, loss = 8.73139
I0511 15:29:49.779119 3591 solver.cpp:244] Train net output #0: loss = 8.73139 (* 1 = 8.73139 loss)
I0511 15:29:49.779124 3591 sgd_solver.cpp:106] Iteration 78000, lr = 0.001
I0511 15:39:25.730358 3591 solver.cpp:337] Iteration 79000, Testing net (#0)
I0511 15:43:27.756417 3591 solver.cpp:404] Test net output #0: accuracy = 0.00192235
I0511 15:43:27.756485 3591 solver.cpp:404] Test net output #1: loss = 8.76846 (* 1 = 8.76846 loss)
I0511 15:53:04.419961 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_80000.caffemodel
I0511 15:53:06.138357 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_80000.solverstate
I0511 15:53:06.519551 3591 solver.cpp:337] Iteration 80000, Testing net (#0)
I0511 15:57:08.719681 3591 solver.cpp:404] Test net output #0: accuracy = 0.00164773
I0511 15:57:08.719737 3591 solver.cpp:404] Test net output #1: loss = 8.77126 (* 1 = 8.77126 loss)
I0511 15:57:09.294163 3591 solver.cpp:228] Iteration 80000, loss = 8.56576
I0511 15:57:09.294188 3591 solver.cpp:244] Train net output #0: loss = 8.56576 (* 1 = 8.56576 loss)
I0511 15:57:09.294193 3591 sgd_solver.cpp:106] Iteration 80000, lr = 0.001
I0511 17:01:19.190099 3591 solver.cpp:337] Iteration 85000, Testing net (#0)
I0511 17:05:21.148668 3591 solver.cpp:404] Test net output #0: accuracy = 0.00185606
I0511 17:05:21.148733 3591 solver.cpp:404] Test net output #1: loss = 8.77196 (* 1 = 8.77196 loss)
I0511 17:14:57.670343 3591 solver.cpp:337] Iteration 86000, Testing net (#0)
I0511 17:18:59.659850 3591 solver.cpp:404] Test net output #0: accuracy = 0.00181818
I0511 17:18:59.659907 3591 solver.cpp:404] Test net output #1: loss = 8.77126 (* 1 = 8.77126 loss)
I0511 17:19:00.234335 3591 solver.cpp:228] Iteration 86000, loss = 8.72875
I0511 17:19:00.234359 3591 solver.cpp:244] Train net output #0: loss = 8.72875 (* 1 = 8.72875 loss)
I0511 17:19:00.234364 3591 sgd_solver.cpp:106] Iteration 86000, lr = 0.001
I0511 17:28:36.196920 3591 solver.cpp:337] Iteration 87000, Testing net (#0)
I0511 17:32:38.181174 3591 solver.cpp:404] Test net output #0: accuracy = 0.00181818
I0511 17:32:38.181231 3591 solver.cpp:404] Test net output #1: loss = 8.771 (* 1 = 8.771 loss)
I0511 17:42:14.658293 3591 solver.cpp:337] Iteration 88000, Testing net (#0)
I0511 17:46:16.614358 3591 solver.cpp:404] Test net output #0: accuracy = 0.00188447
I0511 17:46:16.614415 3591 solver.cpp:404] Test net output #1: loss = 8.76964 (* 1 = 8.76964 loss)
I0511 17:46:17.188212 3591 solver.cpp:228] Iteration 88000, loss = 8.80409
I0511 17:46:17.188233 3591 solver.cpp:244] Train net output #0: loss = 8.80409 (* 1 = 8.80409 loss)
I0511 17:46:17.188240 3591 sgd_solver.cpp:106] Iteration 88000, lr = 0.001
I0511 17:55:53.358322 3591 solver.cpp:337] Iteration 89000, Testing net (#0)
I0511 17:59:55.305763 3591 solver.cpp:404] Test net output #0: accuracy = 0.00186553
I0511 17:59:55.305868 3591 solver.cpp:404] Test net output #1: loss = 8.76909 (* 1 = 8.76909 loss)
I0511 18:09:31.658655 3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_90000.caffemodel
I0511 18:09:33.138741 3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_90000.solverstate
I0511 18:09:33.691995 3591 solver.cpp:337] Iteration 90000, Testing net (#0)
I0511 18:13:35.626065 3591 solver.cpp:404] Test net output #0: accuracy = 0.00168561
I0511 18:13:35.626148 3591 solver.cpp:404] Test net output #1: loss = 8.76973 (* 1 = 8.76973 loss)
I0511 18:13:36.200448 3591 solver.cpp:228] Iteration 90000, loss = 8.97326
I0511 18:13:36.200469 3591 solver.cpp:244] Train net output #0: loss = 8.97326 (* 1 = 8.97326 loss)
I0511 18:13:36.200474 3591 sgd_solver.cpp:106] Iteration 90000, lr = 0.001
I0511 19:31:23.715662 3591 solver.cpp:337] Iteration 96000, Testing net (#0)
I0511 19:35:25.677780 3591 solver.cpp:404] Test net output #0: accuracy = 0.00188447
I0511 19:35:25.677836 3591 solver.cpp:404] Test net output #1: loss = 8.7695 (* 1 = 8.7695 loss)
I0511 19:35:26.251850 3591 solver.cpp:228] Iteration 96000, loss = 8.74232
I0511 19:35:26.251875 3591 solver.cpp:244] Train net output #0: loss = 8.74232 (* 1 = 8.74232 loss)
I0511 19:35:26.251880 3591 sgd_solver.cpp:106] Iteration 96000, lr = 0.001
I0511 19:45:02.057610 3591 solver.cpp:337] Iteration 97000, Testing net (#0)
I0511 19:49:04.029269 3591 solver.cpp:404] Test net output #0: accuracy = 0.00188447
I0511 19:49:04.029357 3591 solver.cpp:404] Test net output #1: loss = 8.77655 (* 1 = 8.77655 loss)
I0511 19:58:40.265120 3591 solver.cpp:337] Iteration 98000, Testing net (#0)
I0511 20:02:42.182787 3591 solver.cpp:404] Test net output #0: accuracy = 0.00183712
I0511 20:02:42.182859 3591 solver.cpp:404] Test net output #1: loss = 8.77069 (* 1 = 8.77069 loss)
I0511 20:02:42.756922 3591 solver.cpp:228] Iteration 98000, loss = 8.61745
I0511 20:02:42.756944 3591 solver.cpp:244] Train net output #0: loss = 8.61745 (* 1 = 8.61745 loss)
Due to the character limit, I had to delete some rows of the log; however, that doesn't matter.
As you can see, there is essentially no difference between "Iteration 98000" and "Iteration 0". I am really puzzled by this situation.
This is the architecture of my model:
name: "NN2"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
mean_file: "/home/jiayi-wei/caffe/examples/NN2/image_train_mean.binaryproto"
}
data_param {
source: "/home/jiayi-wei/caffe/examples/NN2/img_train_lmdb"
batch_size: 30
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mirror: false
mean_file: "/home/jiayi-wei/caffe/examples/NN2/image_train_mean.binaryproto"
}
data_param {
source: "/home/jiayi-wei/caffe/examples/NN2/img_val_lmdb"
batch_size: 11
backend: LMDB
}
}
#first layers
layer {
name: "conv11"
type: "Convolution"
bottom: "data"
top: "conv11"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu11"
type: "ReLU"
bottom: "conv11"
top: "conv11"
}
layer {
name: "conv12"
type: "Convolution"
bottom: "conv11"
top: "conv12"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu12"
type: "ReLU"
bottom: "conv12"
top: "conv12"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv12"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
#second layers
layer {
name: "conv21"
type: "Convolution"
bottom: "pool1"
top: "conv21"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu21"
type: "ReLU"
bottom: "conv21"
top: "conv21"
}
layer {
name: "conv22"
type: "Convolution"
bottom: "conv21"
top: "conv22"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu22"
type: "ReLU"
bottom: "conv22"
top: "conv22"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv22"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
#third layers
layer {
name: "conv31"
type: "Convolution"
bottom: "pool2"
top: "conv31"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu31"
type: "ReLU"
bottom: "conv31"
top: "conv31"
}
layer {
name: "conv32"
type: "Convolution"
bottom: "conv31"
top: "conv32"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu32"
type: "ReLU"
bottom: "conv32"
top: "conv32"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv32"
top: "pool3"
pooling_param {
pool: MAX
pad:1
kernel_size: 2
stride: 2
}
}
#fourth layer
layer {
name: "conv41"
type: "Convolution"
bottom: "pool3"
top: "conv41"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu41"
type: "ReLU"
bottom: "conv41"
top: "conv41"
}
layer {
name: "conv42"
type: "Convolution"
bottom: "conv41"
top: "conv42"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu42"
type: "ReLU"
bottom: "conv42"
top: "conv42"
}
layer {
name: "conv43"
type: "Convolution"
bottom: "conv42"
top: "conv43"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu43"
type: "ReLU"
bottom: "conv43"
top: "conv43"
}
layer {
name: "pool4"
type: "Pooling"
bottom: "conv43"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
#fifth layer
layer {
name: "conv51"
type: "Convolution"
bottom: "pool4"
top: "conv51"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu51"
type: "ReLU"
bottom: "conv51"
top: "conv51"
}
layer {
name: "conv52"
type: "Convolution"
bottom: "conv51"
top: "conv52"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu52"
type: "ReLU"
bottom: "conv52"
top: "conv52"
}
layer {
name: "conv53"
type: "Convolution"
bottom: "conv52"
top: "conv53"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad:1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv53"
top: "pool5"
pooling_param {
pool: AVE
pad:1
kernel_size: 2
stride: 2
}
}
#drop_Fc
layer {
name: "dropout"
type: "Dropout"
bottom: "pool5"
top: "pool5"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output:1000
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output:10575
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc7"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "SoftMax"
type: "SoftmaxWithLoss"
bottom: "fc7"
bottom: "label"
top: "SoftMax"
}
Following is my solver. I have already changed base_lr to 0.001.
net: "train_val.prototxt"
test_iter: 10000
test_interval: 1000
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "/home/jiayi-wei/caffe/examples/NN2"
solver_mode: GPU
I have tried changing some parameters, and I have already tried removing one "conv" layer from the block that has three "conv" layers. However, the result always stays the same, as the log above shows.
Please tell me how I can figure out the problem. Thanks.
Your base_lr seems to be high. Start with a base_lr of 0.001 and keep reducing it by a factor of 10 whenever you stop seeing improvement in accuracy for several thousand iterations.
NOTE: This is just a rule of thumb; it may not work in all cases.
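If you would rather let the solver handle those drops, one hedged option is the "multistep" policy, which multiplies the rate by gamma at each listed iteration; the values below are illustrative, not tuned for your data:
base_lr: 0.001
lr_policy: "multistep"
gamma: 0.1            # multiply the learning rate by 0.1 at every stepvalue
stepvalue: 100000     # illustrative drop points, adjust to when accuracy plateaus
stepvalue: 200000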
From your log, it seems that your model kept predicting the same label throughout training; in other words, your training diverged. I advise you to make the following checks.
Check your labels when converting the train/validation LMDB data. Also, in your CNN architecture, the Dropout layer is better placed after the InnerProduct layer "fc6" rather than after the Pooling layer "pool5".
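A minimal sketch of that rearrangement, reusing the layer names from your prototxt (take it as an illustration of the ordering, not a verified drop-in fix):
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
inner_product_param {
num_output: 1000
}
}
layer {
name: "dropout"
type: "Dropout"
bottom: "fc6"    # dropout now acts on the fc6 activations instead of pool5
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}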
I don't know how you sampled your training data. In principle, if you only use a softmax cost (multinomial cross-entropy loss), you should shuffle your training data when preparing the train/val LMDB files and set a reasonably large batch size during training, for example 256.
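For example, Caffe's convert_imageset tool has a --shuffle flag for building the LMDB from a randomized list, and the batch size lives in the data layer; a rough sketch with a placeholder path:
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
data_param {
source: "/path/to/img_train_lmdb"   # placeholder path; built from a shuffled file list
batch_size: 256                     # larger batch for the 10575-way softmax
backend: LMDB
}
}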
Maybe your learning rate (base_lr) was too large; you could further reduce it from 0.001 to 0.0001. However, I noticed that the CASIA-WebFace baseline (http://arxiv.org/abs/1411.7923) used a 0.01 learning rate, and the input data scale, activation function, and the depth and width of your model are similar to that baseline, so the learning rate is less likely the cause. (You should still check whether the weight initialization method matters.)
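To rule the initialization out, one thing you could try (my suggestion, not something taken from the baseline paper) is replacing the gaussian std: 0.01 fillers with "xavier" or "msra", which scale the initial weights with layer fan-in:
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"   # fan-in scaled initialization instead of a fixed std
}
bias_filler {
type: "constant"
value: 0
}
}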
Try a smaller convolution kernel size. Sometimes this helps by reducing the information loss caused by misalignment between the convolution kernel and its corresponding input feature map.
By the way, you are training a classifier over 10575 classes with only about 40 training samples per class, so to some extent the training data is insufficient. As in the baseline work, to enhance the model's ability to distinguish same and different samples, it is better to add a contrastive cost alongside the softmax cost.
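Caffe's ContrastiveLoss layer (used in the siamese example) can sit next to the softmax loss. The sketch below assumes you already have a pair-sampling data layer that produces two feature blobs and a same/different label, here called feat_p, feat_q and sim; all three names are hypothetical and are not part of your current net:
layer {
name: "contrastive_loss"
type: "ContrastiveLoss"
bottom: "feat_p"   # feature of the first image in the pair (hypothetical blob)
bottom: "feat_q"   # feature of the second image in the pair (hypothetical blob)
bottom: "sim"      # 1 = same identity, 0 = different (hypothetical blob)
top: "contrastive_loss"
loss_weight: 1
contrastive_loss_param {
margin: 1
}
}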
Reference
Sun Y, Chen Y, Wang X, et al. Deep learning face representation by joint identification-verification. Advances in Neural Information Processing Systems, 2014: 1988–1996.

Resources