I am trying to count objects in an image using Alexnet.
I currently have images containing 1, 2, 3 or 4 objects each. As an initial check, I have 10 images per class. For example, in the training set I have:
image label
image1 1
image2 1
image3 1
...
image39 4
image40 4
I used the ImageNet create script to build an LMDB file for this dataset, which successfully converted my set of images to LMDB.
AlexNet is converted to a regression model for learning the number of objects in the image by introducing a EuclideanLoss layer instead of the Softmax layer, as suggested by many. The rest of the network is the same.
However, despite doing all of the above, when I run the model I receive only zeros as output during the testing phase (shown below). It did not learn anything, even though the training loss decreased continuously in each iteration.
I don't understand what mistake I have made. Can anybody explain why the predicted values are always 0? And how can I check the regressed values in the testing phase, so that I can see how many samples are correct and what value is predicted for each of my images?
The predicted and actual labels on the test dataset are given as:
I0928 17:52:45.585160 18302 solver.cpp:243] Iteration 1880, loss = 0.60498
I0928 17:52:45.585212 18302 solver.cpp:259] Train net output #0: loss = 0.60498 (* 1 = 0.60498 loss)
I0928 17:52:45.585225 18302 solver.cpp:592] Iteration 1880, lr = 1e-06
I0928 17:52:48.397922 18302 solver.cpp:347] Iteration 1900, Testing net (#0)
I0928 17:52:48.499543 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 1
I0928 17:52:48.499641 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 2
I0928 17:52:48.499660 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 3
I0928 17:52:48.499681 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 4
...
Note: I also created HDF5-format files in order to have floating-point labels, i.e. 1.0, 2.0, 3.0 and 4.0. However, when I changed the data layer to HDF5 type, I could not crop the images for data augmentation (as is done in AlexNet with the LMDB layer), nor apply mean normalization. I used the script given at "https://github.com/nikogamulin/caffe-utils/blob/master/hdf5/demo.m" for the HDF5 data and followed its steps for using it in my model.
I have updated the last layers as follows:
layer {
  name: "fc8reg"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8reg"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8reg"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc8reg"
  bottom: "label"
  top: "loss"
}
Without judging whether your network diverged or not, the obvious mistake you have made is that you shouldn't use an Accuracy layer to test a regression network. It is only for testing a classification network trained with a SoftmaxWithLoss layer.
In fact, given an input image, the Accuracy layer will by default take its input array (here bottom: "fc8reg") and choose the index of the maximal value in that array as the predicted label.
Since num_output == 1 in the fc8reg layer, the Accuracy layer will always report index 0 as the predicted label for every input image, as you have seen.
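To see why, consider what taking the argmax of a single score does; a tiny numpy illustration with made-up fc8reg outputs:

import numpy as np

# With num_output: 1, the Accuracy layer sees a single score per image,
# and the argmax of a length-1 vector is always index 0.
scores = np.array([[2.7], [0.1], [3.9]])   # hypothetical fc8reg outputs for 3 images
print(scores.argmax(axis=1))               # -> [0 0 0], whatever the values are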
Finally, you can use a EuclideanLoss layer to test your regression network. This similar problem may also give you some hints.
If you want to print the regressed values after training and count the accuracy of the regression network, you can simply write a RegressionAccuracy layer like this, or inspect the outputs directly with pycaffe (see the sketch below).
Or, if your target label only has 4 discrete values {1,2,3,4}, you can still train a classification network for your task.
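If you don't want to write a new layer, a quick way to print the regressed values and count how many test samples round to the correct label is to run the trained net through pycaffe. A minimal sketch; 'test.prototxt' and 'your_snapshot.caffemodel' are placeholder file names, and the blob names match the prototxt above:

import caffe
import numpy as np

caffe.set_mode_cpu()
# test.prototxt must contain the TEST-phase data layer and the fc8reg layer
net = caffe.Net('test.prototxt', 'your_snapshot.caffemodel', caffe.TEST)

correct, total = 0, 0
for _ in range(10):                      # number of test batches; adjust to your test set size
    net.forward()                        # reads the next batch from the data layer
    preds = net.blobs['fc8reg'].data.flatten()
    labels = net.blobs['label'].data.flatten()
    for p, y in zip(preds, labels):
        print('predicted: %.3f  actual: %.1f' % (p, y))
        correct += int(round(p) == int(y))
        total += 1
print('accuracy: %.2f%%' % (100.0 * correct / total))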
In my opinion, everything is correct, but your network is not converging, which is not a rare occurrence. Your network is actually converging to zero outputs!
Maybe most of your samples have 0 as their label.
Also, don't forget to include the loss layer only during TRAIN; otherwise, it will learn on the test data as well.
I have set up Caffe and am using the FCN-8s model with a small change to the number of output classes:
layer {
  name: "score_5classes"
  type: "Convolution"
  bottom: "score"
  top: "score_5classes"
  convolution_param {
    num_output: 2
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score_5classes"
  bottom: "label"
  top: "loss"
  loss_param {
    normalize: true
  }
}
I have changed the last layer's output number to 2, because I want to classify my input images into 2 classes, 0 and 1. (So it seems I should have 2 outputs! I can't understand why; couldn't it just be a single output matrix of zeros and ones?)
So my questions are:
1. Should I sum these 2 classes, since I need 1 output?
2. The loss is so small, even when the output is far from the desired one! How does Caffe calculate the loss?
Thanks
When doing binary classification, using "SoftmaxWithLoss" with two outputs is mathematically equivalent to using "SigmoidCrossEntropyLoss". So, if you really only need one output, you can set your last layer to num_output: 1 and use "SigmoidCrossEntropyLoss". However, if you want to take advantage of caffe's "Accuracy" layer, you need to use two outputs and a "SoftmaxWithLoss" layer.
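The equivalence is easy to check numerically: a two-way softmax over logits (z0, z1) gives class 1 the probability sigmoid(z1 - z0). A small numpy sketch with arbitrary logit values:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([0.3, -1.2])           # two arbitrary logits (z0, z1)
p1_softmax = softmax(z)[1]          # probability of class 1 from a 2-way softmax
p1_sigmoid = sigmoid(z[1] - z[0])   # sigmoid of the logit difference
print(p1_softmax, p1_sigmoid)       # identical up to floating-point error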
Regarding your questions:
1. If you opt to use "SoftmaxWithLoss" and you only need one output, take the second output for each pixel as this entry represents the probability of class 1.
I'll leave it to you as an exercise to figure out what you'll get if you take the sum (hint: "Softmax" outputs probabilities...)
2. The loss is very small most likely because you have severe class imbalance: most of your pixels are 0 while only very few are 1 (or vice versa), so always predicting 0 does not incur a great penalty. If this is your case, I suggest looking at Focal Loss, which addresses this issue.
I am new to deep learning. I found that there are two prototxt files when using Caffe: one is "deploy" and the other is "train_val".
I know that "train_val" is used to train the model, but some people say the "deploy" file is for testing images.
So, my question is: does the "deploy" network only run forward(), so that the test image data goes through the forward network just once to get the score?
As you already noted, there are some fundamental differences between 'train_val.prototxt' and 'deploy.prototxt'.
One key difference is that 'deploy.prototxt' usually lacks any loss layer.
When you have no loss function defined for a net, there is no meaning to backward propagation: what gradients would you propagate? Gradients of what function?
Therefore, although a net object in caffe has the backward() method implemented for all phases, this method is meaningless when you test the net with no loss function (prediction only).
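So deployment is essentially a forward-only use of the net. A rough pycaffe sketch, where 'deploy.prototxt', 'model.caffemodel' and the 'prob' output blob are placeholder names:

import caffe
import numpy as np

caffe.set_mode_cpu()
# Build the net from the deploy definition and the trained weights.
net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)

# Fill the input blob with a preprocessed image of the expected shape
# and run a single forward pass; no loss layer, no backward() needed.
net.blobs['data'].data[...] = np.random.rand(*net.blobs['data'].data.shape)
out = net.forward()
print(out['prob'].argmax(axis=1))   # predicted class for each image in the batch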
Ideally that is how it should work, but the files are just network definitions. You can use one single file to both train and test. You have to specify in which phase you want certain blobs to be available, meaning you can define two input Data layers, one used during training and another used for testing, and specify the corresponding phase like this:
name: "MyModel"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/train_mean.binaryproto" # location of the training data mean
  }
  data_param {
    source: "data/train_lmdb" # location of the training samples
    batch_size: 128           # how many samples are grouped into one mini-batch
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  # transform_param and data_param for the test LMDB go here,
  # analogous to the TRAIN layer above
}
During training, the first layer will be used and the second will be ignored.
During the test phase, the first layer will be ignored and the second layer will be used as the input for testing.
Another point is that during testing we need the accuracy of our predictions, as we don't need to update our weights anymore:
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}
If the include directive is not given, the layer is included in all phases.
Although you can also include the accuracy layer during training to see how the outputs are going (i.e., to measure the accuracy improvement after a given number of iterations), we need it mostly for predictions.
In your solver you can specify test_interval to set after how many training iterations a test pass is carried out, and test_iter to set how many test iterations are run each time (you validate your model every test_interval iterations).
train_val and deploy files separate those two phases into two different files. Everything specified in train_val is related to the training phase, and deploy is for testing. I am not sure where the train_val combination came from, but I suppose it was due to the fact that you can validate your model every test_interval iterations and continue training from there.
As you don't need the loss during testing, but rather the probability, you can use a Softmax layer to output probabilities instead of SoftmaxWithLoss in the deploy file, or you can have both defined.
The caffe test command performs the forward operation but does not do the backward() (back-propagation) operation. I hope this helps.
AFAIK, we have two ways to obtain the validation loss.
(1) online, during the training process, by setting the solver as follows:
train_net: 'train.prototxt'
test_net: "test.prototxt"
test_iter: 200
test_interval: 100
(2) offline, based on the weights in the .caffemodel file. In this question I am concerned with the second way, due to limited GPU resources. First, I saved the network weights to a .caffemodel after every 100 iterations by setting snapshot: 100. Based on these .caffemodel files, I want to compute the validation loss with
../build/tools/caffe test -model ./test.prototxt -weights $snapshot -iterations 10 -gpu 0
where $snapshot is the file name of the .caffemodel, for example snap_network_100.caffemodel.
The data layer of my test prototxt is:
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  hdf5_data_param {
    source: "./list.txt"
    batch_size: 8
    shuffle: true
  }
}
The first and second ways give different validation losses. I found that with the first way the validation loss is independent of batch size, i.e., the validation loss is the same for different batch sizes. With the second way, the validation loss changes with the batch size, but the losses are very close together across different iterations.
My question is: which way is the correct one to compute the validation loss?
You compute the validation loss over a different number of iterations:
test_iter: 200
in your 'solver.prototxt', vs. -iterations 10 when running from the command line. This means you are averaging the loss over a different number of validation samples: with batch_size: 8, test_iter: 200 averages over 1600 samples, while -iterations 10 averages over only 80.
Since you are using far fewer samples when validating from the command line, you are much more sensitive to batch_size.
Make sure you are using exactly the same settings and verify that the validation loss is indeed the same.
I'm new to Caffe. Thank you, guys!
In https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto
I saw one uncommented enum, Phase. It has 2 options, TRAIN and TEST.
enum Phase {
  TRAIN = 0;
  TEST = 1;
}
How do they work? I recently saw a model that has these 2 phases too. The .prototxt file looks like:
name: "CIFAR10_full"
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "CIFAR-10/cifar10_train_lmdb"
    backend: LMDB
    batch_size: 200
  }
  transform_param {
    mirror: true
  }
  include: { phase: TRAIN }
}
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "CIFAR-10/cifar10_test_lmdb"
    backend: LMDB
    batch_size: 100
  }
  transform_param {
    mirror: false
  }
  include: { phase: TEST }
}
Can I switch from the TRAIN phase to the TEST phase? Where is the switch?
During training (i.e., execution of $CAFFE_ROOT/tools/caffe train [...]) caffe can alternate between training phases and testing phases: that is, during the training phase the parameters are changed, while in the test phase the parameters are fixed and the model only runs examples feed-forward to estimate its current performance.
It is quite natural to use two different data sets for training and testing, and this is why you use the different phase values.
You can read more about the train/test iterations here.
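Concretely, the "switch" lives in the solver: every test_interval training iterations caffe runs test_iter forward passes over the TEST-phase net. A rough pycaffe illustration ('solver.prototxt' is a placeholder name):

import caffe

caffe.set_mode_cpu()
# 'solver.prototxt' points at the train/test net and contains
# test_interval / test_iter, which drive the phase switching.
solver = caffe.SGDSolver('solver.prototxt')

# solver.net is the TRAIN-phase network, solver.test_nets[0] the TEST-phase one.
solver.step(100)                 # 100 training iterations; caffe automatically runs
                                 # a test pass whenever the iteration hits test_interval
solver.test_nets[0].forward()    # or run a TEST-phase forward pass yourself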
TRAIN specifies a layer for the model used during training.
TEST specifies a layer for the model used during testing.
Thus, you can define 2 models in a single prototxt file: one model for training and one model for testing.
Info on this can be found in the Model Definition section of the web page http://caffe.berkeleyvision.org/gathered/examples/imagenet.html
I'm trying to use the MNIST Caffe example via the C++ API, but I'm having a bit of trouble working out how to restructure the network prototxt file I'll deploy after training. I've trained and tested the model with the original file (lenet_train_test.prototxt), but when I want to deploy it and make predictions like in the C++ and OpenCV example, I realise I have to modify the input section to make it similar to the deploy.prototxt file they have.
Can I replace the information in the training and testing layers of the lenet_train_test.prototxt with this section of the deploy.prototxt file?
name: "CaffeNet"
input: "data"
input_shape {
  dim: 10
  dim: 3
  dim: 227
  dim: 227
}
The images I'll be passing to the network for classification will be grayscale and 24*24 pixels, and I'll also want to scale them like was done with the MNIST dataset, so could I modify the section to this?
name: "CaffeNet"
input: "data"
input_shape {
  dim: 10
  dim: 1
  dim: 24
  dim: 24
}
transform_param {
  scale: 0.00390625
}
I'm not entirely sure where the "dim: 10" comes from, though.
In order to "convert" your train_val prototxt to a deploy one, you remove the input data layers (reading your train/val data) and replace them with the declaration:
name: "CaffeNet"
input: "data"
input_shape {
  dim: 10
  dim: 1
  dim: 24
  dim: 24
}
Note that the deploy prototxt does not have two phases for train and test, only a single flavor.
Replacing the input data layers with this declaration basically tells caffe that you are responsible for supplying the data, and that the net should allocate space for inputs of this size.
Regarding scale: once you deploy your net, the net has no control over the inputs; it does not read the data for you the way the input data layers in the train_val net do. Therefore, you'll have to scale the input data yourself before feeding it to the network. You can use the DataTransformer class to help you transform your input blobs in the same way they were transformed during training.
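For illustration, applying the MNIST-style scaling yourself before the forward pass might look like this in pycaffe (a sketch: the file names, image path and 24x24 shape are assumptions from the question; in C++ the same arithmetic can be done with DataTransformer or plain OpenCV):

import caffe
import numpy as np

net = caffe.Net('deploy.prototxt', 'lenet_iter_10000.caffemodel', caffe.TEST)

# Load a grayscale 24x24 image as float; caffe.io.load_image returns values in [0, 1],
# so multiply back to [0, 255] first and then apply the scale of 1/256 (0.00390625).
img = caffe.io.load_image('digit.png', color=False) * 255.0
img = img.transpose(2, 0, 1)                 # HxWx1 -> 1xHxW (channels first)
img *= 0.00390625                            # same scale as in lenet_train_test.prototxt

net.blobs['data'].reshape(1, 1, 24, 24)      # a single grayscale 24x24 input
net.blobs['data'].data[...] = img
out = net.forward()                          # out holds the network outputs (e.g. the 'prob' blob)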
Regarding the first dim: 10: every Blob (i.e., data/parameters storage unit) in caffe has 4 dimensions: batch-size, channels, height and width. This parameter actually means the net should allocate space for batches of 10 inputs at a time.
The "magic" number 10 comes from the way GoogLeNet and other competitors in the ILSVRC challenge used to classify images: they classified 10 crops from each image and averaged the outputs to produce better classification results.