I have set up Caffe and I am using the FCN-8s model with a small change to the number of output classes:
layer {
  name: "score_5classes"
  type: "Convolution"
  bottom: "score"
  top: "score_5classes"
  convolution_param {
    num_output: 2
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score_5classes"
  bottom: "label"
  top: "loss"
  loss_param {
    normalize: true
  }
}
I have changed the last layer's number of outputs to 2, because I want to classify my input images into 2 classes, 0 and 1. (So it seems I should have 2 outputs, which I can't understand: couldn't it simply be a single output matrix of zeros and ones?)
So my questions are:
1. Should I sum these 2 classes, since I need a single output?
2. The loss is very small, even when the output is far from the desired one! How does Caffe compute the loss?
Thanks
When doing binary classification, using "SoftmaxWithLoss" with two outputs is mathematically equivalent to using "SigmoidCrossEntropyLoss". So, if you really only need one output, you can set your last layer to num_output: 1 and use "SigmoidCrossEntropyLoss". However, if you want to take advantage of caffe's "Accuracy" layer, you need to use two outputs and a "SoftmaxWithLoss" layer.
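For reference, a minimal sketch of the single-output alternative could look like this (layer and blob names are only illustrative):

layer {
  name: "score_1class"   # illustrative name, 1 output channel per pixel
  type: "Convolution"
  bottom: "score"
  top: "score_1class"
  convolution_param {
    num_output: 1
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "score_1class"
  bottom: "label"         # label blob must have the same shape as the score blob
  top: "loss"
}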
Regarding your questions:
1. If you opt to use "SoftmaxWithLoss" and you only need one output, take the second output for each pixel, as this entry represents the probability of class 1.
I'll leave it to you as an exercise to figure out what you would get if you took the sum (hint: "Softmax" outputs probabilities...).
2. The loss is very small most likely because you have severe class imbalance: most of your pixels are 0 while only very few are 1 (or vice versa), therefore always predicting 0 does not incur a large penalty. If this is your case, I suggest looking at Focal Loss, which addresses this issue.
Related
I am new to deep learning. I found that there are two prototxt files when I use caffe, one is "deploy" and another is "train_val".
I know that "train_val" is used to train the model, but some people said the "deploy" file is for testing on images.
So, my question is: does "deploy" only have the forward() pass, so the test image data goes through the forward network only once to get the score?
As you already noted there are some fundamental differences between 'train_val.prototxt' and 'deploy.prototxt'.
One key difference is that 'deploy.prototxt' usually lacks any loss layer.
When you have no loss function defined for a net, backward propagation has no meaning: what gradients would you propagate? Gradients of what function?
Therefore, a net object in caffe has the backward() method implemented for all phases. Nevertheless, this method is meaningless when you test the net with no loss function (prediction only).
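For reference, a deploy.prototxt usually also replaces the data layers with a plain input declaration and drops the loss layer; a minimal sketch (the shape values are only illustrative):

layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    shape: { dim: 1 dim: 3 dim: 224 dim: 224 }  # batch, channels, height, width
  }
}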
Ideally that is how it should work, but the files are just network definitions. You can use one single file to both train and test. You have to specify in which phase you want some blobs to be available, meaning you can define two input data layers, one that will be used during training and another used for testing, and specify the corresponding phase like this:
name: "MyModel"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: false
crop_size: 227
mean_file: "data/train_mean.binaryproto" # location of the training data mean
}
data_param {
source: "data/train_lmdb" # location of the training samples
batch_size: 128 # how many samples are grouped into one mini-batch
backend: LMDB
}
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  data_param {
    source: "data/test_lmdb" # location of the test samples (illustrative path)
    batch_size: 128
    backend: LMDB
  }
}
During training, the first data layer will be used and the second will be ignored.
During the test phase, the first layer will be ignored and the second will be used as input for testing.
Another point is that during testing we want the accuracy of our predictions, since we no longer need to update the weights:
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}
If the include directive is not given, the layer is included in all phases.
Although you can also include the accuracy layer during training to see how the outputs are doing (e.g. to measure the accuracy improvement after a number of iterations), it is mostly needed for predictions.
In your solver you can specify test_interval to control after how many training iterations the test operation is carried out (you validate your model every test_interval iterations), and test_iter to control how many test batches are run each time.
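As a rough sketch, the relevant solver.prototxt entries would look like this (the values are placeholders):

# solver.prototxt (excerpt)
net: "train_val.prototxt"
test_interval: 1000   # run the TEST phase every 1000 training iterations
test_iter: 100        # number of test mini-batches evaluated in each TEST phase
base_lr: 0.01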
train_val and deploy files separate those two phases into two different files. All specifications in train_val are related to the training phase, and deploy is for testing. I am not sure where the train_val combination came from, but I suppose it is because you can validate your model every test_interval iterations and then continue training from there.
As you don't need the loss during testing, but rather the probabilities, you can use a "Softmax" layer as the probability output function instead of "SoftmaxWithLoss" in deploy, or you can have both defined.
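For example, the deploy file could end with a plain "Softmax" layer instead of "SoftmaxWithLoss" (a sketch, assuming the last inner-product blob is called "fc8"):

layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}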
The caffe test command performs the forward operation but doesn't do the backward() (back-propagation) operation. I hope this helps.
I am using the BatchNorm layer. I know the meaning of the use_global_stats setting, which is usually set to false for training and to true for testing/deploy. This is my setting in the testing phase:
layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  batch_norm_param {
    use_global_stats: true
  }
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "bnorm1"
  top: "bnorm1"
  scale_param {
    bias_term: true
    filler {
      value: 1
    }
    bias_filler {
      value: 0.0
    }
  }
}
In solver.prototxt I used the Adam method. I found an interesting problem that happens in my case. If I choose base_lr: 1e-3, then I get good performance when I set use_global_stats: false in the testing phase. However, if I choose base_lr: 1e-4, then I get good performance when I set use_global_stats: true in the testing phase. Does this demonstrate that base_lr affects the batchnorm setting (even though I used the Adam method)? Could you suggest any reason for that? Thanks all
AFAIK the learning rate does not directly affect the learned parameters of the "BatchNorm" layer. Indeed, caffe forces the lr_mult of all internal parameters of this layer to zero regardless of base_lr or the type of the solver.
However, you might encounter a case where the adjacent layers converge to different points according to the base_lr you are using, and indirectly this causes the "BatchNorm" to behave differently.
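For reference, many prototxt files make this explicit anyway by zeroing the learning rate of the three internal "BatchNorm" blobs; this is only a common convention, since caffe enforces it internally regardless:

layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  param { lr_mult: 0 }  # running mean
  param { lr_mult: 0 }  # running variance
  param { lr_mult: 0 }  # moving-average factor
  batch_norm_param {
    use_global_stats: true
  }
}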
I have a question regarding the Reduction layer in caffe. I haven't found any examples of how to use this layer in my .prototxt file, so I would appreciate it if anybody could give me a short example of how to use it.
This is the documentation: http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1ReductionLayer.html , but there is no example given :-/
I want to use this layer to reduce an output matrix of 1x18x18 to one scalar value, i.e. I want the absolute sum of this matrix.
Try:
layer {
  name: "reduction"
  type: "Reduction"
  bottom: "in"
  top: "out"
  reduction_param {
    axis: 0         # start reducing from the first axis, i.e. reduce the whole blob to a scalar
    operation: ASUM # use absolute sum
  }
}
For more information, see caffe.proto and the new caffe.help.
I have a question regarding the NVIDIA DIGITS framework.
I have been using caffe without DIGITS and have used HDF5 layers so far. There I could use multiple "top" blobs (data_0, data_1, data_2) as inputs (see the code below), so I could feed the net with more than one input image. But in DIGITS only an lmdb input layer works.
So is it possible to create an lmdb input layer with multiple input images?
layer {
  name: "data"
  type: "HDF5Data"
  top: "data_0"
  top: "data_1"
  top: "data_2"
  top: "label"
  hdf5_data_param {
    source: "train.txt"
    batch_size: 64
    shuffle: true
  }
}
Sorry, that isn't supported in DIGITS.
Since DIGITS manages your datasets for you, it also sets up the data layers in your network for you. This way you don't need to copy+paste the LMDB paths into your network when you want to run a previous network on a new dataset or when you move the location of your jobs on disk. It's a decision which sacrifices flexibility for the sake of making the common case easy.
For classification, one LMDB should have two tops: "data" and "label". For other dataset types, there should be one LMDB with a single "data" top, and another LMDB with a single "label" top. If you need a more complicated data layer setup, then you'll need to either use Caffe directly or make some changes to the DIGITS source code.
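In other words, a classification data layer generated by DIGITS is essentially the standard single-LMDB layout (a sketch; the actual source path and batch size are filled in by DIGITS):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "path/to/dataset_lmdb"  # managed by DIGITS
    batch_size: 64
    backend: LMDB
  }
}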
DIGITS's HDF5 support is not great because Caffe's HDF5 support is not great.
I am trying to count objects in an image using AlexNet.
I currently have images containing 1, 2, 3 or 4 objects per image. For an initial check I have 10 images per class. For example, in the training set I have:
image label
image1 1
image2 1
image3 1
...
image39 4
image40 4
I used the imagenet create script to create an lmdb file for this dataset, which successfully converted my set of images to lmdb.
AlexNet is converted to a regression model for learning the number of objects in the image by introducing a EuclideanLoss layer instead of the Softmax layer, as suggested by many. The rest of the network is the same.
However, despite doing all of the above, when I run the model I receive only zeros as output during the testing phase (shown below). It did not learn anything. However, the training loss decreased continuously in each iteration.
I don't understand what mistakes I have made. Can anybody tell me why the predicted values are always 0? And how can I check the regressed values in the testing phase, so I can see how many samples are correct and what value is predicted for each of my images?
The predicted and actual labels of the test dataset are given as:
I0928 17:52:45.585160 18302 solver.cpp:243] Iteration 1880, loss = 0.60498
I0928 17:52:45.585212 18302 solver.cpp:259] Train net output #0: loss = 0.60498 (* 1 = 0.60498 loss)
I0928 17:52:45.585225 18302 solver.cpp:592] Iteration 1880, lr = 1e-06
I0928 17:52:48.397922 18302 solver.cpp:347] Iteration 1900, Testing net (#0)
I0928 17:52:48.499543 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 1
I0928 17:52:48.499641 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 2
I0928 17:52:48.499660 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 3
I0928 17:52:48.499681 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 4
...
Note: I also created hdf5-format files in order to have floating-point labels, i.e. 1.0, 2.0, 3.0 and 4.0. However, when I changed the data layer to HDF5 type, I could not crop the images for data augmentation (as is done in alexnet with the lmdb layer), nor apply normalization. I used the script given at "https://github.com/nikogamulin/caffe-utils/blob/master/hdf5/demo.m" for the hdf5 data and followed its steps for using it in my model.
I have updated the last layers as follows:
layer {
  name: "fc8reg"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8reg"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8reg"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc8reg"
  bottom: "label"
  top: "loss"
}
Without judging whether your network diverged or not, the obvious mistake you have made is that you shouldn't use an Accuracy layer to test a regression network. It is only for testing a classification network trained with a SoftmaxWithLoss layer.
In fact, given an image, the Accuracy layer of the network will by default take the index of the maximal value in its input array (here bottom: "fc8reg") as the predicted label.
Since num_output == 1 in the fc8reg layer, the Accuracy layer will always predict index 0 as the label for the input image, which is what you have seen.
Finally, you can use a EuclideanLoss layer to test your regression network. This similar problem may also give you some hints.
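A sketch of such a TEST-phase loss layer, reusing your blob names, could be:

layer {
  name: "test_loss"
  type: "EuclideanLoss"
  bottom: "fc8reg"
  bottom: "label"
  top: "test_loss"
  include {
    phase: TEST
  }
}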
If you are to print and calculate the regressed values after training, and count the accuracy of the regression network, you can simply write a RegressionAccuracy layer like this.
Or, since your target label only takes the 4 discrete values {1,2,3,4}, you could still train a classification network for your task.
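In that case the last layers would go back to a standard classification setup, something like this (a sketch; note that caffe expects class labels starting from 0, so the counts {1,2,3,4} would have to be mapped to {0,1,2,3}):

layer {
  name: "fc8cls"     # hypothetical name for the classification head
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8cls"
  inner_product_param {
    num_output: 4    # one score per possible object count
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8cls"
  bottom: "label"
  top: "loss"
}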
In my opinion everything is correct, but your network is not converging, which is not a rare happening. Your network is actually converging to zero outputs!
Maybe most of your samples have 0 as their label.
Also, don't forget to include the loss layer only during TRAIN; otherwise, it will learn on the test data as well.