I am new to Caffe and I want to use the pretrained CaffeNet model (trained on ImageNet). I applied net surgery by removing an intermediate convolutional layer, conv4.
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "relu5-new"
type: "ReLU"
bottom: "conv5-new"
top: "conv5-new"
}
layer {
name: "pool5-new"
type: "Pooling"
bottom: "conv5-new"
top: "pool5-new"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5-new"
top: "fc6"
inner_product_param {
num_output: 4096
}
}
The full prototxt file can be found here.
After saving this new network, the accuracy dropped to 0. Should I fine-tune on the ImageNet validation set, or is there something wrong with my new prototxt file?
Any help will be appreciated!
The original net you started with had conv4 between conv3 and conv5. This means the filters (weights) of conv5 expected a certain number of inputs, with a certain "order" or "meaning". Once you removed conv4, you had to alter conv5 to accept a different number of inputs. Therefore, the new conv5 layer must be trained to accommodate the new inputs it receives.
In this case, since you introduced a new conv5 layer, you should define a weight_filler in your prototxt to tell Caffe how to initialize the new weights. Otherwise Caffe will set the weights to zero, and it will be almost impossible to fine-tune.
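As a rough illustration in plain Python (this is not Caffe code): as far as I know, Caffe copies pretrained weights into layers by matching layer names, and a layer with no match keeps whatever its filler produced. The dictionary below is a made-up stand-in for the caffemodel, and init_source is a hypothetical helper.

```python
# Toy model of how pretrained weights are matched to layers by name.
# Layer names follow the question; the dictionary stands in for the caffemodel.
pretrained = {"conv3": "trained", "conv4": "trained", "conv5": "trained", "fc6": "trained"}
new_net = ["conv3", "conv5-new", "fc6"]

def init_source(layer_name, has_weight_filler):
    """Return where a layer's initial weights come from."""
    if layer_name in pretrained:
        return "copied from caffemodel"
    # No layer of that name in the caffemodel: fall back to the filler.
    if has_weight_filler:
        return "weight_filler"
    return "default constant filler (all zeros)"

for name in new_net:
    print(name, "->", init_source(name, has_weight_filler=False))
```

Under this assumption, conv5-new is the only layer that starts from zeros, which is why it needs an explicit weight_filler.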
I'm trying to reproduce the following paper with Caffe:
Deep EXpectation
The last layer has 100 outputs, and each output represents the probability of one predicted age. The final predicted age is calculated by the following equation:
so I want to build the loss using EUCLIDEAN_LOSS between the label and the predicted value.
Here is my prototxt for the last output layer and the loss layer.
layer {
bottom: "pool5"
top: "fc100"
name: "fc100"
type: "InnerProduct"
inner_product_param {
num_output: 100
}
}
layer {
bottom: "fc100"
top: "prob"
name: "prob"
type: "Softmax"
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc100"
bottom: "label"
top: "loss"
loss_weight: 1
}
For now, I am trying this with SoftmaxWithLoss. However, this loss is better suited to classification than to regression. How should I design the loss layer in this case?
Thanks in advance.
TL;DR
I've been through a similar task once, and in my experience there was little difference (in terms of output accuracy) between training on discrete labels and regressing a single continuous value.
There are several ways you can approach this problem:
1. Regressing a single output
Since you only need to predict a single scalar value, you should train your net to do just that:
layer {
bottom: "pool5"
top: "fc1"
name: "fc1"
type: "InnerProduct"
inner_product_param {
num_output: 1 # predict single output
}
}
You need to make sure the predicted value is in range [0..99]:
layer {
bottom: "fc1"
top: "pred01" # map to [0..1] range
type: "Sigmoid"
name: "pred01"
}
layer {
bottom: "pred01"
top: "pred_age"
type: "Scale"
name: "pred_age"
param { lr_mult: 0 } # do not learn this scale - it is fixed
scale_param {
bias_term: false
filler { type: "constant" value: 99 }
}
}
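To see what these two layers do numerically, here is a minimal plain-Python sketch of the same mapping. The function name pred_age and the sample inputs are only for illustration.

```python
import math

def pred_age(fc1):
    """Sigmoid squashes fc1 into (0, 1); the fixed scale multiplies by 99."""
    pred01 = 1.0 / (1.0 + math.exp(-fc1))
    return 99.0 * pred01

# Whatever the raw score is, the result stays between 0 and 99.
for x in (-10.0, 0.0, 10.0):
    print(x, "->", round(pred_age(x), 3))
```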
Once you have the prediction in pred_age you can add a loss layer:
layer {
bottom: "pred_age"
bottom: "true_age"
top: "loss"
type: "EuclideanLoss"
name: "loss"
}
Though, I would advise using "SmoothL1" in this case, as it is more robust.
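To make "more robust" concrete, here is a small plain-Python comparison of the per-sample penalties. It assumes the usual smooth L1 definition (quadratic for |d| < 1, linear beyond), which may differ in detail from the layer in your Caffe build; as far as I know such a layer is not shipped with every Caffe distribution.

```python
def euclidean(diff):
    # Per-sample term of a Euclidean loss: 0.5 * diff^2
    return 0.5 * diff * diff

def smooth_l1(diff):
    # Quadratic near zero, linear once |diff| reaches 1
    a = abs(diff)
    return 0.5 * a * a if a < 1.0 else a - 0.5

# A label that is off by 40 years dominates the Euclidean loss
# but contributes only linearly to smooth L1.
for diff in (0.5, 5.0, 40.0):
    print(diff, euclidean(diff), smooth_l1(diff))
```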
2. Regressing the expectation of the discrete prediction
You can implement your prediction formula in Caffe. You need a fixed vector of values [0..99] for that. There are many ways to do that; none is very straightforward. Here's one way using net surgery:
First, define the net
layer {
bottom: "prob"
top: "pred_age"
name: "pred_age"
type: "Convolution"
param { lr_mult: 0 } # fixed layer.
convolution_param {
num_output: 1
bias_term: false
}
}
layer {
bottom: "pred_age"
bottom: "true_age"
top: "loss"
type: "EuclideanLoss" # same comment about type of loss as before
name: "loss"
}
You cannot use this net yet: first you need to set the kernel of the pred_age layer to 0..99.
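In Python, load the new net, set the kernel, and save the weights:
import numpy as np
import caffe

net = caffe.Net('path/to/train_val.prototxt', caffe.TRAIN)
li = list(net._layer_names).index('pred_age') # get layer index
kernel = net.layers[li].blobs[0].data # weight blob holding 100 values
kernel[...] = np.arange(100, dtype=np.float32).reshape(kernel.shape) # set the kernel to 0..99
net.save('/path/to/init_weights.caffemodel') # save the weights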
Now you can train your net, but MAKE SURE you start training from the weights saved in '/path/to/init_weights.caffemodel'.
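To check why a fixed kernel 0..99 gives the expected age, here is a plain-Python sketch of the same computation outside Caffe. The scores are made up (a bump around age 30), and expected_age is a hypothetical helper that does what the fixed pred_age layer is meant to do.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def expected_age(prob):
    # Dot product of prob with the fixed vector [0, 1, ..., 99]
    return sum(i * p for i, p in enumerate(prob))

# Hypothetical scores for the 100 age bins, peaking at age 30
scores = [-0.05 * (i - 30) ** 2 for i in range(100)]
prob = softmax(scores)
print(round(expected_age(prob), 2))
```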
In Caffe I created a simple network to classify face images as follows:
myExampleNet.prototxt
name: "myExample"
layer {
name: "example"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/myExample/myExample_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/myExample/myExample_test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "data"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 50
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 155
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
myExampleSolver.prototxt
net: "examples/myExample/myExampleNet.prototxt"
test_iter: 15
test_interval: 500
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 30000
snapshot: 5000
snapshot_prefix: "examples/myExample/myExample"
solver_mode: CPU
I used Caffe's convert_imageset to create the LMDB databases. My data consists of about 40000 training and 16000 test face images: 155 classes, each with about 260 training and 100 test images.
I use this command for training data:
build/tools/convert_imageset -resize_height=100 -resize_width=100 -shuffle examples/myExample/myData/data/ examples/myExample/myData/data/labels_train.txt examples/myExample/myExample_train_lmdb
and this command for test data:
build/tools/convert_imageset -resize_height=100 -resize_width=100 -shuffle examples/myExample/myData/data/ examples/myExample/myData/data/labels_test.txt examples/myExample/myExample_test_lmdb
But after 30000 iterations my loss is high and the accuracy is low:
...
I0127 09:25:55.602881 27305 solver.cpp:310] Iteration 30000, loss = 4.98317
I0127 09:25:55.602917 27305 solver.cpp:330] Iteration 30000, Testing net (#0)
I0127 09:25:55.602926 27305 net.cpp:676] Ignoring source layer example
I0127 09:25:55.827739 27305 solver.cpp:397] Test net output #0: accuracy = 0.0126667
I0127 09:25:55.827764 27305 solver.cpp:397] Test net output #1: loss = 5.02207 (* 1 = 5.02207 loss)
When I switch the dataset to MNIST and change the num_output of the ip2 layer from 155 to 10, the loss drops dramatically and the accuracy increases!
Which part is wrong?
There is not necessarily something wrong in your code.
The fact that you get these good results for MNIST does indeed say that your model is 'correct' in the sense that it does not produce coding errors etc., but it is in no way a guarantee that it will perform well on another, different problem.
Keep in mind that, in principle, it is much easier to predict a 10-class problem (like MNIST) than a 155-class one; the baseline (i.e. simple random guessing) accuracy in the first case is about 10%, while in the second case it is only ~0.65%. Add that your dataset is comparable in size to MNIST, not bigger (are your images also color pictures, i.e. 3 channels in contrast with single-channel MNIST?), and your results may start looking less puzzling and surprising.
Additionally, MNIST has turned out to be notoriously easy to fit (I have been trying myself to build models that will not fit MNIST well, without much success so far), and you easily reach a conclusion that has now become common wisdom in the community, i.e. that good performance on MNIST does not say much about a model architecture.
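A quick sanity check you can run on the numbers in the question: compare the reported values against what a uniform guess over 155 classes would give. This is only back-of-the-envelope arithmetic in plain Python, not a diagnosis.

```python
import math

num_classes = 155
chance_accuracy = 1.0 / num_classes    # accuracy of uniform random guessing
chance_loss = math.log(num_classes)    # softmax loss of a uniform prediction

print("chance accuracy: %.4f" % chance_accuracy)
print("chance loss:     %.4f" % chance_loss)
```

The reported test loss (5.02207) sits very close to the chance-level loss, which suggests the net has learned little beyond a near-uniform prediction on this data.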
I am attempting to implement a Caffe Softmax layer with a "temperature" parameter. I am implementing a network utilizing the distillation technique outlined here.
Essentially, I would like my Softmax layer to utilize the Softmax w/ temperature function as follows:
F(X) = exp(zi(X)/T) / sum(exp(zl(X)/T))
Using this, I want to be able to tweak the temperature T before training. I have found a similar question, but this question is attempting to implement Softmax with temperature on the deploy network. I am struggling to implement the additional Scale layer described as "option 4" in the first answer.
I am using the cifar10_full_train_test prototxt file found in Caffe's examples directory. I have tried making the following change:
Original
...
...
...
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip1"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip1"
bottom: "label"
top: "loss"
}
Modified
...
...
...
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip1"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
type: "Scale"
name: "temperature"
top: "zi/T"
bottom: "ip1"
scale_param {
filler: { type: 'constant' value: 0.025 } ### I wanted T = 40, so 1/40=.025
}
param { lr_mult: 0 decay_mult: 0 }
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip1"
bottom: "label"
top: "loss"
}
After a quick train (5,000 iterations), I checked to see if my classification probabilities are appearing more even, but they actually appeared to be less evenly distributed.
Example:
high temp T: F(X) = [0.2, 0.5, 0.1, 0.2]
low temp T: F(X) = [0.02, 0.95, 0.01, 0.02]
~my attempt: F(X) = [0, 1.0, 0, 0]
Do I appear to be on the right track with this implementation? Either way, what am I missing?
You are not using the "cooled" predictions "zi/T" that your "Scale" layer produces:
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "zi/T" # Use the "cooled" predictions instead of the originals.
bottom: "label"
top: "loss"
}
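For intuition on what dividing by T does to the probabilities, here is a plain-Python sketch of softmax with temperature. The logits are made up, and softmax_t is a hypothetical helper, not a Caffe function.

```python
import math

def softmax_t(z, T=1.0):
    """Softmax of z / T, computed in a numerically stable way."""
    scaled = [v / T for v in z]
    m = max(scaled)
    e = [math.exp(v - m) for v in scaled]
    s = sum(e)
    return [v / s for v in e]

logits = [1.0, 4.0, 0.5, 1.0]   # made-up "ip1" outputs for one sample
print([round(p, 3) for p in softmax_t(logits, T=1.0)])    # peaked
print([round(p, 3) for p in softmax_t(logits, T=40.0)])   # much flatter
```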
The accepted answer helped me understand my misconceptions regarding the Softmax temperature implementation.
As @Shai pointed out, in order to observe the "cooled" probability outputs I was expecting, the Scale layer only needs to be added to the "deploy" prototxt file. It is not necessary to include the Scale layer in the train/val prototxt at all. In other words, the temperature must be applied to the Softmax layer, not the SoftmaxWithLoss layer.
If you want to apply the "cooled" effect to your probability vector, simply make sure your last two layers are as follows:
deploy.prototxt
layer {
type: "Scale"
name: "temperature"
top: "zi/T"
bottom: "ip1"
scale_param {
filler: { type: 'constant' value: 1/T } ## Replace "1/T" with actual 1/T value
}
param { lr_mult: 0 decay_mult: 0 }
}
layer {
name: "prob"
type: "Softmax"
bottom: "zi/T"
top: "prob"
}
My confusion was due primarily to my misunderstanding of the difference between SoftmaxWithLoss and Softmax.
I'm very new to Caffe but want to add a maxout layer to my project. There is some maxout code on the web, such as
implement maxout with caffe
My code is here:
layers {
name: "conv1"
type: CONVOLUTION
bottom: "data"
top: "conv1"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 16
kernel_size: 9
stride: 1
}
}
layers {
name: "slice1"
type: SLICE
bottom: "conv1"
top: "slice1A"
top: "slice1B"
top: "slice1C"
top: "slice1D"
slice_param{
axis: 1
slice_point: 4
slice_point: 8
slice_point: 12
}
}
layers {
name: "maxout1"
type: ELTWISE
bottom: "slice1A"
bottom: "slice1B"
bottom: "slice1C"
bottom: "slice1D"
top: "maxout1"
eltwise_param {
operation:MAX
}
}
Here, I use the SLICE layer to divide the conv1 output into four parts and then apply the ELTWISE operation. There will be four outputs, but I don't know how the MAX operation is applied to slice1A, slice1B, slice1C and slice1D.
The following picture shows my understanding.
ELTWISE diagram of this code snippet
Thank you very much!
First, to put it simply, maxout takes two or more tensors of exactly the same dimensions as input. For example, it takes two 10-dimensional vectors and, at each position, picks the larger of the two values as the corresponding element of the output, which results in a single 10-dimensional vector. You can see this procedure as a fusion process.
As for the ELTWISE layer in your code, it does exactly this. Specifically, the ELTWISE layer takes one element from each bottom (slice1A, slice1B, slice1C and slice1D), chooses the maximum of those elements as the corresponding element of the output maxout1, and repeats this for every position of the bottoms. The max operation is selected by
eltwise_param {
operation:MAX
}
in your code.
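Here is the same procedure in plain Python for a single spatial position, assuming 16 conv1 channels sliced at 4, 8 and 12 as in the question. The channel values are made up.

```python
# 16 channel values at one spatial position of conv1 (made-up numbers)
conv1 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]

# Slice with slice_point 4, 8, 12 -> four groups of 4 channels each
slice1A, slice1B, slice1C, slice1D = conv1[0:4], conv1[4:8], conv1[8:12], conv1[12:16]

# Eltwise MAX: channel k of maxout1 is the max over channel k of every bottom
maxout1 = [max(vals) for vals in zip(slice1A, slice1B, slice1C, slice1D)]
print(maxout1)  # [9, 9, 9, 8]
```

So the 16 input channels are reduced to 4 output channels, each being the maximum over the four slices.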
I have a network with 4 Boolean outputs. It is not a classification problem, and each of the outputs is meaningful. I expect to get a zero or a one for each of them. Right now I am using the Euclidean loss function.
There are 1000000 samples. In the input file, each sample has 144 features, so the size of the input is 1000000*144.
I use a batch size of 50, because otherwise the processing time is too long.
The output file has size 1000000*4, i.e. there are four outputs for each input.
When I use the accuracy layer, it complains about the dimension of the output. It needs just one Boolean output, not four. I think this is because it treats the problem as a classification problem.
I have two questions.
First, considering the error from the accuracy layer, is the Euclidean loss function suitable for this task? And how can I get the accuracy of my network?
Second, I want to get the exact predicted value for each of the four variables. I mean I need the exact predicted values for each test record. Right now, I only have the loss value for each batch.
Please guide me in solving these issues.
Thanks,
Afshin
The train network is:
state {
phase: TRAIN
}
layer {
name: "abbas"
type: "HDF5Data"
top: "data"
top: "label"
hdf5_data_param {
source: "/home/afo214/Research/hdf5/simulation/Train-1000-11- 1/Train-Sc-B-1000-11-1.txt"
batch_size: 50
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "data"
top: "ip1"
inner_product_param {
num_output: 350
weight_filler {
type: "xavier"
}
}
}
layer {
name: "sig1"
bottom: "ip1"
top: "sig1"
type: "Sigmoid"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "sig1"
top: "ip2"
inner_product_param {
num_output: 150
weight_filler {
type: "xavier"
}
}
}
The test network is:
state {
phase: TEST
}
layer {
name: "abbas"
type: "HDF5Data"
top: "data"
top: "label"
hdf5_data_param {
source: "/home/afo214/Research/hdf5/simulation/Train-1000-11- 1/Train-Sc-B-1000-11-1.txt"
batch_size: 50
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "data"
top: "ip1"
inner_product_param {
num_output: 350
weight_filler {
type: "xavier"
}
}
}
layer {
name: "sig1"
bottom: "ip1"
top: "sig1"
type: "Sigmoid"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "sig1"
top: "ip2"
inner_product_param {
num_output: 150
weight_filler {
type: "xavier"
}
}
}
layer {
name: "sig2"
bottom: "ip2"
top: "sig2"
type: "Sigmoid"
}
layer {
name: "ip4"
type: "InnerProduct"
bottom: "sig2"
top: "ip4"
inner_product_param {
num_output: 4
weight_filler {
type: "xavier"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip4"
bottom: "label"
top: "accuracy"
}
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "ip4"
bottom: "label"
top: "loss"
}
And I get this error:
accuracy_layer.cpp:34] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (50 vs. 200) Number of labels must match number of predictions; e.g., if label axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
Without using the accuracy layer caffe gives me the loss value.
Should "EuclideanLoss" be used for predicting binary outputs?
If you are trying to predict discrete binary labels then "EuclideanLoss" is not a very good choice. This loss is better suited for regression tasks where you wish to predict continuous values (e.g., estimating the coordinates of bounding boxes).
For predicting discrete labels, "SoftmaxWithLoss" or "InfogainLoss" are better suited. Usually, "SoftmaxWithLoss" is used.
For predicting binary outputs you may also consider "SigmoidCrossEntropyLoss".
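As a sketch of what a sigmoid cross-entropy loss computes for four independent binary outputs, here is a plain-Python version for a single sample. The scores and labels are made up, and the real layer may normalize the sum differently (for example by batch size).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_cross_entropy(scores, labels):
    """Sum of independent binary cross-entropies, one per output."""
    loss = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss

scores = [2.0, -1.5, 0.3, -3.0]   # made-up raw "ip4" outputs for one sample
labels = [1, 0, 1, 0]

# Thresholding the sigmoid at 0.5 gives a 0/1 prediction per output
predicted = [1 if sigmoid(s) > 0.5 else 0 for s in scores]
print(predicted, round(sigmoid_cross_entropy(scores, labels), 4))
```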
Why is there an error in the "Accuracy" layer?
In Caffe, the "Accuracy" layer expects two inputs ("bottom"s): one is a prediction vector and the other is the ground truth expected discrete label.
In your case, you need to provide, for each binary output, a vector of length 2 with the predicted probabilities of 0 and 1, and a single binary label:
layer {
name: "acc01"
type: "Accuracy"
bottom: "predict01"
bottom: "label01"
top: "acc01"
}
In this example you measure the accuracy for a single binary output. The input "predict01" is a two-vector for each example in the batch (for batch_size: 50 the shape of this blob should be 50-by-2).
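The "(50 vs. 200)" in the error message follows from simple shape arithmetic, sketched here in plain Python with the batch size and output count from the question. The variable names are only for illustration.

```python
batch_size = 50
pred_shape = (batch_size, 4)     # "ip4": 4 scores per sample
label_shape = (batch_size, 4)    # 4 binary labels per sample

# Accuracy treats axis 1 of the prediction as class scores,
# so it expects exactly one label per sample.
expected_labels = pred_shape[0]                    # outer_num_ * inner_num_
actual_labels = label_shape[0] * label_shape[1]    # bottom[1]->count()
print(expected_labels, "vs.", actual_labels)       # 50 vs. 200
```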
What can you do?
You are trying to predict 4 different outputs in a single net, therefore, you need 4 different loss and accuracy layers.
First, you need to split ("Slice") the ground truth labels into 4 scalars (instead of a single binary 4-vector):
layer {
name: "label_split"
bottom: "label" # name of input 4-vector
top: "label01"
top: "label02"
top: "label03"
top: "label04"
type: "Slice"
slice_param {
axis: 1
slice_point: 1
slice_point: 2
slice_point: 3
}
}
Now you need a prediction, loss and accuracy layer for each of the binary labels:
layer {
name: "predict01"
type: "InnerProduct"
bottom: "sig2"
top: "predict01"
inner_product_param {
num_output: 2 # because you need to predict 2 probabilities one for False, one for True
...
}
}
layer {
name: "loss01"
type: "SoftmaxWithLoss"
bottom: "predict01"
bottom: "label01"
top: "loss01"
}
layer {
name: "acc01"
type: "Accuracy"
bottom: "predict01"
bottom: "label01"
top: "acc01"
}
Now you need to replicate these three layers for each of the four binary labels you wish to predict.