I have 2 different models, let's say NM1 and NM2.
What I'm looking for is something that works like the example below.
Let's say that we have a picture of a dog.
NM1 predicts that the picture shows a cat with probability 0.52 and a dog with probability 0.48.
NM2 predicts that the picture shows a dog with probability 0.6 and a cat with probability 0.4.
NM1 - will predict wrong
NM2 - will predict correctly
NM1 + NM2 - connection will predict correctly (because 0.48 + 0.6 > 0.52 + 0.4)
So, each model ends with InnerProducts (after Softmax) which give me 2 vectors of probabilities.
As the next step, I want to add those 2 vectors. For this I use an Eltwise layer.
layer {
name: "eltwise-sum"
type: "Eltwise"
bottom: "fc8"
bottom: "fc8N"
top: "out"
eltwise_param { operation: SUM }
}
Before joining NM1 had accuracy ~70% and NM2 ~10%.
After joining accuracy can't reach even 1%.
Thus, my conclusion is that I understand something wrong and I'd be grateful if someone could explain to me where I'm wrong.
PS. I did turn off shuffle when creating lmdb.
UPDATE
layer {
name: "eltwise-sum"
type: "Eltwise"
bottom: "fc8L"
bottom: "fc8NL"
top: "out"
eltwise_param {
operation: SUM
coeff: 0.5
coeff: 0.5
}
}
#accur for PI alone
layer {
name: "accuracyPINorm"
type: "Accuracy"
bottom: "fc8L"
bottom: "label"
top: "accuracyPiNorm"
include {
phase: TEST
}
}
#accur for norm images alone
layer {
name: "accuracyIMGNorm"
type: "Accuracy"
bottom: "fc8NL"
bottom: "labelN"
top: "accuracyIMGNorm"
include {
phase: TEST
}
}
#accur for them together
layer {
name: "accuracy"
type: "Accuracy"
bottom: "out"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
If you want to add (element-wise) the probabilities, you need to add after the "Softmax" layer, and not after the "InnerProduct" layer. You should have something like
layer {
type: "InnerProduct"
name: "fc8"
top: "fc8"
# ...
}
layer {
type: "Softmax"
name: "prob_nm1"
top: "prob_nm1"
bottom: "fc8"
}
layer {
type: "InnerProduct"
name: "fc8N"
top: "fc8N"
# ...
}
layer {
type: "Softmax"
name: "prob_nm2"
top: "prob_nm2"
bottom: "fc8N"
}
# Joining the probabilities
layer {
type: "Eltwise"
name: "prob_sum"
bottom: "prob_nm1"
bottom: "prob_nm2"
top: "prob_sum"
eltwise_param {
operation: SUM
coeff: 0.5
coeff: 0.5
}
}
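To make the arithmetic concrete, here is a minimal NumPy sketch of what the Softmax layers followed by the weighted Eltwise SUM compute. The raw scores are hypothetical, chosen only so that the softmax reproduces the probabilities from the question; the function names are illustrative and not part of Caffe.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def average_probabilities(prob_a, prob_b, coeff_a=0.5, coeff_b=0.5):
    """Weighted element-wise sum, mirroring Eltwise SUM with two coeff values."""
    return coeff_a * np.asarray(prob_a) + coeff_b * np.asarray(prob_b)

# Class order is [cat, dog]. Hypothetical raw scores ("fc8" and "fc8N"),
# picked so the softmax gives the probabilities quoted in the question.
scores_nm1 = np.log(np.array([0.52, 0.48]))
scores_nm2 = np.log(np.array([0.40, 0.60]))

prob_nm1 = softmax(scores_nm1)
prob_nm2 = softmax(scores_nm2)
prob_sum = average_probabilities(prob_nm1, prob_nm2)
predicted_class = int(np.argmax(prob_sum))  # index 1 is "dog"
```

With coefficients of 0.5 each, the result is still a probability vector (it sums to 1), which is not guaranteed when raw InnerProduct outputs are summed.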
I'm trying to reproduce the following thesis with Caffe:
Deep EXpectation
The last layer has 100 outputs, each representing the probability of one predicted age. The final predicted age is the expected value over these outputs, i.e. the sum of each age weighted by its probability,
so I want to build the loss with EUCLIDEAN_LOSS between the label and the predicted value.
I show my prototxt for last output layer and loss layer.
layer {
bottom: "pool5"
top: "fc100"
name: "fc100"
type: "InnerProduct"
inner_product_param {
num_output: 100
}
}
layer {
bottom: "fc100"
top: "prob"
name: "prob"
type: "Softmax"
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc100"
bottom: "label"
top: "loss"
loss_weight: 1
}
For now, I am trying this with SoftmaxWithLoss. However, this loss is more appropriate for classification than for regression. How can I design the loss layer in this case?
Thanks in advance.
TL;DR
I've worked on a similar task once, and in my experience there was little difference (in terms of output accuracy) between training discrete labels and regressing a single continuous value.
There are several ways you can approach this problem:
1. Regressing a single output
Since you only need to predict a single scalar value, you should train your net to do just so:
layer {
bottom: "pool5"
top: "fc1"
name: "fc1"
type: "InnerProduct"
inner_product_param {
num_output: 1 # predict single output
}
}
You need to make sure the predicted value is in range [0..99]:
layer {
bottom: "fc1"
top: "pred01" # map to [0..1] range
type: "Sigmoid"
name: "pred01"
}
layer {
bottom: "pred01"
top: "pred_age"
type: "Scale"
name: "pred_age"
param { lr_mult: 0 } # do not learn this scale - it is fixed
scale_param {
bias_term: false
filler { type: "constant" value: 99 }
}
}
Once you have the prediction in pred_age you can add a loss layer
layer {
bottom: "pred_age"
bottom: "true_age"
top: "loss"
type: "EuclideanLoss"
name: "loss"
}
Though, I would advise using "SmoothL1" in this case, as it is more robust.
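As a rough illustration of what the Sigmoid, Scale and EuclideanLoss layers above compute, here is a NumPy sketch. The input values and labels are made up, and the helper names are hypothetical; the loss follows Caffe's EuclideanLoss definition (sum of squared differences divided by twice the batch size).

```python
import numpy as np

def predict_age(fc1, max_age=99.0):
    """Sigmoid squashes to (0, 1); the fixed scale maps that to (0, max_age)."""
    return max_age / (1.0 + np.exp(-np.asarray(fc1, dtype=np.float64)))

def euclidean_loss(pred, target):
    """Sum of squared differences divided by 2 * batch size."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.sum((pred - target) ** 2) / (2.0 * pred.shape[0]))

fc1 = np.array([[-2.0], [0.0], [3.0]])         # hypothetical raw outputs, shape (N, 1)
true_age = np.array([[10.0], [50.0], [90.0]])  # hypothetical labels, shape (N, 1)

pred_age = predict_age(fc1)
loss = euclidean_loss(pred_age, true_age)
```

A raw output of 0 maps to the middle of the range (49.5), and very large positive or negative outputs saturate towards 99 or 0.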
2. Regressing the expectation of the discrete prediction
You can implement your prediction formula in Caffe. You need a fixed vector of values [0..99] for that. There are many ways to do that, none of them very straightforward. Here's one way using net surgery:
First, define the net
layer {
bottom: "prob"
top: "pred_age"
name: "pred_age"
type: "Convolution"
param { lr_mult: 0 } # fixed layer.
convolution_param {
num_output: 1
bias_term: false
}
}
layer {
bottom: "pred_age"
bottom: "true_age"
top: "loss"
type: "EuclideanLoss" # same comment about type of loss as before
name: "loss"
}
You cannot use this net yet, first you need to set the kernel of pred_age layer to 0..99.
In Python, load the new net and set the kernel:
import caffe
import numpy as np
net = caffe.Net('path/to/train_val.prototxt', caffe.TRAIN)
li = list(net._layer_names).index('pred_age')  # get layer index
kernel = net.layers[li].blobs[0].data  # weight blob of the fixed layer
kernel[...] = np.arange(100, dtype=np.float32).reshape(kernel.shape)  # set the kernel to 0..99
net.save('/path/to/init_weights.caffemodel')  # save the weights
Now you can train your net, but MAKE SURE you start training from the weights saved in '/path/to/init_weights.caffemodel'.
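For intuition, the fixed pred_age layer with kernel 0..99 amounts to a dot product between the probability vector and the ages. A minimal NumPy sketch of that computation (the function name and test inputs are illustrative only):

```python
import numpy as np

def expected_age(prob):
    """Probability-weighted sum of ages 0..C-1 for each row of prob (shape (N, C))."""
    prob = np.asarray(prob, dtype=np.float64)
    ages = np.arange(prob.shape[1], dtype=np.float64)
    return prob @ ages

# All probability mass on age 30.
one_hot_30 = np.zeros((1, 100))
one_hot_30[0, 30] = 1.0

# Equal probability for every age from 0 to 99.
uniform = np.full((1, 100), 0.01)

age_one_hot = expected_age(one_hot_30)
age_uniform = expected_age(uniform)
```

A one-hot distribution returns the age it points at, while a uniform distribution returns the midpoint of the range.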
In Caffe I created a simple network to classify face images as follows:
myExampleNet.prototxt
name: "myExample"
layer {
name: "example"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/myExample/myExample_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/myExample/myExample_test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "data"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 50
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 155
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
myExampleSolver.prototxt
net: "examples/myExample/myExampleNet.prototxt"
test_iter: 15
test_interval: 500
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 30000
snapshot: 5000
snapshot_prefix: "examples/myExample/myExample"
solver_mode: CPU
I use Caffe's convert_imageset to create the LMDB databases. My data consists of about 40000 training and 16000 testing face images: 155 classes, each with about 260 training and 100 test images.
I use this command for training data:
build/tools/convert_imageset -resize_height=100 -resize_width=100 -shuffle examples/myExample/myData/data/ examples/myExample/myData/data/labels_train.txt examples/myExample/myExample_train_lmdb
and this command for test data:
build/tools/convert_imageset -resize_height=100 -resize_width=100 -shuffle examples/myExample/myData/data/ examples/myExample/myData/data/labels_test.txt examples/myExample/myExample_test_lmdb
But after 30000 iterations my loss is high and the accuracy is low:
...
I0127 09:25:55.602881 27305 solver.cpp:310] Iteration 30000, loss = 4.98317
I0127 09:25:55.602917 27305 solver.cpp:330] Iteration 30000, Testing net (#0)
I0127 09:25:55.602926 27305 net.cpp:676] Ignoring source layer example
I0127 09:25:55.827739 27305 solver.cpp:397] Test net output #0: accuracy = 0.0126667
I0127 09:25:55.827764 27305 solver.cpp:397] Test net output #1: loss = 5.02207 (* 1 = 5.02207 loss)
and when I change my dataset to mnist and change the ip2 layer num_output from 155 to 10, the loss is dramatically reduced and accuracy increases!
Which part is wrong?
There is not necessarily something wrong in your code.
The fact that you get these good results for MNIST says indeed that you have a model that is 'correct' in the sense that it does not produce coding errors etc, but it is in no way any guarantee that it will perform well in another, different problem.
Keep in mind that, in principle, it is much easier to predict a 10-class problem (like MNIST) than a 155-class one; the baseline (i.e. simple random guessing) accuracy in the first case is about 10%, while in the second case it is only ~0.65%. Add that your dataset (comparable in size to MNIST) is not bigger either (are they also color pictures, i.e. 3 channels in contrast with the single-channel MNIST?), and your results may start to look less puzzling and surprising.
Additionally, it has turned out that MNIST is notoriously easy to fit (I have been trying myself to build models that will not fit MNIST well, without much success so far), and you easily reach a conclusion that has now become common wisdom in the community, i.e. that good performance on MNIST does not say much about a model architecture.
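The random-guessing baseline can be worked out directly. This is a small sketch assuming roughly balanced classes; the function names are hypothetical. A softmax classifier that outputs a uniform distribution has cross-entropy loss ln(C) for C classes.

```python
import math

def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1.0 / num_classes

def chance_loss(num_classes):
    """Softmax cross-entropy of a uniform prediction: -ln(1/C) = ln(C)."""
    return math.log(num_classes)

mnist_acc, mnist_loss = chance_accuracy(10), chance_loss(10)
faces_acc, faces_loss = chance_accuracy(155), chance_loss(155)
```

For 155 classes this gives an accuracy of about 0.0065 and a loss of about 5.04. The log in the question reports a test loss of 5.02 and an accuracy of 0.0127, so the network is still close to chance level after 30000 iterations.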
I am attempting to implement a Caffe Softmax layer with a "temperature" parameter. I am implementing a network utilizing the distillation technique outlined here.
Essentially, I would like my Softmax layer to utilize the Softmax w/ temperature function as follows:
F(X) = exp(zi(X)/T) / sum(exp(zl(X)/T))
Using this, I want to be able to tweak the temperature T before training. I have found a similar question, but this question is attempting to implement Softmax with temperature on the deploy network. I am struggling to implement the additional Scale layer described as "option 4" in the first answer.
I am using the cifar10_full_train_test prototxt file found in Caffe's examples directory. I have tried making the following change:
Original
...
...
...
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip1"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip1"
bottom: "label"
top: "loss"
}
Modified
...
...
...
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip1"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
type: "Scale"
name: "temperature"
top: "zi/T"
bottom: "ip1"
scale_param {
filler: { type: 'constant' value: 0.025 } ### I wanted T = 40, so 1/40=.025
}
param { lr_mult: 0 decay_mult: 0 }
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip1"
bottom: "label"
top: "loss"
}
After a quick train (5,000 iterations), I checked to see if my classification probabilities are appearing more even, but they actually appeared to be less evenly distributed.
Example:
high temp T: F(X) = [0.2, 0.5, 0.1, 0.2]
low temp T: F(X) = [0.02, 0.95, 0.01, 0.02]
~my attempt: F(X) = [0, 1.0, 0, 0]
Do I appear to be on the right track with this implementation? Either way, what am I missing?
You are not using the "cooled" predictions "zi/T" that your "Scale" layer produces.
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "zi/T" # Use the "cooled" predictions instead of the originals.
bottom: "label"
top: "loss"
}
The accepted answer has helped me to understand my misconceptions regarding the Softmax temperature implementation.
As @Shai pointed out, in order to observe the "cooled" probability outputs as I was expecting, the Scale layer must only be added to the "deploy" prototxt file. It is not necessary to include the Scale layer in the train/val prototxt at all. In other words, the temperature must be applied to the Softmax layer, not the SoftmaxWithLoss layer.
If you want to apply the "cooled" effect to your probability vector, simply make sure your last two layers are as follows:
deploy.prototxt
layer {
type: "Scale"
name: "temperature"
top: "zi/T"
bottom: "ip1"
scale_param {
filler: { type: 'constant' value: 1/T } ## Replace "1/T" with actual 1/T value
}
param { lr_mult: 0 decay_mult: 0 }
}
layer {
name: "prob"
type: "Softmax"
bottom: "zi/T"
top: "prob"
}
My confusion was due primarily to my misunderstanding of the difference between SoftmaxWithLoss and Softmax.
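To see the effect of the temperature numerically, here is a NumPy sketch of what the Scale layer followed by Softmax computes. The logits are made-up stand-ins for the "ip1" outputs, and the function name is illustrative.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Divide the logits by T (the Scale layer), then apply a stable softmax."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled = scaled - np.max(scaled, axis=-1, keepdims=True)
    exp = np.exp(scaled)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([1.0, 4.0, 0.5, 1.0])  # hypothetical "ip1" outputs

prob_t1 = softmax_with_temperature(logits, temperature=1.0)
prob_t40 = softmax_with_temperature(logits, temperature=40.0)
```

Raising T leaves the ranking of the classes unchanged but pulls the probabilities towards a uniform distribution, which is the "more even" output the question was expecting.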
I have a network which has 4 Boolean outputs. It is not a classification problem, and each of the outputs is meaningful. I expect to get a zero or a one for each of them. Right now I am using the Euclidean loss function.
There are 1000000 samples. In the input file, each of them has 144 features, so the size of the input is 1000000*144.
I use a batch size of 50, because otherwise the processing time is too long.
The output file has size 1000000*4, i.e. there are four outputs for each input.
When I use the accuracy layer, it complains about the dimension of the output. It needs just one Boolean output, not four. I think this is because it treats the problem as a classification problem.
I have two questions.
First, considering the error from the accuracy layer, is the Euclidean loss function suitable for this task? And how can I get the accuracy of my network?
Second, I want to get the exact predicted value of each of the four variables, i.e. the predicted values for each test record. Right now I only have the loss value for each batch.
Please guide me to solve those issues.
Thanks,
Afshin
The train network is:
state {
phase: TRAIN
}
layer {
name: "abbas"
type: "HDF5Data"
top: "data"
top: "label"
hdf5_data_param {
source: "/home/afo214/Research/hdf5/simulation/Train-1000-11- 1/Train-Sc-B-1000-11-1.txt"
batch_size: 50
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "data"
top: "ip1"
inner_product_param {
num_output: 350
weight_filler {
type: "xavier"
}
}
}
layer {
name: "sig1"
bottom: "ip1"
top: "sig1"
type: "Sigmoid"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "sig1"
top: "ip2"
inner_product_param {
num_output: 150
weight_filler {
type: "xavier"
}
}
}
The test network is:
state {
phase: TEST
}
layer {
name: "abbas"
type: "HDF5Data"
top: "data"
top: "label"
hdf5_data_param {
source: "/home/afo214/Research/hdf5/simulation/Train-1000-11- 1/Train-Sc-B-1000-11-1.txt"
batch_size: 50
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "data"
top: "ip1"
inner_product_param {
num_output: 350
weight_filler {
type: "xavier"
}
}
}
layer {
name: "sig1"
bottom: "ip1"
top: "sig1"
type: "Sigmoid"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "sig1"
top: "ip2"
inner_product_param {
num_output: 150
weight_filler {
type: "xavier"
}
}
}
layer {
name: "sig2"
bottom: "ip2"
top: "sig2"
type: "Sigmoid"
}
layer {
name: "ip4"
type: "InnerProduct"
bottom: "sig2"
top: "ip4"
inner_product_param {
num_output: 4
weight_filler {
type: "xavier"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip4"
bottom: "label"
top: "accuracy"
}
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "ip4"
bottom: "label"
top: "loss"
}
And I get this error:
accuracy_layer.cpp:34] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (50 vs. 200) Number of labels must match number of predictions; e.g., if label axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
Without using the accuracy layer caffe gives me the loss value.
Should "EuclideanLoss" be used for predicting binary outputs?
If you are trying to predict discrete binary labels then "EuclideanLoss" is not a very good choice. This loss is better suited for regression tasks where you wish to predict continuous values (e.g., estimating the coordinates of bounding boxes).
For predicting discrete labels, "SoftmaxWithLoss" or "InfogainLoss" are better suited. Usually, "SoftmaxWithLoss" is used.
For predicting binary outputs you may also consider "SigmoidCrossEntropyLoss".
Why is there an error in the "Accuracy" layer?
In Caffe, an "Accuracy" layer expects two inputs ("bottom"s): one is a prediction vector and the other is the ground-truth discrete label.
In your case, you need to provide, for each binary output, a vector of length 2 with the predicted probabilities of 0 and 1, and a single binary label:
layer {
name: "acc01"
type: "Accuracy"
bottom: "predict01"
bottom: "label01"
top: "acc01"
}
In this example you measure the accuracy for a single binary output. The input "predict01" is a two-vector for each example in the batch (for batch_size: 50 the shape of this blob should be 50-by-2).
What can you do?
You are trying to predict 4 different outputs in a single net, therefore, you need 4 different loss and accuracy layers.
First, you need to split ("Slice") the ground truth labels into 4 scalars (instead of a single binary 4-vector):
layer {
name: "label_split"
bottom: "label" # name of input 4-vector
top: "label01"
top: "label02"
top: "label03"
top: "label04"
type: "Slice"
slice_param {
axis: 1
slice_point: 1
slice_point: 2
slice_point: 3
}
}
Now you have to have a prediction, loss and accuracy layer for each of the binary labels
layer {
name: "predict01"
type: "InnerProduct"
bottom: "sig2"
top: "predict01"
inner_product_param {
num_output: 2 # because you need to predict 2 probabilities one for False, one for True
...
}
}
layer {
name: "loss01"
type: "SoftmaxWithLoss"
bottom: "predict01"
bottom: "label01"
top: "loss01"
}
layer {
name: "acc01"
type: "Accuracy"
bottom: "predict01"
bottom: "label01"
top: "acc01"
}
Now you need to replicate these three layers for each of the four binary labels you wish to predict.
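The following NumPy sketch mirrors what the Slice and Accuracy layers do for the first of the four outputs. The labels and scores are made up, and the helper names are hypothetical; it only illustrates the shapes involved.

```python
import numpy as np

def slice_labels(label):
    """Split an (N, 4) label array into four (N, 1) arrays, like Slice with axis: 1."""
    return np.split(label, [1, 2, 3], axis=1)

def binary_accuracy(predict, label):
    """Fraction of rows where the argmax of the (N, 2) scores equals the 0/1 label."""
    return float(np.mean(np.argmax(predict, axis=1) == label.ravel()))

# Hypothetical batch of two examples with four binary labels each.
label = np.array([[0, 1, 1, 0],
                  [1, 1, 0, 0]])
label01, label02, label03, label04 = slice_labels(label)

# Hypothetical "predict01" scores, columns ordered as (False, True).
predict01 = np.array([[2.0, -1.0],   # argmax 0, label is 0: correct
                      [0.5,  0.1]])  # argmax 0, label is 1: wrong
acc01 = binary_accuracy(predict01, label01)
```

The argmax of each row of predict01 is also the predicted value for that record, which is one way to obtain per-record predictions instead of only a batch loss.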
I am extracting 30 facial keypoint coordinates (x, y) from an input image, as in the Kaggle facial keypoints competition.
How do I set up Caffe to run a regression and produce a 30-dimensional output?
Input: 96x96 image
Output: 30 - (30 dimensions).
How do I set up Caffe accordingly? I am using EUCLIDEAN_LOSS (sum of squares) to get the regressed output. Here is a simple logistic regression model in Caffe, but it is not working. It looks like the accuracy layer cannot handle multi-label output.
I0120 17:51:27.039113 4113 net.cpp:394] accuracy <- label_fkp_1_split_1
I0120 17:51:27.039135 4113 net.cpp:356] accuracy -> accuracy
I0120 17:51:27.039158 4113 net.cpp:96] Setting up accuracy
F0120 17:51:27.039201 4113 accuracy_layer.cpp:26] Check failed: bottom[1]->channels() == 1 (30 vs. 1)
*** Check failure stack trace: ***
@ 0x7f7c2711bdaa (unknown)
@ 0x7f7c2711bce4 (unknown)
@ 0x7f7c2711b6e6 (unknown)
Here is the layer file:
name: "LogReg"
layers {
name: "fkp"
top: "data"
top: "label"
type: HDF5_DATA
hdf5_data_param {
source: "train.txt"
batch_size: 100
}
include: { phase: TRAIN }
}
layers {
name: "fkp"
type: HDF5_DATA
top: "data"
top: "label"
hdf5_data_param {
source: "test.txt"
batch_size: 100
}
include: { phase: TEST }
}
layers {
name: "ip"
type: INNER_PRODUCT
bottom: "data"
top: "ip"
inner_product_param {
num_output: 30
}
}
layers {
name: "loss"
type: EUCLIDEAN_LOSS
bottom: "ip"
bottom: "label"
top: "loss"
}
layers {
name: "accuracy"
type: ACCURACY
bottom: "ip"
bottom: "label"
top: "accuracy"
include: { phase: TEST }
}
I found it :)
I replaced the SOFTLAYER with the EUCLIDEAN_LOSS function and changed the number of outputs. It worked.
layers {
name: "loss"
type: EUCLIDEAN_LOSS
bottom: "ip1"
bottom: "label"
top: "loss"
}
HINGE_LOSS is another option.
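Since the accuracy layer checks for a single class label per example, it does not apply to a 30-dimensional regression target. One common substitute, offered here as an assumption rather than something stated in the thread, is to compute a root-mean-square error on the network outputs outside Caffe. A minimal NumPy sketch with synthetic data:

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error over all examples and all output dimensions."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Synthetic stand-ins: 100 examples, 30 keypoint coordinates in a 96x96 image.
rng = np.random.default_rng(0)
target = rng.uniform(0.0, 96.0, size=(100, 30))
pred = target + 2.0  # every coordinate is off by exactly 2 pixels

error = rmse(pred, target)
```

Here the error comes out as 2 pixels, which is easier to interpret for keypoint regression than a classification accuracy.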