I am using a BatchNorm layer. I know the meaning of use_global_stats, which is typically set to false for training and true for testing/deployment. This is my setting for the testing phase:
layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  batch_norm_param {
    use_global_stats: true
  }
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "bnorm1"
  top: "bnorm1"
  scale_param {
    bias_term: true
    filler {
      value: 1
    }
    bias_filler {
      value: 0.0
    }
  }
}
In solver.prototxt, I used the Adam method. I found an interesting problem that happens in my case. If I choose base_lr: 1e-3, then I get good performance when I set use_global_stats: false in the testing phase. However, if I choose base_lr: 1e-4, then I get good performance when I set use_global_stats: true in the testing phase. Does this demonstrate that base_lr affects the batchnorm setting (even though I used the Adam method)? Could you suggest any reason for that? Thanks all.
AFAIK the learning rate does not directly affect the learned parameters of the "BatchNorm" layer. Indeed, caffe forces lr_mult for all internal parameters of this layer to be zero, regardless of base_lr or the type of the solver.
However, you might encounter a case where the adjacent layers converge to different points according to the base_lr you are using, and indirectly this causes the "BatchNorm" to behave differently.
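For illustration, many train prototxts spell this out explicitly. A sketch based on the layer above (the three param blocks correspond to the layer's internal mean, variance and moving-average-factor blobs, which are updated from batch statistics rather than by the solver):
layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  # the three internal blobs are not learned by back-propagation,
  # so their learning-rate multipliers are pinned to zero
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
}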
I have set up Caffe and I am using the FCN-8s model with a small change to the number of output classes:
layer {
  name: "score_5classes"
  type: "Convolution"
  bottom: "score"
  top: "score_5classes"
  convolution_param {
    num_output: 2
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score_5classes"
  bottom: "label"
  top: "loss"
  loss_param {
    normalize: true
  }
}
I have changed the last layer's output number to 2, because I want to classify my input images into 2 classes, 0 and 1 (so it seems I should have 2 outputs! I can't understand why?! It could be an output matrix of zeros and ones, couldn't it?)
So my questions are:
1. Should I sum these 2 classes? Because I need 1 output.
2. The loss is so small, even when the output is far from the desired one! How does Caffe calculate the loss layer?
Thanks
When doing binary classification, using "SoftmaxWithLoss" with two outputs is mathematically equivalent to using "SigmoidCrossEntropyLoss". So, if you really only need one output, you can set your last layer to num_output: 1 and use "SigmoidCrossEntropyLoss". However, if you want to take advantage of caffe's "Accuracy" layer, you need to use two outputs and the "SoftmaxWithLoss" layer.
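For example, here is a minimal sketch of the single-output alternative for your network (the layer name "score_binary" is just a placeholder; labels must be 0/1). The equivalence holds because exp(x1)/(exp(x0)+exp(x1)) = 1/(1+exp(-(x1-x0))), i.e. a sigmoid of the score difference:
layer {
  name: "score_binary"
  type: "Convolution"
  bottom: "score"
  top: "score_binary"
  convolution_param {
    num_output: 1   # a single score ("logit") per pixel
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "score_binary"
  bottom: "label"   # labels must be 0 or 1
  top: "loss"
}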
Regarding your questions:
1. If you opt to use "SoftmaxWithLoss" and you only need one output, take the second output for each pixel as this entry represents the probability of class 1.
I'll leave it to you as an exercise to figure out what you'll get if you take the sum (hint: "Softmax" outputs probabilities...)
2. The loss is very small most likely because you have severe class imbalance: most of your pixels are 0 while only very few are 1 (or vice versa); therefore, predicting always 0 does not incur such a great penalty. If this is your case, I suggest looking at Focal Loss, which addresses this issue.
I am new to deep learning. I found there are two prototxt files when using caffe: one is "deploy" and the other is "train_val".
I know that "train_val" is used to train the model. But some people said the "deploy" file is for testing on images.
So, my question is: does the "deploy" net only have the forward() pass, so the test image data go through the forward network once to get the score?
As you already noted there are some fundamental differences between 'train_val.prototxt' and 'deploy.prototxt'.
One key difference is that 'deploy.prototxt' usually lacks any loss layer.
When you have no loss function defined for a net, there is no meaning of backward propagation: what gradients would you propagate? gradients of what function?
That said, a net object in caffe does have a backward() method implemented for all phases; nevertheless, this method is meaningless when you test the net with no loss function (only prediction).
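For illustration, the deploy prototxt of a typical classification net usually ends with a plain "Softmax" producing probabilities instead of a "SoftmaxWithLoss" layer (a sketch; "fc8" is a placeholder for the last scoring layer):
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}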
Ideally that is how it should work, but the files are just network definitions. You can use one single file to both train and test. You have to specify in which phase you want some blobs to be available, meaning you can define two input Data layers, one used during training and another used for testing, and specify the corresponding phase like this:
name: "MyModel"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: false
crop_size: 227
mean_file: "data/train_mean.binaryproto" # location of the training data mean
}
data_param {
source: "data/train_lmdb" # location of the training samples
batch_size: 128 # how many samples are grouped into one mini-batch
backend: LMDB
}
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  data_param {
    source: "data/test_lmdb" # placeholder: location of the test samples
    batch_size: 100
    backend: LMDB
  }
}
During training, the first layer will be used and the second will be ignored.
During the test phase, the first layer will be ignored and the second layer will be used as the input for testing.
Another point is that during testing we need the accuracy of our predictions, since we don't update our weights anymore:
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}
If the include directive is not given, the layer is included in all phases.
Although you can also include the accuracy layer during training to see how the outputs are going (i.e., to measure accuracy improvement every so many iterations), it is needed more for predictions.
In your solver, you can specify test_interval to control after how many training iterations a test pass is carried out, and test_iter to control how many batches each test pass runs (you validate your model every test_interval iterations); see the solver sketch below.
The train_val and deploy files separate those two phases into two different files. All specifications in train_val relate to the training/validation phase, and deploy is for testing. I am not sure where the train_val combination came from, but I suppose it is due to the fact that you can validate your model every test_interval iterations and then continue training from there.
As you don't need the loss during testing, but rather the probability, you can use a Softmax layer to output probabilities instead of SoftmaxWithLoss in deploy, or you can have both defined.
The caffe test command performs the forward operation but doesn't do the backward() (back-propagation) operation. I hope it helps.
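For reference, a minimal solver.prototxt sketch (file names and values are placeholders) showing where the test schedule is configured:
net: "train_val.prototxt"   # contains both TRAIN- and TEST-phase layers
test_interval: 1000         # run a test pass every 1000 training iterations
test_iter: 100              # each test pass runs 100 batches of the TEST data layer
base_lr: 0.01
lr_policy: "fixed"
max_iter: 50000
snapshot: 5000
snapshot_prefix: "snapshots/mymodel"
solver_mode: GPU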
I have a question regarding the Reduction layer in caffe. I haven't found any examples of how to use this layer in my .prototxt file, so I would appreciate it if anybody could give me a short example of how to use this layer.
This is the documentation: http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1ReductionLayer.html , but there is no example given :-/
I want to use this layer to reduce an output matrix of 1x18x18 to one scalar value, i.e. I want the absolute sum of this matrix.
Try:
layer {
  name: "reduction"
  type: "Reduction"
  bottom: "in"
  top: "out"
  reduction_param {
    axis: 0         # reduce over all axes, i.e. the whole blob becomes one scalar
    operation: ASUM # use absolute sum
  }
}
For more information, see caffe.proto and the new caffe.help.
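For convenience, the relevant options in caffe.proto look roughly like this (check the caffe.proto of your caffe version for the exact definition):
message ReductionParameter {
  enum ReductionOp {
    SUM = 1;
    ASUM = 2;
    SUMSQ = 3;
    MEAN = 4;
  }
  optional ReductionOp operation = 1 [default = SUM]; // which reduction to apply
  optional int32 axis = 2 [default = 0];              // reduce axes [axis, end]
  optional float coeff = 3 [default = 1.0];           // scale the output by this factor
}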
I am trying to count objects in an image using AlexNet.
I currently have images containing 1, 2, 3 or 4 objects per image. For an initial check, I have 10 images per class. For example, in the training set I have:
image label
image1 1
image2 1
image3 1
...
image39 4
image40 4
I used the imagenet create script to create an LMDB file for this dataset, which successfully converted my set of images to LMDB.
AlexNet is converted to a regression model for learning the number of objects in the image by introducing a EuclideanLoss layer instead of the Softmax layer, as suggested by many. The rest of the network is the same.
However, despite doing all of the above, when I run the model I receive only zeros as output during the testing phase (shown below). It did not learn anything, although the training loss decreased continuously in each iteration.
I don't understand what mistakes I have made. Can anybody tell me why the predicted values are always 0? And how can I check the regressed values in the testing phase, so that I can check how many samples are correct and what the value is for each of my images?
The predicted and actual labels of the test dataset are given as:
I0928 17:52:45.585160 18302 solver.cpp:243] Iteration 1880, loss = 0.60498
I0928 17:52:45.585212 18302 solver.cpp:259] Train net output #0: loss = 0.60498 (* 1 = 0.60498 loss)
I0928 17:52:45.585225 18302 solver.cpp:592] Iteration 1880, lr = 1e-06
I0928 17:52:48.397922 18302 solver.cpp:347] Iteration 1900, Testing net (#0)
I0928 17:52:48.499543 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 1
I0928 17:52:48.499641 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 2
I0928 17:52:48.499660 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 3
I0928 17:52:48.499681 18302 accuracy_layer.cpp:88] Predicted_Value: 0 Actual Label: 4
...
Note: I also created HDF5-format files in order to have floating-point labels, i.e. 1.0, 2.0, 3.0 and 4.0. However, when I changed the data layer to HDF5 type, I could not crop the images for data augmentation, nor normalize them, as is done in AlexNet with the LMDB layer. I used the script given at "https://github.com/nikogamulin/caffe-utils/blob/master/hdf5/demo.m" for the HDF5 data and followed its steps for using it in my model.
I have updated the last layers as follows:
layer {
  name: "fc8reg"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8reg"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8reg"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc8reg"
  bottom: "label"
  top: "loss"
}
Without judging whether your network diverged or not, the obvious mistake you have made is that you shouldn't use an Accuracy layer to test a regression network. It is only for testing a classification network trained with a SoftmaxWithLoss layer.
In fact, given an image, the Accuracy layer will always sort its input array (here bottom: "fc8reg") and, by default, choose the index of the maximal value in the array as the predicted label.
Since num_output == 1 in the fc8reg layer, the Accuracy layer will always predict index 0 as the label for the input image, as you have seen.
Finally, you can use a EuclideanLoss layer to test your regression network. This similar problem may also give you some hints.
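For example, a sketch of replacing the Accuracy layer with a TEST-phase EuclideanLoss for monitoring, reusing the names from your prototxt:
layer {
  name: "test_loss"
  type: "EuclideanLoss"
  bottom: "fc8reg"
  bottom: "label"
  top: "test_loss"
  include {
    phase: TEST
  }
}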
If you want to print and calculate the regressed values after training, and count the accuracy of the regression network, you can simply write your own RegressionAccuracy layer.
Or, if your target label only has 4 discrete values {1,2,3,4}, you can still train a classification network for your task.
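A sketch of that classification alternative ("fc8cls" is a placeholder name; it assumes the counts 1-4 are relabeled as classes 0-3):
layer {
  name: "fc8cls"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8cls"
  inner_product_param {
    num_output: 4   # one class per object count, labels remapped to 0..3
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8cls"
  bottom: "label"
  top: "loss"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8cls"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}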
In my opinion, everything is correct, but your network is not converging, which is not rare. Your network is actually converging to zero outputs!
Maybe most of your samples have 0 as their label.
Also, don't forget to include the loss layer only during TRAIN; otherwise, it will learn on the test data as well.
I'm new to caffe, thank you guys!
In https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto
I saw one uncommented enum variable, Phase. It has 2 options, TRAIN and TEST.
enum Phase {
  TRAIN = 0;
  TEST = 1;
}
How do they work? I recently saw a model that has these 2 phases too. The .prototxt file looks like:
name: "CIFAR10_full"
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "CIFAR-10/cifar10_train_lmdb"
    backend: LMDB
    batch_size: 200
  }
  transform_param {
    mirror: true
  }
  include: { phase: TRAIN }
}
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "CIFAR-10/cifar10_test_lmdb"
    backend: LMDB
    batch_size: 100
  }
  transform_param {
    mirror: false
  }
  include: { phase: TEST }
}
Can I switch from the TRAIN phase to the TEST phase? Where is the switch?
During training (i.e., execution of $CAFFE_ROOT/tools/caffe train [...]), caffe can alternate between training phases and testing phases: that is, during the training phase the parameters are changed, while in the test phase the parameters are fixed and the model only runs examples feed-forward to estimate its current performance.
It is quite natural to use two different data sets for training and testing, and this is why you use the different phase values.
You can read more about the train/test iterations here.
TRAIN specifies a layer for the model used during training.
TEST specifies a layer for the model used during testing.
Thus, you can define 2 models in a single prototxt file: one model for training and one model for testing.
Info on this can be found in the Model Definition section of the web page http://caffe.berkeleyvision.org/gathered/examples/imagenet.html