Caffe conv layer weights and dimensions

I came across this nice article, which gives an intuitive explanation of how convnets work.
Now I'm trying to understand what exactly goes on inside a Caffe conv layer.
Given input data of shape 1 x 13 x 19 x 19 and a conv layer with 128 filters:
layers {
  name: "conv1_7x7_128"
  type: CONVOLUTION
  blobs_lr: 1.
  blobs_lr: 2.
  bottom: "data"
  top: "conv2"
  convolution_param {
    num_output: 128
    kernel_size: 7
    pad: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
The layer's output shape is 1 x 128 x 19 x 19, if I understand correctly.
Looking at the layer's weights' shapes in net->layers()[1]->blobs():
layer 1: type Convolution 'conv1_7x7_128'
blob 0: 128 13 7 7
blob 1: 128
It looks like blob 0 holds all the weights: one 7x7 matrix per input plane (13) per filter (128).
Doing convolutions with blob 0 on the 1 x 13 x 19 x 19 data, if I understand correctly, we end up with a 128 x 13 x 19 x 19 output (there is padding, so each 7x7 matrix produces one number for each pixel).
How does 128 x 13 x 19 x 19 turn into the layer's 1 x 128 x 19 x 19 output?
What are the 128 weights in blob 1?
Bonus question: what is blobs_lr?

You are quoting an older version of Caffe's prototxt format. Adjusting to the new format gives you:
layer {                    # layer and not layer*s*
  name: "conv1_7x7_128"
  type: "Convolution"      # type as a string
  param { lr_mult: 1. }    # instead of blobs_lr
  param { lr_mult: 2. }    # instead of blobs_lr
  bottom: "data"
  top: "conv2"
  convolution_param {
    num_output: 128
    kernel_size: 7
    pad: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
Input data of shape 1 x 13 x 19 x 19 means your batch_size is 1 and you have 13 channels with spatial dimensions of 19 x 19.
Applying 128 filters of 7 x 7 (each filter is applied to all 13 input channels) means you have 128 filters of shape 13 x 7 x 7 (this is the shape of your first parameter blob). Applying each filter results in a single output channel of 1 x 1 x 19 x 19: the 13 per-channel 7x7 responses are summed into one number per pixel, which is what collapses the 13 input channels into one output channel. Since you have 128 such filters, you end up with a 1 x 128 x 19 x 19 output.
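To make the shape arithmetic concrete, here is a minimal numpy sketch (a naive loop, not Caffe's actual implementation) showing how the sum over the 13 input channels turns the per-channel responses into the 1 x 128 x 19 x 19 output:

import numpy as np

x = np.random.rand(1, 13, 19, 19)    # input: batch x channels x H x W
w = np.random.rand(128, 13, 7, 7)    # blob 0: num_output x channels x kh x kw
b = np.random.rand(128)              # blob 1: one bias scalar per filter

xp = np.pad(x, ((0, 0), (0, 0), (3, 3), (3, 3)))  # pad: 3 keeps H, W at 19
out = np.empty((1, 128, 19, 19))
for f in range(128):
    for i in range(19):
        for j in range(19):
            # each filter spans all 13 input channels; summing the 13
            # partial 7x7 responses yields a single output value
            out[0, f, i, j] = np.sum(xp[0, :, i:i+7, j:j+7] * w[f]) + b[f]
print(out.shape)  # (1, 128, 19, 19)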
The second parameter blob is the bias term: an additive scalar applied to the result of each filter (hence the 128 entries in blob 1, one per filter). You can turn off the bias term by adding
bias_term: false
to the convolution_param of your layer.
You can read more about the convolution layer here.
As for the bonus question, Eliethesaiyan already answered it well in his comment: blobs_lr (lr_mult in the new format) is the per-blob learning-rate multiplier, so each parameter blob's effective learning rate is the solver's base_lr times this factor (here 1 for the weights and 2 for the bias).

Related

Upscale layer with deconvolution or other

I need to use an upscale layer in Caffe which "doubles" the pixels: a 10x10 image becomes 20x20 with pixels "doubled" in both the horizontal and vertical dimensions. I heard that a deconv layer may help with a stride of 2, no padding and a kernel size of 1x1, but this puts zeros between pixels. Can anyone help me? Thanks
I would try a kernel size of 2 with weights initialized (and fixed) to 1.
layer {
  name: "upsample"
  type: "Deconvolution"
  bottom: "x"
  top: "y"
  convolution_param {
    num_output: # same as number of input channels
    group: # same as number of channels
    bias_term: false # no need for bias
    kernel_size: 2
    stride: 2
    pad: 0
    weight_filler { type: "constant" value: 1 }
  }
  param { lr_mult: 0 } # fix the weights: no learning for this layer
}
Note that group and num_output should be equal so that the same kernel acts on each channel independently.
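As a sanity check, here is a small numpy sketch (hypothetical values, single channel) of what this transposed convolution computes: with kernel size 2, stride 2 and constant weights of 1, each input pixel fills a non-overlapping 2x2 output block, i.e. nearest-neighbor pixel doubling:

import numpy as np

def upsample_2x(x):
    # Deconvolution with a 2x2 all-ones kernel and stride 2: every input
    # pixel is copied into its own 2x2 block of the output, with no
    # overlap and no zeros in between.
    h, w = x.shape
    y = np.zeros((2 * h, 2 * w), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            y[2 * i:2 * i + 2, 2 * j:2 * j + 2] = x[i, j]
    return y

x = np.array([[1., 2.], [3., 4.]])
print(upsample_2x(x))
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]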

error in making skip-layer connection network based on VGG16 in caffe

I am currently reading the paper 'CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection'. It uses skip-connections to fuse conv3-3, conv4-3 and conv5-3 together; the steps are shown below:
1. Extract the feature maps of the face region (at multiple scales: conv3-3, conv4-3, conv5-3) and apply RoI-Pooling to each (i.e. convert to a fixed height and width).
2. L2-normalize each feature map.
3. Concatenate the (RoI-pooled and normalized) feature maps of the face (at multiple scales) with each other (creating one tensor).
4. Apply a 1x1 convolution to the face tensor.
5. Apply two fully connected layers to the face tensor, creating a vector.
I used Caffe and made a prototxt based on the faster-RCNN VGG16; the following parts were added to the original prototxt:
# roi pooling the conv3-3 layer and L2 normalize it
layer {
  name: "roi_pool3"
  type: "ROIPooling"
  bottom: "conv3_3"
  bottom: "rois"
  top: "pool3_roi"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.25 # 1/4
  }
}
layer {
  name: "roi_pool3_l2norm"
  type: "L2Norm"
  bottom: "pool3_roi"
  top: "pool3_roi"
}
-------------
# roi pooling the conv4-3 layer and L2 normalize it
layer {
  name: "roi_pool4"
  type: "ROIPooling"
  bottom: "conv4_3"
  bottom: "rois"
  top: "pool4_roi"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.125 # 1/8
  }
}
layer {
  name: "roi_pool4_l2norm"
  type: "L2Norm"
  bottom: "pool4_roi"
  top: "pool4_roi"
}
--------------------------
# roi pooling the conv5-3 layer and L2 normalize it
layer {
  name: "roi_pool5"
  type: "ROIPooling"
  bottom: "conv5_3"
  bottom: "rois"
  top: "pool5"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625 # 1/16
  }
}
layer {
  name: "roi_pool5_l2norm"
  type: "L2Norm"
  bottom: "pool5"
  top: "pool5"
}
# concat roi_pool3, roi_pool4, roi_pool5 and apply 1*1 conv
layer {
  name: "roi_concat"
  type: "Concat"
  concat_param {
    axis: 1
  }
  bottom: "pool5"
  bottom: "pool4_roi"
  bottom: "pool3_roi"
  top: "roi_concat"
}
layer {
  name: "roi_concat_1*1_conv"
  type: "Convolution"
  top: "roi_concat_1*1_conv"
  bottom: "roi_concat"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "roi_concat_1*1_conv"
  top: "fc6"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 4096
  }
}
During training, I met the following issue:
F0616 16:43:02.899025 3712 net.cpp:757] Cannot copy param 0 weights from layer 'fc6'; shape mismatch. Source param shape is 1 1 4096 25088 (102760448); target param shape is 4096 10368 (42467328).
To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
I couldn't find out what went wrong; I need some help from you if you can spot the problem or offer an explanation.
Really appreciated!!
The error message you got is quite clear. You are trying to fine-tune the weights of the layers, but for the "fc6" layer you have a problem:
The original net you copied the weights from had an "fc6" layer with an input dimension of 25088 (VGG16's 512 x 7 x 7). Your "fc6" layer, on the other hand, has an input dimension of 10368 (128 x 9 x 9: the pad: 1 on your 1x1 convolution enlarges the 7x7 RoI-pooled maps to 9x9). You cannot use the same W matrix (aka param 0 of this layer) if the input dimension is different.
Now that you know the problem, look at the error message again:
Cannot copy param 0 weights from layer 'fc6'; shape mismatch.
Source param shape is 1 1 4096 25088 (102760448);
target param shape is 4096 10368 (42467328).
Caffe cannot copy the W matrix (param 0) of the "fc6" layer because its shape does not match the shape of W stored in the .caffemodel you are trying to fine-tune from.
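For reference, the numbers in the error message can be reproduced with a little arithmetic (assuming the standard VGG16 pool5 output of 512 x 7 x 7, and the 9x9 maps produced by the pad: 1 on the 1x1 convolution above):

# source param: VGG16's fc6 expects 512 * 7 * 7 = 25088 inputs
print(512 * 7 * 7, 4096 * 25088)   # 25088 102760448
# target param: the modified net's fc6 sees 128 * 9 * 9 = 10368 inputs
# (pad: 1 on a 1x1 kernel grows the 7x7 RoI-pooled maps to 9x9)
print(128 * 9 * 9, 4096 * 10368)   # 10368 42467328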
What can you do?
Simply read the next line of the error message:
To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
Just rename the layer, and Caffe will learn the weights from scratch (only for this layer).

tensorflow conv2d unexpected convolution result

I am trying to migrate a Caffe network and model (weights) to TensorFlow.
The original first layer is defined as shown at the end of this question: a stride-one convolution on a 1x128x128 gray image with kernel size 5x5 and 96 output channels.
I converted the weights from the caffemodel file to numpy arrays following this procedure:
import caffe

net = caffe.Net(model, caffe.TEST)
net.copy_from(weights_file)  # path to the trained .caffemodel
weights = net.params[name][0].data
bias = net.params[name][1].data
if "fc" in name:
    weights = weights.transpose()  # 2D
elif "conv" in name:
    weights = weights.transpose(2, 3, 1, 0)
The Caffe weights shape is (96, 1, 5, 5) and the biases shape is (96,). After the transpose, the new arrays, of weights shape (5, 5, 1, 96) and biases shape (96,), are used to initialize the TensorFlow filter.
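A quick way to convince yourself the transpose is the right layout conversion (a standalone check with random values, not part of the original post):

import numpy as np

# Caffe stores conv weights as (out_channels, in_channels, kh, kw);
# tf.nn.conv2d expects filters as (kh, kw, in_channels, out_channels).
w_caffe = np.random.rand(96, 1, 5, 5)
w_tf = w_caffe.transpose(2, 3, 1, 0)
print(w_tf.shape)  # (5, 5, 1, 96)
# same 5x5 kernel for output channel 7, input channel 0, in both layouts
assert np.array_equal(w_tf[:, :, 0, 7], w_caffe[7, 0])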
The TensorFlow code is as follows:
gray = tf.reduce_mean(images, axis=3, keep_dims=True)
self.gray = gray
conv1 = self._conv_layer(gray, name='conv1')

def _conv_layer(self, input_, output_dim=96,
                k_h=3, k_w=3, d_h=1, d_w=1, stddev=0.02,
                name="conv2d"):
    # Note: currently kernel size and input/output channel num are decided
    # by the loaded filter weights; only strides are decided by calling param.
    with tf.variable_scope(name) as scope:
        filt = self.get_conv_filter(name)
        conv = tf.nn.conv2d(input_, filt, strides=[1, d_h, d_w, 1], padding='SAME')
        conv_biases = self.get_bias(name)
        return tf.nn.bias_add(conv, conv_biases)

def get_conv_filter(self, name):
    init = tf.constant_initializer(value=weights, dtype=tf.float32)
    shape = weights.shape
    var = tf.get_variable(name="filter", initializer=init, shape=shape)
    return var
I checked the input data of the Caffe net and TensorFlow's gray tensor; they hold the same numbers with the same 2D layout, with shapes (1, 1, 128, 128) and (10, 128, 128, 1) respectively (TensorFlow uses a batch size of 10).
I also checked the kernel through Caffe's print(net.params['conv1'][0].data[0,0,...]) and the numpy array used to initialize the TensorFlow var with print(weights[:,:,:,0]).
[Screenshot of the kernel's first channel omitted.]
The bias is -0.65039569 and the upper left corner of the image is:
0.30989584 0.30989584 0.29427084 0.21354167 0.16145833
0.30989584 0.30989584 0.29427084 0.21354167 0.16145833
0.28645834 0.28645834 0.27083334 0.19010417 0.09114584
However, the upper left corners of the two frameworks' first conv1 feature maps are different (please ignore the irrelevant 256).
Only the leftmost column is consistent. I manually calculated and checked the results: the first and second values of Caffe's (-0.71238005, -0.74042225) are correct according to the definition of convolution, while the second value of TensorFlow's (-0.71238005, -0.31195271) is incorrect.
Taking the padding into account, the first value comes from a 3x3 block of the image, and the second should come from a 3x4 block.
Since TensorFlow gets the correct first value, computed from the 3x3 block at the image corner, I assume the kernel layout, image layout and 'SAME' padding are correct. I thought a stride problem caused the incorrect second value, but the stride must be one; otherwise TensorFlow's conv1 feature map size would not be (10, 128, 128, 96).
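For what it's worth, the manual check described above can be written down explicitly. A sketch with stand-in img and ker arrays (the real values come from the data and the converted weights):

import numpy as np

img = np.random.rand(128, 128)  # stand-in for the gray input image
ker = np.random.rand(5, 5)      # stand-in for the 5x5 kernel
bias = -0.65039569

# 'SAME' padding with a 5x5 kernel pads 2 on each side, so at output
# (0, 0) only a 3x3 image block overlaps the kernel's lower-right 3x3,
# and at (0, 1) a 3x4 image block overlaps a 3x4 slice of the kernel.
out00 = np.sum(img[0:3, 0:3] * ker[2:5, 2:5]) + bias
out01 = np.sum(img[0:3, 0:4] * ker[2:5, 1:5]) + bias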
Caffe's convolution layer def:
input_param {
  shape: {
    dim: 10
    dim: 1
    dim: 128
    dim: 128
  }
}
transform_param {
  crop_size: 128
  mirror: false
}
}
layer {
  name: "conv1"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 5
    stride: 1
    pad: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
  bottom: "data"
  top: "conv1"
}
UPDATE:
Another contrived experiment (see the code below) shows that the TensorFlow implementation is able to compute the correct second value. However, the error remains in the converted version above. What caused the error there?
import numpy as np
import tensorflow as tf

input = np.random.rand(100, 100)
input = input.reshape([1, 100, 100, 1])
k = np.random.rand(5, 5)
k = k.reshape([5, 5, 1, 1])
input_tf = tf.constant(input, dtype=tf.float32)
init = tf.constant_initializer(value=k, dtype=tf.float32)
filter = tf.get_variable(name="filter", initializer=init, shape=k.shape)
conv = tf.nn.conv2d(input_tf, filter, strides=[1, 1, 1, 1], padding='SAME')

caffe: 5D blobs pooling?

I have a 5D blob like 1x8x128x128, and I have a Convolution layer which is able to process it. When I want to use a pooling layer, though, it does not work:
Check failed: 4 == bottom[0]->num_axes() (4 vs. 5) Input must have 4
axes, corresponding to (num, channels, height, width)
I think it is just not supported by Caffe yet. Could I just use a convolution layer to do the pooling?
If you want to pool only the two spatial dimensions, you can "Reshape" to 4D ("squashing" the channel and temporal dimensions), pool, and then "Reshape" back to 5D:
layer {
  name: "pool/reshape4D"
  type: "Reshape"
  bottom: "in"
  top: "pool/reshape4D"
  reshape_param { axis: 1 num_axes: 2 shape { dim: -1 } } # collapse axes 1 and 2 (channel and temporal) into one
}
layer {
  name: "pool"
  type: "Pooling"
  bottom: "pool/reshape4D"
  top: "pool"
  # pooling params here...
}
layer {
  name: "pool/reshape5D"
  type: "Reshape"
  bottom: "pool"
  top: "pool/reshape5D"
  reshape_param { axis: 1 num_axes: 1 shape { dim: -1 dim: <temporal_dim> } } # replace <.> with the actual temporal dimension size.
}
See the definition of ReshapeParameter in caffe.proto for more details.
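Here is a small numpy sketch of the same squash-pool-restore trick (assuming a 5D blob of shape N x C x T x H x W and 2x2 spatial max pooling):

import numpy as np

def pool_5d_spatial(x, k=2):
    n, c, t, h, w = x.shape
    x4 = x.reshape(n, c * t, h, w)                    # "squash" C and T
    blocks = x4.reshape(n, c * t, h // k, k, w // k, k)
    pooled = blocks.max(axis=(3, 5))                  # k x k spatial max pool
    return pooled.reshape(n, c, t, h // k, w // k)    # back to 5D

x = np.random.rand(1, 8, 4, 128, 128)
print(pool_5d_spatial(x).shape)  # (1, 8, 4, 64, 64)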

Auto-encoders with tied weights in Caffe

From my understanding, an auto-encoder normally uses tied weights in the encoding and decoding networks, right?
I took a look at Caffe's auto-encoder example, but I didn't see how the weights are tied. I noticed that the encoding and decoding networks share the same blobs, but how is it guaranteed that the weights are updated correctly?
How to implement tied weights auto-encoders in Caffe?
While there is a history of using tied weights in auto-encoders, nowadays they are rarely used (to the best of my knowledge), which I believe is why this Caffe example doesn't use tied weights.
Nevertheless, Caffe does support auto-encoders with tied weights, and it is possible using two features: parameter sharing between layers, and the transpose flag of the fully-connected layer (InnerProduct in Caffe). More specifically, two parameters are shared in Caffe if they have the same name, which can be specified under the param field like so:
layer {
  name: "encode1"
  type: "InnerProduct"
  bottom: "data"
  top: "encode1"
  param {
    name: "encode1_matrix"
    lr_mult: 1
    decay_mult: 1
  }
  param {
    name: "encode1_bias"
    lr_mult: 1
    decay_mult: 0
  }
  inner_product_param {
    num_output: 128
    weight_filler {
      type: "gaussian"
      std: 1
      sparse: 15
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
If another fully-connected layer (with matching dimensions) uses the names "encode1_matrix" and "encode1_bias", then these parameters will always be the same, and Caffe will take care of aggregating the gradients and updating the parameters correctly. The second feature is the transpose flag of the fully-connected layer, which transposes the shared matrix before multiplying it with the layer's input. So, extending the above example, if we wanted a fully-connected layer with the same weight matrix as "encode1_matrix" as part of the decoding process, we would define it like so:
layer {
  name: "decode1"
  type: "InnerProduct"
  bottom: "encode1"
  top: "decode1"
  param {
    name: "encode1_matrix"
    lr_mult: 1
    decay_mult: 1
  }
  param {
    name: "decode1_bias"
    lr_mult: 1
    decay_mult: 0
  }
  inner_product_param {
    num_output: 784
    transpose: true
    weight_filler {
      type: "gaussian"
      std: 1
      sparse: 15
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
Notice that the bias parameters are not shared (they cannot be, due to the different output dimensions), while the matrices are shared and the decoder layer uses the transpose flag, which completes the tied auto-encoder architecture.
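To illustrate what is tied here, a minimal numpy sketch of the forward pass (shapes match the example above; this is just the math, not Caffe code):

import numpy as np

W = np.random.randn(128, 784)   # "encode1_matrix": num_output x input_dim
b_enc = np.zeros(128)           # "encode1_bias"
b_dec = np.zeros(784)           # "decode1_bias" (not shared)

x = np.random.randn(784)
h = W @ x + b_enc               # encode1: InnerProduct with W
x_rec = W.T @ h + b_dec         # decode1: same W, transpose: true
print(h.shape, x_rec.shape)     # (128,) (784,)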
See here for a complete working example of a tied auto-encoder using Caffe: https://gist.github.com/orsharir/beb479d9ad5d8e389800c47c9ec42840
