I want to turn a n1*c1*h1*w1 blob to n1*1*h1*w1(using sum), what parameters should I set when using reduction layer?
Thanks a lot!
You can use a Slice layer to separate all your channels into different blobs, and then an Eltwise layer to sum them up.
Related
currently I am developing a new network using NiftyNet and would need some help.
I am trying to implement an Autofocus Layer [1] as proposed in the paper. However, at a certain point, the Autofocus Layer needs to calculate K (K=4) parallel convolutions each using the same weights (w) and concatenates the four outputs afterwards.
Is there a way to create four parallel convolutional layer with each having the same weights in NiftyNet?
Thank you in advance.
[1] https://arxiv.org/pdf/1805.08403.pdf
The solution to this problem is as follows.
There is no restriction allowing you to use the same convolutional layer multiple times, each time with another input. This simulates the desired parallelism and solves the weight sharing issue, because there is only one convolutional layer.
However, using this approach doesn't solve the issue having different dilation rates in each parallel layer - we only have one convolutional layer for the weight sharing problem as mentioned above.
Note: it is the same operation either using a given tensor as input
for a convolutional layer with dilation rate = 2 OR using a dilated
tensor with rate = 2 as input for a convolutional layer with dilation
rate = 1.
Therefore, creating K dilated tensors each with a different dilation rate and then using each of them as input for a single convolutional layer with dilation rate = 1 solves the problem having parallel layers each with a different dilation rate.
NiftyNet provides a class to create dilated tensors.
I know that deconvolution is basically convolution of output with flipped filters and I have implemented it for 2D data. But I am not able to generalize it for 3D data. For example consider the input of dimension 3x5x5 and the filter is of dimension 3x3x3 and stride is set to 1. SO, the output will be of the dimension 1x3x3. What I don't understand is how to calculate the deconvolution for this output. The flipped filter again will be of dimension 3x3x3 and output of convolution is of dimension 1x3x3 which are incompatible for convolution. So how can we calculate deconvolution ?
Perhaps this post will help you out a bit. You are correct in saying that a filter of the same size cannot fit the deconvolution dimensions. So in order to remedy that, the 1x3x3 gets padded throughout with zeros, mean-values, nn, etc. until it is the appropriate size that you require. Depth can be handled the same way. In your example, you want a 3x3x3 filter to 'deconvolve' the 1x3x3 to a 3x5x5. So we pad out the 1x3x3 to a 5x7x7 (with whichever method you prefer), and apply the filter. There are definite drawbacks with this process, stemming from the fact that you're trying to extrapolate more information from less.
I am interested in doing a ConvNets in which there are filters with different sizes at the same convolutional layer. How can I make it using Tensorflow?
you need to split your layer to several parallel conv with different filter sizes. Pad the convolutions according to filter size and then concat the outputs.
Look at GoogLeNet for an example of such configuration.
I recently found the "global_pooling" flag in the Pooling layer in caffe, however was unable to find sth about it in the documentation here (Layer Catalogue)
nor here (Pooling doxygen doc) .
Is there an easy forward examply explanation to this in comparison to the normal Pool-Layer behaviour?
With Global pooling reduces the dimensionality from 3D to 1D. Therefore Global pooling outputs 1 response for every feature map. This can be the maximum or the average or whatever other pooling operation you use.
It is often used at the end of the backend of a convolutional neural network to get a shape that works with dense layers. Therefore no flatten has to be applied.
Convolutions can work on any image input size (which is big enough). However, if you have a fully connected layer at the end, this layer needs a fixed input size. Hence the complete network needs a fixed image input size.
However, you can remove the fully connected layer and just work with convolutional layers. You can make a convolutional layer at the end which has the same number of filters as you have classes. But you want one value for each class which indicates the probability of that class. Hence you apply a pooling filter over the complete remaining feature map. This pooling is hence "global" as it always is as big as necessary. In contrast, usual pooling layers have a fixed size (e.g. of 2x2 or 3x3).
This is a general concept. You can also find global pooling in other libraries, e.g. Lasagne. If you want a good reference in literature, I recommend reading Network In Network.
We get only one value from entire feature map when we apply GP layer, in which kernel size is the h×w of the feature map. GP layers are used to reduce the spatial dimensions of a three-dimensional feature map. However, GP layers perform a more extreme type of dimensionality reduction, where a feature map with dimensions h×w×d is reduced in size to have dimensions 1×1×d. GP layers reduce each h×w feature map to a single number by simply taking the average of all hw values.
If you are looking for information regarding flags/parameters of caffe, it is best look them up in the comments of '$CAFFE_ROOT/src/caffe/proto/caffe.proto'.
For 'global_pooling' parameter the comment says:
// If global_pooling then it will pool over the size of the bottom by doing
// kernel_h = bottom->height and kernel_w = bottom->width
For more information about caffe layers, see this help pages.
Can the Keras deal with input images with different size? For example, in the fully convolutional neural network, the input images can have any size. However, we need to specify the input shape when we create a network by Keras. Therefore, how can we use Keras to deal with different input size without resizing the input images to the same size? Thanks for any help.
Yes.
Just change your input shape to shape=(n_channels, None, None).
Where n_channels is the number of channels in your input image.
I'm using Theano backend though, so if you are using tensorflow you might have to change it to (None,None,n_channels)
You should use:
input_shape=(1, None, None)
None in a shape denotes a variable dimension. Note that not all layers
will work with such variable dimensions, since some layers require
shape information (such as Flatten).
https://github.com/fchollet/keras/issues/1920
For example, using keras's functional API your input layer would be:
For a RGB dataset
inp = Input(shape=(3,None,None))
For a Gray dataset
inp = Input(shape=(1,None,None))
Implementing arbitrarily sized input arrays with the same computational kernels can pose many challenges - e.g. on a GPU, you need to know how big buffers to reserve, and more weakly how much to unroll your loops, etc. This is the main reason that Keras requires constant input shapes, variable-sized inputs are too painful to deal with.
This more commonly occurs when processing variable-length sequences like sentences in NLP. The common approach is to establish an upper bound on the size (and crop longer sequences), and then pad the sequences with zeros up to this size.
(You could also include masking on zero values to skip computations on the padded areas, except that the convolutional layers in Keras might still not support masked inputs...)
I'm not sure if for 3D data structures, the overhead of padding is not prohibitive - if you start getting memory errors, the easiest workaround is to reduce the batch size. Let us know about your experience with applying this trick on images!
Just use None while specifying input shape. But I still do not know how to pass different-shaped images into fit function.