the *.cafemodel is an output of a network after the training phase. Do you think its size is proportional to the number of parameters? It means that if I have two networks A and B, the network A results in *.caffemodel with size 10MB in the disk, while network B results in a *.caffemodel with size 20MB in the disk. Is it right if I said the network A has less number of learnable parameter than network B?
Not quite. The size is mostly dependent on the memory needed to store the parameters and weights. This includes not only the quantity of each, but the data size as well. If one model is set up to work with short fixed-point arithmetic, while another uses normal floats, there will be a large discrepancy in storage needs.
Also, layer connectivity has a lot to do with the file size. A fully-connected layer has far more weights than a convolution layer involving the same quantities of parameters.
Does that help sort things out a little?
Related
In the inception networks like inception-v3 and inception-v4, the kernel sizes are smaller in the lower layers,such as 3*3, but in the higher layers, the kernel sizes seem to be larger,such as 5*5,7*7,although they may be factorized to n*1&1*n later.But as network goes deeper,the spatial size of the feature map goes down,is there any relationship between these two thing?
ps:My question is why the kernel sizes in the lower layers seem to be samller(no more than 3*3),and you can find larger kernel size like 7*7 in the higher layers(more precisely, in the middle layers ).Is there any relationship between the spatial size of the feature map and the spatial spatial size of the conv kernels?Take inception v3 as a example, when the spatial sizes of the feature maps are larger than 35 in the first few layers of the network,the biggest kernel size is 5*5, but when the spatial size become 17, kernel size like 7*7 is used.
Any help will be appreciated.
Generally as you go deep into the network:-
The spatial size of feature map decreases to localize the object and
to reduce the computational cost.
The number of filters/kernels
increase as usually the initial layer represent generic features,
while the deeper layer represents more detailed features. Since
initial layers learn only primitive regularities in the data, you do
not need to have high volumes of filter there. However as you go
deep, you should try to look into as much details as possible, hence
increase in the number of filters/kernels. Therefore increasing the
number of filters in deeper layers increases the representational
power of the network.
In inception module, at each layer, kernels of multiple size (1x1, 3x3, 5x5) are used in computation and and resulting feature map is concatenated and passed to the next layer.
I am playing around convolutional neural networks at home with tensorflow (btw I have done the udacity deep learning course, so I have the theory basis). What impact has the size of the patch when one runs a convolution? does such size have to change when the image is bigger/smaller?
One of the exercises I did involved the CIFAR-10 databaese of images (32x32 px), then I used convolutions of 3x3 (with a padding of 1), getting decent results.
But lets say now I want to play with images larger than that (say 100x100), should I make my patches bigger? Do I keep them 3x3? Furthermore, what would be the impact of making a patch really big? (Say 50x50).
Normally I would test this at home directly, but running this on my computer is a bit slow (no nvidia GPU!)
So the question should be summarized as
Should I increase/decrease the size of my patches when my input images are bigger/smaller?
What is the impact (in terms of performance/overfitting) of increasing/decreasing my path size?
If you are not using padding, larger kernel makes number of neuron in the next layer will be smaller.
Example: Kernel with size 1x1 give the next layer the same number of neuron; kernel with size NxN give only one neuron in the next layer.
The impact of larger kernel:
Computational time is faster, memory usage is smaller
Loss a lot of details. Imagine NxN input neuron and the kernel size is NxN too, then the next layer only gives you one neuron. Loss a lot of details can lead you to underfitting.
The answer:
It depends on the images, if you needed a lot of details from the image you don't need to increase your kernel size. If your image is a 1000x1000 pixel large-version of MNIST image, I will increase the kernel size.
Smaller kernel will gives you a lot of details, it can lead you to overfitting, but larger kernel will gives you loss a lot of details, it can lead you to underfitting. You should tune your model to find the best size. Sometimes, time and machine specification should be considered
If you are using padding, you can adjust so the result neuron after convolution will be the same. I can't said it will be better than not using padding, but the loss of more details still occurs than using smaller kernel
It depends more on the size of the objects you want to detect or in other words, the size of the receptive field you want to have. Nevertheless, choosing the kernel size was always a challenging decision. That is why the Inception model was created which uses different kernel sizes (1x1, 3x3, 5x5). The creators of this model also went deeper and tried to decompose the convolutional layers into ones with smaller patch size while maintaining the same receptive field to try to speed up the training (ex. 5x5 was decomposed to two 3x3 and 3x3 was decomposed to 3x1 and 1x3) creating different versions of the inception model.
You can also check the Inception V2 paper for more details https://arxiv.org/abs/1512.00567
I recently found the "global_pooling" flag in the Pooling layer in caffe, however was unable to find sth about it in the documentation here (Layer Catalogue)
nor here (Pooling doxygen doc) .
Is there an easy forward examply explanation to this in comparison to the normal Pool-Layer behaviour?
With Global pooling reduces the dimensionality from 3D to 1D. Therefore Global pooling outputs 1 response for every feature map. This can be the maximum or the average or whatever other pooling operation you use.
It is often used at the end of the backend of a convolutional neural network to get a shape that works with dense layers. Therefore no flatten has to be applied.
Convolutions can work on any image input size (which is big enough). However, if you have a fully connected layer at the end, this layer needs a fixed input size. Hence the complete network needs a fixed image input size.
However, you can remove the fully connected layer and just work with convolutional layers. You can make a convolutional layer at the end which has the same number of filters as you have classes. But you want one value for each class which indicates the probability of that class. Hence you apply a pooling filter over the complete remaining feature map. This pooling is hence "global" as it always is as big as necessary. In contrast, usual pooling layers have a fixed size (e.g. of 2x2 or 3x3).
This is a general concept. You can also find global pooling in other libraries, e.g. Lasagne. If you want a good reference in literature, I recommend reading Network In Network.
We get only one value from entire feature map when we apply GP layer, in which kernel size is the h×w of the feature map. GP layers are used to reduce the spatial dimensions of a three-dimensional feature map. However, GP layers perform a more extreme type of dimensionality reduction, where a feature map with dimensions h×w×d is reduced in size to have dimensions 1×1×d. GP layers reduce each h×w feature map to a single number by simply taking the average of all hw values.
If you are looking for information regarding flags/parameters of caffe, it is best look them up in the comments of '$CAFFE_ROOT/src/caffe/proto/caffe.proto'.
For 'global_pooling' parameter the comment says:
// If global_pooling then it will pool over the size of the bottom by doing
// kernel_h = bottom->height and kernel_w = bottom->width
For more information about caffe layers, see this help pages.
I was wondering if there is any benefit to training on high resolution images rather than low resolution. I understand that it will take longer to train on larger images and that the dimensions must be a multiple of 32. My current image set is 1440x1920. Would I be better off resizing to 480x640, or is bigger better?
It's certainly not a requirement that your images be powers of two. There may be some cases where it speeds things up (e.g. GPU allocation) but it's not critical.
Smaller images will train significantly faster, and possibly even converge quicker (all other factors held constant) as you will be able to train on bigger batches (e.g. 100-1000 images in one pass, which you might not be able to do on a single machine with high res imagery).
As to whether to resize, you need to ask yourself if every pixel in that image is critical to your task. Often this is not the case - you can probably resize a photo of a bus down to say 128x128 and still recognize that it's a bus.
Using smaller images can also help your network generalise better, too, as there is less data to overfit.
A technique often used in image classification networks is to perform distortions (e.g. random cropping, scaling & brightness adjustment) on images to (a) convert odd-sized images to a constant size, (b) synthesize more data and (c) encourage the network to generalise.
This depends largely on the application. As a rule of thumb, I'd ask myself the question: can I complete the task myself on the resized images? If so, I'd downsize to the lowest resolution before it makes the task more difficult for you yourself. If not... you're going to have to be -very- patient using images 1440 * 1920. I imagine you'll almost always be better off experimenting with more varied architectures and hyper-parameter sets on smaller images compared to fewer models on full resolution images.
Whatever size you choose, you'll have to design your network for the image size you have in mind. If you're using convolutional layers, a larger image will require larger strides, filter sizes and/or layers. The number of parameters will stay the same for each convolution, though the number of features will grow (along with batch normalisation parameters if you're using it).
Can the Keras deal with input images with different size? For example, in the fully convolutional neural network, the input images can have any size. However, we need to specify the input shape when we create a network by Keras. Therefore, how can we use Keras to deal with different input size without resizing the input images to the same size? Thanks for any help.
Yes.
Just change your input shape to shape=(n_channels, None, None).
Where n_channels is the number of channels in your input image.
I'm using Theano backend though, so if you are using tensorflow you might have to change it to (None,None,n_channels)
You should use:
input_shape=(1, None, None)
None in a shape denotes a variable dimension. Note that not all layers
will work with such variable dimensions, since some layers require
shape information (such as Flatten).
https://github.com/fchollet/keras/issues/1920
For example, using keras's functional API your input layer would be:
For a RGB dataset
inp = Input(shape=(3,None,None))
For a Gray dataset
inp = Input(shape=(1,None,None))
Implementing arbitrarily sized input arrays with the same computational kernels can pose many challenges - e.g. on a GPU, you need to know how big buffers to reserve, and more weakly how much to unroll your loops, etc. This is the main reason that Keras requires constant input shapes, variable-sized inputs are too painful to deal with.
This more commonly occurs when processing variable-length sequences like sentences in NLP. The common approach is to establish an upper bound on the size (and crop longer sequences), and then pad the sequences with zeros up to this size.
(You could also include masking on zero values to skip computations on the padded areas, except that the convolutional layers in Keras might still not support masked inputs...)
I'm not sure if for 3D data structures, the overhead of padding is not prohibitive - if you start getting memory errors, the easiest workaround is to reduce the batch size. Let us know about your experience with applying this trick on images!
Just use None while specifying input shape. But I still do not know how to pass different-shaped images into fit function.