How many neurons are required in convolution layer of CNN? - machine-learning

How many neurons are required in a convolutional layer to process an image of size 32 * 32 with 32 filters and a kernel size of 3 * 3? I know there will be 32 * 32 = 1024 input neurons, but how do I calculate the number of neurons required in the hidden convolutional layer?

A convolutional layer consists of filters; I don't think they are conventionally called neurons. So I guess the answer to your question is 32. Also note that the kernel size and the image size do not change this: the kernel size affects the number of parameters but (obviously) not the number of filters.
That is the point of a convolutional layer: whatever the size of the input, the number of parameters is fixed.
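To make the difference between filters, parameters, and output activations concrete, here is a minimal sketch for the layer in the question. It assumes a single-channel (grayscale) 32x32 input, stride 1, and no padding, none of which the question states; the helper name `conv_layer_stats` is made up for illustration.

```python
def conv_layer_stats(height, width, in_channels, num_filters, kernel, stride=1, padding=0):
    """Return (parameter count, output shape, number of output activations)."""
    # Each filter has kernel * kernel * in_channels weights plus one bias.
    params = num_filters * (kernel * kernel * in_channels + 1)
    # Standard output-size formula for a convolution along each spatial axis.
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    activations = out_h * out_w * num_filters
    return params, (out_h, out_w, num_filters), activations


# Assumed setup: grayscale input, stride 1, no padding.
params, shape, activations = conv_layer_stats(32, 32, in_channels=1, num_filters=32, kernel=3)
print(params)       # 320
print(shape)        # (30, 30, 32)
print(activations)  # 28800
```

If "neurons" is read as output activations, the count is 28800 under these assumptions, while the number of learned parameters stays at 320 however large the image is.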

There is no fixed method for choosing the number of hidden units in a network (what you call neurons), but a common piece of advice is to keep the number of hidden units below the number of input features.
Example: if your input has 256 features (a 16 by 16 image), then the number of hidden units should be kept under 256 to get a good fit and avoid overfitting.
More neurons tend to overfit the data, introducing high variance (test set error greater than training error).

Related

How to choose the window size of CNN in deep learning?

In a Convolutional Neural Network (CNN), a filter is selected for weight sharing. For example, in the following pictures, a 3x3 window with a stride (the distance between adjacent neurons) of 1 is chosen.
So my question is: how do I choose the window size? If I use 4x4 with a stride of 2, how much difference will it make? Thanks a lot in advance!
There's no definite answer to this: filter size is one of the hyperparameters you generally need to tune. However, there are some useful observations that may help you. It's often preferable to choose smaller filters but use a greater number of them.
Example: four 5x5 filters have 100 parameters (ignoring bias), while ten 3x3 filters have 90 parameters. With the larger number of filters you can still capture the variety of features in the image, but with fewer parameters. More on this here.
Modern CNNs go even further with this idea and use consecutive 3x1 and 1x3 convolutional layers. This reduces the number of parameters even more, without noticeably affecting performance. See the evolution of the Inception network.
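The parameter counts in this answer can be checked with a few lines of arithmetic. This is only a sketch: it assumes a single input channel and ignores biases, as the answer does, and `conv_params` is a hypothetical helper name.

```python
def conv_params(kernel_h, kernel_w, num_filters=1, in_channels=1):
    """Number of weights in a convolutional layer, ignoring biases."""
    return kernel_h * kernel_w * in_channels * num_filters


four_5x5 = conv_params(5, 5, num_filters=4)
ten_3x3 = conv_params(3, 3, num_filters=10)

# Replacing one 3x3 filter with a 3x1 filter followed by a 1x3 filter.
full_3x3 = conv_params(3, 3)
factorized = conv_params(3, 1) + conv_params(1, 3)

print(four_5x5, ten_3x3)     # 100 90
print(full_3x3, factorized)  # 9 6
```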
The choice of stride is also important, because it affects the tensor shape after the convolution and hence the rest of the network. The general rule is to use stride=1 in ordinary convolutions and preserve the spatial size with padding, and to use stride=2 when you want to downsample the image.
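The effect of stride and padding on the output shape follows the usual formula floor((n + 2p - k) / s) + 1 per spatial dimension. The sketch below applies it to the 3x3/stride 1 and 4x4/stride 2 cases from the question; the 32-pixel input size is an assumption chosen purely for illustration.

```python
def conv_output_size(input_size, kernel, stride=1, padding=0):
    """Spatial output size along one dimension of a convolution."""
    return (input_size + 2 * padding - kernel) // stride + 1


# 3x3 filter, stride 1: padding of 1 preserves the spatial size.
print(conv_output_size(32, kernel=3, stride=1, padding=1))  # 32
print(conv_output_size(32, kernel=3, stride=1, padding=0))  # 30

# 4x4 filter, stride 2: the output is roughly half the input size.
print(conv_output_size(32, kernel=4, stride=2, padding=0))  # 15
print(conv_output_size(32, kernel=4, stride=2, padding=1))  # 16
```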

Neural Network Hidden Layer Input Size for this Tutorial

I am following part 5 of this tutorial, which can be found at this link: http://peterroelants.github.io/posts/neural_network_implementation_part05/
It creates a neural network for identifying handwritten digits from 0-9.
In the middle of the tutorial, the author explains that the neural network has 64 inputs (representing the 64-pixel image) and contains two hidden neural networks that each have an input size of 20 (see the screenshot below).
I have two questions:
1) Can anyone explain the choice of projecting the 64 input layer onto a 20 input layer? Why the choice of 20? Is it arbitrary or determined by experiment? Is there an intuitive reason why?
2) Why two hidden layers? I read somewhere that most problems can be solved with 1-2 hidden layers, and that is usually determined by trial and error. Is it the same case here?
Appreciate any thoughts
The network has:
one input layer with 64 neurons --> one for each pixel
a hidden layer with 20 neurons
another hidden layer with 20 neurons
an output layer with 10 neurons --> one for each digit
The choice of two hidden layers with 20 neurons each is relatively arbitrary, and was probably determined by experiment, just as you said. Also, the description of each of these layers as another network can be confusing or misleading. You are also right that 1-2 hidden layers are usually sufficient, and for digit recognition, which is not too complex, this is the case.
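For a sense of scale, the number of trainable parameters in the 64-20-20-10 network listed above can be counted directly. This sketch assumes fully connected layers with one bias per neuron; `dense_params` is a hypothetical helper, not something from the tutorial.

```python
def dense_params(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# 64 inputs -> 20 hidden -> 20 hidden -> 10 outputs
print(dense_params([64, 20, 20, 10]))  # 1930
```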

Digit Recognition on CNN

I am testing printed digits (0-9) on a Convolutional Neural Network. It gives 99+% accuracy on the MNIST dataset, but when I trained it on images generated from fonts installed on my computer (Arial, Calibri, Cambria, Cambria Math, Times New Roman), with 104 images per font (total 25 fonts, 4 images per font (little difference)), the training error rate does not go below 80%, i.e. 20% accuracy. Why?
Here is a sample of the images for the digit "2":
I resized every image to 28 x 28.
Here are more details:
Training data size = 28 x 28 images.
Network parameters - as in LeNet5
Architecture of Network -
Input Layer -28x28
| Convolutional Layer - (Relu Activation);
| Pooling Layer - (Tanh Activation)
| Convolutional Layer - (Relu Activation)
| Local Layer(120 neurons) - (Relu)
| Fully Connected (Softmax Activation, 10 outputs)
This works, giving 99+% accuracy on MNIST. Why is it so bad with computer-generated fonts? A CNN can handle a lot of variance in data.
I see two likely problems:
Preprocessing: MNIST is not only 28px x 28px, but also:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
Source: MNIST website
Overfitting:
MNIST has 60,000 training examples and 10,000 test examples. How many do you have?
Did you try dropout (see paper)?
Did you try dataset augmentation techniques? (e.g. slightly shifting the image, probably changing the aspect ratio a bit, you could also add noise - however, I don't think those will help)
Did you try smaller networks? (And how big are your filters / how many filters do you have?)
Remarks
Interesting idea! Did you try simply applying the trained MNIST network on your data? What are the results?
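The centering step from the MNIST description quoted above can be sketched as follows. This is an illustration only: it uses plain nested lists to stay dependency-free, shifts by whole pixels, and leaves out the resize to a 20x20 box; the function names are hypothetical.

```python
def center_of_mass(image):
    """Intensity-weighted mean (row, col) of a 2D list of pixel values."""
    total = sum(sum(row) for row in image)
    if total == 0:
        raise ValueError("image has no non-zero pixels")
    row_c = sum(r * sum(row) for r, row in enumerate(image)) / total
    col_c = sum(c * v for row in image for c, v in enumerate(row)) / total
    return row_c, col_c


def center_on_canvas(image, size=28):
    """Copy `image` onto a size x size zero canvas with its center of mass near the middle."""
    row_c, col_c = center_of_mass(image)
    target = (size - 1) / 2  # center of the canvas in pixel-index coordinates
    d_row = round(target - row_c)  # whole-pixel shift only
    d_col = round(target - col_c)
    canvas = [[0] * size for _ in range(size)]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            rr, cc = r + d_row, c + d_col
            if 0 <= rr < size and 0 <= cc < size:  # pixels shifted off the canvas are dropped
                canvas[rr][cc] = value
    return canvas


# A 20x20 glyph whose ink sits in the top-left corner.
glyph = [[0] * 20 for _ in range(20)]
for r in range(2):
    for c in range(3):
        glyph[r][c] = 255

centered = center_on_canvas(glyph)
print(center_of_mass(glyph))     # (0.5, 1.0)
print(center_of_mass(centered))  # (13.5, 13.0)
```

Font-rendered digits that fill the whole 28x28 frame would look quite different from MNIST digits preprocessed this way, which is one reason a network trained on one set may transfer poorly to the other.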
It may be an overfitting problem. This can happen when your network is too complex for the problem it has to solve.
Check this article: http://es.mathworks.com/help/nnet/ug/improve-neural-network-generalization-and-avoid-overfitting.html
It definitely looks like an issue of overfitting. I see that you have two convolution layers, two max pooling layers and two fully connected layers. But how many weights in total? You only have 96 examples per class, which is certainly fewer than the number of weights in your CNN. As a rule of thumb, you want at least 5 times as many instances in your training set as weights in your CNN.
You have two solutions to improve your CNN:
Shift each instance in the training set: move each digit by about 1 pixel in every direction. This alone multiplies your training set by 9.
Use a transformer layer. It adds an elastic deformation to each digit at each epoch. This strengthens learning considerably by artificially enlarging your training set. Moreover, it should make the network much better at predicting other fonts.
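The factor of 9 from one-pixel shifts comes from the 3 x 3 combinations of row and column offsets in {-1, 0, +1}, the unshifted original included. A minimal sketch with plain nested lists (the function names are made up for illustration, and the elastic deformation is not covered here):

```python
def shift(image, d_row, d_col, fill=0):
    """Return a copy of a 2D list shifted by (d_row, d_col), padded with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            rr, cc = r + d_row, c + d_col
            if 0 <= rr < height and 0 <= cc < width:
                shifted[rr][cc] = image[r][c]
    return shifted


def shift_augment(image):
    """All 9 copies of `image` shifted by -1, 0 or +1 pixel along each axis."""
    return [shift(image, dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


image = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
variants = shift_augment(image)
print(len(variants))  # 9
```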

Neural Networks (input and output layers)

When dealing with multiclass classification, is the number of nodes (which are vectors) in the input layer, excluding bias, always the same as the number of nodes in the output layer?
No. The input layer ingests the features. The output layer makes predictions for classes. The number of features and classes does not need to be the same; it also depends on how exactly you model the multiple classes output.
Lars Kotthoff is right. However, when you are using an artificial neural network to build an autoencoder, you will want to have the same number of input and output nodes, and you will want the output nodes to learn the values of the input nodes.
Nope.
Usually the number of input units equals the number of features you are going to use for training the NN classifier.
The size of the output layer equals the number of classes in the dataset. Further, if the dataset has only two classes, a single output unit is enough to discriminate between them.
The ANN output layer has a node for each class: if you have 3 classes, you use 3 nodes. The input layer (often called a feature vector) has a node for each feature used for prediction and usually an extra bias node. You usually need only 1 hidden layer, and discerning its ideal size is tricky.
Having too many hidden layer nodes can result in overfitting and slow training. Having too few hidden layer nodes can result in underfitting (overgeneralizing).
Here are a few general guidelines (source) to start with:
The number of hidden neurons should be between the size of the input layer and the size of the output layer.
The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
The number of hidden neurons should be less than twice the size of the input layer.
If you have 3 classes and an input vector of 30 features, you can start with a hidden layer of around 23 nodes. Add and remove nodes from this layer during training to reduce your error, while testing against validation data to prevent overfitting.
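The three guidelines above reduce to simple arithmetic. The sketch below evaluates them for the 30-feature, 3-class example; the function name and dictionary keys are hypothetical, and the results are starting points rather than prescriptions.

```python
def hidden_size_guidelines(n_inputs, n_outputs):
    """Evaluate the three rule-of-thumb guidelines for a single hidden layer."""
    low, high = sorted((n_inputs, n_outputs))
    return {
        "between_output_and_input": (low, high),
        "two_thirds_input_plus_output": round(2 * n_inputs / 3 + n_outputs),
        "less_than": 2 * n_inputs,
    }


guide = hidden_size_guidelines(n_inputs=30, n_outputs=3)
print(guide["between_output_and_input"])      # (3, 30)
print(guide["two_thirds_input_plus_output"])  # 23
print(guide["less_than"])                     # 60
```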

Overflowing of neural network weights in training

I'm training my neural network to classify some things in an image. I crop 40x40 pixel images and classify each one as being some object or not. So the network has 1600 input neurons, 3 hidden layers (500, 200, 30) and 1 output neuron that must say 1 or 0. I use the Flood library.
I cannot train it with QuasiNewtonMethod, because the algorithm uses a big matrix that does not fit in my memory. So I use GradientDescent, and the ObjectiveFunctional is NormalizedSquaredError.
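A rough estimate shows why such a matrix would not fit in memory. The sketch assumes fully connected layers with one bias per neuron and a dense n x n matrix of 8-byte floats, as in textbook quasi-Newton methods that keep an approximation of the inverse Hessian; how Flood actually stores it is not confirmed here.

```python
def dense_params(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


n = dense_params([1600, 500, 200, 30, 1])
matrix_bytes = n * n * 8  # assumed: dense n x n matrix of 8-byte floats

print(n)                              # 906761
print(round(matrix_bytes / 1e12, 2))  # 6.58 (terabytes)
```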
The problem is that during training the weights overflow, and the output of the neural network is INF or NaN for every input.
Also, my dataset is too big (about 800 MB as CSV) and I can't load it fully. So I made many InputTargetDataSets with 1000 instances each, saved them as XML (the default format for Flood), and train for one epoch on each dataset, randomly shuffled. But it also overflows when I train on just one big dataset (10000 instances).
Why is this happening and how can I prevent that?
I would recommend normalizing the inputs. Also consider that with 1600 input neurons, each neuron in the next layer sums over all of their outputs (if they are sigmoid neurons), which can cause many problems.
It is quite useful to print out some steps, for example to see at which step it overflows.
There are some tips for the initial weights of the neurons: I would recommend very small values, < 0.01. If you could give more info about the NN and the ranges of the inputs, weights, etc., I could give you some other ideas.
And by the way, I think it has been mathematically proven that two layers should be enough, so there is no need for three hidden layers unless you are using some specialized algorithm that simulates the human eye.
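The two concrete suggestions in this answer, normalized inputs and very small initial weights, might look like the sketch below. It is written in plain Python rather than against the Flood API, assumes 8-bit pixel values in the range 0-255 (the question does not say), and uses made-up helper names.

```python
import random


def scale_pixels(pixels, max_value=255.0):
    """Map raw pixel intensities from [0, max_value] to [0, 1]."""
    return [p / max_value for p in pixels]


def init_weights(n_in, n_out, limit=0.01, seed=None):
    """n_out x n_in matrix of weights drawn uniformly from [-limit, limit]."""
    rng = random.Random(seed)
    return [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]


inputs = scale_pixels([0, 128, 255])
weights = init_weights(1600, 500, seed=0)  # first layer: 1600 inputs -> 500 hidden units

print(inputs[0], inputs[-1])                                # 0.0 1.0
print(max(abs(w) for row in weights for w in row) <= 0.01)  # True
```

With inputs in [0, 1] and weights this small, the weighted sum over 1600 inputs stays bounded by 16 in magnitude, which keeps the first layer away from the extreme values that lead to INF or NaN.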
