Neural Networks (input and output layers) - machine-learning

When dealing with muticlass classification, is it always that the number of nodes (which are vectors) in the input layer excluding bias is the same as the number of nodes in the output layer?

No. The input layer ingests the features. The output layer makes predictions for classes. The number of features and classes does not need to be the same; it also depends on how exactly you model the multiple classes output.

Lars Kotthoff is right. However, when you are using an artificial neural network to build an autoencoder, you will want to have the same number of input and output nodes, and you will want the output nodes to learn the values of the input nodes.

Nope,
Usually number of input unites equals to number of features you are going use for training the NN classifier.
Size of the output layer equals to number of classes in the dataset. Further, if dataset has two classes only just one output unit is enough for discriminating these two classes.

The ANN output layer has a node for each class: if you have 3 classes, you use 3 nodes. The input layer (often called a feature vector) has a node for each feature used for prediction and usually an extra bias node. You usually need only 1 hidden layer, and discerning its ideal size tricky.
Having too many hidden layer nodes can result in overfitting and slow training. Having too few hidden layer nodes can result in underfitting (overgeneralizing).
Here are a few general guidelines (source) to start with:
The number of hidden neurons should be between the size of the input layer and the size of the output layer.
The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
The number of hidden neurons should be less than twice the size of the input layer.
If you have 3 classes and an input vector of 30 features, you can start with a hidden layer of around 23 nodes. Add and remove nodes from this layer during training to reduce your error, while testing against validation data to prevent overfitting.

Related

Neural Network Hidden Layer Input Size for this Tutorial

I am following part 5 of this tutorial which can be found in in this link: http://peterroelants.github.io/posts/neural_network_implementation_part05/
This creates a neural network suitable for identification handwritten digits from 0-9.
In the middle of the tutorial, the author explains that the neural network has 64 inputs (representing the 64 pixel image) which contains two hidden neural networks that has a input size of 20. (see below screenshot)
I have two questions:
1) Can anyone explain the choice of projecting the 64 input layer onto a 20 input layer? Why the choice of 20? Is it arbitrary or determined by experiment? Is there an intuitive reason why?
2) Why two hidden layers? I read somewhere that most problems can be solved with 1-2 hidden layers, and that is usually determined by trial and error. Is it the same case here?
Appreciate any thoughts
The network has:
one input layer with 64 neurons --> one for each pixel
a hidden layer with 20 neurons
another hidden layer with 20 neurons
an output layer with 10 neurons --> one for each digit
The choice of two hidden layers with 20 neurons each is relatively arbitrary, and probably determined by experiment, just as you said. Also, the description of each of these layers as another network can be confusing/misleading. You are also right on account of 1-2 hidden layers usually being sufficient for problems, and with digit recognition, which is not to complex, this is the case.

How to fit a classifier with high accuracy on the training set with low features?

I have input (r,c) in range (0, 1] as the coordinate of a pixel of an image and its color 1 or 2 only.
I have about 6,400 pixels.
My attempt of fitting X=(r,c) and y=color was a failure the accuracy won't go higher than 70%.
Here's the image:
The first is the actual image, the 2nd is the image I use to train on, it has only 2 colors. The last is the image that the neural network generated with about 500 weights training with 50 iterations. Input Layer is 2, one hidden layer of size 100, and the output layer is 2. (for binary classification like this, I may need only one output layer but I am just preparing for multi-class classification)
The classifier failed to fit the training set, why is that? I tried generating high polynomial terms of those 2 features but it doesn't help. I tried using Gaussian kernel and random 20-100 landmarks on the picture to add more features, also got similar output. I tried using logistic regressions, doesn't help.
Please help me increase the accuracy.
Here's the input:input.txt (you can load it into Octave the variable is coordinate (r,c features) and idx (color)
You can try plotting it first to make sure that you understand the input then try training on it and tell me if you get better result.
Your problem is hard to model. You are trying to fit function from R^2 to R, which has lots of complexity - lots of "spikes", lots of discontinuous regions (pixels that are completely separated from the rest). This is not an easy problem, and not usefull one.. In order to overfit your network to such setting you will need plenty of hidden units. Thus, what are the options to do so?
General things that are missing in the question, and are important
Your output variable should be {0, 1} if you are fitting your network through cross entropy cost (log likelihood), which you should use for classification.
50 iteraions (if you are talking about some mini-batch iteraions) is orders of magnitude to small, unless you mean 50 epochs (iterations over whole training set).
Actual things, that will probably need to be done (at least one of the below):
I assume that you are using ReLU activations (or Tanh, hard to say looking at the output) - you can instead use RBF activations, and increase number of hidden neurons to ~5000,
If you do not want to go with RBFs, then you will need 1-2 additional hidden layers to fit function of this complexity. Try architecture of type 100-100-100 instaed.
If the above fails - increase number of hidden units, that's all you need - enough capacity.
In general: neural networks are not designed for working with low dimensional datasets. This is nice example from the web, that you can learn pix-pos to color mapping, but it is completely artificial and seems to actually harm people intuitions.

Scikit learn multilayer neural network

As per the documentation provided by Scikit learn
hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
I have little doubt.
In my code what I have configured is
MLPClassifier(algorithm='l-bfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
so what does 5 and 2 indicates?
What I understand is, 5 is the numbers of hidden layers, but then what is 2?
Ref - http://scikit-learn.org/dev/modules/generated/sklearn.neural_network.MLPClassifier.html#
From the link you provided, in parameter table, hidden_layer_sizes row:
The ith element represents the number of neurons in the ith hidden
layer
Which means that you will have len(hidden_layer_sizes) hidden layers, and, each hidden layer i will have hidden_layer_sizes[i] neurons.
In your case, (5, 2) means:
1rst hidden layer has 5 neurons
2nd hidden layer has 2 neurons
So the number of hidden layers is implicitely set
Some details that I found online concerning the architecture and the units of the input, hidden and output layers in sklearn.
The number of input units will be the number of features
For multiclass classification the number of output units will be the number of labels
Try a single hidden layer, or if more than one then each hidden layer should have the same number of units
The more units in a hidden layer the better, try the same as the number of input features up to twice or even three or four times that

Neural network, minimum number of neurons

I've got a 2D surface where a ship (with constant speed) navigates around the scene to pick up candy. For every candy the ship picks up I increase the fitness. The NN has one output to steer the ship (0 for left and 1 for right, so 0.5 would be straight forward) There are four inputs in the range [-1 .. 1], that represents two normalized vectors. The ship direction and the direction to the piece of candy.
Is there any way to calculate the minimum number of neurons in the hidden layer? I also tried giving two inputs instead of four, the first was the dot product [-1..1] (where I dotted the ship direction with the direction to the candy) and the second was (0/1) if the candy was to the left/right of the ship. It seems like this approach worked a lot better with fewer neurons in the hidden layer.
Fewer inputs should imply fewer number of neurons. This is because the number of input combinations decrease and it gets easier for the neural network to learn the system. There is no golden rule as to how to calculate the best number of nodes in the hidden layer. However, with 2 inputs I'd say 2 hidden nodes should work fine. It really depends on the degree of non linearity in your inputs.
Defining the number of hidden layers and the number of neurons in each hidden layers always was a challenge and it may diverge from each type of problems. By the way, a single hidden layer in a feedforward neural network could solve most of the problems, given it can aproximate functions.
Murata defined some rules to use in neural networks to define the number of hidden neurons in a feedforward neural network:
The value should be between the size of the input and output layers.
The value should be 2/3 the size of the input layer plus the size of the output layer.
The value should be less than twice the size of the input layer
You could try these rules and evaluate the impact of it in your neural network.

In 3 layer MLP, why should the input to hidden weights be random?

For Example for 3-1-1 layer if the weights are initialized equally the MLP might not learn well. But why does this happen?
If you only have one neuron in the hidden layer, it doesn't matter. But, imagine a network with two neurons in the hidden layer. If they have the same weights for their input, than both neurons would always have the exact same activation, there is no additional information by having a second neuron. And in the backpropagation step, those weights would change by an equal amount. Hence, in every iteration, those hidden neurons have the same activation.
It looks like you have a typo in your question title. I'm guessing that you mean why should the weights of hidden layer be random. For the example network you indicate (3-1-1), it won't matter because you only have a single unit in the hidden layer. However, if you had multiple units in the hidden layer of a fully connected network (e.g., 3-2-1) you should randomize the weights because otherwise, all of the weights to the hidden layer will be updated identically. That is not what you want because each hidden layer unit would be producing the same hyperplane, which is no different than just having a single unit in that layer.

Resources