I am new to RNNs and trying to understand them. My question is: does the number of neurons depend on the size of the sequence, i.e. the number of time steps? My understanding is that since an RNN takes a sequence as input, the number of neurons should equal the sequence length: if we have 10 time steps, and thus 10 different inputs, we should have 10 neurons. If not, how do we feed our sequence to the neurons if the sequence has size 20 and there are only 10 neurons?
I will answer your first question.
Generally, yes, the number of neurons depends on the sequence: the more neurons you have, the better the network can predict longer or more complex sequences. But even one neuron may give you a perfect prediction if your sequence is simple (e.g. if your sequence is [1, 1, 1, 1, 1]). What the neuron count does not have to match is the number of time steps: the same cell is reused at every step, so 10 neurons can read a sequence of length 20.
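A minimal sketch, assuming Keras (the question does not name a framework):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential([
    SimpleRNN(10, input_shape=(20, 1)),  # 10 units reading 20 time steps of 1 feature
    Dense(1),
])
model.compile(optimizer='adam', loss='mse')
print(model.predict(np.zeros((1, 20, 1))).shape)  # (1, 1): one prediction per sequence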
Related
This question is similar to What's the input of each LSTM layer in a stacked LSTM network?, but focuses more on implementation details.
For simplicity, consider a structure of 4 units followed by 2 units, like the following:
model.add(LSTM(4, input_shape=input_shape, return_sequences=True))
model.add(LSTM(2))  # input_shape is only needed on the first layer
So I know the output of LSTM_1 has 4 features, but how do the next 2 units handle these 4 inputs? Are they fully connected to the next layer of nodes?
I guess they are fully connected, as in the following figure, but I'm not sure; it is not stated in the Keras documentation.
Thanks!
It's not length 4, it's 4 "features".
The length comes from the input shape and it never changes; there is absolutely no difference between feeding a regular input to one LSTM and feeding the output of one LSTM to another LSTM.
You can just look at the model's summary to see the shapes and understand what is going on. You never change the length using LSTMs.
They don't communicate at all. Each one takes the length dimension and processes it recurrently, independently of the other. When one finishes and outputs a tensor, the next one takes that tensor and processes it alone, following the same rules.
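For example, a minimal sketch, assuming an input shape of (10, 3) (the question does not specify one):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential([
    LSTM(4, input_shape=(10, 3), return_sequences=True),  # output: (None, 10, 4)
    LSTM(2),                                              # output: (None, 2)
])
model.summary()

The summary shows the length (10) passing through the first layer untouched; only the feature dimension changes. Inside the second LSTM, the 4 incoming features are fully connected to the gates of its 2 units at every time step, which is the sense in which the layers are connected.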
I am new to deep learning and TensorFlow, and I am trying to train a CNN to localize digits in the Street View House Numbers dataset. To this end I have an input set of 32x32 images and, since I want to recognize up to 5 digits, I am using as labels vectors of 20 elements like this:
[top_x_digit1,top_y_digit1,width_digit1,height_digit1,top_x_digit2, etc..]
(0,0,0,0 when there is no digit)
As far as I understand, after (say) 3 layers of convolution and pooling, I can add 5 parallel fully connected layers, each aimed at extracting the box features of a different digit, as sketched below.
Is my approach correct?
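Concretely, this is the kind of structure I have in mind. A minimal Keras sketch, where the filter counts and the 64-unit heads are placeholders I made up, not a validated design:

from tensorflow.keras import Input, Model, layers

inp = Input(shape=(32, 32, 1))
x = inp
for filters in (32, 64, 128):  # 3 blocks of convolution + pooling
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)

# 5 parallel fully connected heads, one (top_x, top_y, width, height) per digit
heads = []
for i in range(5):
    h = layers.Dense(64, activation='relu')(x)
    heads.append(layers.Dense(4, name='digit_%d' % i)(h))
out = layers.Concatenate()(heads)  # the 20-element label vector

model = Model(inp, out)
model.compile(optimizer='adam', loss='mse')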
As per the documentation provided by scikit-learn:
hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
I have a little doubt.
In my code, what I have configured is:
MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
So what do the 5 and the 2 indicate?
What I understand is that 5 is the number of hidden layers, but then what is 2?
Ref - http://scikit-learn.org/dev/modules/generated/sklearn.neural_network.MLPClassifier.html#
From the link you provided, in the parameter table, hidden_layer_sizes row:
The ith element represents the number of neurons in the ith hidden layer
This means you will have len(hidden_layer_sizes) hidden layers, and each hidden layer i will have hidden_layer_sizes[i] neurons.
In your case, (5, 2) means:
the 1st hidden layer has 5 neurons
the 2nd hidden layer has 2 neurons
So the number of hidden layers is set implicitly.
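You can check this by inspecting the fitted weight matrices. A small sketch, assuming 4 input features and a binary target (note that newer scikit-learn versions spell the parameter solver rather than algorithm):

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(20, 4)        # 4 input features (assumed)
y = np.random.randint(0, 2, 20)  # binary target (assumed)
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1).fit(X, y)
print([w.shape for w in clf.coefs_])  # [(4, 5), (5, 2), (2, 1)]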
Some details that I found online concerning the architecture and the units of the input, hidden and output layers in sklearn:
The number of input units is the number of features.
For multiclass classification, the number of output units is the number of labels.
Try a single hidden layer first; if you use more than one, give each hidden layer the same number of units.
More units per hidden layer generally helps: try the same number as the input features, up to twice or even three or four times that (a small illustration follows below).
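As a hypothetical illustration of these heuristics (the feature count and multiplier here are placeholders, not official scikit-learn guidance):

from sklearn.neural_network import MLPClassifier

n_features = 10  # assumed feature count, purely for illustration
clf = MLPClassifier(hidden_layer_sizes=(2 * n_features,),  # 1x to 4x the inputs
                    solver='lbfgs', random_state=1)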
I'm trying to create the neural network shown below. It has 3 inputs, 2 outputs, and 2 hidden layers (so 4 layers altogether, or 3 layers of weight matrices). The first hidden layer has 4 neurons and the second has 3. There is a bias neuron feeding the first hidden layer, the second hidden layer, and the output layer.
I have tried using the "create custom neural network" function in MATLAB, but I can't get it to work how I want it to.
This is how I used the function:
net1=network(3,3,[1;1;1],[1,1,1;0,0,0;0,0,0],[0,0,0;1,0,0;0,1,0],[0,0,0])
view(net1)
And it gives me the neural network shown below:
As you can see, this isn't what I want: there are only 3 weights in the first layer, 1 in the second, 1 in the output layer, and only one output. How can I fix this?
Thanks!
Just to clarify how I want this network to work:
The user will input 3 numbers into the network.
Each one of the 3 inputs is multiplied by 4 different weights, and then these numbers are sent to the 4 neurons in the first hidden layer.
The bias node acts the same as one of the inputs, but it always has a value of 1. It is multiplied by 4 different weights, and then sent to the 4 neurons in the first hidden layer.
Each neuron in the first hidden layer sums the 4 numbers going into it, and then passes this number through the sigmoid activation function.
The neurons in the first hidden layer then output 4 numbers that are each multiplied by 3 different weights, and sent to the 3 neurons in the second hidden layer.
The bias node going to the second hidden layer works the same as the first bias node.
Each neuron in the second hidden layer sums the 5 numbers going into it and passes the result through the sigmoid activation function.
The neurons in the second hidden layer then output three numbers that are again multiplied by weights and go to each of the two outputs.
The output layer also sums all of its inputs, including its bias input, and passes the result through the sigmoid activation function to get the final two values (see the sketch after this list).
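In other words, the computation I want is the following NumPy sketch (just to pin down the math; my actual question is how to build this in MATLAB, with the b terms playing the role of the bias neurons):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # inputs   -> hidden 1
W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=3)  # hidden 1 -> hidden 2
W3, b3 = rng.normal(size=(2, 3)), rng.normal(size=2)  # hidden 2 -> outputs

x = np.array([0.2, 0.5, 0.9])  # the 3 user inputs
h1 = sigmoid(W1 @ x + b1)      # 4 numbers leave hidden layer 1
h2 = sigmoid(W2 @ h1 + b2)     # 3 numbers leave hidden layer 2
y = sigmoid(W3 @ h2 + b3)      # the final 2 output values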
After some time playing around, I've figured out how to do it. The code I needed to use is:
net = newff([0 1; 0 1; 0 1],[4,3,2],{'logsig','logsig','logsig'}) % 4 and 3 hidden neurons, 2 outputs
view(net)
This creates the network I was looking for.
I was originally mistaken about the MATLAB representation of neural networks: the green arrows show the path of all of the numbers, not just a single number.
I'm training my neural network to classify objects in images. I crop 40x40-pixel patches and classify each one as containing the object or not. So the network has 1600 input neurons, 3 hidden layers (500, 200, 30) and 1 output neuron that must say 1 or 0. I use the Flood library.
I cannot train it with QuasiNewtonMethod, because that algorithm uses a big matrix that does not fit in my memory. So I use GradientDescent, and the ObjectiveFunctional is NormalizedSquaredError.
The problem is that during training the weights overflow, and the output of the neural network is INF or NaN for every input.
Also, my dataset is too big (about 800 MB as CSV) and I can't load it fully. So I made many InputTargetDataSets of 1000 instances each, saved them as XML (the default format for Flood), and train for one epoch on each dataset in random order. But it also overflows when I train on just one big dataset (10,000 instances).
Why is this happening and how can I prevent that?
I would recommend normalizing the inputs. Also keep in mind that with 1600 input neurons, the weighted sums arriving at the first hidden layer can become very large and saturate sigmoid neurons, which causes many problems.
It is quite useful to print out some intermediate steps, for example to see at which step it overflows.
As for the weights, I would recommend initializing them to very small values, e.g. < 0.01. If you could give more info about the network and the ranges of the inputs, weights, etc., I could give you some other ideas.
And by the way, it has been proven mathematically (the universal approximation theorem) that one or two hidden layers are enough, so there is no need for three hidden layers unless you are using some specialized architecture that simulates the human visual system.
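To make the first two suggestions concrete, a sketch in Python (Flood itself is C++, so this only illustrates the idea; the shapes mirror your 1600-input, 500-neuron first layer):

import numpy as np

def minmax_normalize(X):
    # scale every input column to [0, 1] so the first-layer sums stay small
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / np.maximum(maxs - mins, 1e-12)

rng = np.random.default_rng(0)
X = minmax_normalize(rng.uniform(0, 255, size=(1000, 1600)))  # fake 40x40 crops
W = rng.uniform(-0.01, 0.01, size=(500, 1600))                # small initial weights
print(np.abs(W @ X.T).max())  # first-layer pre-activations stay modest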