What is the range that weights can be for activation functions?

I am creating my own neural network and want to know how the weights relate to activation functions. I have coded the logistic and tanh activation functions and want to know if the weights attached to the individual neurons have to differ depending on the function. Right now I have the weights for the logistic activation function ranging from 0 to 1 and for tanh from -1 to 1.

Why would you limit the weights? I think you're muddled up somewhere. This is how the output of a neuron is determined:
var output = ActivationFunction(sum(connection.weight * connection.from.output) + bias)
So take the sum over all incoming connections of the weight times the source neuron's output, add the bias, and pass it through the activation function.
Yes, the sigmoid is limited between 0 and 1. But why would you limit your weights because of that? The value of the weights is not linked to the activation function in any way.
Limiting your weights to such small ranges (0 to 1 is a minuscule range) will make your network unable to solve certain patterns.
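To make that concrete, here is a minimal Python sketch of the computation above (the function and variable names are illustrative, not from the asker's code); the weights and bias are ordinary real numbers and only the activation function bounds the output:

import math

def neuron_output(inputs, weights, bias, activation=math.tanh):
    # Weighted sum of inputs plus bias, passed through the activation.
    # Weights and bias may be any real numbers; only the activation's
    # range bounds the output (tanh gives (-1, 1)).
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Weights well outside [0, 1] or [-1, 1] are perfectly fine:
print(neuron_output([0.5, -2.0], [3.7, -5.2], bias=1.4))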

Related

Can perceptrons take real values as input or just 0 and 1?

Can perceptrons take real values as input or just 0 and 1? I am confused because the output of a perceptron is 0 or 1, so I thought the input would be binary as well.
Let's break down the basics of how a perceptron works:
Weight: When a signal comes in (it can be a real value), it gets multiplied by a weight value that is assigned to this particular input. That is, if a neuron has three inputs, then it has three weights that can be adjusted individually.
Weighted sum: In the next step, the modified input signals are summed up to a single value (which can also be a real value). In this step, an offset is also added to the sum. This offset is called the bias. The neural network also adjusts the bias during the learning phase. At the start, all the neurons have random weights and random biases. After each learning iteration, the weights and biases are gradually shifted so that the next result is a bit closer to the desired output. This way, the neural network gradually moves towards a state where the desired patterns are “learned”.
Activation function: Finally, the result of the neuron’s calculation is turned into an output signal by using an activation function. For a perceptron, the Heaviside step function is most commonly used - it produces a binary output, 0 or 1.
You can find more info on activation functions online or have a look at this: https://sefiks.com/2017/05/15/step-function-as-a-neural-network-activation-function/
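As a rough illustration of those three steps (the names are made up, and the threshold at 0 is just one common convention for the step function):

def heaviside(z):
    # Binary step activation: 1 if z >= 0, else 0.
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    # Each real-valued input is multiplied by its own weight, the products are
    # summed together with the bias, and the sum goes through the step function.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return heaviside(weighted_sum)

# Real-valued inputs are fine; only the output is binary.
print(perceptron([0.3, -1.7, 2.5], [0.8, 0.1, -0.4], bias=0.2))  # prints 0 or 1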

How many neurons does a perceptron have?

This is a classical visualization of the perceptron learning model, though I don't know where it comes from originally.
My question is: how many neurons does this perceptron have? My guess is N+2: N+1 for the inputs, plus 1 for the output. Is that correct?
The network above takes numerical inputs X1, X2, ..., Xn and has weights w1, w2, ..., wn associated with those inputs. There is also another input 1 with weight w0 (called the bias unit) associated with it. All of this together is one neuron.
This is what a bias unit does:
Bias is to provide every node with a trainable constant value (in addition to the normal inputs that the node receives).
The output is the weighted sum. Something like this:
f(x) = x1*w1 + x2*w2 + ... + xn*wn + 1*w0
To learn more, check this link; it explains it very well: http://117.239.79.250/moodle/pluginfile.php/6283/mod_resource/content/1/ANN1.pdf
A perceptron is itself a type of neuron. In the figure, the four inputs aren't neurons but just 4 inputs to a single neuron (the perceptron). Also, the step-function circle isn't an extra neuron; that step-function calculation happens inside the perceptron, where the weighted sum is calculated.
So what you see in the figure is a single neuron with its components broken down into fundamental parts.
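To make the counting concrete, here is a small hypothetical sketch of that single neuron, with the bias written as the constant input 1 times w0 exactly as in the formula above; both the weighted sum and the step function happen inside the one neuron:

def single_perceptron(x, w, w0):
    # n inputs x1..xn with weights w1..wn, plus the constant input 1 with
    # weight w0 (the bias unit); the step function is applied inside this
    # same neuron, not by an extra one.
    weighted_sum = w0 * 1 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum >= 0 else 0

print(single_perceptron([0.5, 1.2, -0.3], [0.4, -0.7, 2.0], w0=0.1))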

Neural Network Developing

I am trying to write a neural network class but I don't fully understand some aspects of it. I have two questions about the following design.
Am I doing this correctly? Does the bias neuron need to connect to all of the neurons (except those in the input layer) or just those in the hidden layer?
My second question is about calculating the output value. I'm using the equation below to calculate the output value of the neurons.
HiddenLayerFirstNeuron.Value =
(input1.Value * weight1) + (input2.Value * weight2) + (Bias.Value * biasWeight)
After this equation, I calculate the activation and send the result as the output. The output neurons do the same.
I'm not sure what I am doing and I want to clear up these problems.
Take a look at http://deeplearning.net/tutorial/contents.html. It explains everything you need to know about multilayer perceptrons using Theano (a symbolic math library).
The bias is usually connected to all hidden and output units.
Yes, you compute the input of the activation function as the summation of weight * output over the previous layer's neurons.
Good luck with development ;)
There should be a separate bias neuron for each hidden layer and for the output layer. Think of a layer as a function applied to a first-order polynomial such as f(m*x + b) = y, where y is your output and f(x) your activation function. If you look at the linear term you will recognize the b. This represents the bias, and it behaves similarly in a neural network as in this simplification: it shifts the hyperplane up and down in the space. Keep in mind that you will have one bias per layer, connected to all neurons of that layer, f(w1*x1 + ... + wn*xn + b), with an initial value of 1. When it comes to gradient descent, you will have to train this bias like a normal weight.
In my opinion, you should apply the activation function to the output layer as well. This is how it's usually done with multilayer perceptrons. But it actually depends on what you want. If you, for example, use the logistic function as the activation function and you want an output in the interval (0, 1), then you have to apply your activation function to the output as well, since a plain linear combination, as in your example, can theoretically go outside that interval.
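As a hedged sketch of both answers (the 2-2-1 shape and all the numbers are made up): the bias feeds every hidden and output neuron, each connection has its own weight, and the logistic activation is also applied to the output layer so the result stays in (0, 1):

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(values, weights, biases):
    # One bias per neuron in the layer; every connection has its own weight.
    return [sigmoid(sum(w * v for w, v in zip(neuron_weights, values)) + b)
            for neuron_weights, b in zip(weights, biases)]

# Hypothetical 2-2-1 network: 2 inputs, 2 hidden neurons, 1 output neuron.
inputs = [0.6, 0.9]
hidden = layer_forward(inputs, [[0.1, -0.4], [0.7, 0.2]], [0.3, -0.1])
output = layer_forward(hidden, [[0.5, -0.8]], [0.2])
print(output)  # stays in (0, 1) because the activation is applied to the output layer too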

Questions about Q-Learning using Neural Networks

I have implemented Q-Learning as described in,
http://web.cs.swarthmore.edu/~meeden/cs81/s12/papers/MarkStevePaper.pdf
In order to approximate Q(S,A) I use a neural network with the following structure:
Activation: sigmoid
Inputs: the state inputs, plus 1 input for the action (all inputs scaled to 0-1)
Outputs: a single output, the Q-value
N hidden layers of M neurons each
Exploration method: random, explore when 0 < rand() < propExplore
At each learning iteration I calculate a Q-target value using the Q-learning update formula, then calculate an error using
error = QTarget - LastQValueReturnedFromNN
and back propagate the error through the neural network.
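For illustration, a minimal sketch of that error computation, assuming the standard one-step Q-learning target r + gamma * max_a' Q(s', a') (the exact formula referenced in the question is not reproduced here, and the names are made up):

def td_error(reward, gamma, next_q_values, last_q_from_nn, terminal=False):
    # Standard one-step Q-learning target, assumed here for illustration.
    q_target = reward if terminal else reward + gamma * max(next_q_values)
    return q_target - last_q_from_nn   # error = QTarget - LastQValueReturnedFromNN

error = td_error(reward=0.5, gamma=0.9, next_q_values=[0.2, 0.7, 0.1], last_q_from_nn=0.4)
# This error is then backpropagated through the network.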
Q1: Am I on the right track? I have seen some papers that implement a NN with one output neuron for each action.
Q2: My reward function returns a number between -1 and 1. Is it OK to return a number between -1 and 1 when the activation function is sigmoid, whose range is (0, 1)?
Q3: From my understanding of this method, given enough training instances it should be guaranteed to find an optimal policy, right? When training for XOR, sometimes it learns it after 2k iterations and sometimes it won't learn even after 40k-50k iterations.
Q1. It is more efficient if you put all action neurons in the output. A single forward pass will give you all the q-values for that state. In addition, the neural network will be able to generalize in a much better way.
Q2. Sigmoid is typically used for classification. While you can use sigmoid in other layers, I would not use it in the last one.
Q3. Well... Q-learning with neural networks is famous for not always converging. Have a look at DQN (DeepMind). What they do is solve two important issues. They decorrelate the training data by using memory replay, since stochastic gradient descent doesn't like it when training data is given in order. Second, they bootstrap using old weights (a target network), which reduces the non-stationarity.
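A tiny sketch of the replay-memory idea mentioned above (illustrative names only; the network itself and the old-weights/target-network handling are omitted):

import random
from collections import deque

# Store transitions and sample them at random, so that consecutive
# (correlated) steps are not fed to SGD in order.
replay_memory = deque(maxlen=10000)

def remember(state, action, reward, next_state, done):
    replay_memory.append((state, action, reward, next_state, done))

def sample_minibatch(batch_size=32):
    return random.sample(replay_memory, min(batch_size, len(replay_memory)))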

Neural network classifier

Suppose you have two classes in 1D space, in any configuration: class A with 2 elements and class B with 1 element. The task is to distinguish between the two classes, i.e. to classify them. If you can choose an arbitrary activation function, what is the minimal number of neurons that can solve this?
I am thinking that you always have to use at least two neurons, or am I wrong?
Your question is somewhat related to the classical XOR problem for perceptrons. Let us suppose for a moment that it's about a neural network with the specific activation function a perceptron has - the binary threshold. Then the task turns into a 1D XOR problem, and then indeed you need 2 neurons in the hidden layer and 1 neuron in the output layer to solve it. But you mention that an arbitrary activation function can be chosen. In this case we can choose a radial basis function (RBF) network. If it is possible to denote class A as an output value greater than T and class B as an output value less than T, then only 1 RBF neuron will suffice to distinguish the classes. If you want every class to have its own output (whose value can be treated as a probability measure of the input data belonging to the corresponding class), then you need 2 RBF neurons.
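A small sketch of the single-RBF-neuron case, with a made-up 1D configuration (class A at -1 and +1, class B at 0) and a made-up threshold T; which side of the threshold is labelled A or B is just a convention:

import math

def rbf_neuron(x, center, width=1.0):
    # A single Gaussian RBF unit: the response is high near the center and low far away.
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

# Centering the single RBF unit on the class-B point and thresholding at T
# separates the two classes with one neuron, even in the A-B-A (XOR-like) layout.
T = 0.8
for x, true_label in [(-1.0, "A"), (0.0, "B"), (1.0, "A")]:
    predicted = "B" if rbf_neuron(x, center=0.0) > T else "A"
    print(x, true_label, predicted)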
