Neural network classifier - machine-learning

Suppose you have two classes in 1D space, in any configuration: class A with two elements and class B with one element. The task is to distinguish between the two classes, i.e. to classify them. If you can choose an arbitrary activation function, what is the minimal number of neurons that can solve this?
My thinking is that you always need at least two neurons, or am I wrong?

Your question is closely related to the classical XOR problem for perceptrons. Suppose for a moment that the network uses the specific activation function a perceptron has: the binary threshold. Then the task becomes a 1D XOR problem, and indeed you need 2 neurons in the hidden layer and 1 neuron in the output layer to solve it. But you mention that an arbitrary activation function can be chosen. In that case we can choose a radial basis function (RBF) network. If it is acceptable to denote class A by an output value greater than some threshold T and class B by an output value less than T, then a single RBF neuron suffices to distinguish the classes. If you want every class to have its own output (whose value can be treated as a probability measure of the input belonging to the corresponding class), then you need 2 RBF neurons.
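For intuition, here is a minimal sketch of the single-RBF-neuron case (the concrete points, Gaussian center, width, and threshold are assumptions; which class lands above the threshold simply depends on where the center is placed):

import numpy as np

# Hypothetical 1D configuration: class A = {-1, +1}, class B = {0}.
# B sits between the two A points, so this is the XOR-like case a single
# linear-threshold neuron cannot separate.
A = np.array([-1.0, 1.0])
B = np.array([0.0])

def rbf_neuron(x, center=0.0, sigma=0.5):
    # Single Gaussian RBF unit: output in (0, 1], largest at the center.
    return np.exp(-((x - center) ** 2) / (2 * sigma ** 2))

T = 0.5  # decision threshold
print(rbf_neuron(A))  # ~0.14 for both A points -> below T
print(rbf_neuron(B))  # 1.0 for the B point     -> above T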

Related

How many neurons does a perceptron have?

This is a classical visualization of the perceptron learning model, though I don't know where it comes from originally.
My question is: how many neurons does this perceptron have? My guess is N+2: N+1 for the inputs and another 1 for the output. Is that correct?
The above network takes numerical inputs X1, X2, ..., Xn and has weights w1, w2, ..., wn associated with those inputs. There is also another input, the constant 1, with weight w0 (called the bias unit) associated with it. All of this is one neuron.
This is what a bias unit does:
Bias is to provide every node with a trainable constant value (in addition to the normal inputs that the node receives).
The output is the weighted sum, something like this:
f(x) = x1*w1 + x2*w2 + ... + xn*wn + 1*w0
To learn more, check this, which explains it very well: http://117.239.79.250/moodle/pluginfile.php/6283/mod_resource/content/1/ANN1.pdf
A perceptron itself is a type of neuron. In the figure the four inputs aren't neurons but just 4 inputs to a single neuron (the perceptron). Also, the step function circle isn't an extra neuron. The step function calculation happens inside the perceptron, where the weighted sum is calculated.
So what you see in the figure is a single neuron with its components broken down into fundamental parts.
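For concreteness, a minimal sketch of such a single neuron (the inputs and weights below are made up for illustration):

import numpy as np

def perceptron(x, w, w0):
    # One neuron: weighted sum of the inputs plus the bias input (a constant 1
    # with weight w0), passed through a binary threshold (step) function.
    weighted_sum = np.dot(w, x) + w0 * 1.0
    return 1 if weighted_sum >= 0 else 0

x = np.array([0.5, -1.0, 2.0])   # n = 3 numerical inputs
w = np.array([0.4, 0.3, -0.2])   # weights w1..wn
print(perceptron(x, w, w0=0.1))  # output of the single neuron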

Neural Network Developing

I am trying to write a neural network class but I don't fully understand some aspects of it. I have two questions about the following design.
Am I doing this correctly? Does the bias neuron need to connect to all of the neurons (except those in the input layer), or just to those in the hidden layer?
My second question is about calculating the output value. I'm using the equation below to calculate the output value of the neurons.
HiddenLayerFirstNeuron.Value =
(input1.Value * weight1) + (input2.Value * weight2) + (Bias.Value * biasWeight)
After this equation, I apply the activation function and send the result to the output. The output neurons do the same.
I'm not sure what I'm doing, and I want to clear up these problems.
Take a look at http://deeplearning.net/tutorial/contents.html. It explains everything you need to know about the multilayer perceptron using Theano (a symbolic math library).
The bias is usually connected to all hidden and output units.
Yes, you compute the input of the activation function as the sum of weight * output over the neurons of the previous layer.
Good luck with development ;)
There should be a separate bias neuron for each hidden layer and for the output layer. Think of a layer as a function applied to a first-order polynomial, such as f(m*x + b) = y, where y is your output and f(x) your activation function. If you look at the linear term you will recognize the b. This represents the bias, and it behaves in a neural network much as in this simplification: it shifts the hyperplane up and down in the space. Keep in mind that you will have one bias per layer, connected to all neurons of that layer, as in f(w1*x1 + ... + wn*xn + b), with an initial value of 1. When it comes to gradient descent, you will have to train this bias like a normal weight.
In my opinion you should apply the activation function to the output layer as well. This is how it's usually done with multilayer perceptrons, but it actually depends on what you want. If, for example, you use the logistic function as the activation function and you want an output in the interval (0, 1), then you have to apply the activation function to the output as well, since a plain linear combination, as in your example, can theoretically go outside the boundaries of that interval.
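As a rough sketch of both points (a bias reaching every hidden and output neuron, and the activation applied to the output layer too), assuming sigmoid activations and made-up sizes and weights:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, b_hidden, W_out, b_out):
    # Weighted sum + bias, then the activation, for the hidden and the output layer.
    hidden = sigmoid(W_hidden @ x + b_hidden)  # bias term for every hidden neuron
    return sigmoid(W_out @ hidden + b_out)     # and for every output neuron

rng = np.random.default_rng(0)
x = np.array([0.3, 0.7])                                   # 2 inputs
W_hidden, b_hidden = rng.normal(size=(3, 2)), np.ones(3)   # 3 hidden neurons, biases start at 1
W_out, b_out = rng.normal(size=(1, 3)), np.ones(1)         # 1 output neuron
print(forward(x, W_hidden, b_hidden, W_out, b_out))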

Maxout neurons: are the weights in the maxout function referring to 2 unique sets of weights?

I don't understand how maxout works, and I suspect it's due to my visualization of the linear algebra multiplication. Basically, I'm under the impression that there are two sets of weights for the maxout function, both individually trained, and then only one is selected. But I suspect this may be wrong, since I don't see how two different sets of weights can be trained simultaneously in one feed-forward run of the network.
Also, if the two weights w1 and w2 in the function do not refer to two unique sets of weights, could there be more than two arguments input to the maxout function, of which only the max is chosen?
Here is the maxout function I read:
max((w1.T.dot(X) + b1), (w2.T.dot(X) + b2))
Is there a mental representation I could use to visualize this better?
I know this is late but I am gonna answer anyway.
You can check out the video by Ian Goodfellow, the author of maxout networks, along with the slides used in the video.
[Screenshot: the definition of Maxout Networks, from the slides above]
So it turns out that you are absolutely correct. For each neuron, you create two sets of weights and two biases. And if you want more, you can create n sets of weights and n biases for each neuron, select the one with the max value, and do the same for all the neurons in the layer.
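A small NumPy sketch of that idea (shapes and values are illustrative, not taken from the slides): each maxout unit holds k weight vectors and k biases and outputs the largest of the k affine responses.

import numpy as np

def maxout(X, W, b):
    # W has shape (k, d), b has shape (k,): k affine pieces per unit.
    # For a given input only the winning piece receives gradient, but across
    # the whole training set all k pieces end up being trained.
    return np.max(W @ X + b)

rng = np.random.default_rng(0)
X = rng.normal(size=4)       # illustrative 4-dimensional input
W = rng.normal(size=(2, 4))  # k = 2 reproduces max(w1.T.dot(X) + b1, w2.T.dot(X) + b2)
b = rng.normal(size=2)
print(maxout(X, W, b))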

Questions about Q-Learning using Neural Networks

I have implemented Q-Learning as described in,
http://web.cs.swarthmore.edu/~meeden/cs81/s12/papers/MarkStevePaper.pdf
In order to approximate Q(S, A) I use a neural network structure like the following:
Activation: sigmoid
Inputs: the state inputs + 1 for the action neuron (all inputs scaled to 0-1)
Outputs: a single output, the Q-value
Hidden layers: N layers of M neurons
Exploration method: random, 0 < rand() < propExplore
At each learning iteration, using the following formula, I calculate a Q-target value and then calculate an error using
error = QTarget - LastQValueReturnedFromNN
and backpropagate the error through the neural network.
Q1. Am I on the right track? I have seen some papers that implement an NN with one output neuron for each action.
Q2. My reward function returns a number between -1 and 1. Is it OK to return a number between -1 and 1 when the activation function is a sigmoid, whose range is (0, 1)?
Q3. From my understanding of this method, given enough training instances it should be guaranteed to find an optimal policy, right? When training for XOR it sometimes learns after 2k iterations, and sometimes it won't learn even after 40k or 50k iterations.
Q1. It is more efficient if you put all action neurons in the output. A single forward pass will give you all the q-values for that state. In addition, the neural network will be able to generalize in a much better way.
Q2. Sigmoid is typically used for classification. While you can use sigmoid in other layers, I would not use it in the last one.
Q3. Well... Q-learning with neural networks is famous for not always converging. Have a look at DQN (DeepMind). They solve two important issues. First, they decorrelate the training data by using experience replay; stochastic gradient descent doesn't like it when training data is given in a correlated order. Second, they bootstrap using old weights (a separate target network). That way they reduce non-stationarity.
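As a rough sketch of the Q1/Q3 points (one output per action, one-step Q-learning target and TD error), assuming a hypothetical q_network callable and a discount factor GAMMA:

import numpy as np

GAMMA = 0.9  # assumed discount factor

def q_learning_error(q_network, state, action, reward, next_state, done):
    # One forward pass gives the Q-values for all actions of this state.
    q_values = q_network(state)
    target = reward if done else reward + GAMMA * np.max(q_network(next_state))
    return target - q_values[action]  # backpropagate this error for `action` only

# Dummy stand-in for the real network, just to make the sketch runnable.
dummy_q_network = lambda state: np.array([0.2, -0.1, 0.4])
print(q_learning_error(dummy_q_network, state=None, action=1,
                       reward=0.5, next_state=None, done=False))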

Artificial Neural Network R^4 to R^2 example

Given a target function f: R^4 -> R^2, can you draw me (give me an example of) an artificial neural network, let's say with two layers and 3 nodes in the hidden layer?
Now, I think I understand how an ANN works when the function is something like [0,1]^5 -> [0,1], but I am not quite sure how to do an example from R^4 to R^2.
I am new to machine learning, and it's a little bit difficult to catch up with all these concepts.
Thanks in advance.
First, you need two neurons in the output layer. Each neuron would correspond to one dimension of your output space.
Neurons in the output layer don't need an activation function that limits their values to the [0,1] interval (e.g. the logistic function). Even if you scale your output space into the interval [0,1], don't use a sigmoid function for the output activation.
Although your original data are not in [0,1]^4, you should do some preprocessing to scale and shift them to have mean zero and variance 1. You must apply the same preprocessing to all your examples (training and test).
This should give you something to build up on.
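To make that concrete, here is a minimal sketch of such a 4-3-2 network as a forward pass (the tanh hidden activation, linear outputs, and random weights are assumptions for illustration):

import numpy as np

def ann_r4_to_r2(x, W1, b1, W2, b2):
    # Two layers: 4 inputs -> 3 hidden neurons (tanh) -> 2 linear output neurons.
    hidden = np.tanh(W1 @ x + b1)  # W1: (3, 4)
    return W2 @ hidden + b2        # W2: (2, 3); no squashing on the output layer

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
x = np.array([0.2, -1.3, 0.5, 2.1])     # a point in R^4
print(ann_r4_to_r2(x, W1, b1, W2, b2))  # a point in R^2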
