Which layers in a neural network use activation functions? - machine-learning

Does the input layer of a neural network use activation functions, or is it just the hidden and output layers?

Here's a figure of a 3-layer neural network from Stanford's class on NN for visual recognition.
The two hidden layers will always have an activation function. The output layer may or may not have an activation function: for binary classification, you might have a sigmoid function to squash your output, but for regression, you typically will not have an activation function.
For clarity, the hidden layers compute:
output = activation_function(W * inputs + b)
As you probably know, the activation_function() may be a sigmoid, tanh, ReLU, or other.
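As a minimal sketch (NumPy, with made-up layer sizes and a sigmoid chosen only for illustration), a hidden layer's forward pass looks like this:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Hypothetical sizes: 3 inputs feeding a hidden layer of 4 neurons.
    inputs = np.array([0.5, -1.2, 3.0])
    W = np.random.randn(4, 3)   # one row of weights per hidden neuron
    b = np.zeros(4)             # one bias per hidden neuron

    hidden_output = sigmoid(W @ inputs + b)   # activation_function(W * inputs + b)
    print(hidden_output)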

Hidden and output layer neurons have activation functions, but input layer neurons do not: the input layer just receives the input values and passes them on, each multiplied by its own weight. An activation function performs a transformation on the input a neuron receives.

Related

Get original Image from flattened tensor/numpy array

In a convolutional neural network, can we recover the original image matrix (i.e. the input layer) from the first flattened layer? My current understanding is that it is not possible to get the image back, but is there any way we could?
If you pass an input through a neural network, and retain the activations in some later layer of the network, it generally won't be possible to use those activations alone to recover the input. That is, for some input X, and the function f that maps the input to the activity in the later layer, f is typically not invertible. For an ordinary convolutional neural network, f would be the composition of convolutions, pooling operations, non-linear activation functions, and matrix multiplications.
Procedures exist that can recover an input X' for which f(X') is close to f(X).
Feature Visualization on distill.pub covers relevant techniques.
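A rough sketch of such a procedure (assuming PyTorch, with `feature_extractor` as a hypothetical stand-in for f): start from noise and optimize the input so its activations match the target. In practice some regularization of X' is usually added so the result looks like a natural image.

    import torch

    def invert_features(feature_extractor, target_activations, input_shape,
                        steps=500, lr=0.05):
        """Optimize a random input x' so that f(x') approaches f(x)."""
        x_prime = torch.randn(input_shape, requires_grad=True)
        optimizer = torch.optim.Adam([x_prime], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(
                feature_extractor(x_prime), target_activations)
            loss.backward()
            optimizer.step()
        return x_prime.detach()

    # Usage sketch:
    # target = feature_extractor(original_image).detach()
    # reconstruction = invert_features(feature_extractor, target, original_image.shape)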

Neural Networks, Linear and Logistic Regression

Are logistic and linear regression special cases of a neural network?
Please indicate if I can take this statement as correct.
A neural network can be configured to perform logistic regression or linear regression.
In either case, the neural network has exactly one trainable layer (the output layer), and that layer has exactly one neuron (the operator performing the W * x + b affine calculation and the activation). They differ in their activation function.
For logistic regression, there is a sigmoid activation function at the output layer, producing a floating point number in the range [0.0, 1.0]. You can make a binary decision by applying a threshold of 0.5 to the value.
For linear regression, there is typically no activation function at the output layer, so you get an unbounded floating point number.
In general, you can add hidden layers into your neural network (to add nonlinearity and more learning capacity) and still perform binary classification and regression so long as the output layer activation is configured as written above.
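A small sketch (NumPy, with hypothetical weights and one 3-feature example) of the single-neuron view of both models:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([1.0, 2.5, -0.3])   # one example with 3 features
    w = np.array([0.4, -0.1, 0.7])   # weights of the single output neuron
    b = 0.2                          # bias of the single output neuron

    z = w @ x + b                    # the affine part W * x + b, shared by both

    linear_regression_output = z                # no activation: unbounded value
    logistic_regression_output = sigmoid(z)     # sigmoid: value in (0.0, 1.0)
    predicted_class = int(logistic_regression_output > 0.5)   # threshold at 0.5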

Neural Network Developing

I am trying to write a neural network class, but I don't fully understand some aspects of it. I have two questions about the following design.
Am I doing this correctly? Does the bias neuron need to connect to all neurons (except those in the input layer), or just to those in the hidden layer?
My second question is about calculating the output value. I'm using the equation below to calculate the output value of the neurons.
HiddenLayerFirstNeuron.Value =
    (input1.Value * weight1) + (input2.Value * weight2) + (Bias.Value * biasWeight)
After this equation, I calculate the activation and send the result as the output. The output neurons do the same.
I'm not sure whether what I'm doing is right, and I want to clear up these problems.
Take a look at http://deeplearning.net/tutorial/contents.html. It explains everything you need to know about multi-layer perceptrons using Theano (a symbolic math library).
The bias is usually connected to all hidden and output units.
Yes, you compute the input to the activation function as the sum of weight * output over the previous layer's neurons.
Good luck with development ;)
There should be a separate bias neuron for each hidden layer and for the output layer. Think of a layer as a function applied to a first-order polynomial, such as f(m*x + b) = y, where y is your output and f is your activation function. If you look at the linear term you will recognize b. This represents the bias, and it behaves in a neural network much as it does in this simplification: it shifts the hyperplane up and down in the space. Keep in mind that you will have one bias per layer, connected to all neurons of that layer, as in f(w1*x1 + ... + wn*xn + b), where the bias neuron's value is held constant at 1. When it comes to gradient descent, you train the bias weight like a normal weight.
In my opinion you should apply the activation function to the output layer as well. This is how it's usually done with multilayer perceptrons, but it actually depends on what you want. If, for example, you use the logistic function as the activation function and you want an output in the interval (0, 1), then you have to apply the activation function to the output as well, since a plain linear combination, as in your example, can go outside the boundaries of that interval.
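To make the two answers concrete, here is a minimal sketch (NumPy, with made-up sizes) of a layer that treats the bias as a constant neuron with value 1 and its own trainable weights; it is equivalent to the usual activation(W * x + b):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.3, -0.7])           # outputs of the previous layer (2 neurons)
    W = np.random.randn(3, 2)           # weights of a hidden layer with 3 neurons
    bias_weights = np.random.randn(3)   # one weight per neuron for the bias neuron
    bias_value = 1.0                    # the bias neuron's value is held at 1

    hidden = sigmoid(W @ x + bias_weights * bias_value)

    # This equals sigmoid(W @ x + b) with b = bias_weights, so the bias weights
    # are trained by gradient descent just like any other weight.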

Neural network (non) linearity

I am somewhat confused by the use of the term linear/non-linear when discussing neural networks. Can anyone clarify these 3 points for me:
Each node in a neural net computes a weighted sum of its inputs, i.e. a linear combination of the inputs. So the value of each node (ignoring activation) is given by some linear function. I hear that neural nets are universal function approximators. Does this mean that, despite containing linear functions within each node, the whole network is able to approximate non-linear functions as well? Are there any clear examples of how this works in practice?
An activation function is applied to the output of that node to squash/transform the output for further propagation through the rest of the network. Am I correct in interpreting this output from the activation function as the "strength" of that node?
Activation functions are also referred to as non-linear functions. Where does the term non-linear come from, given that the input to the activation is the result of a linear combination of the node's inputs? I assume it refers to the idea that something like the sigmoid function is itself a non-linear function? Why does it matter that the activation is non-linear?
1 Linearity
A neural network is only non-linear if you squash the output signal from the nodes with a non-linear activation function. A complete neural network (with non-linear activation functions) is an arbitrary function approximator.
Bonus: It should be noted that if you are using linear activation functions in multiple consecutive layers, you could just as well have pruned them down to a single layer due to them being linear. (The weights would be changed to more extreme values). Creating a network with multiple layers using linear activation functions would not be able to model more complicated functions than a network with a single layer.
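A quick sketch (NumPy, random weights, biases omitted for brevity) of why consecutive linear layers collapse into a single one:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(4)
    W1 = rng.standard_normal((5, 4))   # first layer, linear activation
    W2 = rng.standard_normal((3, 5))   # second layer, linear activation

    two_layers = W2 @ (W1 @ x)         # two consecutive linear layers
    one_layer = (W2 @ W1) @ x          # one layer with the merged weight matrix

    print(np.allclose(two_layers, one_layer))   # True: same function, one layer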
2 Activation signal
The squashed output signal could very well be interpreted as the strength of that signal (biologically speaking), though it might be incorrect to interpret the output strength as an equivalent of confidence, as in fuzzy logic.
3 Non-linear activation functions
Yes, you are spot on. The input signals combined with their respective weights form a linear combination. The non-linearity comes from your choice of activation function. Remember that a linear function can be drawn as a line; sigmoid, tanh, ReLU and so on cannot be drawn as a single straight line.
Why do we need non-linear activation functions?
Most functions and classification tasks are probably best described by non-linear functions. If we decided to use linear activation functions, we would end up with a much coarser approximation of a complex function.
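A tiny concrete sketch (NumPy, weights chosen by hand): a hidden layer of two ReLU neurons can represent the non-linear function |x| exactly, which no purely linear network can do:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    W1 = np.array([[1.0], [-1.0]])   # hidden layer: neurons computing x and -x
    W2 = np.array([[1.0, 1.0]])      # output layer: sum of the hidden activations

    def net(x):
        h = relu(W1 @ np.array([x]))   # non-linear hidden layer
        return (W2 @ h).item()         # linear output layer

    for x in (-2.0, -0.5, 0.0, 1.5):
        print(x, net(x))               # prints x alongside |x|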
Universal approximators
You can sometimes read in papers that neural networks are universal approximators. This implies that a "perfect" network could be fitted to any model/function you could throw at it, though configuring the perfect network (#nodes and #layers ++) is a non-trivial task.
Read more about the implications at this Wikipedia page.

Artificial Neural Network R^4 to R^2 example

Given a target function f: R^4 -> R^2, can you draw me (or give me an example of) an artificial neural network, let's say with two layers and 3 nodes in the hidden layer?
Now, I think I understand how an ANN works when the function is something like [0,1]^5 -> [0,1], but I am not quite sure how to build an example from R^4 to R^2.
I am new to machine learning, and it's a little bit difficult to catch up with all these concepts.
Thanks in advance.
First, you need two neurons in the output layer. Each neuron would correspond to one dimension of your output space.
Neurons in the output layer don't need an activation function that limits their values to the [0, 1] interval (e.g. the logistic function). Even if you scale your output space to the interval [0, 1], don't use a sigmoid function for the output activation.
Although your original data are not in [0,1]^4, you should do some preprocessing to scale and shift them so they have zero mean and unit variance. You must apply the same preprocessing to all your examples (training and test).
This should give you something to build up on.
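A minimal sketch (NumPy, random weights, tanh in the hidden layer as an illustrative choice) of a network matching the question: 4 inputs, one hidden layer of 3 nodes, and 2 linear output neurons:

    import numpy as np

    rng = np.random.default_rng(42)

    # R^4 -> R^2: 4 inputs, 3 hidden neurons, 2 output neurons.
    W1, b1 = rng.standard_normal((3, 4)), np.zeros(3)   # hidden layer parameters
    W2, b2 = rng.standard_normal((2, 3)), np.zeros(2)   # output layer parameters

    def forward(x):
        h = np.tanh(W1 @ x + b1)   # hidden layer with a non-linear activation
        return W2 @ h + b2         # linear output layer: unbounded values in R^2

    # In practice x would first be standardized to zero mean and unit variance.
    x = np.array([0.1, -2.0, 0.7, 3.4])    # a point in R^4
    print(forward(x))                       # a point in R^2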
