Get original Image from flattened tensor/numpy array - machine-learning

In a Convolutional Neural Network, can we get the original image matrix (i.e. the input layer) back from the first flattened layer? My basic understanding is that it is not possible to recover the image, but is there any way we could?

If you pass an input through a neural network, and retain the activations in some later layer of the network, it generally won't be possible to use those activations alone to recover the input. That is, for some input X, and the function f that maps the input to the activity in the later layer, f is typically not invertible. For an ordinary convolutional neural network, f would be the composition of convolutions, pooling operations, non-linear activation functions, and matrix multiplications.
Procedures exist that can recover an input X' for which f(X') is close to f(X).
Feature Visualization on distill.pub covers relevant techniques.
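As a rough sketch of that idea (not the exact procedure from the article), one can treat reconstruction as an optimization problem: start from a random image and adjust it by gradient descent until its activations match f(X). The toy PyTorch model below is a made-up stand-in for f:

import torch
import torch.nn as nn

# Hypothetical feature extractor standing in for f: conv -> ReLU -> pool -> flatten.
f = nn.Sequential(
    nn.Conv2d(1, 3, kernel_size=5),   # 28x28 -> three 24x24 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                  # -> three 12x12 maps
    nn.Flatten(),                     # -> 432-long flattened layer
)

x = torch.rand(1, 1, 28, 28)          # the "original" image
target = f(x).detach()                # the flattened activations we try to match

x_hat = torch.rand(1, 1, 28, 28, requires_grad=True)
opt = torch.optim.Adam([x_hat], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = ((f(x_hat) - target) ** 2).mean()   # distance between activations
    loss.backward()
    opt.step()
# x_hat now produces activations close to target, but it need not equal x.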

Related

How do Convolutional neural networks proceed after the pooling step?

I am trying to learn about convolutional neural networks, but I am having trouble understanding what happens in the network after the pooling step.
So, starting from the left, we have our 28x28 matrix representing our picture. We apply three 5x5 filters to it to get three 24x24 feature maps. We then apply 2x2 max pooling to each feature map to get three 12x12 pooled layers. I understand everything up to this step.
But what happens now? The document I am reading says:
"The final layer of connections in the network is a fully-connected
layer. That is, this layer connects every neuron from the max-pooled
layer to every one of the 10 output neurons. "
The text does not go any further in describing what happens beyond that, and it left me with a few questions.
How are the three pooled layers mapped to the 10 output neurons? By fully connected, does it mean that each neuron in each of the three 12x12 pooled layers has a weight connecting it to every output neuron? So there are 3x12x12x10 weights linking the pooled layer to the output layer? Is an activation function still applied at the output neurons?
Pictures and extract taken from this online resource: http://neuralnetworksanddeeplearning.com/chap6.html
Essentially, the fully connected layer is the part of the network that makes the prediction. If you have ten classes, then the fully connected layer consists of ten neurons, each holding the probability that the classified sample belongs to the class that neuron represents. These probabilities are determined by the convolutional and hidden layers. The output of the pooling layer is simply fed into these ten neurons, providing the final interface for your network to make the prediction. Here's an example. After pooling, your fully connected layer could display this:
(0.1)
(0.01)
(0.2)
(0.9)
(0.2)
(0.1)
(0.1)
(0.1)
(0.1)
(0.1)
Each neuron contains a probability that the sample belongs to its class. In this case, if you are classifying images of handwritten digits and each neuron corresponds to one of the ten digits, then the prediction would be the digit represented by the fourth neuron, the one with probability 0.9. Hope that helps!
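In code, the predicted class is simply the index of the largest output. A minimal numpy sketch, reusing the hypothetical probabilities from the example above:

import numpy as np

probs = np.array([0.1, 0.01, 0.2, 0.9, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1])
prediction = int(np.argmax(probs))   # 3: the fourth neuron has the highest probability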
Yes, you're on the right track. There is a layer with a weight matrix of 4320 entries.
This matrix will typically be arranged as 432x10. This is because these 432 numbers are a fixed-size representation of the input image. At this point, you don't care how you got it (a CNN, a plain feed-forward network, or a crazy RNN going pixel by pixel); you just want to turn the description into a classification. In most toolkits (e.g. TensorFlow, PyTorch or even plain numpy), you'll need to explicitly reshape the 3x12x12 output of the pooling into a 432-long vector. But that's just a rearrangement; the individual elements do not change.
Additionally, there will usually be a 10-long vector of biases, one for every output element.
Finally, about the nonlinearity: since this is about classification, you typically want the 10 output units to represent posterior probabilities that the input belongs to a particular class (digit). For this purpose, the softmax function is used: y = exp(o) / sum(exp(o)), where exp(o) stands for element-wise exponentiation. It guarantees that the output will be a proper categorical distribution, with all elements in [0, 1] and summing to 1. There is a nice and detailed discussion of softmax in neural networks in the Deep Learning book (I recommend reading Section 6.2.1 in addition to the softmax subsection itself).
Also note that this is not specific to convolutional networks at all; you'll find this fully-connected-layer-plus-softmax block at the end of virtually every classification network. You can also view this block as the actual classifier, while everything in front of it (a shallow CNN in your case) is just trying to prepare nice features.
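For concreteness, here is a minimal numpy sketch of that block (random placeholder weights, shapes taken from the example above: three 12x12 pooled maps, ten classes):

import numpy as np

pooled = np.random.rand(3, 12, 12)     # output of the pooling layer
x = pooled.reshape(432)                # just a rearrangement, values unchanged

W = np.random.randn(432, 10) * 0.01    # weight matrix with 4320 entries
b = np.zeros(10)                       # one bias per output unit

o = x @ W + b                          # 10 logits
y = np.exp(o) / np.exp(o).sum()        # softmax: all entries in [0, 1], summing to 1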

Which layers in a neural network use activation functions?

Does the input layer of a neural network use activation functions, or is it just the hidden and output layers?
Here's a figure of a 3-layer neural network from Stanford's class on NN for visual recognition.
The two hidden layers will always have an activation function. The output layer may or may not have an activation function: for binary classification, you might have a sigmoid function to squash your output, but for regression, you typically will not have an activation function.
For clarity, the hidden layers compute:
output = activation_function(W x inputs + b)
As you probably know, the activation_function() may be a sigmoid, tanh, ReLU, or other.
Hidden and output layer neurons have activation functions, but input layer neurons do not: the input layer just receives the inputs and passes each one on, multiplied by its weight. Activation functions perform a transformation on the input they receive.
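A minimal numpy sketch of the figure's 3-layer network (made-up sizes), showing where the activation function is and is not applied:

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

x = np.random.rand(4)                         # input layer: raw inputs, no activation

W1, b1 = np.random.randn(5, 4), np.zeros(5)
h1 = relu(W1 @ x + b1)                        # hidden layer 1: activation applied

W2, b2 = np.random.randn(5, 5), np.zeros(5)
h2 = relu(W2 @ h1 + b2)                       # hidden layer 2: activation applied

W3, b3 = np.random.randn(1, 5), np.zeros(1)
y = W3 @ h2 + b3                              # output layer: linear here (regression case)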

Neural Network Developing

I am trying to write a neural network class, but I don't fully understand some aspects of it. I have two questions about the following design.
Am I doing this correctly? Does the bias neuron need to connect to all neurons (except those in the input layer), or just those in the hidden layer?
My second question is about calculating the output value. I'm using the equation below to calculate the output value of the neurons.
HiddenLayerFirstNeuron.Value =
(input1.Value * weight) + (input2.Value * weight) + (Bias.Value * weight)
After this equation, I calculate the activation and send the result on as the output. The output neurons do the same.
I'm not sure whether what I am doing is right, and I want to clear up these problems.
Take a look at http://deeplearning.net/tutorial/contents.html. It explains everything you need to know about multilayer perceptrons using Theano (a symbolic math library).
The bias is usually connected to all hidden and output units.
Yes, you compute the input to the activation function as the sum of weight * output over the neurons of the previous layer.
Good luck with development ;)
There should be a separate bias neuron for each hidden layer and for the output layer. Think of a layer as a function applied to a first-order polynomial such as f(m*x + b) = y, where y is your output and f(x) is your activation function. If you look at the linear term, you will recognize the b. This represents the bias, and it behaves in a neural network much like it does in this simplification: it shifts the hyperplane up and down in the space. Keep in mind that you will have one bias per layer, connected to all neurons of that layer, so the layer computes f(w1*x1 + ... + wn*xn + b), with the bias neuron's output fixed at 1. When it comes to gradient descent, you will have to train its weight like a normal weight.
In my opinion, you should apply the activation function to the output layer as well. That is how it's usually done with multilayer perceptrons, but it really depends on what you want. If, for example, you use the logistic function as the activation function and want an output in the interval (0,1), then you have to apply the activation function to the output as well, since a plain linear combination, as in your example, can theoretically fall outside that interval.
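As a tiny sketch of the neuron update from the question (made-up numbers), with the bias treated as an extra input fixed at 1 whose weight is trained like any other:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input1, input2 = 0.3, 0.7            # example input values
w1, w2, w_bias = 0.5, -0.2, 0.1      # one weight per connection, including the bias weight
bias = 1.0                           # the bias neuron always outputs 1

net = input1 * w1 + input2 * w2 + bias * w_bias
hidden_first_neuron_value = sigmoid(net)   # activation applied to the weighted sum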

Artificial Neural Network R^4 to R^2 example

Given a target function f: R^4 -> R^2, can you draw me (or give me an example of) an artificial neural network, let's say with two layers and 3 nodes in the hidden layer?
Now, I think I understand how an ANN works when the function is something like [0,1]^5 -> [0,1], but I am not quite sure how to do an example from R^4 to R^2.
I am new to machine learning, and it's a little bit difficult to catch up with all these concepts.
Thanks in advance.
First, you need two neurons in the output layer. Each neuron would correspond to one dimension of your output space.
Neurons in the output layer don't need an activation function that limits their values to the [0,1] interval (e.g. the logistic function). And even if you scale your output space into the interval [0,1], don't use a sigmoid function for activation.
Although your original data are not in [0,1]^4, you should do some preprocessing to scale and shift them so that they have zero mean and unit variance. You must apply the same preprocessing to all your examples (training and test).
This should give you something to build up on.
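A minimal numpy sketch of such a network (random placeholder weights): four inputs, a hidden layer of three tanh units, and two linear output units, one per output dimension:

import numpy as np

x = np.array([1.2, -0.5, 3.0, 0.7])           # a point in R^4 (standardize it in practice)

W1, b1 = np.random.randn(3, 4), np.zeros(3)   # R^4 -> hidden layer of 3 units
h = np.tanh(W1 @ x + b1)

W2, b2 = np.random.randn(2, 3), np.zeros(2)   # hidden layer -> R^2
y = W2 @ h + b2                               # two outputs, no squashing activation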

Graphically, how does the non-linear activation function project the input onto the classification space?

I am having a very hard time visualizing how the activation function actually manages to classify non-linearly separable training data sets.
Why does the activation function (e.g. the tanh function) work for non-linear cases? What exactly happens mathematically when the activation function projects the input to the output? What separates training samples of different classes, and how does this work if one had to plot the process graphically?
I've tried looking through numerous sources, but I just cannot easily grasp what exactly makes the activation function work for classifying training samples in a neural network, and I would like to be able to picture this in my mind.
The mathematical result behind neural networks is the Universal Approximation Theorem. Basically, sigmoidal functions (those that saturate on both ends, like tanh) are smooth, almost-piecewise-constant approximators. The more neurons you have, the better your approximation is.
This picture was taken from the article A visual proof that neural nets can compute any function. Make sure to check that article; it has other examples and interactive applets.
NNs actually create new features at each level by distorting the input space. Non-linear functions allow you to change the "curvature" of the target function, so further layers have a chance to make it linearly separable. If there were no non-linear functions, any combination of linear functions would still be linear, so there would be no benefit from having multiple layers. As a graphical example, consider this animation.
These pictures were taken from this article. Also check out the cool visualization applet there.
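The point about linearity can be checked directly: without a non-linear function between them, two stacked linear layers collapse into a single linear map. A small numpy sketch:

import numpy as np

x = np.random.rand(4)
W1, W2 = np.random.randn(5, 4), np.random.randn(2, 5)

two_layers = W2 @ (W1 @ x)          # no activation in between
one_layer = (W2 @ W1) @ x           # exactly the same map
print(np.allclose(two_layers, one_layer))   # True

with_nonlinearity = W2 @ np.tanh(W1 @ x)    # no longer expressible as a single W @ x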
Activation functions have very little to do with classifying non-linearly separable sets of data.
Activation functions are used as a way to normalize signals at every step in your neural network. They typically have an infinite domain and a finite range. Tanh, for example, has a domain of (-∞,∞) and a range of (-1,1). The sigmoid function maps the same domain to (0,1).
You can think of this as a way of enforcing a comparable scale across all of your learned features at a given layer (a.k.a. feature scaling). Since the input domain is not known beforehand, it's not as simple as regular feature scaling (as in linear regression), and thus activation functions must be used. The effects of the activation function are compensated for when computing errors during back-propagation.
Back-propagation is the process that applies the error to the neural network. You can think of this as a positive reward for the neurons that contributed to a correct classification and a negative reward for the neurons that contributed to an incorrect classification. This contribution is often known as the gradient of the neural network. The gradient is, effectively, a multi-variable derivative.
When back-propagating the error, each individual neuron's contribution to the gradient involves the activation function's derivative evaluated at that neuron's input value. Sigmoid is a particularly convenient function because its derivative is extremely cheap to compute: s'(x) = s(x) * (1 - s(x)), so it can be obtained from the value already computed in the forward pass.
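A short numpy sketch of that property (the sigmoid and its derivative expressed through the forward value):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)    # reuses the forward value, no extra exponentials needed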
Here is an example image (found with a Google image search for "neural network classification") that demonstrates how a neural network's decision regions might be superimposed on top of your data set:
I hope that gives you a relatively clear idea of how neural networks might classify non-linearly separable datasets.
