What is learned in a convolutional network?

In a convolutional net (CNN), someone told me that the filters are initialized randomly.
I'm fine with that, but when gradient descent runs, what is actually being learned: the feature maps or the filters?
My intuition is that the filters are what is learned, because they need to recognize complex things.
But I would like to be sure about this.

In the context of convolutional neural networks, kernel = filter = feature detector.
Here is a great illustration from Stanford's deep learning tutorial (also nicely explained by Denny Britz).
The filter is the yellow sliding window; its values are the filter's weights.
The feature map is the pink matrix. Its values depend on both the filter and the image, so it doesn't make sense to learn the feature map. Only the filter is learned when the network is trained. The network may have other weights to be trained as well.

As aleju said, the filter weights are learned. Feature maps are the outputs of the convolutional layers. Besides the convolutional filter weights, there are also the weights of fully connected (and other types of) layers.
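As a small, hedged illustration (the layer sizes and the 3x32x32 input below are made up), listing the parameters of a tiny PyTorch model shows what gradient descent actually updates: the conv filter weights and biases plus the fully connected weights and biases. Feature maps never appear among the parameters, because they are computed from the input.

```python
import torch.nn as nn

# Hypothetical tiny CNN, assuming 3x32x32 inputs, just to inspect which tensors are trainable.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),   # filter weights: shape (8, 3, 3, 3), plus 8 biases
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),       # fully connected weights and biases
)

for name, p in model.named_parameters():
    # Only filter and fully connected weights/biases are listed here;
    # feature maps are activations, not parameters.
    print(name, tuple(p.shape))
```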

Related

How are filters in a convolutional neural network trained through backpropagation?

I'm trying to implement a convolutional neural network from scratch. The problem is that I don't understand how filters are learned in CNNs.
I implemented a feedforward neural network from scratch before, and I understand how backpropagation works for it. I also understand the basic CNN architecture. But how do I compute the updates for the filters?
I don't want to use libraries like TensorFlow because I want to understand the concepts behind all of this.
Backpropagation through a CNN is in principle the same as through a feed-forward layer. You can imagine the convolution as a sliding window that applies the same feed-forward layer to every window of the input. (You just take all the values inside the window and arrange them in a single long vector.)
You can compute the gradients of the parameters independently for each window. Because the same parameters are applied in every window, you sum the gradients from all windows and use that sum to update the parameters of the filter, as in the sketch below.
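Here is a minimal NumPy sketch of that idea for a single 2-D input and one PxP filter with stride 1 and no padding; the function names and shapes are illustrative, not taken from any library.

```python
import numpy as np

def conv2d_forward(X, K):
    """Slide the PxP filter K over the 2-D input X (stride 1, no padding)."""
    H, W = X.shape
    P = K.shape[0]
    Y = np.zeros((H - P + 1, W - P + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            window = X[i:i + P, j:j + P]     # current window of the input
            Y[i, j] = np.sum(window * K)     # same filter applied to every window
    return Y

def conv2d_filter_grad(X, dY, P):
    """Gradient of the loss w.r.t. the filter, given dL/dY for each output position."""
    dK = np.zeros((P, P))
    for i in range(dY.shape[0]):
        for j in range(dY.shape[1]):
            # The same weights were used in every window,
            # so the per-window gradients are simply summed.
            dK += dY[i, j] * X[i:i + P, j:j + P]
    return dK
```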
P.S. It can be a good exercise to write the backpropagation yourself, but you will hardly reach the efficiency of the frameworks.

Back Propagation in Convolutional Neural Networks and how to update filters

I'm learning about convolutional neural networks and right now I'm confused about how to implement them.
I know about regular neural networks and concepts like gradient descent and backpropagation, and I can understand intuitively how CNNs work.
My question is about backpropagation in CNNs. How does it happen? The last fully connected layers are a regular neural network, and there is no problem with that. But how can I update the filters in the convolutional layers? How can I backpropagate the error from the fully connected layers to these filters? My problem is updating the filters!
Are filters just simple matrices, or do they have structures like regular NNs, with connections between layers providing that capability? I read about sparse connectivity and shared weights, but I can't relate them to CNNs. I'm really confused about implementing CNNs and I can't find any tutorials that talk about these concepts. I can't read papers because I'm new to these things and my math is not good.
I don't want to use TensorFlow or tools like that; I'm learning the main concepts and using pure Python.
First off, I can recommend this introduction to CNNs; maybe it will help you grasp the idea better.
To answer some of your questions in short:
Let's say you want to use a CNN for image classification. The picture consists of NxM pixels and has 3 channels (RGB). To apply a convolutional layer to it, you use a filter. Filters are (usually, but not necessarily) square, e.g. PxP, and have as many channels as the representation they are applied to. Therefore the filter of the first conv layer also has 3 channels; the channels are the depth of the filter, so to speak.
When applying a filter to a picture, you perform a discrete convolution. You take your filter (which is usually smaller than your image), slide it over the picture step by step, and at each position compute an element-wise product between the filter and the image patch and sum the result. Then you apply an activation function and maybe a pooling layer. The important point is that the same filter is used at every position on this layer, so you only have P*P weights per channel of the filter (plus a bias), no matter how large the image is. You tweak the filter so that it fits the training data as well as possible; that's why its parameters are called shared weights. When applying gradient descent, you simply apply it to those filter weights. A small sketch of such a convolution is shown below.
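As a hedged sketch (the shapes and names are made up for illustration), a single PxPxC filter applied to an NxMxC image with stride 1 and no padding looks like this; note that the output is one 2-D feature map and the filter weights are reused at every position.

```python
import numpy as np

def conv2d_multichannel(image, filt):
    """Apply one PxPxC filter to an NxMxC image (stride 1, no padding)."""
    N, M, C = image.shape
    P = filt.shape[0]
    out = np.zeros((N - P + 1, M - P + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + P, j:j + P, :]   # PxPxC window of the image
            out[i, j] = np.sum(patch * filt)     # element-wise product, then sum
    return out

rgb = np.random.rand(16, 16, 3)            # toy RGB "image"
k = np.random.rand(3, 3, 3)                # one 3x3 filter with 3 channels
feature_map = conv2d_multichannel(rgb, k)  # 14x14 feature map, a single channel
```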
Also, you can find a nice demo of convolutions here.
Implementing these things yourself is certainly possible, but for starting out you could try TensorFlow for experimenting. At least that's the way I learn new concepts :)

Perceptron and shape recognition

I recently implemented a simple perceptron. This type of perceptron (composed of only one neuron giving a binary output) can only solve problems where the classes are linearly separable.
I would like to implement simple shape recognition on images of 8 by 8 pixels. For example, I would like my neural network to be able to tell me whether what I drew is a circle or not.
How can I know whether the classes of this problem are linearly separable? Since there are 64 inputs, can it still be linearly separable? Can a simple perceptron solve this kind of problem? If not, what kind of perceptron can? I am a bit confused about that.
Thank you!
This problem, in a general sense, cannot be solved by a single-layer perceptron. In general, other network structures such as convolutional neural networks are best for image classification problems; however, given the small size of your images, a multilayer perceptron may be sufficient.
Many problems are not linearly separable in their original input space. Adding extra layers to a network allows it to transform the data into a higher-dimensional representation in which the classes become linearly separable.
Look into multilayer perceptrons or convolutional neural networks. Examples of classification on the MNIST dataset might be helpful as well.
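As a hedged sketch of that suggestion (the data below is random noise standing in for real 8x8 drawings, and the hidden layer size is arbitrary), a small multilayer perceptron on the 64 flattened pixels could look like this with scikit-learn:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder data: 200 flattened 8x8 images and binary labels
# (1 = "circle", 0 = "not a circle"). Replace with your real drawings.
X = np.random.rand(200, 64)
y = np.random.randint(0, 2, size=200)

# One hidden layer already lets the network learn non-linear decision
# boundaries that a single-neuron perceptron cannot represent.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)
clf.fit(X, y)
print(clf.predict(X[:5]))
```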

Applying Neural network for doing Image recognition

How is image recognition done by a neural network after Canny edge detection of the image? I'm not looking for code; I want to know how neural networks actually work in order to match an image against a set of images.
What should be considered for the input layer, the hidden layers, etc.?
This question is really broad. The main reason why neural networks do such a great job at this task is that they take advantage of some intrinsic image properties and invariances, as well as computational advances that make the task tractable:
Hierarchical structure: faces consist of eyes, a mouth, ears, etc. Eyes consist of a certain set of shapes, which consist of certain kinds of edges, lines, and so on. There is a hierarchy of shapes and structures that is used for image recognition, and this is why deep, stacked neural networks are so good at the task: this hierarchy is encoded in the structure of the network.
Geometric invariances: if you move a car from the left corner of an image to the right corner, you still have an image of a car. This property is one reason for the success of a particular kind of neural network, the convolutional one. This kind of topology makes use of such invariances, which makes learning easier and more powerful (a small demonstration follows after this list).
Increased computational power: today's convolutional neural networks are designed in a way that makes the computations easy to parallelize. Modern GPU architectures also make learning really fast, sometimes up to 10x faster than classical CPU implementations.
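As a small, hedged demonstration of the translation point (the filter and image below are toy values), shifting the input of a convolution simply shifts the resulting feature map, so the same filter can detect a pattern wherever it appears:

```python
import numpy as np
from scipy.signal import correlate2d   # cross-correlation, i.e. the CNN-style "convolution"

img = np.zeros((8, 8))
img[2, 2] = 1.0                          # a tiny "feature" at position (2, 2)
shifted = np.roll(img, shift=3, axis=1)  # the same feature, moved 3 pixels to the right

edge_filter = np.array([[1.0, -1.0]])    # toy horizontal edge detector

a = correlate2d(img, edge_filter, mode='valid')
b = correlate2d(shifted, edge_filter, mode='valid')
# b is just a shifted by 3 columns: the filter's response moves with the input.
```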
You can read a detailed explanation here.
Your question is really very broad. Also, from this line of your question, "How is image recognition done by a neural network after Canny edge detection of the image?", it can be inferred that you are new to neural networks and deep learning. Neural networks do not specifically perform Canny edge detection.
I recommend that before jumping into convolutional neural networks (CNNs), you understand some basics of neural networks. This way you will be able to appreciate CNN concepts later. The CS231n course could be a very good starting point for you, as the basics of neural networks are also covered there.
It is really hard to write a specific answer to such a broad question. Let me know if you have more specific questions.

Making a neural net draw an image (aka Google's inceptionism) using nolearn/lasagne

Probably lots of people have already seen this article by Google Research:
http://googleresearch.blogspot.ru/2015/06/inceptionism-going-deeper-into-neural.html
It describes how the Google team got neural networks to actually draw pictures, like an artificial artist :)
I wanted to do something similar just to see how it works, and maybe use it in the future to better understand what makes my network fail. The question is: how can I achieve this with nolearn/lasagne (or maybe pybrain; that would also work, but I prefer nolearn)?
To be more specific, the people at Google trained an ANN with some architecture to classify images (for example, to classify which fish is in a photo). Fine, suppose I have an ANN constructed in nolearn with some architecture and I have trained it to some degree. But... what do I do next? I don't get it from their article. It doesn't seem that they just visualize the weights of some specific layers. It seems to me (maybe I am wrong) like they do one of two things:
1) Feed some existing image, or pure random noise, to the trained network and visualize the activations of one of the layers. But that doesn't look entirely right, because if they used a convolutional neural network, the dimensionality of the layers might be lower than the dimensionality of the original image.
2) Or they feed random noise to the trained ANN, take the intermediate output of one of the middle layers, and feed it back into the network to create some kind of loop and inspect what the network's layers think might be in the random noise. But again, I might be wrong due to the same dimensionality issue as in #1.
So... any thoughts on that? How could we do something similar to what Google did in the original article using nolearn or pybrain?
From their IPython notebook on GitHub:
Making the "dream" images is very simple. Essentially it is just a
gradient ascent process that tries to maximize the L2 norm of
activations of a particular DNN layer. Here are a few simple tricks
that we found useful for getting good images:
offset image by a random jitter
normalize the magnitude of gradient
ascent steps apply ascent across multiple scales (octaves)
It is done using a convolutional neural network, and you are correct that the dimensions of the activations will be smaller than those of the original image, but this isn't a problem.
You change the image with iterations of forward/backward propagation, just as you would when normally training a network. On the forward pass, you only need to go until you reach the particular layer you want to work with. On the backward pass, you propagate back to the inputs of the network instead of to the weights.
So instead of computing the gradient of a loss function with respect to the weights, you compute the gradient of the L2 norm of a certain set of activations with respect to the inputs.
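Here is a hedged sketch of that gradient-ascent-on-the-input loop. It uses PyTorch rather than nolearn/lasagne purely to keep the example short; the layer index, step size, and model choice are arbitrary, and the real DeepDream code adds the jitter/octave tricks quoted above.

```python
import torch
import torchvision.models as models

# Any trained conv net will do; here we grab a pretrained VGG16 feature stack.
net = models.vgg16(pretrained=True).features.eval()
for p in net.parameters():
    p.requires_grad_(False)           # we only optimize the image, never the weights

target_layer = 10                     # arbitrary: which layer's activations to amplify

img = torch.rand(1, 3, 224, 224, requires_grad=True)   # start from random noise
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(100):
    optimizer.zero_grad()
    x = img
    for i, layer in enumerate(net):
        x = layer(x)
        if i == target_layer:         # forward only up to the chosen layer
            break
    loss = -x.norm()                  # negative L2 norm => gradient *ascent* on the image
    loss.backward()                   # gradients flow back to the input pixels
    optimizer.step()                  # update the image, not the network
```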
