CNN Regression on Grid - Limitation of Convolutional Neural Networks? - machine-learning

I'm working on a (high energy physics related) problem using CNNs.
For understanding the problem, let's consider these examples here.
The left-hand side is the input to the CNN, the right-hand side the desired output. So the network is supposed to cluster the input. The actual algorithm behind this clustering (i.e. how we got the desired output for training) is really complex and we want the CNN to learn this.
I've tried different CNN architectures, for example one similar to the U-net architecture (https://arxiv.org/abs/1505.04597) but also various concatenations of convolutional layers, etc.
The outputs are always really similar (for all architectures).
Here you can see some CNN predictions.
In principle the network is performing quite well, but as you can see, in most cases the CNN output consists of several filled pixels that are directly next to each other, which will never (!) happen in the true cases.
I've been using mean squared error as the loss function in all of the networks.
Do you have any suggestions how one could avoid this problem and improve the networks performance?
Or is this a general limitation to CNNs and in practice it is not possible to solve such a problem using CNNs?
Thank you very much!

My suggestion would be to split up the work. First use a U-Shaped NN to find the activations in a binary segmentation task (like in your paper) and then regress on the found activations to find their final values. In my experience this works way better than doing regression on large images, because the MSE will result in blurry outputs, as you have observed.

The CNN does not know that you wanted a sharp result. As mentioned by #Thomas, MSE tends to give you blurry result as it is the nature of that loss function. Giving a blurry result does not introduce large loss in MSE.
An easy modification would be to use L1 Loss (absolute difference instead of squared error). It has a constant gradient unlike MSE whose gradient decreases with error.
If you really wanted a sharp result, it would be easier to add a manual step -- non maximum suppression (NMS). In practice, a 3x3 box-max filter might do.

Related

Reducing pixels in large data set (sklearn)

Im currently working on a classification project but I'm in doubt about how I should start off.
Goal
Accurately classifying pictures of size 80*80 (so 6400 pixels) in the correct class (binary).
Setting
5260 training samples, 600 test samples
Question
As there are more pixels than samples, it seems logic to me to 'drop' most of the pixels and only look at the important ones before I even start working out a classification method (like SVM, KNN etc.).
Say the training data consists of X_train (predictors) and Y_train (outcomes). So far, I've tried looking at the SelectKBest() method from sklearn for feature extraction. But what would be the best way to use this method and to know how many k's I've actually got to select?
It could also be the case that I'm completely on the wrong track here, so correct me if I'm wrong or suggest an other approach to this if possible.
You are suggesting to reduce the dimension of your feature space. That is a method of regularization to reduce overfitting. You haven't mentioned overfitting is an issue so I would test that first. Here are some things I would try:
Use transfer learning. Take a pretrained network for image recognition tasks and fine tune it to your dataset. Search for transfer learning and you'll find many resources.
Train a convolutional neural network on your dataset. CNNs are the go-to method for machine learning on images. Check for overfitting.
If you want to reduce the dimensionality of your dataset, resize the image. Going from 80x80 => 40x40 will reduce the number of pixels by 4x, assuming your task doesn't depend on fine details of the image you should maintain classification performance.
There are other things you may want to consider but I would need to know more about your problem and its requirements.

Perceptron and shape recognition

I recently implemented a simple Perceptron. This type of perceptron (composed of only one neuron giving binary information in output) can only solve problems where classes can be linearly separable.
I would like to implement a simple shape recognition in images of 8 by 8 pixels. I would like for example my neural network to be able to tell me if what I drawn is a circle, or not.
How to know if this problem has classes being linearly separable ? Because there is 64 inputs, can it still be linearly separable ? Can a simple perceptron solve this kind of problem ? If not, what kind of perceptron can ? I am a bit confused about that.
Thank you !
This problem, in a general sense, can not be solved by a single layer perception. In general other network structures such as convolutional neural networks are best for solving image classification problems, however given the small size of your images a multilayer perception may be sufficient.
Most problems are linearly separable, but not necessarily in 2 dimensions. Adding extra layers to a network allows it to transform data in higher dimensions so that it is linearly separable.
Look into multilayer perceptrons or convolutional neural networks. Examples of classification on the MNIST dataset might be helpful as well.

Regularization on indefinitely large training set?

I have an indefinitely large training set to train a neural network.
Does it make any sense in this scenario to use regularization techniques like dropout?
Yes, it probably still does. Dropout is regularization in a sense, but much subtler than something like L1 norm. It prevents excessive co-adaptation of feature detectors as described in the original paper.
You probably don't want the network to learn to depend on just one feature or just a small combo of features, even if that is the best feature in your training set, because it may not be the case in new data. Intuitively, a network with dropout trained to recognize people in images will likely still recognize them if the face is obscured, even if there was no example image like that in the training set (because the face high level feature would have been dropped out some fraction of the time); a network trained without dropout may not (because the face feature is probably one of the best single features for detecting people). You can think of dropout as a certain degree of forced concept generalization.
Empirically, the feature detectors that are produced with dropout are much more structured (eg, for images: closer to Gabor filters, for the first few layers) when dropout is used; without dropout they are closer to random (probably because that network approximates the Gabor filter it is converging towwards using a specific linear combo of random filters, if it can rely on the elements of that combo not being dropped out then there is no gradient towards separating the filters). This is also probably a good thing since it forces features which are independent to be implemented as independent early on, which may result in lower cross-talk later on.

Making neural net to draw an image (aka Google's inceptionism) using nolearn\lasagne

Probably lots of people already saw this article by Google research:
http://googleresearch.blogspot.ru/2015/06/inceptionism-going-deeper-into-neural.html
It describes how Google team have made neural networks to actually draw pictures, like an artificial artist :)
I wanted to do something similar just to see how it works and maybe use it in future to better understand what makes my network to fail. The question is - how to achieve it with nolearn\lasagne (or maybe pybrain - it will also work but I prefer nolearn).
To be more specific, guys from Google have trained an ANN with some architecture to classify images (for example, to classify which fish is on a photo). Fine, suppose I have an ANN constructed in nolearn with some architecture and I have trained to some degree. But... What to do next? I don't get it from their article. It doesn't seem that they just visualize the weights of some specific layers. It seems to me (maybe I am wrong) like they do one of 2 things:
1) Feed some existing image or purely a random noise to the trained network and visualize the activation of one of the neuron layers. But - looks like it is not fully true, since if they used convolution neural network the dimensionality of the layers might be lower then the dimensionality of original image
2) Or they feed random noise to the trained ANN, get its intermediate output from one of the middlelayers and feed it back into the network - to get some kind of a loop and inspect what neural networks layers think might be out there in the random noise. But again, I might be wrong due to the same dimensionality issue as in #1
So... Any thoughts on that? How we could do the similar stuff as Google did in original article using nolearn or pybrain?
From their ipython notebook on github:
Making the "dream" images is very simple. Essentially it is just a
gradient ascent process that tries to maximize the L2 norm of
activations of a particular DNN layer. Here are a few simple tricks
that we found useful for getting good images:
offset image by a random jitter
normalize the magnitude of gradient
ascent steps apply ascent across multiple scales (octaves)
It is done using a convolutional neural network, which you are correct that the dimensions of the activations will be smaller than the original image, but this isn't a problem.
You change the image with iterations of forward/backward propagation just how you would normally train a network. On the forward pass, you only need to go until you reach the particular layer you want to work with. Then on the backward pass, you are propagating back to the inputs of the network instead of the weights.
So instead of finding the gradients to the weights with respect to a loss function, you are finding gradients to inputs with respect to the l2 Normalization of a certain set of activations.

Multilayer perceptron for ocr works with only some data sets

NEW DEVELOPMENT
I recently used OpenCV's MLP implementation to test whether it could solve the same tasks. OpenCV was able to classify the same data sets that my implementation was able to, but unable to solve the one's that mine could not. Maybe this is due to termination parameters (determining when to end training). I stopped before 100,000 iterations, and the MLP did not generalize. This time the network architecture was 400 input neurons, 10 hidden neurons, and 2 output neurons.
I have implemented the multilayer perceptron algorithm, and verified that it works with the XOR logic gate. For OCR I taught the network to correctly classify letters of "A"s and "B"s that have been drawn with a thick drawing untensil (a marker). However when I try to teach the network to classify a thin drawing untensil (a pencil) the network seems to become stuck in a valley and unable to classify the letters in a reasonable amount of time. The same goes for letters I drew with GIMP.
I know people say we have to use momentum to get out of the valley, but the sources I read were vague. I tried increasing a momentum value when the change in error was insignificant and decreasing when above, but it did not seem to help.
My network architecture is 400 input neurons (one for each pixel), 2 hidden layers with 25 neurons each, and 2 neurons in the output layer. The images are gray scale images and the inputs are -0.5 for a black pixel and 0.5 for a white pixel.
EDIT:
Currently the network is trainning until the calculated error for each trainning example falls below an accepted error constant. I have also tried stopping trainning at 10,000 epochs, but this yields bad predictions. The activation function used is the sigmoid logistic function. The error function I am using is the sum of the squared error.
I suppose I may have reached a local minimum rather than a valley, but this should not happen repeatedly.
Momentum is not always good, it can help the model to jump out of the a bad valley but may also make the model to jump out of a good valley. Especially when the previous weights update directions is not good.
There are several reasons that make your model not work well.
The parameter are not well set, it is always a non-trivial task to set the parameters of the MLP.
An easy way is to first set the learning rate, momentum weight and regularization weight to a big number, but to set the iteration (or epoch) to a very large weight. Once the model diverge, half the learning rate, momentum weight and regularization weight.
This approach can make the model to slowly converge to a local optimal, and also give the chance for it to jump out a bad valley.
Moreover, in my opinion, one output neuron is enough for two class problem. There is no need to increase the complexity of the model if it is not necessary. Similarly, if possible, use a three-layer MLP instead of a four-layer MLP.

Resources