Variable Input in Feed-Forward Neural Networks - machine-learning

What are the most common strategies for having variable-length input in a feed-forward neural network?
To be more specific, consider the following hypothetical scenario:
I've got a car with four sensors, two on the left (proximity and color) and two on the right (also proximity and color).
There are two actuators (suppose left and right).
I've successfully trained a neural network to correlate two sets of inputs (4 neurons proximity/color) over the set of outputs (2 neurons for direction).
Now the question is, how do I scale it for:
A fixed upper-bound of same type sensors/actuators (say, 50); or even
An arbitrary amount of sensors/actuators?
P.S.: My gut feeling is that I would need some way of composing neural networks, but I don't have the slightest idea of where to start.

The simple solution is to always build vectors of some fixed, maximum number of features, and leave the inactive ones at a default value. The sensible default value is usually zero, esp. if you scale your inputs to the range [-1, 1].
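As a rough illustration of the padding idea, here is a minimal sketch in Python. The names `MAX_SENSORS`, `FEATURES_PER_SENSOR`, `PAD_VALUE` and `pad_readings` are made up for this example, and the bound of 50 is taken from the question; it assumes inputs are already scaled to [-1, 1].

```python
from typing import List, Sequence

MAX_SENSORS = 50          # assumed fixed upper bound (from the question)
FEATURES_PER_SENSOR = 2   # proximity and color
PAD_VALUE = 0.0           # default value for inactive sensor slots


def pad_readings(readings: Sequence[Sequence[float]]) -> List[float]:
    """Flatten per-sensor readings into one fixed-length input vector."""
    if len(readings) > MAX_SENSORS:
        raise ValueError("more sensors than the network was built for")
    flat: List[float] = []
    for reading in readings:
        if len(reading) != FEATURES_PER_SENSOR:
            raise ValueError("each sensor must report proximity and color")
        flat.extend(float(value) for value in reading)
    # Fill the unused slots with the default value.
    flat.extend([PAD_VALUE] * (MAX_SENSORS * FEATURES_PER_SENSOR - len(flat)))
    return flat
```

A car with only two active sensors would then still produce a 100-element vector, with the last 96 entries set to zero.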

Related

In a feedforward neural network, am I able to put in a feature input of "don't care"?

I've created a feedforward neural network using DL4J in Java.
Hypothetically and to keep things simple, assume this neural network is a binary classifier of squares and circles.
The input, a feature vector, would be composed of say... 5 different variables:
[number_of_corners,
number_of_edges,
area,
height,
width]
Now so far, my binary classifier can tell the two shapes apart quite well as I'm giving it a complete feature vector.
My question: is it possible to input only maybe 2 or 3 of these features? Or even 1? I understand results will be less accurate while doing so, I just need to be able to do so.
If it is possible, how?
How would I do it for a neural network with 213 different features in the input vector?
Let's assume, for example, that you know the area, height, and width features (so you don't know the number_of_corners and number_of_edges features).
If you know that a shape can have, say, a maximum of 10 corners and 10 edges, you could input 10 feature vectors with the same area, height and width but where each vector has a different value for the number_of_corners and number_of_edges features. Then you can just average over the 10 outputs of the network and round to the nearest integer (so that you still get a binary value).
Similarly, if you only know the area feature you could average over the outputs of the network given several random combinations of input values, where the only fixed value is the area and all the others vary. (I.e. the area feature is the same for each vector but every other feature has a random value.)
This may be a "trick" but I think that the average will converge to a value as you increase the number of (almost-)random vectors.
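A minimal sketch of this averaging trick in Python follows. The `predict` callable stands in for the trained network and is purely hypothetical, as are the names `predict_with_missing` and `candidates`. Unlike the 10-vector example above, this version enumerates every combination of candidate values for the unknown features, which is only practical when few features are missing.

```python
import itertools
from typing import Callable, Dict, Sequence

FEATURES = ["number_of_corners", "number_of_edges", "area", "height", "width"]


def predict_with_missing(
    predict: Callable[[Sequence[float]], float],
    known: Dict[str, float],
    candidates: Dict[str, Sequence[float]],
) -> int:
    """Average predict() over candidate values of the unknown features, then round."""
    missing = [name for name in FEATURES if name not in known]
    outputs = []
    # One network evaluation per combination of candidate values.
    for combo in itertools.product(*(candidates[name] for name in missing)):
        filled = dict(known)
        filled.update(zip(missing, combo))
        outputs.append(predict([filled[name] for name in FEATURES]))
    # Round the mean so the result is still a binary class label.
    return round(sum(outputs) / len(outputs))
```

For the random-sampling variant, the `itertools.product` loop could be replaced by a fixed number of random draws from each candidate range.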
Edit
My solution would not be a good choice if you have a lot of features, because the number of combinations to average over grows quickly. In this case you could try to use a Deep Belief Network or an autoencoder to infer the values of the other features given a small number of them. For example, a DBN can "reconstruct" a noisy input (if you train it enough, of course); you could then try to give the reconstructed input vector to your feed-forward network.

Neural network, minimum number of neurons

I've got a 2D surface where a ship (with constant speed) navigates around the scene to pick up candy. For every candy the ship picks up I increase the fitness. The NN has one output to steer the ship (0 for left and 1 for right, so 0.5 would be straight forward). There are four inputs in the range [-1 .. 1] that represent two normalized vectors: the ship direction and the direction to the piece of candy.
Is there any way to calculate the minimum number of neurons in the hidden layer? I also tried giving two inputs instead of four, the first was the dot product [-1..1] (where I dotted the ship direction with the direction to the candy) and the second was (0/1) if the candy was to the left/right of the ship. It seems like this approach worked a lot better with fewer neurons in the hidden layer.
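The two-input encoding described above could be computed roughly as follows. This is only a sketch: the function names are invented here, and it assumes a coordinate system with the y axis pointing up, so that a positive 2D cross product means the candy lies to the left of the ship.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def normalize(v: Vec2) -> Vec2:
    """Scale a 2D vector to unit length."""
    length = math.hypot(v[0], v[1])
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / length, v[1] / length)


def steering_inputs(ship_dir: Vec2, to_candy: Vec2) -> Tuple[float, float]:
    """Return (dot product in [-1, 1], side flag: 0 = left, 1 = right)."""
    s = normalize(ship_dir)
    c = normalize(to_candy)
    dot = s[0] * c[0] + s[1] * c[1]
    # Sign of the 2D cross product tells which side the candy is on
    # (assumption: y axis points up, so positive means left).
    cross = s[0] * c[1] - s[1] * c[0]
    side = 0.0 if cross > 0.0 else 1.0
    return dot, side
```

The hand-crafted features already encode the angle between the two vectors, which may be why the network needed fewer hidden neurons.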
Fewer inputs should imply a smaller number of neurons. This is because the number of input combinations decreases and it gets easier for the neural network to learn the system. There is no golden rule as to how to calculate the best number of nodes in the hidden layer. However, with 2 inputs I'd say 2 hidden nodes should work fine. It really depends on the degree of non-linearity in your inputs.
Defining the number of hidden layers and the number of neurons in each hidden layer has always been a challenge, and the answer varies with the type of problem. By the way, a single hidden layer in a feedforward neural network can solve most problems, given that it can approximate functions.
Murata defined some rules to use in neural networks to define the number of hidden neurons in a feedforward neural network:
The value should be between the size of the input and output layers.
The value should be 2/3 the size of the input layer plus the size of the output layer.
The value should be less than twice the size of the input layer.
You could try these rules and evaluate their impact on your neural network.
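To compare the three rules quickly, they can be written down as a small helper. This is a sketch with made-up names; it reads the second rule as "two thirds of the input size, plus the output size", which is an assumption about an ambiguous sentence.

```python
from typing import Dict


def hidden_size_rules(n_inputs: int, n_outputs: int) -> Dict[str, object]:
    """Evaluate the three rules of thumb for the hidden layer size."""
    low, high = sorted((n_inputs, n_outputs))
    return {
        # Rule 1: somewhere between the input and output layer sizes.
        "between_input_and_output": (low, high),
        # Rule 2: 2/3 of the input size plus the output size (assumed reading).
        "two_thirds_input_plus_output": round(2 * n_inputs / 3) + n_outputs,
        # Rule 3: strictly less than twice the input size.
        "upper_bound_exclusive": 2 * n_inputs,
    }
```

For the four-input, one-output ship network this suggests trying somewhere between 1 and 4 hidden neurons, and in any case fewer than 8.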

Using an ANN to calculate a position vector's length and the angle between it and the x-axis

I'm new to neural networks and trying to get the hang of it by solving the following task:
Given a semi circle which defines an area above the x-axis, I would like to teach an ANN to output the length of a vector pointing to any position within that area. In addition, I would also like to know the angle between it and the x-axis.
I thought of this as a classical example of supervised learning and used backpropagation to train a feed-forward network. The network has two input neurons, two output neurons, and a variable number of hidden neurons organised in a variable number of hidden layers.
My training data is a random and unsorted sample of points within that area and the respective desired values. The coordinates of the points serve as the input of the net while I use the calculated values to minimise the error.
However, even after thousands of training iterations and empirical changes to the network's topology, I am unable to produce results with an error below ~0.2 (Radius: 20.0, Topology: 2/4/2).
Are there any obvious pitfalls I'm failing to see or does the chosen approach just not fit the task? Which other network types and/or learning techniques could be used to complete the task?
I wouldn't use variable amounts of hidden layers, I would use just one.
Then, I wouldn't use two output neurons, I would use two separate ANNs, one for each of the values you're after. This should do better, since your outputs aren't clearly related in my opinion.
Then, I would experiment with number of hidden neurons between 2 and 10 and different activation functions (logistic and tanh, maybe ReLUs).
After that, do you scale your data? It might be worth scaling both your inputs and outputs. Sigmoid units return small numbers, so it is good if you can adapt your outputs to be small as well (in [-1, 1] or [0, 1]). For example, if you want your angles in degrees, divide all of your targets by 360 before training the ANN on them. Then when the ANN returns a result, multiply it by 360 and see if that helps.
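The scaling step might look like the sketch below. The helper names are invented for illustration; the radius of 20.0 comes from the question and the divisor of 360 from the suggestion above. Inputs and the length target are divided by the radius so that everything the network sees lies in a small range.

```python
import math
from typing import Tuple

RADIUS = 20.0        # radius of the semicircle, from the question
ANGLE_SCALE = 360.0  # degrees; divisor suggested in the answer


def make_sample(x: float, y: float) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Build one scaled (inputs, targets) training pair from a point."""
    length = math.hypot(x, y)
    angle = math.degrees(math.atan2(y, x))
    inputs = (x / RADIUS, y / RADIUS)
    targets = (length / RADIUS, angle / ANGLE_SCALE)
    return inputs, targets


def decode(outputs: Tuple[float, float]) -> Tuple[float, float]:
    """Map scaled network outputs back to (length, angle in degrees)."""
    return outputs[0] * RADIUS, outputs[1] * ANGLE_SCALE
```

The error should then be compared in the original units, after calling `decode`, so that results before and after scaling remain comparable.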
Finally, there are a number of ways to train your neural network. Gradient descent is the classic, but probably not the best. Better methods are conjugate gradient, BFGS, etc. See here for optimizers if you're using Python - even if not, they might give you an idea of what to search for in your language.

Convolutional neural networks: Aren't the central neurons over-represented in the output?

[This question is now also posed at Cross Validated]
The question in short
I'm studying convolutional neural networks, and I believe that these networks do not treat every input neuron (pixel/parameter) equivalently. Imagine we have a deep network (many layers) that applies convolution on some input image. The neurons in the "middle" of the image have many unique pathways to many deeper layer neurons, which means that a small variation in the middle neurons has a strong effect on the output. However, the neurons at the edge of the image have only 1 pathway (or, depending on the exact implementation, of the order of 1) along which their information flows through the graph. It seems that these are "under-represented".
I am concerned about this, as this discrimination of edge neurons scales exponentially with the depth (number of layers) of the network. Even adding a max-pooling layer won't halt the exponential increase; only a full connection puts all neurons on an equal footing. I'm not convinced that my reasoning is correct, though, so my questions are:
Am I right that this effect takes place in deep convolutional networks?
Is there any theory about this, has it ever been mentioned in literature?
Are there ways to overcome this effect?
Because I'm not sure if this gives sufficient information, I'll elaborate a bit more about the problem statement, and why I believe this is a concern.
More detailed explanation
Imagine we have a deep neural network that takes an image as input. Assume we apply a convolutional filter of 64x64 pixels over the image, where we shift the convolution window by 4 pixels each time. This means that every neuron in the input sends its activation to 16x16 = 256 neurons in layer 2. Each of these neurons might send its activation to another 256, such that our topmost neuron is represented in 256^2 output neurons, and so on. This is, however, not true for neurons on the edges: these might be represented in only a small number of convolution windows, thus causing them to activate (of the order of) only 1 neuron in the next layer. Using tricks such as mirroring along the edges won't help this: the second-layer neurons that will be projected to are still at the edges, which means that the second-layer neurons will be underrepresented (thus limiting the importance of our edge neurons as well). As can be seen, this discrepancy scales exponentially with the number of layers.
I have created an image to visualize the problem, which can be found here (I'm not allowed to include images in the post itself). This network has a convolution window of size 3. The numbers next to neurons indicate the number of pathways down to the deepest neuron. The image is reminiscent of Pascal's Triangle.
https://www.dropbox.com/s/7rbwv7z14j4h0jr/deep_conv_problem_stackxchange.png?dl=0
Why is this a problem?
This effect doesn't seem to be a problem at first sight: in principle, the weights should automatically adjust in such a way that the network does its job. Moreover, the edges of an image are not that important anyway in image recognition. This effect might not be noticeable in everyday image recognition tests, but it still concerns me for two reasons: 1) generalization to other applications, and 2) problems arising in the case of very deep networks.
1) There might be other applications, like speech or sound recognition, where it is not true that the middle-most neurons are the most important. Applying convolution is often done in this field, but I haven't been able to find any papers that mention the effect that I'm concerned with.
2) Very deep networks will notice an exponentially bad effect of the discrimination of boundary neurons, which means that central neurons can be overrepresented by multiple orders of magnitude (imagine we have 10 layers such that the above example would give 256^10 ways the central neurons can project their information). As one increases the number of layers, one is bound to hit a limit where weights cannot feasibly compensate for this effect. Now imagine we perturb all neurons by a small amount. The central neurons will cause the output to change more strongly by several orders of magnitude, compared to the edge neurons. I believe that for general applications, and for very deep networks, ways around my problem should be found?
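The pathway counts for the size-3 window illustration can be reproduced with a few lines of Python. This is a sketch for a 1D network under stated assumptions (stride 1, no padding, a single neuron in the deepest layer); the function name `path_counts` is made up for this example.

```python
from typing import List


def path_counts(num_layers: int, window: int = 3) -> List[int]:
    """Count pathways from each input neuron down to the single deepest neuron."""
    counts = [1]  # the deepest layer holds exactly one neuron
    for _ in range(num_layers):
        # The layer above is wider by (window - 1) neurons.
        wider = [0] * (len(counts) + window - 1)
        for k, paths in enumerate(counts):
            # Neuron k reads neurons k .. k + window - 1 of the layer above,
            # so each of those inherits all of neuron k's pathways.
            for offset in range(window):
                wider[k + offset] += paths
        counts = wider
    return counts
```

Calling `path_counts(3)` gives `[1, 3, 6, 7, 6, 3, 1]`: the edge inputs always keep exactly one pathway, while the count for the central input grows with every added layer.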
I will quote your sentences and below I will write my answers.
Am I right that this effect takes place in deep convolution networks
I think you are wrong in general but right for your example with a 64x64 convolution filter. When you choose the filter sizes of a convolution layer, they should never be bigger than what you are looking for in your images. In other words, if your images are 200x200 and you convolve with 64x64 patches, you are saying that these 64x64 patches will learn some parts of the image, or exactly the image patch, that identifies your category. The idea in the first layer is to learn small, edge-like parts of the image that matter, not the entire cat or car itself.
Is there any theory about this, has it ever been mentioned in literature? and Are there ways to overcome this effect?
I never saw it in any paper I have looked through so far. And I do not think that this would be an issue even for very deep networks.
There is no such effect. Suppose your first layer, which has learned 64x64 patches, is in action. If there is a patch in the top-left corner that fires (becomes active), then it will show up as a 1 in the top-left corner of the next layer, hence the information will be propagated through the network.
(not quoted) You should not think 'a pixel is useful to more neurons the closer it gets to the center'. Think about a 64x64 filter with a stride of 4:
If the pattern that your 64x64 filter looks for is in the top-left corner of the image, then it will get propagated to the top-left corner of the next layer; otherwise there will be nothing in the next layer.
The idea is to keep meaningful parts of the image alive while suppressing the non-meaningful, dull parts, and to combine these meaningful parts in the following layers. In the case of learning "an uppercase letter a-A", please look at figures 5 and 7 in the very old paper of Fukushima 1980 (http://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf). Hence no single pixel is important; what is important is the image patch, which has the size of your convolution filter.
The central neurons will cause the output to change more strongly by several orders of magnitude, compared to the edge neurons. I believe that for general applications, and for very deep networks, ways around my problem should be found?
Suppose you are looking for a car in an image,
and suppose that in your 1st example the car is definitely in the top-left 64x64 part of your 200x200 image, while in your 2nd example the car is definitely in the bottom-right 64x64 part of your 200x200 image.
In the second layer all your pixel values will be almost 0, except the one in the very top-left corner for the 1st image and the one in the very bottom-right corner for the 2nd image.
Now, the center part of the image will mean nothing to my forward and backward propagation because the values will already be 0. But the corner values will never be discarded and will affect the weights I learn.

How Sensitive Are FF Neural Networks?

CrossPost: https://stats.stackexchange.com/questions/103960/how-sensitive-are-neural-networks
I am aware of pruning, and am not sure if it removes the actual neuron or makes its weight zero, but I am asking this question as if a pruning process were not being used.
On variously sized feedforward neural networks on large datasets with lots of noise:
Is it possible one (or some trivial amount) extra OR missing hidden neurons OR hidden layers make or break a network? Or will its synapse weights simply degrade to zero if it is not necessary and compensate with the other neurons if it is missing one or two?
When experimenting, should input neurons be added one at a time or in groups of X? What is X? Increments of 5?
Lastly, should each hidden layer contain the same number of neurons? This is usually what I see in example. If not, how and why would you adjust their sizes if not relying on using pure experimentation?
I would prefer to overdo it and wait longer for convergence, provided that a larger network will adapt itself to the solution. I have tried numerous configurations, but it is still difficult to gauge an optimum one.
1) Yes, absolutely. For example, if you have too few neurons in your hidden layer, your model will be too simple and have high bias. Similarly, if you have too many neurons, your model will overfit and have high variance. Adding more hidden layers allows you to model very complex problems like object recognition, but there are a lot of tricks to make adding more hidden layers work; this is known as the field of deep learning.
2) In a single-layered neural network it's generally a rule of thumb to start with 2 times as many neurons as the number of inputs. You can determine the increment through binary search, i.e. run through a few different architectures and see how the accuracy changes.
3) No, definitely not - each hidden layer can contain as many neurons as you want it to contain. There is no way other than experimentation to determine their sizes; all of what you mention are hyperparameters which you must tune.
I'm not sure if you are looking for a simple answer, but maybe you will be interested in a neural network regularization technique called dropout. Dropout basically randomly "removes" some of the neurons during training, forcing each of the neurons to be a good feature detector. It greatly reduces overfitting, and you can go ahead and set the number of neurons to be high without worrying too much. Check this paper out for more info: http://www.cs.toronto.edu/~nitish/msc_thesis.pdf
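To make the idea concrete, here is a minimal sketch of dropout applied to one layer's activations. It uses the common "inverted" variant, which rescales the surviving activations during training so nothing needs to change at test time; this is an assumption of the sketch and not necessarily the exact formulation in the linked thesis. The function name and parameters are invented for illustration.

```python
import random
from typing import List, Optional, Sequence


def dropout(
    activations: Sequence[float],
    drop_prob: float,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Zero each activation with probability drop_prob while training."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At test time every neuron is kept unchanged.
        return list(activations)
    rng = rng or random.Random()
    # Rescale survivors so the expected activation stays the same.
    keep_scale = 1.0 / (1.0 - drop_prob)
    return [a * keep_scale if rng.random() >= drop_prob else 0.0 for a in activations]
```

During training a fresh random mask is drawn on every call, so no neuron can rely on any particular other neuron being present.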

Resources