neural network to implement binary counting

Recently I've implemented a feed-forward multilayer neural network and it successfully performs 3-bit binary counting. The training set includes all values from 0 0 0 to 1 1 1 (the exhaustive case).
However, if I train the network on only some of the values, say only 0 to 4,
it doesn't work well for the inputs that were missing from the training set (5, 6, 7).
So my question is: should the network also be able to compute the correct output for inputs that are not included in the training set? This is confusing, because if you must provide a training set covering all possible inputs, what's the benefit of neural networks?
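To make the experiment concrete, here is a minimal from-scratch sketch of the setup described above (the network size, learning rate, and the "output = input + 1" encoding are my assumptions, not taken from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bits(n):
    # 3-bit binary representation, most significant bit first
    return np.array([(n >> 2) & 1, (n >> 1) & 1, n & 1], dtype=float)

# Train only on 0..4; hold out 5, 6, 7, as in the question.
X = np.array([to_bits(n) for n in range(5)])
Y = np.array([to_bits((n + 1) % 8) for n in range(5)])  # target: n + 1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 units (an arbitrary choice).
W1 = rng.normal(0.0, 0.5, (3, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, (8, 3)); b2 = np.zeros(3)

def forward(X):
    H = sigmoid(X @ W1 + b1)
    return H, sigmoid(H @ W2 + b2)

for _ in range(10000):            # plain batch gradient descent on MSE
    H, out = forward(X)
    g2 = (out - Y) * out * (1 - out)
    g1 = (g2 @ W2.T) * H * (1 - H)
    W2 -= 0.5 * (H.T @ g2); b2 -= 0.5 * g2.sum(0)
    W1 -= 0.5 * (X.T @ g1); b1 -= 0.5 * g1.sum(0)

# Fits the training set; no guarantee at all for inputs 5, 6, 7.
_, train_out = forward(X)
```

The network fits the five training rows, but nothing in the training objective forces it to extrapolate the carry logic to 5, 6 and 7, which matches the behaviour observed in the question.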

Related

Is it better to make a neural network have hierarchical output?

I'm quite new to neural networks, and I recently built a neural network for number classification on vehicle license plates. It has 3 layers: an input layer for 16×24 (384 neurons) number images at 150 dpi, a hidden layer (199 neurons) with a sigmoid activation function, and a softmax output layer (10 neurons), one for each number 0 to 9.
I'm trying to expand my neural network to also classify letters on license plates. But I'm worried that if I simply add more classes to the output, for example adding 10 letters so there are 20 classes in total, it would be hard for the neural network to separate the features of each class. I also think it might cause a problem where the input is a number but the network wrongly classifies it as the letter with the biggest single probability, even though the sum of the probabilities over all the number outputs exceeds it.
So I wonder if it is possible to build a hierarchical neural network in the following manner:
There are 3 neural networks: 'Item', 'Number', 'Letter'.
The 'Item' network classifies whether the input is a number or a letter.
If the 'Item' network classifies the input as a number (letter), the input goes through the 'Number' ('Letter') network.
Return the final output from the 'Number' ('Letter') network.
And the learning mechanism for each network is as follows:
The 'Item' network learns all images of numbers and letters, so there are 2 outputs.
The 'Number' ('Letter') network learns images of numbers (letters) only.
Which method should I pick for better classification: simply adding 10 more classes, or building hierarchical neural networks as above?
I'd strongly recommend training only a single neural network with outputs for all the kinds of images you want to be able to detect (so one output node per letter you want to be able to recognize, and one output node for every digit you want to be able to recognize).
The main reason for this is that recognizing digits and recognizing letters is essentially the same task. Intuitively, you can understand a trained neural network with multiple layers as performing the recognition in multiple steps. In the hidden layer it may learn to detect various kinds of simple, primitive shapes (e.g. the hidden layer may learn to detect vertical lines, horizontal lines, diagonal lines, certain kinds of simple curved shapes, etc.). Then, in the weights between the hidden and output layers, it may learn how to recognize combinations of these primitive shapes as a specific output class (e.g. a vertical and a horizontal line in roughly the right locations may be recognized as a capital letter L).
Those "things" it learns in the hidden layer are just as relevant for digits as for letters (the vertical line that may indicate an L may also indicate a 1 when combined with other shapes). So there are useful things to learn that are relevant for both "tasks", and the network will probably learn them more easily if it can learn them all in the same network.
See also this answer I gave to a related question in the past.
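As a small numpy sketch of why a single flat output can still give you the digit-vs-letter decision the hierarchical design was meant to provide, you can sum the softmax probability mass over each group (the logits and the index layout below are made up for illustration):

```python
import numpy as np

# Hypothetical logits from a single 20-class network:
# indices 0-9 are digits, 10-19 are letters (layout is an assumption).
logits = np.array([2.0, 0.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.4, 0.1, 0.2,
                   1.5, 0.2, 0.1, 0.3, 0.0, 0.1, 0.2, 0.0, 0.1, 0.3])

probs = np.exp(logits - logits.max())
probs /= probs.sum()                  # softmax over all 20 classes

digit_mass  = probs[:10].sum()        # total probability of "some digit"
letter_mass = probs[10:].sum()        # total probability of "some letter"

# The digit-vs-letter decision is still available after the fact:
group = "digit" if digit_mass > letter_mass else "letter"
best = int(np.argmax(probs))          # overall winning class
```

This also addresses the worry in the question about a single letter outscoring the summed number probabilities: you can compare the group masses explicitly before picking the winning class.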
I'm trying to expand my neural network to also classify letters on license plates. But I'm worried that if I simply add more classes to the output, for example adding 10 letters so there are 20 classes in total, it would be hard for the neural network to separate the features of each class.
You're far from the point where it becomes problematic. ImageNet has 1000 classes and is commonly handled by a single network; see the AlexNet paper. If you want to learn more about CNNs, have a look at chapter 2 of "Analysis and Optimization of Convolutional Neural Network Architectures", and while you're at it, see chapter 4 for hierarchical classification. You can read the summary for ... well, a summary of it.

neural networks for missing features

I have a dataset with features A...F for training. Now my prediction data set, used to predict the key feature, does not have observations of 3 of the features used in the training set. So I have only a subset of features for prediction, whereas the neural network was trained on the broader set.
How can I handle such a problem? Can you use a neural network for the missing features? The following came to mind: first, I use a neural network on the training set, but this time trained to predict the missing features. With it I can predict the 3 missing features of the prediction data set. Then I use a neural network on this completed prediction data set.
Have you tried running the neural network on your dataset even though features are missing? A neural network does not need all features to be present.
You can simply set all missing feature values to 0, as neural networks don't see a difference between 0 and "feature is missing". Why not, you ask? If you set an input value to 0, all the connections from that input node carry a 0 value as well, adding nothing to the hidden neurons that are connected to that input node.
But before you do that, try any of these:
[image listing common strategies for handling missing data; source link in the original]
Option 1 seems to be the case for you!
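As a minimal sketch of the zero-filling idea described above (the feature values and the dummy weight matrix are made up):

```python
import numpy as np

# Toy prediction rows with features A..F; NaN marks the 3 missing ones.
X = np.array([[0.4, np.nan, 1.2, np.nan, 0.7, np.nan],
              [0.1, np.nan, 0.9, np.nan, 0.3, np.nan]])

X_filled = np.nan_to_num(X, nan=0.0)   # missing -> 0

# With a 0 input, every weight leaving that node contributes
# w * 0 = 0 to the next layer, exactly as the answer describes:
W = np.ones((6, 4))                    # dummy first-layer weights
h = X_filled @ W                       # missing features add nothing
```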

Multi output neural network

I am trying to create a neural network that outputs more than a binary value.
The problem is the following:
I have recently stumbled upon this problem on Kaggle: https://www.kaggle.com/c/poker-rule-induction
Basically, the problem is getting the program to predict the poker hands in the test set, classifying each hand from 0-9. I have already managed to solve this problem using the RandomForest library.
My question is how can I solve this problem using a neural network?
I have already tried to follow some tutorials where you have 2 binary inputs and 1 binary output.
Dataset looks like the following:
If I understand you correctly, you are asking how to structure your neural network's output neurons for multi-class classification. Instead of having one output that produces a non-binary value (0-9), which doesn't really work well for many reasons, you can design the outputs to produce a binary vector.
Where...
1 = [0,1,0,0,0,0,0,0,0,0]
2 = [0,0,1,0,0,0,0,0,0,0]
3 = [0,0,0,1,0,0,0,0,0,0]
...etc
So, each item in the vector corresponds to one of the 10 output neurons, and if that item is a 1, its position refers to its classification group. The best example of this is the MNIST digit neural networks, which also usually use 10 binary output neurons.
Bear in mind the actual outputs will be decimals representing a probability/guess, close to either 0 or 1.
This also means your target value has to be a vector, so that backpropagation can compare each of its items against the corresponding output neuron.
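A minimal numpy sketch of this encoding and decoding (the network output values below are made up):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Encode integer class labels (0-9 poker hands) as binary vectors."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

y = np.array([1, 2, 3])
targets = one_hot(y)          # targets[0] is [0,1,0,0,0,0,0,0,0,0]

# At prediction time the network emits decimals; the predicted class
# is the position of the largest output.
net_out = np.array([0.05, 0.80, 0.10, 0.02, 0.01,
                    0.00, 0.00, 0.01, 0.00, 0.01])
pred = int(np.argmax(net_out))
```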

Different weights for different classes in neural networks and how to use them after learning

I trained a neural network using the backpropagation algorithm. I ran the network 30 times manually, each time changing the inputs and the desired output. The outcome is that of a traditional classifier.
I tried it out with 3 different classifications. Since I ran the network 30 times, with 10 inputs for each class, I ended up with 3 distinct sets of weights; runs for the same class produced very similar weights, with a very small amount of error. The network has therefore proven itself to have learned successfully.
My question is: now that the learning is complete and I have 3 distinct sets of weights (1 for each classification), how could I use these in a regular feed-forward network so it can classify the input automatically? I searched around to check whether you can somehow average out the weights, but it looks like this is not possible. Some people mentioned bootstrapping the data:
Have I done something wrong during the backpropagation learning process? Or is there an extra step which needs to be done after the learning process with these different weights for different classes?
One way I am imagining this is by implementing a regular feed-forward network which holds all 3 sets of weights. There will be 3 outputs, and for any given input, one of the output neurons will fire, which means the given input is mapped to that particular class.
The network architecture is as follows:
3 inputs, 2 hidden neurons, 1 output neuron
Thanks in advance
It does not make sense to train only one class into your neural network at a time, since the hidden layer builds weight combinations to 'learn' which class the input data may belong to. Learning each class separately makes the weights independent, and the network won't know which learned weights to use when a new test input is given.
Use a vector as the output to represent the three different classes, and train on all the data together.
EDIT
P.S. I don't think the linked post you provide is relevant to your case. The question in that post arises from different (random) weight initializations in neural network training. Sometimes people set a seed to make the weight learning reproducible and avoid such a problem.
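The "train on all the data together" advice above can be sketched like this (the sample data is synthetic, just to show the shapes for 3 classes × 10 samples × 3 input features):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the question's data: 10 samples per class,
# 3 input features each (matching the 3-input architecture).
class_data = [rng.normal(c, 0.1, (10, 3)) for c in range(3)]

# One stacked training set with vector targets, instead of
# three separate training runs producing three sets of weights.
X = np.vstack(class_data)             # shape (30, 3)
Y = np.repeat(np.eye(3), 10, axis=0)  # shape (30, 3), rows like [1,0,0]
```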
In addition to the response by nikie, another possibility is to represent the output as one (unique) output unit with continuous values. For example, the ANN classifies the first class if the output is in the [0, 1) interval, the second if it is in [1, 2), and the third if it is in [2, 3). This architecture is reported in the literature (and verified in my experience) to be less efficient than the discrete representation with 3 neurons.
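A tiny sketch of decoding that single continuous output into a class (purely illustrative):

```python
import numpy as np

def decode(y, num_classes=3):
    # Map a continuous output in [0, num_classes) to a class index;
    # clip so out-of-range outputs fall into the nearest interval.
    return int(np.clip(np.floor(y), 0, num_classes - 1))

decode(0.4)  # -> class 0 (output in [0, 1))
decode(1.7)  # -> class 1 (output in [1, 2))
decode(2.2)  # -> class 2 (output in [2, 3))
```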

Interpreting Neural Network output

For a classification problem, how is the output of the network usually determined?
Say there are three possible classes, each with a numerical identifier: would a reasonable solution be to sum the outputs and take that sum as the overall output of the network? Or would you take the average of the network's outputs?
There is plenty of information regarding ANN theory, but not much about application. I apologise if this is a silly question.
For a multi-layer perceptron classifier with 3 classes, one typically constructs a network with 3 outputs and trains the network so that (1,0,0) is the target output for the first class, (0,1,0) for the second class, and (0,0,1) for the third class. For classifying a new observation, you typically select the output with the greatest value (e.g., (0.12, 0.56, 0.87) would be classified as class 3).
I mostly agree with bogatron; furthermore, you will find many posts here advising on this kind of "multi-class classification" with neural networks.
Regarding your heading, I would like to add that you can interpret that output as a probability, although I struggled to find a theoretical foundation for this myself. In what follows I'll talk about a neural network with 3 neurons in the output layer, each with target 1 for its respective class.
Since the sum of all three target outputs is always 1 in training, the trained network will also tend to produce feed-forward outputs that sum to approximately one (so rather (0.12, 0.36, 0.52) than bogatron's example). You can then interpret these figures as the probability that the respective input belongs to class 1/2/3 (a probability of 0.52 that it belongs to class 3).
This holds approximately when using the logistic function or tanh as the output activation; a softmax output layer makes the sum exactly one.
More on this:
Posterior probability via neural networks: http://www-vis.lbl.gov/~romano/mlgroup/papers/neural-networks-survey.pdf
How to convert the output of an artificial neural network into probabilities?
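As a minimal sketch of reading such outputs as probabilities (simple sum-normalization; the raw output values are made up):

```python
import numpy as np

out = np.array([0.12, 0.36, 0.52])  # example raw network outputs

# If the raw outputs don't already sum to one, normalize them
# before reading them as class probabilities. (A softmax output
# layer would give you probabilities directly.)
probs = out / out.sum()
pred = int(np.argmax(probs))        # winning class index
```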
