Is there a problem with adding more output neurons after I finish training my neural network?
For example, I teach my neural network to look at oranges and apples and say which one is an apple and which one is an orange. Shade, shape and texture are the inputs, and orange and apple are the outputs, so there are 3 inputs and 2 outputs.
What if, after training, I wanted to add two more outputs, let's say banana and strawberry? If I did that, does my network's previous learning fail? Am I doing something wrong here, or is it safe to do that?
You will most likely need to re-train the network from scratch, incorporating both the old and new data and four classes instead of two. If you try to add new classes to an existing network, you are liable to run into what is called catastrophic forgetting. However, you may be fine with only re-training the final classifier, or fine-tuning from the previously learned weights.
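For the last option, here is a minimal sketch of how that could look in Keras; the saved model file name, the layer indexing, and the class count are assumptions for illustration, not from the question:

```python
from tensorflow import keras

# Load the previously trained 2-class fruit model (file name is hypothetical).
old_model = keras.models.load_model("fruit_model.h5")

# Reuse everything up to the old output layer as a feature extractor,
# and attach a new 4-class softmax head (apple, orange, banana, strawberry).
features = old_model.layers[-2].output
new_head = keras.layers.Dense(4, activation="softmax")(features)
new_model = keras.Model(inputs=old_model.inputs, outputs=new_head)

# Optionally freeze the reused layers and train only the new classifier first,
# then unfreeze and fine-tune everything on old + new data together.
for layer in new_model.layers[:-1]:
    layer.trainable = False
new_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```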
When you have an input that you want to make a prediction on, does the input have to be run through the entire neural net?
Yes, that is how current neural networks work. The only exceptions are some state-of-the-art networks, such as CNNs with Adaptive Inference Graphs.
Yes. In most neural nets, each layer except the final one produces outputs that do not correspond to output categories. Instead, these intermediate layers extract features and find patterns in the output of the previous layer. Only the final layer looks at all these high-level features (produced by the combination of all previous layers) to decide how the input should be categorised.
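As an illustration, here is a small hypothetical Keras model in which the hidden layer plays the role of a feature extractor; a second model that stops at that layer lets you inspect the intermediate features on the way to the final decision:

```python
from tensorflow import keras

# Hypothetical 3-input, 2-class model: the hidden layer extracts features,
# only the final layer maps those features to output categories.
model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(8, activation="relu"),      # intermediate feature layer
    keras.layers.Dense(2, activation="softmax"),   # final classification layer
])

# A second model that stops at the hidden layer, exposing the features
# the network computes before its final categorisation.
feature_extractor = keras.Model(inputs=model.inputs,
                                outputs=model.layers[0].output)
```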
I'm trying to understand the logic behind using a trained neural network. If I'm right: we should save the weights from the previous training and then reload them to process new inputs.
For example, I have this data set:
Input = [[0,1],[1,1]]
Output = [[1],[0]]
Results after training = [[0.999...],[0.005...]]
And I have also saved the weights. What I don't understand is: how should I use the previous weights to make a prediction? For example, I want to try a prediction with the input [1,0]. I find a lot of resources online with Matlab or Python, but I can't find anything that clearly explains what the calculations are, to do it "from scratch".
Thank you,
It is as simple as doing your feedforward step with the learned weights.
These are the steps you do in general:
1. Feed forward: feeding inputs through the network to produce output labels.
2. Calculating the cost based on the true labels of the inputs, which you have in a supervised problem.
3. Going backward through the network to update your weights based on the cost.
After you have finished training, you don't do steps 2 and 3; you just do the first step: going forward through the network with new inputs and the weights learned during training. The output is your prediction.
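For example, a from-scratch prediction step in Python for a tiny 2-2-1 sigmoid network; the weight values here are made up for illustration, you would load your own saved ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical saved weights for a 2-input, 2-hidden, 1-output network.
W1 = np.array([[ 4.8, -4.9],
               [ 4.8, -4.9]])   # input -> hidden weights
b1 = np.array([-2.3,  7.2])     # hidden biases
W2 = np.array([[ 9.1],
               [-9.4]])         # hidden -> output weights
b2 = np.array([-4.2])           # output bias

def predict(x):
    # Prediction is just the feedforward step: no cost, no backpropagation.
    hidden = sigmoid(x @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    return output

print(predict(np.array([1, 0])))  # prediction for the new input [1, 0]
```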
I'm quite new to neural networks, and I recently built a neural network for number classification on vehicle license plates. It has 3 layers: an input layer for 16×24 (384 neurons) number images at 150 dpi, a hidden layer (199 neurons) with a sigmoid activation function, and a softmax output layer (10 neurons), one for each number 0 to 9.
I'm trying to expand my neural network to also classify the letters on license plates. But I'm worried that if I simply add more classes to the output, for example adding 10 letters for a total of 20 classes, it will be hard for the neural network to separate the features of each class. I also think it might cause a problem where the input is a number but the network wrongly classifies it as the letter with the largest probability, even though the sum of the probabilities over all the number outputs exceeds it.
So I wonder if it is possible to build a hierarchical neural network in the following manner:
- There are 3 neural networks: 'Item', 'Number' and 'Letter'.
- The 'Item' network classifies whether the input is a number or a letter.
- If the 'Item' network classifies the input as a number (letter), the input then goes through the 'Number' ('Letter') network.
- The final output is returned from the 'Number' ('Letter') network.
And the learning mechanism for each network is as follows:
- The 'Item' network learns from all images of numbers and letters, so it has 2 outputs.
- The 'Number' ('Letter') network learns from images of numbers (letters) only.
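For reference, the hierarchical routing described above would look roughly like this; the three model objects and their predict methods are hypothetical placeholders:

```python
def classify(image, item_net, number_net, letter_net):
    """Hierarchical classification as described above (models are hypothetical)."""
    # Stage 1: the 'Item' network decides whether the crop is a number or a letter.
    kind = item_net.predict(image)
    # Stage 2: route to the specialised network for the final class label.
    if kind == "number":
        return number_net.predict(image)   # one of the 10 digit classes
    return letter_net.predict(image)       # one of the letter classes
```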
Which method should I pick for better classification: simply adding 10 more classes, or building hierarchical neural networks as described above?
I'd strongly recommend training a single neural network with outputs for all the kinds of images you want to detect (one output node per letter you want to recognize, and one output node per digit you want to recognize).
The main reason is that recognizing digits and recognizing letters is essentially the same task. Intuitively, you can understand a trained neural network with multiple layers as performing the recognition in multiple steps. In the hidden layer it may learn to detect various kinds of simple, primitive shapes (e.g. vertical lines, horizontal lines, diagonal lines, certain simple curved shapes, etc.). Then, in the weights between the hidden and output layers, it may learn how to recognize combinations of these primitive shapes as a specific output class (e.g. a vertical and a horizontal line in roughly the right locations may be recognized as a capital letter L).
Those "things" it learns in the hidden layer will be perfectly relevant for digits as well as letters (the vertical line that may indicate an L may also indicate a 1 when combined with other shapes). So there are useful things to learn that are relevant for both tasks, and the network will probably learn them more easily if it can learn them all in the same network.
See also this answer I gave to a related question in the past.
I'm trying to expand my neural network to also classify the letters on license plates. But I'm worried that if I simply add more classes to the output, for example adding 10 letters for a total of 20 classes, it will be hard for the neural network to separate the features of each class.
You're far from the point where it becomes problematic. ImageNet has 1000 classes and is commonly handled by a single network; see the AlexNet paper. If you want to learn more about CNNs, have a look at chapter 2 of "Analysis and Optimization of Convolutional Neural Network Architectures", and while you're at it, see chapter 4 for hierarchical classification. You can read the summary for... well, a summary of it.
I have a dataset with features A...F for training. My prediction dataset, on which I want to predict the key feature, has no observations of 3 of the features used in the training set. So I have only a subset of the features for prediction, whereas the neural network was trained on a broader set of features.
How can I handle this problem? Can I use a neural network to fill in the missing features? The following came to mind: first, I train a neural network on the training set, but this time to predict the missing features. With it I can predict the 3 missing features for the prediction dataset. Then I use the original neural network on this completed prediction dataset.
Have you tried running the neural network on your dataset even though features are missing? A neural network does not need all features to be present.
You can simply set all missing feature values to 0 for the neural network, since the network doesn't see a difference between 0 and "feature is missing". Why not, you ask? If you set an input value to 0, all the connections from that input node carry a value of 0 as well, adding nothing to the hidden neurons connected to that input node.
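A small sketch of the zero-filling, assuming the three missing features happen to be the last three columns (the column ordering is an assumption for illustration):

```python
import numpy as np

# Hypothetical prediction batch: only features A, B, C were observed.
X_pred = np.array([[0.7, 1.2, 0.3],
                   [0.1, 0.9, 0.5]])

n_features = 6                                   # the network was trained on A..F
X_full = np.zeros((X_pred.shape[0], n_features))
X_full[:, :X_pred.shape[1]] = X_pred             # missing columns D, E, F stay 0

# X_full can now be fed to the trained network; the zeroed inputs
# contribute nothing to the hidden activations they connect to.
```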
But before you do that, try any of the standard techniques for handling missing values (see the linked source for a list), as the first option there seems to be the case for you!
I trained a neural network using the backpropagation algorithm. I ran the network 30 times manually, each time changing the inputs and the desired output. The outcome is that of a traditional classifier.
I tried it out with 3 different classes. Since I ran the network 30 times, with 10 inputs per class, I ended up with 3 distinct sets of weights; runs for the same class produced very similar weights with only a small amount of error. The network has therefore proven itself to have learned successfully.
My question is: now that the learning is complete and I have 3 distinct sets of weights (one for each class), how can I use them in a regular feed-forward network so it classifies inputs automatically? I searched around to check whether you can somehow average the weights, but it seems this is not possible. Some people mentioned bootstrapping the data.
Have I done something wrong during the backpropagation learning process? Or is there an extra step that needs to happen after learning with these different weights for the different classes?
One way I am imagining this is by implementing a regular feed-forward network that holds all 3 sets of weights. There would be 3 outputs, and for any given input, one of the output neurons would fire, meaning the input is mapped to that particular class.
The network architecture is as follows:
3 inputs, 2 hidden neurons, 1 output neuron
Thanks in advance
It does not make sense to train only one class in your neural network each time, since the hidden layer can form weight combinations to 'learn' which class the input data belongs to. Training separately makes the weights independent, and the network won't know which set of learned weights to use when a new test input is given.
Use a vector as the output to represent the three different classes, and train on all the data together.
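For example, with one-hot output vectors (a minimal sketch; the labels are hypothetical):

```python
import numpy as np

# Class 0 -> [1, 0, 0], class 1 -> [0, 1, 0], class 2 -> [0, 0, 1].
labels = np.array([0, 2, 1, 0])   # hypothetical class labels for four samples
targets = np.eye(3)[labels]       # one-hot targets for training all classes together
print(targets)
```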
EDIT
P.S. I don't think the post you linked is relevant to your case. The question in that post arises from different (random) weight initializations in neural network training; people sometimes set a seed to make the weight learning reproducible and avoid that problem.
In addition to the response by nikie, another possibility is to represent the output as a single output unit with continuous values. For example, the ANN classifies the input as the first class if the output is in the [0, 1) interval, the second if it is in [1, 2), and the third if it is in [2, 3). This architecture is reported in the literature (and verified in my experience) to be less effective than the discrete representation with 3 neurons.