I am trying to create a neural network that outputs more than a binary value.
The problem is the following:
I have recently stumbled upon this problem on kaggle https://www.kaggle.com/c/poker-rule-induction
Basically, the problem is getting the program to predict the poker hands in the test set, classifying each from 0-9. I have already managed to solve this problem using the RandomForest library.
My question is how can I solve this problem using a neural network?
I have already tried to follow some tutorials where you have 2 binary inputs and 1 binary output.
Dataset looks like the following:
If I understand you correctly, you are asking how to structure your neural network's output neurons for multi-class classification. Instead of having a single output that produces a non-binary value (0-9), which doesn't really work well for several reasons, you can design the outputs to produce a binary vector.
Where...
1 = [0,1,0,0,0,0,0,0,0,0]
2 = [0,0,1,0,0,0,0,0,0,0]
3 = [0,0,0,1,0,0,0,0,0,0]
...etc
So, each item in the vector corresponds to one of the 10 output neurons, and if that item is a 1, its position identifies the classification group. The best example of this is the MNIST digit neural networks, which also usually have 10 binary output neurons.
Bear in mind the actual outputs will be decimals representing a probability / guess, each close to either 0 or 1.
This also means your target value has to be a vector, so that backpropagation has an error term for each output neuron.
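A minimal sketch of the encoding in numpy (the helper name to_one_hot is mine, not from any particular library):

import numpy as np

def to_one_hot(label, num_classes=10):
    # turn a hand class 0-9 into a 10-element target vector
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

print(to_one_hot(3))          # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

# at prediction time, pick the neuron with the highest activation
output = [0.01, 0.02, 0.05, 0.85, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01]
print(np.argmax(output))      # 3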
Related
I'm basically trying to create a neural network that should tell me whether an input I'm giving it is valid or not. The problem is that I only have valid input with which I can train it.
Right now I am trying to come up with a working dense model that validates only MNIST digits between 0 and 4. All other digits should be seen as invalid. My first attempt was to train it with digits between 0 and 4 as valid and images of random pixels as invalid (with the same percentage of black pixels as a normal image), but unfortunately that doesn't work: when I test it with digits between 5 and 9, they are seen as valid.
So I'm starting to wonder whether it's even possible to train a neural network this way.
Also I realize there might be better ways to do this, maybe with an autoencoder or a different kind of network but right now I want to try this with only dense layers.
Thank you.
What you are looking for is one-class classification, also known as unary classification or class-modelling.
A quick Google search suggests training an autoencoder and treating an object as in-class if the reconstruction error is below a specific threshold.
But before building something like that, I would suggest trying something like One-Class K-Nearest Neighbors or a One-Class SVM first to see if you get acceptable results. If so, you can then improve on them with the "extremely more complicated to develop" solution using autoencoders.
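For instance, a minimal sketch with scikit-learn's OneClassSVM (the arrays here are random placeholders; in your case X_valid would hold only your valid examples, flattened to vectors):

import numpy as np
from sklearn.svm import OneClassSVM

X_valid = np.random.rand(100, 784)   # placeholder: flattened images of digits 0-4

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale")
clf.fit(X_valid)                     # trained on the valid class only

X_test = np.random.rand(5, 784)
print(clf.predict(X_test))           # +1 = looks valid, -1 = outlier / invalid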
I'm currently working on a classification problem with TensorFlow, and I'm new to the world of machine learning, but there is something I don't get.
I have successfully tried to train models that output the y tensor like this:
y = [0,0,1,0]
But I can't understand the principle behind it...
Why not just train the same model to output classes such as y = 3 or y = 4?
This seems much more flexible, because I can imagine having a multi-classification problem with 2 million possible classes, and it would be much more efficient to output a number between 0-2,000,000 than to output a tensor of 2,000,000 items for every result.
What am I missing?
Ideally, you could train your model to classify input instances while producing a single output. Something like
y=1 means input=dog, y=2 means input=airplane. An approach like that, however, brings a lot of problems:
How do I interpret the output y=1.5?
Why am I regressing a number, as if I were working with continuous data, when in reality I'm working with discrete data?
In fact, what you are doing is treating a multi-class classification problem like a regression problem.
This is logically wrong (unless you're doing binary classification, in which case a positive and a negative output are all you need).
To avoid these (and other) issues, we use a final layer of neurons, one per class, and we associate a high activation with the right class.
The one-hot encoding represents the fact that you want to force your network to have a single high-activation output when a certain input is present.
Thus, every input=dog will have [1, 0, 0] as output, and so on.
In this way, you're correctly treating a discrete classification problem and producing a discrete, well-interpretable output. In practice you'll always extract the output neuron with the highest activation using tf.argmax: even if your network hasn't learned to produce a perfect one-hot encoding, you'll still be able to extract the most likely output without ambiguity.
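For example, in TensorFlow 2 (the logits here are made up for illustration):

import tensorflow as tf

logits = tf.constant([[2.1, 0.3, -0.8]])   # imperfect output for classes dog, airplane, car
probs = tf.nn.softmax(logits)              # roughly [0.82, 0.14, 0.05], not a clean one-hot
predicted = tf.argmax(probs, axis=1)       # [0], i.e. "dog"
print(predicted.numpy())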
The answer is in how that final tensor, or single value, is calculated. In an NN, your y=3 would be built by a weighted sum over the values of the previous layer.
Trying to train towards single values would then imply a linear relationship between the category IDs where none exists: for the true value y=4, the output y=3 would be considered better than y=1, even though the categories are arbitrary and may be 1: dogs, 3: cars, 4: cats.
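A tiny numerical illustration, assuming a squared-error loss on raw class IDs:

true_y = 4                                # cats
for guess in (3, 1):                      # 3: cars, 1: dogs
    print(guess, (true_y - guess) ** 2)   # 3 -> 1, 1 -> 9
# the loss claims "cars" is a much better wrong answer than "dogs",
# even though both are simply wrong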
Neural networks use gradient descent to optimize a loss function. In turn, this loss function needs to be differentiable.
A discrete output would be (indeed is) a perfectly valid and valuable output for a classification network. Problem is, we don't know how to optimize this net efficiently.
Instead, we rely on a continuous loss function. This loss function is usually based on something that is more or less related to the probability of each label -- and for this, you need a network output that has one value per label.
Typically, the output that you describe is then deduced from this soft, continuous output by taking the argmax of these pseudo-probabilities.
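In rough numpy terms (a sketch, not tied to any particular framework):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([1.5, 0.2, -0.3, 0.8])   # raw network output, one value per label
probs = softmax(logits)                    # continuous pseudo-probabilities

true_label = 0
loss = -np.log(probs[true_label])          # cross-entropy: smooth, differentiable
prediction = np.argmax(probs)              # the discrete output, deduced afterwards
print(loss, prediction)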
I am using a Bike Sharing dataset to predict the number of rentals in a day, given the input. I will use 2011 data to train and 2012 data to validate. I successfully built a linear regression model, but now I am trying to figure out how to predict time series by using Recurrent Neural Networks.
The data set has 10 attributes (such as month, working day or not, temperature, humidity, windspeed), all numerical, though one attribute is the day of the week (Sunday: 0, Monday: 1, etc.).
I assume that one day can and probably will depend on previous days (and I will not need all 10 attributes), so I thought about using an RNN. I don't know much, but I read some stuff and also this. I am thinking about a structure like this.
I will have 10 input neurons, a hidden layer and 1 output neuron. I don't know how to decide on how many neurons the hidden layer will have.
I guess that I need a matrix to connect the input layer to the hidden layer, a matrix to connect the hidden layer to the output layer, and a matrix to connect the hidden layers in neighbouring time steps, t-1 to t, t to t+1. That's a total of 3 matrices.
In one tutorial the activation function was sigmoid, though I'm not entirely sure; if I use the sigmoid function, I will only get output between 0 and 1. What should I use as the activation function? My plan is to repeat this n times:
For each training data:
Forward propagate
Propagate the input to the hidden layer, add to it the propagation of the previous hidden layer to the current hidden layer, and pass this through the activation function.
Propagate the hidden layer to the output.
Find error and its derivative, store it in a list
Back propagate
Find current layers and errors from list
Find current hidden layer error
Store weight updates
Update the weights (matrices) by applying the stored updates multiplied by the learning rate.
Is this the correct way to do it? I want real numerical values as output, instead of a number between 0-1.
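Concretely, here is how I picture one forward step (a rough numpy sketch; W_in, W_hh and W_out are just my names for the three matrices):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hidden, n_out = 10, 16, 1
W_in = np.random.randn(n_hidden, n_in) * 0.1      # input -> hidden
W_hh = np.random.randn(n_hidden, n_hidden) * 0.1  # hidden(t-1) -> hidden(t)
W_out = np.random.randn(n_out, n_hidden) * 0.1    # hidden -> output

h_prev = np.zeros(n_hidden)
x_t = np.random.rand(n_in)                 # one day's 10 attributes

h_t = sigmoid(W_in @ x_t + W_hh @ h_prev)  # hidden state: input plus previous hidden
y_t = W_out @ h_t                          # linear output, so real-valued predictions
print(y_t)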
It seems to be the correct way to do it if you just want to learn the basics. If you want to build a neural network for practical use, however, this is a very poor approach: as Marcin's comment says, almost everyone who constructs neural nets for practical use does so with packages that provide ready-made neural network implementations. Let me answer your questions one by one...
I don't know how to decide on how many neurons the hidden layer will have.
There is no golden rule for choosing the right architecture for your neural network. There are many empirical rules people have established out of experience, and the right number of neurons is decided by trying out various combinations and comparing the output. A good starting point would be 3/2 times your input plus output neurons, i.e. (10+1)*(3/2), so you could start with 15 or 16 neurons in the hidden layer and then reduce the number based on your output.
What should I use as activation function?
Again, there is no 'right' function. It totally depends on what suits your data. Additionally, there are many types of sigmoid-like functions, such as the hyperbolic tangent, logistic, RBF, etc. A good starting point would be the logistic function, but again you will only find the right function through trial and error.
Is this the correct way to do it? I want real numerical values as output, instead of a number between 0-1.
All activation functions (including the one assigned to the output neuron) will give you an output between 0 and 1, so you will have to use a multiplier to convert it to real values, or use some kind of encoding with multiple output neurons. Coding this manually will be complicated.
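For example, if you min-max scaled your rental counts into [0, 1] for training, you would undo it like this (the min/max values here are hypothetical; yours come from the training data):

y_min, y_max = 22.0, 8714.0   # hypothetical min/max rental counts seen in training
y_net = 0.37                  # network output, between 0 and 1
y_real = y_net * (y_max - y_min) + y_min
print(y_real)                 # back on the original scale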
Another aspect to consider would be your training iterations. Doing it 'n' times doesn't help. You need to find the optimal training iterations with trial and error as well to avoid both under-fitting and over-fitting.
The correct way to do it would be to use packages in Python or R that let you train neural nets with a large amount of customization quickly, so you can train and test multiple nets with different activation functions (and even different training algorithms) and network architectures without too much hassle. With some amount of trial and error, you will eventually find the net that gives you the desired output.
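As an illustration of that approach, a minimal sketch with scikit-learn's MLPRegressor (the arrays are random placeholders for your real 2011/2012 data):

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

X_train = np.random.rand(365, 10)          # placeholder: 2011 daily features
y_train = np.random.rand(365) * 8000       # placeholder: 2011 rental counts
X_test = np.random.rand(366, 10)           # placeholder: 2012 daily features

scaler = MinMaxScaler()                    # scale inputs into [0, 1]
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

net = MLPRegressor(hidden_layer_sizes=(16,), activation="logistic",
                   max_iter=2000, early_stopping=True)   # guards against over-fitting
net.fit(X_train_s, y_train)
print(net.predict(X_test_s)[:5])           # real-valued rental predictions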
I would appreciate some insight into the workings of PyBrain's neural network. I have a dataset of different household features that correspond to a certain household income. The task is to create a regression based on neural networks, to be able to predict the income for given features.
I've tried the simple constructor
pybrain.tools.shortcuts.buildNetwork(feature_count, 12, 1, recurrent=False)
and it kind of works. But if I change the hidden layer to use GaussianLayer or LinearLayer, I get NaNs as output during the training phase.
Is there maybe something else that needs to be taken care of when using these layers (I am guessing maybe feature selection, when they correlate)?
Thanks
I solved a neural network regression problem using PyBrain where I had to forecast the load on a power station using weather features. This appears to be the same problem as yours, except in application. I followed the guide here: http://fastml.com/pybrain-a-simple-neural-networks-library-in-python/ which brought me 90% of the way towards the final solution. I had 8 inputs and one output.
One "gotcha" I found was that I had to normalise my input values to 0 -> 1. The MSE value would not decrease on each EPOCH otherwise. Also, if any of my input vaues were NaN, I got continuous Nan values out.
I hope this helps.
I trained a neural network using the Backpropagation algorithm. I ran the network 30 times manually, each time changing the inputs and the desired output. The outcome is that of a traditional classifier.
I tried it out with 3 different classifications. Since I ran the network 30 times with 10 inputs for each class, I ended up with 3 distinct sets of weights, though runs for the same classification produced very similar weights with a very small amount of error. The network has therefore proven itself to have learned successfully.
My question is: now that the learning is complete and I have 3 distinct types of weights (1 for each classification), how could I use these in a regular feed-forward network so it can classify the input automatically? I searched around to check whether you can somehow average out the weights, but it looks like this is not possible. Some people mentioned bootstrapping the data:
Have I done something wrong during the backpropagation learning process? Or is there an extra step which needs to be done post the learning process with these different weights for different classes?
One way I am imagining this is by implementing a regular feed-forward network which will have all 3 types of weights. There will be 3 outputs, and for any given input one of the output neurons will fire, meaning the given input is mapped to that particular class.
The network architecture is as follows:
3 inputs, 2 hidden neurons, 1 output neuron
Thanks in advance
It does not make sense to train your neural network on only one class at a time, since the hidden layer can form weight combinations to 'learn' which class the input data may belong to. Learning each class separately makes the weight sets independent, and the network won't know which learned weights to use when a new test input is given.
Use a vector as the output to represent the three different classes, and train on all the data together.
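A sketch of what that looks like with scikit-learn's MLPClassifier (random placeholder data; the classifier handles the one-hot target encoding internally):

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(30, 3)              # all 30 samples together, 3 input features
y = np.repeat([0, 1, 2], 10)           # 10 examples per class, trained in one go

clf = MLPClassifier(hidden_layer_sizes=(2,), max_iter=2000)
clf.fit(X, y)                          # one shared set of weights for all classes

print(clf.predict(X[:3]))              # the class is chosen automatically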
EDIT
P.S. I don't think the post you link to is relevant to your case. The question in that post arises from different (random) weight initializations in neural network training. Sometimes people set a seed to make the weight learning reproducible and avoid such a problem.
In addition to the response by nikie, another possibility is to represent the output as one (unique) output unit with continuous values. For example, the ANN classifies an input as the first class if the output is in the [0, 1) interval, as the second if it is in [1, 2), and as the third if it is in [2, 3). This architecture is reported in the literature (and verified in my experience) to be less efficient than the discrete representation with 3 neurons.
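For completeness, the interval decoding would look like this (a sketch):

import numpy as np

def decode(y):
    # map a continuous output in [0, 3) back to a class index 0-2
    return int(np.clip(np.floor(y), 0, 2))

for y in (0.4, 1.7, 2.9):
    print(y, "->", decode(y))   # 0, 1, 2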