I want to know how a dataset is fed to a neural network. Suppose I have a weather dataset whose features are outlook, temperature, humidity and windy, and whose class attribute is play. As I understand it, each input node of the neural network represents one feature. Something like this:
Suppose the dataset contains the following rows:
outlook Temperature Humidity windy play
Sunny hot high false no
Sunny hot high True no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
I would like to know what values should replace the ? in the picture? I am confused between two inputs
1st input
2nd input
I think the input should be like the 1st input picture; the 2nd input picture makes no sense to me. However, I read somewhere on the web that the input to each neuron should be fed row-wise.
Any suggestion would be helpful.
What "row wise" means is that your data
outlook Temperature Humidity windy play
Sunny hot high false no
Sunny hot high True no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
should be fed row by row.
So for the first row you would get
outlook=Sunny, Temperature=hot, Humidity=high, windy=false, play=no
Each of these key=value assignments is the value for one "neuron": the four features go to input neurons, and play is the target for the output. (Note that in reality neurons cannot process strings, so you will need some form of encoding, e.g. one-hot encoding, which changes the effective number of neurons.)
And so on for the remaining rows.
You don't really "feed the dataset in" all at once. In practice we often use batching, where many rows are processed in parallel, but this is just an implementation efficiency trick for heavily optimised vectorised computation. For pure understanding you can ignore it, as it is an independent matter.
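As a rough illustration of the row-by-row idea with one-hot encoding, here is a minimal Python sketch. The category lists, the lowercase spelling of the values and the helper names (`one_hot`, `encode_row`) are assumptions made for this example, not part of the original dataset description.

```python
# Weather rows from the question: four features plus the class label.
rows = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
]

# Category vocabularies (assumed here; in practice derive them from the data).
OUTLOOK = ["sunny", "overcast", "rainy"]
TEMPERATURE = ["hot", "mild", "cool"]
HUMIDITY = ["high", "normal"]


def one_hot(value, vocabulary):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def encode_row(row):
    """Turn one dataset row into (input vector, target) for the network."""
    outlook, temperature, humidity, windy, play = row
    x = (
        one_hot(outlook, OUTLOOK)
        + one_hot(temperature, TEMPERATURE)
        + one_hot(humidity, HUMIDITY)
        + [1.0 if windy else 0.0]
    )
    y = 1.0 if play == "yes" else 0.0
    return x, y


# The network sees one encoded row at a time: 9 input values instead of 4.
for row in rows:
    x, y = encode_row(row)
    print(x, "->", y)
```

With this particular encoding the four features become nine input neurons (3 + 3 + 2 + 1), which is what "the effective number of neurons changes" amounts to.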
Say I have a feature X that measures the number of people with some rare contagious virus in a population. X is reported at the end of each day by a government agency that monitors disease outbreaks. Suppose I also had a ton of past values of X grouped into 20 day chunks. So each group has 20 entries of X, representing the number of infected people on each of the given 20 days.
These groups of past data were then labeled:
Normal - the 20 day period showed totally normal values for X and there was no threat of a breakout in them.
Breakout - the 20 day period showed a breakout where the value of X went up above normal levels and then rapidly got worse.
False breakout - the 20 day period showed a false breakout where the value of X rose slightly but then dropped back to normal levels.
I would then like to use this data to train a model that could be applied to a new breakout as it happens live, to predict at any instant during the breakout the probability that it will turn out to be a real breakout or a false breakout. The only feature that I want this model to consider, for better or for worse, is the values of X.
Now I believe that I could in some way apply a convolutional layer in this network to explore the features/behavior of X that distinguish false breakouts from real breakouts. I think this is possible because of how convolutional nets are used in image classification. Applied to this problem, it seems this would take the burden off me as the algorithm creator of finding the ways in which X behaves during false breakouts versus real breakouts.
I think this is an interesting application of a convolutional network and was wondering if anyone had any insight into how something like this could be approached?
In my opinion, if you have only 20 values per "example" to train your classification network, a CNN could be overkill, and probably also ineffective given the really small amount of input data it can analyze. Remember that a CNN works by using kernels that progressively scan and convolve regions of the data in search of patterns, and in my opinion you don't have enough data for that. Standard kernels for CNNs are around 3x3 or 5x5.
I think you're probably best off with a standard neural network with an input layer of dimension 20, a hidden layer of around 64-128 neurons and an output layer of 3 classes (Normal, Breakout, False).
The activations would be ReLU, with a final softmax to obtain probabilities.
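To make the suggested shape concrete, here is a minimal NumPy sketch of the forward pass of such a network (20 inputs, one hidden layer of 64 ReLU units, 3 softmax outputs). The weights are random and untrained, and the input window is made-up data; it only illustrates the layer sizes, not a trained classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUT, N_HIDDEN, N_CLASSES = 20, 64, 3  # 20 days in, 3 labels out

# Randomly initialised weights (untrained; for shape illustration only).
W1 = rng.normal(0.0, 0.1, size=(N_INPUT, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, size=(N_HIDDEN, N_CLASSES))
b2 = np.zeros(N_CLASSES)


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def forward(x):
    """x has shape (batch, 20); returns class probabilities of shape (batch, 3)."""
    h = relu(x @ W1 + b1)
    return softmax(h @ W2 + b2)


# Hypothetical 20-day window of (already scaled) infection counts.
window = rng.random((1, N_INPUT))
probs = forward(window)
print(probs)  # three probabilities: Normal, Breakout, False breakout
```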
It would be a different case if you wanted to correlate blocks of 20 days taken in different countries, for example. Then you could create a sort of "image" in which each column holds the 20 daily values for one country/place, and train a CNN on that.
I am writing a little about Google's DeepDream. With DeepDream it is possible to inspect what a trained network has learned; see the dumbbell example on the Google Research blog.
In that example a network is trained to recognize a dumbbell. DeepDream is then used to see what the network has learned, and the result shows that the network was trained badly, because it recognizes a dumbbell plus an arm as a dumbbell.
My question is: how are networks checked in practice? With DeepDream, or with some other method?
Best regards
Generally in machine learning you validate your trained network on a dataset that was not used in the training process (a test set). So in this case, you would have a set of examples with and without dumbbells that was used to train the model, as well as a second set (also with and without dumbbells) that was not seen during training.
Once you have your model, you let it predict the labels of the withheld set. You then compare these predicted labels to the actual ones:
Every time it correctly predicts a dumbbell, you increment the number of True Positives.
Every time it correctly predicts the absence of a dumbbell, you increment the number of True Negatives.
Every time it predicts a dumbbell where there is none, you increment the number of False Positives.
Finally, every time it predicts no dumbbell where there is one, you increment the number of False Negatives.
Based on these four counts, you can then calculate measures such as the F1 score or accuracy to quantify the performance of the model. (Have a look at the following wiki: https://en.wikipedia.org/wiki/F1_score )
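The counting procedure above can be sketched in a few lines of Python. The labels below are made up for illustration, and the function names are just for this example.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (True = dumbbell present)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted and actual:
            tp += 1
        elif not predicted and not actual:
            tn += 1
        elif predicted and not actual:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, tn, fp, fn):
    # F1 = 2*TP / (2*TP + FP + FN); returned as 0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up labels for eight held-out images.
y_true = [True, True, True, False, False, False, True, False]
y_pred = [True, True, False, False, True, False, True, False]

counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts), f1_score(*counts))
```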
I am using a Bike Sharing dataset to predict the number of rentals in a day, given the input. I will use 2011 data to train and 2012 data to validate. I successfully built a linear regression model, but now I am trying to figure out how to predict time series by using Recurrent Neural Networks.
The data set has 10 attributes (such as month, working day or not, temperature, humidity, windspeed), all numerical, although one attribute is the day of the week (Sunday: 0, Monday: 1, etc.).
I assume that one day can and probably will depend on previous days (and I will not need all 10 attributes), so I thought about using RNN. I don't know much, but I read some stuff and also this. I think about a structure like this.
I will have 10 input neurons, a hidden layer and 1 output neuron. I don't know how to decide on how many neurons the hidden layer will have.
I guess that I need a matrix to connect input layer to hidden layer, a matrix to connect hidden layer to output layer, and a matrix to connect hidden layers in neighbouring time-steps, t-1 to t, t to t+1. That's total of 3 matrices.
In one tutorial the activation function was the sigmoid, but I'm not sure about it: if I use a sigmoid function, I will only get outputs between 0 and 1. What should I use as the activation function? My plan is to repeat this n times:
For each training data:
  Forward propagate
    Propagate the input to hidden layer, add it to propagation of previous hidden layer to current hidden layer. And pass this to activation function.
    Propagate the hidden layer to output.
    Find error and its derivative, store it in a list
  Back propagate
    Find current layers and errors from list
    Find current hidden layer error
    Store weight updates
  Update weights (matrices) by multiplying them by learning rate.
Is this the correct way to do it? I want real numerical values as output, instead of a number between 0-1.
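For reference, the forward-propagation part of this plan, with the three matrices described above, might look like the following NumPy sketch. It makes several assumptions that are not in the plan itself: a tanh hidden activation, a linear (identity) output so that predictions are not limited to 0-1, a hidden size of 16, no bias terms, and random untrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HIDDEN, N_OUT = 10, 16, 1  # hidden size is an arbitrary choice here

# The three matrices from the description (randomly initialised, untrained).
W_xh = rng.normal(0.0, 0.1, size=(N_IN, N_HIDDEN))      # input -> hidden
W_hh = rng.normal(0.0, 0.1, size=(N_HIDDEN, N_HIDDEN))  # hidden(t-1) -> hidden(t)
W_hy = rng.normal(0.0, 0.1, size=(N_HIDDEN, N_OUT))     # hidden -> output


def rnn_forward(xs):
    """xs has shape (T, 10): one row of features per day. Returns (T, 1) outputs."""
    h = np.zeros(N_HIDDEN)
    outputs = []
    for x_t in xs:
        # Input contribution plus the previous hidden state, then the activation.
        h = np.tanh(x_t @ W_xh + h @ W_hh)
        # Linear (identity) output, so predictions are not squashed into 0..1.
        outputs.append(h @ W_hy)
    return np.array(outputs)


# Hypothetical week of already-scaled daily features.
week = rng.random((7, N_IN))
predictions = rnn_forward(week)
print(predictions.shape)  # (7, 1)
```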
That seems to be the correct way to do it if you just want to learn the basics. If you want to build a neural network for practical use, it is a very poor approach; as Marcin's comment says, almost everyone who builds neural nets for practical use does so with packages that provide a ready-made neural network implementation. Let me answer your questions one by one...
I don't know how to decide on how many neurons the hidden layer will have.
There is no golden rule for choosing the right architecture for your neural network. There are many empirical rules that people have established from experience, and the right number of neurons is decided by trying out various combinations and comparing the results. A good starting point would be 3/2 times the number of input plus output neurons, i.e. (10+1)*(3/2) = 16.5, so you could start with about 16 neurons in the hidden layer and then reduce the number based on your results.
What should I use as activation function?
Again, there is no 'right' function; it depends on what suits your data. There are many candidates, such as the sigmoid-shaped logistic and hyperbolic tangent functions, or RBFs. A good starting point would be the logistic function, but again you will only find the right function through trial and error.
Is this the correct way to do it? I want real numerical values as output, instead of a number between 0-1.
Sigmoid-type activation functions are bounded: the logistic function outputs values between 0 and 1, and the hyperbolic tangent between -1 and 1. If the output neuron uses one of them, you will have to use a multiplier to convert its output to real values, or have some kind of encoding with multiple output neurons. Alternatively, a linear (identity) activation on the output neuron gives unbounded real values directly. Coding this manually will be complicated.
Another aspect to consider is the number of training iterations. Picking an arbitrary 'n' doesn't help: you need to find the optimal number of iterations by trial and error as well, to avoid both under-fitting and over-fitting.
The correct way to do it would be to use packages in Python or R, which will allow you to train neural nets with large amount of customization quickly, where you can train and test multiple nets with different activation functions (and even different training algorithms) and network architecture without too much hassle. With some amount of trial and error, you will eventually find the net that gives you desirable output.
I have an input into a neural network used for classification, that was trained on a data set where the values were from 1-5, for example. And then I normalized all of this training data so that it was from 0-1. What would I feed into the network if I wanted to classify something where that input was outside of the 1-5 range. For example, how could a value of 5.3 be normalized?
There are a number of ways that the value could be handled depending on the conditions of your Neural Network. Some include:
1/. The input may be capped at a value of 1.
2/. The normalised value may exceed 1, depending on the normalisation algorithm applied and whether the Neural Network was designed to allow it (typically, if all data was normalised, these values should remain between 0 and 1).
3/. (Classification Only) - If the Inputs are categorical, rather than a quantitative value between 1 and 5, I'm not sure if a value of 5.3 would make sense. Perhaps adding another neuron for an 'unknown' state may help depending on your problem, but I have a gut feeling that this is overkill.
I am assuming that such a case has arisen as a result of unforeseen future cases being used for estimation purposes after training has been completed. Generally, handling would really come down to (i) the Programming of the Neural Network, and (ii) the calculation of the Normalised Input.
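As a small illustration of options 1 and 2, here is a sketch that assumes plain min-max normalisation with the minimum and maximum taken from the training data (1 and 5). The function name and the `clip` flag are hypothetical.

```python
def min_max_normalise(value, lo, hi, clip=False):
    """Scale `value` using the training-set minimum `lo` and maximum `hi`."""
    scaled = (value - lo) / (hi - lo)
    if clip:
        # Option 1: cap anything outside the training range to [0, 1].
        scaled = min(max(scaled, 0.0), 1.0)
    return scaled


TRAIN_MIN, TRAIN_MAX = 1.0, 5.0  # range seen during training

unclipped = min_max_normalise(5.3, TRAIN_MIN, TRAIN_MAX)           # slightly above 1
clipped = min_max_normalise(5.3, TRAIN_MIN, TRAIN_MAX, clip=True)  # capped at 1.0
print(unclipped, clipped)
```

Without clipping, 5.3 maps to (5.3 - 1) / (5 - 1), i.e. just over 1; whether the network copes with that depends on how it was built and trained.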
I've been trying to program an AI for tic tac toe using a multilayer perceptron and backpropagation. My idea was to train the neural network to be an accurate evaluation function for board states, but the problem is even after analyzing thousands of games, the network does not output accurate evaluations.
I'm using 27 input neurons; each square on the 3x3 board is associated with three input neurons that receive values of 0 or 1 depending on whether the square has an x, o or is blank. These 27 input neurons send signals to 10 hidden neurons (I chose 10 arbitrarily, but I have tried with 5 and 15 as well).
For training, I've had the program generate a series of games by playing against itself using the current evaluation function to select what are deemed optimal moves for each side. After generating a game, the NN compiles training examples (which comprise a board state and the correct output) by taking the correct output for a given board state to be the value (using the evaluation function) of the board state that follows it in the game sequence. I think this is what Gerald Tesauro did when programming TD-Gammon, but I might have misinterpreted the article. (note: I put the specific mechanism for updating weights at the bottom of this post).
I have tried various values for the learning rate, as well as varying numbers of hidden neurons, but nothing seems to work. Even after hours of "learning," there is no discernible improvement in strategy and the evaluation function is not anywhere close to accurate.
I realize that there are much easier ways to program tic tac toe, but I want to do it with a multilayer perceptron so that I may apply it to connect 4 later on. Is this even possible? I'm starting to think that there is no reliable evaluation function for a tic tac toe board with a reasonable amount of hidden neurons.
I assure you that I am not looking for some quick code to turn in for a homework assignment. I've been working unsuccessfully for a while now and would just like to know what I'm doing wrong. All advice is appreciated.
This is the specific mechanism I used for the NN:
Each of the 27 input neurons receives a 0 or 1, which passes through the differentiable sigmoid function 1/(1+e^(-x)). Each input neuron i sends this output (i.output), multiplied by some weight (i.weights[h]) to each hidden neuron h. The sum of these values is taken as input by the hidden neuron h (h.input), and this input passes through the sigmoid to form the output for each hidden neuron (h.output). I denote the lastInput to be the sum of (h.output * h.weight) across all of the hidden neurons. The outputted value of the board is then sigmoid(lastInput).
I denote the learning rate to be alpha, and err to be the correct output minus the actual output. Also I let dSigmoid(x) equal the derivative of the sigmoid at the point x.
The weight of each hidden neuron h is incremented by the value: (alpha*err*dSigmoid(lastInput)*h.output) and the weight of the signal from a given input neuron i to a given hidden neuron h is incremented by the value: (alpha*err*dSigmoid(lastInput)*h.weight*dSigmoid(h.input)*i.output).
I got these formulas from this lecture on backpropagation: http://www.youtube.com/watch?v=UnWL2w7Fuo8 .
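For readers who want to check the formulas, the mechanism described above can be written out in NumPy roughly as follows. This is a sketch of my reading of the description, not the original program: the array names, the ordering of the 27 inputs, the random initial weights, the absence of bias terms and the choice to use the pre-update hidden-to-output weights in the input-to-hidden update are all assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def train_step(board, target, W_ih, w_ho, alpha):
    """One update for a 27-10-1 network, following the formulas above.

    board: 27 values of 0 or 1; target: the desired evaluation.
    W_ih (27 x 10) and w_ho (10,) are modified in place.
    Returns the output computed before the update.
    """
    i_output = sigmoid(board)      # input neurons also apply the sigmoid
    h_input = i_output @ W_ih      # weighted sum arriving at each hidden neuron
    h_output = sigmoid(h_input)
    last_input = h_output @ w_ho
    output = sigmoid(last_input)

    err = target - output
    delta = alpha * err * d_sigmoid(last_input)

    # Input->hidden update uses the hidden->output weights from before this step.
    W_ih += delta * np.outer(i_output, w_ho * d_sigmoid(h_input))
    w_ho += delta * h_output
    return output


rng = np.random.default_rng(0)
W_ih = rng.normal(0.0, 0.1, size=(27, 10))
w_ho = rng.normal(0.0, 0.1, size=10)

# Empty board, assuming each square's three inputs are ordered (x, o, blank).
board = np.zeros(27)
board[2::3] = 1.0

before = train_step(board, 1.0, W_ih, w_ho, alpha=0.1)
after = train_step(board, 1.0, W_ih, w_ho, alpha=0.1)
print(before, after)  # the second output should be closer to the target 1.0
```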
Tic tac toe has 3^9 = 19683 states (actually, some of them aren't legal, but the order of magnitude is right). The output function isn't smooth, so I think the best a backpropagation network can do is "rote learning" a look-up table for all these states.
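To see how much smaller the set of legal positions is than the 3^9 upper bound, one can enumerate the positions reachable in actual play. The sketch below assumes x moves first and that play stops as soon as someone wins or the board is full; the board representation (a 9-character string) is chosen only for this example.

```python
def winner(board):
    """Return 'x' or 'o' if that player has three in a row, else None."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
    for a, b, c in lines:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def reachable_states():
    """Collect every position reachable from the empty board (x moves first)."""
    start = " " * 9
    seen = {start}
    frontier = [start]
    while frontier:
        board = frontier.pop()
        if winner(board) or " " not in board:
            continue  # finished game: no further moves
        player = "x" if board.count("x") == board.count("o") else "o"
        for i, cell in enumerate(board):
            if cell == " ":
                nxt = board[:i] + player + board[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen


upper_bound = 3 ** 9                     # every way to fill nine squares
n_reachable = len(reachable_states())    # positions that can occur in play
print(upper_bound, n_reachable)
```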
With that in mind, 10 hidden neurons seems very small, and there's no way you can train 20k different look-up-table entries by teaching a few thousand games. For that, the network would have to "extrapolate" from states it has been taught to states it has never seen, and I don't see how it could do that.
You might want to consider more than one hidden layer, as well as upping the size of the hidden layer. For comparison purposes, Fogel and Chellapilla used two layers of 40 and 10 neurons to program up a checkers player, so if you need something more than that, something is probably going terribly wrong.
You might also want to use bias inputs, if you're not already.
Your basic methodology seems sound, although I'm not 100% sure what you mean by this:
After generating a game, the NN compiles training examples (which comprise a board state and the correct output) by taking the correct output for a given board state to be the value (using the evaluation function) of the board state that follows it in the game sequence.
I think you mean that you're using some known-good method (like a minimax game tree) to determine the "correct" answers for the training examples. Can you explain that a little bit? Or, if I'm correct, it seems like there's a subtlety to deal with, in terms of symmetric boards, which might have more than one equally good best response. If you're only treating one of those as correct, that might lead to problems. (Or it might not, I'm not sure.)
Just to throw in another thought: have you thought about using reinforcement learning for this task? It would be much easier to implement and much more effective. For example, you could use Q-learning, which is often used for games.
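As a rough idea of what Q-learning involves, here is a minimal tabular update rule in Python. The learning rate, discount factor, state/action representation and function name are assumptions for illustration; a real tic-tac-toe agent would also need move selection, an opponent and a training loop, which are omitted here.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)

# Q[(state, action)] -> estimated value; unseen pairs start at 0.
Q = defaultdict(float)


def q_update(state, action, reward, next_state, next_actions):
    """Apply one Q-learning update for the transition (state, action) -> next_state."""
    # Value of the best follow-up action; 0 if the game is over (no actions left).
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])


# Hypothetical transition: a move into square 4 wins the game (reward 1, no next moves).
q_update("s0", 4, 1.0, "s1", [])
print(Q[("s0", 4)])
```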
Here you can find an implementation for training a neural network to play tic-tac-toe (variable board size) using self-play. The gradient is back-propagated through the whole game employing a simple gradient-copy trick.