I don't understand how the NEAT algorithm takes inputs and produces outputs from the connection genes. I am familiar with using matrices in fixed-topology neural networks to feed inputs forward, but since each node in NEAT has its own number of connections and isn't necessarily connected to every other node, I don't see how it works, and after much searching I can't find an answer on how NEAT produces outputs from its inputs.
Could someone explain how it works?
That was also a question I struggled with while implementing my own version of the algorithm.
You can find the answer on the NEAT Users Page (https://www.cs.ucf.edu/~kstanley/neat.html), where the author says:
How are networks with arbitrary topologies activated?
The activation function, bool Network::activate(), gives the specifics. The
implementation is of course considerably different than for a simple layered
feedforward network. Each node adds up the activation from all incoming nodes
from the previous timestep. (The function also handles a special "time delayed"
connection, but that is not used by the current version of NEAT in any
experiments that we have published.) Another way to understand it is to realize
that activation does not travel all the way from the input layer to the output
layer in a single timestep. In a single timestep, activation only travels from
one neuron to the next. So it takes several timesteps for activation to get from
the inputs to the outputs. If you think about it, this is the way it works in a
real brain, where it takes time for a signal hitting your eyes to get to the
cortex because it travels over several neural connections.
So, if one of the evolved networks is not feedforward, its outputs will change across timesteps. This is particularly useful in continuous control problems, where the environment is not static, but it is problematic in classification problems. The author also answers:
How do I ensure that a network stabilizes before taking its output(s) for a
classification problem?
The cheap and dirty way to do this is just to activate n times in a row where
n>1, and hope there are not too many loops or long pathways of hidden nodes.
The proper (and quite nice) way to do it is to check every hidden node and output
node from one timestep to the next, and see if nothing has changed, or at least
not changed within some delta. Once this criterion is met, the output must be
stable.
Note that output may not always stabilize in some cases. Also, for continuous
control problems, do not check for stabilization as the network never "settles"
but rather continuously reacts to a changing environment. Generally,
stabilization is used in classification problems, or in board games.
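Here is a minimal sketch of that "activate until stable" idea, assuming a hypothetical net.activate(inputs) that runs one timestep and returns the current output vector (this interface is illustrative, not a real library's API). For brevity it compares only the outputs; the quote suggests also comparing every hidden node.

    def activate_until_stable(net, inputs, delta=1e-6, max_steps=100):
        prev = net.activate(inputs)
        for _ in range(max_steps):
            out = net.activate(inputs)
            if all(abs(a - b) <= delta for a, b in zip(out, prev)):
                return out      # settled: no output moved more than delta
            prev = out
        return out              # some networks never stabilize; give up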
When I was dealing with this, I researched loop detection using matrix methods:
https://en.wikipedia.org/wiki/Adjacency_matrix#Matrix_powers
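For illustration, here is the matrix-powers trick from that link as a small numpy sketch: a directed graph contains a cycle exactly when some power A^k (k <= number of nodes) has a nonzero entry on its diagonal, because the diagonal of A^k counts closed walks of length k.

    import numpy as np

    def has_cycle(adj):
        n = adj.shape[0]
        power = np.eye(n, dtype=int)
        for _ in range(n):
            power = power @ adj       # entries count walks of length k
            if np.trace(power) > 0:   # some node can walk back to itself
                return True
        return False

    A = np.array([[0, 1, 0],          # 0 -> 1 -> 2 -> 0 is a 3-cycle
                  [0, 0, 1],
                  [1, 0, 0]])
    print(has_cycle(A))               # True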
But I found the best way to feed inputs forward and get outputs was loop detection using a timeout propagation delay at each node.
A feedforward implementation is simple, and I started from there:
Wait until all incoming connections to a node have a signal, then sum-squash-activate and send the result along all of that node's outgoing connections. Start from the input nodes, which already have a signal from the input vector. Manually 'shunt' the output nodes with a sum-squash operation once there are no more nodes to process, to get the output vector.
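Here is a minimal sketch of that scheme, assuming a genome represented as (src, dst, weight) connection tuples; every name is illustrative rather than from a particular library, and the output-node 'shunt' is folded into the main loop since output nodes are computed like any other node:

    import math
    from collections import defaultdict

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def activate_feedforward(connections, input_ids, output_ids, input_vector):
        incoming = defaultdict(list)          # dst -> [(src, weight), ...]
        for src, dst, w in connections:
            incoming[dst].append((src, w))
        signal = dict(zip(input_ids, input_vector))  # inputs already have signals
        pending = set(incoming) - set(signal)
        while pending:
            ready = {n for n in pending
                     if all(src in signal for src, _ in incoming[n])}
            if not ready:
                raise ValueError("cycle or unreachable node; see recurrent variant below")
            for n in ready:                   # sum-squash, then pass along
                total = sum(signal[src] * w for src, w in incoming[n])
                signal[n] = sigmoid(total)
            pending -= ready
        return [signal[o] for o in output_ids]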
For circularity (the traditional NEAT implementation) I did the same as the feedforward case, with one more feature:
Calculate the 'maximum possible loop size' of the network. An easy upper bound is roughly 2 * (total number of nodes): no walk from an input to any node in the network is longer than this without cycling, so a node MUST receive all its signals within this many timesteps unless it is part of a cycle.
Then I wait until all incoming connection signals arrive at a node OR a timeout occurs (a signal has not arrived at a connection within maximum-loop-size steps). If a timeout occurs, label the incoming connections that don't have signals as recurrent.
Once a connection is labelled recurrent, restart the timers on all nodes (to prevent a node later in the detected cycle from being labelled recurrent merely due to propagation latency).
Forward propagation is then the same as in the feedforward network, except: don't wait for connections that are recurrent; sum-squash as soon as all non-recurrent connections have arrived (using 0 for recurrent connections that don't have a signal yet). This ensures the first node reached in a cycle is the one marked recurrent, making the labelling deterministic for any given topology, and recurrent connections pass their data to the next propagation timestep.
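Continuing the sketch above (it reuses sigmoid): first label the back-edges, then propagate without waiting on them. I use a depth-first search to find the back-edges rather than literal per-node timers; the labels it produces serve the same purpose as the timeout scheme just described. The returned signal dict becomes prev_signal for the next timestep.

    from collections import defaultdict

    def label_recurrent(connections, input_ids):
        adj = defaultdict(list)
        for src, dst, _ in connections:
            adj[src].append(dst)
        recurrent, state = set(), {}     # state: 1 = on DFS stack, 2 = done
        def dfs(n):
            state[n] = 1
            for m in adj[n]:
                if state.get(m) == 1:
                    recurrent.add((n, m))   # edge closes a cycle: label it
                elif m not in state:
                    dfs(m)
            state[n] = 2
        for i in input_ids:
            dfs(i)
        return recurrent

    def activate_recurrent(connections, input_ids, output_ids,
                           input_vector, prev_signal, recurrent):
        incoming = defaultdict(list)
        for src, dst, w in connections:
            incoming[dst].append((src, w))
        signal = dict(zip(input_ids, input_vector))
        pending = set(incoming) - set(signal)
        while pending:
            ready = {n for n in pending
                     if all(src in signal or (src, n) in recurrent
                            for src, _ in incoming[n])}
            if not ready:
                break                    # unreachable nodes: stop cleanly
            for n in ready:
                # Recurrent inputs read last timestep's value (0 initially).
                total = sum((prev_signal.get(src, 0.0) if (src, n) in recurrent
                             else signal[src]) * w
                            for src, w in incoming[n])
                signal[n] = sigmoid(total)
            pending -= ready
        return [signal.get(o, 0.0) for o in output_ids], signal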
This has some first-time overhead but is concise and produces the same results for a given topology each time it's run. Note that it only works when all nodes have a path to an output, so you can't necessarily disable split connections (connections created by node-addition operations) or prune randomly during evolution without making extra considerations.
(P.S. This also creates a traditional residual-recurrent network that in theory could be implemented trivially as matrix operations. For large networks I would first 'express' the genome by running forward propagation once to find the recurrent connections, then build a tensor-per-layer representation for matrix multiplication, using the recurrent, weight, and signal connection attributes, with the recurrent attribute as a sparse binary mask. I actually started writing a TensorFlow implementation that performed all mutation/augmentation operations with tf.sparse matrix operations and didn't use any tree objects, but I had to fall back to dense operations, and the n^2 space consumed is too much for what I need. On the other hand, being in matrix form allowed the adjacency-matrix-powers trick mentioned above. At least one other person on GitHub has done NEAT in TensorFlow, but I'm unsure of their implementation. Also, I found this interesting: https://neat-python.readthedocs.io/en/latest/neat_overview.html)
Happy Hacking!
Related
I want my neural network to be trained on every new data point that it classifies incorrectly. Assuming I somehow label the data correctly every time the network makes a mistake, how many backpropagation passes do I need to run on this single new instance in order to train my network for that particular case? Is there a better way to train a neural network in real-time scenarios?
It depends on the optimization algorithm you use. Backpropagation by itself calculates only the gradient, which is then used by the next iteration of the algorithm.
In the simplest case you can use your own gradient descent implementation and watch the behavior of your cost function. If the cost function decreases by less than some threshold epsilon, you can break the optimization loop for the current instance. You can also limit the maximum number of iterations.
It is worth using an advanced optimizer such as fminunc in MATLAB, which will stop by itself when it reaches an optimum.
You may find this post about different termination conditions of gradient descent very useful.
I think learning from only a single instance is not really efficient; the cost function can behave erratically. You may consider batch learning, where you learn from small batches of new instances. It should provide more stable learning.
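A rough sketch combining those stopping criteria with mini-batches might look like the following; cost_and_grad is a placeholder for your model's cost function and its gradient, not a real library call, and the hyperparameters are illustrative:

    import numpy as np

    def train(weights, data, cost_and_grad, lr=0.01,
              batch_size=32, epsilon=1e-5, max_iters=1000):
        prev_cost = float("inf")
        for it in range(max_iters):
            np.random.shuffle(data)
            for start in range(0, len(data), batch_size):
                batch = data[start:start + batch_size]
                cost, grad = cost_and_grad(weights, batch)
                weights -= lr * grad
            # Stop once the cost has essentially stopped improving.
            if abs(prev_cost - cost) < epsilon:
                break
            prev_cost = cost
        return weights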
In order to illustrate how the network's accuracy depends on the iteration count and on the batch size, I experimented a bit with a neural network used to recognize handwritten digits. I had 4000 examples in the training set and 1000 examples in the validation set. Then I ran the learning algorithm with different parameters and measured the resulting accuracy. You can see the result here: [plot of accuracy against iteration count for several batch sizes]
Of course this plot describes only my particular case, but you can get some intuition on what to expect and on how to validate network parameters.
I am very new to the 'reservoir computing' world, and I've heard that Liquid State Machines (LSMs) are a certain kind of spiking neural network (SNN) model. What exactly is the difference between the two in terms of implementation?
Another aspect I need some clarity on is their counterpart, the leaky-integrator models of the Echo State Network (ESN).
I found from another answer on this forum that 'as I see it (I could be wrong) the big difference between the two approaches is the individual unit. Liquid State Machines use biologically inspired neurons, while Echo State Networks use more analog units. So in terms of "very short term memory", in the Liquid State approach each individual neuron remembers its own history, whereas in the Echo State approach each individual neuron reacts based only on the current state, and therefore the memory is stored in the activity between the units.'
Please tell me if this is correct, and if not, what the actual concept behind them is.
Spiking neurons are a neuron model; the LSM, on the other hand, is a network model. So the LSM is part of a group of network models built from spiking neurons. The ESN has the same units as a normal perceptron (also called graded-response or analog units) and is thus part of the other (more popular) paradigm, where neurons fire at each propagation cycle. This gives a simple enough introduction. The basic idea of spiking neurons is not to treat neurons as binary/digital (on/off) but as analog, encoding information in the time between spikes, which is now thought to be the main channel of information transfer between neurons. Whether the human brain is actually analog or digital is unknown; there is evidence for both, and the true mechanism may be something else entirely. So whether one model is in fact more realistic cannot really be said with certainty.
I'm also new to this field. Liquid State Machine: consider a pond of water and what happens when you throw pebbles into it. A series of concentric circles is created, which eventually vanish; you can learn something about where and when those pebbles were dropped into the water if you look at where the ripples intersect. In the case of Echo State Machines the idea is similar: like sound waves from an echo interfering with one another, multiple inputs interfere with one another, and also with past instantiations of themselves, within a high-dimensional network.
I'm new to Artificial Neural Networks and NeuroEvolution algorithms in general. I'm trying to implement the algorithm called NEAT (NeuroEvolution of Augmenting Topologies), but the description in the original paper omits the method for evolving the weights of a network. It says:
Connection weights mutate as in any NE system, with each connection either perturbed or not at each generation
I've done some searching on how to mutate weights in NE systems, but unfortunately I can't find any detailed description.
I know that while training a neural network, the backpropagation algorithm is usually used to correct the weights, but it only works if you have a fixed topology (structure) across generations and you know the answer to the problem. In NeuroEvolution, you don't know the answer; you only have the fitness function, so it's not possible to use backpropagation here.
I have some experience training a fixed-topology NN using a genetic algorithm (what the paper refers to as the "traditional NE approach"). We used several different mutation and reproduction operators, selected at random.
Given two parents, our reproduction operators (you could also call these crossover operators) included:
Swap either single weights or all weights for a given neuron in the network. For example, given two parents selected for reproduction, either choose a particular weight in the network and swap the value (for our swaps we produced two offspring and then chose the one with the better fitness to survive into the next generation of the population), or choose a particular neuron in the network and swap all the weights for that neuron to produce two offspring.
Swap an entire layer's weights. Given parents A and B, choose a particular layer (the same layer in both) and swap all the weights between them to produce two offspring. This is a large move, so we set it up so that this operation would be selected less often than the others. Also, it may not make sense if your network only has a few layers.
Our mutation operators operated on a single network and would select a random weight and either:
completely replace it with a new random value
change the weight by some percentage (multiply the weight by a random number between 0 and 2; in practice we would constrain that a bit and multiply by a random number between 0.5 and 1.5). This has the effect of scaling the weight so that it doesn't change as radically. You could also do this kind of operation by scaling all the weights of a particular neuron
add or subtract a random number between 0 and 1 to/from the weight
change the sign of the weight
swap weights on a single neuron
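To make the list above concrete, here is a rough sketch of those mutation operators acting on a flat weight vector for a fixed-topology network; the ranges are the illustrative values from the text, not tuned recommendations:

    import random

    def mutate(weights):
        i = random.randrange(len(weights))
        op = random.choice(["replace", "scale", "shift", "flip_sign"])
        if op == "replace":                  # brand-new random value
            weights[i] = random.uniform(-1.0, 1.0)
        elif op == "scale":                  # multiply by 0.5..1.5
            weights[i] *= random.uniform(0.5, 1.5)
        elif op == "shift":                  # add/subtract 0..1
            weights[i] += random.uniform(-1.0, 1.0)
        else:                                # flip the sign
            weights[i] = -weights[i]
        return weights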
You can certainly get creative with mutation operators, you may discover something that works better for your particular problem.
IIRC, we would choose two parents from the population by random proportional selection, run mutation operations on each of them, run these mutated parents through the reproduction operation, and then run the two offspring through the fitness function to select the fitter one to go into the next-generation population.
Of course, in your case, since you're also evolving the topology, some of the reproduction operations above won't make much sense, because two selected parents could have completely different topologies. In NEAT (as I understand it) you can have connections between non-contiguous layers of the network; for example, a layer-1 neuron can feed a neuron in layer 4 instead of feeding directly into layer 2. That makes swap operations involving all of a neuron's weights more difficult; you could try to choose two neurons in the network that have the same number of weights, or just stick to swapping single weights.
I know that while training a NE, usually the backpropagation algorithm is used to correct the weights
Actually, in NE, backprop isn't used. The mutations performed by the GA train the network as an alternative to backprop. In our case, backprop was problematic due to some "unorthodox" additions to the network which I won't go into. However, if backprop had been possible, I would have gone with it. The genetic approach to training NNs definitely seems to proceed much more slowly than backprop probably would have. Also, when using an evolutionary method to adjust the weights of the network, you start needing to tweak various GA parameters, like the crossover and mutation rates.
In NEAT, everything is done through the genetic operators. As you already know, the topology is evolved through crossover and mutation events.
The weights are evolved through mutation events. As in any evolutionary algorithm, there is some probability that a weight is changed randomly (you can either generate a brand-new number or, for example, add a normally distributed random number to the original weight).
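A tiny sketch of that scheme over a genome's connection weights: each weight is, with some probability, either nudged by Gaussian noise or replaced outright. The rates and ranges here are illustrative choices, not values from the paper.

    import random

    def mutate_weights(weights, p_perturb=0.8, p_replace=0.1, sigma=0.5):
        for i in range(len(weights)):
            r = random.random()
            if r < p_replace:
                weights[i] = random.uniform(-1.0, 1.0)   # brand-new value
            elif r < p_replace + p_perturb:
                weights[i] += random.gauss(0.0, sigma)   # small random nudge
        return weights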
Implementing NEAT might seem an easy task, but many small details make it fairly complicated in the end. You might want to look at existing implementations and use one of them, or at least be inspired by them. Everything important can be found on the NEAT Users Page.
I am trying to solve a regression task using a recurrent neural network (I use pybrain to build it). After my network is fit, I want to use it to make predictions. But a recurrent network's prediction is affected by its previous prediction (which in turn is affected by the prediction before it, and so on).
The question is: once the network is trained and I want to make predictions on a dataset, how do I properly kick-start the prediction process? If I just call .activate() on the first example of the prediction dataset, the recurrent connection will pass 0 into the network and affect the subsequent predictions in an undesirable way. Is there a way to force a fully trained recurrent network to assume the previous activation result was some special value? If so, which value is best here (maybe the mean of the possible activation outputs, or something like it)?
UPDATE. OK, since no one had any ideas within a day on how to do this with a recurrent network in pybrain, let me change the formulation a bit and forget about pybrain. Suppose I build a network for regression (for example, predicting the price of a stock). The network will be used with a dataset that has 10 features. I add one additional feature to the dataset and fill it with the previous price from the dataset. Thus I replicate a recurrent network (the additional input neuron replicates the recurrent connection). The questions are:
1) In the training dataset, I fill this additional feature with the previous price. But what do I do with the FIRST record in the training dataset (where I don't know the previous price)? Should I leave it at 0? That seems like a bad idea; the previous price was NOT zero. Should I use the mean of the prices in the training dataset? Any other suggestions?
2) The same question as #1, but for running the fully trained network against the test dataset. While running my network against the test dataset, I should always take its prediction and put the result into this new 11th input neuron before making the next prediction. But again, what do I do for the first prediction in the dataset (since I don't know the previous price)?
This isn't my understanding of recurrent networks at all.
When you initially create a recurrent network, the recurrent connections (say, middle layer to middle layer) will have randomized weights, like any other connection. This is their starting value. Each time you activate a recurrent network you update the state carried over those connections, and thus your output will be altered.
Carrying this logic forward: if you wrote some code to train a recurrent network and saved it to a file, you'd have in that file a recurrent network ready to go with your real data, although the first invocation will contain the recurrent feedback from your last activation during training.
The thing you want to do is make sure you re-save your recurrent network any time you wish to persist its state. For a simple FFN this wouldn't be an issue, because you only change the state during training, but for a recurrent network you'll want to persist the state after any activation, because the recurrent state will have updated.
I don't think a recurrent network will be poisoned by the initial values on its recurrent connections; certainly I wouldn't trust the first invocation, but given they're designed for sequences, that shouldn't be an issue in either case.
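A hedged sketch of that point in pybrain: reset the recurrent state, warm the network up on a few leading samples (discarding those outputs), then take the predictions you keep. buildNetwork, activate, and reset are pybrain's own calls as far as I know; the data and layer sizes are placeholders.

    from pybrain.tools.shortcuts import buildNetwork

    net = buildNetwork(10, 20, 1, recurrent=True)
    # ... training happens here ...

    samples = [[0.0] * 10 for _ in range(100)]   # stand-in for real data
    net.reset()                          # clear leftover recurrent state
    for s in samples[:5]:                # warm-up: prime the recurrent state
        net.activate(s)
    predictions = [net.activate(s) for s in samples[5:]]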
Regarding your updated question, I'm not at all convinced that arbitrarily adding a single input node will simulate this. In fact, I suspect you'd break the network's predictive capabilities. In your example, starting with 10 input nodes and, let's say, 20 middle nodes, just by adding an extra input node you'll add 20 new connections to the network, which will be initially randomized. Every additional input node will compound this change, and after 10 additional input nodes you'll have as many randomized connections as trained ones.
I don't see this working, and I certainly don't believe it would simulate recurrent learning in the way you think.
My goal is to solve the XOR problem using a neural network. I've read countless articles on the theory, proofs, and mathematics behind a multi-layered neural network. The theory makes sense (the math... not so much), but I have a few simple questions regarding the evaluation and topology of a neural network.
I feel I am very close to solving this problem, but I am beginning to question my topology and evaluation techniques. The complexities of backpropagation aside, I just want to know if my approach to evaluation is correct. With that in mind, here are my questions:
Assuming we have multiple inputs, does each respective input get its own node? Do we ever feed both values into a single node? Does the order in which we enter this information matter?
While evaluating the graph's output, does each node fire as soon as it gets a value? Or do we instead collect all the values from the layer above and fire only once we've consumed all the input?
Does the order of evaluation matter? For example, if a given node in layer “b” is ready to fire – but other nodes in that same layer are still awaiting input – should the ready node fire anyway? Or should all nodes in the layer be loaded up before firing?
Should each layer be connected to all nodes in the following layer?
I’ve attached a picture which should help explain (some of) my questions.
Thank you for your time!
1) Yes, each input gets its own node, and that node is always the node for that input type. The order doesn't matter; you just need to keep it consistent. After all, an untrained neural net can learn to map any set of linearly separable inputs to outputs, so there can't be a particular order you need to put the nodes in for it to work.
2 and 3) You need to collect all the values from a single layer before any node in the next layer fires. This is important if you're using any activation function other than a step function, because the sum of the inputs will affect the value that is propagated forward, so you need to know the full sum before you propagate anything.
4) Which nodes to connect to which other nodes is up to you. Since your net won't be excessively large and XOR is a fairly straightforward problem, it will probably be simplest for you to connect all nodes in one layer to all nodes in the next layer (i.e. a fully-connected neural net). There might be specialized cases in other problems where it would be better to not use this topology, but there isn't an easy way to figure it out (most people either use trial and error or a genetic algorithm, as in NEAT), and you don't need to worry about it for the purposes of this problem.
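To make points 2 through 4 concrete, here is a minimal sketch of a fully-connected 2-2-1 network solving XOR, where each layer is fully summed and squashed before the next one fires. The weights are a well-known hand-crafted solution (the hidden layer computes OR and NAND, and the output ANDs them), not learned ones.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def layer(inputs, weights, biases):
        # Collect ALL inputs to each node, sum, then squash; no node fires early.
        return [sigmoid(sum(i * w for i, w in zip(inputs, ws)) + b)
                for ws, b in zip(weights, biases)]

    def xor_net(x1, x2):
        hidden = layer([x1, x2], [[20, 20], [-20, -20]], [-10, 30])  # OR, NAND
        out = layer(hidden, [[20, 20]], [-30])                       # AND
        return out[0]

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, round(xor_net(a, b)))    # prints 0, 1, 1, 0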