Neural Networks for Large Repetitive Sets of Inputs - machine-learning

Suppose we want to make a neural network to predict the outcome of a race between some number of participants.
Each participant in the race has various statistics: Engine Power, Max Speed, Driver Experience, etc.
Now imagine we have been asked to build a system which can handle any number of participants from 2 to 400 participants (just to pick a concrete number).
From what I have learned about "traditional" Neural Nets so far, our choices are:
Build many different neural nets for each number of participants: n = 2, 3, 4, 5, ... , 400.
Train one neural network that takes inputs for 400 participants. When a piece of data refers to a race with fewer than 400 participants (which will be a large percentage of the data), just set all the remaining statistic inputs to 0.
Assuming this would work, is there any reason to expect one method to perform better than the other?
The former is more specialized, but each net would see much less training data, so my guess is that the two would work out roughly the same?
Is there a standard way to approach problems similar to this?
We could imagine (simplistically) that the neural network first classifies the strength of each participant. Each time a new participant is added, it needs to apply this same analysis to the new inputs, potentially hinting that there might be a "smart" way to reduce the total amount of work required.
Is this just screaming for a convolutional neural network?

Between your two options, option 1 would involve repeating a lot of effort to train for different sizes, and would probably be very slow to train as a result.
Option 2 is a bit more workable, but the network would still need extra training on inputs of different sizes.
Another option, which I think would be the most likely to work, would be to train a neural net only to choose a winner between two participants, and to use it to create a ranking via many comparisons between pairs. Such an approach is described here.
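As a rough sketch of that pairwise idea (the `beats` function below is a hypothetical stand-in for a trained two-participant network; a real comparator would be learned from race data):

```python
from itertools import combinations

def beats(a, b):
    # Stand-in for a trained pairwise model: here it just
    # compares a single "strength" statistic.
    return a["strength"] > b["strength"]

def rank(participants):
    # Round-robin: each pair is compared once; rank by wins.
    wins = {p["name"]: 0 for p in participants}
    for a, b in combinations(participants, 2):
        wins[a["name"] if beats(a, b) else b["name"]] += 1
    return sorted(wins, key=wins.get, reverse=True)

field = [{"name": "A", "strength": 0.3},
         {"name": "B", "strength": 0.9},
         {"name": "C", "strength": 0.6}]
print(rank(field))  # ['B', 'C', 'A']
```

This works for any field size, since the learned model only ever sees two participants at a time.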
We could imagine (simplistically) that the neural network first classifies the strength of each participant. Each time a new participant is added, it needs to apply this same analysis to the new inputs, potentially hinting that there might be a "smart" way to reduce the total amount of work required.
I think you've got the key idea here. Since we want to perform exactly the same analysis on each participant (assuming it makes no difference whether they're participant 1 or participant 400), this is an ideal problem for Weight Sharing. This means that the weights on the neurons doing the initial analysis of a participant are identical for every participant. When these weights change for one participant, they change for all participants.
While CNNs do use weight sharing, we don't need to use a CNN to use this technique. The details of how you'd go about doing this would depend on your framework.
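A minimal NumPy sketch of this kind of weight sharing (the layer sizes, tanh activation, and random weights are illustrative choices, not tied to any framework): one small "analysis" network is applied row-wise to every participant, so a single set of weights serves any field size.

```python
import numpy as np

rng = np.random.default_rng(0)

# One small "analysis" network whose weights are shared
# across every participant (3 stats -> 1 strength score).
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def strength(stats):                 # stats: (n_participants, 3)
    h = np.tanh(stats @ W1 + b1)     # same W1 applied to every row
    return (h @ W2 + b2).ravel()     # one score per participant

race = rng.normal(size=(5, 3))       # a 5-participant race
print(strength(race).shape)                       # (5,)
print(strength(rng.normal(size=(40, 3))).shape)   # (40,) - same weights
```

Changing `W1` once changes the analysis applied to all participants, which is exactly the property weight sharing gives you.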


Number of backprops as performance metric for neural networks

I have been reading an article about SRCNN and found that they use "number of backprops" for evaluating how well a network performs, i.e. what the network is able to learn after x backprops (as I understand it). I would like to know what "number of backprops" actually means. Is it just the number of training samples used during training? Or the number of mini-batches? Or one of those numbers multiplied by the number of learnable parameters in the network? Or something completely different? Maybe there is some other, more common name for this that I could look up and read more about, because I was not able to find anything useful by searching for "number of backprops" or "number of backpropagations".
Bonus question: how widely is this metric used, and how good is it?
I read their paper from 2016: C. Dong, C. C. Loy, K. He and X. Tang, "Image Super-Resolution Using Deep Convolutional Networks", IEEE Transactions on Pattern Analysis and Machine Intelligence.
Since they don't even mention batches, I assume they do a backpropagation pass to update their weights after each sample/image.
In other words, their (mini-)batch size is equal to 1 sample.
So "number of backpropagations" is simply the number of batches, which is quite a common metric: in the paper, PSNR is plotted over the number of batches (more usually one plots loss over epochs).
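If that reading is right, the bookkeeping is simple: one backprop (weight update) per batch, so the count is determined by dataset size, batch size, and number of epochs. The numbers below are made up for illustration.

```python
import math

def num_backprops(n_samples, batch_size, epochs):
    # One weight update (backprop pass) per batch.
    return math.ceil(n_samples / batch_size) * epochs

# With batch size 1, backprops = samples seen during training.
print(num_backprops(1000, 1, 10))    # 10000
print(num_backprops(1000, 32, 10))   # 320
```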
Bonus question: I come to the conclusion that they just didn't stick to the common terminology of machine learning or deep learning.
Bonus-bonus question: they use the loss after n batches to showcase how much the different network architectures could learn on training datasets of different sizes.
I would assume it means how much the network has learned after back-propagating n times. It is most likely interchangeable with "after training over n samples...".
This may be a bit different if they are using a recurrent network, as they could run more samples through forward propagation than through backward propagation. (For whatever reason I can't get the link to the paper to load, so I'm unsure.)
Based on your number of questions I think you might be overthinking this :)
Number of backprops is not a commonly used metric. Perhaps they use it here to showcase the speed of training under whichever optimization methods they are using, but for most common purposes it is not a relevant metric.

Training a multilayer perceptron to play cards

I'm writing a multilayer perceptron neural network for playing two-player card games. I'd like to know if there is a better way to optimize weights than testing neural nets with randomly regenerated weights against each other.
Here's the way I implemented the neural net.
The neurons in the first layer output field values representing the states of cards in the deck. For each of these neurons there is an array of constant weights. For example, if the card is in the AI's hand, the neuron outputs a field value equal to the first weight in the array; if the card is on the table, the second; and so forth. These constant input weights need to be optimized in the training process.
Next, there are several hidden layers of neurons. The topology is fixed. All neurons in the preceding layer are connected to every neuron in the following layer. The connections' weights need to be optimized.
The last layer of neurons represents the player's actions. These correspond to cards that can be played, plus a couple of non-card-specific actions, like taking cards from the table or ending the turn. The largest output field value corresponding to a legal action determines the action to play.
There is a caveat. I want the neural net to find the optimum strategy, so I cannot train it on individual turns. Rather, I have to let it play until it wins or loses, and that takes approximately 50 turns.
I'm wondering what is the best approach to training in this scenario, where one does not know the proper response for every turn, but only know if the problem was solved correctly after multiple NN evaluations, i.e. it won the game.
For now, I've only thought of a simple evolutionary approach, in which a group of randomly generated NNs play against each other multiple times, and the few most successful ones remain for the next round, while the NNs which didn't pass are replaced by new random ones. The problem I see is that with this approach it's going to take a long time for the weights to start converging. And since the fraction of wins is a function of many weights (I expect to need several hundred to properly model the problem) which have a highly non-linear effect on the NN output, I don't see how I could use a function-minimization technique.
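That simple evolutionary loop can be sketched as follows (the fitness function here is a toy stand-in rewarding small weights; in the real setup it would be the fraction of games won against the rest of the population):

```python
import random

random.seed(0)
N_WEIGHTS, POP, KEEP = 10, 20, 5

def random_net():
    return [random.uniform(-1, 1) for _ in range(N_WEIGHTS)]

def fitness(net):
    # Toy stand-in for "fraction of games won"; a real version
    # would play many full games between population members.
    return -sum(w * w for w in net)

population = [random_net() for _ in range(POP)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    survivors = population[:KEEP]              # keep the best nets
    population = survivors + [random_net()     # replace the rest
                              for _ in range(POP - KEEP)]
best = max(population, key=fitness)
print(round(fitness(best), 3))
```

Because survivors are carried over unchanged, the best fitness never decreases; the slow part is exactly what the question worries about, since fresh random nets rarely improve on the survivors.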
Does anyone know whether this weight-optimization problem would lend itself better to anything other than a Monte Carlo technique?
I think this depends on what your card game is. In general, I think this statement of yours is false:
There is a caveat. I want the neural net to find the optimum strategy, so I cannot train it on individual turns.
It should be possible to find a way to train your network on individual turns. For example, if both players can make the same exact set of moves at each turn, you can train the loser network according to what the winner did at each of the turns. Admittedly, this might not be the case for most card games, where the set of moves at a given turn is usually determined by the cards each player is holding.
If you're playing something like poker, look at this. The idea there is to train your network based on the history of a player you consider good enough to learn from. For example, if you have a lot of data about your favorite (poker) player's games, you can train a neural network to learn their moves. Then, at each turn of a new game, do what the neural network tells you to do given its previous training and the data you have available up to that turn: what cards you're holding, what cards are on the table, what cards you know your opponents to be holding etc.
You could also consider reinforcement learning, which can make use of neural nets, but is based on a different idea. This might help you deal with your "cannot train on individual turns" problem, without needing training data.

How to evolve weights of a neural network in Neuroevolution?

I'm new to Artificial Neural Networks and NeuroEvolution algorithms in general. I'm trying to implement the algorithm called NEAT (NeuroEvolution of Augmenting Topologies), but the description in the original published paper omits the method for evolving the weights of a network; it says
Connection weights mutate as in any NE system, with each connection either perturbed or not at each generation
I've done some searching about how to mutate weights in NE systems, but can't find any detailed description, unfortunately.
I know that when training a neural network, the backpropagation algorithm is usually used to correct the weights, but it only works if you have a fixed topology (structure) across generations and you know the answer to the problem. In NeuroEvolution, you don't know the answer; you have only the fitness function, so it's not possible to use backpropagation there.
I have some experience with training a fixed-topology NN using a genetic algorithm (what the paper refers to as the "traditional NE approach"). There were several different mutation and reproduction operators we used for this, and we selected among them randomly.
Given two parents, our reproduction operators (could also call these crossover operators) included:
Swap either single weights or all the weights for a given neuron in the network. So, given two parents selected for reproduction, either choose a particular weight in the network and swap the value (for our swaps we produced two offspring and then chose the one with the better fitness to survive in the next generation of the population), or choose a particular neuron in the network and swap all the weights for that neuron to produce two offspring.
Swap an entire layer's weights. So, given parents A and B, choose a particular layer (the same layer in both) and swap all the weights between them to produce two offspring. This is a large move, so we set it up so that this operation would be selected less often than the others. Also, it may not make sense if your network only has a few layers.
Our mutation operators operated on a single network and would select a random weight and either:
Completely replace it with a new random value.
Change the weight by some percentage (multiply the weight by some random number between 0 and 2; practically speaking, we would tend to constrain that a bit and multiply it by a random number between 0.5 and 1.5). This has the effect of scaling the weight so that it doesn't change as radically. You could also apply this kind of operation to all the weights of a particular neuron.
Add or subtract a random number between 0 and 1 to/from the weight.
Change the sign of the weight.
Swap weights on a single neuron.
You can certainly get creative with mutation operators, you may discover something that works better for your particular problem.
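A minimal sketch of the single-weight mutation operators listed above (the value ranges follow the list; choosing among the operators uniformly at random is an illustrative simplification):

```python
import random

random.seed(1)

def mutate(weights):
    # Apply one randomly chosen operator to one random weight.
    w = list(weights)
    i = random.randrange(len(w))
    op = random.choice(["replace", "scale", "shift", "flip"])
    if op == "replace":      # completely new random value
        w[i] = random.uniform(-1, 1)
    elif op == "scale":      # multiply by a factor in [0.5, 1.5]
        w[i] *= random.uniform(0.5, 1.5)
    elif op == "shift":      # add/subtract a value up to 1
        w[i] += random.uniform(-1, 1)
    else:                    # flip the sign
        w[i] = -w[i]
    return w

print(mutate([0.2, -0.7, 1.1]))
```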
IIRC, we would choose two parents from the population by random proportional selection, run mutation operations on each of them, then run these mutated parents through the reproduction operation, and finally run the two offspring through the fitness function to select the fittest one to go into the next generation's population.
Of course, in your case since you're also evolving the topology some of these reproduction operations above won't make much sense because two selected parents could have completely different topologies. In NEAT (as I understand it) you can have connections between non-contiguous layers of the network, so for example you can have a layer 1 neuron feed another in layer 4, instead of feeding directly to layer 2. That makes swapping operations involving all the weights of a neuron more difficult - you could try to choose two neurons in the network that have the same number of weights, or just stick to swapping single weights in the network.
I know that while training a NE, usually the backpropagation algorithm is used to correct the weights
Actually, in NE backprop isn't used. It's the mutations performed by the GA that are training the network as an alternative to backprop. In our case backprop was problematic due to some "unorthodox" additions to the network which I won't go into. However, if backprop had been possible, I would have gone with that. The genetic approach to training NNs definitely seems to proceed much more slowly than backprop probably would have. Also, when using an evolutionary method for adjusting weights of the network, you start needing to tweak various parameters of the GA like crossover and mutation rates.
In NEAT, everything is done through the genetic operators. As you already know, the topology is evolved through crossover and mutation events.
The weights are evolved through mutation events. Like in any evolutionary algorithm, there is some probability that a weight is changed randomly (you can either generate a brand new number or you can e.g. add a normally distributed random number to the original weight).
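A minimal sketch of that weight-mutation step (the rates and perturbation scale below are assumptions for illustration, not NEAT's published defaults):

```python
import random

random.seed(0)
P_MUTATE, P_RESET, SIGMA = 0.8, 0.1, 0.3  # assumed rates, for illustration

def mutate_weights(weights):
    out = []
    for w in weights:
        if random.random() < P_MUTATE:          # perturbed or not
            if random.random() < P_RESET:
                w = random.uniform(-1, 1)       # brand-new random value
            else:
                w += random.gauss(0.0, SIGMA)   # small Gaussian nudge
        out.append(w)
    return out

print(mutate_weights([0.1, -0.4, 0.9, 0.0]))
```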
Implementing NEAT might seem an easy task, but there are a lot of small details that make it fairly complicated in the end. You might want to look at existing implementations and use one of them, or at least be inspired by them. Everything important can be found at the NEAT Users Page.

Different weights for different classes in neural networks and how to use them after learning

I trained a neural network using the backpropagation algorithm. I ran the network 30 times manually, each time changing the inputs and the desired output. The outcome is that of a traditional classifier.
I tried it out with 3 different classes. Since I ran the network 30 times with 10 inputs per class, I ended up with 3 distinct sets of weights; runs for the same class produced very similar weights with a very small error. The network has therefore proven itself to have learned successfully.
My question is, now that the learning is complete and I have 3 distinct sets of weights (1 for each class), how could I use these in a regular feed-forward network so it can classify the input automatically? I searched around to check whether you can somehow average out the weights, but it looks like this is not possible. Some people mentioned bootstrapping the data.
Have I done something wrong during the backpropagation learning process? Or is there an extra step which needs to be done post the learning process with these different weights for different classes?
One way I am imagining this is to implement a regular feed-forward network which holds all 3 sets of weights. There would be 3 outputs, and for any given input, one of the output neurons would fire, meaning that the given input is mapped to that particular class.
The network architecture is as follows:
3 inputs, 2 hidden neurons, 1 output neuron
Thanks in advance
It does not make sense to train only one class in your neural network at a time, since the hidden layer can form weight combinations to 'learn' which class the input data may belong to. Learning each class separately makes the weights independent; the network won't know which learned weights to use when a new test input is given.
Use a vector as the output to represent the three different classes, and train on all the data together.
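A quick sketch of that target encoding (one-hot rows, one per class) using NumPy, with the usual argmax decoding at prediction time:

```python
import numpy as np

labels = np.array([0, 2, 1, 0])      # class index per training sample
targets = np.eye(3)[labels]          # one-hot target vector per sample
print(targets)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]

# At test time, the output neuron with the largest value "fires":
predicted_class = int(np.argmax(targets[1]))
print(predicted_class)  # 2
```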
EDIT
P.S. I don't think the post you link to is relevant to your case. The question in that post arises from different (random) weight initializations in neural network training. People sometimes apply a fixed seed to make the weight learning reproducible and avoid such a problem.
In addition to the response by nikie, another possibility is to represent the output as one (unique) output unit with continuous values. For example, the ANN classifies an input as the first class if the output is in the [0, 1) interval, the second if it is in [1, 2), and the third if it is in [2, 3). This architecture is reported in the literature (and confirmed in my experience) to be less efficient than the discrete representation with 3 neurons.
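Decoding that single continuous output into a class is just an interval lookup; a tiny sketch:

```python
def interval_class(y, n_classes=3):
    # Map a continuous output to a class:
    # [0, 1) -> 0, [1, 2) -> 1, [2, 3) -> 2
    # (values at or past the top are clamped to the last class).
    return min(max(int(y), 0), n_classes - 1)

print(interval_class(0.4))  # 0
print(interval_class(1.7))  # 1
print(interval_class(2.9))  # 2
```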

Extrapolation with a recurrent neural network

I wrote a simple recurrent neural network (7 neurons, each initially connected to all the others) and trained it using a genetic algorithm to learn "complicated", non-linear functions like 1/(1+x^2). As the training set, I used 20 values within the range [-5, 5] (I tried using more than 20, but the results did not change dramatically).
The network can learn this range pretty well, and when given examples of other points within this range, it can predict the value of the function. However, it cannot extrapolate correctly and predict the values of the function outside the range [-5, 5]. What are the reasons for that, and what can I do to improve its extrapolation abilities?
Thanks!
Neural networks are not extrapolation methods (recurrent or not); this is completely outside their capabilities. They are used to fit a function to the provided data, and they are completely free in how they model things outside the subspace populated by the training points. So, in a loose sense, one should think of them as an interpolation method.
To make things clear: a neural network should be capable of generalizing the function inside the subspace spanned by the training samples, but not outside of it.
A neural network is trained only for consistency with the training samples, while extrapolation is something completely different. A simple example from H. Lohninger, Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999, ISBN 3-540-14743-8, shows how NNs behave in this context.
All of these networks are consistent with the training data, but can do anything outside of that subspace.
You should rather reconsider your problem's formulation, and if it can be expressed as a regression or classification problem then you can use NN, otherwise you should think about some completely different approach.
The only things that can be done to somehow "correct" what happens outside the training set are to:
Add artificial training points in the desired subspace (but this simply grows the training set, and again, outside of this new set the network's behaviour is "random").
Add strong regularization, which forces the network to create a very simple model; but a model's complexity does not guarantee any extrapolation strength, as two models of exactly the same complexity can have, for example, completely different limits at plus/minus infinity.
Combining the above two steps can help build a model which "extrapolates" to some extent, but, as stated before, this is not the purpose of a neural network.
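One concrete way to see why the network's behaviour far from the data is unrelated to the target function: with tanh (or any sigmoid-like) hidden units, the activations saturate outside the training range, so the output flattens toward a constant limit no matter what the true function does there. A tiny fixed network (weights chosen arbitrarily for illustration):

```python
import math

def net(x):
    # A 2-hidden-unit tanh network with fixed, arbitrary weights.
    h1 = math.tanh(0.8 * x + 0.1)
    h2 = math.tanh(-0.5 * x + 0.3)
    return 1.2 * h1 - 0.7 * h2 + 0.05

def true_fn(x):
    return 1.0 / (1.0 + x * x)

# Far from the origin the tanh units are saturated, so the net is
# essentially constant, while the true function decays toward 0.
print(abs(net(100.0) - net(1000.0)))   # ~0: flat far from the data
print(net(100.0), true_fn(100.0))      # constant limit vs. decaying truth
```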
As far as I know, this is only possible with networks which have the echo property. See Echo State Networks on scholarpedia.org.
These networks are designed for arbitrary signal learning and are capable of remembering their behavior.
You can also take a look at this tutorial.
The nature of your post(s) suggests that what you're referring to as "extrapolation" would be more accurately described as "sequence recognition and reproduction." Training networks to recognize a data sequence, with or without a time series (dt), is pretty much the purpose of a recurrent neural network (RNN).
The training function shown in your post has outputs bounded by 0 and 1 (and since x only appears as x^2, the function effectively sees |x|). So, first things first, be certain your input layer can easily distinguish between negative and positive inputs (if it must).
Next, the number of neurons is not nearly as important as how they're layered and interconnected. How many of the 7 were used for the sequence inputs? What type of network was used, and how was it configured? Network feedback will reveal the ratios, proportions, relationships, etc., and aid in adjusting the network weights to match the sequence. Feedback can also take the form of a forward feed, depending on the type of network used to create the RNN.
Producing an 'observable' network for the decaying function 1/(1+x^2) should be a decent exercise to cut your teeth on RNNs. 'Observable' here means the network is capable of producing results for any input value(s), even though its training data is (far) smaller than the set of all possible inputs. I can only assume this was your actual objective, as opposed to "extrapolation."
