Temporal Difference Learning and Back-propagation - machine-learning

I have read this page of standford - https://web.stanford.edu/group/pdplab/pdphandbook/handbookch10.html. I am not able to understand how TD learning is used in neural networks. I am trying to make a checkers AI which will use TD learning, similar to what they have implemented in backgammon. Please explain the working of TD Back-Propagation.
I have already referred this question - Neural Network and Temporal Difference Learning
But I am not able to understand the accepted answer. Please explain with a different approach if possible.

TD learning is not used in neural networks. Instead, neural networks are used in TD learning to store the value (or q-value) function.
I think that you are confusing backpropagation ( a neural networks' concept) with bootstrapping in RL. Bootstrapping uses a combination of recent information and previous estimations to generate new estimations.
When the state-space is large and it is not easy to store the value function in tables, neural networks are used as an approximation scheme to store the value function.
The discussion on forward/backward views is more about eligibility traces, etc. A case where RL bootstraps serval steps ahead in time. However, this is not practical and there are ways (such as eligibility traces) to leave a trail and update past states.
This should not be connected or confused with back propagation in neural networks. It has nothing to do with it.

Related

Clarification on a Neural Net that plays Snake

I'm new to neural networks/machine learning/genetic algorithms, and for my first implementation I am writing a network that learns to play snake (An example in case you haven't played it before) I have a few questions that I don't fully understand:
Before my questions I just want to make sure I understand the general idea correctly. There is a population of snakes, each with randomly generated DNA. The DNA is the weights used in the neural network. Each time the snake moves, it uses the neural net to decide where to go (using a bias). When the population dies, select some parents (maybe highest fitness), and crossover their DNA with a slight mutation chance.
1) If given the whole board as an input (about 400 spots) enough hidden layers (no idea how many, maybe 256-64-32-2?), and enough time, would it learn to not box itself in?
2) What would be good inputs? Here are some of my ideas:
400 inputs, one for each space on the board. Positive if snake should go there (the apple) and negative if it is a wall/your body. The closer to -1/1 it is the closer it is.
6 inputs: game width, game height, snake x, snake y, apple x, and apple y (may learn to play on different size boards if trained that way, but not sure how to input it's body, since it changes size)
Give it a field of view (maybe 3x3 square in front of head) that can alert the snake of a wall, apple, or it's body. (the snake would only be able to see whats right in front unfortunately, which could hinder it's learning ability)
3) Given the input method, what would be a good starting place for hidden layer sizes (of course plan on tweaking this, just don't know what a good starting place)
4) Finally, the fitness of the snake. Besides time to get the apple, it's length, and it's lifetime, should anything else be factored in? In order to get the snake to learn to not block itself in, is there anything else I could add to the fitness to help that?
Thank you!
In this post, I will advise you of:
How to map navigational instructions to action sequences with an LSTM
neural network
Resources that will help you learn how to use neural
networks to accomplish your task
How to install and configure neural
network libraries based on what I needed to learn the hard way
General opinion of your idea:
I can see what you're trying to do, and I believe that your game idea (of using randomly generated identities of adversaries that control their behavior in a way that randomly alters the way they're using artificial intelligence to behave intelligently) has a lot of potential.
Mapping navigational instructions to action sequences with a neural network
For processing your game board, because it involves dense (as opposed to sparse) data, you could find a Convolutional Neural Network (CNN) to be useful. However, because you need to translate the map to an action sequence, sequence-optimized neural networks (such as Recurrent Neural Networks) will likely be the most useful for you. I did find some studies that use neural networks to map navigational instructions to action sequences, construct the game map, and move a character through a game with many types of inputs:
Mei, H., Bansal, M., & Walter, M. R. (2015). Listen, attend, and walk: Neural mapping of navigational instructions to action sequences. arXiv preprint arXiv:1506.04089. Available at: Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences
Lample, G., & Chaplot, D. S. (2016). Playing FPS games with deep reinforcement learning. arXiv preprint arXiv:1609.05521. Available at: Super Mario as a String: Platformer Level Generation Via LSTMs
Lample, G., & Chaplot, D. S. (2016). Playing FPS games with deep reinforcement learning. arXiv preprint arXiv:1609.05521. Available at: Playing FPS Games with Deep Reinforcement Learning
Schulz, R., Talbot, B., Lam, O., Dayoub, F., Corke, P., Upcroft, B., & Wyeth, G. (2015, May). Robot navigation using human cues: A robot navigation system for symbolic goal-directed exploration. In Robotics and Automation (ICRA), 2015 IEEE International Conference on (pp. 1100-1105). IEEE. Available at: Robot Navigation Using Human Cues: A robot navigation system for symbolic goal-directed exploration
General opinion of what will help you
It sounds like you're missing some basic understanding of how neural networks work, so my primary recommendation to you is to study more of the underlying mechanics behind neural networks in general. It's important to keep in mind that a neural network is a type of machine learning model. So, it doesn't really make sense to just construct a neural network with random parameters. A neural network is a machine learning model that is trained from sample data, and once it is trained, it can be evaluated on test data (e.g. to perform predictions).
The root of machine learning is largely influenced by Bayesian statistics, so you might benefit from getting a textbook on Bayesian statistics to gain a deeper understanding of how machine-based classification works in general.
It will also be valuable for you to learn the differences between different types of neural networks, such as Long Short Term Memory (LSTM) and Convolutional Neural Networks (CNNs).
If you want to tinker with how neural networks can be used for classification tasks, try this:
Tensorflow Playground
To learn the math:
My professional opinion is that learning the underlying math of neural networks is very important. If it's intimidating, I give you my testimony that I was able to learn all of it on my own. But if you prefer learning in a classroom environment, then I recommend that you try that. A great resource and textbook for learning the mechanics and mathematics of neural networks is:
Neural Networks and Deep Learning
Tutorials for neural network libraries
I recommend that you try working through the tutorials for a neural network library, such as:
TensorFlow tutorials
Deep Learning tutorials with Theano
CNTK tutorials (CNTK 205: Artistic Style Transfer is particularly cool.)
Keras tutorial (Keras is a powerful high-level neural network library that can use either TensorFlow or Theano.)
I saw similar application. Inputs usually were snake coordinates, apple coordinates and some sensory data(is wall next to snake head or no in your case).
Using genetic algorithm is a good idea in this case. You doing only parametric learning(finding set of weights), but structure will be based on your estimation. GA can be also used for structure learning(finding topology of ANN). But using GA for both will be very computational hard.
Professor Floreano did something similar. He use GA for finding weights for neural network controller of robot. Robot was in labyrinth and perform some task. Neural network hidden layer was one neuron with recurrent joints on inputs and one lateral connection on himself. There was two outputs. Outputs were connected on input layer and hidden layer(mentioned one neuron).
But Floreano did something more interesting. He say, We don't born with determined synapses, our synapses change in our lifetime. So he use GA for finding rules for change of synapses. These rules was based on Hebbian learning. He perform node encoding(for all weights connected to neuron will apply same rule). On beginning, he initialized weights on small random values. Finding rules instead of numerical value of synapse leads to better results.
One from Floreno's articles.
And on the and my own experience. In last semester I and my schoolmate get a task finding the rules for synapse with GA but for Spiking neural network. Our SNN was controller for kinematic model of mobile robot and task was lead robot in to the chosen point. We obtained some results but not expected. You can see results here. So I recommend you use "ordinary" ANN instead off SNN because SNN brings new phenomens.

Building a Tetris AI using Neuroevolution

I am planning to create a Tetris AI using artificial neural network and train it with genetic algorithm for a project in my high school computer science class. I have a basic understanding of how an ANN works and how to implement it with a genetic algorithm. I have already written a working Neural Network based on this tutorial and I'm currently working on a genetic algorithm.
My questions are:
Which GA model is better for this situation (Tetris), and why?
What should I use for input for the neural network? Because currently, the method I'm using is to simply convert the state of the board (the pieces) into a one dimensional array and feed it into the neural network? Is there a better approach?
What should the size (number of layers, neurons per layer) the neural network be?
Are there any good sources of information that can help me?
Thank you!
Similar task was already solved by Google, but they solved it for all kinds of Atari games - https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf.
Carefully read this article and all of the related articles too
This is a reinforcement learning task, in my opinion the hardest task in ML domain. So there will be no short answer for your questions - except that probably you shouldn't use GA heuristic at all and rely on reinforcements methods.

How to model a for loop in a neural network

I am currently in the process of learning neural networks and can understand basic examples like AND, OR, Addition, Multiplication, etc.
Right now, I am trying to build a neural network that takes two inputs x and n, and computes pow(x, n). And, this would require the neural network to have some form of a loop, and I am not sure how I can model a network with a loop
Can this sort of computation be modelled on a neural network? I am assuming it is possible.. based on the recently released paper(Neural Turing Machine), but not sure how. Any pointers on this would be very helpful.
Thanks!
Feedforward neural nets are not Turing-complete, and in particular they cannot model loops of arbitrary order. However, if you fix the maximum n that you want to treat, then you can set up an architecture which can model loops with up to n repetitions. For instance, you could easily imagine that each layer could act as one iteration in the loop, so you might need n layers.
For a more general architecture that can be made Turing-complete, you could use Recurrent Neural Networks (RNN). One popular instance in this class are the so-called Long short-term memory (LSTM) networks by Hochreiter and Schmidhuber. Training such RNNs is quite different from training classical feedforward networks, though.
As you pointed out, Neural Turing Machines seem to working well to learn the basic algorithms. For instance, the repeat copy task which has been implemented in the paper, might tell us that NTM can learn the algorithm itself. As of now, NTMs have been used only for simple tasks so understanding its scope by using the pow(x,n) will be interesting given that repeat copy works well. I suggest reading Reinforcement Learning Neural Turing Machines - Revised for a deeper understanding.
Also, recent developments in the area of Memory Networks empower us to perform more complicated tasks. Hence, to make a neural network understand pow(x,n) might be possible. So go ahead and give it a shot!

In Q-learning with function approximation, is it possible to avoid hand-crafting features?

I have little background knowledge of Machine Learning, so please forgive me if my question seems silly.
Based on what I've read, the best model-free reinforcement learning algorithm to this date is Q-Learning, where each state,action pair in the agent's world is given a q-value, and at each state the action with the highest q-value is chosen. The q-value is then updated as follows:
Q(s,a) = (1-α)Q(s,a) + α(R(s,a,s') + (max_a' * Q(s',a'))) where α is the learning rate.
Apparently, for problems with high dimensionality, the number of states become astronomically large making q-value table storage infeasible.
So the practical implementation of Q-Learning requires using Q-value approximation via generalization of states aka features. For example if the agent was Pacman then the features would be:
Distance to closest dot
Distance to closest ghost
Is Pacman in a tunnel?
And then instead of q-values for every single state you would only need to only have q-values for every single feature.
So my question is:
Is it possible for a reinforcement learning agent to create or generate additional features?
Some research I've done:
This post mentions A Geramifard's iFDD method
http://www.icml-2011.org/papers/473_icmlpaper.pdf
http://people.csail.mit.edu/agf/Files/13RLDM-GQ-iFDD+.pdf
which is a way of "discovering feature dependencies", but I'm not sure if that is feature generation, as the paper assumes that you start off with a set of binary features.
Another paper that I found was apropos is Playing Atari with Deep Reinforcement Learning, which "extracts high level features using a range of neural network architectures".
I've read over the paper but still need to flesh out/fully understand their algorithm. Is this what I'm looking for?
Thanks
It seems like you already answered your own question :)
Feature generation is not part of the Q-learning (and SARSA) algorithm. In a process which is called preprocessing you can however use a wide array of algorithms (of which you showed some) to generate/extract features from your data. Combining different machine learning algorithms results in hybrid architectures, which is a term you might look into when researching what works best for your problem.
Here is an example of using features with SARSA (which is very similar to Q-learning).
Whether the papers you cited are helpful for your scenario, you'll have to decide for yourself. As always with machine learning, your approach is highly problem-dependent. If you're in robotics and it's hard to define discrete states manually, a neural network might be helpful. If you can think of heuristics by yourself (like in the pacman example) then you probably won't need it.

What is machine learning? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
What is machine learning ?
What does machine learning code do ?
When we say that the machine learns, does it modify the code of itself or it modifies history (database) which will contain the experience of code for given set of inputs?
What is a machine learning ?
Essentially, it is a method of teaching computers to make and improve predictions or behaviors based on some data. What is this "data"? Well, that depends entirely on the problem. It could be readings from a robot's sensors as it learns to walk, or the correct output of a program for certain input.
Another way to think about machine learning is that it is "pattern recognition" - the act of teaching a program to react to or recognize patterns.
What does machine learning code do ?
Depends on the type of machine learning you're talking about. Machine learning is a huge field, with hundreds of different algorithms for solving myriad different problems - see Wikipedia for more information; specifically, look under Algorithm Types.
When we say machine learns, does it modify the code of itself or it modifies history (Data Base) which will contain the experience of code for given set of inputs ?
Once again, it depends.
One example of code actually being modified is Genetic Programming, where you essentially evolve a program to complete a task (of course, the program doesn't modify itself - but it does modify another computer program).
Neural networks, on the other hand, modify their parameters automatically in response to prepared stimuli and expected response. This allows them to produce many behaviors (theoretically, they can produce any behavior because they can approximate any function to an arbitrary precision, given enough time).
I should note that your use of the term "database" implies that machine learning algorithms work by "remembering" information, events, or experiences. This is not necessarily (or even often!) the case.
Neural networks, which I already mentioned, only keep the current "state" of the approximation, which is updated as learning occurs. Rather than remembering what happened and how to react to it, neural networks build a sort of "model" of their "world." The model tells them how to react to certain inputs, even if the inputs are something that it has never seen before.
This last ability - the ability to react to inputs that have never been seen before - is one of the core tenets of many machine learning algorithms. Imagine trying to teach a computer driver to navigate highways in traffic. Using your "database" metaphor, you would have to teach the computer exactly what to do in millions of possible situations. An effective machine learning algorithm would (hopefully!) be able to learn similarities between different states and react to them similarly.
The similarities between states can be anything - even things we might think of as "mundane" can really trip up a computer! For example, let's say that the computer driver learned that when a car in front of it slowed down, it had to slow down to. For a human, replacing the car with a motorcycle doesn't change anything - we recognize that the motorcycle is also a vehicle. For a machine learning algorithm, this can actually be surprisingly difficult! A database would have to store information separately about the case where a car is in front and where a motorcycle is in front. A machine learning algorithm, on the other hand, would "learn" from the car example and be able to generalize to the motorcycle example automatically.
Machine learning is a field of computer science, probability theory, and optimization theory which allows complex tasks to be solved for which a logical/procedural approach would not be possible or feasible.
There are several different categories of machine learning, including (but not limited to):
Supervised learning
Reinforcement learning
Supervised Learning
In supervised learning, you have some really complex function (mapping) from inputs to outputs, you have lots of examples of input/output pairs, but you don't know what that complicated function is. A supervised learning algorithm makes it possible, given a large data set of input/output pairs, to predict the output value for some new input value that you may not have seen before. The basic method is that you break the data set down into a training set and a test set. You have some model with an associated error function which you try to minimize over the training set, and then you make sure that your solution works on the test set. Once you have repeated this with different machine learning algorithms and/or parameters until the model performs reasonably well on the test set, then you can attempt to use the result on new inputs. Note that in this case, the program does not change, only the model (data) is changed. Although one could, theoretically, output a different program, but that is not done in practice, as far as I am aware. An example of supervised learning would be the digit recognition system used by the post office, where it maps the pixels to labels in the set 0...9, using a large set of pictures of digits that were labeled by hand as being in 0...9.
Reinforcement Learning
In reinforcement learning, the program is responsible for making decisions, and it periodically receives some sort of award/utility for its actions. However, unlike in the supervised learning case, the results are not immediate; the algorithm could prescribe a large sequence of actions and only receive feedback at the very end. In reinforcement learning, the goal is to build up a good model such that the algorithm will generate the sequence of decisions that lead to the highest long term utility/reward. A good example of reinforcement learning is teaching a robot how to navigate by giving a negative penalty whenever its bump sensor detects that it has bumped into an object. If coded correctly, it is possible for the robot to eventually correlate its range finder sensor data with its bumper sensor data and the directions that sends to the wheels, and ultimately choose a form of navigation that results in it not bumping into objects.
More Info
If you are interested in learning more, I strongly recommend that you read Pattern Recognition and Machine Learning by Christopher M. Bishop or take a machine learning course. You may also be interested in reading, for free, the lecture notes from CIS 520: Machine Learning at Penn.
Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. Read more on Wikipedia
Machine learning code records "facts" or approximations in some sort of storage, and with the algorithms calculates different probabilities.
The code itself will not be modified when a machine learns, only the database of what "it knows".
Machine learning is a methodology to create a model based on sample data and use the model to make a prediction or strategy. It belongs to artificial intelligence.
Machine learning is simply a generic term to define a variety of learning algorithms that produce a quasi learning from examples (unlabeled/labeled). The actual accuracy/error is entirely determined by the quality of training/test data you provide to your learning algorithm. This can be measured using a convergence rate. The reason you provide examples is because you want the learning algorithm of your choice to be able to informatively by guidance make generalization. The algorithms can be classed into two main areas supervised learning(classification) and unsupervised learning(clustering) techniques. It is extremely important that you make an informed decision on how you plan on separating your training and test data sets as well as the quality that you provide to your learning algorithm. When you providing data sets you want to also be aware of things like over fitting and maintaining a sense of healthy bias in your examples. The algorithm then basically learns wrote to wrote on the basis of generalization it achieves from the data you have provided to it both for training and then for testing in process you try to get your learning algorithm to produce new examples on basis of your targeted training. In clustering there is very little informative guidance the algorithm basically tries to produce through measures of patterns between data to build related sets of clusters e.g kmeans/knearest neighbor.
some good books:
Introduction to ML (Nilsson/Stanford),
Gaussian Process for ML,
Introduction to ML (Alpaydin),
Information Theory Inference and Learning Algorithms (very useful book),
Machine Learning (Mitchell),
Pattern Recognition and Machine Learning (standard ML course book at Edinburgh and various Unis but relatively a heavy reading with math),
Data Mining and Practical Machine Learning with Weka (work through the theory using weka and practice in Java)
Reinforcement Learning there is a free book online you can read:
http://www.cs.ualberta.ca/~sutton/book/ebook/the-book.html
IR, IE, Recommenders, and Text/Data/Web Mining in general use alot of Machine Learning principles. You can even apply Metaheuristic/Global Optimization Techniques here to further automate your learning processes. e.g apply an evolutionary technique like GA (genetic algorithm) to optimize your neural network based approach (which may use some learning algorithm). You can approach it purely in form of a probablistic machine learning approach for example bayesian learning. Most of these algorithms all have a very heavy use of statistics. Concepts of convergence and generalization are important to many of these learning algorithms.
Machine learning is the study in computing science of making algorithms that are able to classify information they haven't seen before, by learning patterns from training on similar information. There are all sorts of kinds of "learners" in this sense. Neural networks, Bayesian networks, decision trees, k-clustering algorithms, hidden markov models and support vector machines are examples.
Based on the learner, they each learn in different ways. Some learners produce human-understandable frameworks (e.g. decision trees), and some are generally inscrutable (e.g. neural networks).
Learners are all essentially data-driven, meaning they save their state as data to be reused later. They aren't self-modifying as such, at least in general.
I think one of the coolest definitions of machine learning that I've read is from this book by Tom Mitchell. Easy to remember and intuitive.
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E
Shamelessly ripped from Wikipedia: Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases.
Quite simply, machine learning code accomplishes a machine learning task. That can be a number of things from interpreting sensor data to a genetic algorithm.
I would say it depends. No, modifying code is not normal, but is not outside the realm of possibility. I would also not say that machine learning always modifies a history. Sometimes we have no history to build off of. Sometime we simply want to react to the environment, but not actually learn from our past experiences.
Basically, machine learning is a very wide-open discipline that contains many methods and algorithms that make it impossible for there to be 1 answer to your 3rd question.
Machine learning is a term that is taken from the real world of a person, and applied on something that can't actually learn - a machine.
To add to the other answers - machine learning will not (usually) change the code, but it might change it's execution path and decision based on previous data or new gathered data and hence the "learning" effect.
there are many ways to "teach" a machine - you give weights to many parameter of an algorithm, and then have the machine solve it for many cases, each time you give her a feedback about the answer and the machine adjusts the weights according to how close the machine answer was to your answer or according to the score you gave it's answer, or according to some results test algorithm.
This is one way of learning and there are many more...

Resources