When studying reinforcement learning, specifically model-free RL, there are two methods we generally use:
TD learning
Monte Carlo
When is each one used over the other? In other words, how do we figure out which method is best for our problem?
Sections 6.1 and 6.2 of Sutton & Barto give a very nice intuitive understanding of the difference between Monte Carlo and TD learning.
Having said that, there's of course the obvious incompatibility of MC methods with non-episodic tasks. In that case, you will always need some kind of bootstrapping.
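To make the distinction concrete, here is a minimal Python sketch of the two update rules for state-value prediction (the episode format, the tabular V, and the step interface are my own assumptions for illustration):

```python
# V: mapping state -> value estimate, e.g. collections.defaultdict(float)

def mc_update(V, episode, alpha=0.1, gamma=0.99):
    """Monte Carlo: wait until the episode ends, then move each visited
    state's value toward the actual sampled return G."""
    G = 0.0
    for state, reward in reversed(episode):  # episode = [(s0, r1), (s1, r2), ...]
        G = reward + gamma * G               # accumulate the discounted return
        V[state] += alpha * (G - V[state])   # target is the full sampled return

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """TD(0): update after every single step, bootstrapping from the current
    estimate of the next state's value; no need to wait for episode end."""
    target = reward + gamma * V[next_state]  # bootstrapped target
    V[state] += alpha * (target - V[state])
```

The practical consequence: TD can learn online and on continuing (non-episodic) tasks, while MC has to wait for an episode to terminate before it can update anything.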
Related
I have recently been looking into reinforcement learning. For this, I have been reading the famous book by Sutton, but there is something I do not fully understand yet.
For Monte Carlo learning, we can choose between the first-visit and every-visit algorithms, and it can be proved that both converge to the right solution asymptotically. But I guess there is a difference between the two (I understand the difference by definition, but I do not understand the drawbacks of each method). Should I use first-visit in some cases and every-visit in others?
Thanks a lot,
Djaz
From my personal experience, I have noticed that first-visit Monte Carlo converges faster and, for control problems, reaches the optimal policy in fewer iterations.
I'm not sure if there exists a mathematical analysis of the rate of convergence of the two, but both will converge to the true mean due to the law of large numbers.
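For what it's worth, here is a minimal tabular sketch of the two variants; the only difference is whether returns from repeat visits to a state within an episode are recorded (the (state, reward) episode format is an assumption):

```python
from collections import defaultdict

def mc_prediction(episodes, gamma=0.99, first_visit=True):
    """Tabular Monte Carlo prediction. Each episode is a list of
    (state, reward) pairs; a state's value is the average of its returns."""
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        # Walk backwards so G accumulates the discounted return from each step.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = reward + gamma * G
            if first_visit and any(s == state for s, _ in episode[:t]):
                continue  # first-visit: keep only the earliest occurrence
            returns[state].append(G)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}
```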
Related
Please guide me: can we compare machine learning and evolutionary algorithms in terms of how they learn from past experience and choose the best action? If there is any difference, please mention it.
Evolutionary algorithms are one class of strategies that can be used in machine learning, just like backpropagation and many others.
Evolutionary algorithms usually converge slowly because they make no use of gradient information. On the other hand, they provide at least a chance to escape from local optima and find the global one.
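As a minimal illustration of the gradient-free, population-based idea (the population size, mutation scale, and the toy multimodal objective below are arbitrary choices):

```python
import math
import random

def evolve(fitness, dim, pop_size=50, generations=100, sigma=0.1):
    """A bare-bones evolutionary algorithm: evaluate, select, mutate.
    Only fitness comparisons are used; no gradients anywhere."""
    population = [[random.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)       # rank by fitness
        parents = population[: pop_size // 2]            # truncation selection
        children = [[x + random.gauss(0, sigma) for x in random.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        population = parents + children                  # next generation
    return max(population, key=fitness)

# A multimodal objective where gradient ascent could stall in a local
# optimum; the population keeps several candidate regions alive at once.
best = evolve(lambda v: -sum(x * x - math.cos(3 * x) for x in v), dim=5)
```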
Related
Sorry for the poor title,
I'm currently studying ML and I want to focus on a problem using the toolset I have acquired, which excludes reinforcement learning.
I want to create a NN that takes a simple 2D game level (think of Mario in the simplest case: simple fitness function, simple controls, and easy feature selection) and outputs a key sequence.
Since we don't know the correct key sequence (KS), I see two options:
1-) I find the key sequences using a genetic algorithm and use backprop or a similar algorithm to associate levels with key sequences, then predict a KS for a new level.
2-) I build a huge NN and use a genetic algorithm to solve for its whole internal structure.
What are the pros and cons of each approach? Why should I implement one instead of the other? Please remember that I'm fairly new to the topic and want to solve this problem with what I've learned so far, the basics really.
What you are suggesting is in essence reinforcement learning, i.e. trying out "semi-random" combinations and then using rewards to train the network. The first approach is classical reinforcement learning and the other one is reinforcement learning using a neural network.
If you want to tackle it like this, there are plenty of tutorials and GitHub repos available to help you solve this problem, a simple Google search away.
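As a very rough sketch of the first option (every name here, find_keys_with_ga, build_features, and the linear least-squares fit standing in for a backprop-trained NN, is hypothetical, shown only to convey the shape of the pipeline):

```python
import numpy as np

def option1_pipeline(levels, find_keys_with_ga, build_features):
    """Option 1: (a) search a good key sequence per level with a GA,
    (b) fit a supervised model mapping level features -> key sequence,
    so it can predict a sequence for an unseen level."""
    X = np.array([build_features(lvl) for lvl in levels])     # level features
    Y = np.array([find_keys_with_ga(lvl) for lvl in levels])  # GA-found "labels"
    # The question proposes a NN trained with backprop here; a linear
    # least-squares fit is substituted purely to keep the sketch short.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return lambda level: build_features(level) @ W
```

The key sequences would need a fixed-length numeric encoding for this to work, which is itself a nontrivial design decision.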
Related
I have read this page from Stanford - https://web.stanford.edu/group/pdplab/pdphandbook/handbookch10.html. I am not able to understand how TD learning is used in neural networks. I am trying to make a checkers AI which will use TD learning, similar to what they implemented for backgammon. Please explain the working of TD back-propagation.
I have already referred to this question - Neural Network and Temporal Difference Learning
But I am not able to understand the accepted answer. Please explain with a different approach if possible.
TD learning is not used in neural networks. Instead, neural networks are used in TD learning to store the value (or q-value) function.
I think that you are confusing backpropagation (a neural-network concept) with bootstrapping in RL. Bootstrapping uses a combination of recent information and previous estimates to generate new estimates.
When the state-space is large and it is not easy to store the value function in tables, neural networks are used as an approximation scheme to store the value function.
The discussion of forward/backward views is more about eligibility traces, etc., a case where RL bootstraps several steps ahead in time. Looking ahead like this is not practical to compute directly, and there are ways (such as eligibility traces) to instead leave a trail and update past states.
This should not be connected or confused with backpropagation in neural networks. It has nothing to do with it.
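To make the "network stores the value function" point concrete, here is a rough sketch of a tiny value network trained with semi-gradient TD(0); the feature encoding of states, the network size, and the learning rate are all illustrative assumptions:

```python
import numpy as np

class ValueNet:
    """A tiny one-hidden-layer network approximating V(s). The network sits
    inside TD learning: TD supplies the training target, and backpropagation
    (here, the chain rule written out by hand) fits the network to it."""
    def __init__(self, n_features, n_hidden=16, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
        self.lr = lr

    def value(self, s):
        return float(np.tanh(s @ self.W1) @ self.W2)

    def td0_step(self, s, r, s_next, gamma=0.99):
        """Semi-gradient TD(0): push V(s) toward the bootstrapped target
        r + gamma * V(s_next), treating that target as a fixed label."""
        target = r + gamma * self.value(s_next)
        h = np.tanh(s @ self.W1)
        error = target - float(h @ self.W2)
        grad_W1 = np.outer(s, (1.0 - h ** 2) * self.W2.ravel())  # chain rule
        self.W2 += self.lr * error * h[:, None]                  # dV/dW2 = h
        self.W1 += self.lr * error * grad_W1
```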
Related
Are there any known approaches to making a machine learn calculus?
I've learnt that it is quite simple to teach a computer to calculate derivatives, because it is possible to implement an algorithm for it.
Meanwhile, an implementation of integration is possible but is rarely or never fully realized due to the algorithmic complexity.
I am curious whether there are any academic successes in the field of using machine learning science to evaluate and calculate integrals.
Edit
I am interested in teaching a computer to integrate using neural networks or similar methods.
My personal opinion is that it is not possible to feed enough integration rules into a NN. Why? Because NNs are good for linear regression (a.k.a. approximation) or logistic regression (a.k.a. classification). Integration is neither of these: it is a calculation task following strict algorithms. So from this perspective, it's a better idea to use mathematical methods to integrate.
Update on 2020-10-23
Right now I find myself embarrassed by new developments in the news: Facebook recently announced that they have developed an AI which is good at solving integrals.
There are quite a few maths software packages that will do differential and integral calculus for you. Some of the popular ones include MATLAB, Maple, Mathematica, etc. These will help you learn quite easily.
As for making a machine learn calculus ...
You can read up on the following on Wikipedia or in other books:
Newton's Method - solves for the roots of a function numerically
Monte Carlo Integration - uses random sampling to compute integrals numerically (a short sketch follows below)
Runge-Kutta Method - solves ODEs iteratively
There are many more; these are just the ones I was taught in undergraduate school. They are also fairly simple to understand, depending on your level of academia. But in general, people have been trying to numerically compute solutions to models since Newton. Computers have just made everything a lot easier.
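For instance, the Monte Carlo integration mentioned above fits in a few lines of Python (the integrand is just a toy example with a known answer):

```python
import random

def mc_integrate(f, a, b, n=100_000):
    """Monte Carlo integration: average f at uniformly random points in
    [a, b] and scale by the interval length. Error shrinks like 1/sqrt(n)."""
    total = sum(f(random.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Integral of x^2 on [0, 1] is 1/3; the estimate lands close to 0.333.
print(mc_integrate(lambda x: x * x, 0.0, 1.0))
```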