I am currently implementing Q-learning to solve a maze that contains fires which ignite randomly. Would it be considered proper to make an action unavailable to the agent when there is a fire in that direction, or should my reward function be handling this instead?
Thanks
TL;DR: It is absolutely okay to restrict actions.
The available actions can be state-dependent. This can be dictated by physical limitations (e.g., no possibility of entering a wall). A radical example of this is the application of RL to movement on a graph (see this: https://education.dellemc.com/content/dam/dell-emc/documents/en-us/2020KS_Nannapaneni-Optimal_path_routing_using_Reinforcement_Learning.pdf).
Additionally, you can restrict actions even when they are allowed (e.g. physically possible) by designing the policy accordingly. In the case of a probabilistic policy, you can set the "fire" actions to have probability zero.
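For instance, here is a minimal sketch of what this looks like in tabular Q-learning (plain NumPy; the 5x5 maze layout and the valid_actions helper are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS, SIZE = 4, 5              # actions: up/right/down/left, 5x5 maze
Q = np.zeros((SIZE * SIZE, N_ACTIONS))  # tabular Q-values

def valid_actions(state, fires):
    """Hypothetical helper: actions that neither leave the grid
    nor move into a currently burning cell."""
    r, c = divmod(state, SIZE)
    moves = {0: (r - 1, c), 1: (r, c + 1), 2: (r + 1, c), 3: (r, c - 1)}
    return np.array([a for a, (nr, nc) in moves.items()
                     if 0 <= nr < SIZE and 0 <= nc < SIZE and (nr, nc) not in fires])

def select_action(state, fires, epsilon=0.1):
    """Epsilon-greedy selection restricted to the valid actions."""
    actions = valid_actions(state, fires)
    if rng.random() < epsilon:
        return int(rng.choice(actions))                 # explore among valid actions only
    return int(actions[np.argmax(Q[state, actions])])   # greedy over valid actions only

# From the center cell (state 12), with a fire in the cell just above it
print(select_action(12, fires={(1, 2)}))
```

The same masking must be applied consistently in the update step (the max over next-state actions should also range over valid actions only), otherwise the agent bootstraps from moves it can never take.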
For deeper reading: https://arxiv.org/pdf/1906.01772.pdf
I want to simulate lidars. I saw that a DepthSensor class was mentioned in the documentation, but I have not found its actual implementation. For now, I am planning on using the RgbdSensor class and using only the rows at the height I need from the depth point cloud I receive to simulate my lidars.
Just to get your input on that, maybe I missed something, but is there a specific class for lidars, and how would you go about adding lidars to a simulation?
Thanks in advance,
Arnaud
You've discovered an anachronism in the code. There had previously been a lidar-like sensor (called DepthSensor). The extant documentation refers to that class. The class's removal should've been accompanied by a cleanup of the documentation.
The approach you are taking is the expected approach given Drake's current state.
There has always been an intention to re-introduce a lidar-like sensor in Drake's current architecture. It simply hasn't been a high priority.
I'd recommend you proceed with what you're currently doing (lidar from depth images) but, at the same time, post an issue requesting a lidar-like query, with a specific focus on the minimum lidar properties that you require. A discussion of how that would differ from what you can actually get from the depth images would better inform us of your unique needs and how to prioritize the work. (You can, of course, also indicate more advanced features that you need less urgently but would be good to have.)
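In case it helps, extracting lidar-style ranges from one row of a depth image is mostly pinhole-camera geometry. A rough NumPy sketch, assuming z-depth values (distance along the optical axis) and known intrinsics fx/cx, which you would take from your RgbdSensor's camera configuration:

```python
import numpy as np

def depth_row_to_scan(depth_row, fx, cx):
    """Convert one row of z-depths (meters) into (angle, range) pairs,
    assuming a pinhole camera with focal length fx and principal
    point cx, both in pixels."""
    u = np.arange(len(depth_row))
    angles = np.arctan((u - cx) / fx)     # bearing of each pixel column
    ranges = depth_row / np.cos(angles)   # z-depth -> Euclidean range
    return angles, ranges

# Example with a synthetic 640-pixel row of 2 m depths
angles, ranges = depth_row_to_scan(np.full(640, 2.0), fx=500.0, cx=319.5)
```

Note this only gives you a scan within the camera's horizontal field of view; a full 360-degree lidar would need several cameras pointed in different directions.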
As for the question: how would you go about adding lidars?
That's problematic. Ideally, what you would need is ray-casting ability. The intent is for QueryObject to support such a query, but it hasn't happened yet. (It's certainly the underlying technology we'd have used to implement a LidarSensor.) In the absence of that kind of functionality, you'd essentially have to do it yourself in the most horrible, tedious way imaginable. I'd go so far as to suggest that it's not feasible with the current API.
I read somewhere that the added S matrix of 1/n elements, together with the fudge factor 0.15 that Google uses, is just not accurate and merely comes to solve another problem.
On the other hand, I have read somewhere else that it does have a meaning, and that it is used for random jumps. We first ask whether a surfer wants to continue to click or not. So according to what I read, the meaning is: 85% continue to click, 15% don't.
My question is: this may be fine for the first click, but how does it work in later iterations? How can anyone land on a random page? Isn't the whole assumption of PageRank that every page is reached through links from others?
If I can just land on a page without coming from somewhere else, then the ranking isn't accurate at all.
But most importantly, I don't understand what the added 1/n matrix means. If I am at a page, I can only click on links that I see. What does it mean to say that I can go somewhere else?
If they mean that I just do another Google search, then why not call it a second chain? Why include it in the first?
Also, is it 15% that I randomly jump, or 15% that I stop surfing? (Or are those the same thing?)
And to my first question: is it an inaccurate fudge factor made to solve other problems, or does it really mean something, as described above, such that it IS correct to include it on its own merit?
"Random jumps" could correspond to lots of things:
Entering an address in URL bar
Visiting a "Favorite" link
Visiting a home page (or any one of the links on it!)
Visiting a link from a content aggregator / social media
People do actually do these things when browsing online; going to a random page in your index is a very crude approximation of this behavior.
If you're Google or some other entity with lots of surfing/tracking data, you can actually measure the probabilities that people "jump into" particular websites to get a better model! The random-jump probabilities don't need to be totally uniform; they just need to be non-zero for every website.
Random jumps are the simplest way to ensure that the matrix (and the corresponding chain) is ergodic, which makes it easier to analyze and guarantees convergence.
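To make the roles of the 1/n matrix and the 0.85/0.15 split concrete, here is a minimal power-iteration sketch (plain NumPy; the 3-page link matrix is a made-up example, assumed column-stochastic with dangling pages already handled):

```python
import numpy as np

def pagerank(A, d=0.85, tol=1e-10):
    """Power iteration on the Google matrix G = d*A + (1-d)/n.
    With probability d the surfer follows a link (column-stochastic A);
    with probability 1-d they jump to a uniformly random page."""
    n = A.shape[0]
    G = d * A + (1 - d) / n          # every entry positive -> chain is ergodic
    v = np.full(n, 1.0 / n)          # start from the uniform distribution
    while True:
        v_next = G @ v
        if np.abs(v_next - v).sum() < tol:
            return v_next
        v = v_next

# Tiny 3-page web: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0
A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
print(pagerank(A))   # stationary distribution = PageRank scores
```

Because every entry of G is positive, the chain can "teleport" from any page to any other, which is exactly the ergodicity property mentioned above.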
We are using the map of Leuven as a graph RoadModel. We aim to develop a system using exploration ants as well as feasibility ants that roam around the map to propagate knowledge about the local environment. Exploration ants are sent out by vehicles and report back when they have found a favourable path to follow to pick up packages, while feasibility ants are sent out by parcels and roam around randomly to notify vehicles of their existence.
Ideally, these ants are not visible on the map while roaming around and would not be bound by regular time constraints (faster movement than other vehicles).
Is there some kind of support for a delegate MAS system like this, and if not, what would be the best approach to implement it?
This question has been answered here.
As for your second question: instead of adding ants to a RoadModel, you could view the ants as messages that contain some extra data (such as the path they were sent along) and send them throughout the graph using a CommModel. This way, they are invisible and non-colliding.
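As a library-agnostic illustration (deliberately not RinSim API), an ant can be modelled as a plain message object that hops along graph nodes and accumulates the path it was sent along:

```python
import random

# Toy road graph as an adjacency list (node -> neighbours); made-up layout
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}

class Ant:
    """A 'virtual' ant: a message carrying extra data, not a road user."""
    def __init__(self, origin, ttl):
        self.path = [origin]   # the path the ant was sent along
        self.ttl = ttl         # hops remaining before the ant expires

def propagate(ant, graph, rng=random.Random(0)):
    """Let a feasibility ant roam randomly; each hop is a message delivery,
    so the ant is invisible on the map and not bound by vehicle speed."""
    while ant.ttl > 0:
        here = ant.path[-1]
        ant.path.append(rng.choice(graph[here]))
        ant.ttl -= 1
    return ant.path   # vehicles receiving the ant learn about this route

print(propagate(Ant("A", ttl=5), graph))
```

In RinSim you would express the same idea with the communication facilities rather than a custom loop, but the structure (message object + per-hop propagation) carries over.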
I want the robots to be already localized when I start the program. Is there any way to do this, so that I do not have to move the robot manually and localize it?
I have two robots, and in order to explore with the second robot, I have to localize it first, which takes a lot of time. By the time I localize it, the other robot has explored the whole map.
Thanks
From the documentation of amcl, the localization node listens for the initial pose on the topic initialpose, with message type geometry_msgs/PoseWithCovarianceStamped. This specifies the localization estimate with which amcl initially starts running. As you can see, besides the mean of the pose estimate, you can also provide a full covariance matrix describing the uncertainty of the pose estimate.
There is some default value that this initial pose is set to internally, but to resolve your issue, what you want to do is publish a message on the aforementioned topic, telling amcl to start with your specified initial pose.
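For example, here is a minimal rospy sketch that publishes such an initial pose (the "map" frame and the pose/covariance values are placeholders; adapt them to your setup):

```python
import rospy
from geometry_msgs.msg import PoseWithCovarianceStamped

rospy.init_node("set_initial_pose")
pub = rospy.Publisher("initialpose", PoseWithCovarianceStamped,
                      queue_size=1, latch=True)

msg = PoseWithCovarianceStamped()
msg.header.frame_id = "map"
msg.header.stamp = rospy.Time.now()
msg.pose.pose.position.x = 1.0        # known start position (example values)
msg.pose.pose.position.y = 2.0
msg.pose.pose.orientation.w = 1.0     # identity quaternion: facing along x
msg.pose.covariance[0] = 0.25         # x variance
msg.pose.covariance[7] = 0.25         # y variance
msg.pose.covariance[35] = 0.07        # yaw variance

rospy.sleep(1.0)   # give the latched publisher time to connect
pub.publish(msg)
```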
Alternatively, you can do this via the command line using rostopic pub, or through rviz; see e.g. the documentation and this Q&A for more information.
I just remembered that it is even possible to specify the initial pose estimate directly as startup parameters for amcl; see initial_pose_x and the other similar parameters. This is appropriate, for example, if you can fix these parameters in your launch file before starting the node.
I am confused about the structure of a PAL device.
My first question: if we buy a PAL device, how can we know how many minterms are summed by each OR gate in the OR array? In other words, is there any standard by which we can know the number of inputs each OR gate has in the OR array?
The next thing is that we have an AND array in the PAL device which is programmable. Now suppose we have 4 inputs; then each AND gate in the AND array must have 8 inputs. It is up to us how many variables we apply to it, but since there is the possibility of applying all the variables (and their complements) to one AND gate, it should have 8 inputs. Please tell me whether I am right; if not, please explain.
I think there is no universal standard for either of your questions. The datasheet for each device specifies those parameters. You should look up the datasheets and decide what suits your needs.
Specifically on your second question: an ideal PAL would be as you say (like this simplified circuit). But usually you don't want to apply all the variables (and their negations) to the AND gates, so each AND gate can have fewer inputs (of course, using the grid you can choose which of the variables to apply, just not all of them together).
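To make the 2n-input point concrete, here is a tiny sketch (Python, with a made-up fuse pattern) modelling one programmable AND gate as a fuse mask over the 4 variables and their 4 complements:

```python
def product_term(fuses, inputs):
    """One AND gate of the PAL: 'fuses' selects which of the 2n lines
    (each variable and its complement) are actually connected."""
    lines = []
    for bit in inputs:              # build [A, A', B, B', C, C', D, D']
        lines += [bit, 1 - bit]
    # The AND gate only sees the connected lines; blown fuses don't matter
    return all(line == 1 for line, fuse in zip(lines, fuses) if fuse)

# Intact fuses on lines A and B' only -> this term computes A AND NOT B
fuses = [1, 0, 0, 1, 0, 0, 0, 0]
print(product_term(fuses, inputs=[1, 0, 1, 1]))   # A=1, B=0 -> True
```

The gate physically has 2n = 8 input lines available, but the programmed fuse pattern decides how many of them actually participate in each product term.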