Multi-agent reinforcement learning with an external simulation platform

I have a multi-agent cooperative task to solve, for which I am using an external simulation environment built in Tecnomatix Plant Simulation.
I can communicate with the simulation through its COM interface: I can read observation values, trigger actions, and retrieve reward values from the simulation.
How can I model this real-time environment as an OpenAI Gym environment,
so that I can use existing baseline algorithm implementations rather than building everything from scratch?
I currently can't find much information about this on the internet.
Thanks in advance.
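As a starting point, here is a minimal sketch of what such a wrapper might look like using pywin32 and the classic Gym step/reset API. The COM ProgID, the object paths (`.Models.Model.*`), and the method names are assumptions based on the Plant Simulation COM interface and would have to be adapted to your own model:

```python
import gym
import numpy as np
from gym import spaces
import win32com.client  # pywin32, used to talk to Plant Simulation over COM


class PlantSimEnv(gym.Env):
    """Gym wrapper around a Plant Simulation model controlled over COM (sketch)."""

    def __init__(self):
        super().__init__()
        # ProgID, model path, and object paths below are placeholders; check the
        # Plant Simulation COM documentation and your own model structure.
        self.sim = win32com.client.Dispatch("Tecnomatix.PlantSimulation.RemoteControl")
        self.sim.LoadModel(r"C:\models\my_model.spp")
        # Example spaces: 4 continuous observations, 3 discrete actions per step.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)

    def reset(self):
        self.sim.ResetSimulation(".Models.Model")
        return self._get_obs()

    def step(self, action):
        # Write the chosen action into the model, let it run, then read results back.
        self.sim.SetValue(".Models.Model.Action", int(action))
        self.sim.StartSimulation(".Models.Model.EventController")
        obs = self._get_obs()
        reward = float(self.sim.GetValue(".Models.Model.Reward"))
        done = bool(self.sim.GetValue(".Models.Model.Done"))
        return obs, reward, done, {}

    def _get_obs(self):
        return np.array([float(self.sim.GetValue(f".Models.Model.Obs{i}"))
                         for i in range(4)], dtype=np.float32)
```

With a wrapper like this in place, standard single-agent baselines can use the environment directly; for the cooperative multi-agent case, one common simplification is to expose the joint observations and joint actions of all agents through this single Gym interface.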

Related

Can I use drake to test Visual SLAM algorithms?

I was wondering whether I could leverage the modularity Drake gives to test Visual SLAM algorithms on real-time data. I would like to create three blocks that output acceleration, angular speed, and RGBD data. The blocks should pull information from a real sensor. Another block would process the data and produce the current transform of the camera and a global map. Effectively, I would like to cast my problem into a "Systems" framework so I can easily add filters where I need them.
My question is: given other people's experience with this library, is Drake the right tool for the job for this use case? Specifically, can I use this library to process real-time information in a production setting?
Visual SLAM is not a use case I've implemented myself, but I believe the Drake Systems framework should be up to the task, depending on what you mean by "realtime".
We definitely ship RGBD data through the framework often.
We haven't made any attempt to support running Drake in hard real time, but it can certainly run at high rates. If you were to hit a performance bottleneck, we tend to be pretty responsive and would welcome PRs.
As for "production-level", it is certainly our intention for the code / process to be mature enough for that setting, and numerous teams already use it that way.
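For illustration, here is a minimal pydrake sketch of one such block: a LeafSystem that exposes acceleration and angular velocity as vector output ports (the actual sensor read is a stub), added to a diagram and stepped by the Simulator. The SLAM front end and any filters would be further systems connected in the same builder; an RGBD source would look similar but use an abstract-valued output port carrying an image type.

```python
import numpy as np
from pydrake.systems.framework import LeafSystem, BasicVector, DiagramBuilder
from pydrake.systems.analysis import Simulator


class ImuSource(LeafSystem):
    """A block that publishes acceleration and angular velocity as output ports."""

    def __init__(self):
        LeafSystem.__init__(self)
        self.DeclareVectorOutputPort("acceleration", BasicVector(3), self._calc_accel)
        self.DeclareVectorOutputPort("angular_velocity", BasicVector(3), self._calc_gyro)

    def _calc_accel(self, context, output):
        output.SetFromVector(self._read_imu()[:3])

    def _calc_gyro(self, context, output):
        output.SetFromVector(self._read_imu()[3:])

    def _read_imu(self):
        # Stub: replace with the actual driver call to your real sensor.
        return np.zeros(6)


builder = DiagramBuilder()
imu = builder.AddSystem(ImuSource())
# ... add the SLAM front-end system here and Connect() its inputs to imu's ports.
diagram = builder.Build()
simulator = Simulator(diagram)
simulator.AdvanceTo(1.0)
```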

How to create a custom environment using OpenAI gym for reinforcement learning

I am a newbie in reinforcement learning working on a college project. The project is about optimizing x86 hardware power. I am running proprietary software on a Linux distribution (16.04). The goal is to use reinforcement learning to optimize the power consumption of the system while keeping the performance degradation of the software to a minimum. The proprietary software is a cellular network.
As we already know, the primary functional blocks of reinforcement learning are the agent and the environment. The basic idea is to use the cellular network running on x86 hardware as the environment for RL. This environment interacts with the RL agent through states, actions, and rewards.
From reading different materials, I understand that I need to turn my software into a custom environment from which I can retrieve the state features. The state features are application-layer KPIs such as latency and throughput. The action space may include instructions to Linux to change the power settings (I can use a predefined set of power options). I have not yet decided on the reward function.
I read this post and decided that I should use OpenAI gym to create my custom environment.
My doubt is whether using OpenAI gym to create a custom environment is correct for this type of setup. Am I going in the right direction, or are there alternative/better tools for creating a custom environment? Any tutorial or pointer on creating this custom environment is appreciated.
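For what it's worth, a minimal sketch of such a custom environment is below. The KPI readout, the power-setting call, and the reward weights are placeholders for the pieces you would hook up to the cellular-network software and to Linux:

```python
import gym
import numpy as np
from gym import spaces


class PowerEnv(gym.Env):
    """Sketch of a custom Gym environment for power optimization.

    The KPI readout and the power-setting call are placeholders; in practice
    they would query the cellular-network software and Linux power interfaces.
    """

    def __init__(self, power_options=(0, 1, 2, 3)):
        super().__init__()
        self.power_options = power_options
        # Observation: e.g. [latency_ms, throughput_mbps]
        self.observation_space = spaces.Box(low=0.0, high=np.inf,
                                            shape=(2,), dtype=np.float32)
        # Action: index into the predefined set of power options
        self.action_space = spaces.Discrete(len(power_options))

    def reset(self):
        return self._read_kpis()

    def step(self, action):
        self._apply_power_option(self.power_options[action])
        obs = self._read_kpis()
        latency, throughput = obs
        # Example reward only: reward throughput, penalize latency and power level.
        reward = throughput - 0.1 * latency - 0.5 * self.power_options[action]
        done = False  # a continuing task; the training loop can cut episodes off
        return obs, reward, done, {}

    def _read_kpis(self):
        # Placeholder: read application-layer KPIs from the proprietary software.
        return np.array([10.0, 100.0], dtype=np.float32)

    def _apply_power_option(self, option):
        # Placeholder: e.g. write to a cpufreq sysfs entry or call a vendor tool.
        pass
```

Once the environment works, it can also be registered with gym.envs.registration.register so baseline libraries can construct it via gym.make.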

Machine learning: specific strategy learned because of playing against specific agent?

First of all, I had difficulty formulating my question; feedback is welcome.
I have to make a machine learning agent to play Dots and Boxes.
I'm just in the early stages, but I came up with this question: if I let my machine learning agent (with a specific implementation) play against a copy of itself to learn and improve its gameplay, wouldn't it just learn a strategy against that specific kind of gameplay?
Would it be more interesting if I let my agent play and learn against different kinds of other agents in an arbitrary fashion?
The idea of having an agent learn by playing against a copy of itself is referred to as self-play. Yes, in self-play, you can sometimes see that agents will "overfit" against their "training partner", resulting in an unstable learning process. See this blogpost by OpenAI (in particular, the "Multiplayer" section), where exactly this issue is described.
The easiest way to address this that I've seen in research so far is indeed to generate a more diverse set of training partners. This can, for example, be done by storing checkpoints of multiple past versions of your agent in memory or in files, and randomly picking one of them as the training partner at the start of every episode. This is roughly what was done during the self-play training process of the original AlphaGo program by DeepMind (the 2016 version), and it is also described in another blogpost by OpenAI.
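A minimal sketch of that checkpoint-pool idea is below; the play_episode callback, which runs one game and lets the learner update from it, is assumed to exist in your own training code:

```python
import copy
import random


def train_with_opponent_pool(agent, play_episode, num_episodes,
                             snapshot_every=100, pool_size=20):
    """Train `agent` by self-play against randomly sampled past versions of itself.

    `play_episode(learner, opponent)` is assumed to run one game of Dots and Boxes
    and let `learner` update from the experience; `agent` must be deep-copyable.
    """
    opponent_pool = [copy.deepcopy(agent)]           # the initial agent is the first partner

    for episode in range(num_episodes):
        opponent = random.choice(opponent_pool)      # sample a past version as the partner
        play_episode(agent, opponent)

        if (episode + 1) % snapshot_every == 0:
            opponent_pool.append(copy.deepcopy(agent))   # freeze a new checkpoint
            if len(opponent_pool) > pool_size:
                opponent_pool.pop(0)                     # keep the pool bounded
```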

How to get a specific machine type for ML Engine online prediction?

Is there an option to request a faster node for online prediction in ML Engine?
For example, when training I can configure any of these machines for my job:
standard,
large_model,
complex_model_s,
complex_model_m,
complex_model_l,
standard_gpu,
complex_model_m_gpu,
complex_model_l_gpu,
standard_p100,
complex_model_m_p100
See the description of the available clusters and machine types for training here and here.
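For context, those training machine types are selected by submitting a job with scaleTier set to CUSTOM; a sketch using the Python API client is below (the project, job ID, bucket path, and module name are placeholders). The same trainingInput fields can equally go in a config.yaml passed to gcloud ml-engine jobs submit training.

```python
from googleapiclient import discovery

# Placeholders: substitute your own project, job ID, package location, and module.
project_id = "projects/my-project"
job_spec = {
    "jobId": "my_training_job_001",
    "trainingInput": {
        "scaleTier": "CUSTOM",                 # CUSTOM lets you pick a machine type explicitly
        "masterType": "complex_model_m_gpu",   # one of the machine types listed above
        "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],
        "pythonModule": "trainer.task",
        "region": "us-central1",
    },
}

ml = discovery.build("ml", "v1")
ml.projects().jobs().create(parent=project_id, body=job_spec).execute()
```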
I am struggling to find out whether it is possible to control what kind of machine serves my online predictions.
We are currently adding that capability and will let you know when it's publicly available.
ML Engine offers a 4-core instance type in addition to the default serving instance type for online prediction. However, the feature is still in alpha and is only available to a select list of accounts that have opted in as "Trusted Testers". Please contact cloudml-feedback@google.com if you need help setting up the prediction service with a faster node.

How to work with machine learning algorithms in embedded systems?

I'm doing a project to detect (classify) human activities using an ARM Cortex-M0 microcontroller (Freedom KL25Z) with an accelerometer. I intend to predict the activity of the user using machine learning.
The problem is that the Cortex-M0 is not capable of running training or prediction algorithms, so I would probably have to collect the data, train a model on my computer, and then embed it somehow, which I don't really know how to do.
I saw some posts on the internet saying that you can generate a matrix of weights and embed it in a microcontroller, so prediction becomes a straightforward function of the data you feed it. Would that be the right way of doing it?
Anyway, my question is: how can I embed a classification algorithm in a microcontroller?
I hope you guys can help me and give me some guidance; I'm kind of lost here.
Thank you in advance.
I've been thinking about doing this myself to solve a problem that I've had a hard time developing a heuristic for by hand.
You're going to have to write your own machine-learning methods, because there aren't any machine learning libraries out there suitable for low-end MCUs, as far as I know.
Depending on how hard the problem is, it may still be possible to develop and train a simple machine learning algorithm that performs well on a low-end MCU. After all, some of the older/simpler machine learning methods were used with satisfactory results on hardware with similar constraints.
Very generally, this is how I'd go about doing this:
Get the (labelled) data to a PC (through UART, SD-card, or whatever means you have available).
Experiment with the data and a machine learning toolkit (scikit-learn, weka, vowpal wabbit, etc). Make sure an off-the-shelf method is able to produce satisfactory results before moving forward.
Experiment with feature engineering and selection. Try to get the smallest feature set possible to save resources.
Write your own machine learning method that will eventually be used on the embedded system. I would probably choose perceptrons or decision trees, because these don't necessarily need a lot of memory. Since you have no FPU, I'd only use integers and fixed-point arithmetic.
Do the normal training procedure, i.e. use cross-validation to find the best tuning parameters, integer bit widths, radix positions, etc.
Run the final trained predictor on the held-out testing set.
If the performance of your trained predictor was satisfactory on the testing set, move your relevant code (the code that calculates the predictions) and the model you trained (e.g. weights) to the MCU. The model/weights will not change, so they can be stored in flash (e.g. as a const array); a sketch of the PC-side training and weight-export steps follows below.
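As a concrete illustration of the PC-side part of this workflow, here is a sketch that trains a perceptron with scikit-learn and exports its weights as fixed-point integer C arrays. The feature/label files and the Q7.8 scaling are assumptions you would adapt to your own data and chosen bit widths:

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score, train_test_split

# Assumption: features.npy / labels.npy hold accelerometer features and activity
# labels that were previously logged from the board and engineered on the PC.
X = np.load("features.npy")
y = np.load("labels.npy")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = Perceptron()
print("cross-validation accuracy:", cross_val_score(clf, X_train, y_train, cv=5).mean())
clf.fit(X_train, y_train)
print("held-out test accuracy:", clf.score(X_test, y_test))

# Quantise weights to Q7.8 fixed point (16-bit) since the Cortex-M0 has no FPU.
SCALE = 1 << 8
w_fixed = np.round(clf.coef_ * SCALE).astype(np.int16)
b_fixed = np.round(clf.intercept_ * SCALE).astype(np.int16)

# Emit C const arrays that can be pasted into the firmware and stored in flash.
print("const int16_t weights[] = {%s};" % ", ".join(map(str, w_fixed.ravel())))
print("const int16_t bias[] = {%s};" % ", ".join(map(str, b_fixed.ravel())))
```

The firmware then only needs an integer dot product plus argmax over classes, which is well within what the M0 can do.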
I think you may be limited by your hardware, and you may want to get something a little more powerful. For your project you've chosen an M-series processor from ARM. This is the simplest platform that they offer, and the architecture doesn't lend itself to the kind of processing you're trying to do. ARM has three basic classifications, as follows:
M - microcontroller
R - real-time
A - applications
You want to get something that has strong hardware support for these complex calculations. Your starting point should be an A-series part for this. If you need to do floating-point arithmetic, you'll definitely need to start with the A-series and probably get one with a NEON FPU.
ST's Discovery series is a nice place to start, or maybe just use a Raspberry Pi (at least for the development part)?
However, if you insist on using the M0, I think you might be able to pull it off using something lightweight like ROS-C. I know there are packages for ROS that can do it; even though it's mainly for robotics, you may be able to adapt it to what you're doing.
Dependency Free ROS
Neural Networks and Machine Learning with ROS
