How to create a new gym environment in OpenAI?

How to create a new gym environment in OpenAI? - machine-learning

I have an assignment to make an AI Agent that will learn to play a video game using ML. I want to create a new environment using OpenAI Gym because I don't want to use an existing environment. How can I create a new, custom Environment?
Also, is there any other way I can start to develop making AI Agent to play a specific video game without the help of OpenAI Gym?

See my banana-gym for an extremely small environment.
Create new environments
See the main page of the repository:
https://github.com/openai/gym/blob/master/docs/creating_environments.md
The steps are:
Create a new repository with a PIP-package structure
It should look like this
gym-foo/
README.md
setup.py
gym_foo/
__init__.py
envs/
__init__.py
foo_env.py
foo_extrahard_env.py
For the contents of it, follow the link above. Details which are not mentioned there are especially how some functions in foo_env.py should look like. Looking at examples and at gym.openai.com/docs/ helps. Here is an example:
class FooEnv(gym.Env):
metadata = {'render.modes': ['human']}
def __init__(self):
pass
def _step(self, action):
"""
Parameters
----------
action :
Returns
-------
ob, reward, episode_over, info : tuple
ob (object) :
an environment-specific object representing your observation of
the environment.
reward (float) :
amount of reward achieved by the previous action. The scale
varies between environments, but the goal is always to increase
your total reward.
episode_over (bool) :
whether it's time to reset the environment again. Most (but not
all) tasks are divided up into well-defined episodes, and done
being True indicates the episode has terminated. (For example,
perhaps the pole tipped too far, or you lost your last life.)
info (dict) :
diagnostic information useful for debugging. It can sometimes
be useful for learning (for example, it might contain the raw
probabilities behind the environment's last state change).
However, official evaluations of your agent are not allowed to
use this for learning.
"""
self._take_action(action)
self.status = self.env.step()
reward = self._get_reward()
ob = self.env.getState()
episode_over = self.status != hfo_py.IN_GAME
return ob, reward, episode_over, {}
def _reset(self):
pass
def _render(self, mode='human', close=False):
pass
def _take_action(self, action):
pass
def _get_reward(self):
""" Reward is given for XY. """
if self.status == FOOBAR:
return 1
elif self.status == ABC:
return self.somestate ** 2
else:
return 0
Use your environment
import gym
import gym_foo
env = gym.make('MyEnv-v0')
Examples
https://github.com/openai/gym-soccer
https://github.com/openai/gym-wikinav
https://github.com/alibaba/gym-starcraft
https://github.com/endgameinc/gym-malware
https://github.com/hackthemarket/gym-trading
https://github.com/tambetm/gym-minecraft
https://github.com/ppaquette/gym-doom
https://github.com/ppaquette/gym-super-mario
https://github.com/tuzzer/gym-maze

Its definitely possible. They say so in the Documentation page, close to the end.
https://gym.openai.com/docs
As to how to do it, you should look at the source code of the existing environments for inspiration. Its available in github:
https://github.com/openai/gym#installation
Most of their environments they did not implement from scratch, but rather created a wrapper around existing environments and gave it all an interface that is convenient for reinforcement learning.
If you want to make your own, you should probably go in this direction and try to adapt something that already exists to the gym interface. Although there is a good chance that this is very time consuming.
There is another option that may be interesting for your purpose. It's OpenAI's Universe
https://universe.openai.com/
It can integrate with websites so that you train your models on kongregate games, for example. But Universe is not as easy to use as Gym.
If you are a beginner, my recommendation is that you start with a vanilla implementation on a standard environment. After you get passed the problems with the basics, go on to increment...

Related

How to do BERTopic inference endpoint correctly? (hugging face)

I'm trying to create an endpoint using a custom endpoint handler by loading a trained model. I understand that my code is not very correct, but at least it works locally. Can you tell me how to do it correctly? I spent the whole day, but I couldn't find a single clue. AWS crashes with an error at the launch stage of the inference.
Attempts to load using AutoModel Class ended up with me not knowing where to get config.json.
There is no BERTopic tag and I don't have enough reputation to create it. Therefore, I will use BERT, but it's not the same thing.
from typing import Dict, List, Any
from bertopic import BERTopic
from transformers import pipeline, AutoModel, AutoConfig
from sentence_transformers import SentenceTransformer
class EndpointHandler():
def __init__(self, path=""):
self.model = BERTopic.load(/repository/model)
self.model.calculate_probabilities = False
def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
"""
data args:
text (:obj: `str`)
Return:
A :obj:`list` | `dict`: will be serialized and returned
"""
# get inputs
text = data.pop("text")
# run normal prediction
prediction = self.model.transform([text])
return prediction
I used this instruction to create my handler. Authors load models through their module classes, but its need pytorch and tensorflow models. I ran this code on my computer via test.py , it works and returns a response. This class is needed in order to run the model from the repository in automatic mode (in a couple of clicks) and change the business logic, for example, not to return the topic and keywords, but to generate images for each word using a different model and return an array of images.
Repository HuggingFace
AWS log pastebin

Creating different types of workers that are accessed using a single client

EDIT:
My question was horrifically put so I delete it and rephrase entirely here.
I'll give a tl;dr:
I'm trying to assign each computation to a designated worker that fits the computation type.
In long:
I'm trying to run a simulation, so I represent it using a class of the form:
Class Simulation:
def __init__(first_Client: Client, second_Client: Client)
self.first_client = first_client
self.second_client = second_client
def first_calculation(input):
with first_client.as_current():
return output
def second_calculation(input):
with second_client.as_current():
return output
def run(input):
return second_calculation(first_calculation(input))
This format has downsides like the fact that this simulation object is not pickleable.
I could edit the Simulation object to contain only addresses and not clients for example, but I feel as if there must be a better solution. For instance, I would like the simulation object to work the following way:
Class Simulation:
def first_calculation(input):
client = dask.distributed.get_client()
with client.as_current():
return output
...
Thing is, the dask workers best fit for the first calculation, are different than the dask workers best fit for the second calculation, which is the reason my Simulation object has two clients that connect to tow different schedulers to begin with. Is there any way to make it so there is only one client but two types of schedulers and to make it so the client knows to run the first_calculation to the first scheduler and the second_calculation to the second one?

Dask will chop up large computations in smaller tasks that can run in paralell. Those tasks will then be submitted by the client to the scheduler which in turn wil schedule those tasks on the available workers.
Sending the client object to a Dask scheduler will likely not work due to the serialization issue you mention.
You could try one of two approaches:
Depending on how you actually run those worker machines, you could specify different types of workers for different tasks. If you run on kubernetes for example you could try to leverage the node pool functionality to make different worker types available.
An easier approach using your existing infrastructure would be to return the results of your first computation back to the machine from which you are using the client using something like .compute(). And then use that data as input for the second computation. So in this case you're sending the actual data over the network instead of the client. If the size of that data becomes an issue you can always write the intermediary results to something like S3.

Dask does support giving specific tasks to specific workers with annotate. Here's an example snippet, where a delayed_sum task was passed to one worker and the doubled task was sent to the other worker. The assert statements check that those workers really were restricted to only those tasks. With annotate you shouldn't need separate clusters. You'll also need the most recent versions of Dask and Distributed for this to work because of a recent bug fix.
import distributed
import dask
from dask import delayed
local_cluster = distributed.LocalCluster(n_workers=2)
client = distributed.Client(local_cluster)
workers = list(client.scheduler_info()['workers'].keys())
with dask.annotate(workers=workers[0]):
delayed_sum = delayed(sum)([1, 2])
with dask.annotate(workers=workers[1]):
doubled = delayed_sum * 2
# use persist so scheduler doesn't clean up
# wrap in a distributed.wait to make sure they're there when we check the scheduler
distributed.wait([doubled.persist(), delayed_sum.persist()])
worker_restrictions = local_cluster.scheduler.worker_restrictions
assert worker_restrictions[delayed_sum.key] == {workers[0]}
assert worker_restrictions[doubled.key] == {workers[1]}

Executing same function every monday at midnight

In Rails 5, I have this method in my "Aulas" ("Classes" in Portuguese) Controller:
def set_week_classes
classes = Aula.all.to_a
#this_week_classes = classes.shift(2)
end
Considering that "classes" is an array, I would like to have "#this_week_classes = classes.shift(2)" being executed every Monday, at midnight (Brazil's time), getting the next two items of the classes array to be shown on the view. And also, I would like that when it reached the end of the array, it simply started all over, making "#this_week_classes" become again the first two items of the classes array. How could I make this happen? Thank you!

You can use sidekiq with some kind of a scheduling gem (like sidekiq-scheduler or sidekiq-cron). Depending on your installation you could also copy use a rake task and run it periodically using cron. If you use cloud then your provider definitely have some kind of scheduler available.
BTW all of your source code should probably be in English. Mixing some Portuguese class names doesn't look great and can be confusing for other contributors.
But if your only goal is to show some what classes are listed this week it's probably better to do sth like this:
classes = Aula.all.to_a # not the best for the memory
shift = DateTime.current.weeks_since(CONSTANT_TIME) % classes.size
#this_week_classes = ([classes]+[classes])[shift..(shift+2)] # [classes]+[classes] make sure that we won't get too little classes if we reach the and of the `classes` array

It appears you ask two questions at the same time:
Run a job every week at a certain time.
Rails offers different ways to do this. To me the best fit seems to work with an active job library.
Possibilities are:
https://github.com/javan/whenever cron runner. Let's you set up jobs for every week to run.
Another open library for this type of task is the gem delayed_job. It's not very performant but easy to include into small projects.
Cycle through an array of items.
Here a possibility is instead of actually shifting the items out of your array that you store the job run in your database. Keep in mind there are other possibilities that do not need you to change your database. Following code is not tested but should be seen as pseudocode.
def run_job
last_aula_job = AulaJob.all.order(:created_at).last
classes = Aula.all.to_a
total = classes.count
p = last_aula_job.last_pointer % count
#this_week_classes = classes[p..p+1]
# do something with #this_week_classes
AulaJob.create(last_pointer: p + 2)
end

ActiveJob: how to do simple operations without a full blown job class?

With delayed_job, I was able to do simple operations like this:
#foo.delay.increment!(:myfield)
Is it possible to do the same with Rails' new ActiveJob? (without creating a whole bunch of job classes that do these small operations)

ActiveJob is merely an abstraction on top of various background job processors, so many capabilities depend on which provider you're actually using. But I'll try to not depend on any backend.
Typically, a job provider consists of persistence mechanism and runners. When offloading a job, you write it into persistence mechanism in some way, then later one of the runners retrieves it and runs it. So the question is: can you express your job data in a format, compatible with any action you need?
That will be tricky.
Let's define what is a job definition then. For instance, it could be a single method call. Assuming this syntax:
Model.find(42).delay.foo(1, 2)
We can use the following format:
{
class: 'Model',
id: '42', # whatever
method: 'foo',
args: [
1, 2
]
}
Now how do we build such a hash from a given call and enqueue it to a job queue?
First of all, as it appears, we'll need to define a class that has a method_missing to catch the called method name:
class JobMacro
attr_accessor :data
def initialize(record = nil)
self.data = {}
if record.present?
self.data[:class] = record.class.to_s
self.data[:id] = record.id
end
end
def method_missing(action, *args)
self.data[:method] = action.to_s
self.data[:args] = args
GenericJob.perform_later(data)
end
end
The job itself will have to reconstruct that expression like so:
data[:class].constantize.find(data[:id]).public_send(data[:method], *data[:args])
Of course, you'll have to define the delay macro on your model. It may be best to factor it out into a module, since the definition is quite generic:
def delay
JobMacro.new(self)
end
It does have some limitations:
Only supports running jobs on persisted ActiveRecord models. A job needs a way to reconstruct the callee to call the method, I've picked the most probable one. You can also use marshalling, if you want, but I consider that unreliable: the unmarshalled object may be invalid by the time the job gets to execute. Same about "GlobalID".
It uses Ruby's reflection. It's a tempting solution to many problems, but it isn't fast and is a bit risky in terms of security. So use this approach cautiously.
Only one method call. No procs (you could probably do that with ruby2ruby gem). Relies on job provider to serialize arguments properly, if it fails to, help it with your own code. For instance, que uses JSON internally, so whatever works in JSON, works in que. Symbols don't, for instance.
Things will break in spectacular ways at first.
So make sure to set up your debugging tools before starting off.
An example of this is Sidekiq's backward (Delayed::Job) compatibility extension for ActiveRecord.

As far as I know, this is currently not supported. You can easily simulate this feature using a custom-defined proxy-job that accepts a model or instance, a method to be performed and a list of arguments.
However, for the sake of code testing and maintainability, this shortcut is not a good approach. It's more effective (even if you need to write a little bit more of code) to have a specific job for everything you want to enqueue. It forces you to think more about the design of your app.

I wrote a gem that can help you with that https://github.com/cristianbica/activejob-perform_later. But be aware that I believe that having methods all around your code that might be executed in workers is the perfect recipe for disaster is not handled carefully :)

Rails Limit Model To 1 Record

I am trying to create a section in my app where a user can update certain site wide attributes. An example is a sales tax percent. Even though this amount is relatively constant, it does change every few years.
Currently I have created a Globals model with attributes I want to keep track of. For example, to access these attributes where needed, I could simply do something like the following snippet.
(1+ Globals.first.sales_tax) * #item.total
What is the best way to handle variables that do not change often, and are applied site wide? If I use this method is there a way to limit the model to one record? A final but more sobering question.......Am I even on the right track?

Ok, so I've dealt with this before, as a design pattern, it is not the ideal way to do things IMO, but it can sometimes be the only way, especially if you don't have direct disk write access, as you would if deployed on Heroku. Here is the solution.
class Global < ActiveRecord::Base
validate :only_one
private
def only_one
if Global.count >= 1
errors.add :base, 'There can only be one global setting/your message here'
end
end
end
If you DO have direct disk access, you can create a YAML config file that you can read/write/dump to when a user edits a config variable.
For example, you could have a yaml file in config/locales/globals.yml
When you wanted to edit it, you could write
filepath = "#{Rails.root}/config/locales/globals.yml"
globals = YAML.load(File.read("#{Rails.root}/config/locales/globals.yml"))
globals.merge!({ sales_tax: 0.07 })
File.write(filepath) do |f|
f.write YAML.dump(globals)
end
More on the ruby yaml documentation
You could also use JSON, XML, or whatever markup language you want

It seems to me like you are pretty close, but depending on the data structure you end up with, I would change it to
(1+ Globals.last.sales_tax) * #item.total
and then build some type of interface that either:
Allows a user to create a new Globals object (perhaps duplicating the existing one) - the use case here being that there is some archive of when these things changed, although you could argue that this should really be a warehousing function (I'm not sure of the scope of your project).
Allows a user to update the existing Globals object using something like paper_trail to track the changes (in which case you might want validations like those presented by #Brian Wheeler).
Alternatively, you could pivot the Global object and instead use something like a kind or type column to delineate different values so that you would have:
(1+ Globals.where(kind: 'Colorado Sales Tax').last) * #item.total
and still build interfaces similar to the ones described above.

You can create a create a class and dump all your constants in it.
For instance:
class Global
#sales_tax = 0.9
def sales_tax
#sales_tax
end
end
and access it like:
Global.sales_tax
Or, you can define global variables something on the lines of this post

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How to create a new gym environment in OpenAI? - machine-learning

Related

How to do BERTopic inference endpoint correctly? (hugging face)

Creating different types of workers that are accessed using a single client

Executing same function every monday at midnight

ActiveJob: how to do simple operations without a full blown job class?

Rails Limit Model To 1 Record

Categories

Resources