RL: Self-Play with On-Policy and Off-Policy - machine-learning

I am trying to implement self-play with PPO.
Suppose we have a game with 2 agents. We control one player on each side and receive an observation and a reward after each step. As far as I know, you can use the information from both the left and the right player to generate training data and to optimize the model. But that is only possible with off-policy methods, isn't it?
Because with an on-policy method such as PPO, the training data is expected to be generated by the current version of the network, and that is usually not the case during self-play?
Thanks!

Exactly. This is also the reason why you can use experience replay (replay buffers) only with off-policy methods like Q-learning. Using samples that were not generated by the current policy violates the mathematical assumptions behind the gradients that are being backpropagated.
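To make that concrete, here is a minimal sketch (plain NumPy, illustrative only, not from the original thread) of the clipped PPO surrogate; the point is that logp_old must come from the same network version that produced the samples:

import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Clipped PPO surrogate loss (to be minimized).
    # logp_old must be the log-probabilities of the behaviour policy, i.e. the
    # network version that actually generated the samples. If the samples came
    # from an older network (or a different opponent policy), the ratio below
    # no longer estimates what the PPO derivation assumes, and the gradient is
    # biased.
    ratio = np.exp(logp_new - logp_old)  # pi_theta / pi_theta_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

In self-play this means you can pool the transitions of both players, as long as both sides are controlled by the same, current network version and the update happens before the weights change.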

Related

How to use inverse dynamics controller on under-actuated systems

I'm trying to use Drake's inverse dynamics controller on an arm with a floating base, and based on this discussion it seems like the most straightforward way to go about this is to use two separate plants, since the controller only supports fully actuated systems.
Following Python bindings error when adding two plants to a scene graph in pyDrake, I attempted to create two plants using the following code:
def register_plant_with_scene_graph(scene_graph, plant):
    plant.RegsterAsSourceForSceneGraph(scene_graph)
    builder.Connect(
        plant.get_geometry_poses_output_port(),
        scene_graph.get_source_pose_port(plant.get_source_id()),
    )
    builder.Connect(
        scene_graph.get_query_output_port(),
        plant.get_geometry_query_input_port(),
    )
builder = DiagramBuilder()
scene_graph = builder.AddSystem(SceneGraph())
plant_1 = builder.AddSystem(MultibodyPlant(time_step=0.0))
register_plant_with_scene_graph(scene_graph, plant_1)
plant_2 = builder.AddSystem(MultibodyPlant(time_step=0.0))
register_plant_with_scene_graph(scene_graph, plant_2)
which produced the error
AttributeError: 'MultibodyPlant_[float]' object has no attribute 'RegsterAsSourceForSceneGraph'
This seems odd because, according to the documentation, the function should exist.
Is this function available in the python bindings for drake? Also, more broadly, is this the correct way to approach using the inverse dynamics controller on a free-floating manipulator?
Inverse dynamics takes desired positions, velocities, and accelerations and computes the required torques. If your robot has a floating base, then you cannot accept arbitrary acceleration commands. For instance the total center of mass of your robot will be falling according to gravity; any acceleration that does not satisfy this requirement will not have a feasible solution to the inverse dynamics. I think there must be something more that we need to understand about your problem formulation.
Often when people ask this question, they are thinking of a robot that is relying on contact forces in addition to generalized force/torques in order to achieve the requested accelerations. In that case, the problem needs to include those contact forces as decision variables, too. Since contact forces have unilateral constraints (e.g. feet cannot pull on the ground), and friction cone constraints, this inverse dynamics problem is almost always formulated as a quadratic program. For instance, as in this paper. We don't currently provide that QP formulation in Drake, but it is not hard to write it against the MathematicalProgram interface. And we do have some older code that was removed from Drake (since it wasn't actively developed) that we can point you to if it helps.
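As a rough illustration of that QP (not Drake's method, just a sketch against the MathematicalProgram interface; the contact Jacobians, cost weights, friction coefficient, and the crude linearized friction cone below are all assumptions, and the solvers import path differs between Drake versions):

import numpy as np
from pydrake.solvers import MathematicalProgram, Solve  # older versions: pydrake.solvers.mathematicalprogram

def inverse_dynamics_qp(plant, context, vdot_des, contact_jacobians, mu=0.7):
    # Decision variables: generalized accelerations vdot, actuation u, and one
    # 3D contact force per contact point.  Dynamics constraint:
    #   M(q) vdot + C(q, v) = tau_g(q) + B u + sum_i Jc_i^T f_i
    # contact_jacobians is a list of 3 x nv translational Jacobians, one per
    # contact point (e.g. from plant.CalcJacobianTranslationalVelocity).
    nv = plant.num_velocities()
    nu = plant.num_actuators()

    prog = MathematicalProgram()
    vdot = prog.NewContinuousVariables(nv, "vdot")
    u = prog.NewContinuousVariables(nu, "u")
    forces = [prog.NewContinuousVariables(3, f"f{i}")
              for i in range(len(contact_jacobians))]

    M = plant.CalcMassMatrixViaInverseDynamics(context)
    Cv = plant.CalcBiasTerm(context)
    tau_g = plant.CalcGravityGeneralizedForces(context)
    B = plant.MakeActuationMatrix()

    # Dynamics in residual form: M vdot + Cv - tau_g - B u - sum Jc^T f == 0.
    residual = M @ vdot + Cv - tau_g - B @ u
    for Jc, f in zip(contact_jacobians, forces):
        residual = residual - Jc.T @ f
    prog.AddLinearEqualityConstraint(residual, np.zeros(nv))

    for f in forces:
        prog.AddLinearConstraint(f[2] >= 0)            # unilateral: no pulling on the ground
        prog.AddLinearConstraint(f[0] <= mu * f[2])    # crude linearized friction cone
        prog.AddLinearConstraint(-f[0] <= mu * f[2])
        prog.AddLinearConstraint(f[1] <= mu * f[2])
        prog.AddLinearConstraint(-f[1] <= mu * f[2])

    # Track the requested accelerations, with a small regularization on u.
    prog.AddQuadraticCost((vdot - vdot_des).dot(vdot - vdot_des))
    prog.AddQuadraticCost(1e-3 * u.dot(u))

    result = Solve(prog)
    return result.GetSolution(u), result.GetSolution(vdot)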

How to change clipping and noise parameters during differentially private training with Tensorflow Federated

I'm using Tensorflow Federated (TFF) to train with differential privacy. Currently I am creating a Tensorflow Privacy NormalizedQuery and then passing it into a TFF DifferentiallyPrivateFactory to create an AggregationProcess:
_weights_type = tff.learning.framework.weights_type_from_model(placeholder_model)
query = tensorflow_privacy.GaussianSumQuery(l2_norm_clip=10.0, stddev=0.1)
query = tensorflow_privacy.NormalizedQuery(query, 20)
agg_proc = tff.aggregators.DifferentiallyPrivateFactory(query)
agg_proc = agg_proc.create(_weights_type.trainable)
After broadcasting the server state to clients I run a client update function and then use the AggregationProcess like this:
agg_output = agg_proc.next(
server_state.delta_aggregate_state,
client_outputs.weights_delta)
This works great, however I want to experiment with changing the l2_norm_clip and stddev several times during training (making clipping bigger and smaller at various training rounds) but it seems I can only set these parameters when I create the AggregationProcess.
Is it possible to change these parameters during training somehow?
I can think of two ways to do what you want: the easy way and the right way.
The right way is to make a new type of DPQuery that keeps track of the training round in its global state and adjusts the clip and stddev the way you want in its get_noised_result function. Then you can pass this new DPQuery to tff.aggregators.DifferentiallyPrivateFactory and use it like normal.
The easy way is to directly hack into the server_state.delta_aggregate_state. Somewhere in there you should find the global state of the DPQuery which should contain the l2_norm_clip and stddev which you can manipulate directly between rounds. This approach may be brittle because the aggregator state and the DPQuery state representations may be subject to change.
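For concreteness, a sketch of the "easy way" between rounds. The field names below (query_state, numerator_state) are hypothetical; the actual nesting of server_state.delta_aggregate_state depends on your TFF and TF-Privacy versions, so print it once and adapt the paths. It assumes the relevant pieces are namedtuples supporting _replace:

def set_dp_parameters(server_state, new_clip, new_stddev):
    # Illustrative only: print(server_state.delta_aggregate_state) first to
    # discover the real structure for your versions, then adjust the field
    # names below.  Assumes the NormalizedQuery global state holds the
    # GaussianSumQuery state in a `numerator_state` field, and that the latter
    # is a namedtuple with `l2_norm_clip` and `stddev` fields.
    agg_state = server_state.delta_aggregate_state
    query_state = agg_state.query_state          # hypothetical field name
    numerator = query_state.numerator_state._replace(
        l2_norm_clip=new_clip, stddev=new_stddev)
    query_state = query_state._replace(numerator_state=numerator)
    agg_state = agg_state._replace(query_state=query_state)
    return server_state._replace(delta_aggregate_state=agg_state)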

Computing the generalized forces arising due to ExternallyAppliedSpatialForces in Drake, e.g. due to Propeller

I am working with a diagram which includes a MultibodyPlant with a Propeller connected to it. The Propeller actually realizes numerous physical propellers that are distributed among the bodies of the MultibodyPlant.
I am able to simulate the dynamics of the combined system by setting the prop forces with FixValue, so I'm on the right track.
What I'd like to be able to do is, given a configuration for the system (i.e. the MultiBodyPlant context) and a chosen propeller command, compute the generalized forces acting on the system. My sense is that this is not immediately available since the simulation is actually using the RNEA, and so does not aggregate the forces all together in that way. For what I'm doing (and even just as a sanity check), I would like to compute the forces directly, not just their effect on the evolution of the state.
Is there an existing method to compute this built into Drake, or should I compute it manually using the spatial jacobian of each propeller frame and the applied SpatialForce of the corresponding propeller? (Something along the lines of this question: How to get the matrix that maps external forces to generalized forces?)
Many thanks for your help.
I understand better now. It's a very reasonable request! You have two systems at play: the Propeller and MultibodyPlant. Unfortunately, the quantity you want is all of Propeller and just a piece of MultibodyPlant. We don't offer direct access to the B(q) matrix in that case.
What you can do is, with either AutoDiffXd or symbolic::Expression, call CalcImplicitTimeDerivativesResidual on the Diagram to get the entire dynamics in implicit form (avoiding the inverse of M(q)). You could call it twice -- once with the Propeller inputs set up via FixValue as AutoDiffXd and/or symbolic::Variable, and again with them set to zero -- then take the difference to isolate the propeller contribution.
Note: CalcImplicitTimeDerivativesResidual is relatively new; I haven't pushed the python bindings for it yet (but it's been on my list). Do you need it from python?
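For what it's worth, a sketch of that two-evaluation idea (assuming a Drake version where CalcImplicitTimeDerivativesResidual is bound in Python; the subsystem name and the way the command is fixed are placeholders):

import numpy as np

def propeller_generalized_forces(diagram, root_context, command):
    # Evaluate the implicit dynamics residual with the propeller command
    # applied and with it zeroed, then take the difference.  The multibody
    # terms are identical in both evaluations and cancel, leaving the
    # propellers' contribution to the generalized forces (up to the sign
    # convention of the residual).
    propeller = diagram.GetSubsystemByName("propeller")   # placeholder name
    prop_context = propeller.GetMyContextFromRoot(root_context)

    # Any fixed proposed derivative works; it also cancels in the difference.
    proposed = diagram.AllocateTimeDerivatives()

    propeller.get_command_input_port().FixValue(prop_context, command)
    r_with = diagram.CalcImplicitTimeDerivativesResidual(root_context, proposed)

    propeller.get_command_input_port().FixValue(prop_context, np.zeros_like(command))
    r_without = diagram.CalcImplicitTimeDerivativesResidual(root_context, proposed)

    return r_without - r_with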
I think that perhaps you are looking for the MultibodyPlant reaction_forces output port?

Direct Collocation/Direct Transcription with a Multibody Plant

I'm trying to do trajectory optimization for a custom robot I've specified with an sdf file.
I'd like to use direct collocation, but when I try to create the MultibodyPlant with time_step=0.0 I fail with a segfault. It works just fine when I use discrete time (e.g. MultibodyPlant(time_step=0.005)).
However, if I use discrete time, the state is no longer continuous so I can't use direct collocation. So I tried to use direct transcription and I get the error
SystemExit: Failure at bazel-out/k8-opt/bin/systems/framework/_virtual_includes/context/drake/systems/framework/context.h:111 in num_total_states(): condition 'num_abstract_states() == 0' failed.
I think the reason is that DirectTranscription does not have an assume_non_continuous_states_are_fixed option, the same issue as in this question: direct transcription for compass gait. So maybe the easiest solution to my problem is to request this feature.
I recommended above that we do add that assume_non_continuous_states_are_fixed to DirectTranscription. But the reason that this option was not implemented already is a little subtle, so I’ll add it here.
It’s not actually MultibodyPlant that is adding the abstract state, but SceneGraph. For dynamics/planning, you only need SceneGraph if you are relying on contact forces in your dynamics. For acrobots / cart-poles, etc, you can already use a MultibodyPlant with DirectTranscription by passing the MBP only (no SceneGraph) to the optimization. And for systems that do make and break contact, I would have said that DirectTranscription might not be the algorithm you want; although there are no hard and fast rules saying it won’t work. It’s just that you’ll end up with stiff differential equations which are hard to transcribe in a reasonable trajectory optimization that doesn’t reason explicitly about the contact.
I think I know your application, which involves wheels that are in contact and stay in contact. That means that you do need SceneGraph. That might be a case where this currently missing combination makes perfect sense, and we should add it.
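For the contact-free case above (MBP only, no SceneGraph), a minimal sketch of what that looks like; the sdf path and sample count are placeholders, and constructor argument names can differ between Drake versions:

from pydrake.multibody.plant import MultibodyPlant
from pydrake.multibody.parsing import Parser
from pydrake.systems.trajectory_optimization import DirectTranscription

# Build the plant on its own, without registering it with a SceneGraph,
# so its context carries no abstract state.
plant = MultibodyPlant(time_step=0.005)
Parser(plant).AddModelFromFile("my_robot.sdf")   # placeholder path
plant.Finalize()
context = plant.CreateDefaultContext()

dirtran = DirectTranscription(
    plant,
    context,
    num_time_samples=50,
    input_port_index=plant.get_actuation_input_port().get_index(),
)
# From here, add state/input constraints and a cost and solve, as in the
# standard trajectory optimization examples.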

Change in two 3D models

I'm trying to think of the best way to conduct some sort of analysis between two 3D models of the same object.
The first scan is of the original item and the second scan is after it has been put under some load x.
An example would be trying to find the difference between two types of metal.
I would like to be able to scan the initial metal cylinder, apply a measured load, scan it again, and then finally apply some sort of algorithm to compare the difference.
Is it possible to do this efficiently (maybe using Matlab) over, say, 50 - 100 items for an object around 5 inch^3?
I am assuming I will need to work out some sort of utility function as the total mass should be the same?
Would machine learning be beneficial in this case?
Any suggestions or direction would be amazing.
Thank you :)
EDIT: The scan files are coming through as '.stl'
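Not from the original thread, but in case it helps as a starting point: a minimal sketch of one possible comparison pipeline in Python with Open3D, assuming the two .stl scans only need a rigid alignment before measuring per-point deviations; file names, point counts, and the ICP distance threshold are placeholders:

import numpy as np
import open3d as o3d

# Load the before/after scans and sample them down to point clouds.
mesh_before = o3d.io.read_triangle_mesh("scan_before.stl")   # placeholder path
mesh_after = o3d.io.read_triangle_mesh("scan_after.stl")     # placeholder path
pcd_before = mesh_before.sample_points_uniformly(number_of_points=200000)
pcd_after = mesh_after.sample_points_uniformly(number_of_points=200000)

# Rigidly align the "after" scan onto the "before" scan with point-to-point ICP.
icp = o3d.pipelines.registration.registration_icp(
    pcd_after, pcd_before,
    max_correspondence_distance=2.0,   # in the units of the scan
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
pcd_after.transform(icp.transformation)

# Per-point deviation of the loaded part from the original geometry.
distances = np.asarray(pcd_before.compute_point_cloud_distance(pcd_after))
print(f"mean deviation: {distances.mean():.4f}, max: {distances.max():.4f}")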
