PyDrake ComputePointPairPenetration() kills kernel - drake

When I call ComputePointPairPenetration() on a QueryObject in pydrake from a Jupyter Notebook, the call reliably kills the kernel. I'm not sure what's causing it and couldn't figure out how to get any error message.
In case it's relevant, I'm running pydrake locally on a Mac.
Here is the relevant code:
builder = DiagramBuilder()
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=0.00001)
file_name = FindResource("models/robot.urdf")
model = Parser(plant).AddModelFromFile(file_name)
file_name = FindResource("models/object.urdf")
object_model = Parser(plant).AddModelFromFile(file_name)
plant.Finalize()
diagram = builder.Build()
# Run simulation...
# Get geometry info from scene graph
context = scene_graph.AllocateContext()
q_obj = scene_graph.get_query_output_port().Eval(context)
q_obj.ComputePointPairPenetration()
Edit:
Sherm's comment fixed my problem :) Thank you so much!
For reference:
diagram_context = diagram.CreateDefaultContext()
scene_graph_context = scene_graph.GetMyContextFromRoot(diagram_context)
q_obj = scene_graph.get_query_output_port().Eval(scene_graph_context)
q_obj.ComputePointPairPenetration()

You created a local Context for scene_graph. Instead, you want the full diagram context so that the ports are connected up properly (e.g., scene_graph has an input port that receives poses from MultibodyPlant). So the code above should work if you ask the Diagram to create a Context and then ask for the SceneGraph subcontext for the calls you have above, rather than creating a standalone SceneGraph context.
This lets you extract the fully-connected subcontext:
scene_graph_context = scene_graph.GetMyContextFromRoot(diagram_context)

FTR, here's a similar formulation in a Drake Python unittest:
TestPlant.test_scene_graph_queries
Note that this takes an alternate route (using diagram.GetMutableSubsystemContext instead of scene_graph.GetMyContextFromRoot), because it's doing scalar-type conversion as well.
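For reference, a minimal sketch of that alternate route applied to the diagram built above (no scalar conversion shown, and assuming diagram_context was created as in the earlier snippet):
diagram_context = diagram.CreateDefaultContext()
# Ask the diagram for the SceneGraph subcontext instead of GetMyContextFromRoot.
scene_graph_context = diagram.GetMutableSubsystemContext(scene_graph, diagram_context)
q_obj = scene_graph.get_query_output_port().Eval(scene_graph_context)
q_obj.ComputePointPairPenetration()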
If you're curious about scalar-type conversion (esp. if you're going to be doing optimization, e.g. needing AutoDiffXd), please see:
Drake C++ API: System Scalar Conversion Overview
Drake Python API: pydrake.systems.scalar_conversion
Additionally, here are examples of scalar-converting both MultibodyPlant and SceneGraph for testing InverseKinematics constraint classes:
inverse_kinematics.py: TestConstraints

Related

DiagramBuilder: Cannot operate on ports of System plant until it has been registered using AddSystem

I have an issue working with the DiagramBuilder and ManipulationStation classes.
It appears to me that the C++ API and the Python bindings behave differently in my case.
The C++ API behaves as expected, while the Python bindings result in the runtime error:
DiagramBuilder: Cannot operate on ports of System plant until it has been registered using AddSystem
How I use the C++ API
In one of the ManipulationStation::Setup...() methods I inject a block of code that adds an extra manipuland:
const std::string sdf_path = FindResourceOrThrow("drake/examples/manipulation_station/models/bolt_n_nut.sdf");
RigidTransform<double> X_WC(RotationMatrix<double>::Identity(), Vector3d(0.0, -0.3, 0.1));
bolt_n_nut_ = internal::AddAndWeldModelFrom(sdf_path, "nut_and_bolt", plant_->world_frame(), "bolt", X_WC, plant_);
I inject another block of code into the method ManipulationStation::Finalize:
auto zero_torque = builder.template AddSystem<systems::ConstantVectorSource<double>>(Eigen::VectorXd::Zero(plant_->num_velocities(bolt_n_nut_)));
builder.Connect(zero_torque->get_output_port(), plant_->get_actuation_input_port(bolt_n_nut_));
With these changes, the simulation runs as expected.
How I use the Python bindings
plant = station.get_multibody_plant()
manipuland_path = get_manipuland_resource_path()
bolt_with_nut = Parser(plant=plant).AddModelFromFile(manipuland_path)
X_WC = RigidTransform(RotationMatrix.Identity(), [0.0, -0.3, 0.1])
plant.WeldFrames(plant.world_frame(), plant.GetFrameByName('bolt', bolt_with_nut), X_WC)
...
station.Finalize()
zero_torque = builder.AddSystem(ConstantValueSource(AbstractValue.Make([0.])))
builder.Connect(zero_torque.get_output_port(), plant.get_actuation_input_port(bolt_with_nut))
This triggers a RuntimeError with the message above; the port that causes the error is nut_and_bolt_actuation.
My vague understanding of the problem is that the nut_and_bolt system is not visible because there are two distinct DiagramBuilders in the process: 1) one inside the ManipulationStation, and 2) another in the Python code that instantiates this ManipulationStation object.
Using the ManipulationStation via the Python bindings is my preference, because that way I avoid depending on a custom build of the Drake library.
Thanks for your insight!
I agree with your assessment: you have two different DiagramBuilder objects here. This does not have anything to do with C++ vs. Python; the ManipulationStation is itself a Diagram (created using its own DiagramBuilder), and you have a second DiagramBuilder (in either C++ or Python) that connects the ManipulationStation together with other elements. You are trying to connect a system that is in the external diagram to a port that is in the internal diagram but is not exposed.
The solution would be to have the ManipulationStation diagram expose the extra nut and bolt actuation port so that you can connect to it from the second builder.
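For illustration, a minimal sketch of what the outer-builder side could look like once such a port is exposed. The port name "nut_and_bolt_actuation" is an assumption (it must match whatever name the modified station exports), and ConstantVectorSource is used here to mirror the C++ snippet above:
import numpy as np
from pydrake.systems.primitives import ConstantVectorSource

# Assumes the station's internal builder exports the actuation input port of the
# nut-and-bolt model instance under the (hypothetical) name "nut_and_bolt_actuation".
zero_torque = builder.AddSystem(
    ConstantVectorSource(np.zeros(plant.num_velocities(bolt_with_nut))))
builder.Connect(zero_torque.get_output_port(),
                station.GetInputPort("nut_and_bolt_actuation"))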
If you prefer Python: I've switched my course to using a completely Python version of the manipulation station, and I find this version much easier to adapt to different student projects. (To be clear, the setup is in Python, but at simulation time all of the elements are C++ and it doesn't call back into Python, so the performance is almost identical.)

In PyTorch, how to convert the cuda() related codes into CPU version?

I have some existing PyTorch code that uses cuda() as below, where net is a MainModel.KitModel object:
net = torch.load(model_path)
net.cuda()
and
im = cv2.imread(image_path)
im = Variable(torch.from_numpy(im).unsqueeze(0).float().cuda())
I want to test the code on a machine without any GPU, so I want to convert the CUDA code into a CPU version. I tried to look at some relevant posts regarding the CPU/GPU switch in PyTorch, but they are related to the usage of device and thus don't apply to my case.
As pointed out by kHarshit in his comment, you can simply replace the .cuda() call with .cpu():
net.cpu()
# ...
im = torch.from_numpy(im).unsqueeze(0).float().cpu()
However, this requires changing the code in multiple places every time you want to move from GPU to CPU and vice versa.
To alleviate this difficulty, PyTorch has a more "general" method, .to().
You may have a device variable defining where you want PyTorch to run; this device can also be the CPU (!).
for instance:
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
Once you have determined, once in your code, where you want/can run, simply use .to() to send your model/variables there:
net.to(device)
# ...
im = torch.from_numpy(im).unsqueeze(0).float().to(device)
BTW,
You can use .to() to control the data type (.float()) as well:
im = torch.from_numpy(im).unsqueeze(0).to(device=device, dtype=torch.float)
PS,
Note that the Variable API has been deprecated and is no longer required.
Also, if the model was saved on a GPU machine, pass map_location to load it on a CPU-only machine:
net = torch.load(model_path, map_location=torch.device('cpu'))
PyTorch docs: https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-on-cpu-load-on-gpu
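Putting the pieces together, a minimal device-agnostic sketch (model_path and image_path as in the question):
import cv2
import torch

# Pick the device once; everything below works on both CPU-only and GPU machines.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# map_location lets a checkpoint saved on a GPU machine load on a CPU-only one.
net = torch.load(model_path, map_location=device)
net.to(device)

im = cv2.imread(image_path)
im = torch.from_numpy(im).unsqueeze(0).to(device=device, dtype=torch.float)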

how to properly load pickled machine learning models on server

It is a Python Flask app.
The same code works when I run the app locally, but on a server that I rent from DigitalOcean it gives me the problem below.
I have been trying to load a machine learning model (a classification model I trained using sklearn) at startup, but it either gives me this error or, with some of the solutions I found on Stack Overflow applied, hangs forever.
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0
I tried every solution I could find on Stack Overflow, like adding encoding="latin-1" or "bytes" when loading the pickle. I tried the code below and many other combinations with the different arguments people recommended.
def load_model(file_path):
    script_directory = os.path.split(os.path.abspath(__file__))[0]
    abs_filepath = os.path.join(script_directory, file_path)
    with open(abs_filepath, 'rb') as f:
        classifier = pickle.loads(f.read())
    return classifier

def load_model(file_path):
    script_directory = os.path.split(os.path.abspath(__file__))[0]
    abs_filepath = os.path.join(script_directory, file_path)
    with open(abs_filepath, 'r') as f:  # also with 'rb'
        classifier = pickle.load(f, encoding="bytes")
        # also with "latin-1", "latin1", etc., and load/loads, f and f.read()
    return classifier
model = load_model("modelname.pickle")
What is wrong with this?
It seems there is a problem when you pickle the model using Python 2 and then try to load it using Python 3.
Did you try to load this model using Python 2?
Do you get the same error when you run it with the parameter encoding="latin1"? If it is a different error, maybe you need to run
dill._dill._reverse_typemap["ObjectType"] = object
before loading it.
The problem is described pretty well here:
https://rebeccabilbro.github.io/convert-py2-pickles-to-py3/
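For reference, a minimal sketch combining the two suggestions above (loading a Python 2 pickle under Python 3 with latin1 encoding, with the dill workaround applied first); whether you actually need the dill line depends on what the pickle contains:
import pickle
import dill

# Map Python 2's ObjectType, which no longer exists in Python 3.
dill._dill._reverse_typemap["ObjectType"] = object

def load_model(file_path):
    # latin1 decodes Python 2 byte strings without raising UnicodeDecodeError.
    with open(file_path, 'rb') as f:
        return pickle.load(f, encoding="latin1")

model = load_model("modelname.pickle")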

nashorn CompiledScript graalvm equivalent

I have a fairly big file that needs to be evaluated often.
With Nashorn I used to do something like this:
CompiledScript compiledScript = ((Compilable) engine).compile(text);
and later on, I could call the following many times:
Context context = new SimpleScriptContext();
compiledScript.eval(context);
This was quite fast.
Using the new Polyglot API, I do:
Source source = Source.newBuilder("js", myFile).build();
then:
Context context = Context.newBuilder("js").option("js.nashorn-compat", "true").build();
context.eval(source);
Using JMH, I see a big performance difference between the two:
Benchmark Mode Cnt Score Error Units
JmhBenchmark.testEvalGraal avgt 5 42,855 ± 11,118 ms/op
JmhBenchmark.testEvalNashorn avgt 5 2,739 ± 1,101 ms/op
If I do the eval on the same context, it works properly, but I don't want to share a context between two consecutive evals (unless the concept of a Context in Graal is not the same as in Nashorn).
To reproduce your ScriptEngine setup with GraalVM, you should re-use the same Engine (org.graalvm.polyglot.Engine) and .close() the context after use:
Source source = Source.newBuilder("js", myFile).build();
Engine engine = Engine.create();
and later:
Context context = Context.newBuilder("js")
.engine(engine)
.option("js.nashorn-compat", "true").build();
context.eval(source);
context.close();
Quoting the Context.Builder.engine documentation:
Explicitly sets the underlying engine to use. By default, every context has its own isolated engine. If multiple contexts are created from one engine, then they may share/cache certain system resources like ASTs or optimized code by specifying a single underlying engine.

Image processing in TensorFlow distributed session

I am testing out TensorFlow Distributed (https://www.tensorflow.org/deploy/distributed) with my local machine (Windows) and an Ubuntu VM.
I followed this link, Distributed tensorflow replicated training example: grpc_tensorflow_server - No such file or directory, and set up the so-called TensorFlow server as below.
import tensorflow as tf
parameter_servers = ["10.0.3.15:2222"]
workers = ["10.0.3.15:2222","10.0.3.15:2223"]
cluster = tf.train.ClusterSpec({"local": parameter_servers, "worker": workers})
server = tf.train.Server(cluster, job_name="local", task_index=0)
server.join()
Here “10.0.3.15” is my Ubuntu machine's local IP address.
On the Windows host machine I am doing some simple image preprocessing using OpenCV and extending the graph session to the VM. I used the following code for that.
import tensorflow as tf
from OpenCVTest import *

with tf.Session("grpc://10.0.3.15:2222") as sess:
    ### OpenCV calling section ###
    img = cv2.imread('Data/ball.jpg')
    grey_img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    flat_img_array = img.flatten()
    x = tf.placeholder(tf.float32, shape=(flat_img_array[0], flat_img_array[1]))
    y = tf.multiply(x, x)
    sess.run(y)
I can see that my session is running on my Ubuntu machine. Please see the screenshot below.
[Screenshot: Test_result]
[Note: In the image you will notice that in the Windows console I am calling the session and the Ubuntu terminal is listening to that same session.]
But the strange thing I have observed is that the OpenCV preprocessing operation (grey_img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)) uses the local OpenCV package. My assumption was that when I run a session on another server, all of the operations should execute on that server. In my case, since I am running the session on the Ubuntu VM, everything defined inside tf.Session("grpc://10.0.3.15:2222") should also run on that Ubuntu VM using the VM's local packages, but that's not happening.
Is my understanding of distributed sess.run(y) correct? When we run the session in a distributed manner, does it only offload the graph computation to another machine through gRPC?
I would summarize my question like this: "I am planning to do heavy preprocessing before feeding values into the tensors, and I want to do it in a distributed way. What would be the better approach to follow? My initial understanding was that I could do this with TensorFlow Distributed, but with this test I think I may not be able to."
Any thoughts would be of real help.
Thank you.
