nashorn CompiledScript graalvm equivalent - graalvm

I have a fairly big file that needs to often be evaluated,
with nashorn I used to do something like that :
CompiledScript compiledScript = ((Compilable) engine).compile(text);
and later on, I could call many times the following :
Context context = new SimpleScriptContext();
compiledScript.eval(context);
this was quite fast.
Using the new Polyglot API, I do :
Source source = Source.newBuilder("js", myFile).build();
then :
Context context = Context.newBuilder("js").option("js.nashorn-compat", "true").build();
context.eval(source)
Using jmh, I have a big performance difference between the two
Benchmark Mode Cnt Score Error Units
JmhBenchmark.testEvalGraal avgt 5 42,855 ± 11,118 ms/op
JmhBenchmark.testEvalNashorn avgt 5 2,739 ± 1,101 ms/op
If I do the eval on the same context, it is working properly, but I don't want to have a shared context between two consecutive eval (unless the concept of Context of Graal is not the same as the one from Nashorn).

To reproduce your ScriptEngine setup with GraalVM, you should re-use the same Engine (org.graalvm.polyglot.Engine) and .close() the context after use:
Source source = Source.newBuilder("js", myFile).build();
Engine engine = Engine.create();
and later:
Context context = Context.newBuilder("js")
.engine(engine)
.option("js.nashorn-compat", "true").build();
context.eval(source);
context.close();
Quoting the Context.Builder.engine documentation:
Explicitly sets the underlying engine to use. By default, every context has its own isolated engine. If multiple contexts are created from one engine, then they may share/cache certain system resources like ASTs or optimized code by specifying a single underlying engine.

Related

DiagramBuilder: Cannot operate on ports of System plant until it has been registered using AddSystem

I have an issue working with DiagramBuilder and ManipulationStation classes.
It appears to me, that c++ API and the python bindings work differently in my case.
C++ API behaves as expected, while the python bindings result in the runtime error:
DiagramBuilder: Cannot operate on ports of System plant until it has been registered using AddSystem
How I use C++ API
In one of the ManipulationStation::Setup...() methods I inject a block of code, that adds an extra manipuland
const std::string sdf_path = FindResourceOrThrow("drake/examples/manipulation_station/models/bolt_n_nut.sdf");
RigidTransform<double> X_WC(RotationMatrix<double>::Identity(), Vector3d(0.0, -0.3, 0.1));
bolt_n_nut_ = internal::AddAndWeldModelFrom(sdf_path, "nut_and_bolt", lant_->world_frame(), "bolt", X_WC, plant_);
I inject another block of code into the method ManipulationStation::Finalize:
auto zero_torque = builder.template AddSystem<systems::ConstantVectorSource<double>>(Eigen::VectorXd::Zero(plant_->num_velocities(bolt_n_nut_)));
builder.Connect(zero_torque->get_output_port(), plant_->get_actuation_input_port(bolt_n_nut_));
With these changes, the simulation runs as expected.
How I use python bindings
plant = station.get_multibody_plant()
manipuland_path = get_manipuland_resource_path()
bolt_with_nut = Parser(plant=plant).AddModelFromFile(manipuland_path)
X_WC = RigidTransform(RotationMatrix.Identity(), [0.0, -0.3, 0.1])
plant.WeldFrames(plant.world_frame(), plant.GetFrameByName('bolt', bolt_with_nut), X_WC)
...
station.Finalize()
zero_torque = builder.AddSystem(ConstantValueSource(AbstractValue.Make([0.])))
builder.Connect(zero_torque.get_output_port(), plant.get_actuation_input_port(bolt_with_nut_model))
This triggers a RuntimeError with a message as above; The port, which causes this error is nut_and_bolt_actuation.
My vague understanding of the problem is the (in) visibility of nut_and_bolt System, due to having two distinct DiagramBuilders in a process: 1) a one is inside ManipulationStation 2) another is in the python code, that instantiates this ManipulationStation object.
Using ManipulationStation via python bindings is a preference for me, because that way I would've avoided depending on a custom build of drake library.
Thanks for your insight!
I agree with your assessment: you have two different DiagramBuilder objects here. This does not have anything to due with C++ or Python; the ManipulationStation is itself a Diagram (created using its own DiagramBuilder), and you have a second DiagramBuilder (in either c++ or python) that is connecting the ManipulationStation together with other elements. You are trying to connect a system that is in the external diagram to a port that is in the internal diagram, but is not exposed.
The solution would be to have the ManipulationStation diagram expose the extra nut and bolt actuation port so that you can connect to it from the second builder.
If you prefer Python, I've switched my course to using a completely python version of the manipulation station. I find this version is much easier to adapt to different student projects. (To be clear, the setup is in python, but at simulation time all of the elements are c++ and it doesn't call back to python; so the performance is almost identical.)

In PyTorch, how to convert the cuda() related codes into CPU version?

I have some existing PyTorch codes with cuda() as below, while net is a MainModel.KitModel object:
net = torch.load(model_path)
net.cuda()
and
im = cv2.imread(image_path)
im = Variable(torch.from_numpy(im).unsqueeze(0).float().cuda())
I want to test the code in a machine without any GPU, so I want to convert the cuda-code into CPU version. I tried to look at some relevant posts regarding the CPU/GPU switch of PyTorch, but they are related to the usage of device and thus doesn't apply to my case.
As pointed out by kHarshit in his comment, you can simply replace .cuda() call with .cpu():
net.cpu()
# ...
im = torch.from_numpy(im).unsqueeze(0).float().cpu()
However, this requires changing the code in multiple places every time you want to move from GPU to CPU and vice versa.
To alleviate this difficulty, pytorch has a more "general" method .to().
You may have a device variable defining where you want pytorch to run, this device can also be the CPU (!).
for instance:
if torch.cuda.is_available():
device = torch.device("cuda")
else:
device = torch.device("cpu")
Once you determined once in your code where you want/can run, simply use .to() to send your model/variables there:
net.to(device)
# ...
im = torch.from_numpy(im).unsqueeze(0).float().to(device)
BTW,
You can use .to() to control the data type (.float()) as well:
im = torch.from_numpy(im).unsqueeze(0).to(device=device, dtype=torch.float)
PS,
Note that the Variable API has been deprecated and is no longer required.
net = torch.load(model_path, map_location=torch.device('cpu'))
Pytorch docs: https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-on-cpu-load-on-gpu

PyDrake ComputePointPairPenetration() kills kernel

In calling ComputePointPairPenetration() from a QueryObject in Drake in Python in a Jupyter Notebook environment, ComputePointPairPenetration() reliably kills the kernel. I'm not sure what's causing it and couldn't figure out how to get any error message.
In case it's relevant I'm running pydrake locally on a Mac.
Here is relevant code:
builder = DiagramBuilder()
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=0.00001)
file_name = FindResource("models/robot.urdf")
model = Parser(plant).AddModelFromFile(file_name)
file_name = FindResource("models/object.urdf")
object_model = Parser(plant).AddModelFromFile(file_name)
plant.Finalize()
diagram = builder.Build()
# Run simulation...
# Get geometry info from scene graph
context = scene_graph.AllocateContext()
q_obj = scene_graph.get_query_output_port().Eval(context)
q_obj.ComputePointPairPenetration()
Edit:
#Sherm's comment fixed my problem :) Thank you so much!
For reference:
diagram_context = diagram.CreateDefaultContext()
scene_graph_context = scene_graph.GetMyContextFromRoot(diagram_context)
q_obj = scene_graph.get_query_output_port().Eval(scene_graph_context)
q_obj.ComputePointPairPenetration()
You created a local Context for scene_graph. Instead you want the full diagram context so that the ports are connected up properly (e.g. scene_graph has an input port that receives poses from MultibodyPlant). So the above should work if you ask the Diagram to create a Context, then ask for the SceneGraph subcontext for the calls you have above, rather than creating a standalone SceneGraph context.
This lets you extract the fully-connected subcontext:
scene_graph_context = scene_graph.GetMyContextFromRoot(diagram_context)
FTR Here's a similar formulation in a Drake Python unittest:
TestPlant.test_scene_graph_queries
Note that this takes an alternate route (using diagram.GetMutableSubsystemContext instead of scene_graph.GetMyContextFromroot), namely because it's doing scalar-type conversion as well.
If you're curious about scalar-type conversion (esp. if you're going to be doing optimization, e.g. needing AutoDiffXd), please see:
Drake C++ API: System Scalar Conversion Overview
Drake Python API: pydrake.systems.scalar_conversion
Additionally, here are examples of scalar-converting both MultibodyPlant and SceneGraph for testing InverseKinematics constraint classes:
inverse_kinematics.py: TestConstraints

Override dask scheduler to concurrently load data on multiple workers

I want to run graphs/futures on my distributed cluster which all have a 'load data' root task and then a bunch of training tasks that run on that data. A simplified version would look like this:
from dask.distributed import Client
client = Client(scheduler_ip)
load_data_future = client.submit(load_data_func, 'path/to/data/')
train_task_futures = [client.submit(train_func, load_data_future, params)
for params in train_param_set]
Running this as above the scheduler gets one worker to read the file, then it spills that data to disk to share it with the other workers. However, loading the data is usually reading from a large HDF5 file, which can be done concurrently, so I was wondering if there was a way to force all workers to read this file concurrently (they all compute the root task) instead of having them wait for one worker to finish then slowly transferring the data from that worker.
I know there is the client.run() method which I can use to get all workers to read the file concurrently, but how would you then get the data you've read to feed into the downstream tasks?
I cannot use the dask data primitives to concurrently read HDF5 files because I need things like multi-indexes and grouping on multiple columns.
Revisited this question and found a relatively simple solution, though it uses internal API methods and involves a blocking call to client.run(). Using the same variables as in the question:
from distributed import get_worker
client_id = client.id
def load_dataset():
worker = get_worker()
data = {'load_dataset-0': load_data_func('path/to/data')}
info = worker.update_data(data=data, report=False)
worker.scheduler.update_data(who_has={key: [worker.address] for key in data},
nbytes=info['nbytes'], client=client_id)
client.run(load_dataset)
Now if you run client.has_what() you should see that each worker holds the key load_dataset-0. To use this in downstream computations you can simply create a future for the key:
from distributed import Future
load_data_future = Future('load_dataset-0', client=client)
and this can be used with client.compute() or dask.delayed as usual. Indeed the final line from the example in the question would work fine:
train_task_futures = [client.submit(train_func, load_data_future, params)
for params in train_param_set]
Bear in mind that it uses internal API methods Worker.update_data and Scheduler.update_data and works fine as of distributed.__version__ == 1.21.6 but could be subject to change in future releases.
As of today (distributed.__version__ == 1.20.2) what you ask for is not possible. The closest thing would be to compute once and then replicate the data explicitly
future = client.submit(load, path)
wait(future)
client.replicate(future)
You may want to raise this as a feature request at https://github.com/dask/distributed/issues/new

Grails Xslt Transformation, OutOfMemoryError: Java heap space

I am getting this exception when transforming a xml with xslt:
Caused by: java.lang.OutOfMemoryError: Java heap space
at net.sf.saxon.tree.tiny.TinyTree.condense(TinyTree.java:430)
at net.sf.saxon.tree.tiny.TinyBuilder.close(TinyBuilder.java:206)
at net.sf.saxon.event.ReceivingContentHandler.endDocument(ReceivingContentHandler.java:244)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:449)
at net.sf.saxon.event.Sender.send(Sender.java:177)
at net.sf.saxon.Controller.makeSourceTree(Controller.java:1910)
at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:573)
at net.sf.saxon.jaxp.TransformerImpl.transform(TransformerImpl.java:185)
at com.lomnido.service.XsltTransformService.$tt__transform(XsltTransformService.groovy:27)
I am using Saxon-HE, version 9.7.0-5
My code:
TransformerFactory factory = TransformerFactory.newInstance();
StreamSource xsltStream = new StreamSource(xslt)
factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
Transformer transformer = factory.newTransformer(xsltStream);
StreamSource ins = new StreamSource(input);
File tmp = File.createTempFile("test", "xslttransform")
StreamResult out = new StreamResult(tmp);
transformer.transform(ins, out);
The size of the xml file is about 100MB. Is there any way how I could avoid this problem? Is there something like streaming the input file? Is there an alternative to saxon? I need xslt 2.0 for my transformations.
Best regards,
Peter
Processing a 100Mb source document should be perfectly feasible without resorting to XSLT 3.0 streaming. Just make sure you have allocated enough memory to the Java VM. The source document generally takes about 5 times the raw XML size, but of course it depends on the detail. But if you run with -Xmx2g, I certainly wouldn't expect this to fail unless something unusual is going on.
Once the size reaches 500Mb you probably do want to start to think about using XSLT 3.0 streaming. But you haven't said anything about what the transformation is doing, so it could be very easy, it could be fairly difficult, or it could be impossible, depending on the actual transformation to be performed.

Resources