Syntax/Functions used in the OpenCL implementation of OpenCV - opencv

I am trying to understand how OpenCL is used within OpenCV, but I don't get it.
This is an example code fragment from orb.cpp where a kernel named ORB_HarrisResponses, located in orb.cl, is (probably) created:
ocl::Kernel hr_ker("ORB_HarrisResponses", ocl::features2d::orb_oclsrc,
                   format("-D ORB_RESPONSES -D blockSize=%d -D scale_sq_sq=%.12ef -D HARRIS_K=%.12ff",
                          blockSize, scale_sq_sq, harris_k));
return hr_ker.args(ocl::KernelArg::ReadOnlyNoSize(imgbuf),
                   ocl::KernelArg::PtrReadOnly(layerinfo),
                   ocl::KernelArg::PtrReadOnly(keypoints),
                   ocl::KernelArg::PtrWriteOnly(responses),
                   nkeypoints).run(1, globalSize, 0, true);
But this isn't the regular OpenCL syntax (functions like clCreateKernel ...). Does anyone know where I can get a basic understanding of OpenCV's OpenCL implementation, to answer questions like:
Where is the connection between "normal" OpenCL and the OpenCV OpenCL?
Where is the program built from the kernel source files?
Where is the function that creates the kernel explained?
etc.
I couldn't find any documentation or related questions on the web.
Thanks
Edit: Thanks for answering; it helped me understand a few things:
ocl::Kernel hr_ker("ORB_HarrisResponses", ocl::features2d::orb_oclsrc,
format("-D ORB_RESPONSES -D blockSize=%d -D scale_sq_sq=%.12ef -D HARRIS_K=%.12ff", blockSize, scale_sq_sq, harris_k));
In this part, the kernel ORB_HarrisResponses from orb.cl, whose source is held in the string ocl::features2d::orb_oclsrc, is created as hr_ker (right?).
But what does the format(...) call do?
if (hr_ker.empty()) return false;
return hr_ker.args(ocl::KernelArg::ReadOnlyNoSize(imgbuf),
ocl::KernelArg::PtrReadOnly(layerinfo),
ocl::KernelArg::PtrReadOnly(keypoints),
ocl::KernelArg::PtrWriteOnly(responses),
nkeypoints).run(1, globalSize, 0, true);
In this part the kernel arguments imgbuf, layerinfo and keypoints are set, and the output of the kernel is stored in responses.
What is going on with nkeypoints?
Why is there no ocl::KernelArg in front of this parameter?
The kernel in orb.cl has 7 arguments but only 5 are set. Why?
What exactly is returned from return hr_ker.args(...)?

This syntax is internal OpenCV "sugar" that avoids repeating common blocks of OpenCL boilerplate. Unfortunately there is no good documentation, so the only way to learn it is to read the source code and examples.
Some tips for you:
The connection between the OpenCL API and OpenCV is in modules\core\src\ocl.cpp (see the Kernel, Kernel::Impl, Program, ProgramSource and KernelArg classes).
The kernel source code is stored in *.cl files (for example, the ORB kernels are in modules\features2d\src\opencl\orb.cl). When a module is built, the kernel code is copied into an auto-generated cpp file (for example opencl_kernels_features2d.cpp), where it can be accessed as ocl::features2d::orb_oclsrc.
To use the OpenCL implementation in OpenCV, pass a cv::UMat to the function instead of a regular cv::Mat (see the CV_OCL_RUN_ macro and the cv::OutputArray::isUMat() method).
Basically, every OpenCL implementation inside OpenCV does the following:
Defines kernel parameters such as global size, block size, etc.
Creates a cv::ocl::Kernel from the string with the source code and the defined parameters. (If the kernel cannot be created, or there is no OpenCL implementation for the given input parameters, processing falls back to the regular CPU code.)
Passes the kernel arguments via cv::ocl::KernelArg. There are several kinds of arguments to optimize processing: read-only, write-only, constant, etc.
Runs the kernel.
So for the end user, using the OpenCL implementation is transparent. If something goes wrong, processing switches to the CPU implementation.
Let's discuss the following code snippet:
return hr_ker.args(ocl::KernelArg::ReadOnlyNoSize(imgbuf),
ocl::KernelArg::PtrReadOnly(layerinfo),
ocl::KernelArg::PtrReadOnly(keypoints),
ocl::KernelArg::PtrWriteOnly(responses),
nkeypoints).run(1, globalSize, 0, true);
and the OpenCL kernel declaration:
ORB_HarrisResponses(__global const uchar* imgbuf, int imgstep, int imgoffset0,
__global const int* layerinfo, __global const int* keypoints,
__global float* responses, int nkeypoints )
nkeypoints is an integer, so there is no need to wrap it in ocl::KernelArg. It is passed directly to the kernel.
ocl::KernelArg::ReadOnlyNoSize actually expands to three parameters: imgbuf, imgstep, imgoffset0.
The other kernel arguments don't expand, so each represents a single parameter. That is how 5 host-side arguments fill all 7 kernel parameters.
hr_ker.args returns a reference to the cv::ocl::Kernel, so you can use the construction kernel.args(...).run(...); run(...) then returns a bool indicating whether the kernel was launched successfully.
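The argument counting above can be mimicked with a small toy model. This is only a sketch in Python, not the OpenCV API: ToyKernelArg and ToyKernel are hypothetical names, and the expansion counts are simply the ones stated in this answer.

```python
# Toy model, not the OpenCV API: each host-side argument knows how many
# kernel parameters it expands to.
class ToyKernelArg:
    def __init__(self, name, expands_to):
        self.name = name
        self.expands_to = expands_to  # number of kernel parameters produced


class ToyKernel:
    def __init__(self):
        self.param_count = 0

    def args(self, *host_args):
        for arg in host_args:
            # Plain values (e.g. an int) map to exactly one kernel parameter.
            self.param_count += arg.expands_to if isinstance(arg, ToyKernelArg) else 1
        return self  # returning the kernel itself is what allows .args(...).run(...)

    def run(self):
        # Stand-in for launching the kernel; reports success as a bool.
        return self.param_count > 0


kernel = ToyKernel()
ok = kernel.args(
    ToyKernelArg("ReadOnlyNoSize(imgbuf)", 3),   # imgbuf, imgstep, imgoffset0
    ToyKernelArg("PtrReadOnly(layerinfo)", 1),
    ToyKernelArg("PtrReadOnly(keypoints)", 1),
    ToyKernelArg("PtrWriteOnly(responses)", 1),
    5,                                           # nkeypoints, a plain int
).run()
print(kernel.param_count)  # 7
```

Five host-side arguments go in, seven kernel parameters come out, and the chained call works because args hands back the kernel object.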
Some useful links:
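cv::format documentation. It takes a printf-style format string and returns the formatted text as a string (similar in spirit to boost::format). Here it builds the -D options that are passed to the OpenCL compiler.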
Hope it will help.
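Since cv::format is printf-style, its effect on the option string can be reproduced with any printf-like formatter. The sketch below uses Python's % operator as a stand-in (it is not OpenCV), and the three values are made up for illustration rather than taken from orb.cpp. Note that the trailing f after each float conversion is a literal character, which turns the number into a single-precision constant in the kernel source.

```python
# Illustration only: Python's printf-style % operator standing in for cv::format.
# The three values below are made up for the example, not taken from orb.cpp.
block_size = 7
scale_sq_sq = 1.0
harris_k = 0.04

build_options = (
    "-D ORB_RESPONSES -D blockSize=%d -D scale_sq_sq=%.12ef -D HARRIS_K=%.12ff"
    % (block_size, scale_sq_sq, harris_k)
)
print(build_options)
# -D ORB_RESPONSES -D blockSize=7 -D scale_sq_sq=1.000000000000e+00f -D HARRIS_K=0.040000000000f
```

Each -D NAME=value pair becomes a preprocessor definition when the OpenCL program is compiled, so the kernel sees these values as compile-time constants.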

Related

DiagramBuilder: Cannot operate on ports of System plant until it has been registered using AddSystem

I have an issue working with the DiagramBuilder and ManipulationStation classes.
It appears to me that the C++ API and the Python bindings behave differently in my case.
The C++ API behaves as expected, while the Python bindings result in this runtime error:
DiagramBuilder: Cannot operate on ports of System plant until it has been registered using AddSystem
How I use C++ API
In one of the ManipulationStation::Setup...() methods I inject a block of code, that adds an extra manipuland
const std::string sdf_path = FindResourceOrThrow("drake/examples/manipulation_station/models/bolt_n_nut.sdf");
RigidTransform<double> X_WC(RotationMatrix<double>::Identity(), Vector3d(0.0, -0.3, 0.1));
bolt_n_nut_ = internal::AddAndWeldModelFrom(sdf_path, "nut_and_bolt", plant_->world_frame(), "bolt", X_WC, plant_);
I inject another block of code into the method ManipulationStation::Finalize:
auto zero_torque = builder.template AddSystem<systems::ConstantVectorSource<double>>(Eigen::VectorXd::Zero(plant_->num_velocities(bolt_n_nut_)));
builder.Connect(zero_torque->get_output_port(), plant_->get_actuation_input_port(bolt_n_nut_));
With these changes, the simulation runs as expected.
How I use python bindings
plant = station.get_multibody_plant()
manipuland_path = get_manipuland_resource_path()
bolt_with_nut = Parser(plant=plant).AddModelFromFile(manipuland_path)
X_WC = RigidTransform(RotationMatrix.Identity(), [0.0, -0.3, 0.1])
plant.WeldFrames(plant.world_frame(), plant.GetFrameByName('bolt', bolt_with_nut), X_WC)
...
station.Finalize()
zero_torque = builder.AddSystem(ConstantValueSource(AbstractValue.Make([0.])))
builder.Connect(zero_torque.get_output_port(), plant.get_actuation_input_port(bolt_with_nut))
This triggers a RuntimeError with the message above; the port that causes the error is nut_and_bolt_actuation.
My vague understanding of the problem is that the nut_and_bolt System is not visible because there are two distinct DiagramBuilders in the process: (1) one inside ManipulationStation, and (2) another in the Python code that instantiates the ManipulationStation object.
I would prefer to use ManipulationStation via the Python bindings, because that way I can avoid depending on a custom build of the Drake library.
Thanks for your insight!
I agree with your assessment: you have two different DiagramBuilder objects here. This does not have anything to do with C++ or Python; the ManipulationStation is itself a Diagram (created using its own DiagramBuilder), and you have a second DiagramBuilder (in either C++ or Python) that connects the ManipulationStation to other elements. You are trying to connect a system in the external diagram to a port that lives in the internal diagram but is not exposed.
The solution would be to have the ManipulationStation diagram expose the extra nut and bolt actuation port so that you can connect to it from the second builder.
If you prefer Python, I've switched my course to using a completely python version of the manipulation station. I find this version is much easier to adapt to different student projects. (To be clear, the setup is in python, but at simulation time all of the elements are c++ and it doesn't call back to python; so the performance is almost identical.)
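The ownership rule behind the error can be sketched with a toy model. This is plain Python, not pydrake: ToySystem and ToyBuilder are hypothetical classes that only imitate the rule that a builder can connect ports of systems registered with that same builder.

```python
# Toy model of the ownership rule, not pydrake: a builder only lets you
# connect ports of systems that were added to that same builder.
class ToySystem:
    def __init__(self, name):
        self.name = name


class ToyBuilder:
    def __init__(self):
        self.systems = []

    def add_system(self, system):
        self.systems.append(system)
        return system

    def connect(self, source, target):
        for system in (source, target):
            if system not in self.systems:
                raise RuntimeError(
                    "Cannot operate on ports of System %s until it has been "
                    "registered using AddSystem" % system.name
                )
        return (source.name, target.name)


# Inner diagram: the station owns the plant through its own builder.
inner = ToyBuilder()
plant = inner.add_system(ToySystem("plant"))
station = ToySystem("station")  # the finished inner diagram, seen as one system

# Outer diagram: only the station and the torque source are registered here.
outer = ToyBuilder()
outer.add_system(station)
zero_torque = outer.add_system(ToySystem("zero_torque"))

try:
    outer.connect(zero_torque, plant)  # plant belongs to the inner builder
    error = None
except RuntimeError as exc:
    error = str(exc)

# Connecting to the station itself works, which is why the port must be exposed on it.
connection = outer.connect(zero_torque, station)
print(error)
print(connection)
```

In the toy, the outer builder rejects the plant but accepts the station, which mirrors why the actuation port has to be exported by the station diagram before an outer builder can reach it.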

Pointcloud Visualization in Drake Visualizer in Python

I would like to visualize a point cloud in drake-visualizer using the Python bindings.
I imitated how to publish images through LCM from here, and checked out these two issues (14985, 14991). The snippet is as follows:
point_cloud_to_lcm_point_cloud = builder.AddSystem(PointCloudToLcm())
point_cloud_to_lcm_point_cloud.set_name('pointcloud_converter')
builder.Connect(
    station.GetOutputPort('camera0_point_cloud'),
    point_cloud_to_lcm_point_cloud.get_input_port()
)
point_cloud_lcm_publisher = builder.AddSystem(
    LcmPublisherSystem.Make(
        channel="DRAKE_POINT_CLOUD_camera0",
        lcm_type=lcmt_point_cloud,
        lcm=None,
        publish_period=0.2,
        # use_cpp_serializer=True
    )
)
point_cloud_lcm_publisher.set_name('point_cloud_publisher')
builder.Connect(
    point_cloud_to_lcm_point_cloud.get_output_port(),
    point_cloud_lcm_publisher.get_input_port()
)
However, I got the following runtime error:
RuntimeError: DiagramBuilder::Connect: Mismatched value types while connecting output port lcmt_point_cloud of System pointcloud_converter (type drake::lcmt_point_cloud) to input port lcm_message of System point_cloud_publisher (type drake::pydrake::Object)
When I set 'use_cpp_serializer=True', the error becomes
LcmPublisherSystem.Make(
File "/opt/drake/lib/python3.8/site-packages/pydrake/systems/_lcm_extra.py", line 71, in _make_lcm_publisher
serializer = _Serializer_[lcm_type]()
File "/opt/drake/lib/python3.8/site-packages/pydrake/common/cpp_template.py", line 90, in __getitem__
return self.get_instantiation(param)[0]
File "/opt/drake/lib/python3.8/site-packages/pydrake/common/cpp_template.py", line 159, in get_instantiation
raise RuntimeError("Invalid instantiation: {}".format(
RuntimeError: Invalid instantiation: _Serializer_[lcmt_point_cloud]
I saw the C++ example here, so maybe this issue is specific to the Python bindings.
I also saw this Python example, but thought using 'PointCloudToLcm' might be more convenient.
P.S.
I am aware of the recent commits on MeshcatVisualizerCpp and MeshcatPointCloudVisualizerCpp, but I am still on the drake-dev stable build 0.35.0-1 and want to stay on drake-visualizer until the Meshcat C++ implementation is more mature.
The old version in pydrake.systems.meshcat_visualizer.MeshcatVisualizer is a bit too slow for my current use case (dropping multiple objects). I can visualize the point cloud with that setup, but it takes too many machine resources.
Only the message types that are specifically bound in lcm_py_bind_cpp_serializers.cc can be used on an LCM message input/output port connection between C++ and Python. For all other LCM message types, the input/output port connection must be from a Python system to a Python system or a C++ System to a C++ System.
The lcmt_image_array is listed there, but not the lcmt_point_cloud.
If you're stuck using Drake's v0.35.0 capabilities, then I don't see any great solutions. Some options:
(1) Write your own PointCloudToLcm system in Python (by re-working the C++ code into Python, possibly with a narrower set of supported features / channels for simplicity).
(2) Write your own small C++ helper function MakePointCloudPublisherSystem(...) that calls LcmPublisherSystem::Make<lcmt_point_cloud> function in C++, and bind it into Python. Then your Python code can call MakePointCloudPublisherSystem() and successfully connect that to the existing C++ PointCloudToLcm.

Passing functions defined in Rcpp in each node through "foreach" [duplicate]

I'm trying to understand what is happening behind the Rcpp::sourceCpp() call on a parallelized environment. Recently, this was partially addressed in the question: Using Rcpp function in parLapply on Windows.
Within this post, Dirk said,
"You need to run the sourceCpp() call in each spawned process, or else get them your code."
This was in response to the questioner's attempt to distribute the Rcpp function to the worker processes. The questioner was sending the Rcpp function via:
clusterExport(cl = cl, varlist = "payoff")
I'm confused as to why this doesn't work. My thinking is that this is exactly what clusterExport() is for.
The issue here is that the compiled code is not "exportable" to the spawned processes without being embedded in a package due to how binaries are linked into R's processes.
Traditionally, the clusterExport() statement allows for R specific code to be distributed to workers.
By using clusterExport() on an Rcpp function, you are only sending the R declaration and not the underlying shared library. That is to say, the shared library built by the R CMD SHLIB call in Attributes.R is not shared with / exported to the workers. As a result, when a call is then made to the Rcpp function on a worker, R cannot find the correct shared library.
Take the previous question's function:
Rcpp::cppFunction("NumericVector payoff( double strike, NumericVector data) {
return pmax(data - strike, 0);
}")
Note: I'm using cppFunction() instead of sourceCpp() but the results are equivalent since cppFunction() calls sourceCpp() to create the function.
Typing the function name:
payoff
Yields the R declaration with a shared library pointer.
function (strike, data)
.Primitive(".Call")(<pointer: 0x1015ec130>, strike, data)
This shared library is only available in the process that compiled the function.
This is why it is always best to embed compiled code in a package and then distribute the package.

memory trace of all variables in program with DBI tool

I am using Intel Pin as my primary DBI tool.
I would like to know how I can trace all variables allocated in a program.
Suppose we have the following snippet in C:
int *ptr_one, *ptr_two, g;
ptr_one = (int *)malloc(sizeof(int));
ptr_two = (int *)malloc(sizeof(int));
*ptr_one = 25;
*ptr_two = 24;
g = 130;
free(ptr_two);
g = 210;
*ptr_two = 50;
I want to know how I can trace specific variables / memory references in my program. For example, in the code above I would like to trace the variable "g" with Intel Pin. How can this be done?
For dynamically allocated variables I monitor malloc/free calls and follow their addresses, but for static ones I have no idea.
Another matter: I would like to trace dynamically allocated variables across the whole program. In the code above, for instance, I want to monitor changes to ptr_two from start to finish.
If anyone has ideas about this, please share them here; sample code for Intel Pin would be appreciated.
Thank you all.
Simply stated, you can't associate a name from your source code (be it variable or function name) with a memory location on the compiled binary: this information is (probably) lost on the final binary.
This is not true in two cases:
1) If your binary is exporting functions: in this case other binaries must have a means to call the function by name (minus some subtleties), in which case the information must be available somewhere; for example on Windows, binaries that export functions, variables or classes have an export table.
2) You have symbolic information: in your example, either for the global variable or other local variable, you have to use the symbolic information provided by the compiler.
On Linux you will need an external tool / library / program (e.g. libelf.so or libdwarf.so) to parse the symbolic information from the symbol tables (usually dynsym / symtab) if the binary is not stripped.
On windows you have to rely on the program database (*.pdb files); the format is mostly undocumented (although MS is trying to document it) and you have to use either the DbgHelp API or the DIA SDK.
As stated by the PIN user guide (emphasis is mine):
Pin provides access to function names using the symbol object (SYM).
Symbol objects only provide information about the function symbols in
the application. Information about other types of symbols (e.g. data
symbols), must be obtained independently by the tool.
If you have symbolic information you can then associate a variable name - obtained from an external tool - with an address (relative to the module base for global vars or a stack location for local ones). At runtime it is then just a matter of converting the relative address to a virtual one.
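The last step, turning a module-relative address into a run-time virtual address and checking whether a memory access touches the variable, is simple arithmetic. The sketch below is plain Python rather than a Pin tool; the function names, the load address and the offset of "g" are all made up for illustration. In a real tool the offset would come from the external symbol parser and the base from the loaded image.

```python
# Minimal sketch with made-up numbers: the symbol offset would come from an
# external symbol parser and the module base from the loader at run time.
def to_virtual_address(module_base, relative_address):
    """Convert a module-relative address to a run-time virtual address."""
    return module_base + relative_address


def touches_variable(access_address, access_size, var_address, var_size):
    """Return True if a memory access overlaps the variable's byte range."""
    return (access_address < var_address + var_size
            and var_address < access_address + access_size)


module_base = 0x555555554000  # hypothetical load address of the module
g_offset = 0x4010             # hypothetical offset of "g" from symbol info
g_address = to_virtual_address(module_base, g_offset)
print(hex(g_address))                                 # 0x555555558010
print(touches_variable(g_address, 4, g_address, 4))   # True
```

An instrumentation callback on memory reads and writes could apply a check like touches_variable to each effective address to filter for accesses to the variable of interest.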

Obtain LWP id from a pthread_t on Solaris to use with processor_bind

On Solaris, processor_bind is used to set affinity for threads. You need to know the LWPID of the target thread or use the constant P_MYID to refer to yourself.
I have a function that looks like this:
void set_affinity(pthread_t thr, int cpu_number)
{
id_t lwpid = what_do_I_call_here(thr);
processor_bind(P_LWPID, lwpid, cpu_number, NULL);
}
In reality my function has a bunch of cross platform stuff in it that I've elided for clarity.
The key point is that I'd like to set the affinity of an arbitrary pthread_t so I can't use P_MYID.
How can I achieve this using processor_bind or an alternative interface?
Following up on this, and due to my confusion:
The lwpid is what is created by
pthread_create( &lwpid, NULL, some_func, NULL);
Thread data is available externally, to a process other than the one making the pthread_create() call, via the /proc interface:
/proc/<pid>/lwp/<lwpid>/
lwpid == 1 is the main thread; 2 .. n are the lwpids from the above example.
But this tells you almost nothing about which thread you are dealing with, except that it is the lwpid in the example above.
/proc/<pid>/lwp/<lwpid>/lwpsinfo
can be read into a struct lwpsinfo, which has some more information from which you might be able to ascertain whether you are looking at the thread you want. See /usr/include/sys/procfs.h
or man -s 4 proc.
The Solaris 11 kernel has a critical threads optimization. You set up which threads require special care, and the kernel does the rest. This appears to be what you want. Please read this short explanation to see if I understood what you want.
https://blogs.oracle.com/observatory/entry/critical_threads_optimization
The above is an alternative. It may not fly at all for you, but it is the preferred mechanism, per Oracle.
For Solaris 10, use the pthread_t tid of the LWP with an idtype_t of P_LWPID in your call to processor_bind. This works in Solaris 8 -> 11. It works ONLY for LWPs in the current process. It is not clear to me if that is your model.
HTH

Resources