ROS Human-Robot mapping (Baxter) - ros

I'm having some difficulty understanding the concept of teleoperation in ROS, so I'm hoping someone can clear some things up.
I am trying to control a Baxter robot (in simulation) using an HTC Vive device. I have a node (publisher) which successfully extracts PoseStamped data (poses expressed relative to the lighthouse base stations) from the controllers and publishes it on separate topics for the right and left controllers.
Now I want to create the subscribers which receive the pose data from the controllers and convert it to a pose for the robot. What I'm confused about is the mapping: after reading the documentation on Baxter and on robotics transformations, I still don't understand how to map human poses to Baxter.
I know I need to use IK services which essentially calculate the co-ordinates required to achieve a pose (given the desired location of the end effector). But it isn't as simple as just plugging in the PoseStamped data from the node publishing controller data to the ik_service right?
Human and robot anatomy are quite different, so I'm not sure whether I'm missing a vital step here.
Looking at other people's example code for the same task, I see that some have created a 'base'/'human' pose which hard-codes co-ordinates for the limbs to mimic a human. Is this essentially what I need?
Sorry if my question is quite broad but I've been having trouble finding an explanation that I understand... Any insight is very much appreciated!

You might find my former student's work on motion mapping using a Kinect sensor with a PR2 informative. It shows two methods:
Direct joint angle mapping (e.g. if the human holds the arm at a right angle, the robot should also hold its arm at a right angle).
An IK method that controls the robot's end effector based on the human's hand position.
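To illustrate the first method, here is a minimal sketch of how a joint angle can be computed from three tracked 3D points (for example shoulder, elbow and wrist). The function names `joint_angle` and `clamp_to_limits` are hypothetical and not taken from the student's work. How the resulting angle maps onto a particular robot joint (zero offset, sign convention, limits) depends on the robot and is not shown.

```python
import math

def joint_angle(a, b, c):
    """Angle in radians at point b, formed by the segments b->a and b->c.

    a, b, c are (x, y, z) positions, e.g. shoulder, elbow, wrist.
    """
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("two of the points coincide, angle is undefined")
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_angle)

def clamp_to_limits(angle, lower, upper):
    """Keep a commanded angle inside the robot joint's range (assumed known)."""
    return max(lower, min(upper, angle))
```

For example, `joint_angle((0, 0, 0), (1, 0, 0), (1, 1, 0))` describes an arm bent at a right angle at the middle point.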
"I know I need to use IK services which essentially calculate the co-ordinates required to achieve a pose (given the desired location of the end effector). But it isn't as simple as just plugging in the PoseStamped data from the node publishing controller data to the ik_service right?"
Yes, indeed, this is a fairly involved process! In both cases, we took advantage of the Kinect's API to access the human's joint angle values and the position of the hand. You can read about how Microsoft Research implemented the human skeleton tracking algorithm here:
https://www.microsoft.com/en-us/research/publication/real-time-human-pose-recognition-in-parts-from-a-single-depth-image/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2F%3Fid%3D145347
I am not familiar with the Vive device. You should check whether it offers a similar API for accessing skeleton tracking information, since reverse engineering Microsoft's algorithm will be challenging.
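For the IK method, one common way to bridge the difference between the tracked frame and the robot is to map the hand's displacement from a calibration pose rather than its absolute position. The following is only a sketch under several assumptions: the function name `map_controller_to_robot` and all parameters are hypothetical, the axes of the tracking frame and the robot base frame are assumed to be aligned already (otherwise a fixed rotation is needed first), and only position is handled, not orientation.

```python
def map_controller_to_robot(controller_pos, controller_home, robot_home,
                            scale=1.0, workspace=None):
    """Map a controller position to an end-effector target position.

    controller_pos  -- current (x, y, z) of the controller in the tracking frame
    controller_home -- controller position recorded at calibration time
    robot_home      -- end-effector position that corresponds to controller_home
    scale           -- ratio of robot reach to human reach (assumed, tune it)
    workspace       -- optional [(min, max)] per axis to keep targets reachable
    """
    # Displacement of the hand since calibration, in the tracking frame.
    offset = [c - h for c, h in zip(controller_pos, controller_home)]
    # Apply the same (scaled) displacement around the robot's home pose.
    target = [r + scale * o for r, o in zip(robot_home, offset)]
    if workspace is not None:
        target = [max(lo, min(hi, t)) for t, (lo, hi) in zip(target, workspace)]
    return target
```

The returned position would then be placed in a pose expressed in the robot's base frame and handed to the IK service. The hard-coded 'base' pose seen in other people's code plays the role of `robot_home` here.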

Related

When can I call it an "AR Experience"?

In my case I am trying to build an application that measures the distance between the camera and any detected human body, exactly like this.
I started with the Android platform, where the best match was "Use ARCore as input for Machine Learning models", but I have no clue how to change it to run in Stream_mode.
After losing hope on Android, I found that I can use MediaPipe pose detection to detect the human body, and by measuring the distance between two poses I can estimate how far away the person is. But I know that ARCore uses what is called hitTest, which uses the Depth API to measure the distance.
Also, there is a MediaPipeUnityPlugin.
So my questions are:
Does MediaPipe provide an AR Experience if it is used as I described? And if there is another way to use MediaPipe, please let me know.
Do we call it AR Experience, even if we do not have a 3D understanding of the environment?

Creating Meshes from Pointclouds of Urban Scenes

I want to create high-fidelity meshes of urban street scenes using point cloud data. The data consists of point clouds captured with an HDL64-e, and the scenes are very similar to the ones in the KITTI dataset.
Currently I'm only able to use the 'raw' point clouds and the odometry of the car. Previous work already implemented the LeGO-LOAM algorithm to create a monolithic map and better odometry estimates.
Available Data:
Point Clouds with 10Hz timings
Odometry estimates with higher frequencies (LOAM Output)
Monolithic map of the scene (LOAM Output) (~1,500,000 points)
I already did some research and came to the conclusion that I can either
use the monolithic map with algorithms like Poisson Reconstruction, Advancing Front, etc. (using CGAL), or
go the robotics way and use packages like Voxgraph (which uses Marching Cubes internally).
As we might want to integrate image data at a later stage, the second option would be preferred.
Questions:
Is there a state-of-the-art way to go?
Is it possible to get a mesh that preserves small features like curbs and sign posts? (I know there might be a feasible limit on how fine the mesh can be.)
I am very interested in feedback and a discussion on how to tackle this problem 'the right way'.
Thank you for your suggestions/answers in advance!

Lane tracking with a camera: how to get distance from camera to the lane?

I am doing a final-year project on lane tracking using a camera. The most challenging task now is how to measure the distance between the camera (actually the car that carries it) and the lane.
The lane is easily recognized (Hough line transform), but I have found no way to measure the distance to it.
There is a way to measure the distance to an object in front of the camera based on the object's pixel width, but it does not work here because the nearest point of the line is in the camera's blind spot.
What you want is to infer the depth map directly with a monocular camera.
You can refer to my answer here:
https://stackoverflow.com/a/64687551/11530294
Usually, we need a photometric measurement from a different position in the world to form a geometric understanding of the world (a.k.a. a depth map). From a single image it is not possible to measure the geometry, but it is possible to infer depth from prior understanding.
One way to make this work with a single image is to use a deep learning-based method to infer depth directly. Most deep learning-based approaches are implemented in Python, so if you are only familiar with Python, this is the approach to go for. If the image is small enough, I think real-time performance is possible. There is a lot of work of this kind using Caffe, TF, Torch, etc.; you can search GitHub for more options. The one I post here is what I used recently:
reference:
Godard, Clément, et al. "Digging into self-supervised monocular depth estimation." Proceedings of the IEEE international conference on computer vision. 2019.
Source code: https://github.com/nianticlabs/monodepth2
The other way is to run single-camera SLAM on large-FOV video. This has various constraints, such as needing good features, a large FOV, slow motion, etc. You can find many works of this kind, such as DTAM, LSDSLAM, DSO, etc. There are also a couple of packages from HKUST or ETH that do the mapping given the position (e.g. if you have GPS/compass); some of the well-known names are REMODE+SVO, open_quadtree_mapping, etc.
One typical example of single-camera SLAM is LSDSLAM, which runs in real time.
It is implemented in C++ on ROS, and as far as I remember it publishes the depth image. You can write a Python node that subscribes to the depth image directly, or one that subscribes to the globally optimized point cloud and projects it into a depth map from any view angle.
reference: Engel, Jakob, Thomas Schöps, and Daniel Cremers. "LSD-SLAM: Large-scale direct monocular SLAM." European conference on computer vision. Springer, Cham, 2014.
source code: https://github.com/tum-vision/lsd_slam
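Once a depth value is available for a lane pixel, turning it into a distance is a matter of back-projecting the pixel through the pinhole camera model. Here is a minimal sketch, assuming the camera intrinsics (`fx`, `fy`, `cx`, `cy`) are known from calibration and the depth is metric. Note that monocular methods often recover depth only up to an unknown scale, in which case that scale has to be resolved first. The function name `backproject` is hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth into the camera frame.

    Uses the pinhole model: x points right, y points down, z points forward.
    fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
    Returns (x, y, z) in the same unit as depth.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Example with made-up intrinsics: a lane pixel 100 px right of the image
# centre, 5 m ahead, lies 1 m to the side of the optical axis.
point = backproject(420, 240, 5.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
lateral_offset, forward_distance = point[0], point[2]
```

Applying this to several pixels along the detected Hough line gives 3D points on the lane marking, from which the lateral distance to the car can be read off.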

Image analysis technique to determine approximate change in view over a short period of time?

I am working on an open source package for robot owners. I want to do a decent job of detecting when the robot is having movement problems. One of the problems the robot commonly has is that the back wheel gets "tucked underneath" in a bad way and makes it turn very slowly when on carpet. I believe that with a combination of accelerometer value inspection and (I hope) a relatively simple yet robust vision analysis technique, I will be able to tell when the robot is having this specific problem.
What I need is to be able to analyze two images, separated by about 1/2 second in time, and get a numerical value that tells about how close they are, but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis. I've heard of an algorithm called optical flow that is used in object and scene tracking, but I'm hoping I don't need something heavyweight.
Is there a machine vision algorithm/function that can analyze two JPEGs and tell if they belong to the same scene and viewpoint, yet can also deliver a monotonically increasing numerical value that tells me roughly how different they are? If I could get that numerical value and compare it to the number of milliseconds passed, while examining the current accelerometer activity, I believe I can detect when the robot is having the "slow turn of death" problem.
If so, please tell me the basic technique involved, and if you know of machine vision library that implements it, which one it is.
"but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis"
What you are suggesting is a complex problem in itself, so forget about 'lightweight' solutions. You are probably going to need something like optical flow.
Other options I would recommend looking into are:
Vanishing point detection and its variation from image to image. This fits your problem domain quite well (see Wikipedia).
Disparity map: related to optical flow. It is used for stereoscopic vision, but I think you can use it for the kind of application you are describing. Take a look at this
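To give a feel for what "a number that tells how much the view changed" can look like, here is a deliberately crude sketch. It is not optical flow: it is a brute-force search for the single horizontal pixel shift that best aligns two small grayscale frames, which is far simpler than the methods named above and will fail when objects move independently. It assumes the frames are already decoded and downscaled into lists of rows of integer intensities, and that a turning robot shifts the view mostly horizontally. The function name `estimate_horizontal_shift` is hypothetical.

```python
def estimate_horizontal_shift(prev, curr, max_shift=8):
    """Find the horizontal shift s that best explains curr from prev.

    prev, curr -- equally sized grayscale images as lists of rows of ints
    Returns (shift, error) where curr[row][col] is closest to
    prev[row][col + shift], and error is the mean absolute difference
    over the overlapping columns at that shift.
    """
    height, width = len(prev), len(prev[0])
    best_shift, best_error = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        total, count = 0, 0
        for row in range(height):
            for col in range(width):
                src = col + shift
                if 0 <= src < width:  # only compare overlapping pixels
                    total += abs(curr[row][col] - prev[row][src])
                    count += 1
        if count == 0:
            continue
        error = total / count
        if error < best_error:
            best_shift, best_error = shift, error
    return best_shift, best_error
```

Dividing the absolute shift by the elapsed time gives a rough turning-rate indicator that can be compared against the accelerometer readings. A large residual `error` suggests the two frames do not show the same scene at all.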

Using encoders and robotc to map a line circuit

I am looking for a way to use the encoder information from the motors that drive the wheels of my robot to map a line circuit. The robot navigates around using a single light sensor following a line and on its second lap I want it to recognize where it is in the circuit. I've read a lot about SLAM but not sure I could implement this with robotc and only the encoder information.
Any help and advice on the best way to tackle this would be greatly appreciated.
You can use an odometry model to predict the movement of your robot. Assuming a vehicle with a preferred forward direction on a plane, you would have (x, y, theta) as your state, and a state transition that depends on your encoder values. What that function looks like really depends on the configuration of your robot. I remember that Introduction to Autonomous Mobile Robots has good coverage of the subject. You'll find lots of examples on the net, though. Simultaneous Localization and Mapping (SLAM) would mean using a probabilistic odometry model and then performing a correction based on your sensor. At first I thought this wasn't very feasible with your setup, but I actually think it is. Using an occupancy-grid based Rao-Blackwellized particle filter might give you some good results. I haven't used the CAS Toolbox, but have a look, as it seems a good place to start.
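As a concrete instance of such a state transition, here is a minimal sketch for a differential-drive robot, which is an assumption since the question does not state the drive configuration. It is written in Python for readability rather than in RobotC, and the function name `update_pose` and its parameters are hypothetical. The values for ticks per revolution, wheel radius and wheel base have to be measured on the actual robot.

```python
import math

def update_pose(x, y, theta, left_ticks, right_ticks,
                ticks_per_rev, wheel_radius, wheel_base):
    """Advance the pose (x, y, theta) using encoder ticks since the last update.

    left_ticks, right_ticks -- signed tick counts for each wheel
    wheel_radius, wheel_base -- in the same length unit as x and y
    """
    dist_per_tick = 2.0 * math.pi * wheel_radius / ticks_per_rev
    d_left = left_ticks * dist_per_tick
    d_right = right_ticks * dist_per_tick
    d_center = (d_left + d_right) / 2.0        # distance travelled by the midpoint
    d_theta = (d_right - d_left) / wheel_base  # change in heading (radians)
    # Integrate along the average heading of this step.
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    # Wrap the heading into [-pi, pi).
    theta = (theta + d_theta + math.pi) % (2.0 * math.pi) - math.pi
    return x, y, theta
```

Calling this every control cycle while line following on the first lap yields a trace of the circuit. On the second lap the current estimate can be compared against that trace, keeping in mind that pure odometry drifts, which is exactly what the probabilistic correction step is meant to handle.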

Resources