How to reduce hand-eye calibration error for a robot with an integrated RealSense camera? - ros

I am new to hand-eye calibration and I wanted to know whether, for a research application, you would use a pre-developed tool like easy_handeye or the MoveIt hand-eye calibration for doing the hand-eye calibration?
I have mounted an ArUco board on a table and the camera on the robot arm, then used the MoveIt/easy_handeye calibration to record different poses of the robot and an AX=XB solver to do the hand-eye calibration.
When I do this I get a large error (an offset of a few centimetres) between the commanded position sent to the robot and the actual position reached, while the reprojection error is quite small (<0.01). In my final application I would like to command poses from image-processing data, so I guess the error compounds and becomes larger. Any ideas on how I can reduce this?
Any papers would be helpful! I also want to know whether it would normally be better to write the code from scratch and then use ML/statistics/other optimisation methods to reduce the error.
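For reference, OpenCV ships several AX = XB solvers behind cv2.calibrateHandEye, and a quick way to sanity-check a hand-eye pipeline like the one described above is to run it on synthetic poses where the ground-truth transform is known. The sketch below does exactly that and then shows how the recovered transform maps a camera-frame detection into the base frame; every pose and number in it is made up, and the eye-in-hand convention (camera on the gripper, board fixed relative to the base) is an assumption to match against your own setup.

```python
import numpy as np
import cv2

def rt_to_T(R, t):
    """Pack a 3x3 rotation and a translation into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t, dtype=float).ravel()
    return T

rng = np.random.default_rng(0)

# Ground-truth hand-eye transform X = T_cam2gripper (made-up values).
R_X, _ = cv2.Rodrigues(np.array([0.1, -0.2, 0.3]))
T_X = rt_to_T(R_X, [0.03, 0.00, 0.10])

# Fixed board pose in the robot base frame (also made up).
R_board, _ = cv2.Rodrigues(np.array([0.0, 0.0, 0.5]))
T_target2base = rt_to_T(R_board, [0.6, 0.1, 0.0])

R_gripper2base, t_gripper2base = [], []
R_target2cam, t_target2cam = [], []
for _ in range(15):
    # Random TCP pose (in practice: read from the robot driver / MoveIt).
    R_g, _ = cv2.Rodrigues(rng.uniform(-0.5, 0.5, 3))
    T_g2b = rt_to_T(R_g, rng.uniform(-0.2, 0.2, 3) + np.array([0.4, 0.0, 0.4]))
    # Board pose seen by the camera (in practice: solvePnP on the ArUco board).
    T_t2c = np.linalg.inv(T_X) @ np.linalg.inv(T_g2b) @ T_target2base
    R_gripper2base.append(T_g2b[:3, :3]); t_gripper2base.append(T_g2b[:3, 3].reshape(3, 1))
    R_target2cam.append(T_t2c[:3, :3]);   t_target2cam.append(T_t2c[:3, 3].reshape(3, 1))

# AX = XB solve; other methods to compare: TSAI, HORAUD, ANDREFF, DANIILIDIS.
R_c2g, t_c2g = cv2.calibrateHandEye(R_gripper2base, t_gripper2base,
                                    R_target2cam, t_target2cam,
                                    method=cv2.CALIB_HAND_EYE_PARK)
T_cam2gripper = rt_to_T(R_c2g, t_c2g)
print("recovered X (camera -> gripper):\n", T_cam2gripper)

# Using the result: a point detected in the camera frame maps to the base frame via
# p_base = T_gripper2base(current TCP) @ T_cam2gripper @ p_cam.
p_cam = np.array([0.0, 0.0, 0.5, 1.0])                  # hypothetical detection 0.5 m ahead
T_g2b_now = rt_to_T(np.eye(3), [0.4, 0.0, 0.4])          # hypothetical current TCP pose
p_base = T_g2b_now @ T_cam2gripper @ p_cam
print("point in base frame:", p_base[:3])
```

With real data, comparing the result of several solver methods on the same pose set, and checking that the recovered X reproduces the board pose consistently across all recorded poses, is one way to separate hand-eye error from robot kinematic or intrinsic-calibration error.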

Related

Lane tracking with a camera: how to get distance from camera to the lane?

I am doing my final-year project on lane tracking using a camera. The most challenging task now is measuring the distance between the camera (actually, the car that carries it) and the lane.
The lane is easily recognised (Hough line transform), but I have found no way to measure the distance to it.
There is a way to measure the distance to an object in front of the camera based on the pixel width of the object, but it does not work here because the nearest point of the line lies in the camera's blind spot.
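As a side note, the pixel-width method mentioned above is just the pinhole model: an object of real width W at distance Z projects to w = f * W / Z pixels, so Z = f * W / w. A tiny sketch with made-up numbers (the focal length would normally come from calibration):

```python
# Pinhole model: an object of real width W at distance Z projects to
# w = f * W / Z pixels, so the distance is Z = f * W / w.
focal_length_px = 700.0   # assumed focal length in pixels (from calibration)
real_width_m = 1.8        # assumed real width of the object, e.g. a car, in metres
pixel_width = 120.0       # measured width of the object in the image, in pixels

distance_m = focal_length_px * real_width_m / pixel_width
print(f"estimated distance: {distance_m:.2f} m")   # 10.50 m with these numbers
```

As the question points out, this breaks down for the lane itself, because the nearest part of the lane marking is not visible and its apparent width changes with perspective.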
What you want is to directly infer the depth map with a monocular camera.
You can refer to my answer here:
https://stackoverflow.com/a/64687551/11530294
Usually, we need photometric measurements from a different position in the world to form a geometric understanding of the world (a.k.a. a depth map). From a single image it is not possible to measure the geometry directly, but it is possible to infer depth from prior knowledge.
One way to make a single image work is to use a deep-learning-based method to infer depth directly. These approaches are usually all based on Python, so if you are only familiar with Python, this is the approach you should go for. If the image is small enough, I think real-time performance is possible. There are many works of this kind using Caffe, TF, Torch, etc.; you can search on GitHub for more options. The one I posted here is what I used recently.
reference:
Godard, Clément, et al. "Digging into self-supervised monocular depth estimation." Proceedings of the IEEE international conference on computer vision. 2019.
Source code: https://github.com/nianticlabs/monodepth2
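monodepth2 ships its own test_simple.py script for single-image inference. As a rough illustration of the same idea (one RGB frame in, a relative depth map out), here is a sketch using the MiDaS model from torch.hub instead, which is a different but similarly packaged monocular depth network; the file name is a placeholder and the output is relative (inverse) depth, not metric distance:

```python
import cv2
import torch

# Load a small pretrained monocular depth model and its matching preprocessing.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas.to(device).eval()

img = cv2.imread("frame.png")                 # placeholder: one frame from the car camera
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img).to(device))
    # Resize the network output back to the original image resolution.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False).squeeze()

depth = prediction.cpu().numpy()  # relative inverse depth; needs scaling for metric use
```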
The other way is to use large-FOV video with a single-camera (monocular) SLAM. This approach has various constraints, such as needing good features, a large FOV, slow motion, etc. You can find many works of this kind, such as DTAM, LSD-SLAM, DSO, etc. There are also a couple of packages from HKUST and ETH that do the mapping given the pose (e.g. if you have GPS/compass); some of the well-known ones are REMODE+SVO, open_quadtree_mapping, etc.
One typical example of single-camera SLAM is LSD-SLAM. It is a real-time SLAM.
It is implemented in C++ on ROS, and I remember it publishes the depth image. You can write a Python node to subscribe to the depth directly, or to the globally optimised point cloud, and project it into a depth map from any viewing angle.
reference: Engel, Jakob, Thomas Schöps, and Daniel Cremers. "LSD-SLAM: Large-scale direct monocular SLAM." European conference on computer vision. Springer, Cham, 2014.
source code: https://github.com/tum-vision/lsd_slam
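Subscribing to a depth image from Python is a small rospy node along these lines; the topic name below is a placeholder (check rostopic list for what your SLAM node actually publishes; LSD-SLAM uses its own keyframe messages, so a conversion step may be needed):

```python
#!/usr/bin/env python
# Minimal rospy node that subscribes to a depth image and reads values out of it.
import rospy
import numpy as np
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def depth_callback(msg):
    # "passthrough" keeps the original encoding (e.g. 32FC1 metres or 16UC1 millimetres).
    depth = np.asarray(bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough"),
                       dtype=np.float32)
    h, w = depth.shape[:2]
    rospy.loginfo("depth at image centre: %.3f", float(depth[h // 2, w // 2]))

if __name__ == "__main__":
    rospy.init_node("depth_listener")
    rospy.Subscriber("/camera/depth/image", Image, depth_callback, queue_size=1)  # placeholder topic
    rospy.spin()
```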

Calibration parameters for FLIR Blackfly cameras?

I am currently doing a project with a FLIR Blackfly camera (BFS-U3-120S4C, to be exact) and have not been getting a low enough reprojection error in my calibration experiments.
I know that manufacturers often make the intrinsic parameters and distortion coefficients available, and since FLIR provides the Spinnaker Python API to operate their cameras and advertises them as machine-vision cameras, I strongly suspected that this would be the case for them.
However, I have not been successful in finding this in my Google searches. Does anyone know whether FLIR has published the values of these parameters somewhere?
Thanks for the help!
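For reference, when per-unit intrinsics are not published, the usual route is a checkerboard calibration with OpenCV, which also reports the RMS reprojection error mentioned above. A minimal sketch, in which the board geometry, square size, and image path are assumptions to adapt:

```python
import glob
import numpy as np
import cv2

# Assumed checkerboard geometry: 9x6 inner corners, 25 mm squares.
pattern_size = (9, 6)
square_size = 0.025  # metres

# Template of 3D corner positions on the planar board (z = 0).
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
image_size = None
for path in glob.glob("calib_images/*.png"):   # placeholder path to captured board images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if not found:
        continue
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    obj_points.append(objp)
    img_points.append(corners)

# rms is the overall RMS reprojection error in pixels.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print("RMS reprojection error:", rms)
print("camera matrix:\n", K)
print("distortion coefficients:", dist.ravel())
```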

Kinect robotic arm detection

Can I use a Kinect sensor to detect the motion of a robotic arm (KUKA LBR iiwa 7R800) and calculate its link angles in order to make it control another robotic arm?
Of course this is possible, but I don't think it is a good idea.
You would get suboptimal accuracy and lag due to processing of the 3D data, and I guess there are also cases where you cannot see all joints/links.
KUKA robots can output their joint angles directly. Use this data for synchronisation, or control both robots using the same external data.
Any measurement error might cause unwanted movements, which in the case of industrial robots can cause severe damage!
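If the arm is driven through ROS, reading those joint angles usually amounts to subscribing to sensor_msgs/JointState; a minimal sketch, where the /joint_states topic name is an assumption that depends on the driver (it may be namespaced):

```python
#!/usr/bin/env python
# Read the KUKA's joint angles from ROS instead of estimating them visually.
import rospy
from sensor_msgs.msg import JointState

def joints_callback(msg):
    angles = dict(zip(msg.name, msg.position))   # joint name -> angle in radians
    rospy.loginfo("current joint angles: %s", angles)
    # ...forward these angles as setpoints to the second arm's controller...

if __name__ == "__main__":
    rospy.init_node("joint_mirror")
    rospy.Subscriber("/joint_states", JointState, joints_callback, queue_size=1)
    rospy.spin()
```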

Generic web camera calibration

I am building a website that does cool things using computer vision techniques, with videos recorded live and uploaded by users from their webcams. For this, I need the camera intrinsic and distortion parameters. I am trying to figure out the best way to compute these from the user-uploaded videos. We can make no assumptions about what videos users might upload, but a reasonable assumption is that a human might be present in the video. I am still in the initial stages of this, but I am interested in knowing how others have solved this problem.
To be specific, below are the questions on which I would appreciate comments from someone experienced in the group:
What algorithms, libraries and techniques are available to extract intrinsic and distortion parameters of any generic webcam available in the market? [I say "extract" and not "calibrate" to include cases where intrinsic parameters are just a method call away with no calibration necessary].
In general, how much variance have you observed in the intrinsic and distortion parameters of the webcams available on the market? Did you approximate them with a single set of intrinsic and distortion parameters, or what approach did you follow?
What camera self-calibration methods, if any, could be employed in these scenarios? Are there any opensource or commercial libraries available which might be of some help?
If we aim to calibrate the webcams using the videos users record and upload, what assumptions about the parameters (like fx == fy, or no distortion parameters) make sense and sound reasonable to you?
Would a single reasonable approximation of the intrinsic and distortion parameters for all cameras make sense? What would be a reasonable approach to validating how good particular intrinsic and distortion parameters are for a specific webcam?
Are there any other issues that need to be considered?
Sometimes I am the one who brings the bad news :) and so it is now.
For almost all of your points, the clear answer is no, none, not, and so on. Only for the last point, about the other issues, the answer is not a no but a long list :).
Actually, camera calibration without a chessboard and some specific constraints is almost impossible.
The closest implementation to a no-assumptions calibration is found in the stitching module in OpenCV. However, it is not perfect, and it does not work on random videos. Give it a try.
There is the famous Camera Calibration Toolbox, a good MATLAB implementation for extracting intrinsic and extrinsic parameters.
There is variance not only amongst webcams, but also across:
Different modules
Different zoom levels (which affect the optics)
I think this is a really hard problem if you restrict yourself to making no assumptions about the video. Both the calibration and the evaluation are hard if you don't use something known, such as the checkerboard in the Camera Calibration Toolbox.
Many algorithms, including the one currently used in OpenCV, require that known points can be detected (e.g. the corners of a chessboard). You would have to require your users to take pictures of these known patterns, which ruins the concept of random videos. I don't have a solution to this, but you might want to consider requiring users to record videos of structured scenes (no specific patterns or objects) and use the algorithm described in:
"Camera calibration with lens distortion from low-rank textures"
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5995548&tag=1
Haven't tried it myself though.
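On the narrower question of which assumptions are reasonable (fx == fy, no distortion): a common fallback when no calibration is possible is to build an approximate camera matrix from the image size and a guessed horizontal field of view. A sketch, where the 60-degree FOV is an assumed typical webcam value rather than anything measured:

```python
import numpy as np

def approximate_camera_matrix(width, height, hfov_deg=60.0):
    """Rough pinhole intrinsics from image size and an assumed horizontal FOV.

    Assumes square pixels (fx == fy), principal point at the image centre,
    and zero lens distortion -- exactly the simplifications discussed above.
    """
    fx = (width / 2.0) / np.tan(np.deg2rad(hfov_deg) / 2.0)
    fy = fx
    cx, cy = width / 2.0, height / 2.0
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    dist = np.zeros(5)  # k1, k2, p1, p2, k3 all assumed zero
    return K, dist

K, dist = approximate_camera_matrix(1280, 720)
print(K)
```

One way to check how far off such a guess is for a particular webcam is to calibrate a few units properly with the checkerboard tools mentioned above and compare.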

Stereovision algorithms

For my project, which is supposed to segment the closest hand region from the camera, I initially tried OpenCV's stereo vision example. However, the disparity map looks very bad and is useless for me.
Is there any other method that is better than the OpenCV implementation and has some example output (image/video)? Because my time is limited, I must choose one better algorithm and implement it.
Thank you.
OpenCV implements a number of stereo block matching algorithms, some of them pretty cutting edge.
Disparity maps always look bad except in very simple circumstances; the first step is to try to improve the source images, the lighting, and the background.
If it were easy then everybody would be doing it and there would be no market for expensive 3D laser scanners.
Try the different block matching algorithms provided by OpenCV. The little bit of experimentation I've done so far seems to indicate that cv::StereoSGBM gives better disparity maps than cv::StereoBM, but is slower.
The performance of the block matching algorithms will depend on the parameters they are initialized with. Have a look at the stereo examples again here; notice lines 195-222, where the algorithms are initialized.
I also suggest you use a basic GUI (OpenCV's highgui, for example) to manipulate these parameters in real time when fine-tuning the algorithm.
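To make the comparison concrete, here is a minimal sketch computing a disparity map with both matchers through the Python API (the Python equivalents of the cv::StereoSGBM and cv::StereoBM classes named above); the image paths and parameter values are placeholders to be tuned, e.g. with highgui trackbars as suggested:

```python
import cv2

# Rectified left/right pair (placeholder filenames); block matching assumes the
# images are already rectified so that corresponding points lie on the same row.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching: usually smoother disparity, slower.
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,       # must be a multiple of 16
    blockSize=5,
    P1=8 * 1 * 5 * 5,        # smoothness penalties; 8/32 * channels * blockSize^2 is a common start
    P2=32 * 1 * 5 * 5,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2)

# Plain block matching: faster, noisier.
bm = cv2.StereoBM_create(numDisparities=64, blockSize=15)

# compute() returns fixed-point disparities scaled by 16.
disp_sgbm = sgbm.compute(left, right).astype("float32") / 16.0
disp_bm = bm.compute(left, right).astype("float32") / 16.0

cv2.imshow("SGBM disparity", disp_sgbm / 64.0)   # rough normalisation for display
cv2.imshow("BM disparity", disp_bm / 64.0)
cv2.waitKey(0)
```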
