Distance travelled by a robot using Optical Flow - localization

Is there a way to find out the distance travelled by a robot using optical flow? For example, using OpenCV I'm able to find out the velocity of each pixel between 2 images taken by a camera. However, I don't know where to go with this to find out the corresponding distance travelled by the robot. Can you suggest a way to do this?
My main aim is to do the localization of the robot and for that I need the distance travelled by it between 2 instances.

No, not directly. You can determine the distance to objects and then back-calculate the distance travelled from there, but it will likely be computationally expensive.
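For illustration only, here is a minimal Python/OpenCV sketch of that back-calculation under a strong simplifying assumption: the camera looks straight down at a flat floor from a known height, so the distance to the observed "objects" (the floor) is fixed, and the mean pixel flow can be converted to a metric displacement via the pinhole model. The file names, focal length and height are placeholders.

    import cv2
    import numpy as np

    fx, fy = 700.0, 700.0     # focal length in pixels (from calibration) - placeholder
    Z = 0.30                  # camera height above the floor in metres - placeholder

    # Two consecutive frames from the downward-facing camera (placeholder files)
    prev = cv2.imread("frame_prev.png", cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread("frame_curr.png", cv2.IMREAD_GRAYSCALE)

    # Dense optical flow: the per-pixel velocities the question already computes
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    # Mean pixel displacement converted to metres on the floor plane
    dx_px, dy_px = flow[..., 0].mean(), flow[..., 1].mean()
    dX, dY = dx_px * Z / fx, dy_px * Z / fy
    print("approx. displacement this step (m):", np.hypot(dX, dY))

Accumulating these per-step displacements drifts quickly, which is why the SLAM-based approach below is usually preferred for localization.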

What you are looking for is a SLAM (Simultaneous Localisation and Mapping) method, which is also commonly used together with feature-matching methods such as SIFT, SURF, or FAST.
Read e.g. "Quantitative Evaluation of Feature Extractors for Visual SLAM" for more information.

Related

Measure real distance between two points using iOS Depth camera

Right now I'm exploring the features of the iOS Depth camera, and I want to obtain the real-world distance between two points (for example, between two eyes).
I have successfully set up the iOS Depth camera functionality and I have AVDepthData in my hands, but I'm not quite sure how I can get the real-world distance between two specific points.
I believe I could calculate it if I had the depth and the viewing angle, but I don't see the latter exposed as a parameter. I also know that this task could be handled with ARKit, but I'm really curious how I can implement it myself. I mean, ARKit uses the depth camera as well, so there must be an algorithm where a depth map is all I need to calculate the real distance.
Could you please give me an advice how to tackle this task? Thanks in advance!
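For what it's worth, the geometry the asker alludes to (depth plus intrinsics instead of a "viewing angle") reduces to back-projecting each pixel into camera space and taking the Euclidean distance between the two resulting 3D points. Below is a language-agnostic sketch in Python with hypothetical values; on iOS the per-pixel depth would come from AVDepthData and the intrinsics from the associated camera calibration data.

    import numpy as np

    def backproject(u, v, depth, fx, fy, cx, cy):
        """Back-project pixel (u, v) with depth in metres into camera coordinates."""
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        return np.array([x, y, depth])

    # Hypothetical values: two eye pixels in a 640x480 depth map, their depths,
    # and the depth camera's intrinsics (fx, fy, cx, cy).
    fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
    p_left  = backproject(280, 220, 0.45, fx, fy, cx, cy)
    p_right = backproject(360, 222, 0.46, fx, fy, cx, cy)

    print("real-world distance (m):", np.linalg.norm(p_left - p_right))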

How to improve accuracy of camera extrinsics calibration

I have a multi-camera system where the field of views are mostly non-overlapping. I have been researching on methods to calibrate the camera extrinsics and the first thing I'm going to try is to take a picture of a chessboard at a known location and use solvePnP from OpenCV to find the extrinsic rotation and translation vectors for each camera separately (following the method described in the answer here).
My problem is that this method uses only one measurement and, like every measurement, it is prone to errors. I assume that by taking multiple measurements, either by changing the position or the orientation of the chessboard, the accuracy can be improved. But what would be the best way to combine the rotation and translation obtained from the different measurements? A simple average?
In theory I would think that an option could be using solvePnP on all the points at the same time. Since I am calculating extrinsics, the camera can't be moved, so I would have to change the position and/or orientation of the board for each picture and measure the 3D point positions as accurately as possible each time.
I'm also wondering if using two chessboards in the same picture would be a possible solution, even if OpenCV doesn't seem to support multiple chessboard detection.
Is there a better way to measure extrinsics or anything that I'm missing?
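To make the "solvePnP on all the points at the same time" idea from the question concrete, here is a hedged sketch: the board corners from each placement are expressed in one common world frame, the 3D-2D correspondences are stacked, and a single solvePnPRansac call estimates the extrinsics. The board placements, intrinsics, and image points are synthesised here purely to keep the sketch self-contained; in practice the image points come from cv2.findChessboardCorners.

    import cv2
    import numpy as np

    K = np.array([[800.0, 0.0, 640.0],
                  [0.0, 800.0, 360.0],
                  [0.0,   0.0,   1.0]])          # intrinsics, assumed known
    dist = np.zeros(5)                           # distortion coefficients

    def board_corners(origin, square=0.05, rows=6, cols=9):
        """3D corners of one chessboard placement, measured in the world frame."""
        grid = np.array([[c * square, r * square, 0.0]
                         for r in range(rows) for c in range(cols)], np.float32)
        return grid + np.asarray(origin, np.float32)

    # Several board placements at known (measured) world positions - illustrative values
    placements = [board_corners((0.0, 0.0, 2.0)),
                  board_corners((0.5, 0.2, 2.5)),
                  board_corners((-0.4, 0.1, 3.0))]

    # Synthesised detections from a ground-truth pose, standing in for
    # cv2.findChessboardCorners output on real images.
    rvec_gt, tvec_gt = np.array([0.1, -0.2, 0.05]), np.array([0.1, 0.0, 0.3])
    image_points = [cv2.projectPoints(p, rvec_gt, tvec_gt, K, dist)[0].reshape(-1, 2)
                    for p in placements]

    # One PnP solve over the stacked correspondences from all placements
    obj = np.vstack(placements).astype(np.float32)
    img = np.vstack(image_points).astype(np.float32)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img, K, dist)
    print("estimated extrinsics:", rvec.ravel(), tvec.ravel())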

How to do grid-based (dense) optical flow on a masked image?

I am trying to track multiple people using a video camera. I do not want to use blob segmentation techniques.
What I want to do:
Perform background subtraction to obtain a mask isolating the people's motion.
Perform grid-based optical flow on those areas.
What would be my best bet?
I am struggling to implement this. I have tried blob detection and also some sparse optical flow examples, but sparse didn't really work for me because I wasn't getting enough feature points from goodFeaturesToTrack(). I would like to end up with at least 20 trackable points per person, which is why I think a grid-based method would suit me better. I will use the motion vectors obtained to classify different people (clustering on magnitude and direction, possibly?).
I am using opencv3 with Python 3.5 - but am still quite noobish in this field.
Would appreciate some guidance immensely!
For sparse optical flow (in OpenCV, the pyramidal Lucas-Kanade method) you do not strictly need goodFeaturesToTrack() to get the positions.
The calcOpticalFlowPyrLK function lets you estimate the motion at predefined positions, and you can supply those positions yourself.
So just initialize a grid of cv::Point2f yourself, e.g. create a list of points set to the grid positions that fall on your blobs, and run calcOpticalFlowPyrLK().
The idea behind goodFeaturesToTrack() is that it gives you the points where the calcOpticalFlowPyrLK() result is most likely to be accurate, which is at image locations with edge-like structures. But in my experience this does not always give the optimal feature point set; I prefer to use regular grids as feature point sets.
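A minimal Python sketch of this suggestion (the frame and mask file names are placeholders; the mask would come from your background-subtraction step):

    import cv2
    import numpy as np

    # Placeholder inputs: two consecutive grayscale frames and the foreground mask
    prev = cv2.imread("frame_prev.png", cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread("frame_curr.png", cv2.IMREAD_GRAYSCALE)
    mask = cv2.imread("foreground_mask.png", cv2.IMREAD_GRAYSCALE)  # 0 = background

    # Regular grid of candidate points, one every `step` pixels
    step = 8
    ys, xs = np.mgrid[0:prev.shape[0]:step, 0:prev.shape[1]:step]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)

    # Keep only the grid points that fall on the foreground mask
    on_fg = mask[grid[:, 1].astype(int), grid[:, 0].astype(int)] > 0
    pts = grid[on_fg].reshape(-1, 1, 2)

    # Track the grid points instead of goodFeaturesToTrack() output
    nxt, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None,
                                                winSize=(21, 21), maxLevel=3)

    # Motion vectors of successfully tracked points, ready for clustering
    flow = (nxt - pts).reshape(-1, 2)[status.ravel() == 1]
    print("tracked", len(flow), "grid points, mean motion:", flow.mean(axis=0))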

Relative camera pose estimation: obtaining metric translation using noisy estimate

I am currently working on the pose estimation of one camera with respect to another using OpenCV, in a setup where camera1 is fixed and camera2 is free to move. I know the intrinsics of both cameras. I have a pose estimation module that uses epipolar geometry and computes the essential matrix with the five-point algorithm to figure out the R and t of camera2 with respect to camera1, but I would like to get the metric translation. To help achieve this, I have two GPS modules, one on camera1 and one on camera2. For now, if we assume camera1's GPS is flawless and accurate while camera2's GPS exhibits some XY noise, I would need a way to use the OpenCV pose estimate on top of this noisy GPS to get the final, accurate translation.
Given that info, my question has two parts:
Because the extrinsics between the cameras keep changing, would it be possible to use bundle adjustment to refine my pose?
And can I somehow incorporate my (noisy) GPS measurements in a bundle adjustment framework as an initial estimate, and obtain a more accurate estimate of metric translation as my end result?
1) No, bundle adjustment serves a different purpose, and you would not be able to work with it anyway because every pair you use with the five-point algorithm has an unknown scale. You should instead use a perspective-n-point algorithm after the first pair of images.
2) Yes, this is called sensor fusion; you first need to calibrate (or know) the transformation between your GPS sensor coordinates and your camera coordinates. There is an open-source framework you can use.
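As a rough sketch of the simplest form of that fusion (not the framework mentioned above): recoverPose() returns a translation only up to scale, so the noisy GPS baseline between the two cameras can at least fix its magnitude. The function below assumes matched image points and, as a simplification, a single shared intrinsic matrix.

    import cv2
    import numpy as np

    def metric_translation(pts1, pts2, K, gps1, gps2):
        """Pose of camera2 w.r.t. camera1, with translation scaled by the GPS baseline.

        pts1, pts2 : (N, 2) matched image points in camera1 / camera2
        K          : (3, 3) intrinsic matrix (simplification: both cameras share K)
        gps1, gps2 : camera positions in a local metric frame (e.g. ENU); gps2 is noisy
        """
        E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
        _, R, t_unit, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

        # recoverPose() gives t only up to scale (unit norm); the GPS baseline
        # between the two cameras supplies the missing metric scale.
        scale = np.linalg.norm(np.asarray(gps2, float) - np.asarray(gps1, float))
        return R, scale * t_unit

A Kalman-filter or factor-graph formulation would then refine this estimate over time instead of trusting a single noisy baseline.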

Finding real world distance using an image

I am working on a project where I need to figure out real world distance of an object with reference to a fixed point using an image.
I have detected the object in a 2D image using SURF, so the object is now inside a bounding box, which gives me the position of its centroid. How can I use this to find out the real-world distance?
If I plan to incorporate stereo vision, triangulating the centroid of the object, what is the difference between the distance I obtain here and in the previous method?
On a single image, probably the best starting point for learning about estimating metric properties is Antonio Criminisi's work (from 2000, but still very relevant): http://www.cs.illinois.edu/~dhoiem/courses/vision_spring10/sources/criminisi00.pdf
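Regarding the stereo option mentioned in the question: with a calibrated stereo pair, the centroid seen in both images can be triangulated to a metric 3D point, and the distance to the fixed reference point is then a plain Euclidean norm. A sketch with illustrative projection matrices and pixel coordinates:

    import cv2
    import numpy as np

    # Illustrative stereo setup: shared intrinsics K and a 10 cm baseline along x
    K = np.array([[700.0, 0.0, 320.0],
                  [0.0, 700.0, 240.0],
                  [0.0,   0.0,   1.0]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

    # Centroid of the detected object in the left and right images (pixels)
    c_left  = np.array([[350.0], [260.0]])
    c_right = np.array([[310.0], [260.0]])

    # Triangulate to a homogeneous 3D point and dehomogenise
    X_h = cv2.triangulatePoints(P1, P2, c_left, c_right)
    X = (X_h[:3] / X_h[3]).ravel()                      # metric centroid in camera1 frame

    ref = np.array([0.0, 0.0, 0.0])                     # the fixed reference point
    print("centroid:", X, "distance to reference (m):", np.linalg.norm(X - ref))

The main difference from the single-image method is that stereo triangulation recovers depth directly from the baseline, whereas the single-image approach must rely on scene constraints (known planes, reference lengths, vanishing geometry) to resolve the scale.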
