Relative camera pose estimation: obtaining metric translation using noisy estimate - opencv

I am currently working on estimating the pose of one camera with respect to another using OpenCV, in a setup where camera1 is fixed and camera2 is free to move. I know the intrinsics of both cameras. My pose estimation module uses epipolar geometry, computing the essential matrix with the five-point algorithm to obtain the R and t of camera2 with respect to camera1, but I would like to get the metric translation. To help with this, I have two GPS modules, one on camera1 and one on camera2. For now, assume camera1's GPS is flawless and accurate while camera2's GPS exhibits some XY noise; I need a way to combine the OpenCV pose estimate with this noisy GPS to get an accurate final translation.
Given that info, my question has two parts:
Because the extrinsics between the cameras keep changing, would it be possible to use bundle adjustment to refine my pose?
And can I somehow incorporate my (noisy) GPS measurements in a bundle adjustment framework as an initial estimate, and obtain a more accurate estimate of metric translation as my end result?

1) No. Bundle adjustment serves a different purpose, and you could not use it here anyway, because every pair you process with the five-point algorithm comes with its own unknown scale. Instead, use a perspective-n-point (PnP) algorithm after the first pair of images (sketched below).
2) Yes, this is called sensor fusion. You first need to calibrate (or know) the transformation between your GPS sensor coordinates and your camera coordinates. There is an open-source framework you can use for this.
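A minimal OpenCV sketch of what the two points above amount to in practice. The names are placeholders: pts1/pts2 are matched pixel coordinates between camera1 and camera2, K is the intrinsic matrix (using one shared K is a simplification; with different intrinsics, normalize each camera's points with its own matrix first), and gps_baseline_m is the camera1-to-camera2 distance derived from the GPS fixes. The GPS-to-camera lever arm is ignored for brevity.

    import cv2
    import numpy as np

    def initial_metric_pose(pts1, pts2, K, gps_baseline_m):
        # Five-point algorithm + cheirality check: pose up to scale.
        E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                       prob=0.999, threshold=1.0)
        _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

        # t comes back with unit norm; fix its length with the GPS baseline.
        t_metric = t * gps_baseline_m

        # Triangulate metric landmarks so that later frames can use PnP
        # instead of another scale-less essential-matrix estimate.
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = K @ np.hstack([R, t_metric])
        X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
        X = (X_h[:3] / X_h[3]).T                 # Nx3, metric
        return R, t_metric, X

    def pose_from_landmarks(X, pts2_new, K):
        # PnP against the metric landmarks keeps the scale consistent as
        # camera2 keeps moving.
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            X.astype(np.float64), pts2_new.astype(np.float64), K, None)
        R, _ = cv2.Rodrigues(rvec)
        return R, tvec

The noisy camera2 GPS enters as the measurement that fixes gps_baseline_m; the better you know the GPS-to-camera transform, the better the metric scale you recover.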

Related

How to improve accuracy of camera extrinsics calibration

I have a multi-camera system where the fields of view are mostly non-overlapping. I have been researching methods to calibrate the camera extrinsics, and the first thing I'm going to try is to take a picture of a chessboard at a known location and use solvePnP from OpenCV to find the extrinsic rotation and translation vectors for each camera separately (following the method described in the answer here).
My problem is that this method uses only one measurement, and like every measurement it is prone to errors. I assume that by taking multiple measurements, either by changing the position or the orientation of the chessboard, the accuracy can be improved. But what would be the best way to combine the rotations and translations obtained from the different measurements? A simple average?
In theory, I would think an option could be to run solvePnP on all the points at the same time (see the sketch below). Since I am calculating extrinsics, the camera can't be moved, so I would have to change the position and/or orientation of the board for each picture and measure the 3D point positions as accurately as possible each time.
I'm also wondering if using two chessboards in the same picture would be a possible solution, even though OpenCV doesn't seem to support multiple chessboard detection.
Is there a better way to measure extrinsics or anything that I'm missing?
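One hedged way to realize the "solvePnP on all the points at the same time" idea from the question: stack the correspondences from every board placement into a single PnP solve. The names board_points_world, board_corners_px, K and dist below are placeholders; the board corners must all be expressed in one common world frame, which is the hard (manual measurement) part.

    import cv2
    import numpy as np

    def extrinsics_from_all_boards(board_points_world, board_corners_px, K, dist):
        # board_points_world[i]: Nx3 corner coordinates of placement i in the
        # shared world frame; board_corners_px[i]: matching Nx2 image corners.
        obj = np.vstack(board_points_world).astype(np.float64)
        img = np.vstack(board_corners_px).astype(np.float64)

        # One solvePnP over every placement; the extra measurements all
        # constrain the same fixed camera pose.
        ok, rvec, tvec = cv2.solvePnP(obj, img, K, dist,
                                      flags=cv2.SOLVEPNP_ITERATIVE)
        R, _ = cv2.Rodrigues(rvec)
        return R, tvec   # world -> camera

Whether this beats averaging separate per-board estimates depends mostly on how accurately the board placements are measured in the common frame.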

Finding the relative pose between two cameras with 2D and 3D correspondences

I have two images obtained by a calibrated camera from two different poses. I also have correspondences of 2D points between the images. Some of the points have depth information, so I also know their 3D coordinates. I want to calculate the relative pose between the images.
I know I can compute a fundamental matrix or an essential matrix from the 2D points. I also know PnP can find the pose from 2D-to-3D correspondences, and that it is also possible using only correspondences of 3D points. However, I don't know of any algorithm that takes advantage of all the available information. Is there one?
There is only one such algorithm: bundle adjustment; everything else is a hack. Get your initial estimates separately, use any "reasonable & simple" hacky way of merging them to get an initial estimate, then bite the bullet and bundle. If you are coding in C++, Google's Ceres is my recommended B.A. library.
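To make the advice concrete, here is a toy two-view bundle adjustment sketch in Python (using scipy rather than Ceres, purely for brevity). It refines the second camera's pose and the structure by minimizing reprojection error in both views; rvec0, tvec0 and X0 are assumed initial estimates from the separate 2D-2D and 2D-3D solutions, and the points with known depth supply X0.

    import cv2
    import numpy as np
    from scipy.optimize import least_squares

    def project(rvec, tvec, X, K):
        proj, _ = cv2.projectPoints(X, rvec, tvec, K, None)
        return proj.reshape(-1, 2)

    def residuals(params, n_pts, pts1, pts2, K):
        rvec = params[:3].reshape(3, 1)
        tvec = params[3:6].reshape(3, 1)
        X = params[6:].reshape(n_pts, 3)
        # Camera 1 is fixed at the origin; camera 2 carries the pose being refined.
        r1 = project(np.zeros((3, 1)), np.zeros((3, 1)), X, K) - pts1
        r2 = project(rvec, tvec, X, K) - pts2
        return np.concatenate([r1.ravel(), r2.ravel()])

    def bundle_adjust(rvec0, tvec0, X0, pts1, pts2, K):
        x0 = np.hstack([rvec0.ravel(), tvec0.ravel(), X0.ravel()])
        sol = least_squares(residuals, x0, args=(len(X0), pts1, pts2, K),
                            method="trf", loss="huber")
        rvec = sol.x[:3].reshape(3, 1)
        tvec = sol.x[3:6].reshape(3, 1)
        X = sol.x[6:].reshape(len(X0), 3)
        return rvec, tvec, X

In a real setup you would fix, or weight heavily, the points whose depth you actually trust rather than letting all of them float freely.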

Estimating real world translation between two cameras through feature matching

I have two calibrated cameras looking at an overlapping scene. I am trying to estimate the pose of camera2 with respect to camera1 (camera2 can be moving, but camera1 and camera2 will always share some overlapping features).
I am identifying features using SIFT, then computing the fundamental matrix and eventually the essential matrix. Once I solve for R and t (one of the four possible solutions), I obtain the translation only up to scale. Is it possible to compute the translation in real-world units? There are no objects of known size in the scene, but I do have the calibration data for both cameras. I've gone through some material on structure from motion and stereo pose estimation, but the concept of scale and its relation to real-world translation is confusing me.
Thanks!
This is the classical scale problem with structure from motion.
The short answer is that you must have some other source of information in order to resolve scale.
This information can be about points in the scene (e.g. a terrain map), or some sensor reading from the moving camera (IMU, GPS, etc.).
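A minimal illustration of how one extra piece of metric information fixes the scale. The names are placeholders: t_unit is the unit-length translation from recoverPose, X is the up-to-scale triangulated point cloud, and known_distance_m is the true distance between two identified scene points i and j (a GPS or IMU baseline would be used in exactly the same way).

    import numpy as np

    def apply_metric_scale(t_unit, X, i, j, known_distance_m):
        # Ratio between the known metric distance and the same distance in the
        # up-to-scale reconstruction gives the global scale factor.
        s = known_distance_m / np.linalg.norm(X[i] - X[j])
        return t_unit * s, X * s   # metric translation and metric structure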

Estimate pose of moving camera from a stationary camera, both looking at the same scene

Assume I have two independent cameras looking at the same scene (there are features visible from both) and that I know the calibration parameters of each camera individually (I can also perform stereo calibration at a certain baseline, but I don't know if that would be useful). One of the cameras is fixed and stable; the other is noisy in terms of its pose (translation and rotation). As the pose keeps changing over time, is it possible to accurately estimate the pose of the moving camera with respect to the stationary one using image data from both cameras (in OpenCV)?
I've been doing a little bit of reading, and this is what I've gathered so far:
Find features using SIFT and establish point correspondences.
Find the fundamental matrix.
Find the essential matrix and decompose it (via SVD) to obtain R and t between the cameras (see the sketch below).
Does this approach work on a frame-by-frame basis? And how does the setup help in getting the scale factor? Pointers and suggestions would be very helpful.
Thanks!
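For what it's worth, a bare-bones OpenCV sketch of the three steps listed above, assuming img1 and img2 are simultaneous grayscale frames from the fixed and moving cameras and (as a simplification) a single shared intrinsic matrix K.

    import cv2
    import numpy as np

    def relative_pose_up_to_scale(img1, img2, K):
        # Step 1: SIFT features and correspondences (Lowe's ratio test).
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(img1, None)
        kp2, des2 = sift.detectAndCompute(img2, None)
        matcher = cv2.BFMatcher()
        good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
                if m.distance < 0.75 * n.distance]
        pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
        pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

        # Steps 2-3: with K known, estimate the essential matrix directly;
        # recoverPose picks the physically valid (R, t) decomposition.
        E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
        return R, t   # t has unit norm: the scale must come from elsewhere

This can be run per frame pair, but, as noted in the related answers above, the translation stays up to scale unless some outside measurement (known baseline, GPS, known object) supplies the metric factor.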

Estimating pose of one camera given another with known baseline

I am a beginner when it comes to computer vision, so I apologize in advance. Basically, the idea I am trying to code is this: given two cameras that simulate a multiple-baseline stereo system, I am trying to estimate the pose of one camera given the other.
Looking at the same scene, I would incorporate some noise into the pose of the second camera; given the clean image from camera1 and the slightly distorted/skewed image from camera2, I would like to estimate the pose of camera2 from this data as well as the known baseline between the cameras. I have been reading up on homography matrices and the related implementation in OpenCV, but I am just trying to get some suggestions about possible approaches. Most of the applications of the homography matrix that I have seen deal with stitching or overlaying images, whereas here I am looking to recover the full six-degrees-of-freedom attitude of the camera.
It'd be great if someone could shed some light on these questions too: can such an approach be extended to more than two cameras? And is it possible for both cameras to have some 'noise' in their pose, and still recover the 6-dof attitude at every instant?
Let's clear up your question first. I guess you are looking for the pose of one camera relative to the other camera's location. A homography describes this only for pure camera rotation; for general motion that includes translation, the relationship is described by a rotation and a translation. If the fields of view of the cameras overlap, the task can be solved with structure from motion, which still estimates only 5 dof: the translation is recovered only up to scale. If there is a chessboard with known dimensions in the cameras' field of view, you can easily solve for 6 dof by running a PnP algorithm (see the sketch below). Of course, the cameras should be calibrated first. Finally, in 2008 Marc Pollefeys came up with an idea for estimating 6 dof from two moving cameras with non-overlapping fields of view without using any chessboards. To give you more detail, please say a bit more about the intended application.
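A short sketch of the chessboard + PnP route mentioned above. The pattern size (9x6) and square size are assumptions for illustration; each camera's board-to-camera pose comes from solvePnP, and the relative pose between the two cameras follows by composing the two results.

    import cv2
    import numpy as np

    def camera_pose_from_board(gray, K, dist, pattern=(9, 6), square_size=0.025):
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if not found:
            return None
        # Board corner coordinates in the board's own frame, in metres.
        objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
        objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)
        objp *= square_size
        ok, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
        R, _ = cv2.Rodrigues(rvec)
        return R, tvec                      # board -> camera

    def relative_pose_between_cameras(R1, t1, R2, t2):
        # Compose the two board->camera poses into camera1 -> camera2.
        R12 = R2 @ R1.T
        t12 = t2 - R12 @ t1
        return R12, t12

Because the board gives each camera a full metric 6-dof pose, the composed relative pose is metric as well, which is what distinguishes this route from the up-to-scale structure-from-motion one.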
