Estimating pose of one camera given another with known baseline - opencv

I am a beginner when it comes to computer vision, so I apologize in advance. Basically, the idea I am trying to code is this: given two cameras that simulate a multiple-baseline stereo system, I want to estimate the pose of one camera given the other.
Looking at the same scene, I would add some noise to the pose of the second camera, and given the clean image from camera 1 and the slightly distorted/skewed image from camera 2, I would like to estimate the pose of camera 2 from this data as well as the known baseline between the cameras. I have been reading up on homography matrices and the related implementation in OpenCV, but I am just trying to get some suggestions about possible approaches. Most of the applications of the homography matrix that I have seen are about stitching or overlaying images, whereas here I am looking to recover a six-degrees-of-freedom attitude of the camera.
It'd be great if someone could shed some light on these questions too: can an approach used for this be extended to more than two cameras? And is it also possible for both cameras to have some 'noise' in their pose and still recover the 6 DoF attitude at every instant?

Let's clear up your question first. I guess you are looking for the pose of one camera relative to the other camera's location. This is described by a homography only for pure camera rotation. For general motion that includes translation, it is described by a rotation matrix and a translation vector. If the fields of view of the cameras overlap, the task can be solved with structure from motion, which still estimates only 5 DoF; that is, the translation is recovered only up to scale. If there is a chessboard with known dimensions in the cameras' field of view, you can easily solve for 6 DoF by running a PnP algorithm. Of course, the cameras should be calibrated first. Finally, in 2008 Marc Pollefeys came up with an idea for estimating 6 DoF from two moving cameras with non-overlapping fields of view without using any chessboards. To give you more detail, please tell us a bit about the intended application.
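For the chessboard + PnP route, a minimal sketch of what that could look like in OpenCV (assuming a 9x6 board with a known square size, intrinsics K and distortion d already estimated; img_cam2 is a placeholder for the image from the second camera):

```python
import cv2
import numpy as np

# Hypothetical setup: 9x6 inner corners, 25 mm squares, intrinsics K and distortion d known
pattern = (9, 6)
square = 0.025  # metres
obj_pts = np.zeros((pattern[0] * pattern[1], 3), np.float32)
obj_pts[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

gray = cv2.cvtColor(img_cam2, cv2.COLOR_BGR2GRAY)      # image from the noisy camera
found, corners = cv2.findChessboardCorners(gray, pattern)
if found:
    ok, rvec, tvec = cv2.solvePnP(obj_pts, corners, K, d)
    R, _ = cv2.Rodrigues(rvec)  # full 6 DoF pose of the board in this camera's frame
```

Doing the same for camera 1 and composing the two board poses then gives the relative 6 DoF pose between the cameras.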

Related

How to improve accuracy of camera extrinsics calibration

I have a multi-camera system where the field of views are mostly non-overlapping. I have been researching on methods to calibrate the camera extrinsics and the first thing I'm going to try is to take a picture of a chessboard at a known location and use solvePnP from OpenCV to find the extrinsic rotation and translation vectors for each camera separately (following the method described in the answer here).
My problem is that this method uses only one measurement, and, like every measurement, it is prone to errors. I assume that by taking multiple measurements, either by changing the position or the orientation of the chessboard, the accuracy can be improved. But what would be the best way to combine the rotation and translation obtained from the different measurements? A simple average?
In theory I would think that an option could be using solvePnP on all the points at the same time. Since I am calculating extrinsics, the camera can't be moved, so I would have to change the position and/or orientation of the board for each picture and measure the 3D point positions as accurately as possible each time.
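For what it's worth, this is roughly what I mean by running solvePnP on all the points at once (an untested sketch; all_obj, all_img, K, dist_coeffs are placeholders for my measured board corners and intrinsics):

```python
import cv2
import numpy as np

# all_obj: list of (N_i, 3) measured world coordinates of the board corners per placement
# all_img: list of (N_i, 2) detected corner pixels for the same placements
obj = np.vstack(all_obj).astype(np.float32)
img = np.vstack(all_img).astype(np.float32)

# One PnP over all placements at once, with RANSAC to reject bad corner detections
ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img, K, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)  # extrinsic rotation; tvec is the extrinsic translation
```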
I'm also wondering if using two chessboards in the same picture would be a possible solution, even if OpenCV doesn't seem to support multiple chessboard detection.
Is there a better way to measure extrinsics or anything that I'm missing?

How to calibrate 4 camera set around a circle?

Four cameras are arranged in a ring. How can I calibrate the relative poses of the four cameras, that is, the attitudes of the other three cameras relative to camera 0? The difficulties are:
When using a calibration board, the four cameras cannot all see the board at the same time; only two cameras can see it at once. For example, I calibrate cam1 relative to cam0, then cam2 relative to cam1, so cam2's pose relative to cam0 can only be computed indirectly, which introduces errors.
When calibrating only two cameras at a time, such as cam0 and cam1, the calibration board seen by both cameras is tilted and the change in its angle is small, which also causes errors.
Is there any better way to calibrate? Thank you.
There are many ways, and many papers have been published on this.
The simplest way is to calibrate two cameras at a time; each pair should have the largest possible common FOV (a sketch of chaining the pairwise results follows this list). But there are other methods as well.
You can use a structure-from-motion-based method: move the camera around and jointly optimize the camera poses. It was first published at CVPR sometime between 2010 and 2016; I forget the exact year, but it is about camera calibration with minimal or zero overlap.
You can add an IMU and use kalibr to calibrate them, anchoring all images to this IMU. https://github.com/ethz-asl/kalibr/wiki/camera-imu-calibration.
An alternative that I frequently use is the robotics hand-eye calibration approach used in VINS-Mono: https://github.com/HKUST-Aerial-Robotics/VINS-Mono. The VINS-Mono one requires no complicated pattern, just moving around.
For my paper, we used the sea-level vanishing line and vanishing point to calibrate cameras that cannot see the same chessboard pattern in the same view:
Han Wang, Wei Mou, Xiaozheng Mou, Shenghai Yuan, Soner Ulun, Shuai Yang, Bok-Suk Shin, "An Automatic Self-Calibration Approach for Wide Baseline Stereo Cameras Using Sea Surface Images", Unmanned Systems, Vol. 3, No. 4, pp. 277-290, 2015.
There are other options as well, such as using a Vicon or other tracking system. Just find one that you think is suitable for you and try it out.
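To make the pairwise option concrete, the pairwise results can be chained by composing the rigid transforms; a minimal sketch (R_01, t_01, etc. are hypothetical outputs of pairwise stereo calibration between adjacent cameras):

```python
import numpy as np

def compose(R_ab, t_ab, R_bc, t_bc):
    """Chain the transform a->b with b->c into a->c (points: x_b = R_ab @ x_a + t_ab)."""
    R_ac = R_bc @ R_ab
    t_ac = R_bc @ t_ab + t_bc
    return R_ac, t_ac

# R_01/t_01: cam0 -> cam1, R_12/t_12: cam1 -> cam2, R_23/t_23: cam2 -> cam3 (placeholders)
R_02, t_02 = compose(R_01, t_01, R_12, t_12)
R_03, t_03 = compose(R_02, t_02, R_23, t_23)
# The loop closure cam3 -> cam0 gives a consistency check; its residual can be
# distributed over the chain or refined with a joint bundle adjustment.
```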

Reconstruct 3D points from two images, given camera movement

I am trying to reconstruct the real-world coordinates of 3D points from two images taken from the same camera. The camera is not calibrated, but the movement (translation and rotation) is known. In short:
Requirement:
No calibration
Extra constraints other than image point correspondences:
Known camera translation and rotation
Same camera used in all views
I understand that, from image point correspondences alone, a scene can be reconstructed only up to a projective transformation. With more constraints, an affine or similarity reconstruction may be done. In my case, I need a similarity reconstruction.
Given the above constraints, is a similarity reconstruction possible? If possible, how should I go about doing it?
I have tried to attack the problem from a few angles. Since I am not mathematically fluent, I try to use OpenCV as much as possible.
Run findFundamentalMat() on the two images, hopefully extract the two camera matrices somehow, then triangulatePoints(). As you could have guessed, I got stuck in the middle, unable to obtain the camera matrices from the fundamental matrix.
The textbook "Multiple View Geometry in Computer Vision" (by Hartley and Zisserman) gives an expression (p.256, Result 9.14) that expresses the camera matrices in terms of fundamental matrix and one of the epipoles. However, without knowing the camera's intrinsic parameters (requirement: no calibration), I don't see how I can get the epipole.
I also tried to treat my problem as a stereo system and use OpenCV's stereo*** functions. But they all seem to require human intervention to calibrate, which violates my requirement.
So, that's why I ask the question here today. The key is still, given those extra constraints, is a similarity reconstruction possible? I am not smart enough to understand the wealth of knowledge out there, and not able to come up with my own solution. Any help is appreciated.
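For reference, the camera pair in Result 9.14 can be written down from F alone: the epipole e' is the left null vector of F, so no intrinsics are needed for that step, and the output is still only a projective reconstruction. A rough, untested sketch of that pipeline (pts1, pts2 are placeholder arrays of matched pixel coordinates):

```python
import cv2
import numpy as np

# pts1, pts2: Nx2 float arrays of matched pixel coordinates (placeholders)
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)

# The left epipole e' satisfies F^T e' = 0, i.e. it is the left null vector of F
_, _, Vt = np.linalg.svd(F.T)
e2 = Vt[-1]

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

# Canonical camera pair from Hartley & Zisserman, Result 9.14: P = [I | 0], P' = [[e']_x F | e']
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([skew(e2) @ F, e2.reshape(3, 1)])

# Triangulated points are homogeneous and defined only up to a projective transform
X_h = cv2.triangulatePoints(P1, P2, pts1.T.astype(float), pts2.T.astype(float))
X = (X_h[:3] / X_h[3]).T
```

Upgrading this to a similarity reconstruction is exactly where the extra constraints (the known rotation and translation) would have to come in, which is the question above.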

Estimate pose of moving camera from a stationary camera, both looking at the same scene

Assume I have two independent cameras looking at the same scene (there are features visible from both) and that I know the calibration parameters of both cameras individually (I can also perform stereo calibration at a certain baseline, but I don't know if that would be useful). One of the cameras is fixed and stable; the other is noisy in terms of its pose (translation and rotation). As the pose keeps changing over time, is it possible to accurately estimate the pose of the moving camera with respect to the stationary one using image data from both cameras (in OpenCV)?
I've been doing a little bit of reading, and this is what I've gathered so far:
Find features using SIFT and establish point correspondences.
Find the fundamental matrix.
Find the essential matrix and perform SVD to obtain R and t between the cameras.
Does this approach work on a frame-by-frame basis? And how does the setup help in getting the scale factor? Pointers and suggestions would be very helpful.
Thanks!
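A rough per-frame sketch of steps 1-3 with OpenCV (img_fixed, img_moving, K1 are placeholder names; note that recoverPose returns t only up to scale, which is where a known baseline would come in):

```python
import cv2
import numpy as np

# img_fixed, img_moving: current frames from the two cameras; K1: intrinsic matrix (placeholders)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img_fixed, None)
kp2, des2 = sift.detectAndCompute(img_moving, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# Essential matrix and relative pose (for simplicity this assumes both cameras share K1)
E, mask = cv2.findEssentialMat(pts1, pts2, K1, method=cv2.RANSAC, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K1)  # t is a unit vector: scale is unknown
```

Multiplying t by the known baseline length would then fix the scale for that frame.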

How to compute the rotation and translation between 2 cameras?

I am aware of the chessboard camera calibration technique, and have implemented it.
If I have 2 cameras viewing the same scene, and I calibrate both simultaneously with the chessboard technique, can I compute the rotation matrix and translation vector between them? How?
If you have the 3D camera coordinates of the corresponding points, you can compute the optimal rotation matrix and translation vector by a rigid-body transformation.
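For reference, that least-squares rigid fit is typically done with the Kabsch/Umeyama algorithm; a minimal sketch (not from the original answer):

```python
import numpy as np

def rigid_transform(A, B):
    """Least-squares R, t such that b_i ≈ R @ a_i + t, for Nx3 point sets A and B (Kabsch)."""
    cA, cB = A.mean(axis=0), B.mean(axis=0)
    H = (A - cA).T @ (B - cB)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cB - R @ cA
    return R, t
```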
If you are using OpenCV already, then why don't you use cv::stereoCalibrate?
It returns the rotation and translation matrices. The only thing you have to do is to make sure that the calibration chessboard is seen by both of the cameras.
The exact way is shown in the .cpp samples provided with the OpenCV library (I have version 2.2 and the samples were installed by default in /usr/local/share/opencv/samples).
The code example is called stereo_calib.cpp. Although it's not explained clearly what they are doing there (for that you might want to look at "Learning OpenCV"), it's something you can build on.
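A rough Python equivalent of the relevant call (objpoints, imgpoints1, imgpoints2 are placeholders for chessboard corners detected in both views; intrinsics are assumed to have been calibrated already and are kept fixed):

```python
import cv2

# objpoints: list of (N, 3) board-frame corner coordinates, one entry per image pair
# imgpoints1, imgpoints2: lists of the corresponding detected corners in camera 1 and 2
# K1, d1, K2, d2: per-camera intrinsics from earlier calibration; image_size = (width, height)
ret, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    objpoints, imgpoints1, imgpoints2,
    K1, d1, K2, d2, image_size,
    flags=cv2.CALIB_FIX_INTRINSIC)
# R and T map points from camera 1's coordinate frame into camera 2's
```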
If I understood you correctly, you have two calibrated cameras observing a common scene, and you wish to recover their spatial arrangement. This is possible (provided you find enough image correspondences) but only up to an unknown factor on translation scale. That is, we can recover rotation (3 degrees of freedom, DOF) and only the direction of the translation (2 DOF). This is because we have no way to tell whether the projected scene is big and the cameras are far, or the scene is small and cameras are near. In the literature, the 5 DOF arrangement is termed relative pose or relative orientation (Google is your friend).
If your measurements are accurate and in general position, 6 point correspondences may be enough for recovering a unique solution. A relatively recent algorithm does exactly that.
Nister, D., "An efficient solution to the five-point relative pose problem," Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.26, no.6, pp.756,770, June 2004
doi: 10.1109/TPAMI.2004.17
Update:
Use a structure from motion/bundle adjustment package like Bundler to solve simultaneously for the 3D location of the scene and relative camera parameters.
Any such package requires several inputs:
camera calibrations that you have.
2D pixel locations of points of interest in each camera (use an interest point detector like Harris or DoG, the first part of SIFT).
Correspondences between points of interest from each camera (use a descriptor like SIFT, SURF, SSD, etc. to do the matching).
Note that the solution is up to a certain scale ambiguity. You'll thus need to supply a distance measurement either between the cameras or between a pair of objects in the scene.
Original answer (applies primarily to uncalibrated cameras as the comments kindly point out):
This camera calibration toolbox from Caltech contains the ability to solve and visualize both the intrinsics (lens parameters, etc.) and the extrinsics (how the camera was positioned when each photo was taken). The latter is what you're interested in.
The Hartley and Zisserman blue book is also a great reference. In particular, you may want to look at the chapter on epipolar lines and fundamental matrix which is free online at the link.
