Estimate motion of monocular camera - opencv

I have read lectures and related topics and have been working on this for weeks, but I can't find a way to describe the motion of my camera. I don't want to reconstruct the 3D world. I'm using OpenCV.
I have a monocular camera and an unknown word. I have the intrinsic and distortion parameters. I have features and correspondences. So I'm looking for the rotation and the translation between two frames. I would like to consider my first image as the origin of the XYZ axes.
I use the Fundamental matrix and the Essential matrix to find the extrinsic parameters (R, T), but I'm not convinced by the output. I got these results:
[0.040437..., 0.116076..., -0.992416...,
0.076999..., -0.99063..., -0.112731...,
-0.996211.., -0.071848.., -0.048994...]
[0.6924183...; 0.081694...; -0.716885...]
How can I check if they are good?
I calculated the Euclidean distance to check the displacement in 3D, but I got erroneous values.
Please, can anyone give me some details, or guide me? I hope I explained myself well.

By word do you mean world? This question is also not really on topic for Stack Overflow since it deals with theory rather than code.
To answer your question: if you have R and T, then you can compute (triangulate) the 3D coordinate of each point. From that you can reproject each point onto the other camera and compute the residual error between the observed and predicted points. If the error is within a pixel or so, the estimate is probably valid.
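A minimal NumPy sketch of that check, under some assumptions that are mine and not the asker's: the pixel coordinates are already undistorted, the pose convention is X2 = R @ X1 + t, and the helper names (`triangulate_dlt`, `project`, `reprojection_errors`) are made up for illustration. It uses plain linear (DLT) triangulation rather than any particular OpenCV call.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views.

    P1, P2: 3x4 projection matrices; x1, x2: pixel coordinates (u, v).
    Returns the 3D point in the world frame of the projection matrices.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                      # null vector of A = homogeneous 3D point
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 projection matrix, returning (u, v)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def reprojection_errors(K, R, t, pts1, pts2):
    """Per-correspondence reprojection error in pixels (worst of the two views).

    Assumes the first camera is the origin, i.e. P1 = K [I | 0], P2 = K [R | t].
    """
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, np.reshape(t, (3, 1))])
    errors = []
    for x1, x2 in zip(pts1, pts2):
        X = triangulate_dlt(P1, P2, x1, x2)
        e1 = np.linalg.norm(project(P1, X) - x1)
        e2 = np.linalg.norm(project(P2, X) - x2)
        errors.append(max(e1, e2))
    return np.array(errors)
```

If most values returned by `reprojection_errors` are around a pixel or less, R and T are at least consistent with the matches. The scale of T does not affect this test, so it cannot tell you whether the length of the translation is right.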

Basically, with this approach you will have an unknown scale factor for each consecutive pair of frames, so you can get strange values for R and T. But you can use some initialization, such as a known motion, to perform a first triangulation of the scene. Next, you can use solvePnP to calculate each following [R|T].
Try reading about PTAMM, which is one of the most interesting implementations of monocular SLAM.


Reconstruct 3D points from two images, given camera movement

I am trying to reconstruct the real-world coordinates of 3D points from two images taken from the same camera. The camera is not calibrated, but the movement (translation and rotation) is known. In short:
No calibration
Extra constraints other than image point correspondences:
Known camera translation and rotation
Same camera used in all views
I understand that, from image point correspondences alone, a scene can be reconstructed only up to a projective transformation. With more constraints, an affine or similarity reconstruction may be done. In my case, I need a similarity reconstruction.
Given the above constraints, is a similarity reconstruction possible? If possible, how should I go about doing it?
I have tried to attack the problem from a few angles. Since I am not mathematically fluent, I try to use OpenCV as much as possible.
Use findFundamentalMat() on the two images, hopefully extract the two camera matrices somehow, then call triangulatePoints(). As you could have guessed, I got stuck in the middle, unable to obtain the camera matrices from the fundamental matrix.
The textbook "Multiple View Geometry in Computer Vision" (by Hartley and Zisserman) gives an expression (p.256, Result 9.14) for the camera matrices in terms of the fundamental matrix and one of the epipoles. However, without knowing the camera's intrinsic parameters (requirement: no calibration), I don't see how I can get the epipole.
I also tried to treat my problem as a stereo system and use OpenCV's stereo*** functions. But they all seem to require human intervention to calibrate, which violates my requirement.
So, that's why I ask the question here today. The key is still, given those extra constraints, is a similarity reconstruction possible? I am not smart enough to understand the wealth of knowledge out there, and not able to come up with my own solution. Any help is appreciated.

Camera pose estimation

I'm currently working on a project that deals with reconstruction from a set of images, using a multi-view stereo approach. As such, I need to know the pose of each image in space. I find matching features using SURF, and from the correspondences I find the essential matrix.
Now comes the problem: It is possible to decompose the essential matrix with SVD, but this can lead to 4 different results, as I read in a book. How can I obtain the correct one, assuming this is possible?
What other algorithms can I use for this?
Wikipedia says:
It turns out, however, that only one of the four classes of solutions
can be realized in practice. Given a pair of corresponding image
coordinates, three of the solutions will always produce a 3D point
which lies behind at least one of the two cameras and therefore cannot
be seen. Only one of the four classes will consistently produce 3D
points which are in front of both cameras. This must then be the
correct solution.
If you have the extrinsic calibration parameters for the camera in the first frame, or if you assume that it lies at a default calibration, say translation of (0,0,0) and rotation of (0,0,0), then you can determine which of the decompositions is the valid one.
Thanks to Zaphod's answer I was able to solve my problem. Here's what I did:
First I calculated the Essential Matrix (E) from a set of point correspondences in both images.
Using SVD, I decomposed it into 2 solutions. Using the negated Essential Matrix -E (which also satisfies the same constraints), I arrived at 2 more solutions, for a total of 4 possible camera positions and orientations.
Then, for all solutions, I triangulated the point correspondences and determined which intersected in front of both cameras by taking the dot product of the point coordinate and each camera's viewing direction. If both are positive, then that intersection is in front of both cameras.
In the end the solution that delivers the most intersections in front of the cameras is the chosen one.
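The procedure above can be sketched in NumPy roughly as follows. This is only an illustration under my own assumptions: the points are in normalized image coordinates (pixels multiplied by the inverse of K), the first camera is [I | 0], the convention is X2 = R @ X1 + t, and the function names are hypothetical. Instead of decomposing both E and -E, it builds the same four candidates from one SVD by combining two rotations with the two signs of t.

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])

def decompose_essential(E):
    """Return the four (R, t) candidates encoded by an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    # Flip signs so that the products below are proper rotations (det = +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]                      # unit length, sign still unknown
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def count_in_front(R, t, x1n, x2n):
    """Count triangulated points that lie in front of both cameras."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([R, t.reshape(3, 1)])
    count = 0
    for a, b in zip(x1n, x2n):
        X = triangulate(P1, P2, a, b)
        depth1 = X[2]                # dot product with camera 1 viewing axis (0, 0, 1)
        depth2 = (R @ X + t)[2]      # same test expressed in the camera 2 frame
        if depth1 > 0 and depth2 > 0:
            count += 1
    return count

def select_pose(E, x1n, x2n):
    """Pick the candidate with the most points in front of both cameras."""
    candidates = decompose_essential(E)
    scores = [count_in_front(R, t, x1n, x2n) for R, t in candidates]
    return candidates[int(np.argmax(scores))]
```

With noisy matches a few points may land behind a camera even for the right pose, which is why the sketch takes the candidate with the highest count rather than demanding that every point passes.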

motion reconstruction from a single camera

I have a single calibrated camera (known intrinsic parameters, i.e. camera matrix K is known, as well as the distortion coefficients).
I would like to reconstruct the camera's 3d trajectory. There is no a-priori knowledge about the scene.
To simplify the problem, consider two images of the same scene, from which two sets of corresponding matched feature points have been extracted (SIFT, SURF, ORB, etc.).
My problem is: how can I calculate the camera extrinsic parameters (i.e. the rotation matrix R and the translation vector t) between the two viewpoints?
I have managed to calculate the fundamental matrix and, since K is known, the essential matrix as well. Using David Nister's efficient solution to the Five-Point Relative Pose Problem, I've managed to get 4 possible solutions, but:
1. The constraint on the essential matrix E ~ U * diag(s,s,0) * V' doesn't always hold, causing incorrect results.
[EDIT]: taking the average singular value seems to correct the results :) one down
2. How can I tell which one of the four is the correct one?
Your solution to point 1 is correct: replace the singular values of E with diag( (s1 + s2)/2, (s1 + s2)/2, 0).
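In NumPy terms, that correction could look like the sketch below. The function name `enforce_essential` is made up here, and s1, s2 are the two largest singular values of the estimated matrix.

```python
import numpy as np

def enforce_essential(E):
    """Project a noisy 3x3 estimate onto the set of valid essential matrices.

    The two largest singular values are replaced by their average and the
    smallest one is set to zero, i.e. E becomes U * diag(s, s, 0) * V'.
    """
    U, s, Vt = np.linalg.svd(E)
    m = (s[0] + s[1]) / 2.0
    return U @ np.diag([m, m, 0.0]) @ Vt
```

Since E is only defined up to scale, using diag(1, 1, 0) instead of the average would work equally well for the decomposition that follows.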
As for telling which one of the four solutions is correct, only one will give positive depths for all points with respect to the camera frame. That's the one you want.
Code for checking which solution is correct can be found here: from
They use the determinants of U and V to determine the solution with the correct orientation. Look for the comment "then four possibilities are". Since you're only estimating the essential matrix, it's susceptible to noise and does not behave well at all if all of the points are coplanar.
Also, the translation is only recovered to within a constant scaling factor, so the fact that you're seeing a normalized translation vector of unit magnitude is exactly correct. The reason is that the depth is unknown and estimated to be 1. You'll have to find some way to recover the depth as in the code for the eight-point algorithm + 3d reconstruction (Algorithm 5.1 in the bookcode link.)
The book the sample code above is taken from is also a very good reference. Chapter 5, the one you're interested in, is available on the Sample Chapters link.
Congrats on your hard work, sounds like you've tried hard to learn these techniques.
For actual production-strength code, I'd advise downloading libmv and Ceres and re-coding your solution using them.
Your two questions are really one: invalid solutions are rejected based on the data you have collected. In particular, Nister's (as well as Stewenius's) algorithm is normally used in the inner loop of a RANSAC-like solver, which selects for the solution with the best fit / max number of inliers.
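To make the "max number of inliers" part concrete, here is a small NumPy sketch of how a RANSAC-style loop might score one candidate essential matrix. It is an illustration, not the scoring used by any specific library: it assumes normalized image coordinates, uses the Sampson distance as the fit measure, and the names `sampson_distance` and `count_inliers` as well as the threshold are my own choices.

```python
import numpy as np

def sampson_distance(E, x1n, x2n):
    """First-order (Sampson) approximation of the squared epipolar error.

    x1n, x2n: (N, 2) arrays of normalized image coordinates in views 1 and 2,
    related by x2^T E x1 = 0 for a perfect match.
    """
    x1h = np.hstack([x1n, np.ones((len(x1n), 1))])
    x2h = np.hstack([x2n, np.ones((len(x2n), 1))])
    Ex1 = x1h @ E.T                  # row i holds E @ x1_i
    Etx2 = x2h @ E                   # row i holds E^T @ x2_i
    num = np.sum(x2h * Ex1, axis=1) ** 2
    den = Ex1[:, 0] ** 2 + Ex1[:, 1] ** 2 + Etx2[:, 0] ** 2 + Etx2[:, 1] ** 2
    return num / den

def count_inliers(E, x1n, x2n, threshold):
    """Number of correspondences whose Sampson distance is below threshold."""
    return int(np.sum(sampson_distance(E, x1n, x2n) < threshold))
```

Inside the loop, each candidate returned by the five-point solver for a random sample would be scored with `count_inliers`, and the candidate with the highest count kept. The threshold is in squared normalized units, so a pixel tolerance has to be divided by the focal length before squaring.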

OpenCV: Camera Pose Estimation

I am trying to match two overlapping images captured with a camera. To do this, I'd like to use OpenCV. I already extracted the features with the SurfFeatureDetector. Now I am trying to compute the rotation and translation vector between the two images.
As far as I know, I should use cvFindExtrinsicCameraParams2(). Unfortunately, this method requires objectPoints as an argument. These objectPoints are the world coordinates of the extracted features, which are not known in the current context.
Can anybody give me a hint how to solve this problem?
The problem of simultaneously computing relative pose between two images and the unknown 3d world coordinates has been treated here:
Berthold K. P. Horn. Relative orientation revisited. Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 545 Technology ...
EDIT: here is a link to the paper:
Please see my answer to a related question where I propose a solution to this problem:
OpenCV extrinsic camera from feature points
EDIT: You may want to take a look at bundle adjustment too.
That assumes an initial estimate is available.
EDIT: I found some code resources you might want to take a look at:
Resource I:
Two View Geometry Estimation with Outliers
C++ code for finding the relative orientation of two calibrated
cameras in presence of outliers. The obtained solution is optimal in
the sense that the number of inliers is maximized.
Resource II: Relative orientation from
5 points: a somewhat more polished C routine implementing the minimal
solution for relative orientation of two calibrated cameras from
unknown 3D points. 5 points are required and there can be as many as
10 feasible solutions (but 2-5 is more common). Also requires a few
CLAPACK routines for linear algebra. There's also a short technical
report on this (included with the source).
Resource III:
vector_to_rel_pose Compute the relative orientation between two
cameras given image point correspondences and known camera parameters
and reconstruct 3D space points.
There is a theoretical solution, however, the OpenCV implementation of camera pose estimation lacks the needed tools.
The theoretical approach:
Step 1: extract the homography (the matrix describing the geometrical transform between images). Use findHomography().
Step 2: decompose the resulting matrix into rotations and translations. Use cv::solvePnP().
Problem: findHomography() returns a 3x3 matrix, corresponding to a projection from one plane to another. solvePnP() does not take that matrix as input: it expects 3D object points together with their 2D image projections, and returns the 3D rotation/translation of the object. I think that with some approximations, you can adapt solvePnP to give you some results, but it requires a lot of math and a very good understanding of 3D geometry.

How to compute the rotation and translation between 2 cameras?

I am aware of the chessboard camera calibration technique, and have implemented it.
If I have 2 cameras viewing the same scene, and I calibrate both simultaneously with the chessboard technique, can I compute the rotation matrix and translation vector between them? How?
If you have the 3D camera coordinates of the corresponding points, you can compute the optimal rotation matrix and translation vector with a rigid body transformation, i.e. a least-squares fit between the two point sets.
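One common way to compute such a fit is the SVD-based (Kabsch-style) method sketched below in NumPy. The answer does not say which method it had in mind, so treat this as one possible instance. The function name `rigid_transform` is hypothetical, and the sketch assumes the same points are given as rows of A (coordinates in camera 1) and B (coordinates in camera 2).

```python
import numpy as np

def rigid_transform(A, B):
    """Least-squares R, t such that B_i is approximately R @ A_i + t.

    A, B: (N, 3) arrays holding the same N points in two coordinate frames.
    """
    centroid_a = A.mean(axis=0)
    centroid_b = B.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (A - centroid_a).T @ (B - centroid_b)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection: force det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = centroid_b - R @ centroid_a
    return R, t
```

At least three non-collinear points are needed. Unlike the two-view image-based methods, this recovers the translation with its true scale, because the 3D coordinates already carry metric information.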
If you are using OpenCV already, then why don't you use cv::stereoCalibrate?
It returns the rotation matrix and the translation vector between the two cameras. The only thing you have to do is make sure that the calibration chessboard is seen by both cameras.
The exact way is shown in the .cpp samples provided with the OpenCV library (I have version 2.2, and the samples were installed by default in /usr/local/share/opencv/samples).
The code example is called stereo_calib.cpp. Although it's not clearly explained what they are doing there (for that you might want to look at "Learning OpenCV"), it's something you can build on.
If I understood you correctly, you have two calibrated cameras observing a common scene, and you wish to recover their spatial arrangement. This is possible (provided you find enough image correspondences) but only up to an unknown factor on translation scale. That is, we can recover rotation (3 degrees of freedom, DOF) and only the direction of the translation (2 DOF). This is because we have no way to tell whether the projected scene is big and the cameras are far, or the scene is small and cameras are near. In the literature, the 5 DOF arrangement is termed relative pose or relative orientation (Google is your friend).
If your measurements are accurate and in general position, 6 point correspondences may be enough for recovering a unique solution. A relatively recent algorithm does exactly that.
Nister, D., "An efficient solution to the five-point relative pose problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 756-770, June 2004
doi: 10.1109/TPAMI.2004.17
Use a structure from motion/bundle adjustment package like Bundler to solve simultaneously for the 3D location of the scene and relative camera parameters.
Any such package requires several inputs:
camera calibrations that you have.
2D pixel locations of points of interest in the cameras (use an interest point detector like Harris or DoG, the first part of SIFT).
Correspondences between points of interest from each camera (use a descriptor like SIFT, SURF, SSD, etc. to do the matching).
Note that the solution is up to a certain scale ambiguity. You'll thus need to supply a distance measurement either between the cameras or between a pair of objects in the scene.
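As a small illustration of removing that ambiguity, the NumPy sketch below rescales an up-to-scale result using one measured distance between two reconstructed scene points. The function name and argument layout are assumptions made for this example only.

```python
import numpy as np

def apply_metric_scale(t_unit, points, i, j, known_distance):
    """Rescale an up-to-scale reconstruction using one known scene distance.

    t_unit: translation between the cameras, in arbitrary units.
    points: (N, 3) reconstructed scene points, in the same arbitrary units.
    i, j: indices of two points whose real distance was measured.
    known_distance: that measured distance, in metric units.
    """
    reconstructed = np.linalg.norm(points[i] - points[j])
    scale = known_distance / reconstructed
    return scale * t_unit, scale * points, scale
```

The rotation needs no correction, only the translation and the 3D points are multiplied by the scale. If the distance between the cameras is what you measured, the scale is simply that distance divided by the norm of `t_unit`.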
Original answer (applies primarily to uncalibrated cameras as the comments kindly point out):
This camera calibration toolbox from Caltech can solve for and visualize both the intrinsics (lens parameters, etc.) and the extrinsics (how the camera is positioned when each photo is taken). The latter is what you're interested in.
The Hartley and Zisserman blue book is also a great reference. In particular, you may want to look at the chapter on epipolar lines and fundamental matrix which is free online at the link.
