Rotation and Position Tracking with OpenCV and Optical Flow


I would like to track the rotation and translation of an object in OpenCV using Optical Flow. So far I've got something like this:
Call goodFeaturesToTrack to find initial features
Call calcOpticalFlowPyrLK to track the movement of feature points
Call findHomography to find how the points in image A moved to image B
Call perspectiveTransform to move points based on the homography
Call solvePnPRansac to find the rotation matrix and translation vector
At this point I'm trying to take the difference between the rotation and translation Mat between images and add them to an initial Rotation and Translation Matrix.
cv::solvePnPRansac(pattern.points3d, _points2d, calibration.getIntrinsic(), calibration.getDistorsion(), raux, taux);
raux.convertTo(Rvec, CV_32F);
taux.convertTo(Tvec, CV_32F);
cv::Mat_<float> rotMat(3, 3);
cv::Rodrigues(Rvec, rotMat);
cv::Mat_<float> transDiff = _prevTranslation - Tvec;
cv::Mat_<float> rotDiff = _prevRotation - rotMat;
_absRotation += rotDiff;
_absTranslation += transDiff;
The problem with this approach is that the translation vector doesn't follow the images. The vector tends to stay in the range
[0.02 0.2 -1.5]
It doesn't stray far from this position.
Thanks.

Related

camera frame world coordinates relative to fiducial

I am trying to determine camera position in world coordinates, relative to a fiducial position based on fiducial marker found in a scene.
My methodology for determining the viewMatrix is described here:
Determine camera pose?
I have the rotation and translation, [R|t], from the trained marker to the scene image. Given camera calibration training, and thus the camera intrinsic results, I should be able to discern the camera's position in world coordinates based on the perspective & orientation of the marker found in the scene image.
Can anybody direct me to a discussion or example similar to this? I'd like to know my camera's position based on the fiducial marker, and I'm sure that something similar to this has been done before; I'm just not searching the correct keywords.
Appreciate your guidance.

What do you mean by world coordinates? If you mean object coordinates, then you should use the inverse transformation of solvePnP's result.
Given a view matrix [R|t], we have that inv([R|t]) = [R'|-R'*t], where R' is the transpose of R. In OpenCV:
cv::Mat rvec, tvec;
cv::solvePnP(objectPoints, imagePoints, intrinsics, distortion, rvec, tvec);
cv::Mat R;
cv::Rodrigues(rvec, R); // convert the rotation vector to a 3x3 matrix
R = R.t(); // inverse rotation
tvec = -R * tvec; // translation of inverse
// camPose is a 4x4 matrix with the pose of the camera in the object frame
cv::Mat camPose = cv::Mat::eye(4, 4, R.type());
R.copyTo(camPose.rowRange(0, 3).colRange(0, 3)); // copies R into camPose
tvec.copyTo(camPose.rowRange(0, 3).colRange(3, 4)); // copies tvec into camPose
Update #1:
Result of solvePnP
solvePnP estimates the object pose given a set of object points (model coordinates), their corresponding image projections (image coordinates), as well as the camera matrix and the distortion coefficients.
The object pose is given by two vectors, rvec and tvec. rvec is a compact representation of a rotation matrix for the pattern view seen on the image. That is, rvec together with the corresponding tvec brings the fiducial pattern from the model coordinate space (in which object points are specified) to the camera coordinate space.
That is, we are in the camera coordinate space, it moves with the camera, and the camera is always at the origin. The camera axes have the same directions as image axes, so
x-axis is pointing in the right side from the camera,
y-axis is pointing down,
and z-axis is pointing to the direction of camera view
The same would apply to the model coordinate space, so if you specified the origin in upper right corner of the fiducial pattern, then
x-axis is pointing to the right (e.g. along the longer side of your pattern),
y-axis is pointing to the other side (e.g. along the shorter one),
and z-axis is pointing to the ground.
You can specify the world origin as the first of the object points, that is, the first object point is set to (0, 0, 0) and all other points have z=0 (in the case of planar patterns). Then tvec (combined with rvec) points to the origin of the world coordinate space in which you placed the fiducial pattern. solvePnP's output has the same units as the object points.
Take a look at the following: 6dof positional tracking. I think it is very similar to what you need.

OpenCV: solvePnP detection problems

I've got problem with precise detection of markers using OpenCV.
I've recorded video presenting that issue: http://youtu.be/IeSSW4MdyfU
As you can see, the markers that I'm detecting are slightly shifted at some camera angles. I've read on the web that this may be a camera calibration problem, so I'll tell you how I'm calibrating the camera, and maybe you'll be able to tell me what I am doing wrong.
At the beginning I'm collecting data from various images, and storing calibration corners in the _imagePoints vector like this:
std::vector<cv::Point2f> corners;
_imageSize = cvSize(image->size().width, image->size().height);
bool found = cv::findChessboardCorners(*image, _patternSize, corners);
if (found) {
cv::Mat *gray_image = new cv::Mat(image->size().height, image->size().width, CV_8UC1);
cv::cvtColor(*image, *gray_image, CV_RGB2GRAY);
cv::cornerSubPix(*gray_image, corners, cvSize(11, 11), cvSize(-1, -1), cvTermCriteria(CV_TERMCRIT_EPS+ CV_TERMCRIT_ITER, 30, 0.1));
cv::drawChessboardCorners(*image, _patternSize, corners, found);
}
_imagePoints->push_back(_corners);
Then, after collecting enough data, I'm calculating the camera matrix and coefficients with this code:
std::vector< std::vector<cv::Point3f> > *objectPoints = new std::vector< std::vector< cv::Point3f> >();
for (unsigned long i = 0; i < _imagePoints->size(); i++) {
std::vector<cv::Point2f> currentImagePoints = _imagePoints->at(i);
std::vector<cv::Point3f> currentObjectPoints;
for (int j = 0; j < currentImagePoints.size(); j++) {
cv::Point3f newPoint = cv::Point3f(j % _patternSize.width, j / _patternSize.width, 0);
currentObjectPoints.push_back(newPoint);
}
objectPoints->push_back(currentObjectPoints);
}
std::vector<cv::Mat> rvecs, tvecs;
static CGSize size = CGSizeMake(_imageSize.width, _imageSize.height);
cv::Mat cameraMatrix = [_userDefaultsManager cameraMatrixwithCurrentResolution:size]; // previously detected matrix
cv::Mat coeffs = _userDefaultsManager.distCoeffs; // previously detected coeffs
cv::calibrateCamera(*objectPoints, *_imagePoints, _imageSize, cameraMatrix, coeffs, rvecs, tvecs);
Results are like you've seen in the video.
What am I doing wrong? Is there an issue in the code? How many images should I use to perform calibration (right now I'm trying to obtain 20-30 images before the end of calibration)?
Should I use images that containg wrongly detected chessboard corners, like this:
or should I use only properly detected chessboards like these:
I've been experimenting with a circles grid instead of chessboards, but the results were much worse than now.
In case of questions how I'm detecting marker: I'm using solvepnp function:
solvePnP(modelPoints, imagePoints, [_arEngine currentCameraMatrix], _userDefaultsManager.distCoeffs, rvec, tvec);
with modelPoints specified like this:
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
and imagePoints are coordinates of marker corners in processing image (I'm using custom algorithm to do that)

In order to properly debug your problem I would need all the code :-)
I assume you are following the approach suggested in the tutorials (calibration and pose) cited by @kobejohn in his comment, so that your code follows these steps:
1) collect various images of the chessboard target
2) find the chessboard corners in the images of point 1)
3) calibrate the camera (with cv::calibrateCamera) and so obtain the intrinsic camera parameters (let's call them intrinsic) and the lens distortion parameters (let's call them distortion)
4) collect an image of your own custom target (the target is seen at 0:57 in your video and is shown in the following figure) and find some relevant points in it (let's call the points you found in the image image_custom_target_vertices, and world_custom_target_vertices the corresponding 3D points)
5) estimate the rotation matrix (let's call it R) and the translation vector (let's call it t) of the camera from the image of your own custom target obtained in point 4), with a call to cv::solvePnP like this one: cv::solvePnP(world_custom_target_vertices,image_custom_target_vertices,intrinsic,distortion,R,t)
6) given the 8 corners of the cube in 3D (let's call them world_cube_vertices), get the 8 2D image points (let's call them image_cube_vertices) by means of a call to cv::projectPoints like this one: cv::projectPoints(world_cube_vertices,R,t,intrinsic,distortion,image_cube_vertices)
7) draw the cube with your own draw function.
Now, the final result of the draw procedure depends on all the previous computed data and we have to find where the problem lies:
Calibration: as you observed in your answer, in 3) you should discard the images where the corners are not properly detected. You need a threshold for the reprojection error in order to discard "bad" chessboard target images. Quoting from the calibration tutorial:
Re-projection Error
Re-projection error gives a good estimation of just how exact is the
found parameters. This should be as close to zero as possible. Given
the intrinsic, distortion, rotation and translation matrices, we first
transform the object point to image point using cv2.projectPoints().
Then we calculate the absolute norm between what we got with our
transformation and the corner finding algorithm. To find the average
error we calculate the arithmetical mean of the errors calculate for
all the calibration images.
Usually you will find a suitable threshold with some experiments. With this extra step you will get better values for intrinsic and distortion.
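As a rough illustration of the thresholding idea, the sketch below computes a mean reprojection error per view and keeps only the views under a chosen limit. It is plain C++ and assumes you already have, for each calibration image, the detected corners and the corners reprojected with cv::projectPoints; Point2, meanReprojectionError and viewsToKeep are hypothetical names for this example, and the threshold value is something you would tune experimentally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 {
    double x;
    double y;
};

// Mean Euclidean distance (in pixels) between the detected corners and the
// corners reprojected from the estimated camera parameters.
double meanReprojectionError(const std::vector<Point2>& detected,
                             const std::vector<Point2>& reprojected) {
    assert(detected.size() == reprojected.size() && !detected.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < detected.size(); ++i) {
        sum += std::hypot(detected[i].x - reprojected[i].x,
                          detected[i].y - reprojected[i].y);
    }
    return sum / static_cast<double>(detected.size());
}

// Indices of the views whose mean error does not exceed maxErrorPx.
std::vector<std::size_t> viewsToKeep(const std::vector<double>& perViewError,
                                     double maxErrorPx) {
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < perViewError.size(); ++i) {
        if (perViewError[i] <= maxErrorPx) {
            kept.push_back(i);
        }
    }
    return kept;
}
```

After discarding the rejected views you would run cv::calibrateCamera again on the remaining ones.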
Finding your own custom target: it does not seem to me that you explain how you find your own custom target in the step I labeled as point 4). Do you get the expected image_custom_target_vertices? Do you discard images where those results are "bad"?
Pose of the camera: I think that in 5) you use intrinsic found in 3), are you sure nothing is changed in the camera in the meanwhile? Referring to the Callari's Second Rule of Camera Calibration:
Second Rule of Camera Calibration: "Thou shalt not touch the lens
after calibration". In particular, you may not refocus nor change the
f-stop, because both focusing and iris affect the nonlinear lens
distortion and (albeit less so, depending on the lens) the field of
view. Of course, you are completely free to change the exposure time,
as it does not affect the lens geometry at all.
And then there may be some problems in the draw function.

So, I've experimented a lot with my code, and I still haven't fixed the main issue (shifted objects), but I've managed to answer some of calibration questions I've asked.
First of all, in order to obtain good calibration results you have to use images with properly detected grid elements/circle positions! Using all captured images in the calibration process (even those that aren't properly detected) will result in a bad calibration.
I've experimented with various calibration patterns:
Asymmetric circles pattern (CALIB_CB_ASYMMETRIC_GRID), give much worse results than any other pattern. By worse results I mean that it produces a lot of wrongly detected corners like these:
I've experimented with CALIB_CB_CLUSTERING and it hasn't helped much: in some cases (different lighting) it got better, but not by much.
Symmetric circles pattern (CALIB_CB_SYMMETRIC_GRID) - better results than asymmetric grid, but still I've got much worse results than standard grid (chessboard). It often produces errors like these:
Chessboard (found using findChessboardCorners function) - this method is producing best possible results - it doesn't produce misaligned corners very often, and almost every calibration is producing similar results to best-possible results from symmetric circles grid
For every calibration I've been using 20-30 images taken from different angles. I even tried 100+ images, but it didn't produce a noticeable change in calibration results compared with a smaller number of images. It's worth noting that the time needed to compute the camera parameters grows non-linearly with the number of test images (100 test images at 480x360 resolution take 25 minutes to compute on an iPad 4, compared with 4 minutes for ~50 images).
I've also experimented with solvePnP parameters, but that also hasn't given me any acceptable results: I've tried all 3 detection methods (ITERATIVE, EPNP and P3P), but I haven't seen any noticeable change.
I've also tried with useExtrinsicGuess set to true, using rvec and tvec from the previous detection, but this resulted in the complete disappearance of the detected cube.
I've run out of ideas. What else could be causing these shifting problems?

For those still interested:
this is an old question, but I think your problem is not the bad calibration.
I developed an AR app for iOS, using OpenCV and SceneKit, and I have had your same issue.
I think your problem is the wrong render position of the cube:
OpenCV's solvePnP returns the X, Y, Z coordinates of the marker center, but you want to render the cube on top of the marker, so the cube's center must sit at a specific distance along the marker's Z axis: exactly one half of the cube's side length. So you need to offset the Z coordinate of the marker translation vector by this distance.
In fact, when you view the cube from the top, it is rendered properly.
I made an image to explain the problem, but my reputation prevents me from posting it.

Extract face rotation from homography in a video

I'm trying to determine the orientation of a face in a video.
The video starts with the frontal image of the face, so it has no rotation. In the following frames the head rotates and i'm trying to determine the rotation, which will lead me to determine the face orientation based on the camera position.
I'm using OpenCV and C++ for the job.
I'm using SURF descriptors to find points on the face which i use to calculate an homography between the two images. Being the two frames very close to each other, the head rotation will be minimal in that interval and my homography matrix will be close to the identity matrix.
This is my homography matrix:
H = findHomography(k1,k2,RANSAC,8);
where k1 and k2 are the keypoints extracted with SURF.
I'm using decomposeProjectionMatrix to extract the rotation matrix but now i'm not sure how to interpret the rotMatrix. This one too is basically (1 0 0; 0 1 0; 0 0 1) (where the 0 are numbers in a range from e-10 to e-16).
In theory, what i was trying to do was to find the angle of the rotation at each frame and store it somewhere, so that if i get a 1° change in each frame, after 10 frames i know that my head has changed its orientation by 10°.
I spend some time reading everything i could find about QR decomposition, homography matrices and so on, but i haven't been able to get around this. Hence, any help would be really appreciated.
Thanks!

The upper-left 2x2 of the homography matrix is a 2D rotation matrix. If you work through the multiplication of the matrix with a point (i.e. take R*p), you'll see it's equivalent to:
newX = oldVector dot firstRow
newY = oldVector dot secondRow
In other words, the first row of the matrix is a unit vector which is the x axis of the new head. (If there's a scale difference between the frames it won't be a unit vector, but this method will still work.) So you should be able to calculate
rotation = atan2(second entry of first row, first entry of first row)
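A small sketch of this idea in plain C++ follows, including the per-frame accumulation the question asks about. It only makes sense when the homography is close to a rotation plus uniform scale (little perspective distortion), which is an assumption here. Note that this version reads the first column of the matrix, which gives the counter-clockwise angle when points are column vectors multiplied as H*p; using the first row as in the formula above gives the same magnitude with the opposite sign for a pure rotation. The function names are made up for the example.

```cpp
#include <array>
#include <cmath>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;

// In-plane rotation in radians read from the upper-left 2x2 block of H.
// A uniform scale factor does not matter, because atan2 only depends on
// the ratio of its two arguments.
double inPlaneRotation(const Mat3& H) {
    return std::atan2(H[1][0], H[0][0]);
}

// Sum of the frame-to-frame rotations, i.e. the total in-plane rotation
// since the first frame.
double accumulatedRotation(const std::vector<Mat3>& frameToFrame) {
    double total = 0.0;
    for (const Mat3& H : frameToFrame) {
        total += inPlaneRotation(H);
    }
    return total;
}
```

Summing angles like this is only valid for rotation about a single axis (the optical axis); a head turning left or right is an out-of-plane rotation and will not show up cleanly in this number.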

How to determine world coordinates of a camera?

I have a rectangular target of known dimensions and location on a wall, and a mobile camera on a robot. As the robot is driving around the room, I need to locate the target and compute the location of the camera and its pose. As a further twist, the camera's elevation and azimuth can be changed using servos. I am able to locate the target using OpenCV, but I am still fuzzy on calculating the camera's position (actually, I've gotten a flat spot on my forehead from banging my head against a wall for the last week). Here is what I am doing:
Read in previously computed camera intrinsics file
Get the pixel coordinates of the 4 points of the target rectangle from the contour
Call solvePnP with the world coordinates of the rectangle, the pixel coordinates, the camera matrix and the distortion matrix
Call projectPoints with the rotation and translation vectors
???
I have read the OpenCV book, but I guess I'm just missing something on how to use the projected points, rotation and translation vectors to compute the world coordinates of the camera and its pose (I'm not a math wiz) :-(
2013-04-02
Following the advice from "morynicz", I have written this simple standalone program.
#include <Windows.h>
#include "opencv\cv.h"
using namespace cv;
int main (int argc, char** argv)
{
const char *calibration_filename = argc >= 2 ? argv [1] : "M1011_camera.xml";
FileStorage camera_data (calibration_filename, FileStorage::READ);
Mat camera_intrinsics, distortion;
vector<Point3d> world_coords;
vector<Point2d> pixel_coords;
Mat rotation_vector, translation_vector, rotation_matrix, inverted_rotation_matrix, cw_translate;
Mat cw_transform = cv::Mat::eye (4, 4, CV_64FC1);
// Read camera data
camera_data ["camera_matrix"] >> camera_intrinsics;
camera_data ["distortion_coefficients"] >> distortion;
camera_data.release ();
// Target rectangle coordinates in feet
world_coords.push_back (Point3d (10.91666666666667, 10.01041666666667, 0));
world_coords.push_back (Point3d (10.91666666666667, 8.34375, 0));
world_coords.push_back (Point3d (16.08333333333334, 8.34375, 0));
world_coords.push_back (Point3d (16.08333333333334, 10.01041666666667, 0));
// Coordinates of rectangle in camera
pixel_coords.push_back (Point2d (284, 204));
pixel_coords.push_back (Point2d (286, 249));
pixel_coords.push_back (Point2d (421, 259));
pixel_coords.push_back (Point2d (416, 216));
// Get vectors for world->camera transform
solvePnP (world_coords, pixel_coords, camera_intrinsics, distortion, rotation_vector, translation_vector, false, 0);
dump_matrix (rotation_vector, String ("Rotation vector"));
dump_matrix (translation_vector, String ("Translation vector"));
// We need inverse of the world->camera transform (camera->world) to calculate
// the camera's location
Rodrigues (rotation_vector, rotation_matrix);
Rodrigues (rotation_matrix.t (), camera_rotation_vector);
Mat t = translation_vector.t ();
camera_translation_vector = -camera_rotation_vector * t;
printf ("Camera position %f, %f, %f\n", camera_translation_vector.at<double>(0), camera_translation_vector.at<double>(1), camera_translation_vector.at<double>(2));
printf ("Camera pose %f, %f, %f\n", camera_rotation_vector.at<double>(0), camera_rotation_vector.at<double>(1), camera_rotation_vector.at<double>(2));
}
The pixel coordinates I used in my test are from a real image that was taken about 27 feet left of the target rectangle (which is 62 inches wide and 20 inches high), at about a 45 degree angle. The output is not what I'm expecting. What am I doing wrong?
Rotation vector
2.7005
0.0328
0.4590
Translation vector
-10.4774
8.1194
13.9423
Camera position -28.293855, 21.926176, 37.650714
Camera pose -2.700470, -0.032770, -0.459009
Will it be a problem if my world coordinates have the Y axis inverted from that of OpenCV's screen Y axis? (the origin of my coordinate system is on the floor to the left of the target, while OpenCV's origin is the top left of the screen).
What units is the pose in?

You get the translation and rotation vectors from solvePnP, which are telling where is the object in camera's coordinates. You need to get an inverse transform.
The transform camera -> object can be written as a matrix [R T;0 1] for homogeneous coordinates. The inverse of this matrix would be, using its special properties, [R^t -R^t*T;0 1] where R^t is R transposed. You can get the R matrix from the Rodrigues transform. This way You get the translation vector and rotation matrix for the transformation object->camera coordinates.
If You know where the object lies in the world coordinates, You can use the world->object transform * object->camera transform matrix to extract the camera's translation and pose.
The pose is described either by single vector or by the R matrix, You surely will find it in Your book. If it's "Learning OpenCV" You will find it on pages 401 - 402 :)
Looking at Your code, You need to do something like this
cv::Mat R;
cv::Rodrigues(rotation_vector, R);
cv::Mat cameraRotationVector;
cv::Rodrigues(R.t(),cameraRotationVector);
cv::Mat cameraTranslationVector = -R.t()*translation_vector;
cameraTranslationVector contains camera coordinates. cameraRotationVector contains camera pose.

It took me forever to understand it, but the pose is the rotation about each axis: x, y, z.
It is in radians. The values are between -π and π (-3.14 to 3.14).
Edit:
I might have been mistaken. I read that the pose is a vector which indicates the direction of the rotation axis, and the length of the vector indicates how much to rotate the camera around that vector.
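The axis-angle reading in the edit matches how the rotation vector from cv::Rodrigues / solvePnP is defined: its direction is the rotation axis and its length is the rotation angle in radians. A minimal plain C++ sketch of splitting such a vector into the two parts is shown below; AxisAngle, toAxisAngle and toDegrees are names invented for this example.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

struct AxisAngle {
    Vec3 axis;     // unit vector along the rotation axis
    double angle;  // rotation angle in radians
};

// Split a Rodrigues rotation vector into a unit axis and an angle.
AxisAngle toAxisAngle(const Vec3& rvec) {
    AxisAngle out{};
    out.angle = std::sqrt(rvec[0] * rvec[0] + rvec[1] * rvec[1] + rvec[2] * rvec[2]);
    if (out.angle < 1e-12) {
        // No rotation: the axis is arbitrary, so pick Z.
        out.axis[2] = 1.0;
        out.angle = 0.0;
        return out;
    }
    for (int i = 0; i < 3; ++i) {
        out.axis[i] = rvec[i] / out.angle;
    }
    return out;
}

double toDegrees(double radians) {
    return radians * 180.0 / std::acos(-1.0);
}
```

Applied to the rotation vector printed in the question, (2.7005, 0.0328, 0.4590), the length is about 2.74 radians, i.e. roughly 157 degrees about an axis that lies mostly along X.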

Finding Homography And Warping Perspective

With FeatureDetector I get features on two images with the same element and match this features with BruteForceMatcher.
Then I'm using OpenCv function findHomography to get homography matrix
H = findHomography( src2Dfeatures, dst2Dfeatures, outlierMask, RANSAC, 3);
and getting H matrix, then align image with
warpPerspective(img1,alignedSrcImage,H,img2.size(),INTER_LINEAR,BORDER_CONSTANT);
I need to know rotation angle, scale, displacement of detected element. Is there any simple way to get this than some big equations? Some evaluated formulas just to put data in?

Homography would match projections of your elements lying on a plane or lying arbitrary in 3D if the camera goes through a pure rotation or zoom and no translation. So here are the cases we are talking about with indication of what is the input to our calculations:
- planar target, pure rotation, intra-frame homography
- planar target, rotation and translation, target to frame homography
- 3D target, pure rotation, frame to frame mapping (constrained by a fundamental matrix)
In case of the planar target, a pure rotation is easy to calculate through your frame-to-frame Homography (H12):
given the intrinsic camera matrix A and the plane-to-image homographies H1 and H2 for the two frames, which can be expressed as H1 = A and H2 = A*R, we have H12 = H2*H1^-1 = A*R*A^-1 and thus R = A^-1*H12*A
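The pure-rotation case can be sketched in plain C++ as follows. This assumes a simple pinhole camera matrix A with focal lengths fx, fy and principal point cx, cy (no skew); the Intrinsics struct and the function names are made up for the illustration. Because a homography estimated from point matches is only defined up to scale (findHomography normalises it so the bottom-right entry is 1), the sketch also rescales the result so that its determinant is 1.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical pinhole intrinsics: A = [fx 0 cx; 0 fy cy; 0 0 1].
struct Intrinsics {
    double fx, fy, cx, cy;
};

Mat3 multiply(const Mat3& a, const Mat3& b) {
    Mat3 c{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int n = 0; n < 3; ++n)
                c[i][j] += a[i][n] * b[n][j];
    return c;
}

double determinant(const Mat3& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// R = A^-1 * H12 * A, rescaled so that det(R) = 1.
Mat3 rotationFromHomography(const Mat3& H12, const Intrinsics& cam) {
    const Mat3 A = {{{cam.fx, 0.0, cam.cx}, {0.0, cam.fy, cam.cy}, {0.0, 0.0, 1.0}}};
    const Mat3 Ainv = {{{1.0 / cam.fx, 0.0, -cam.cx / cam.fx},
                        {0.0, 1.0 / cam.fy, -cam.cy / cam.fy},
                        {0.0, 0.0, 1.0}}};
    Mat3 R = multiply(multiply(Ainv, H12), A);
    const double scale = std::cbrt(determinant(R));  // undo the arbitrary scale of H12
    for (auto& row : R)
        for (double& value : row)
            value /= scale;
    return R;
}
```

With noisy matches the result will not be exactly orthonormal, so in practice you would probably project it onto the nearest rotation matrix (for example with an SVD) before using it.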
In case of elements lying on a plane, rotation with translation of the camera (up to unknown scale) can be calculated through decomposition of the target-to-frame homography. Note that the target can be just one of the views. Assuming you have your original planar target as an image (taken at some reference orientation), your task is to decompose the homography between images, H12, which can be done through SVD. The first two columns of H represent the first two columns of the rotation matrix and can be recovered through H = U*L*V^T, [r1 r2] = U*D*V^T, where D is the 3x2 identity matrix (its last row is all zeros). The third column of the rotation matrix is just the vector product of the first two columns. The last column of the homography is the translation vector times some constant.
Finally for arbitrary configuration of points in 3D and pure camera rotation, the rotation is calculated using the essential matrix decomposition rather than homography, see this

cv::decomposeProjectionMatrix();
and
cv::RQDecomp3x3();
are both similar to what you want to achieve.
None of them is perfect. The theory behind them and why you cannot extract all params from a 3x3 matrix is a bit cumbersome. But the short answer is that a 3x3 proj matrix is a simplification from the complete 4x4 one, based on the fact that all points stay in the same plane.

You can try to use Levenberg-Marquardt optimization, where the parameters are the translation and rotation, and the equations are given by the computed distances between features from the two images (use only the inliers from the RANSAC homography).
Here is C++ implementation of LM http://www.ics.forth.gr/~lourakis/levmar/
