image transformations - image-processing

So I've been using gnu-gsl and CImg to implement some of the fundamental projective space techniques for affine and metric rectification.
I've completed computing an affine rectification but, I'm having a hard time figuring out how to apply the affine rectification matrix to the original (input) image.
My current thought process is to iterate across the input image for each pixel coordinate. Then multiply the original pixel coordinate (converted to a homogeneous coordinate) by the affine rectification matrix to get the output pixel coordinate.
Then access the output image using the output pixel coordinate and conduct a blend (addition) operation on the output image's pixel location with the pixel color from the original image.
Does that sound right? I'm getting a lot of really weird values after multiplying the original pixel coordinate by the affine rectification matrix.

No, your values should not be weird. Why don't you make a simple example, a small scale with a small translation; e.g.
x' = 1.01*x + 0.0*y + 5;
y' = 0.0*x + 0.98*y + 10;
Now the pixel at (10,10) should map to (15.1,19.8), right ?
If you want to make a nice output image, you should find the forward projection and then back project to the input image and interpolate there rather than try to blend into the output image. Otherwise you will end up with gaps in the output.
You need to be careful with your terminology here; it sounds to me like you are doing projections, sometimes called warping in the computer graphics community. Rectification is something else, but it depends on what you are doing.

Related

extrinsic matrix computation with opencv

I am using opencv to calibrate my webcam. So, what I have done is fixed my webcam to a rig, so that it stays static and I have used a chessboard calibration pattern and moved it in front of the camera and used the detected points to compute the calibration. So, this is as we can find in many opencv examples (https://docs.opencv.org/3.1.0/dc/dbb/tutorial_py_calibration.html)
Now, this gives me the camera intrinsic matrix and a rotation and translation component for mapping each of these chessboard views from the chessboard space to world space.
However, what I am interested in is the global extrinsic matrix i.e. once I have removed the checkerboard, I want to be able to specify a point in the image scene i.e. x, y and its height and it gives me the position in the world space. As far as I understand, I need both the intrinsic and extrinsic matrix for this. How should one proceed to compute the extrinsic matrix from here? Can I use the measurements that I have already gathered from the chessboard calibration step to compute the extrinsic matrix as well?
Let me place some context. Consider the following picture, (from https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html):
The camera has "attached" a rigid reference frame (Xc,Yc,Zc). The intrinsic calibration that you successfully performed allows you to convert a point (Xc,Yc,Zc) into its projection on the image (u,v), and a point (u,v) in the image to a ray in (Xc,Yc,Zc) (you can only get it up to a scaling factor).
In practice, you want to place the camera in an external "world" reference frame, let's call it (X,Y,Z). Then there is a rigid transformation, represented by a rotation matrix, R, and a translation vector T, such that:
|Xc| |X|
|Yc|= R |Y| + T
|Zc| |Z|
That's the extrinsic calibration (which can be written also as a 4x4 matrix, that's what you call the extrinsic matrix).
Now, the answer. To obtain R and T, you can do the following:
Fix your world reference frame, for example the ground can be the (x,y) plane, and choose an origin for it.
Set some points with known coordinates in this reference frame, for example, points in a square grid in the floor.
Take a picture and get the corresponding 2D image coordinates.
Use solvePnP to obtain the rotation and translation, with the following parameters:
objectPoints: the 3D points in the world reference frame.
imagePoints: the corresponding 2D points in the image in the same order as objectPoints.
cameraMatris: the intrinsic matrix you already have.
distCoeffs: the distortion coefficients you already have.
rvec, tvec: these will be the outputs.
useExtrinsicGuess: false
flags: you can use CV_ITERATIVE
Finally, get R from rvec with the Rodrigues function.
You will need at least 3 non-collinear points with corresponding 3D-2D coordinates for solvePnP to work (link), but more is better. To have good quality points, you could print a big chessboard pattern, put it flat in the floor, and use it as a grid. What's important is that the pattern is not too small in the image (the larger, the more stable your calibration will be).
And, very important: for the intrinsic calibration, you used a chess pattern with squares of a certain size, but you told the algorithm (which does kind of solvePnPs for each pattern), that the size of each square is 1. This is not explicit, but is done in line 10 of the sample code, where the grid is built with coordinates 0,1,2,...:
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
And the scale of the world for the extrinsic calibration must match this, so you have several possibilities:
Use the same scale, for example by using the same grid or by measuring the coordinates of your "world" plane in the same scale. In this case, you "world" won't be at the right scale.
Recommended: redo the intrinsic calibration with the right scale, something like:
objp[:,:2] = (size_of_a_square*np.mgrid[0:7,0:6]).T.reshape(-1,2)
Where size_of_a_square is the real size of a square.
(Haven't done this, but is theoretically possible, do it if you can't do 2) Reuse the intrinsic calibration by scaling fx and fy. This is possible because the camera sees everything up to a scale factor, and the declared size of a square only changes fx and fy (and the T in the pose for each square, but that's another story). If the actual size of a square is L, then replace fx and fy Lfx and Lfy before calling solvePnP.

finding the real world coordinates of an image point

I am searching lots of resources on internet for many days but i couldnt solve the problem.
I have a project in which i am supposed to detect the position of a circular object on a plane. Since on a plane, all i need is x and y position (not z) For this purpose i have chosen to go with image processing. The camera(single view, not stereo) position and orientation is fixed with respect to a reference coordinate system on the plane and are known
I have detected the image pixel coordinates of the centers of circles by using opencv. All i need is now to convert the coord. to real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
in this site and other sites as well, an homographic transformation is named as:
p = C[R|T]P; where P is real world coordinates and p is the pixel coord(in homographic coord). C is the camera matrix representing the intrinsic parameters, R is rotation matrix and T is the translational matrix. I have followed a tutorial on calibrating the camera on opencv(applied the cameraCalibration source file), i have 9 fine chessbordimages, and as an output i have the intrinsic camera matrix, and translational and rotational params of each of the image.
I have the 3x3 intrinsic camera matrix(focal lengths , and center pixels), and an 3x4 extrinsic matrix [R|T], in which R is the left 3x3 and T is the rigth 3x1. According to p = C[R|T]P formula, i assume that by multiplying these parameter matrices to the P(world) we get p(pixel). But what i need is to project the p(pixel) coord to P(world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from linear algebra course we can manipulate a transformation as P=[R|T]-1*C-1*p. However this is in euclidian coord system. I dont know such a thing is possible in hompographic. moreover 3x4 [R|T] Vector is not invertible. Moreover i dont know it is the correct way to go.
Intrinsic and extrinsic parameters are know, All i need is the real world project coordinate on the ground plane. Since point is on a plane, coordinates will be 2 dimensions(depth is not important, as an argument opposed single view geometry).Camera is fixed(position,orientation).How should i find real world coordinate of the point on an image captured by a camera(single view)?
EDIT
I have been reading "learning opencv" from Gary Bradski & Adrian Kaehler. On page 386 under Calibration->Homography section it is written: q = sMWQ where M is camera intrinsic matrix, W is 3x4 [R|T], S is an "up to" scale factor i assume related with homography concept, i dont know clearly.q is pixel cooord and Q is real coord. It is said in order to get real world coordinate(on the chessboard plane) of the coord of an object detected on image plane; Z=0 then also third column in W=0(axis rotation i assume), trimming these unnecessary parts; W is an 3x3 matrix. H=MW is an 3x3 homography matrix.Now we can invert homography matrix and left multiply with q to get Q=[X Y 1], where Z coord was trimmed.
I applied the mentioned algorithm. and I got some results that can not be in between the image corners(the image plane was parallel to the camera plane just in front of ~30 cm the camera, and i got results like 3000)(chessboard square sizes were entered in milimeters, so i assume outputted real world coordinates are again in milimeters). Anyway i am still trying stuff. By the way the results are previosuly very very large, but i divide all values in Q by third component of the Q to get (X,Y,1)
FINAL EDIT
I could not accomplish camera calibration methods. Anyway, I should have started with perspective projection and transform. This way i made very well estimations with a perspective transform between image plane and physical plane(having generated the transform by 4 pairs of corresponding coplanar points on the both planes). Then simply applied the transform on the image pixel points.
You said "i have the intrinsic camera matrix, and translational and rotational params of each of the image.” but these are translation and rotation from your camera to your chessboard. These have nothing to do with your circle. However if you really have translation and rotation matrices then getting 3D point is really easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C-1*[u, v, 1], where u=col-w/2 and v=h/2-row, where col, row are image column and row and w, h are image width and height. As a result you will obtain 3d point with so-called camera normalized coordinates p = [x, y, z]T. All you need to do now is to subtract the translation and apply a transposed rotation: P=RT(p-T). The order of operations is inverse to the original that was rotate and then translate; note that transposed rotation does the inverse operation to original rotation but is much faster to calculate than R-1.

Warping Perspective using arbitary rotation angle

I have an image of a chessboard taken at some angle. Now I want to warp perspective so the chessboard image look again as if was taken directly from above.
I know that I can try to use 'findHomography' between matched points but I wanted to avoid it and use e.g. rotation data from mobile sensors to build homography matrix on my own. I calibrated my camera to get intrinsic parameters. Then lets say the following image has been taken at ~60degrees angle around x-axis. I thought that all I have to do is to multiply camera matrix with rotation matrix to obtain homography matrix. I tried to use the following code but looks like I'm not understanding something correctly because it doesn't work as expected (result image completely black or white.
import cv2
import numpy as np
import math
camera_matrix = np.array([[ 5.7415988502105745e+02, 0., 2.3986181527877352e+02],
[0., 5.7473682183375217e+02, 3.1723734404756237e+02],
[0., 0., 1.]])
distortion_coefficients = np.array([ 1.8662919398453856e-01, -7.9649812697463640e-01,
1.8178068172317731e-03, -2.4296638847737923e-03,
7.0519002388825025e-01 ])
theta = math.radians(60)
rotx = np.array([[1, 0, 0],
[0, math.cos(theta), -math.sin(theta)],
[0, math.sin(theta), math.cos(theta)]])
homography = np.dot(camera_matrix, rotx)
im = cv2.imread('data/chess1.jpg')
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
im_warped = cv2.warpPerspective(gray, homography, (480, 640), flags=cv2.WARP_INVERSE_MAP)
cv2.imshow('image', im_warped)
cv2.waitKey()
pass
I also have distortion_coefficients after calibration. How can those be incorporated into the code to improve results?
This answer is awfully late by several years, but here it is ...
(Disclaimer: my use of terminology in this answer may be imprecise or incorrect. Please do look up on this topic from other more credible sources.)
Remember:
Because you only have one image (view), you can only compute 2D homography (perspective correspondence between one 2D view and another 2D view), not the full 3D homography.
Because of that, the nice intuitive understanding of the 3D homography (rotation matrix, translation matrix, focal distance, etc.) are not available to you.
What we say is that with 2D homography you cannot factorize the 3x3 matrix into those nice intuitive components like 3D homography does.
You have one matrix - (which is the product of several matrices unknown to you) - and that is it.
However,
OpenCV provides a getPerspectiveTransform function which solves the 3x3 perspective matrix (using homogenous coordinate system) for a 2D homography between two planar quadrilaterals.
Link to documentation
To use this function,
Find the four corners of the chessboard on the image. These will be your source coordinates.
Supply four rectangle corners of your choice. These will be your destination coordinates.
Pass the source coordinates and destination coordinates into the getPerspectiveTransform to generate a 3x3 matrix that is able to dewarp your chessboard to an upright rectangle.
Notes to remember:
Mind the ordering of the four corners.
If the source coordinates are picked in clockwise order, the destination also needs to be picked in clockwise order.
Likewise, if counter-clockwise order is used, do it consistently.
Likewise, if z-order (top left, top right, bottom left, bottom right) is used, do it consistently.
Failure to order the corners consistently will generate a matrix that executes the point-to-point correspondence exactly (mathematically speaking), but will not generate a usable output image.
The aspect ratio of the destination rectangle can be chosen arbitrarily. In fact, it is not possible to deduce the "original aspect ratio" of the object in world coordinates, because "this is 2D homography, not 3D".
One problem is that to multiply by a camera matrix you need some concept of a z coordinate. You should start by getting basic image warping given Euler angles to work before you think about distortion coefficients. Have a look at this answer for a slightly more detailed explanation and try to duplicate my result. The idea of moving your image down the z axis and then projecting it with your camera matrix can be confusing, let me know if any part of it does not make sense.
You do not need to calibrate the camera nor estimate the camera orientation (the latter, however, in this case would be very easy: just find the vanishing points of those orthogonal bundles of lines, and take their cross product to find the normal to the plane, see Hartley & Zisserman's bible for details).
The only thing you need to do is estimate the homography that maps the checkers to squares, then apply it to the image.

How can I transform an image using matrices R and T (extrinsic parameters matrices) in opencv?

I have a rotation-translation matrix [R T] (3x4).
Is there a function in opencv that performs the rotation-translation described by [R T]?
A lot of solutions to this question I think make hidden assumptions. I will try to give you a quick summary of how I think about this problem (I have had to think about it a lot in the past). Warping between two images is a 2 dimensional process accomplished by a 3x3 matrix called a homography. What you have is a 3x4 matrix which defines a transform in 3 dimensions. You can convert between the two by treating your image as a flat plane in 3 dimensional space. The trick then is to decide on the initial position in world space of your image plane. You can then transform its position and project it onto a new image plane with your camera intrinsics matrix.
The first step is to decide where your initial image lies in world space, note that this does not have to be the same as your initial R and T matrices specify. Those are in world coordinates, we are talking about the image created by that world, all the objects in the image have been flattened into a plane. The simplest decision here is to set the image at a fixed displacement on the z axis and no rotation. From this point on I will assume no rotation. If you would like to see the general case I can provide it but it is slightly more complicated.
Next you define the transform between your two images in 3d space. Since you have both transforms with respect to the same origin, the transform from [A] to [B] is the same as the transform from [A] to your origin, followed by the transform from the origin to [B]. You can get that by
transform = [B]*inverse([A])
Now conceptually what you need to do is to take your first image, project its pixels onto the geometric interpretation of your image in 3d space, then transform those pixels in 3d space by the transform above, then project them back onto a new 2d image with your camera matrix. Those steps need to be combined into a single 3x3 matrix.
cv::Matx33f convert_3x4_to_3x3(cv::Matx34f pose, cv::Matx33f camera_mat, float zpos)
{
//converted condenses the 3x4 matrix which transforms a point in world space
//to a 3x3 matrix which transforms a point in world space. Instead of
//multiplying pose by a 4x1 3d homogeneous vector, by specifying that the
//incoming 3d vectors will ALWAYS have a z coordinate of zpos, one can instead
//multiply converted by a homogeneous 2d vector and get the same output for x and y.
cv::Matx33f converted(pose(0,0),pose(0,1),pose(0,2)*zpos+pose(0,3),
pose(1,0),pose(1,1),pose(1,2)*zpos+pose(1,3),
pose(2,0),pose(2,1),pose(2,2)*zpos+pose(2,3));
//This matrix will take a homogeneous 2d coordinate and "projects" it onto a
//flat plane at zpos. The x and y components of the incoming homogeneous 2d
//coordinate will be correct, the z component is dropped.
cv::Matx33f projected(1,0,0,
0,1,0,
0,0,zpos);
projected = projected*camera_mat.inv();
//now we have the pieces. A matrix which can take an incoming 2d point, and
//convert it into a pseudo 3d point (x and y correspond to 3d, z is unused)
//and a matrix which can take our pseudo 3d point and transform it correctly.
//Now we just need to turn our transformed pseudo 3d point back into a 2d point
//in our new image, to do that simply multiply by the camera matrix.
return camera_mat*converted*projected;
}
This is probably a more complicated answer than you were looking for but I hope it gives you an idea of what you are asking. This can be very confusing and I glazed over some parts of it quickly, feel free to ask for clarification. If you need the solution to work without the assumption that the initial image appears without rotation let me know, I just didn't want to make it more complicated than it needed to be.

Finding Homography And Warping Perspective

With FeatureDetector I get features on two images with the same element and match this features with BruteForceMatcher.
Then I'm using OpenCv function findHomography to get homography matrix
H = findHomography( src2Dfeatures, dst2Dfeatures, outlierMask, RANSAC, 3);
and getting H matrix, then align image with
warpPerspective(img1,alignedSrcImage,H,img2.size(),INTER_LINEAR,BORDER_CONSTANT);
I need to know rotation angle, scale, displacement of detected element. Is there any simple way to get this than some big equations? Some evaluated formulas just to put data in?
Homography would match projections of your elements lying on a plane or lying arbitrary in 3D if the camera goes through a pure rotation or zoom and no translation. So here are the cases we are talking about with indication of what is the input to our calculations:
- planar target, pure rotation, intra-frame homography
- planar target, rotation and translation, target to frame homography
- 3D target, pure rotation, frame to frame mapping (constrained by a fundamental matrix)
In case of the planar target, a pure rotation is easy to calculate through your frame-to-frame Homography (H12):
given intrinsic camera matrix A, plane to image homographies for frame H1, and H2 that can be expressed as H1=A, H2=A*R, H12 = H2*H1-1=ARA-1 and thus R=A-1H12*A
In case of elements lying on a plane, rotation with translation of the camera (up to unknown scale) can be calculated through decomposition of target-to-frame homography. Note that the target can be just one of the views. Assuming you have your original planar target as an image (taken at some reference orientation) your task is to decompose the homography between images H12 which can be done through SVD. The first two columns of H represent the first two columns of the rotation martrix and be be recovered through H=ULVT, [r1 r2] = UDVT where D is 3x2 Identity matrix with the last row being all 0. The third column of a rotation matrix is just a vector product of the first two columns. The last column of the Homography is a translation vector times some constant.
Finally for arbitrary configuration of points in 3D and pure camera rotation, the rotation is calculated using the essential matrix decomposition rather than homography, see this
cv::decomposeProjectionMatrix();
and
cv::RQDecomp3x3();
are both similar to what you want to achive.
None of them is perfect. The theory behind them and why you cannot extract all params from a 3x3 matrix is a bit cumbersome. But the short answer is that a 3x3 proj matrix is a simplification from the complete 4x4 one, based on the fact that all points stay in the same plane.
You can try to use levenberg marquardt optimalization, where parameters will be translation and rotation, equations will represent by computed distances between features from two images(use only inliers from ransac homography).
Here is C++ implementation of LM http://www.ics.forth.gr/~lourakis/levmar/

Resources