Image 3D rotation OpenCV

I need to perform a 3D rotation of a 2D image about the x and y axes.
I read that I have to use a homography matrix in OpenCV, but I don't know how to set the matrix to perform a common rotation angle, for example 30 degrees about the x axis or 45° about the y axis.
I read this post: Translating and Rotating an Image in 3D using OpenCV. I have tried different values of f, but it doesn't work.
I want to know which parameters of the matrix I have to change and how (formula).
Thank you!

Follow that same post, but replace your rotation matrix. Familiarize yourself with the Rodrigues() function. You can send it a 1 x 3 array of the x, y, and z rotations (a rotation vector, in radians). It will give you a 3 x 3 rotation matrix. Plug this matrix in as the first 3 columns and 3 rows of R (leave the rest the same). If you don't want any translation, make sure you set the variable dist to 0 in the code on that page.
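As a rough illustration of what Rodrigues() computes, here is a hand-written sketch of Rodrigues' rotation formula in plain C++ (no OpenCV). The names `Mat3`, `rotationFromVector` and `degToRad` are made up for this example. The input is treated as a rotation vector: its direction is the rotation axis and its length is the angle in radians, so a pure 30 degree rotation about x is simply (30° in radians, 0, 0).

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical helper: degrees to radians.
double degToRad(double deg) { return deg * 3.14159265358979323846 / 180.0; }

// Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix.
// Assumption: the vector's length is the rotation angle in radians.
Mat3 rotationFromVector(double rx, double ry, double rz) {
    const double theta = std::sqrt(rx * rx + ry * ry + rz * rz);
    if (theta < 1e-12) {
        // No rotation: return the identity.
        return Mat3{{{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}}};
    }
    // Unit rotation axis.
    const double x = rx / theta, y = ry / theta, z = rz / theta;
    const double c = std::cos(theta), s = std::sin(theta), t = 1.0 - c;
    return Mat3{{{c + t * x * x,     t * x * y - s * z, t * x * z + s * y},
                 {t * x * y + s * z, c + t * y * y,     t * y * z - s * x},
                 {t * x * z - s * y, t * y * z + s * x, c + t * z * z}}};
}
```

For example, `rotationFromVector(degToRad(30.0), 0.0, 0.0)` yields the usual rotation about the x axis, which would then go into the upper-left 3 x 3 block of R.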

Related

Coordinate transformation in OpenCV

I have a polyline figure, given as an array of relative x and y point coordinates (0.0 to 1.0).
I have to draw the figure with random position, scale and rotation angle.
How can I do it in the best way?
You could use a simple transformation with an RT matrix.
Let X = (x y 1)^t be the coordinates of one point of your figure. Let R be a 2x2 rotation matrix and T a 2x1 translation vector of the transformation you plan to make. The RT matrix A has the form A = [R T; 0 0 1]. To get the transformed coordinates of point X, compute AX = X', where X' are the new coordinates. To transform the whole figure at once, replace the single column with a matrix in which each column holds one point: x in the first row, y in the second, and 1 in the third.
Of course you can try the functions provided by OpenCV, shown in this tutorial, or the ones intended for vectors of points instead of whole images, but the way above makes you actually understand what you are doing ;)
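A minimal sketch of the AX = X' computation in plain C++, applied point by point. The names `Point2` and `transformPolyline` are invented for this example, and folding the scale factor into R (so the upper-left block becomes s*R) is an assumption added here because the question also asks for scaling.

```cpp
#include <cmath>
#include <vector>

struct Point2 { double x, y; };

// Applies A = [s*R T; 0 0 1] to every point of the figure.
// angleRad, scale, tx and ty would be drawn at random by the caller.
std::vector<Point2> transformPolyline(const std::vector<Point2>& pts,
                                      double angleRad, double scale,
                                      double tx, double ty) {
    const double c = scale * std::cos(angleRad);
    const double s = scale * std::sin(angleRad);
    std::vector<Point2> out;
    out.reserve(pts.size());
    for (const Point2& p : pts) {
        // First two rows of A times (x, y, 1)^t.
        out.push_back({c * p.x - s * p.y + tx,
                       s * p.x + c * p.y + ty});
    }
    return out;
}
```

With relative coordinates in the range 0.0 to 1.0, the scale would typically also convert to pixels, and the translation would place the figure inside the image.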

finding the real world coordinates of an image point

I have been searching through lots of resources on the internet for many days, but I couldn't solve the problem.
I have a project in which I am supposed to detect the position of a circular object on a plane. Since it is on a plane, all I need is the x and y position (not z). For this purpose I have chosen to go with image processing. The position and orientation of the camera (single view, not stereo) are fixed with respect to a reference coordinate system on the plane and are known.
I have detected the image pixel coordinates of the centers of the circles using OpenCV. All I need now is to convert these coordinates to real-world coordinates.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
On this site and on other sites as well, a homographic transformation is given as:
p = C[R|T]P, where P is the real-world coordinates and p is the pixel coordinates (in homogeneous coordinates). C is the camera matrix representing the intrinsic parameters, R is the rotation matrix and T is the translation vector. I have followed a tutorial on calibrating the camera with OpenCV (using the cameraCalibration source file). I have 9 good chessboard images, and as output I have the intrinsic camera matrix and the translation and rotation parameters of each image.
I have the 3x3 intrinsic camera matrix (focal lengths and center pixels) and a 3x4 extrinsic matrix [R|T], in which R is the left 3x3 and T is the right 3x1. According to the formula p = C[R|T]P, I assume that multiplying these parameter matrices by P (world) gives p (pixel). But what I need is to project the p (pixel) coordinates to P (world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from my linear algebra course, we can manipulate the transformation as P = [R|T]^-1 * C^-1 * p. However, this is in the Euclidean coordinate system. I don't know whether such a thing is possible in homogeneous coordinates. Moreover, the 3x4 [R|T] matrix is not invertible, and I don't know whether this is the correct way to go.
The intrinsic and extrinsic parameters are known. All I need is the real-world coordinate projected onto the ground plane. Since the point is on a plane, the coordinates will be 2-dimensional (depth is not important, which addresses the usual objection to single-view geometry). The camera is fixed (position, orientation). How should I find the real-world coordinates of a point in an image captured by a camera (single view)?
EDIT
I have been reading "Learning OpenCV" by Gary Bradski & Adrian Kaehler. On page 386, under Calibration -> Homography, it is written: q = sMWQ, where M is the camera intrinsic matrix, W is the 3x4 [R|T], and s is an "up to" scale factor that I assume is related to the homography concept (I am not clear on this). q is the pixel coordinate and Q is the real-world coordinate. It says that to get the real-world coordinate (on the chessboard plane) of an object detected on the image plane, set Z = 0; then the third column of W is also unnecessary (the rotation about that axis, I assume), and trimming these parts leaves W as a 3x3 matrix. H = MW is then a 3x3 homography matrix. Now we can invert the homography matrix and left-multiply q by it to get Q = [X Y 1], where the Z coordinate was trimmed.
I applied the algorithm mentioned above, and I got results that cannot lie between the image corners (the image plane was parallel to the camera plane, about 30 cm in front of the camera, and I got results like 3000). (The chessboard square sizes were entered in millimeters, so I assume the output real-world coordinates are also in millimeters.) Anyway, I am still trying things. By the way, the raw results were very, very large, but I divide all values in Q by the third component of Q to get (X, Y, 1).
FINAL EDIT
I could not get the camera calibration methods to work. Anyway, I should have started with perspective projection and transforms. This way I made very good estimates with a perspective transform between the image plane and the physical plane (generating the transform from 4 pairs of corresponding coplanar points on both planes). Then I simply applied the transform to the image pixel points.
You said "I have the intrinsic camera matrix, and translational and rotational params of each of the image." but these are the translation and rotation from your camera to your chessboard. They have nothing to do with your circle. However, if you really have translation and rotation matrices, then getting the 3D point is really easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C^-1 * [u, v, 1], where u = col - w/2 and v = h/2 - row, where col and row are the image column and row and w and h are the image width and height. As a result you will obtain a 3D point in so-called camera normalized coordinates, p = [x, y, z]^T. All you need to do now is subtract the translation and apply the transposed rotation: P = R^T (p - T). The order of operations is the inverse of the original, which was rotate and then translate; note that the transposed rotation does the inverse operation of the original rotation but is much faster to calculate than R^-1.

Extract face rotation from homography in a video

I'm trying to determine the orientation of a face in a video.
The video starts with a frontal image of the face, so it has no rotation. In the following frames the head rotates, and I'm trying to determine the rotation, which will let me determine the face orientation relative to the camera position.
I'm using OpenCV and C++ for the job.
I'm using SURF descriptors to find points on the face, which I use to calculate a homography between the two images. Since the two frames are very close to each other, the head rotation will be minimal in that interval and my homography matrix will be close to the identity matrix.
This is my homography matrix:
H = findHomography(k1,k2,RANSAC,8);
where k1 and k2 are the keypoints extracted with SURF.
I'm using decomposeProjectionMatrix to extract the rotation matrix, but now I'm not sure how to interpret the rotMatrix. This one too is basically (1 0 0; 0 1 0; 0 0 1) (where the 0s are numbers in a range from e-10 to e-16).
In theory, what I was trying to do was find the angle of the rotation at each frame and store it somewhere, so that if I get a 1° change in each frame, after 10 frames I know that my head has changed its orientation by 10°.
I spent some time reading everything I could find about QR decomposition, homography matrices and so on, but I haven't been able to get around this. Hence, any help would be really appreciated.
Thanks!
The upper-left 2x2 of the homography matrix is a 2D rotation matrix. If you work through the multiplication of the matrix with a point (i.e. take R*p), you'll see it's equivalent to:
newX = oldVector dot firstRow
newY = oldVector dot secondRow
In other words, the first row of the matrix is a unit vector which is the x axis of the new head. (If there's a scale difference between the frames it won't be a unit vector, but this method will still work.) So you should be able to calculate
rotation = atan2(second entry of first row, first entry of first row)
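A small sketch of that atan2 computation in plain C++, with the homography stored as a 3x3 array. The names `Mat3`, `angleFromFirstRow` and `accumulateDegrees` are invented for this example. One caveat, stated as an assumption about conventions: if the upper-left block has the form (cos a, -sin a; sin a, cos a), the formula as written returns -a, so the sign may need flipping depending on which direction you count as positive.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// atan2(second entry of first row, first entry of first row), in radians.
// Only captures the in-plane component of the motion between two frames.
double angleFromFirstRow(const Mat3& H) {
    return std::atan2(H[0][1], H[0][0]);
}

// Adds the per-frame angle (converted to degrees) to a running total,
// as described in the question.
double accumulateDegrees(double totalDeg, const Mat3& H) {
    return totalDeg + angleFromFirstRow(H) * 180.0 / 3.14159265358979323846;
}
```

Calling `accumulateDegrees` once per frame pair keeps the running orientation estimate; small per-frame errors will add up over a long video.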

Rotate, Scale and Translate around image centre in OpenCV

I really hope this isn't a waste of anyone's time but I've run into a small problem. I am able to construct the transformation matrix using the following:
M =
s*cos(theta) -s*sin(theta) t_x
s*sin(theta) s*cos(theta) t_y
0 0 1
This works if I give the correct values for theta, s (scale) and t_x/t_y and then use this matrix as one of the arguments for cv::warpPerspective. The problem is that this matrix rotates about the (0,0) pixel, whereas I would like it to rotate about the centre pixel (cols/2, rows/2). How can I incorporate the centre point rotation into this matrix?
Two possibilities. The first is to use the function getRotationMatrix2D which takes the center of rotation as an argument, and gives you a 2x3 matrix. Add the third row and you're done.
A second possibility is to construct an additional matrix that translates the picture before and after the rotation:
T =
1 0 -cols/2
0 1 -rows/2
0 0 1
Multiply your rotation matrix M with this one and with its inverse to get the total transform T^-1 * M * T (e.g. with the function gemm), where T^-1 is the same matrix with +cols/2 and +rows/2, and apply the result with warpPerspective.
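The second possibility can be sketched in plain C++ with hand-rolled 3x3 matrices (in OpenCV code the same products would be formed on cv::Mat objects). The names `Mat3`, `multiply`, `translation` and `similarityAboutCentre` are made up for this example; the order of the product assumes column vectors, so T is applied first and T^-1 last.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Plain 3x3 matrix product A * B.
Mat3 multiply(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Homogeneous translation by (dx, dy).
Mat3 translation(double dx, double dy) {
    return Mat3{{{1.0, 0.0, dx}, {0.0, 1.0, dy}, {0.0, 0.0, 1.0}}};
}

// Builds T^-1 * M * T: move the centre to the origin, rotate and scale
// (plus the extra shift t_x, t_y), then move the centre back.
Mat3 similarityAboutCentre(double theta, double s, double tx, double ty,
                           double cols, double rows) {
    const double c = s * std::cos(theta), sn = s * std::sin(theta);
    const Mat3 M = {{{c, -sn, tx}, {sn, c, ty}, {0.0, 0.0, 1.0}}};
    const Mat3 T = translation(-cols / 2.0, -rows / 2.0);
    const Mat3 Tinv = translation(cols / 2.0, rows / 2.0);
    return multiply(Tinv, multiply(M, T));
}
```

With t_x = t_y = 0 the centre pixel maps to itself, which is a quick way to check that the matrices were multiplied in the intended order.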

trying to understand the Affine Transform

I am playing with the affine transform in OpenCV and I am having trouble getting an intuitive understanding of its workings, and more specifically, just how to specify the parameters of the map matrix so I can get a specific desired result.
To set up the question, the procedure I am using is first to define a warp matrix, then do the transform.
In OpenCV the 2 routines are (I am using an example in the excellent book OpenCV by Bradski & Kaehler):
cvGetAffineTransform(srcTri, dstTri, warp_matrix);
cvWarpAffine(src, dst, warp_matrix);
To define the warp matrix, srcTri and dstTri are defined as:
CvPoint2D32f srcTri[3], dstTri[3];
srcTri[3] is populated as follows:
srcTri[0].x = 0;
srcTri[0].y = 0;
srcTri[1].x = src->width - 1;
srcTri[1].y = 0;
srcTri[2].x = 0;
srcTri[2].y = src->height -1;
This is essentially the top left point, top right point, and bottom left point of the image for starting point of the matrix. This part makes sense to me.
But the values for dstTri[3] are just confusing; at least, when I vary a single point, I do not get the result I expect.
For example, if I then use the following for the dstTri[3]:
dstTri[0].x = 0;
dstTri[0].y = 0;
dstTri[1].x = src->width - 1;
dstTri[1].y = 0;
dstTri[2].x = 0;
dstTri[2].y = 100;
It seems that the only difference between the src and the dst point is that the bottom left point is moved to the right by 100 pixels. Intuitively, I feel that the bottom part of the image should be shifted to the right by 100 pixels, but this is not so.
Also, if I use the exact same values for dstTri[3] that I use for srcTri[3], I would think that the transform would produce the exact same image--but it does not.
Clearly, I do not understand what is going on here. So, what does the mapping from the srcTri[] to the dstTri[] represent?
Here is a mathematical explanation of an affine transform:
It is a 3x3 matrix that applies the following transformations to a 2D vector: scale in the X axis, scale in the Y axis, rotation, skew, and translation in the X and Y axes.
These are 6 transformations, and thus you have six free elements in your 3x3 matrix. The bottom row is always [0 0 1].
Why? Because the bottom row represents the perspective transformation in the x and y axes, and an affine transformation does not include a perspective transform.
(If you want to apply perspective warping, use a homography: also a 3x3 matrix.)
What is the relation between the 6 values you insert into the affine matrix and the 6 transformations it does? Let us look at this 3x3 matrix:
e*Zx*cos(a), -q1*sin(a), dx,
e*q2*sin(a), Zy*cos(a), dy,
0, 0, 1
The dx and dy elements are the translation in the x and y axes (they just move the picture left-right and up-down).
Zx is the relative scale (zoom) you apply to the image in the X axis.
Zy is the same as above for the Y axis.
a is the angle of rotation of the image. This is tricky, since when you want to rotate by 'a' you have to insert sin() and cos() in 4 different places in the matrix.
'q' is the skew parameter. It is rarely used. It will cause your image to skew to the side (q1 makes the y axis affect the x axis and q2 makes the x axis affect the y axis).
Bonus: the 'e' parameter is actually not a transformation. It can have the values 1 and -1. If it is 1 then nothing happens, but if it is -1 then the image is flipped horizontally. You can also use it to flip the image vertically, but this type of transformation is rarely used.
Very important note!
The above explanation is mathematical. It assumes you multiply the matrix by the column vector from the right. As far as I remember, Matlab uses reverse multiplication (row vector from the left), so you will need to transpose this matrix. I am pretty sure that OpenCV uses regular multiplication, but you need to check it.
To check, enter a pure translation matrix (x shifted by 10 pixels, y by 1):
1,0,10
0,1,1
0,0,1
If you see a normal shift then everything is OK, but if something else appears then transpose the matrix to:
1,0,0
0,1,0
10,1,1
