Rotate, Scale and Translate around image centre in OpenCV

Rotate, Scale and Translate around image centre in OpenCV - image-processing

I really hope this isn't a waste of anyone's time but I've run into a small problem. I am able to construct the transformation matrix using the following:
M =
s*cos(theta) -s*sin(theta) t_x
s*sin(theta) s*cos(theta) t_y
0 0 1
This works if I give the correct values for theta, s (scale) and tx/ty and then use this matrix as one of the arguments for cv::warpPerspective. The problem lies in that this matrix rotates about the (0,0) pixel whereas I would like it to rotate about the centre pixel (cols/2, rows/2). How can incoporate the centre point rotation into this matrix?

Two possibilities. The first is to use the function getRotationMatrix2D which takes the center of rotation as an argument, and gives you a 2x3 matrix. Add the third row and you're done.
A second possibility is to construct an additional matrix that translates the picture before and after the rotation:
T =
1 0 -cols/2
0 1 -rows/2
0 0 1
Multiply your rotation matrix M with this one to get the total transform -TMT (e.g. with function gemm) and apply this one with warpPerspective.

Related

OpenCv warpPerspective meaning of elements of homography

I have a question regarding the meaning of the elements from an projective transformation matrix e.g. in an homography used by OpenCv warpPerspective.
I know the basic of an affin transformation, but here I'm more interested in the projective transformation, meaning in the below shown matrix the elements A31 and A32:
A11 A12 A13
A21 A22 A23
A31 A32 1
I played around with the values a bit which means having a fixed numbers for all other element. Meaning:
1 0 0
0 1 0
A31 A32 1
to have just the projective elements.
But what exactly causing the elements A31 and A32 ? Like A13 and A23 are responsible for the horizontal and vertical translation.
Is there an simple explanation for this two elements? Like having a positive value means ...., having a negativ value meaning ... . S.th. like that.
Hope anyone can help me.

Newton's descriptions are correct, but it might be helpful to actually see the transformations to understand what's going on, and how they might work together with other values in the transformation matrix to make a bit more sense. I'll give some python/OpenCV examples with animations to show what these values do.
import numpy as np
import cv2
img = cv2.imread('img1.png')
h, w = img.shape[:2]
# initializations
max_m20 = 2e-3
nsteps = 50
M = np.eye(3)
So here I'm setting the transformation matrix to be the identity (no transformation). We want to see the effect of changing the element at (2, 0) in the transformation matrix M, so we'll animate by looping through nsteps linearly spaced between 0 to max_m20.
for m20 in np.linspace(0, max_m20, nsteps):
M[2, 0] = m20
warped = cv2.warpPerspective(img, M, (w, h))
cv2.imshow('warped', warped)
k = cv2.waitKey(1)
if k == ord('q') & 0xFF:
break
I applied this on an image taken from Oxford's Visual Geometry Group.
So indeed, we can see that this is similar to either rotating your camera around a point that is aligned with the left edge of the image, or rotating the image itself around an axis. However, it is a little different than that. Note that the top edge stays along the top the whole time, which is a little strange. Instead of we rotate around an axis like above, we would imagine that the top edge would start to come down on the right edge too. Like this:
Well, if you're thinking about transformations, one easy way to get this transformation is to take the transformation above, and add some skew distortion so that the right top side is being pushed down as that bottom right corner is being pushed up. And that's actually exactly how this view was created:
M = np.eye(3)
max_m20 = 2e-3
max_m10 = 0.6
for m20, m10 in zip(np.linspace(0, max_m20, nsteps), np.linspace(0, max_m10, nsteps)):
M[2, 0] = m20
M[1, 0] = m10
warped = cv2.warpPerspective(img, M, (w, h))
cv2.imshow('warped', warped)
k = cv2.waitKey(1)
if k == ord('q') & 0xFF:
break
So the right way to think about the perspective in these matrices is, IMO, with the skew entries and the last row together. Those are the two places in the homography matrix where angles actually get modified*; otherwise, it's just rotation, scaling, and translation---all of which are angle preserving.
*Note: Actually, angles can be changed in one more way that I didn't mention. Affine transformations allow for non-uniform scaling, which means you can stretch a shape in width and not in height or vice-versa, which would also change the angles. Imagine if you had a triangle and stretched it only in width; the angles would change. So it turns out that non-uniform scaling (i.e. when the first and middle element of the transformation matrix are different values) can also modify angles in addition to the perspective change and shearing distortions.
Note that in these examples, the same applies to the second entry in the last row with the other skew location; the only difference is it happens at the top instead of the left side. Negative values in both cases is akin to rotating the plane along that axis towards, instead of farther away from, the camera.

The 3x1 ,3x2 elements of homography matrix change the plane of the image. Thats the difference between Affine and Homography matrices. For instance consider this- The A31 changes the plane of your image along the left edge. Its like sticking your image to a stick like a flag and rotating. The positive is clock wise and the negative is reverse. The other element does the same from the top edge. But together, they set a plane for your image. That's the simplest way i could put it.

Extract face rotation from homography in a video

I'm trying to determine the orientation of a face in a video.
The video starts with the frontal image of the face, so it has no rotation. In the following frames the head rotates and i'm trying to determine the rotation, which will lead me to determine the face orientation based on the camera position.
I'm using OpenCV and C++ for the job.
I'm using SURF descriptors to find points on the face which i use to calculate an homography between the two images. Being the two frames very close to each other, the head rotation will be minimal in that interval and my homography matrix will be close to the identity matrix.
This is my homography matrix:
H = findHomography(k1,k2,RANSAC,8);
where k1 and k2 are the keypoints extracted with SURF.
I'm using decomposeProjectionMatrix to extract the rotation matrix but now i'm not sure how to interpret the rotMatrix. This one too is basically (1 0 0; 0 1 0; 0 0 1) (where the 0 are numbers in a range from e-10 to e-16).
In theory, what is was trying to do was to find the angle of the rotation at each frame and store it somewhere, so that if i get a 1° change in each frame, after 10 frames i know that my head has changed its orientation by 10°.
I spend some time reading everything i could find about QR decomposition, homography matrices and so on, but i haven't been able to get around this. Hence, any help would be really appreciated.
Thanks!

The upper-left 2x2 of the homography matrix is a 2D rotation matrix. If you work through the multiplication of the matrix with a point (i.e. take R*p), you'll see it's equivalent to:
newX = oldVector dot firstRow
newY = oldVector dot secondRow
In other words, the first row of the matrix is a unit vector which is the x axis of the new head. (If there's a scale difference between the frames it won't be a unit vector, but this method will still work.) So you should be able to calculate
rotation = atan2(second entry of first row, first entry of first row)

Image 3D rotation OpenCV

I need to perform a 3D rotation of a 2D image on x and y axis.
I read that i have to use the Homographic matrix on OpenCV , but i don't know how to set the matrix to perform a common rotation angle. For example 30 degree on x axis or 45° on y axis.
I read this post : Translating and Rotating an Image in 3D using OpenCV. I have tried different values of the f but it doesn't work.
I want to know which parameters of the matrix i have to change and how (formula).
Thank you!

Follow that same post, but replace your rotation matrix. Familiarize yourself with the Rorigues() function. You can send it a 1 x 3 array of the x, y, and z rotations. It will give you a a 3 x 3 rotation matrix. Plug this matrix in as the first 3 columns and 3 rows of R (leave the rest the same). If you don't want any translation, make sure you set the variable dist to 0 in the code on that page.

Redraw image from 3d perspective to 2d

I need an inverse perspective transform written in Pascal/Delphi/Lazarus. See the following image:
I think I need to walk through destination pixels and then calculate the corresponding position in the source image (To avoid problems with rounding errors etc.).
function redraw_3d_to_2d(sourcebitmap:tbitmap, sourceaspect:extended, point_a, point_b, point_c, point_d:tpoint, megapixelcount:integer):tbitmap;
var
destinationbitmap:tbitmap;
x,y,sx,sy:integer;
begin
destinationbitmap:=tbitmap.create;
destinationbitmap.width=megapixelcount*sourceaspect*???; // I dont how to calculate this
destinationbitmap.height=megapixelcount*sourceaspect*???; // I dont how to calculate this
for x:=0 to destinationbitmap.width-1 do
for y:=0 to destinationbitmap.height-1 do
begin
sx:=??;
sy:=??;
destinationbitmap.canvas.pixels[x,y]=sourcebitmap.canvas.pixels[sx,sy];
end;
result:=destinationbitmap;
end;
I need the real formula... So an OpenGL solution would not be ideal...

Note: There is a version of this with proper math typesetting on the Math SE.
Computing a projective transformation
A perspective is a special case of a projective transformation, which in turn is defined by four points.
Step 1: Starting with the 4 positions in the source image, named (x1,y1) through (x4,y4), you solve the following system of linear equations:
[x1 x2 x3] [λ] [x4]
[y1 y2 y3]∙[μ] = [y4]
[ 1 1 1] [τ] [ 1]
The colums form homogenous coordinates: one dimension more, created by adding a 1 as the last entry. In subsequent steps, multiples of these vectors will be used to denote the same points. See the last step for an example of how to turn these back into two-dimensional coordinates.
Step 2: Scale the columns by the coefficients you just computed:
[λ∙x1 μ∙x2 τ∙x3]
A = [λ∙y1 μ∙y2 τ∙y3]
[λ μ τ ]
This matrix will map (1,0,0) to a multiple of (x1,y1,1), (0,1,0) to a multiple of (x2,y2,1), (0,0,1) to a multiple of (x3,y3,1) and (1,1,1) to (x4,y4,1). So it will map these four special vectors (called basis vectors in subsequent explanations) to the specified positions in the image.
Step 3: Repeat steps 1 and 2 for the corresponding positions in the destination image, in order to obtain a second matrix called B.
This is a map from basis vectors to destination positions.
Step 4: Invert B to obtain B⁻¹.
B maps from basis vectors to the destination positions, so the inverse matrix maps in the reverse direction.
Step 5: Compute the combined Matrix C = A∙B⁻¹.
B⁻¹ maps from destination positions to basis vectors, while A maps from there to source positions. So the combination maps destination positions to source positions.
Step 6: For every pixel (x,y) of the destination image, compute the product
[x'] [x]
[y'] = C∙[y]
[z'] [1]
These are the homogenous coordinates of your transformed point.
Step 7: Compute the position in the source image like this:
sx = x'/z'
sy = y'/z'
This is called dehomogenization of the coordinate vector.
All this math would be so much easier to read and write if SO were to support MathJax… ☹
Choosing the image size
The above aproach assumes that you know the location of your corners in the destination image. For these you have to know the width and height of that image, which is marked by question marks in your code as well. So let's assume the height of your output image were 1, and the width were sourceaspect. In that case, the overall area would be sourceaspect as well. You have to scale that area by a factor of pixelcount/sourceaspect to achieve an area of pixelcount. Which means that you have to scale each edge length by the square root of that factor. So in the end, you have
pixelcount = 1000000.*megapixelcount;
width = round(sqrt(pixelcount*sourceaspect));
height = round(sqrt(pixelcount/sourceaspect));

Use Graphics32, specifically TProjectiveTransformation (to use with the Transform method). Don't forget to leave some transparent margin in your source image so you don't get jagged edges.

trying to understand the Affine Transform

I am playing with the affine transform in OpenCV and I am having trouble getting an intuitive understanding of it workings, and more specifically, just how do I specify the parameters of the map matrix so I can get a specific desired result.
To setup the question, the procedure I am using is 1st to define a warp matrix, then do the transform.
In OpenCV the 2 routines are (I am using an example in the excellent book OpenCV by Bradski & Kaehler):
cvGetAffineTransorm(srcTri, dstTri, warp_matrix);
cvWarpAffine(src, dst, warp_mat);
To define the warp matrix, srcTri and dstTri are defined as:
CvPoint2D32f srcTri[3], dstTri[3];
srcTri[3] is populated as follows:
srcTri[0].x = 0;
srcTri[0].y = 0;
srcTri[1].x = src->width - 1;
srcTri[1].y = 0;
srcTri[2].x = 0;
srcTri[2].y = src->height -1;
This is essentially the top left point, top right point, and bottom left point of the image for starting point of the matrix. This part makes sense to me.
But the values for dstTri[3] just are confusing, at least, when I vary a single point, I do not get the result I expect.
For example, if I then use the following for the dstTri[3]:
dstTri[0].x = 0;
dstTri[0].y = 0;
dstTri[1].x = src->width - 1;
dstTri[1].y = 0;
dstTri[2].x = 0;
dstTri[2].y = 100;
It seems that the only difference between the src and the dst point is that the bottom left point is moved to the right by 100 pixels. Intuitively, I feel that the bottom part of the image should be shifted to the right by 100 pixels, but this is not so.
Also, if I use the exact same values for dstTri[3] that I use for srcTri[3], I would think that the transform would produce the exact same image--but it does not.
Clearly, I do not understand what is going on here. So, what does the mapping from the srcTri[] to the dstTri[] represent?

Here is a mathematical explanation of an affine transform:
this is a matrix of size 3x3 that applies the following transformations on a 2D vector: Scale in X axis, scale Y, rotation, skew, and translation on the X and Y axes.
These are 6 transformations and thus you have six elements in your 3x3 matrix. The bottom row is always [0 0 1].
Why? because the bottom row represents the perspective transformation in axis x and y, and affine transformation does not include perspective transform.
(If you want to apply perspective warping use homography: also 3x3 matrix )
What is the relation between 6 values you insert into affine matrix and the 6 transformations it does? Let us look at this 3x3 matrix like
e*Zx*cos(a), -q1*sin(a) , dx,
e*q2*sin(a), Z y*cos(a), dy,
0 , 0 , 1
The dx and
dy elements are translation in x and y axis (just move the picture left-right, up down).
Zx is the relative scale(zoom) you apply to the image in X axis.
Zy is the same as above for y axis
a is the angle of rotation of the image. This is tricky since when you want to rotate by 'a' you have to insert sin(), cos() in 4 different places in the matrix.
'q' is the skew parameter. It is rarely used. It will cause your image to skew on the side (q1 causes y axis affects x axis and q2 causes x axis affect y axis)
Bonus: 'e' parameter is actually not a transformation. It can have values 1,-1. If it is 1 then nothing happens, but if it is -1 than the image is flipped horizontally. You can use it also to flip the image vertically but, this type of transformation is rarely used.
Very important Note!!!!!
The above explanation is mathematical. It assumes you multiply the matrix by the column vector from the right. As far as I remember, Matlab uses reverse multiplication (row vector from the left) so you will need to transpose this matrix. I am pretty sure that OpenCV uses regular multiplication but you need to check it.
Just enter only translation matrix (x shifted by 10 pixels, y by 1).
1,0,10
0,1,1
0,0,1
If you see a normal shift than everything is OK, but If shit appears than transpose the matrix to:
1,0,0
0,1,0
10,1,1

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart