I have a question regarding the meaning of the elements of a projective transformation matrix, e.g. a homography as used by OpenCV's warpPerspective.
I know the basics of affine transformations, but here I'm more interested in the projective part, meaning the elements A31 and A32 in the matrix shown below:
A11 A12 A13
A21 A22 A23
A31 A32 1
I played around with the values a bit while keeping all other elements fixed, meaning:
1 0 0
0 1 0
A31 A32 1
to isolate the projective elements.
But what exactly do the elements A31 and A32 cause? For comparison, A13 and A23 are responsible for the horizontal and vertical translation.
Is there a simple explanation for these two elements? Something like: a positive value means ..., a negative value means ...
I hope someone can help me.
Newton's descriptions are correct, but it might be helpful to actually see the transformations to understand what's going on, and how they might work together with other values in the transformation matrix to make a bit more sense. I'll give some python/OpenCV examples with animations to show what these values do.
import numpy as np
import cv2
img = cv2.imread('img1.png')
h, w = img.shape[:2]
# initializations
max_m20 = 2e-3
nsteps = 50
M = np.eye(3)
So here I'm setting the transformation matrix to be the identity (no transformation). We want to see the effect of changing the element at (2, 0) in the transformation matrix M, so we'll animate by looping through nsteps values linearly spaced between 0 and max_m20.
for m20 in np.linspace(0, max_m20, nsteps):
    M[2, 0] = m20
    warped = cv2.warpPerspective(img, M, (w, h))
    cv2.imshow('warped', warped)
    k = cv2.waitKey(1)
    if k & 0xFF == ord('q'):  # press q to stop the animation
        break
I applied this on an image taken from Oxford's Visual Geometry Group.
So indeed, we can see that this is similar to either rotating your camera around a point that is aligned with the left edge of the image, or rotating the image itself around an axis. However, it is a little different from that. Note that the top edge stays along the top the whole time, which is a little strange. If we instead rotated around an axis like above, we would expect the top edge to start coming down on the right side too. Like this:
Well, if you're thinking about transformations, one easy way to get this transformation is to take the transformation above, and add some skew distortion so that the right top side is being pushed down as that bottom right corner is being pushed up. And that's actually exactly how this view was created:
M = np.eye(3)
max_m20 = 2e-3
max_m10 = 0.6
for m20, m10 in zip(np.linspace(0, max_m20, nsteps), np.linspace(0, max_m10, nsteps)):
    M[2, 0] = m20
    M[1, 0] = m10
    warped = cv2.warpPerspective(img, M, (w, h))
    cv2.imshow('warped', warped)
    k = cv2.waitKey(1)
    if k & 0xFF == ord('q'):  # press q to stop the animation
        break
So the right way to think about the perspective in these matrices is, IMO, with the skew entries and the last row together. Those are the two places in the homography matrix where angles actually get modified*; otherwise, it's just rotation, scaling, and translation, all of which are angle preserving.
*Note: Actually, angles can be changed in one more way that I didn't mention. Affine transformations allow for non-uniform scaling, which means you can stretch a shape in width and not in height or vice-versa, which would also change the angles. Imagine if you had a triangle and stretched it only in width; the angles would change. So it turns out that non-uniform scaling (i.e. when the first and middle element of the transformation matrix are different values) can also modify angles in addition to the perspective change and shearing distortions.
Note that in these examples, the same applies to the second entry in the last row together with the other skew location; the only difference is that it happens at the top instead of the left side. Negative values in both cases are akin to rotating the plane along that axis towards, instead of farther away from, the camera.
The (3,1) and (3,2) elements of the homography matrix change the plane of the image. That's the difference between affine and homography matrices. For instance, consider this: A31 changes the plane of your image along the left edge. It's like sticking your image to a stick like a flag and rotating it. Positive is clockwise and negative is the reverse. The other element does the same from the top edge. Together, they set a plane for your image. That's the simplest way I could put it.
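To see numerically what A31 does, here is a minimal sketch that applies the matrix to individual points instead of a whole image. It assumes a hypothetical 100x100 image and the value 2e-3 used in the animation above; apply_homography is an illustrative helper, not an OpenCV function. When WARP_INVERSE_MAP is not set, warpPerspective treats M as the map from source to destination points, which is the direction used here.

```python
def apply_homography(M, x, y):
    """Map the point (x, y) through the 3x3 matrix M, then divide by w."""
    xh = M[0][0] * x + M[0][1] * y + M[0][2]
    yh = M[1][0] * x + M[1][1] * y + M[1][2]
    w = M[2][0] * x + M[2][1] * y + M[2][2]  # the last row only affects w
    return xh / w, yh / w


# Identity except for A31 (row 3, column 1), as in the question.
M = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [2e-3, 0.0, 1.0]]

# Corners of a hypothetical 100x100 image.
for corner in [(0, 0), (100, 0), (100, 100), (0, 100)]:
    print(corner, "->", apply_homography(M, *corner))
```

Points on the left edge (x = 0) have w = 1 and do not move, while points farther to the right are divided by a larger w and are pulled towards the origin. That is why the right side of the image appears to recede while the top edge stays on the top.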
I'm filming with 6 RGB cameras a scene that I want to reconstruct in 3D, kind of like in the following picture. And I forgot to take a calibration chessboard. So I used a blank rectangle board instead and filmed it, as I would film a regular chessboard.
First step, calibration --> OK.
I obviously couldn't use cv2.findChessboardCorners, so I made a small program that allows me to click and store the location of each of the 4 corners. I calibrated from these 4 points for about 10-15 frames as a test.
Tl;Dr: It seemed to work great.
Next step, triangulation. --> NOT OK
I use direct linear transform (DLT) to triangulate my points from all 6 cameras.
Tl;Dr: It's not working so well.
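Image and world coordinates are connected through the projection matrix P of each camera, which can be written as the homogeneous linear system AQ = 0, where Q is the 3D point in homogeneous coordinates.
A singular value decomposition (SVD) of A then gives Q.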
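For reference, a minimal sketch of DLT triangulation with NumPy. The function name triangulate_dlt, the two toy cameras and the test point are made up for illustration; this is the textbook formulation, not necessarily the exact code used in the question.

```python
import numpy as np


def triangulate_dlt(projections, image_points):
    """Triangulate one 3D point from N >= 2 views by solving A Q = 0 with SVD.

    projections  : list of 3x4 projection matrices, one per camera
    image_points : list of (u, v) pixel coordinates, one per camera
    """
    rows = []
    for P, (u, v) in zip(projections, image_points):
        P = np.asarray(P, dtype=float)
        # Each view contributes two linear equations in the homogeneous point Q.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    Q = Vt[-1]
    return Q[:3] / Q[3]  # dehomogenize


def project(P, X):
    """Project a homogeneous 3D point X with the 3x4 matrix P."""
    x = P @ X
    return x[0] / x[2], x[1] / x[2]


# Two made-up cameras with identity intrinsics, the second shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 4.0, 1.0])

X_est = triangulate_dlt([P1, P2], [project(P1, X_true), project(P2, X_true)])
print(X_est)
```

With noise-free input the estimate matches the true point; with real detections, one badly clicked or badly reprojected view is enough to pull a single coordinate away, so checking the per-camera reprojection error of the problematic point is a reasonable first test.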
3 of the 4 points are correctly triangulated, but the blue one that should lie on the origin has a wrong x coordinate.
WHY?
Why only one point, and why only the x coordinate?
Does it have anything to do with the fact that I calibrate from a 4 points board?
If so, can you explain; and if not, what else could it be?
Update: I tried another frame in which the board is somewhere else, and the triangulation is fine.
So here is the mystery: some points are seemingly randomly triangulated wrong (or at least the one at the origin), while most of the others are fine. Again, why?
My guess is that it comes from the triangulation rather than from the calibration, and that there is no connection with my sloppy calibration process.
One common issue I came across is the ambiguity in the solutions found by DLT. Indeed, solving AQ = 0 or solving (AC)(C⁻¹Q) = 0 gives the same solution. See page 46 here. But I don't know what to do about it.
I'm now fairly sure this is not a calibration issue but I don't want to delete this part of my post.
I used ret, K, D, R, T = cv2.calibrateCamera(objpoints, imgpoints, imSize, None, None). It worked seamlessly, and the points were perfectly reprojected on my original image with cv2.projectPoints(objpoints, R, T, K, D).
I computed my projection matrix P as P = K [R | T], with R, _ = cv2.Rodrigues(R).
How is it that I get a solution while I have only 4 points per image? Wouldn't I need at least 6 of them? The projection equation can be solved for P by SVD as a homogeneous linear system, which gives 2 equations per point for the 11 independent unknown parameters of P. So 4 points make 8 equations, which shouldn't be enough. And yet cv2.calibrateCamera still gives a solution. It must be using another method? I came across Perspective-n-Point (PnP); is that what OpenCV uses? In which case, is it directly optimizing K, R, and T, and thus needs fewer points?
I could artificially add a few points to get more than the 4 corner points of my board (for example, the centers of the edges, or the center of the rectangle). But is that really the issue?
When calibrating, one needs to decompose the projection matrix into
intrinsic and extrinsic matrices. But this decomposition is not
unique and has 4 solutions. See there section 'I'm seeing
double' and Chapt.21 of Hartley&Zisserman about Cheirality
for more information. It is not my issue since my camera points
are correctly reprojected to the image plane and my cameras are
correctly set up on my 3D scene.
I did not quite understand what you are asking; it is rather vague. However, I think you are miscalculating your projection matrix.
If I'm not mistaken, you would define 4 3D points representing your rectangle in real-world space, for example like this:
pt_3D = [[ 0 0 0]
[ 0 1 0]
[ 1 1 0]
[ 1 0 0]]
you will then retrieve the corresponding 2D points (in order) of each image, and generate two vectors as follows:
objpoints = [pt_3D, pt_3D, ....] # N times
imgpoints = [pt_2D_img1, pt_2D_img2, ....] # N times (N images)
You can then calibrate your camera and recover the camera poses as well as the projection matrices as follows:
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, imSize, None, None)
for rvec, tvec in zip(rvecs, tvecs):
    reprojected, _ = cv2.projectPoints(pt_3D, rvec, tvec, mtx, dist)  # check the calibration for this view
    Rt, _ = cv2.Rodrigues(rvec)  # rotation from world to camera coordinates
    R = Rt.T
    T = -R @ tvec  # camera position in world coordinates
    pose_Matrix = np.vstack((np.hstack((R, T)), [0, 0, 0, 1]))  # transformation matrix == camera pose
    Projection_Matrix = mtx @ np.linalg.inv(pose_Matrix)[:3, :4]  # projection uses the inverse of the pose (world to camera)
You don't have to apply the DLT or the triangulation: everything is done in the cv2.calibrateCamera() function, and the 3D points remain what you defined yourself.
I want to find sharp edges in a heightmap image, while ignoring shallow edges.
OpenCV offers multiple approaches to finding edges in a 2d Image: Canny, Sobel, etc.
However, all these approaches work by comparing the intensity values on both sides of the edge.
If the 2D Image represents a height map of a 3D object, then this results in some weird behaviour.
In a height map, the height of a 3D object at a given X/Y coordinate is represented as the intensity of the 2D Pixel at that X/Y coordinate:
In the above picture, at the edge B the intensity changes only slightly between the left and right side, even though it is a sharp corner.
At the edge A, there is a big change in the intensity between pixels on the left side of the edge and the right, even though it is only a shallow angle.
So there is no threshold for Canny or Sobel that will preserve the sharp edge but filter the shallow edge.
(In the above example, the edge B has one side with an ascending slope, and one side with a descending slope. I could filter for this feature; but that would remove the edges C and D as well)
How can I get a binary edge image, containing only edges above a certain angle? (e.g. edge B, C, and D, but not A)
Or alternatively, how can I get a gradient derivative image, where the intensity of each pixel is proportional to the angle of the edge at that pixel?
You will probably want to use the second derivative instead of the first for this task.
Here's my intuition: the derivative of height (intensity in your case) at each position on an evenly spaced grid is proportional to the tangent of the surface slope angle between sampling points (or at sampling points if you use a 2-sided derivative approximation), so its arctan gives the slope angle. But since you want to detect sharp edges, you are looking for the derivative of the slope angle at the sampling points. This means that you can set a threshold on the derivative of the arctan of the derivative of intensity to achieve your goal (luckily there's no "need to go deeper" :) )
You will have to be extra careful when taking the derivative of the "slope angles" that you get: depending on the coordinate system, you may run into an ambiguity in the angle difference (there are 2 ways to get from one angle to another, which differ in general; you're looking for the "shorter" one). You can look for a possible solution here
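A minimal 1D sketch of this idea, assuming a single row of the height map and a known sample spacing. The function name sharp_edges, the 30 degree threshold and the sample profile are made up for illustration. In 1D with positive spacing the slope angles stay within ±90°, so the angle ambiguity mentioned above does not arise; it does in 2D when working with gradient directions.

```python
import math


def sharp_edges(heights, spacing=1.0, min_angle_deg=30.0):
    """Return sample indices where the surface bends by more than min_angle_deg."""
    # Slope angle of each segment between neighbouring samples.
    angles = [math.atan2(b - a, spacing) for a, b in zip(heights, heights[1:])]
    edges = []
    for i in range(1, len(angles)):
        # Change of slope angle at sample i (a discrete second derivative).
        bend = abs(angles[i] - angles[i - 1])
        if math.degrees(bend) > min_angle_deg:
            edges.append(i)
    return edges


# Steep ramp whose slope changes only slightly at sample 3 (large height
# differences, shallow bend), followed by a small but sharp peak at sample 8.
profile = [0, 3, 6, 9, 13, 17, 21, 21.5, 22, 21.5, 21]
edges = sharp_edges(profile)
print(edges)
```

The bend at sample 3 is only a few degrees and is ignored despite the large height differences around it, whereas the bends at samples 6 and 8 exceed the threshold even though the heights there barely change.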
I have a rather simple approach that I came across while reading a blog post.
It involves computing the median value of the gray scale image. Using this value we can now set two threshold values:
lower: max(0, (1.0 - 0.33) * v)
upper: min(255, (1.0 + 0.33) * v)
Now pass these two values as parameters into the cv2.Canny() function.
You will now be able to perform an optimized edge detection given any image. The crux of this answer depends on the median value of the image which varies for different images.
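The threshold computation can be sketched as follows. The helper name canny_thresholds and the sample intensities are made up; for a real image you would pass the gray values of all pixels (or use np.median on the image).

```python
from statistics import median


def canny_thresholds(gray_values, sigma=0.33):
    """Derive the two Canny thresholds from the median gray level."""
    v = median(gray_values)
    lower = max(0.0, (1.0 - sigma) * v)
    upper = min(255.0, (1.0 + sigma) * v)
    return lower, upper


# A tiny made-up set of pixel intensities with median 100.
lower, upper = canny_thresholds([90, 100, 110, 95, 105])
print(lower, upper)
# With OpenCV the values would then be used as: edges = cv2.Canny(gray, lower, upper)
```

Note that these thresholds still act on intensity differences, so they adapt Canny to the overall brightness of the image but do not by themselves separate sharp from shallow edges in a height map.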
If I understand your question correctly, what you need is basically a corner with high intensity values.
If that is so then look for Harris corner detector which would help you to find points with high gradient change in both direction.
http://docs.opencv.org/2.4/doc/tutorials/features2d/trackingmotion/harris_detector/harris_detector.html
Once you detect the corners you can filter the corners which have high intensity by using a suitable threshold.
In the below picture, I have the 2D locations of the green points and I want to calculate the locations of the red points, or, as an intermediate step, I want to calculate the locations of the blue points. All in 2D.
Of course, I do not only want to find those locations for the picture above. In the end, I want an automated algorithm which takes a set of checkerboard corner points to calculate the outer corners.
I need the resulting coordinates to be as accurate as possible, so I think that I need a solution which does not only take the outer green points into account, but which also uses all the other green points' locations to calculate a best fit for the outer corners (red or blue).
If OpenCV can do this, please point me into that direction.
In general, if all you have is the detection of some, but not all, the inner corners, the problem cannot be solved. This is because the configuration is invariant to translation - shifting the physical checkerboard by whole squares would produce the same detected corner position on the image, but due to different physical corners.
Further, the configuration is also invariant to rotations by 180 deg in the checkerboard plane and, unless you are careful to distinguish between the colors of the squares adjacent to each corner, to rotations by 90 deg and reflections with respect to the center and the midlines.
This means that, in addition to detecting the corners, you need to extract from the image some features of the physical checkerboard that can be used to break the above invariance. The simplest break is to detect all 9 corners of one row and one column, or at least their end-corners. They can be used directly to rectify the image by imposing the condition that their lines be at 90 deg angle. However, this may turn out to be impossible due to occlusions or detector failure, and more sophisticated methods may be necessary.
For example, you can try to directly detect the chessboard edges, i.e. the fat black lines at the boundary. One way to do that, for example, would be to detect the letters and numbers nearby, and use those locations to constrain a line detector to nearby areas.
By the way, if the photo you posted is just a red herring, and you are interested in detecting general checkerboard-like patterns, and can control the kind of pattern, there are way more robust methods of doing it. My personal favorite is the "known 2D crossratios" pattern of Matsunaga and Kanatani.
I solved it robustly, but not accurately, with the following solution:
Find lines with at least 3 green points closely matching the line. (thin red lines in pic)
Keep bounding lines: From these lines, keep those with points only to one side of the line or very close to the line.
Filter bounding lines: From the bounding lines, take the 4 best ones/those with most points on them. (bold white lines in pic)
Calculate the intersections of the 4 remaining bounding lines (none of the lines are perfectly parallel, so this results in 6 intersections, of which we want only 4).
From the intersections, remove the one farthest from the average position of the intersections until only 4 of them are left.
That's the 4 blue points.
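The intersection and filtering steps above can be sketched in a few lines using homogeneous coordinates, where both the line through two points and the intersection of two lines are cross products. The helper names are made up, and the four bounding lines are built here from the corner coordinates listed further below purely to have test data.

```python
from itertools import combinations


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through two 2D points."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersect(l1, l2):
    """Intersection of two homogeneous lines, or None if they are parallel."""
    x, y, w = cross(l1, l2)
    if abs(w) < 1e-12:
        return None
    return (x / w, y / w)


def outer_corners(lines):
    """Intersect all line pairs, then drop the farthest point until 4 remain."""
    points = []
    for a, b in combinations(lines, 2):
        p = intersect(a, b)
        if p is not None:
            points.append(p)
    while len(points) > 4:
        cx = sum(p[0] for p in points) / len(points)
        cy = sum(p[1] for p in points) / len(points)
        farthest = max(points, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
        points.remove(farthest)
    return points


corners = [(349.1, 383.9), (588.9, 243.3), (787.9, 404.4), (506.0, 593.1)]
lines = [line_through(corners[i], corners[(i + 1) % 4]) for i in range(4)]
result = outer_corners(lines)
print(result)
```

The two intersections of opposite board edges lie far outside the image (they are the vanishing points), which is why removing the points farthest from the mean leaves the four board corners.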
You can then feed these 4 points into OpenCV's getPerspectiveTransform function to find a perspective transform (aka a homography):
std::vector<Point2f> detectedCorners = CheckDet::getOuterCheckerboardCorners(srcImg);
Point2f srcPoints[4];  // fixed-size arrays, so nothing has to be freed
for (size_t i = 0; i < 4 && i < detectedCorners.size(); i++) {
    srcPoints[i] = detectedCorners[i];
}
int dstImgSize = 400;
Point2f dstPoints[4] = {
    Point2f(dstImgSize * 1/8, dstImgSize * 1/8),
    Point2f(dstImgSize * 7/8, dstImgSize * 1/8),
    Point2f(dstImgSize * 7/8, dstImgSize * 7/8),
    Point2f(dstImgSize * 1/8, dstImgSize * 7/8)
};
Mat m = getPerspectiveTransform(srcPoints, dstPoints);
For our example image, the input and output of getPerspectiveTransform look like this:
input
(349.1, 383.9) -> ( 50.0, 50.0)
(588.9, 243.3) -> (350.0, 50.0)
(787.9, 404.4) -> (350.0, 350.0)
(506.0, 593.1) -> ( 50.0, 350.0)
output
( 1.6 -1.1 -43.8 )
( 1.4 2.4 -1323.8 )
( 0.0 0.0 1.0 )
You can then transform the image's perspective to board coordinates:
Mat plainBoardImg;
warpPerspective(srcImg, plainBoardImg, m, Size(dstImgSize, dstImgSize));
Results in the following image:
For my project, the red points that you can see on the board in the question are not needed anymore, but I'm sure they can be calculated easily from the homography by inverting it and then using the inverse to back-transform the points (0, 0), (0, dstImgSize), (dstImgSize, dstImgSize), and (dstImgSize, 0).
The algorithm works surprisingly reliably; however, it does not use all the available information, because it uses only the outer points (those which are connected by the white lines). It does not use any data from the inner points for additional accuracy. I would still like to find an even better solution, one which uses the data of the inner points.
I need an inverse perspective transform written in Pascal/Delphi/Lazarus. See the following image:
I think I need to walk through destination pixels and then calculate the corresponding position in the source image (To avoid problems with rounding errors etc.).
function redraw_3d_to_2d(sourcebitmap:tbitmap, sourceaspect:extended, point_a, point_b, point_c, point_d:tpoint, megapixelcount:integer):tbitmap;
var
destinationbitmap:tbitmap;
x,y,sx,sy:integer;
begin
destinationbitmap:=tbitmap.create;
destinationbitmap.width=megapixelcount*sourceaspect*???; // I dont how to calculate this
destinationbitmap.height=megapixelcount*sourceaspect*???; // I dont how to calculate this
for x:=0 to destinationbitmap.width-1 do
for y:=0 to destinationbitmap.height-1 do
begin
sx:=??;
sy:=??;
destinationbitmap.canvas.pixels[x,y]=sourcebitmap.canvas.pixels[sx,sy];
end;
result:=destinationbitmap;
end;
I need the real formula... So an OpenGL solution would not be ideal...
Note: There is a version of this with proper math typesetting on the Math SE.
Computing a projective transformation
A perspective is a special case of a projective transformation, which in turn is defined by four points.
Step 1: Starting with the 4 positions in the source image, named (x1,y1) through (x4,y4), you solve the following system of linear equations:
[x1 x2 x3] [λ] [x4]
[y1 y2 y3]∙[μ] = [y4]
[ 1 1 1] [τ] [ 1]
The columns form homogeneous coordinates: one dimension more, created by adding a 1 as the last entry. In subsequent steps, multiples of these vectors will be used to denote the same points. See the last step for an example of how to turn these back into two-dimensional coordinates.
Step 2: Scale the columns by the coefficients you just computed:
[λ∙x1 μ∙x2 τ∙x3]
A = [λ∙y1 μ∙y2 τ∙y3]
[λ μ τ ]
This matrix will map (1,0,0) to a multiple of (x1,y1,1), (0,1,0) to a multiple of (x2,y2,1), (0,0,1) to a multiple of (x3,y3,1) and (1,1,1) to (x4,y4,1). So it will map these four special vectors (called basis vectors in subsequent explanations) to the specified positions in the image.
Step 3: Repeat steps 1 and 2 for the corresponding positions in the destination image, in order to obtain a second matrix called B.
This is a map from basis vectors to destination positions.
Step 4: Invert B to obtain B⁻¹.
B maps from basis vectors to the destination positions, so the inverse matrix maps in the reverse direction.
Step 5: Compute the combined Matrix C = A∙B⁻¹.
B⁻¹ maps from destination positions to basis vectors, while A maps from there to source positions. So the combination maps destination positions to source positions.
Step 6: For every pixel (x,y) of the destination image, compute the product
[x'] [x]
[y'] = C∙[y]
[z'] [1]
These are the homogeneous coordinates of your transformed point.
Step 7: Compute the position in the source image like this:
sx = x'/z'
sy = y'/z'
This is called dehomogenization of the coordinate vector.
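The seven steps can be sketched in plain Python as follows (the question asks for Pascal, but the arithmetic translates directly). All function names are made up for illustration, and the example quadrilateral is arbitrary test data.

```python
def inv3(m):
    """Invert a 3x3 matrix (list of rows) via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def matvec(m, v):
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def basis_to_points(p1, p2, p3, p4):
    """Steps 1 and 2: matrix mapping the basis vectors to the four points."""
    m = [[p1[0], p2[0], p3[0]],
         [p1[1], p2[1], p3[1]],
         [1.0, 1.0, 1.0]]
    lam, mu, tau = matvec(inv3(m), [p4[0], p4[1], 1.0])
    return [[lam * p1[0], mu * p2[0], tau * p3[0]],
            [lam * p1[1], mu * p2[1], tau * p3[1]],
            [lam, mu, tau]]


def destination_to_source(src_pts, dst_pts):
    """Steps 3 to 5: C = A * B^-1 maps destination pixels to source positions."""
    A = basis_to_points(*src_pts)
    B = basis_to_points(*dst_pts)
    return matmul(A, inv3(B))


def map_pixel(C, x, y):
    """Steps 6 and 7: apply C and dehomogenize."""
    xh, yh, zh = matvec(C, [x, y, 1.0])
    return xh / zh, yh / zh


# Arbitrary example: a quadrilateral in the source image and the rectangle
# it should fill in the destination image (same corner order in both lists).
src = [(349.1, 383.9), (588.9, 243.3), (787.9, 404.4), (506.0, 593.1)]
dst = [(50.0, 50.0), (350.0, 50.0), (350.0, 350.0), (50.0, 350.0)]
C = destination_to_source(src, dst)
print(map_pixel(C, 50.0, 50.0))
```

In the pixel loop from the question, sx and sy would be the rounded (or interpolated) results of map_pixel for each destination pixel. The construction requires that no three of the four points are collinear, otherwise the matrix in step 1 is singular.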
All this math would be so much easier to read and write if SO were to support MathJax… ☹
Choosing the image size
The above approach assumes that you know the location of your corners in the destination image. For these you have to know the width and height of that image, which is marked by question marks in your code as well. So let's assume the height of your output image were 1, and the width were sourceaspect. In that case, the overall area would be sourceaspect as well. You have to scale that area by a factor of pixelcount/sourceaspect to achieve an area of pixelcount, which means that you have to scale each edge length by the square root of that factor. So in the end, you have
pixelcount = 1000000.*megapixelcount;
width = round(sqrt(pixelcount*sourceaspect));
height = round(sqrt(pixelcount/sourceaspect));
Use Graphics32, specifically TProjectiveTransformation (to use with the Transform method). Don't forget to leave some transparent margin in your source image so you don't get jagged edges.
I am playing with the affine transform in OpenCV and I am having trouble getting an intuitive understanding of its workings, and more specifically, of how to specify the parameters of the map matrix to get a specific desired result.
To set up the question: the procedure I am using is first to define a warp matrix, then to do the transform.
In OpenCV the 2 routines are (I am using an example from the excellent book OpenCV by Bradski & Kaehler):
cvGetAffineTransform(srcTri, dstTri, warp_matrix);
cvWarpAffine(src, dst, warp_matrix);
To define the warp matrix, srcTri and dstTri are defined as:
CvPoint2D32f srcTri[3], dstTri[3];
srcTri[3] is populated as follows:
srcTri[0].x = 0;
srcTri[0].y = 0;
srcTri[1].x = src->width - 1;
srcTri[1].y = 0;
srcTri[2].x = 0;
srcTri[2].y = src->height -1;
This is essentially the top left point, top right point, and bottom left point of the image for starting point of the matrix. This part makes sense to me.
But the values for dstTri[3] just are confusing, at least, when I vary a single point, I do not get the result I expect.
For example, if I then use the following for the dstTri[3]:
dstTri[0].x = 0;
dstTri[0].y = 0;
dstTri[1].x = src->width - 1;
dstTri[1].y = 0;
dstTri[2].x = 0;
dstTri[2].y = 100;
It seems that the only difference between the src and the dst point is that the bottom left point is moved to the right by 100 pixels. Intuitively, I feel that the bottom part of the image should be shifted to the right by 100 pixels, but this is not so.
Also, if I use the exact same values for dstTri[3] that I use for srcTri[3], I would think that the transform would produce the exact same image--but it does not.
Clearly, I do not understand what is going on here. So, what does the mapping from the srcTri[] to the dstTri[] represent?
Here is a mathematical explanation of an affine transform:
this is a matrix of size 3x3 that applies the following transformations on a 2D vector: Scale in X axis, scale Y, rotation, skew, and translation on the X and Y axes.
These are 6 transformations and thus you have six elements in your 3x3 matrix. The bottom row is always [0 0 1].
Why? because the bottom row represents the perspective transformation in axis x and y, and affine transformation does not include perspective transform.
(If you want to apply perspective warping use homography: also 3x3 matrix )
What is the relation between 6 values you insert into affine matrix and the 6 transformations it does? Let us look at this 3x3 matrix like
e*Zx*cos(a), -q1*sin(a), dx,
e*q2*sin(a), Zy*cos(a), dy,
0, 0, 1
The dx and dy elements are the translation along the x and y axes (they just move the picture left-right and up-down).
Zx is the relative scale(zoom) you apply to the image in X axis.
Zy is the same as above for y axis
a is the angle of rotation of the image. This is tricky since when you want to rotate by 'a' you have to insert sin(), cos() in 4 different places in the matrix.
'q' is the skew parameter. It is rarely used. It will cause your image to skew on the side (q1 causes y axis affects x axis and q2 causes x axis affect y axis)
Bonus: the 'e' parameter is actually not a transformation. It can have the values 1 and -1. If it is 1 then nothing happens, but if it is -1 then the image is flipped horizontally. You can also use it to flip the image vertically, but this type of transformation is rarely used.
Very important Note!!!!!
The above explanation is mathematical. It assumes you multiply the matrix by the column vector from the right. As far as I remember, Matlab uses reverse multiplication (row vector from the left) so you will need to transpose this matrix. I am pretty sure that OpenCV uses regular multiplication but you need to check it.
Just enter only translation matrix (x shifted by 10 pixels, y by 1).
1,0,10
0,1,1
0,0,1
If you see a normal shift, then everything is OK, but if the result is garbage, then transpose the matrix to:
1,0,0
0,1,0
10,1,1
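To connect this back to the srcTri/dstTri question: the three point pairs fully determine the 2x3 affine matrix, which is what cvGetAffineTransform computes. Below is a minimal sketch that solves for the matrix by hand, assuming a hypothetical 640x480 image and the destination triangle from the question; affine_from_triangles is an illustrative helper, not an OpenCV function.

```python
def inv3(m):
    """Invert a 3x3 matrix (list of rows) via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def affine_from_triangles(src_tri, dst_tri):
    """2x3 affine matrix that maps the three src points onto the three dst points."""
    # Columns of S are the source points in homogeneous coordinates.
    S = [[p[0] for p in src_tri],
         [p[1] for p in src_tri],
         [1.0, 1.0, 1.0]]
    # Columns of D are the destination points.
    D = [[p[0] for p in dst_tri],
         [p[1] for p in dst_tri]]
    S_inv = inv3(S)
    # M * S = D, hence M = D * S^-1.
    return [[sum(D[r][k] * S_inv[k][c] for k in range(3)) for c in range(3)]
            for r in range(2)]


width, height = 640, 480  # hypothetical image size
src_tri = [(0, 0), (width - 1, 0), (0, height - 1)]
dst_tri = [(0, 0), (width - 1, 0), (0, 100)]  # dstTri from the question
warp = affine_from_triangles(src_tri, dst_tri)
for row in warp:
    print(row)
```

For this input the first row is the identity in x and the second row only scales y by 100/(height - 1): setting dstTri[2].y = 100 moves the bottom-left corner up to y = 100, so the image is squashed vertically rather than shifted to the right. To shift the bottom edge to the right, dstTri[2].x would have to change instead.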