What are u, v image coordinates? - image-processing

I saw u, v image coordinates mentioned at the bottom of the page here. I downloaded the data, and one sample is:
[214.65 222.52 145.72 165.42 96.492 114.22 64.985 71.877 43.323 33.477 128.98 173.29 120.12 160.49 121.11 134.89 128. 98.462 175.26 177.23 177.23 151.63 178.22 130.95 177.23 98.462 212.68 175.26 214.65 118.15 215.63 80.738 208.74 68.923 249.11 173.29 242.22 122.09 237.29 86.646 234.34 48.246]
I searched but did not find an explanation of u, v image coordinates or how to convert them to x-y coordinates. It is not UV mapping, because the data is not in the range [0, 1], though I may be wrong.
Any comments are welcome. Thanks.

To be more confident, we can plot these points on the corresponding color image using Matlab/Octave or OpenCV and see whether their positions match the labeled joints. For the joint structure we can look at the same README file: W, T0, T1, T2, T3, I0, I1, I2, I3, M0, M1, M2, M3, R0, R1, R2, R3, L0, L1, L2, L3. Every joint has two coordinates, so the sequence of 42 numbers corresponds to the u, v (X, Y) coordinates of those joints, in that order.
I tried to directly plot an image and 2D points in Matlab/Octave using this code:
clc; clear;

% Load the image and the 2D joint annotations
im = imread('0001_color_composed.png');
data = csvread('0001_joint2D.txt');

% Split the flat list of 42 numbers into 21 (u, v) pairs
x = zeros(length(data)/2, 1);
y = x;
for i = 1:length(data)/2
    x(i) = data(2*i-1);   % u coordinate
    y(i) = data(2*i);     % v coordinate
end

% Overlay the joints on the image
imshow(im);
hold on;
plot(x, y, 'go');
together with this image and annotation. As you can see in the resulting image, all u, v coordinates correspond to pixel coordinates in X and Y, counted from the top left corner of the image in pixels, i.e. u = X and v = Y (when the image is shown with imshow(), the origin of the coordinate frame for subsequent plots is set to the image coordinate frame origin, which is the top left corner).
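For reference, the same check can be done in OpenCV/Python. Here is a minimal sketch, assuming the same file names as the Matlab snippet above and a comma-separated annotation file:

import cv2
import numpy as np

# Load the color image and the flat list of 42 joint coordinates
im = cv2.imread('0001_color_composed.png')
data = np.loadtxt('0001_joint2D.txt', delimiter=',')

# Reshape into 21 (u, v) pairs and draw each joint as a green circle
for u, v in data.reshape(-1, 2):
    cv2.circle(im, (int(round(u)), int(round(v))), 3, (0, 255, 0), -1)

cv2.imshow('joints', im)
cv2.waitKey(0)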

Related

OpenCV: stereoRectify results in empty image

I have 2 cameras and want to calculate the disparity between them.
The translation between those cameras is mainly in the z-direction (i.e. along the viewing direction, "into" the image) and a little bit in the x and y directions, for example: (0.2, 0.2, 0.8).
When I now calculate the rectification parameters with the stereoRectify() method, the output images are just black. Using other translation vectors works just fine, but of course the results are then wrong.
Why is that, and how can I solve the problem?
Edit: these translation values result in both rectified images being black. Other (wrong) translation vectors work just fine. Changing the value of alpha doesn't change much.
import cv2
import numpy as np
from pyquaternion import Quaternion  # assumed quaternion library (provides .rotation_matrix)

# Relative pose between the two cameras
rotation_quat = Quaternion(0.999999913938509, 0.00029714546497339216, -0.00011465939948083866, 0.0002658585515330323)
rotation = rotation_quat.rotation_matrix.astype(np.float64)
translation = np.array([0.2, 0.2, 0.8]).astype(np.float64)

# Compute rectification rotations and projection matrices
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(intrinsicMatrix1, distCoeffs1, intrinsicMatrix2, distCoeffs2, img1.shape[::-1], rotation, translation, alpha=1)

# Build the undistort/rectify maps for both cameras
map11, map12 = cv2.initUndistortRectifyMap(intrinsicMatrix1, distCoeffs1, R1, P1, img1.shape[::-1], cv2.CV_32FC1)
map21, map22 = cv2.initUndistortRectifyMap(intrinsicMatrix2, distCoeffs2, R2, P2, img2.shape[::-1], cv2.CV_32FC1)

# rectify
img1_rect = cv2.remap(img1, map11, map12, cv2.INTER_LANCZOS4)
img2_rect = cv2.remap(img2, map21, map22, cv2.INTER_LANCZOS4)

cv2.imshow("img1_rect", img1_rect)
cv2.waitKey(0)
cv2.imshow("img2_rect", img2_rect)
cv2.waitKey(0)

Inverse Perspective Transform?

I am trying to find the bird's eye image from a given image. I also have the rotations and translations (and the intrinsic matrix) required to convert it into the bird's eye plane. My aim is to find an inverse homography matrix (3x3).
import cv2
import numpy as np

# Rotation about the x axis (homogeneous 4x4)
rotation_x = np.asarray([[1, 0, 0, 0],
                         [0, np.cos(R_x), -np.sin(R_x), 0],
                         [0, np.sin(R_x),  np.cos(R_x), 0],
                         [0, 0, 0, 1]], np.float32)

# Translation along the camera z axis (homogeneous 4x4)
translation = np.asarray([[1, 0, 0, 0],
                          [0, 1, 0, 0],
                          [0, 0, 1, -t_y / (dp_y * np.sin(R_x))],
                          [0, 0, 0, 1]], np.float32)

# Intrinsic matrix (3x4)
intrinsic = np.asarray([[s_x * f / dp_x, 0, 0, 0],
                        [0, 1 * f / dp_y, 0, 0],
                        [0, 0, 1, 0]], np.float32)

# The projection matrix to lift image coordinates to the 3-D domain,
# from (x, y, 1) to (x, y, 0, 1); not sure if this is the right approach
projection = np.asarray([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 0],
                         [0, 0, 1]], np.float32)

# Compose the 3x3 homography (only the rotation about x is used here)
homography_matrix = intrinsic @ translation @ rotation_x @ projection

inv = cv2.warpPerspective(source_image, homography_matrix, (w, h), flags=cv2.INTER_CUBIC | cv2.WARP_INVERSE_MAP)
My question is: is this the right approach? I can manually set a suitable t_y, R_x that works, but it does not work for the (t_y, R_x) that is actually provided.
First premise: your bird's eye view will be correct only for one specific plane in the image, since a homography can only map planes (including the plane at infinity, corresponding to a pure camera rotation).
Second premise: if you can identify a quadrangle in the first image that is the projection of a rectangle in the world, you can directly compute the homography that maps the quad into the rectangle (i.e. the "bird's eye view" of the quad), and warp the image with it, setting the scale so the image warps to a desired size. No need to use the camera intrinsics. Example: you have the image of a building with rectangular windows, and you know the width/height ratio of these windows in the world.
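As an illustration of this second premise, here is a minimal OpenCV sketch; the corner coordinates, the 1.5:1 aspect ratio and the file name are made up:

import cv2
import numpy as np

# Image corners of a quad that is known to be a rectangle in the world
# (e.g. a window), listed in the same order as the target rectangle below
quad = np.float32([[120, 80], [420, 95], [430, 360], [110, 340]])

# Target rectangle; the 300x200 size encodes the known 1.5:1 width/height ratio
rect = np.float32([[0, 0], [300, 0], [300, 200], [0, 200]])

# Homography mapping the quad to the rectangle, i.e. the "bird's eye view" of the quad
H = cv2.getPerspectiveTransform(quad, rect)
img = cv2.imread('building.png')
birdseye = cv2.warpPerspective(img, H, (300, 200))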
Sometimes you can't find rectangles, but your camera is calibrated, and thus the problem you describe comes into play. Let's do the math. Assume the plane you are observing in the given image is Z=0 in world coordinates. Let K be the 3x3 intrinsic camera matrix and [R, t] the 3x4 matrix representing the camera pose in XYZ world frame, so that if Pc and Pw represent the same 3D point respectively in camera and world coordinates, it is Pc = R*Pw + t = [R, t] * [Pw.T, 1].T, where .T means transposed. Then you can write the camera projection as:
s * p = K * [R, t] * [Pw.T, 1].T
where s is an arbitrary scale factor and p is the pixel that Pw projects onto. But if Pw=[X, Y, Z].T is on the Z=0 plane, the 3rd column of R only multiplies zeros, so we can ignore it. If we then denote with r1 and r2 the first two columns of R, we can rewrite the above equation as:
s * p = K * [r1, r2, t] * [X, Y, 1].T
But K * [r1, r2, t] is a 3x3 matrix that transforms points on a 3D plane to points on the camera plane, so it is a homography.
If the plane is not Z=0, you can repeat the same argument replacing [R, t] with [R, t] * inv([Rp, tp]), where [Rp, tp] is the coordinate transform that maps a frame on the plane, with the plane normal being the Z axis, to the world frame.
Finally, to obtain the bird's eye view, you select a rotation R whose third column (the components of the world's Z axis in camera frame) is opposite to the plane's normal.
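In code, the calibrated case might look like the following sketch, where K, the pose and the file name are made-up values, just to show how K * [r1, r2, t] is assembled and inverted:

import cv2
import numpy as np

# Hypothetical intrinsics and world-to-camera pose, with the observed plane at Z=0
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(30.0)                           # camera tilted towards the plane
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a),  np.cos(a)]])
t = np.array([[0.0], [0.0], [5.0]])            # camera 5 units from the plane origin

# Homography from plane coordinates (X, Y, 1) to pixels: H = K * [r1, r2, t]
H = K @ np.hstack([R[:, :1], R[:, 1:2], t])

# Bird's eye view: warp with the inverse homography; output size/scale chosen by hand
img = cv2.imread('scene.png')
birdseye = cv2.warpPerspective(img, np.linalg.inv(H), (640, 480))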

Estimating distance from camera to ground plane point

How can I calculate the distance from the camera to a point on the ground plane from an image?
I have the intrinsic parameters of the camera and the position (height, pitch).
Is there any OpenCV function that can estimate that distance?
You can use undistortPoints to compute the rays backprojecting the pixels, but that API is rather hard to use for your purpose. It may be easier to do the calculation "by hand" in your code. Doing it at least once will also help you understand what exactly that API is doing.
Express your "position (height, pitch)" of the camera as a rotation matrix R and a translation vector t, representing the coordinate transform from the origin of the ground plane to the camera. That is, given a point in ground plane coordinates Pg = [Xg, Yg, Zg], its coordinates in camera frame are given by
Pc = R * Pg + t
The camera center is Cc = [0, 0, 0] in camera coordinates. In ground coordinates it is then:
Cg = inv(R) * (-t) = -R' * t
where inv(R) is the inverse of R, R' is its transpose, and the last equality is due to R being an orthogonal matrix.
Let's assume, for simplicity, that the ground plane is Zg = 0.
Let K be the matrix of intrinsic parameters. Given a pixel q = [u, v], write it in homogeneous image coordinates Q = [u, v, 1]. Its location in camera coordinates is
Qc = Ki * Q
where Ki = inv(K) is the inverse of the intrinsic parameters matrix. The same point in world coordinates is then
Qg = R' * Qc + Cg
All the points Pg = [Xg, Yg, Zg] that belong to the ray from the camera center through that pixel, expressed in ground coordinates, are then on the line
Pg = Cg + lambda * (Qg - Cg)
for lambda going from 0 to positive infinity. This last formula represents three equations in ground XYZ coordinates, and you want to find the point where the ray intersects the ground plane. There Zg = 0, so you are left with three unknowns (Xg, Yg and lambda) in three equations. Solve them (recover lambda from the 3rd equation, then substitute into the first two), and you get the Xg and Yg of the solution to your problem.
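Here is a minimal numpy sketch of the whole chain; the intrinsics, camera height, pitch and pixel are made-up values, and the ground frame is taken with Z pointing up so the ground plane is Zg = 0:

import numpy as np

# Hypothetical intrinsics
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

# Camera pose: 1.5 m above the ground, looking along ground +Y, pitched down 10 degrees.
# Rows of R are the camera axes expressed in ground coordinates, so Pc = R*Pg + t.
pitch = np.deg2rad(10.0)
cam_x = np.array([1.0, 0.0, 0.0])                       # camera x axis
cam_z = np.array([0.0, np.cos(pitch), -np.sin(pitch)])  # optical axis, forward and down
cam_y = np.cross(cam_z, cam_x)                          # camera y axis (down in the image)
R = np.vstack([cam_x, cam_y, cam_z])
Cg = np.array([0.0, 0.0, 1.5])                          # camera center in ground coordinates
t = -R @ Cg

# Backproject a pixel q = [u, v] to a point on its viewing ray, in ground coordinates
q = np.array([320.0, 300.0, 1.0])                       # homogeneous pixel below the horizon
Qc = np.linalg.inv(K) @ q                               # Qc = Ki * Q
Qg = R.T @ Qc + Cg                                      # Qg = R' * Qc + Cg

# Intersect the ray Pg = Cg + lambda * (Qg - Cg) with the ground plane Zg = 0
lam = -Cg[2] / (Qg[2] - Cg[2])
Pg = Cg + lam * (Qg - Cg)                               # ground point seen by the pixel

print("ground point:", Pg, "distance:", np.linalg.norm(Pg - Cg))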

How to find point position relation between two images using homography?

I'm working on a vision project with OpenCV: how can I find a point's position in image B, knowing its position in image A, given that the two projections are of the same plane (the floor)?
Thanks for your answers.
Assume that you have a point A with homogeneous coordinates:
A = [x, y, 1]
Suppose you also have a homography H between the two images:
H = [h00 h01 h02;
     h10 h11 h12;
     h20 h21 h22]
Then compute the product H * A', where A' is the transpose of A. You will get something like
Bs = [X, Y, s]
Normalize the result by s in order to get
B = [Xb, Yb, 1]
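A minimal numpy sketch of that product and normalization; the homography values here are made up, and in practice H would be estimated from matched floor points (for example with cv2.findHomography):

import numpy as np

# Made-up homography mapping floor points from image A to image B
H = np.array([[1.05, 0.02, 12.0],
              [-0.01, 0.98, -7.5],
              [0.0003, 0.0001, 1.0]])

# Point in image A, in homogeneous coordinates
A = np.array([150.0, 200.0, 1.0])

# Bs = H * A', then divide by the scale s to get pixel coordinates in image B
Bs = H @ A
B = Bs / Bs[2]
print(B[:2])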

ICP transformation matrix interpretation

I'm using PCL to obtain the transformation matrix from ICP (getTransformationMatrix()).
The result obtained, for example, for a translation movement without rotation is
0.999998 0.000361048 0.00223594 -0.00763852
-0.000360518 1 -0.000299474 -0.000319525
-0.00223602 0.000298626 0.999998 -0.00305045
0 0 0 1
How can I find the transformation from the matrix?
The idea is to see the error between the estimation and the real movement.
I have not used the library you refer to here, but it is pretty clear to me that the result you provide is a homogeneous transform, i.e. the upper left 3x3 matrix (R) is the rotation matrix and the right 3x1 column (T) is the translation:
M1 = [ R       T ]
     [ 0 0 0   1 ]
refer to the 'Matrix Representation' section here:
http://en.wikipedia.org/wiki/Kinematics
This notation is used so that you can get the final point after successive transforms by multiplying the transform matrices.
If you have a point P0 transformed n times, you get the point P1 as:
P0 = [[p0_x], [p0_y], [p0_z], [1]]
P1 = [[p1_x], [p1_y], [p1_z], [1]]
M = M1*M2*...*Mn
P1 = M*P0
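As an illustration, here is a small numpy sketch that pulls R and T out of the 4x4 matrix quoted above and reports the translation together with roll/pitch/yaw; scipy is used here just for the Euler-angle conversion, as an alternative to PCL's getTranslationAndEulerAngles shown below:

import numpy as np
from scipy.spatial.transform import Rotation

# Homogeneous transform returned by ICP (values copied from the question)
M = np.array([[ 0.999998,    0.000361048,  0.00223594,  -0.00763852],
              [-0.000360518, 1.0,         -0.000299474, -0.000319525],
              [-0.00223602,  0.000298626,  0.999998,    -0.00305045],
              [ 0.0,         0.0,          0.0,          1.0]])

R = M[:3, :3]   # rotation part
T = M[:3, 3]    # translation part

print("translation:", T, "norm:", np.linalg.norm(T))
print("roll/pitch/yaw (rad):", Rotation.from_matrix(R).as_euler('xyz'))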
tROTA is the matrix with translation and rotation:
auto trafo = icp.getFinalTransformation();               // final 4x4 transform estimated by ICP
Eigen::Transform<float, 3, Eigen::Affine> tROTA(trafo);  // wrap it as an Eigen affine transform
float x, y, z, roll, pitch, yaw;
pcl::getTranslationAndEulerAngles(tROTA, x, y, z, roll, pitch, yaw);  // split into translation and Euler angles
