I am very new to ROS and am working on building a system from the ground up to understand the concepts better. I am trying to convert a depthmap (received from a visionary-t time-of-flight camera as a sensor_msgs/Image message) into a pointcloud. I loop through the width and the height of the image (in my case 176x144 px), say (u, v), where the value at (u, v) is Z in meters. I then use the intrinsic camera parameters (c_x, c_y, f_x, f_y) to convert the pixel coordinates (u, v) to the corresponding (X, Y) coordinates, and for this I use the pinhole camera model:
X = (u - c_x) * Z / f_x ;
Y = (v - c_y) * Z / f_y
I then save these points into pcl::PointXYZ. My camera is mounted overhead and views a table with some objects on it. Although my table is flat, when I convert the depthmaps to pointclouds, the table comes out with a convex shape in the pointcloud instead of being flat.
Can someone please suggest what could be the reason for this convex shape and how I can rectify it?
There might be something wrong with how you use the intrinsics.
There is a post regarding the "reverse projection of image coordinates": Computing x,y coordinate (3D) from image point
Maybe it helps you.
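For reference, here is a minimal Python/NumPy sketch of the pinhole back-projection described above, assuming the depth image is already a float array in meters and that fx, fy, cx, cy come from the camera calibration; the variable names are illustrative. Comparing its output against your own loop may show whether the intrinsics are applied as intended.

    # Minimal sketch: back-project a depth image (HxW, meters) with the pinhole model.
    import numpy as np

    def depth_to_points(depth, fx, fy, cx, cy):
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates (u = column, v = row)
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        # N x 3 array of (X, Y, Z) points in the camera frame,
        # e.g. to be copied into a pcl::PointXYZ cloud on the C++ side.
        return np.dstack((x, y, z)).reshape(-1, 3)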
I have a radial distortion function which gives me the relative distortion, in percent, from 0 (image center) out to the full relative image field (field height 1). For example, this function would give me a distortion of up to 5% at the full relative field height of 1.
I tried to use this together with the OpenCV undistort function to apply the distortion, but I don't know how to fill the matrices.
As said, I only have a source image and don't know anything about the camera parameters such as the focal length; I only know the distortion function.
How should I set the matrix in cv2.undistort(src_image, matrix, ...) ?
The OpenCV routine that's easier to use in your case is cv::remap, not undistort.
In the following I assume your distortion is purely radial. Similar considerations apply if you have it already decomposed in (x, y).
So you have a distortion function d(r) of the distance r = sqrt((x - x_c)^2 + (y - y_c)^2) of a pixel (x, y) from the image center (x_c, y_c). The function expresses the relative change of the radius r_d of a pixel in the distorted image with respect to the undistorted one r: (r_d - r) / r = d(r), or, equivalently, r_d = r * (1 + d(r)).
Keep in mind that cv::remap pulls pixels: the two maps are indexed by the integer pixel coordinates of the output image and store the (floating point) coordinates in the source image to sample from. So, to apply the distortion to your source image, you need to invert the above equation (i.e. solve it analytically or numerically), finding the value of r for every r_d in the range of interest. Then you can trivially create the two arrays, map_x and map_y: for a given pair (x_d, y_d) of integer pixel coordinates in the (distorted) output image, you compute the associated r_d = sqrt((x_d - x_c)^2 + (y_d - y_c)^2), then the corresponding r as a function of r_d from solving the equation, go back to (x, y), and assign map_x[y_d, x_d] = x; map_y[y_d, x_d] = y. Finally, you pass those maps to cv::remap. (If instead you are given a distorted image and want to remove the distortion, you build the maps the other way round, evaluating the forward relation r_d = r * (1 + d(r)) for every output pixel; no inversion is needed in that case.)
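Here is a minimal Python/OpenCV sketch of that recipe. The quadratic d(r) is only a made-up stand-in for your distortion function (about 5% at relative field height 1), the input path is a placeholder, and I arbitrarily normalise the radius so that r = 1 at the image corner; adjust both to your definition of the field height.

    import cv2
    import numpy as np

    def d(r):
        # Placeholder distortion function: replace with your own.
        return 0.05 * r ** 2

    img = cv2.imread("source.png")                 # hypothetical input path
    h, w = img.shape[:2]
    xc, yc = (w - 1) / 2.0, (h - 1) / 2.0
    r_max = np.hypot(xc, yc)                       # corner radius -> relative field height 1

    # Tabulate r_d = r * (1 + d(r)) on a dense grid and invert it numerically.
    r_tab = np.linspace(0.0, 1.5, 3000)
    rd_tab = r_tab * (1.0 + d(r_tab))

    # For every output pixel (x_d, y_d), find the source radius r for its r_d.
    y_d, x_d = np.indices((h, w), dtype=np.float64)
    r_d = np.hypot(x_d - xc, y_d - yc) / r_max
    r = np.interp(r_d, rd_tab, r_tab)              # numerical inverse of r_d(r)
    scale = np.divide(r, r_d, out=np.ones_like(r_d), where=r_d > 0)

    map_x = (xc + (x_d - xc) * scale).astype(np.float32)
    map_y = (yc + (y_d - yc) * scale).astype(np.float32)
    distorted = cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)

    # Swapping rd_tab and r_tab in np.interp (i.e. using the forward relation)
    # builds the maps that remove the distortion instead.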
My problem is exactly the reverse of How to determine world coordinates of a camera?.
If R and t are the orientation and position of the camera in world space, how do I easily transform them back into the same form as rvec and tvec?
If you say you have an R and t "in world space", that is not a very precise statement.
However, let us assume that R and t (a 3x3 rotation matrix and a 3x1 translation vector) represent the orientation and translation that transform a point Xw in world space to a point Xc in camera space (no homogeneous coordinates):
Xc = R*Xw + t
These are the R and t, which are part of your projection matrix P (which is applicable with homogeneous coordinates) and the result of solvePnP (see Note at the bottom):
P = K[R|t]
The t is the world origin in camera space.
The other way round, transforming a point from camera space to world space, is easily derived, since the inverse of the orthogonal matrix R is R' (R transposed):
Xw = R'*Xc - R't
As you can see, R' (a 3x3 matrix) and -R't (a 3x1 vector) are now the orientation matrix and translation vector, respectively, used to transform from camera space to world space (more precisely, -R't is the camera origin/center in world space, often referred to as C).
Note: rvec is in the form of a rotation vector; as you may know, Rodrigues() is used to switch between the two representations.
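For completeness, a small Python/NumPy sketch of these relations. It assumes R_w is the 3x3 camera-to-world rotation (R' above) and C is the camera center in world coordinates; the function names are just illustrative.

    import cv2
    import numpy as np

    def world_pose_to_rvec_tvec(R_w, C):
        # R_w: camera-to-world rotation (R'), C: camera center in world space.
        R = np.asarray(R_w, dtype=np.float64).T                  # world-to-camera rotation
        t = -R @ np.asarray(C, dtype=np.float64).reshape(3, 1)   # chosen so that C = -R't
        rvec, _ = cv2.Rodrigues(R)                               # rotation matrix -> rotation vector
        return rvec, t

    def rvec_tvec_to_world_pose(rvec, tvec):
        # The other direction: recover the camera-to-world rotation and camera center.
        R, _ = cv2.Rodrigues(np.asarray(rvec, dtype=np.float64))
        t = np.asarray(tvec, dtype=np.float64).reshape(3, 1)
        return R.T, -R.T @ t                                     # R' and C = -R't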
I have a polyline figure, given as an array of relative x and y point coordinates (0.0 to 1.0).
I have to draw the figure with random position, scale and rotation angle.
How can I do it in the best way?
You could use a simple transformation with an RT (rotation-translation) matrix.
Let X = (x y 1)^T be the homogeneous coordinates of one point of your figure. Let R be a 2x2 rotation matrix and T a 2x1 translation vector of the transformation you plan to make. The RT matrix A has the form A = [R T; 0 0 1]. To get the transformed coordinates of point X, you do the simple calculation AX = X', where X' are the new coordinates. Now, to transform the whole figure at once, instead of using a single column you use a matrix where each column holds one point: its x coordinate in the first row, y in the second and 1 in the third row.
Of course you can try to use the functions provided by OpenCV, shown in this tutorial, or the ones intended for vectors of points instead of whole images, but the way above makes you actually understand what you are doing ;)
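A small NumPy sketch of this approach, with the scale the question asks for folded into the rotation block as s*R; the polyline and the random ranges are arbitrary examples.

    import numpy as np

    # Polyline as relative coordinates in [0, 1]: one column per point, a row of ones at the bottom.
    pts = np.array([[0.0, 0.5, 1.0, 0.5],
                    [0.0, 0.0, 0.5, 1.0],
                    [1.0, 1.0, 1.0, 1.0]])

    angle = np.random.uniform(0.0, 2.0 * np.pi)      # random rotation
    scale = np.random.uniform(0.5, 2.0)              # random scale
    tx, ty = np.random.uniform(0.0, 100.0, size=2)   # random position

    c, s = np.cos(angle), np.sin(angle)
    A = np.array([[scale * c, -scale * s, tx],
                  [scale * s,  scale * c, ty],
                  [0.0,        0.0,       1.0]])     # A = [sR T; 0 0 1]

    pts_transformed = A @ pts                        # X' = AX for all columns at once
    print(pts_transformed[:2].T)                     # back to a list of (x, y) points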
I am doing some image processing tasks in 3D and I have a problem.
I use a simulator which provides a special kind of camera that can tell me the distance between the camera position and any arbitrary point, using the pixel of that point in the camera image. For example, I can get the distance between the camera and the object located at pixel 21:34.
Now I need to calculate the real distance between two arbitrary pixels in the camera image.
It is easy when the camera is vertical and placed above the field and all objects are on the ground, but when the camera is horizontal, the depth of the objects in the image differs.
So, how should I do this?
Simple 3D reconstruction will accomplish this. The distance from the camera to a point in 3D along the optical axis is Z, which you already have. You will need X and Y as well:
X = u*Z/f;
Y = v*Z/f,
where f is the camera focal length in pixels, Z is your distance in mm or meters, and u, v are image-centered coordinates: u = column - width/2, v = height/2 - row. Note the asymmetry, due to the fact that rows go down while Y and v go up. As soon as you get your X, Y, Z, the distance in 3D is given by the Euclidean formula:
dist = sqrt((X1-X2)^2 + (Y1-Y2)^2 + (Z1-Z2)^2)
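A short Python sketch of this computation; the focal length, image size and the two pixel/depth pairs are made-up example values.

    import math

    def pixel_to_xyz(row, col, Z, f, width, height):
        # Image-centered coordinates: u to the right, v up (hence the sign flip on rows).
        u = col - width / 2.0
        v = height / 2.0 - row
        return (u * Z / f, v * Z / f, Z)

    def distance_3d(p1, p2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

    f, w, h = 500.0, 640, 480                  # focal length in pixels, image size
    P1 = pixel_to_xyz(100, 320, 2.0, f, w, h)  # pixel (row 100, col 320) at Z = 2.0 m
    P2 = pixel_to_xyz(300, 400, 2.5, f, w, h)
    print(distance_3d(P1, P2))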
I want to convert the detected face rectangle into 3D coordinates. I have the intrinsic parameters of my webcam and my head dimensions; how can I determine the depth Z using the projection equations?
x = fx X / Z + u
y = fy Y / Z + v
I understand that fx, fy and u, v are intrinsic parameters, that X, Y are given by my head dimensions, and that x, y are given by the detected face rectangle. It seems that only one equation is enough to determine Z. How do I use both of them? Or am I wrong?
You are correct that you do not strictly need both of them to compute the depth. However, you may want to use both to improve accuracy.
Another thing to keep in mind is that if your camera is not looking perpendicularly at the planar object (e.g. face) you measured, one or both of the measurements may not be useful to compute the depth. For example if your camera is looking up at a rectangle, only the width will be a good measure for the depth, because the height is compressed by the viewing angle. I don't think this matters for your face detector though, because the proportions of the face are assumed fixed anyway?
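As an illustration of using both measurements: taking the difference of the projection equation for the two sides of the rectangle gives rect_width in pixels as approximately fx * X / Z (and likewise for the height), assuming the face edges lie at roughly the same depth, so each dimension yields its own estimate of Z and the two can be averaged. All numbers below are made-up examples.

    # Two independent depth estimates from the face rectangle, then a simple average.
    fx, fy = 600.0, 600.0                        # focal lengths in pixels (intrinsics)
    face_width_m, face_height_m = 0.16, 0.24     # assumed real head dimensions in meters
    rect_w_px, rect_h_px = 96.0, 144.0           # detected face rectangle size in pixels

    z_from_width = fx * face_width_m / rect_w_px
    z_from_height = fy * face_height_m / rect_h_px
    z = 0.5 * (z_from_width + z_from_height)     # combine both measurements
    print(z_from_width, z_from_height, z)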