Recover plane from homography - opencv

I have used openCV to calculate the homography relating to views of the same plane by using features and matching them. Is there any way to recover the plane itsself or the plane normal from this homography? (I am looking for an equation where H is the input and the normal n is the output.)

If you have the calibration of the cameras, you can extract the normal of the plane, but not the distance to the plane (i.e. the transformation that you obtain is up to scale), as Wikipedia explains. I don't know any implementation to do it, but here you are a couple of papers that deal with that problem (I warn you it is not straightforward): Faugeras & Lustman 1988, Vargas & Malis 2005.
You can recover the real translation of the transformation (i.e. the distance to the plane) if you have at least a real distance between two points on the plane. If that is the case, the easiest way to go with OpenCV is to first calculate the homography, then obtain four points on the plane with their 2D coordinates and the real 3D ones (you should be able to obtain them if you have a real measurement on the plane), and using PnP finally. PnP will give you a real transformation.

Rectifying an image is defined as making epipolar lines horizontal and lying in the same row in both images. From your description I get that you simply want to warp the plane such that it is parallel to the camera sensor or the image plane. This has nothing to do with rectification - I’d rather call it an obtaining a bird’s-eye view or a top view.
I see the source of confusion though. Rectification of images usually involves multiplication of each image with a homography matrix. In your case though each point in sensor plane b:
Xb = Hab * Xa = (Hb * Ha^-1) * Xa, where Ha is homography from the plane in the world to the sensor a; Ha and intrinsic camera matrix will give you a plane orientation but I don’t see an easy way to decompose Hab into Ha and Hb.
A classic (and hard) way is to find a Fundamental matrix, restore the Essential matrix from it, decompose the Essential matrix into camera rotation and translation (up to scale), rectify both images, perform a dense stereo, then fit a plane equation into 3d points you reconstruct.
If you interested in the ground plane and you operate an embedded device though, you don’t even need two frames - a top view can be easily recovered from a single photo, camera elevation from the ground (H) and a gyroscope (or orientation vector) readings. A simple diagram below explains the process in 2D case: first picture shows how to restore Z (depth) coordinate to every point on the ground plane; the second picture shows a plot of the top view with vertical axis being z and horizontal axis x = (img.col-w/2)*Z/focal; Here is img.col is image column, w - image width, and focal is camera focal length. Note that a camera frustum looks like a trapezoid in a birds eye view.

Related

Measure distance to object with a single camera in a static scene

let's say I am placing a small object on a flat floor inside a room.
First step: Take a picture of the room floor from a known, static position in the world coordinate system.
Second step: Detect the bottom edge of the object in the image and map the pixel coordinate to the object position in the world coordinate system.
Third step: By using a measuring tape measure the real distance to the object.
I could move the small object, repeat this three steps for every pixel coordinate and create a lookup table (key: pixel coordinate; value: distance). This procedure is accurate enough for my use case. I know that it is problematic if there are multiple objects (an object could cover an other object).
My question: Is there an easier way to create this lookup table? Accidentally changing the camera angle by a few degrees destroys the hard work. ;)
Maybe it is possible to execute the three steps for a few specific pixel coordinates or positions in the world coordinate system and perform some "calibration" to calculate the distances with the computed parameters?
If the floor is flat, its equation is that of a plane, let
a.x + b.y + c.z = 1
in the camera coordinates (the origin is the optical center of the camera, XY forms the focal plane and Z the viewing direction).
Then a ray from the camera center to a point on the image at pixel coordinates (u, v) is given by
(u, v, f).t
where f is the focal length.
The ray hits the plane when
(a.u + b.v + c.f) t = 1,
i.e. at the point
(u, v, f) / (a.u + b.v + c.f)
Finally, the distance from the camera to the point is
p = √(u² + v² + f²) / (a.u + b.v + c.f)
This is the function that you need to tabulate. Assuming that f is known, you can determine the unknown coefficients a, b, c by taking three non-aligned points, measuring the image coordinates (u, v) and the distances, and solving a 3x3 system of linear equations.
From the last equation, you can then estimate the distance for any point of the image.
The focal distance can be measured (in pixels) by looking at a target of known size, at a known distance. By proportionality, the ratio of the distance over the size is f over the length in the image.
Most vision libraries (including opencv) have built in functions that will take a couple points from a camera reference frame and the related points from a Cartesian plane and generate your warp matrix (affine transformation) for you. (some are fancy enough to include non-linearity mappings with enough input points, but that brings you back to your time to calibrate issue)
A final note: most vision libraries use some type of grid to calibrate off of ie a checkerboard patter. If you wrote your calibration to work off of such a sheet, then you would only need to measure distances to 1 target object as the transformations would be calculated by the sheet and the target would just provide the world offsets.
I believe what you are after is called a Projective Transformation. The link below should guide you through exactly what you need.
Demonstration of calculating a projective transformation with proper math typesetting on the Math SE.
Although you can solve this by hand and write that into your code... I strongly recommend using a matrix math library or even writing your own matrix math functions prior to resorting to hand calculating the equations as you will have to solve them symbolically to turn it into code and that will be very expansive and prone to miscalculation.
Here are just a few tips that may help you with clarification (applying it to your problem):
-Your A matrix (source) is built from the 4 xy points in your camera image (pixel locations).
-Your B matrix (destination) is built from your measurements in in the real world.
-For fast recalibration, I suggest marking points on the ground to be able to quickly place the cube at the 4 locations (and subsequently get the altered pixel locations in the camera) without having to remeasure.
-You will only have to do steps 1-5 (once) during calibration, after that whenever you want to know the position of something just get the coordinates in your image and run them through step 6 and step 7.
-You will want your calibration points to be as far away from eachother as possible (within reason, as at extreme distances in a vanishing point situation, you start rapidly losing pixel density and therefore source image accuracy). Make sure that no 3 points are colinear (simply put, make your 4 points approximately square at almost the full span of your camera fov in the real world)
ps I apologize for not writing this out here, but they have fancy math editing and it looks way cleaner!
Final steps to applying this method to this situation:
In order to perform this calibration, you will have to set a global home position (likely easiest to do this arbitrarily on the floor and measure your camera position relative to that point). From this position, you will need to measure your object's distance from this position in both x and y coordinates on the floor. Although a more tightly packed calibration set will give you more error, the easiest solution for this may simply be to have a dimension-ed sheet(I am thinking piece of printer paper or a large board or something). The reason that this will be easier is that it will have built in axes (ie the two sides will be orthogonal and you will just use the four corners of the object and used canned distances in your calibration). EX: for a piece of paper your points would be (0,0), (0,8.5), (11,8.5), (11,0)
So using those points and the pixels you get will create your transform matrix, but that still just gives you a global x,y position on axes that may be hard to measure on (they may be skew depending on how you measured/ calibrated). So you will need to calculate your camera offset:
object in real world coords (from steps above): x1, y1
camera coords (Xc, Yc)
dist = sqrt( pow(x1-Xc,2) + pow(y1-Yc,2) )
If it is too cumbersome to try to measure the position of the camera from global origin by hand, you can instead measure the distance to 2 different points and feed those values into the above equation to calculate your camera offset, which you will then store and use anytime you want to get final distance.
As already mentioned in the previous answers you'll need a projective transformation or simply a homography. However, I'll consider it from a more practical view and will try to summarize it short and simple.
So, given the proper homography you can warp your picture of a plane such that it looks like you took it from above (like here). Even simpler you can transform a pixel coordinate of your image to world coordinates of the plane (the same is done during the warping for each pixel).
A homography is basically a 3x3 matrix and you transform a coordinate by multiplying it with the matrix. You may now think, wait 3x3 matrix and 2D coordinates: You'll need to use homogeneous coordinates.
However, most frameworks and libraries will do this handling for you. What you need to do is finding (at least) four points (x/y-coordinates) on your world plane/floor (preferably the corners of a rectangle, aligned with your desired world coordinate system), take a picture of them, measure the pixel coordinates and pass both to the "find-homography-function" of your desired computer vision or math library.
In OpenCV that would be findHomography, here an example (the method perspectiveTransform then performs the actual transformation).
In Matlab you can use something from here. Make sure you are using a projective transformation as transform type. The result is a projective tform, which can be used in combination with this method, in order to transform your points from one coordinate system to another.
In order to transform into the other direction you just have to invert your homography and use the result instead.

finding the real world coordinates of an image point

I am searching lots of resources on internet for many days but i couldnt solve the problem.
I have a project in which i am supposed to detect the position of a circular object on a plane. Since on a plane, all i need is x and y position (not z) For this purpose i have chosen to go with image processing. The camera(single view, not stereo) position and orientation is fixed with respect to a reference coordinate system on the plane and are known
I have detected the image pixel coordinates of the centers of circles by using opencv. All i need is now to convert the coord. to real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
in this site and other sites as well, an homographic transformation is named as:
p = C[R|T]P; where P is real world coordinates and p is the pixel coord(in homographic coord). C is the camera matrix representing the intrinsic parameters, R is rotation matrix and T is the translational matrix. I have followed a tutorial on calibrating the camera on opencv(applied the cameraCalibration source file), i have 9 fine chessbordimages, and as an output i have the intrinsic camera matrix, and translational and rotational params of each of the image.
I have the 3x3 intrinsic camera matrix(focal lengths , and center pixels), and an 3x4 extrinsic matrix [R|T], in which R is the left 3x3 and T is the rigth 3x1. According to p = C[R|T]P formula, i assume that by multiplying these parameter matrices to the P(world) we get p(pixel). But what i need is to project the p(pixel) coord to P(world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from linear algebra course we can manipulate a transformation as P=[R|T]-1*C-1*p. However this is in euclidian coord system. I dont know such a thing is possible in hompographic. moreover 3x4 [R|T] Vector is not invertible. Moreover i dont know it is the correct way to go.
Intrinsic and extrinsic parameters are know, All i need is the real world project coordinate on the ground plane. Since point is on a plane, coordinates will be 2 dimensions(depth is not important, as an argument opposed single view geometry).Camera is fixed(position,orientation).How should i find real world coordinate of the point on an image captured by a camera(single view)?
EDIT
I have been reading "learning opencv" from Gary Bradski & Adrian Kaehler. On page 386 under Calibration->Homography section it is written: q = sMWQ where M is camera intrinsic matrix, W is 3x4 [R|T], S is an "up to" scale factor i assume related with homography concept, i dont know clearly.q is pixel cooord and Q is real coord. It is said in order to get real world coordinate(on the chessboard plane) of the coord of an object detected on image plane; Z=0 then also third column in W=0(axis rotation i assume), trimming these unnecessary parts; W is an 3x3 matrix. H=MW is an 3x3 homography matrix.Now we can invert homography matrix and left multiply with q to get Q=[X Y 1], where Z coord was trimmed.
I applied the mentioned algorithm. and I got some results that can not be in between the image corners(the image plane was parallel to the camera plane just in front of ~30 cm the camera, and i got results like 3000)(chessboard square sizes were entered in milimeters, so i assume outputted real world coordinates are again in milimeters). Anyway i am still trying stuff. By the way the results are previosuly very very large, but i divide all values in Q by third component of the Q to get (X,Y,1)
FINAL EDIT
I could not accomplish camera calibration methods. Anyway, I should have started with perspective projection and transform. This way i made very well estimations with a perspective transform between image plane and physical plane(having generated the transform by 4 pairs of corresponding coplanar points on the both planes). Then simply applied the transform on the image pixel points.
You said "i have the intrinsic camera matrix, and translational and rotational params of each of the image.” but these are translation and rotation from your camera to your chessboard. These have nothing to do with your circle. However if you really have translation and rotation matrices then getting 3D point is really easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C-1*[u, v, 1], where u=col-w/2 and v=h/2-row, where col, row are image column and row and w, h are image width and height. As a result you will obtain 3d point with so-called camera normalized coordinates p = [x, y, z]T. All you need to do now is to subtract the translation and apply a transposed rotation: P=RT(p-T). The order of operations is inverse to the original that was rotate and then translate; note that transposed rotation does the inverse operation to original rotation but is much faster to calculate than R-1.

OpenCV - Projection, homography matrix and bird's eye view

I'd like to get homography matrix to Bird's eye view and I know the projection Matrix of the camera. Is there any relation between them?
Thanks.
A projection matrix is defined as a product of camera's intrinsic (e.g. focal length, principal points, etc.) and extrinsic (rotation and translation) matrices. The question is w.r.t what your rotation and translation are? For example, I can imagine another camera or an object in 3D with respect to which you have these rotations and translations. Otherwise your projection is just an intrinsic matrix.
Think first about the pieces of information you need to know to obtain a bird’s eye view: you need to know at least how your camera is oriented w.r.t ground surface. If you also know camera elevation you can create a metric reconstruction. But since you mentioned a homography, I assume that you consider a bird’s eye view of a flat surface since a homography maps the points on two flat surfaces, in your case the points on a flat ground to the points on your flat sensor.
Let’s consider a pinhole camera equation. It basically says that
[u, v, 1]T ~ A*[R|t][x, y, z, 1]T, where A is a camera intrinsic matrix. Now since you deal with a ground plane, you can align a new coordinate system with it by setting z=0; R|t are rotation and translation matrices from this coordinate system into your camera-aligned system;
Next, note that your R|t is a 3x4 matrix and it looses one dimension since z=0; it becomes 3x3 or Homography which is equal now to H=A*R’|t; Ok all we did was proving that a homography mapping existed between the ground and your sensor;
Now, you want another kind of homography that happens during pure camera rotations and zooms between points on sensor before and after rotations/zoom; that is you want to rotate the camera down and possibly zoom out. Again, think in terms of a pinhole camera equation: originally you had H1=A ( here I threw out R|T as irrelevant for now) and then you rotated your camera and you have H2=AR; in other words, H1 is how you make your image now and H2 is how you want your image look like.
The relations between two is what you want to find, H12, and it is also a homography since the Homography is a family of transformations (use this simple heuristics: what happens in a family stays in the family). Since the same surface can generate images either with H1 or H2 we can assemble H12 by undoing H1 (back to the ground plane) and applying H2 (from the ground to a sensor bird’s eye view); in a way this resembles operations with vectors, you just have to respect the order of matrix application from the right to the left:
H12 = H2*H1-1=A*R*A-1=P*A-1 , where we substituted the expressions for H1, H2 and finally for a projection matrix (in case you do have it)
This is your answer, and if the rotation R is unknown it can be guessed from the camera orientation w.r.t. the ground or calculated using solvePnP() from the opeCV library. Finally, when I do this on a cell phone I just use its accelerometer readings as a good approximation since when a cell phone is not accelerated the readings represent a gravity vector which gives the rotation w.r.t. flat horizontal ground.
When you plot your bird’s eye view as an image you will notice that its boundaries turned from rectangular into some kind of a trapezoid (due to a camera frustum shape) and there are some holes at the distant locations (due to the insufficient sampling rate). You can interpolate inside the holes using wrapPerspective()

Converting a 2D image point to a 3D world point

I know that in the general case, making this conversion is impossible since depth information is lost going from 3d to 2d.
However, I have a fixed camera and I know its camera matrix. I also have a planar calibration pattern of known dimensions - let's say that in world coordinates it has corners (0,0,0) (2,0,0) (2,1,0) (0,1,0). Using opencv I can estimate the pattern's pose, giving the translation and rotation matrices needed to project a point on the object to a pixel in the image.
Now: this 3d to image projection is easy, but how about the other way? If I pick a pixel in the image that I know is part of the calibration pattern, how can I get the corresponding 3d point?
I could iteratively choose some random 3d point on the calibration pattern, project to 2d, and refine the 3d point based on the error. But this seems pretty horrible.
Given that this unknown point has world coordinates something like (x,y,0) -- since it must lie on the z=0 plane -- it seems like there should be some transformation that I can apply, instead of doing the iterative nonsense. My maths isn't very good though - can someone work out this transformation and explain how you derive it?
Here is a closed form solution that I hope can help someone. Using the conventions in the image from your comment above, you can use centered-normalized pixel coordinates (usually after distortion correction) u and v, and extrinsic calibration data, like this:
|Tx| |r11 r21 r31| |-t1|
|Ty| = |r12 r22 r32|.|-t2|
|Tz| |r13 r23 r33| |-t3|
|dx| |r11 r21 r31| |u|
|dy| = |r12 r22 r32|.|v|
|dz| |r13 r23 r33| |1|
With these intermediate values, the coordinates you want are:
X = (-Tz/dz)*dx + Tx
Y = (-Tz/dz)*dy + Ty
Explanation:
The vector [t1, t2, t3]t is the position of the origin of the world coordinate system (the (0,0) of your calibration pattern) with respect to the camera optical center; by reversing signs and inversing the rotation transformation we obtain vector T = [Tx, Ty, Tz]t, which is the position of the camera center in the world reference frame.
Similarly, [u, v, 1]t is the vector in which lies the observed point in the camera reference frame (starting from camera center). By inversing the rotation transformation we obtain vector d = [dx, dy, dz]t, which represents the same direction in world reference frame.
To inverse the rotation transformation we take advantage of the fact that the inverse of a rotation matrix is its transpose (link).
Now we have a line with direction vector d starting from point T, the intersection of this line with plane Z=0 is given by the second set of equations. Note that it would be similarly easy to find the intersection with the X=0 or Y=0 planes or with any plane parallel to them.
Yes, you can. If you have a transformation matrix that maps a point in the 3d world to the image plane, you can just use the inverse of this transformation matrix to map a image plane point to the 3d world point. If you already know that z = 0 for the 3d world point, this will result in one solution for the point. There will be no need to iteratively choose some random 3d point. I had a similar problem where I had a camera mounted on a vehicle with a known position and camera calibration matrix. I needed to know the real world location of a lane marking captured on the image place of the camera.
If you have Z=0 for you points in world coordinates (which should be true for planar calibration pattern), instead of inversing rotation transformation, you can calculate homography for your image from camera and calibration pattern.
When you have homography you can select point on image and then get its location in world coordinates using inverse homography.
This is true as long as the point in world coordinates is on the same plane as the points used for calculating this homography (in this case Z=0)
This approach to this problem was also discussed below this question on SO: Transforming 2D image coordinates to 3D world coordinates with z = 0

Project 2d points in camera 1 image to camera 2 image after a stereo calibration

I am doing stereo calibration of two cameras (let's name them L and R) with opencv. I use 20 pairs of checkerboard images and compute the transformation of R with respect to L. What I want to do is use a new pair of images, compute the 2d checkerboard corners in image L, transform those points according to my calibration and draw the corresponding transformed points on image R with the hope that they will match the corners of the checkerboard in that image.
I tried the naive way of transforming the 2d points from [x,y] to [x,y,1], multiply by the 3x3 rotation matrix, add the rotation vector and then divide by z, but the result is wrong, so I guess it's not that simple (?)
Edit (to clarify some things):
The reason I want to do this is basically because I want to validate the stereo calibration on a new pair of images. So, I don't actually want to get a new 2d transformation between the two images, I want to check if the 3d transformation I have found is correct.
This is my setup:
I have the rotation and translation relating the two cameras (E), but I don't have rotations and translations of the object in relation to each camera (E_R, E_L).
Ideally what I would like to do:
Choose the 2d corners in image from camera L (in pixels e.g. [100,200] etc).
Do some kind of transformation on the 2d points based on matrix E that I have found.
Get the corresponding 2d points in image from camera R, draw them, and hopefully they match the actual corners!
The more I think about it though, the more I am convinced that this is wrong/can't be done.
What I am probably trying now:
Using the intrinsic parameters of the cameras (let's say I_R and I_L), solve 2 least squares systems to find E_R and E_L
Choose 2d corners in image from camera L.
Project those corners to their corresponding 3d points (3d_points_L).
Do: 3d_points_R = (E_L).inverse * E * E_R * 3d_points_L
Get the 2d_points_R from 3d_points_R and draw them.
I will update when I have something new
It is actually easy to do that but what you're making several mistakes. Remember after stereo calibration R and L relate the position and orientation of the second camera to the first camera in the first camera's 3D coordinate system. And also remember to find the 3D position of a point by a pair of cameras you need to triangulate the position. By setting the z component to 1 you're making two mistakes. First, most likely you have used the common OpenCV stereo calibration code and have given the distance between the corners of the checker board in cm. Hence, z=1 means 1 cm away from the center of camera, that's super close to the camera. Second, by setting the same z for all the points you are saying the checker board is perpendicular to the principal axis (aka optical axis, or principal ray), while most likely in your image that's not the case. So you're transforming some virtual 3D points first to the second camera's coordinate system and then projecting them onto the image plane.
If you want to transform just planar points then you can find the homography between the two cameras (OpenCV has the function) and use that.

Resources