How to transform an image based on the position of the camera (iOS)

I'm trying to create a perspective projection of an image based on the look direction. I'm inexperienced in this field, however, and can't manage to do it myself. Will you help me, please?
There is an image and an observer (camera). If the camera can be considered an object on an invisible sphere, and the image a plane going through the middle of the sphere, then the camera position can be expressed as:
x = d cos(θ) cos(φ)
y = d sin(θ)
z = d sin(φ) cos(θ)
where θ is the latitude, φ the longitude, and d the distance (radius) from the centre of the sphere, at which the centre of the image lies.
I found these formulae somewhere, but I'm not sure about the coordinates (it looks to me as if x should be z, but I guess that depends on the coordinate system).
Now, what I need to do is transform my image properly so it looks as if viewed from the camera (in a proper perspective). Would you be so kind as to tell me in a few words how this could be done? What steps should I take?
I'm developing an iOS app and I thought I could use the following method from QuartzCore. But I have no idea what angle I should pass to this method, or how to derive the x, y, z coordinates from the camera position.
CATransform3D CATransform3DRotate(CATransform3D t, CGFloat angle, CGFloat x, CGFloat y, CGFloat z)
So far I have successfully created a simple viewing perspective by:
using an identity matrix (as the CATransform3D parameter) with .m34 set to 1/-1000,
rotating my image by the angle φ around the (0, 1, 0) vector,
concatenating the result with a rotation by θ around the (1, 0, 0) vector,
ignoring scaling based on d (I scale the image by other criteria).
But the result was not what I wanted (which I should have expected) :-/. The perspective looks realistic only as long as one of the two angles is close to 0. I therefore suspect there is a way to calculate a single proper angle together with x, y, and z coordinates of a rotation axis to achieve the correct transformation (which might be wrong, because it's just my guess).

I think I managed to find a solution, but it is based solely on my own calculations, thoughts and experiments, so I have no idea if it is correct. It seems to be OK, but you know...
So if the coordinate system is like this (see the figure), and the plane of the image to be transformed goes through the X and Y axes with its centre at the origin of the system, then the following coordinates:
x = d sin(φ) cos(θ)
y = d sin(θ)
z = d cos(θ) cos(φ)
define a vector that starts at the origin of the coordinate system and points to the position of the camera observing the image. Setting d = 1 gives a unit vector at once, without further normalization. Theta is the angle in the ZY plane and phi the angle in the ZX plane. Theta rises from 0° to 90° from the Z+ toward the Y+ axis, whereas phi rises from 0° to 90° from the Z+ toward the X+ axis (and to -90° in the opposite direction, in both cases).
Hence the rotation axis (the vector to pass to CATransform3DRotate) is:
x1 = -y / z
y1 = -x / z
z1 = 0.
I'm not sure about z1 = 0, however rotation around the Z axis seemed wrong to me.
The last thing to calculate is the angle by which the image has to be transformed. In my humble opinion this should be the angle between the vector that points to the camera (x, y, z) and the vector normal to the image, which is the Z axis (0, 0, 1).
The dot product of two vectors gives the cosine of the angle between them, so the angle is:
α = arccos(x * 0 + y * 0 + z * 1) = arccos(z).
Therefore the alpha angle and the x1, y1, z1 coordinates are the parameters of CATransform3DRotate method I mentioned in my question.
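For reference, the axis-angle computation above can be sketched in a few lines (my own code, following exactly the formulas in this answer; angles in radians, and since CATransform3DRotate normalizes the axis, the 1/z scale is harmless):

```python
import math

def transform_params(theta, phi):
    """Axis (x1, y1, z1) and angle for CATransform3DRotate, following
    the approach described above (theta, phi in radians)."""
    # Unit vector from the image centre to the camera (d = 1).
    x = math.sin(phi) * math.cos(theta)
    y = math.sin(theta)
    z = math.cos(theta) * math.cos(phi)
    # Rotation axis as derived above; dividing by z only changes the
    # axis length, which CATransform3DRotate normalizes anyway.
    x1, y1, z1 = -y / z, -x / z, 0.0
    # Angle between the camera vector and the image normal (0, 0, 1).
    angle = math.acos(z)
    return angle, (x1, y1, z1)
```

For example, with phi = 0 this degenerates to a pure rotation by theta around an axis along X, which matches the simple one-angle case that already looked correct.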
I would be grateful if somebody could tell me if this approach is correct. Thanks a lot!

Correct non-nadir view for GSD calculation (UAV)

Hello stackoverflow community,
So I am working on a project that requires calculating the ground sampling distance (GSD) in order to retrieve the metre/pixel scale.
The GSD formula for the nadir view (camera looking straight down at the ground) is as follows:
GSD = (flight altitude x sensor height) / (focal length x image height and/or width)
I have read in multiple articles, such as https://www.mdpi.com/2072-4292/13/4/573, that if the camera is tilted about one axis, a correction is required (see the formula in the article), where θ is the tilt angle and φ, as they say in the article:
"φ describes the angular position of the pixel in the image: it is zero in correspondence of the optical axis of the camera, while it can have positive or negative values for the other pixels"
The figure illustrating this is in their article.
So I hope you are on the same page as me. Now I have two questions:
1- First, how exactly do I calculate the angular position of a given pixel with respect to the optical axis (i.e., how do I calculate φ)?
2- In my case the camera is rotated about two axes, not just one as in their example: it doesn't look straight at the road but is also oriented toward one of the sides.
Would there be further changes to the formula? I am not sure how to derive the right formula geometrically.
The angular position of a pixel
As explained in the article you linked, you can compute the pixel angle from the camera's intrinsic parameters. First, a bit of theory: the intrinsics matrix is used to compute the projection of a world point onto the image plane of the camera. The OpenCV documentation explains it very well; it is expressed like this:
    ( x )   ( fx  0  cx )   ( X )
s * ( y ) = (  0 fy  cy ) * ( Y )
    ( 1 )   (  0  0   1 )   ( Z )
where fx, fy are your focal lengths, cx, cy is the optical centre, x, y is the position of the pixel in your image, and X, Y, Z is your world point in metres or millimetres or whatever.
Now, by inverting the matrix, you can instead compute the world vector from the pixel position. A world vector and not a world point, because the distance d between the camera and the real object is unknown.
    ( X )            ( x )
d * ( Y ) = A^-1 * ( y )
    ( Z )            ( 1 )
And then you can simply compute the angle between the optical axis and this world vector to get your phi angle, for example with the formula detailed in this answer using the y-axis of the camera as normal. In pseudo-code:
intrinsic_inv = invert(intrinsic)
world_vector = multiply(intrinsic_inv, (x, y, 1))
optical_axis = (0, 0, 1)
normal = (0, 1, 0)
dot = dot_product(world_vector, optical_axis)
det = dot_product(normal, cross_product(world_vector, optical_axis))
phi = atan2(det, dot)
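A runnable version of the pseudo-code above (Python; the intrinsic values are hypothetical, and the inversion of K is written out explicitly, since for an upper-triangular K the back-projection is just ((u-cx)/fx, (v-cy)/fy, 1)):

```python
import math

# Hypothetical intrinsics: fx = fy = 1000 px, optical centre (960, 540).
fx, fy, cx, cy = 1000.0, 1000.0, 960.0, 540.0

def pixel_angle(u, v):
    """Signed angle (radians) between the optical axis and the ray through
    pixel (u, v), measured around the camera's y-axis."""
    # inv(K) * (u, v, 1): back-project the pixel to a ray direction.
    wx, wy, wz = (u - cx) / fx, (v - cy) / fy, 1.0
    ox, oy, oz = 0.0, 0.0, 1.0                      # optical axis
    dot = wx * ox + wy * oy + wz * oz
    # y-component of cross(world_vector, optical_axis), i.e. the
    # determinant with the camera's y-axis (0, 1, 0) as the normal.
    det = wz * ox - wx * oz
    return math.atan2(det, dot)
```

A pixel at the optical centre gives phi = 0, and pixels to either side give positive or negative angles, as described in the article.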
The camera angles
You can express the rotation of the camera by three angles: the tilt, the pan, and the roll angles. Take a look at this image I quickly googled if you want to visualize what they correspond to.
The tilt angle is the one named theta in your article; you already know it. The pan angle has no impact on the GSD, at least if we suppose the ground is perfectly flat. So if the pan angle was what you were referring to with the second rotation axis, there is nothing more to do.
However, if you have a non-zero roll angle, things become tricky. In that case I would recommend a paradigm change to avoid dealing with angles: express the camera pose as an affine transformation (rotation matrix and translation vector) instead. This turns the problem into a general analytic-geometry problem, where you estimate depths and scales by intersecting the world vector with the ground plane. It changes the previous pseudo-code to something like:
intrinsic_inv = invert(intrinsic)
world_vector = multiply(intrinsic_inv, (x, y, 1))
world_vector = multiply(rotation, world_vector) + translation
world_point = intersection(world_vector, ground_plane)
And then the scale can be computed from the differences between the world points of adjacent pixels.
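A minimal sketch of that intersection (my own helper; the intrinsics, rotation, and camera position are hypothetical, and I model the pose as a camera-to-world rotation R plus a camera position t rather than a generic affine transform):

```python
import math

# Hypothetical intrinsics: fx = fy = 1000 px, optical centre (960, 540).
fx, fy, cx, cy = 1000.0, 1000.0, 960.0, 540.0

def matvec(M, v):
    """3x3 matrix times 3-vector."""
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))

def pixel_to_ground(u, v, R, t, ground_y=0.0):
    """Intersect the ray through pixel (u, v) with the plane y = ground_y.
    R rotates camera coordinates into world coordinates; t is the camera
    position in world coordinates (world y is 'up')."""
    ray_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)   # inv(K) * (u, v, 1)
    d = matvec(R, ray_cam)                           # ray direction, world
    k = (ground_y - t[1]) / d[1]                     # t + k*d hits the plane
    return tuple(t[i] + k * d[i] for i in range(3))
```

With a camera 10 m up looking straight down, the ground points of two adjacent pixels differ by altitude/focal = 10/1000 = 0.01 m, which recovers the nadir GSD formula as a sanity check.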

Cartesian to Spherical coordinate conversion: the specific case when Φ is zero and θ is indeterminate (phase unwrapping)

The following is the conversion from spherical to Cartesian coordinates:
X = r cos(θ) sin(Φ)
Y = r sin(θ) sin(Φ)
Z = r cos(Φ)
We use the reverse computation to obtain spherical coordinates from Cartesian coordinates, defined as
r = √(X² + Y² + Z²)
θ = atan(Y / X)
Φ = atan(√(X² + Y²) / Z)
The problem arises when X and Y are both zero: θ can then take any arbitrary value, and in MATLAB the computation yields NaN (not a number), which makes θ discontinuous. Is there an interpolation technique to remove this discontinuity, and how should θ be interpreted in this case?
θ is a matrix evaluated at various points; the result has jumps and black patches that represent discontinuities, whereas I need to generate an image with smooth variation. Please see the obtained theta and the correct theta variation via the links below and suggest some changes.
Discontinuous_Theta_variation
Correct Theta variation
The formulas written here for converting from Cartesian to spherical coordinates are correct, but you first need to understand their physical significance.
r is the distance of the point from the origin. θ is the angle from the positive x-axis to the projection of the point onto the XY plane. And Φ is the angle from the positive z-axis to the line joining the origin and the point.
http://www.learningaboutelectronics.com/Articles/Cartesian-rectangular-to-spherical-coordinate-converter-calculator.php#answer
So for a point whose X and Y coordinates are both 0, the point lies on the z-axis, and hence its projection onto the XY plane is the origin itself. So we cannot determine the angle of the origin from the X axis. But note that, since the point lies on the Z axis, Φ = 0 or π (depending on whether Z is positive or negative).
So while coding this problem, you can adopt the approach of first checking Φ: if it is 0 or π, then set θ = 0 (by default).
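In code, the usual fix is to use atan2 instead of atan: atan2(0, 0) is defined as 0 in MATLAB, Python, and most languages, so the degenerate case never produces NaN. A minimal Python sketch:

```python
import math

def cart2sph(x, y, z):
    """Cartesian -> spherical, with the degenerate z-axis case handled:
    atan2 returns 0 when both arguments are 0, so theta never becomes NaN."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)                 # 0 by convention when x = y = 0
    phi = math.atan2(math.hypot(x, y), z)    # 0 or pi on the z-axis
    return r, theta, phi
```

In MATLAB the equivalent is `theta = atan2(Y, X); Phi = atan2(hypot(X, Y), Z);`, applied elementwise to the matrices.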
I hope this serves the purpose.

Estimate depth of a 2D pixel given intrinsic, extrinsic, and a constraint of Y=0

I have a single-view camera at a certain height (h) above the ground. Through calibration I have obtained the intrinsic parameters K and the extrinsics [R|t] (rotation matrix and translation vector), and because I have full access to the camera and the environment, I can measure whatever I want.
My goal is to estimate the depth of a pixel [u, v] in the image, given that I know the pixel is on the floor (so it is at y = -h with respect to the camera).
Given this constraint, I did the following (without success):
create a new 3D point P1 from [u, v] and the camera parameters + focal length: [u - cx, v - cy, f]
multiply P1 by the inverse of my camera matrix K and call the result P2
multiply P2 by the inverse of the [R|t] matrix and call the result P3
P3 is a 4x1 vector, so we normalize it and bring it down to a 3x1 vector [X1, Y1, Z1]. This point should be the world-coordinate projection of my [u, v] point
solve for X and Z when Y = -h as follows:
x = x1 * (-h / y1)
z = z1 * (-h / y1)
Unfortunately it doesn't look right! I have been stuck on this problem for two weeks now, so it would be really great to get some help from the community. I'm sure it's something obvious that I am missing.
Thanks again
The homogeneous image coordinate is P1 = [u,v,1], or [f*u,f*v,f].
The multiplication with the inverse of the camera matrix gives you a ray along which the 3D point is located.
P2 ~= K⁻¹ * P1 (~= is equality up to a scale factor)
Let's assume the camera is located at C (which is (0,0,0,1) in the camera's coordinate system), and the vector P2 has the form [x,y,z,0]. (The zero at the end makes it translation invariant!)
Then the 3D point you are looking for is located at C + k*P2 and you must solve for the variable k.
P3 = Rt⁻¹ * P2
P4 = Rt⁻¹ * (C + k*P2) = C2 + k * P3
where C2 = Rt⁻¹ * C is the camera position in world coordinates.
P3 is the ray direction in world coordinates. P4 is your point at Y = -h.
Finally, plug in your constraint Y=-h and calculate k using the y components:
k = (-h - C2_y) / P3_y
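Putting the pieces together, a small Python sketch (my own code, not from the question; R_cw and C2 stand for the rotation part of Rt⁻¹ and the world-space camera centre, and the intrinsics are hypothetical):

```python
# Hypothetical calibration: fx = fy = 100 px, optical centre (320, 240).
fx, fy, cx, cy = 100.0, 100.0, 320.0, 240.0

def matvec(M, v):
    """3x3 matrix times 3-vector."""
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))

def floor_point(u, v, R_cw, C2, h):
    """World point of the pixel (u, v), assuming it lies on the floor Y = -h.
    R_cw rotates camera coordinates into world coordinates (the rotation
    part of Rt⁻¹); C2 is the camera centre in world coordinates."""
    ray_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)  # K⁻¹ * [u, v, 1]
    P3 = matvec(R_cw, ray_cam)                      # ray in world coords
    k = (-h - C2[1]) / P3[1]                        # solve C2 + k*P3 for Y = -h
    return tuple(C2[i] + k * P3[i] for i in range(3))
```

Note that only the direction of the ray matters: the unknown depth is absorbed into the scalar k, which is exactly what the homogeneous 0 in [x, y, z, 0] achieves.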

Camera motion from corresponding images

I'm trying to calculate a new camera position based on the motion of corresponding images.
The images conform to the pinhole camera model.
Unfortunately I don't get useful results, so I will describe my procedure and hope that somebody can help me.
I detect features in the corresponding images with SIFT, match them with OpenCV's FlannBasedMatcher, and calculate the fundamental matrix with OpenCV's findFundamentalMat (method RANSAC).
Then I calculate the essential matrix by the camera intrinsic matrix (K):
Mat E = K.t() * F * K;
I decompose the essential matrix to rotation and translation with singular value decomposition:
SVD decomp = SVD(E);
Matx33d W(0, -1, 0,
          1,  0, 0,
          0,  0, 1);
Matx33d Wt( 0, 1, 0,
           -1, 0, 0,
            0, 0, 1);
R1 = decomp.u * Mat(W) * decomp.vt;
R2 = decomp.u * Mat(Wt) * decomp.vt;
t1 = decomp.u.col(2); //u3
t2 = -decomp.u.col(2); //u3
Then I try to find the correct solution by triangulation (this part is from http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/ so I think it should work correctly).
The new position is then calculated with:
new_pos = old_pos + -R.t()*t;
where new_pos & old_pos are vectors (3x1), R the rotation matrix (3x3) and t the translation vector (3x1).
Unfortunately I get no useful results, so maybe someone has an idea of what could be wrong.
Here are some results (just in case someone can confirm that any of them is definitely wrong):
F = [8.093827077399547e-07, 1.102681999632987e-06, -0.0007939604310854831;
1.29246107737264e-06, 1.492629957878578e-06, -0.001211264339006535;
-0.001052930954975217, -0.001278667878010564, 1]
K = [150, 0, 300;
0, 150, 400;
0, 0, 1]
E = [0.01821111092414898, 0.02481034499174221, -0.01651092283654529;
0.02908037424088439, 0.03358417405226801, -0.03397110489649674;
-0.04396975675562629, -0.05262169424538553, 0.04904210357279387]
t = [0.2970648246214448; 0.7352053067682792; 0.6092828956013705]
R = [0.2048034356172475, 0.4709818957303019, -0.858039396912323;
-0.8690270040802598, -0.3158728880490416, -0.3808101689488421;
-0.4503860776474556, 0.8236506374002566, 0.3446041331317597]
First of all you should check if
x' * F * x = 0
for your point correspondences x' and x. This should, of course, only be the case for the inliers of the fundamental-matrix estimation with RANSAC.
Thereafter, you have to transform your point correspondences to normalized camera coordinates (NCC) like this:
xn = inv(K) * x
xn' = inv(K') * x'
where K' is the intrinsic camera matrix of the second image and x' are the points of the second image. I think in your case it is K = K'.
With these NCCs you can decompose your essential matrix as you described. You triangulate the normalized camera coordinates and check the depth of the triangulated points. But be careful: the literature says one point is sufficient to pick the correct rotation and translation, but from my experience you should check several points, since a single point can be an outlier even after RANSAC.
Before you decompose the essential matrix make sure that E=U*diag(1,1,0)*Vt. This condition is required to get correct results for the four possible choices of the projection matrix.
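This projection onto a valid essential matrix can be sketched with numpy (my own helper, not an OpenCV function):

```python
import numpy as np

def enforce_essential(E):
    """Project an estimated E onto the space of valid essential matrices
    by forcing its singular values to (1, 1, 0)."""
    U, s, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```

Since E is only defined up to scale, fixing the two non-zero singular values to 1 loses nothing, and the zero third singular value is exactly the rank-2 constraint the four-way decomposition relies on.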
When you've got the correct rotation and translation you can triangulate all your point correspondences (the inliers of the fundamental matrix estimation with RANSAC). Then, you should compute the reprojection error. Firstly, you compute the reprojected position like this
xp = K * P * X
xp' = K' * P' * X
where X is the computed (homogeneous) 3D position. P and P' are the 3x4 projection matrices. The projection matrix P is normally given by the identity. P' = [R, t] is given by the rotation matrix in the first 3 columns and rows and the translation in the fourth column, so that P is a 3x4 matrix. This only works if you transform your 3D position to homogeneous coordinates, i.e. 4x1 vectors instead of 3x1. Then, xp and xp' are also homogeneous coordinates representing your (reprojected) 2D positions of your corresponding points.
I think the
new_pos = old_pos + -R.t()*t;
is incorrect, since firstly you only translate old_pos without rotating it, and secondly you translate it by the wrong vector. The correct update is given at the end of this answer.
So, after you have computed the reprojected points, you can calculate the reprojection error. Since you are working with homogeneous coordinates, you have to normalize them first (xp = xp / xp(2), i.e. divide by the last coordinate). The error is given by
error = (x(0)-xp(0))^2 + (x(1)-xp(1))^2
If the error is large, such as 10², your intrinsic camera calibration or your rotation/translation is incorrect (perhaps both). Depending on your coordinate system, you can try inverting your projection matrices; for that you need to convert them to homogeneous 4x4 form first, since you cannot invert a 3x4 matrix (without the pseudo-inverse): add the fourth row [0 0 0 1], compute the inverse, and remove the fourth row afterwards.
There is one more thing with reprojection error. In general, the reprojection error is the squared distance between your original point correspondence (in each image) and the reprojected position. You can take the square root to get the Euclidean distance between both points.
To update your camera position, you have to update the translation first, then update the rotation matrix.
t_ref += lambda * (R_ref * t);
R_ref = R * R_ref;
where t_ref and R_ref are your camera state, R and t are new calculated camera rotation and translation, and lambda is the scale factor.
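A minimal pure-Python sketch of this update rule (my own helper; rotation matrices are nested tuples, and lam is the unknown monocular scale):

```python
def update_pose(R_ref, t_ref, R, t, lam=1.0):
    """Accumulate the new relative motion (R, t) into the running camera
    state (R_ref, t_ref): translate first, then update the rotation."""
    matvec = lambda M, v: tuple(
        sum(M[i][j] * v[j] for j in range(3)) for i in range(3))
    matmul = lambda A, B: tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
        for i in range(3))
    step = matvec(R_ref, t)                              # rotate t into world
    t_ref = tuple(t_ref[i] + lam * step[i] for i in range(3))
    R_ref = matmul(R, R_ref)                             # compose rotations
    return R_ref, t_ref
```

For example, applying a 90° yaw with a unit forward translation twice traces out the corner of a square, which is the expected dead-reckoning behaviour.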

Hough Transform Equation

I was wondering why the Hough transform uses rho = x·cos(theta) + y·sin(theta) to represent a straight line rather than y = mx + b. I tried to work through this (and read the Wikipedia article), but cannot find a way to derive one from the other.
Does anyone know how to derive one from the other?
Thank you in advance.
Derivation:
The equation x/a + y/b = 1:
defines a line
has x-intercept = a
has y-intercept = b
From trigonometry, recall how a ray rotated by angle t will project onto the x- and y- axes according to (angle=t, radius=1) -> (x=cos(t), y=sin(t))*
Draw the tangent line at the labelled point. Trigonometry (or geometry with similar triangles) tells us that this tangent line intersects the axes at x = 1/cos(t) and y = 1/sin(t). Thus the line at distance 1 from the origin has a = 1/cos(t) and b = 1/sin(t), and is therefore described by x/(1/cos(t)) + y/(1/sin(t)) = 1...
... which is just cos(t) x + sin(t) y = rho where rho=1
You can see that rho corresponds to how far the line is from the origin (either by playing around with the equation, or by noting that multiplication here just scales all values by the same amount, effectively rescaling the grid).
*see http://en.wikipedia.org/wiki/File:Unit_circle.svg for credit
That's just a transform from a linear coordinate system to a rotational one. The reason for this is outlined in the Wikipedia article:
In the Hough transform, a main idea is to consider the characteristics of the straight line not as image points (x1, y1), (x2, y2), etc., but instead, in terms of its parameters, i.e., the slope parameter m and the intercept parameter b. Based on that fact, the straight line y = mx + b can be represented as a point (b, m) in the parameter space. However, one faces the problem that vertical lines give rise to unbounded values of the parameters m and b. For computational reasons, it is therefore better to use a different pair of parameters, denoted r and θ (theta), for the lines in the Hough transform.
And to transform between the two, use the equation y = -(cos(theta)/sin(theta))·x + r/sin(theta). Thus m = -(cos(theta)/sin(theta)) and b = r/sin(theta). These obviously break down when sin(theta) = 0 (a vertical line), which is why the rotational coordinate system is preferred: it has no problems with lines of infinite slope.
The polar coordinate system is a 2-D coordinate system with a reference point (like the origin) called the pole, and a ray from the pole called the polar axis. Each point is represented as (rho, theta), where rho is the distance between the pole (origin) and the point, and theta is the angle between the polar axis and the line joining the pole and the point. See here.
The Polar coordinate (rho, theta) can be converted to Cartesian coordinate (x,y) using the following trigonometric equations.
x = rho cos theta ----(1)
y = rho sin theta ----(2)
Refer here for more info.
How do we get the above equations?
The equations use the trigonometry of a right triangle.
See the picture here.
cos theta = adjacent-side/hypotenuse = x/rho, thus we get (1)
sin theta = opposite-side/hypotenuse = y/rho, thus we get (2)
and Pythagorean theorem says,
hypotenuse^2 = adjacent side ^2 + opposite side^2, so
rho^2 = x^2 + y^2 ----(3)
Now let's derive the relationship between Cartesian coordinate (x,y) and Polar coordinate (rho,theta)
rho^2 = x^2 + y^2 ---- from (3)
rho^2 = x*x + y*y
rho^2 = x(rho cos theta) + y (rho sin theta) ---- from (1) and (2)
rho^2 = rho(x cos theta + y sin theta)
rho = x cos theta + y sin theta
Each pair of rho, theta relates to an x,y pair for a given line in that the distance rho from the origin at angle theta places you at an x,y coordinate on the line.
The (rho, theta) form is used instead of y = mx + b so that a sequence of (rho, theta) values can be evaluated without the computational problems that arise with y = mx + b when the slope is undefined (that is, when the line is vertical).
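A quick numerical check of the normal form: every point on the line x + y = 2 satisfies rho = x·cos(theta) + y·sin(theta) with theta = 45° and rho = √2 (the distance of the line from the origin).

```python
import math

# The line through (2, 0) and (0, 2): x + y = 2.
# Normal form: theta = 45 degrees, rho = 2 / sqrt(2) = sqrt(2).
theta = math.pi / 4
rho = math.sqrt(2)

for x, y in [(2.0, 0.0), (0.0, 2.0), (1.0, 1.0), (-1.0, 3.0)]:
    assert abs(x * math.cos(theta) + y * math.sin(theta) - rho) < 1e-9
```

Every (x, y) on the line maps to the same single point (rho, theta) in parameter space, which is exactly why collinear edge pixels vote for a common accumulator cell in the Hough transform.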
