How to generate a probability distribution on an image - image-processing

I have a question as follows:
Suppose I have an image of size 360x640 (rows x columns) and a center coordinate, say (20, 100). I want to generate a probability distribution that has its highest value at that center (20, 100), lower values in its neighborhood, and much lower values farther away from the center.
All I have figured out so far is to use a multivariate Gaussian (since the problem is 2D) with its mean set to the center (20, 100). Is that correct, and how do I design the covariance matrix?
Thanks!

You could do it in 2D by generating radial and angular (polar) coordinates, along these lines:
Pi = 3.1415926
cx = 20
cy = 100
r = sqrt( -2*log(1-U(0,1)) )
a = 2*Pi*U(0,1)
x = scale*r*cos(a)
y = scale*r*sin(a)
return (x + cx, y + cy)
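where scale converts the unitless Gaussian into units applicable to your problem (it is the standard deviation of each coordinate, in pixels), and U(0,1) is a uniform random value in [0, 1). (cx, cy) is simply your center in the order you gave it, (20, 100).
Reference: the Box-Muller transform.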
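A minimal Python sketch of the sampler above, assuming Python's standard `random` module as the source of U(0,1). It mirrors the pseudo-code line by line; the function name `sample_gaussian_2d` is hypothetical. Each returned coordinate is normally distributed around the center with standard deviation `scale`.

```python
import math
import random


def sample_gaussian_2d(cx, cy, scale, rng=random):
    """Draw one point from an isotropic 2D Gaussian centred at (cx, cy).

    Box-Muller in polar form: the radius r is Rayleigh-distributed and the
    angle a is uniform, so scale*r*cos(a) and scale*r*sin(a) are independent
    normal variates with standard deviation `scale`.
    """
    u1 = rng.random()                          # uniform in [0, 1)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(1.0 - u1))   # 1 - u1 is in (0, 1], so log is finite
    a = 2.0 * math.pi * u2
    return cx + scale * r * math.cos(a), cy + scale * r * math.sin(a)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    print([sample_gaussian_2d(20, 100, 5.0, rng) for _ in range(3)])
```

Passing a seeded `random.Random` instance keeps experiments reproducible.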
If you want a generic 2D Gaussian, i.e. one with elliptical contours, use different scales for x and y, then rotate the (x, y) offset by a predefined angle with the usual 2D rotation matrix before adding the center.
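If you need the distribution itself (one probability per pixel, as the question asks) rather than samples, a minimal sketch is to evaluate the Gaussian density on every pixel and normalise. The covariance is built the way described above: variances `sigma_r**2` and `sigma_c**2` along two perpendicular directions, rotated by `angle_rad`. The function name and parameters are hypothetical, and the 1/(2π·sqrt(det Σ)) constant is dropped because the sum-to-one normalisation replaces it.

```python
import math


def gaussian_map(height, width, center_rc, sigma_r, sigma_c, angle_rad=0.0):
    """Discrete 2D Gaussian over a height x width image, normalised to sum to 1.

    Covariance (in (row, col) coordinates) =
        Rot(angle) @ diag(sigma_r^2, sigma_c^2) @ Rot(angle)^T
    """
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    vr, vc = sigma_r ** 2, sigma_c ** 2
    # Entries of the symmetric covariance matrix [[a, b], [b, d]]
    a = c * c * vr + s * s * vc
    b = c * s * (vr - vc)
    d = s * s * vr + c * c * vc
    det = a * d - b * b
    # Inverse covariance [[ia, ib], [ib, id_]]
    ia, ib, id_ = d / det, -b / det, a / det
    r0, c0 = center_rc
    grid, total = [], 0.0
    for r in range(height):
        dr = r - r0
        row = []
        for col in range(width):
            dc = col - c0
            q = ia * dr * dr + 2.0 * ib * dr * dc + id_ * dc * dc  # squared Mahalanobis distance
            value = math.exp(-0.5 * q)
            row.append(value)
            total += value
        grid.append(row)
    return [[value / total for value in row] for row in grid]


if __name__ == "__main__":
    probs = gaussian_map(360, 640, (20, 100), 10.0, 10.0)
    print(probs[20][100], probs[20][110], probs[20][130])
```

With `sigma_r == sigma_c` and no rotation the covariance is σ²·I and the contours are circles; a larger σ spreads the probability further from the center. The pure-Python double loop is fine at 360x640, though NumPy would be much faster.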

Related

Apply relative radial distortion function to image w/o knowing anything about the camera

I have a radial distortion function that gives the relative distortion, in percent, from 0 at the image center to the full relative image field (field height 1). For example, this function gives a distortion of up to 5% at the full relative field height of 1.
I tried to use it together with OpenCV's undistort function to apply the distortion, but I don't know how to fill in the matrices.
As I said, I only have a source image and don't know anything about the camera parameters such as the focal length; I only know the distortion function.
How should I set the matrix in cv2.undistort(src_image, matrix, ...)?
The OpenCV routine that's easier to use in your case is cv::remap, not undistort.
In the following I assume your distortion is purely radial. Similar considerations apply if you already have it decomposed into (x, y) components.
So you have a distortion function d(r) of the distance r = sqrt((x - x_c)^2 + (y - y_c)^2) of a pixel (x, y) from the image center (x_c, y_c). The function expresses the relative change of the radius r_d of a pixel in the distorted image with respect to its radius r in the undistorted one: (r_d - r) / r = d(r), or, equivalently, r_d = r * (1 + d(r)). (If d is given in percent, divide it by 100 first.)
In your case you start from an undistorted image and want to apply the distortion. cv::remap fills each output pixel by sampling the source image at the coordinates stored in map_x and map_y, so for every pixel of the distorted output you need the undistorted position it comes from. That means inverting the above equation (analytically or numerically) to find r for every r_d in the range of interest. Then you can create two arrays, map_x and map_y, that hold the mapping from distorted to undistorted coordinates: for each pair (x_d, y_d) of integer pixel coordinates in the distorted image, compute r_d = sqrt((x_d - x_c)^2 + (y_d - y_c)^2), get the corresponding r by solving the equation, go back to (x, y) = (x_c, y_c) + (r / r_d) * (x_d - x_c, y_d - y_c), and assign map_x[y_d, x_d] = x; map_y[y_d, x_d] = y. Finally, pass those maps to cv::remap. (To remove distortion instead, you would index the maps by undistorted pixels and use the forward relation r_d = r * (1 + d(r)) directly, with no inversion.)
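A minimal pure-Python sketch of the map construction above. Several things here are assumptions, not from the question: d takes the normalised field height h = r / r_max as a fraction (5% becomes 0.05), "field height 1" is taken to be the image corner (r_max = half-diagonal), the center is the image center, and `example_d` is a hypothetical distortion profile. The inversion uses plain bisection, which assumes r * (1 + d) grows monotonically with r.

```python
import math


def invert_radius(r_d, d, r_max, iters=60):
    """Solve r * (1 + d(r / r_max)) = r_d for r by bisection.

    Assumes the left-hand side increases monotonically with r and that
    1 + d stays positive, so a unique solution exists.
    """
    if r_d == 0.0:
        return 0.0

    def forward(r):
        return r * (1.0 + d(r / r_max))

    lo, hi = 0.0, r_d
    while forward(hi) < r_d:   # widen the bracket when d is negative (barrel-type)
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if forward(mid) < r_d:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def build_distortion_maps(width, height, d):
    """map_x[y_d][x_d], map_y[y_d][x_d] = undistorted source position of each
    pixel of the distorted output image (the layout cv::remap expects)."""
    x_c, y_c = (width - 1) / 2.0, (height - 1) / 2.0
    r_max = math.hypot(x_c, y_c)   # assumption: field height 1 = image corner
    map_x = [[0.0] * width for _ in range(height)]
    map_y = [[0.0] * width for _ in range(height)]
    for y_d in range(height):
        for x_d in range(width):
            dx, dy = x_d - x_c, y_d - y_c
            r_d = math.hypot(dx, dy)
            scale = invert_radius(r_d, d, r_max) / r_d if r_d > 0 else 1.0
            map_x[y_d][x_d] = x_c + dx * scale
            map_y[y_d][x_d] = y_c + dy * scale
    return map_x, map_y


def example_d(h):
    """Hypothetical distortion: 5% at field height 1, growing with h**2."""
    return 0.05 * h * h
```

With NumPy and OpenCV available, something like `cv2.remap(src, np.asarray(map_x, np.float32), np.asarray(map_y, np.float32), cv2.INTER_LINEAR)` should then produce the distorted image. For real image sizes you would vectorise the loops with NumPy.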

Correct non-nadir view for GSD calculation (UAV)

Hello stackoverflow community,
I am working on a project that requires calculating the ground sampling distance (GSD) in order to retrieve the meter/pixel scale.
The GSD formula for a nadir view (camera looking straight down at the ground) is as follows:
GSD = (flight altitude × sensor height) / (focal length × image height), or equivalently with sensor width and image width.
I read in multiple articles, such as https://www.mdpi.com/2072-4292/13/4/573,
that if the camera is tilted about one axis, the following correction is required:
where θ is the tilt angle and φ is, as the article puts it:
φ describes the angular position of the pixel in the image: it is
zero in correspondence of the optical axis of the camera, while it can
have positive or negative values for the other pixels
and the figure in the article is this:
I hope we are on the same page. Now I have two questions:
1- How exactly do I calculate the angular position of a given pixel with respect to the optical axis (i.e. how do I calculate φ)?
2- In my case the camera is rotated about two axes, not just one as in their example: the camera doesn't look straight along the road but is oriented toward one of its sides, more like this one:
Would the formula change further? I am not sure how to derive the right formula geometrically.
The angular position of a pixel
As explained in the article you linked, you can compute the pixel angle if you know the camera intrinsic parameters. First, a bit of theory: the intrinsic matrix is used to compute the projection of a world point onto the image plane of the camera. The OpenCV documentation explains it very well; it is expressed like this:
$$ s \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} $$
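where f_x, f_y are your focal lengths (in pixels), c_x, c_y is the optical centre, x, y is the position of the pixel in your image, X, Y, Z is your world point (expressed in the camera's coordinate frame) in meters, millimetres or whatever unit you use, and s is a scale factor. The 3x3 matrix is the intrinsic matrix, called A below.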
Now by inverting the matrix you can instead compute the world vector from the pixel position. World vector and not world point because the distance d between the camera and the real object is unknown.
$$ \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = d \, A^{-1} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} $$
Then you can compute the angle between the optical axis and this world vector to get your φ angle, for example with the formula detailed in this answer, using the y-axis of the camera as the normal (the normal should be the camera axis about which the tilt rotates). In pseudo-code:
intrinsic_inv = invert(intrinsic)
world_vector = multiply(intrinsic_inv, (x, y, 1))
optical_axis = (0, 0, 1)
normal = (0, 1, 0)
dot = dot_product(world_vector, optical_axis)
det = dot_product(normal, cross_product(world_vector, optical_axis))
phi = atan2(det, dot)
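The pseudo-code above can be sketched in Python as follows, assuming a zero-skew intrinsic matrix so that A^-1 can be written out directly; `pixel_angle` is a hypothetical name. With the default normal (0, 1, 0), as in the pseudo-code, the angle is measured in the camera's x-z plane and varies with the pixel column; passing normal=(1, 0, 0) measures it in the y-z plane, where it varies with the row. Which one matches the article's φ depends on the axis your camera is tilted about.

```python
import math


def pixel_angle(x, y, fx, fy, cx, cy, normal=(0.0, 1.0, 0.0)):
    """Signed angle (radians) between the optical axis and the ray of pixel (x, y).

    The ray is A^-1 (x, y, 1) for A = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    The angle is atan2(normal . (v x axis), v . axis), i.e. it is measured
    in the plane perpendicular to `normal`.
    """
    v = ((x - cx) / fx, (y - cy) / fy, 1.0)   # world vector, up to the unknown depth
    axis = (0.0, 0.0, 1.0)                      # optical axis in camera coordinates
    dot = sum(vi * ai for vi, ai in zip(v, axis))
    cross = (v[1] * axis[2] - v[2] * axis[1],
             v[2] * axis[0] - v[0] * axis[2],
             v[0] * axis[1] - v[1] * axis[0])
    det = sum(ni * ci for ni, ci in zip(normal, cross))
    return math.atan2(det, dot)


if __name__ == "__main__":
    # Pixel one focal length to the right of the principal point: 45 degrees.
    print(math.degrees(pixel_angle(1320, 240, 1000, 1000, 320, 240)))
```

The sign convention follows the atan2 formula above; flip the normal if you want the opposite sign.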
The camera angles
You can express the rotation of the camera by three angles: the tilt, the pan, and the roll angles. Take a look at this image I quickly googled if you want to visualize what they correspond to.
The tilt angle is the one named θ in your article, and you already know it. The pan angle has no impact on the GSD, at least if we assume that the ground is perfectly flat. If the pan angle is what you meant by the second rotation axis, then there is nothing more to do.
However, if you have a non-zero roll angle, this becomes tricky. In that case I would recommend a change of approach that avoids dealing with angles: express the camera pose as a rigid transformation (a rotation matrix and a translation vector). This turns the problem into a general analytic-geometry problem, where you estimate depths and scales by intersecting each pixel's viewing ray with the ground plane. The previous pseudo-code then becomes something like:
intrinsic_inv = invert(intrinsic)
ray_direction = multiply(intrinsic_inv, (x, y, 1))
ray_direction = multiply(rotation, ray_direction)   # camera frame -> world frame (a direction, so no translation)
ray_origin = translation                            # camera centre in world coordinates
world_point = intersection(ray_origin, ray_direction, ground_plane)
Then the scale can be computed from the distances between the world points of adjacent pixels.
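A minimal sketch of this ray-plane approach, under assumptions not stated in the answer: a zero-skew pinhole camera without lens distortion, a world frame with Z pointing up and the ground plane at Z = 0, R rotating camera-frame directions into the world frame, and C the camera centre in world coordinates. `tilted_rotation` is a hypothetical helper for a camera tilted away from nadir about its x axis. For a nadir camera the result reduces to the usual GSD = altitude / focal length (in pixels).

```python
import math


def mat_vec(R, v):
    return tuple(sum(R[i][k] * v[k] for k in range(3)) for i in range(3))


def ground_point(u, v, K, R, C):
    """World point where the ray through pixel (u, v) hits the ground plane Z = 0.

    K = (fx, fy, cx, cy); R rotates camera-frame directions into the world frame;
    C is the camera centre in world coordinates (world Z pointing up).
    """
    fx, fy, cx, cy = K
    ray_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)   # A^-1 (u, v, 1)
    ray = mat_vec(R, ray_cam)                        # direction only: no translation
    if abs(ray[2]) < 1e-12:
        raise ValueError("ray is parallel to the ground plane")
    s = -C[2] / ray[2]                               # solve C_z + s * ray_z = 0
    if s <= 0:
        raise ValueError("the ground is behind the camera for this pixel")
    return tuple(C[i] + s * ray[i] for i in range(3))


def local_gsd(u, v, K, R, C):
    """Metres per pixel along the image x and y directions at pixel (u, v)."""
    p = ground_point(u, v, K, R, C)
    return (math.dist(p, ground_point(u + 1, v, K, R, C)),
            math.dist(p, ground_point(u, v + 1, K, R, C)))


def tilted_rotation(theta):
    """Camera looking down, tilted by theta (radians) away from nadir about its x axis.

    Columns are the camera x, y, z axes in world coordinates (x right, y down, z forward).
    """
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0],
            [0.0, -c, s],
            [0.0, -s, -c]]


if __name__ == "__main__":
    K = (1000.0, 1000.0, 320.0, 240.0)
    C = (0.0, 0.0, 100.0)
    print(local_gsd(320, 240, K, tilted_rotation(0.0), C))               # nadir
    print(local_gsd(320, 240, K, tilted_rotation(math.radians(30)), C))  # tilted
```

With f = 1000 px at 100 m the nadir case gives 0.1 m/px in both directions. At the image centre, a tilt θ raises this to about 0.1/cos θ across the tilt direction and 0.1/cos²θ along it. A roll or pan only changes R, so the same code covers the two-axis case.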

pytorch affine_grid: what is the theta input?

When trying to use torch.nn.functional.affine_grid, I see it requires a theta affine matrix of size (N x 3 x 4) according to the documentation. I thought a general affine matrix is (N x 4 x 4). What is the expected affine matrix format in PyTorch?
An example of a 3D rotation as affine input would be ideal. I appreciate your help.
The dimensions you mention apply to 3D inputs, that is, when you want to apply 3D geometric transforms to an input tensor x of shape (N, C, D, H, W).
A transformation to points in 3D (represented as 4-vector in homogeneous coordinates as (x, y, z, 1)) should be, in the general case, a 4x4 matrix as you noted.
However, since an affine transformation must map every point whose fourth coordinate is 1 to a point whose fourth coordinate is again 1, the 4th row of the matrix is always (0, 0, 0, 1) (see this).
Therefore, there's no need to specify this last row explicitly.
To conclude, a 3D transformation composed of a 3x3 rotation R and a 3D translation t is simply the 3x4 matrix:
theta = [R t]
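As an example of a 3D rotation input, here is a small sketch that builds theta = [R t] for a rotation about the z axis, using plain Python lists so it runs without PyTorch; the helper names are hypothetical. `apply_theta` shows that multiplying by the homogeneous point (x, y, z, 1) is exactly what the omitted fourth row would have preserved.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def affine_theta(R, t):
    """Stack a 3x3 matrix R and a translation t into the 3x4 matrix [R | t]."""
    return [list(R[i]) + [float(t[i])] for i in range(3)]


def apply_theta(theta, p):
    """Apply [R | t] to p, i.e. multiply by the homogeneous point (x, y, z, 1)."""
    ph = [float(p[0]), float(p[1]), float(p[2]), 1.0]
    return [sum(theta[i][k] * ph[k] for k in range(4)) for i in range(3)]


if __name__ == "__main__":
    theta = affine_theta(rotation_z(math.radians(30)), [0.1, 0.0, 0.0])
    for row in theta:
        print(row)
```

In PyTorch, `torch.tensor([theta], dtype=torch.float32)` has shape (1, 3, 4) and can be passed to `torch.nn.functional.affine_grid(theta, size=(1, C, D, H, W), align_corners=False)`, followed by `torch.nn.functional.grid_sample(x, grid, align_corners=False)`. Keep in mind that affine_grid works in normalised coordinates ([-1, 1] on each axis) and that theta maps output-grid locations to input sampling locations. To rotate the content by +angle you would therefore typically pass the inverse rotation; check the documentation for your version.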

Camera motion from corresponding images

I'm trying to calculate a new camera position from the motion between corresponding images.
The images conform to the pinhole camera model.
However, I don't get useful results, so I'll describe my procedure in the hope that somebody can help me.
I detect features in the corresponding images with SIFT, match them with OpenCV's FlannBasedMatcher, and calculate the fundamental matrix with OpenCV's findFundamentalMat (method RANSAC).
Then I calculate the essential matrix from the fundamental matrix using the camera intrinsic matrix (K):
Mat E = K.t() * F * K;
I decompose the essential matrix to rotation and translation with singular value decomposition:
SVD decomp = SVD(E);
Matx33d W(0,-1,0,
1,0,0,
0,0,1);
Matx33d Wt(0,1,0,
-1,0,0,
0,0,1);
R1 = decomp.u * Mat(W) * decomp.vt;
R2 = decomp.u * Mat(Wt) * decomp.vt;
t1 = decomp.u.col(2); //u3
t2 = -decomp.u.col(2); //u3
Then I try to find the correct solution among the four candidates by triangulation. (This part is taken from http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/, so I think it should work correctly.)
The new position is then calculated with:
new_pos = old_pos + -R.t()*t;
where new_pos & old_pos are vectors (3x1), R the rotation matrix (3x3) and t the translation vector (3x1).
Unfortunately I get no useful results, so maybe someone has an idea what could be wrong.
Here are some results (in case someone can confirm that any of them is definitely wrong):
F = [8.093827077399547e-07, 1.102681999632987e-06, -0.0007939604310854831;
1.29246107737264e-06, 1.492629957878578e-06, -0.001211264339006535;
-0.001052930954975217, -0.001278667878010564, 1]
K = [150, 0, 300;
0, 150, 400;
0, 0, 1]
E = [0.01821111092414898, 0.02481034499174221, -0.01651092283654529;
0.02908037424088439, 0.03358417405226801, -0.03397110489649674;
-0.04396975675562629, -0.05262169424538553, 0.04904210357279387]
t = [0.2970648246214448; 0.7352053067682792; 0.6092828956013705]
R = [0.2048034356172475, 0.4709818957303019, -0.858039396912323;
-0.8690270040802598, -0.3158728880490416, -0.3808101689488421;
-0.4503860776474556, 0.8236506374002566, 0.3446041331317597]
First of all, you should check whether
x'^T * F * x = 0
holds for your point correspondences x' and x (as homogeneous pixel coordinates). Of course, this should only (approximately) be the case for the inliers of the fundamental matrix estimation with RANSAC.
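A small sketch of this check, with hypothetical function names and plain lists for F and the homogeneous points (u, v, 1). The raw value x'^T F x depends on the scale of F, so the sketch also computes the pixel distance from x' to the epipolar line l' = F x. That distance is a standard measure not mentioned in the answer, and it is easier to threshold.

```python
import math


def epipolar_residual(F, x, x_prime):
    """Algebraic epipolar error x'^T F x for homogeneous points (u, v, 1)."""
    Fx = [sum(F[i][k] * x[k] for k in range(3)) for i in range(3)]
    return sum(x_prime[i] * Fx[i] for i in range(3))


def epipolar_distance(F, x, x_prime):
    """Distance in pixels from x' to the epipolar line l' = F x of x
    (x' must have its third coordinate equal to 1)."""
    a, b, c = (sum(F[i][k] * x[k] for k in range(3)) for i in range(3))
    return abs(a * x_prime[0] + b * x_prime[1] + c * x_prime[2]) / math.hypot(a, b)


if __name__ == "__main__":
    # Pure horizontal translation: epipolar lines are image rows.
    F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
    print(epipolar_residual(F, (5, 2, 1), (3, 2, 1)))   # same row
    print(epipolar_distance(F, (5, 2, 1), (3, 4, 1)))   # two rows off
```

Inliers should give distances of at most a pixel or two; large values for RANSAC inliers point to a problem in the matching or in F itself.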
Next, transform your point correspondences to normalized camera coordinates (NCC) like this:
xn = inv(K) * x
xn' = inv(K') * x'
where K' is the intrinsic camera matrix of the second image and x' are the points of the second image. I think in your case it is K = K'.
With these NCCs you can decompose your essential matrix as you described, triangulate the normalized camera coordinates, and check the depth of the triangulated points (they must lie in front of both cameras). Be careful, though: the literature says a single point is sufficient to pick the correct rotation and translation, but in my experience you should check a few points, since one point can be an outlier even after RANSAC.
Before you decompose the essential matrix, make sure that E = U*diag(1,1,0)*Vt, i.e. that its two non-zero singular values are equal and the third is zero; if they are not, replace E by U*diag(1,1,0)*Vt computed from its own SVD. This condition is required to get correct results for the four possible choices of the projection matrix.
Once you have the correct rotation and translation, you can triangulate all your point correspondences (the inliers of the fundamental matrix estimation with RANSAC). Then compute the reprojection error. First, compute the reprojected positions like this:
xp = K * P * X
xp' = K' * P' * X
where X is the computed (homogeneous) 3D position and P and P' are the 3x4 projection matrices. The projection matrix P of the first camera is normally [I | 0], i.e. identity rotation and zero translation. P' = [R, t] has the rotation matrix in its first 3 columns and the translation in its fourth column, so that P' is a 3x4 matrix. This only works if you convert your 3D position to homogeneous coordinates, i.e. 4x1 vectors instead of 3x1. Then xp and xp' are also homogeneous coordinates representing the reprojected 2D positions of your corresponding points.
I think
new_pos = old_pos + -R.t()*t;
is incorrect: firstly, you only translate old_pos without rotating it, and secondly, you translate it with the wrong vector. The correct way is given below.
After you have computed the reprojected points, you can calculate the reprojection error. Since you are working with homogeneous coordinates, you first have to normalize them (xp = xp / xp(2), i.e. divide by the last coordinate). The error is then given by
error = (x(0)-xp(0))^2 + (x(1)-xp(1))^2
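The reprojection and error computation can be sketched like this, with plain lists for K and P and hypothetical function names. The usage lines use the K from the question and P = [I | 0].

```python
def project(K, P, X):
    """Reproject a 3D point: xp = K * P * X, then divide by the last coordinate.

    K: 3x3 intrinsics, P: 3x4 [R | t], X: (X, Y, Z) or homogeneous (X, Y, Z, W).
    """
    Xh = list(X) + [1.0] if len(X) == 3 else list(X)
    cam = [sum(P[i][k] * Xh[k] for k in range(4)) for i in range(3)]
    xp = [sum(K[i][k] * cam[k] for k in range(3)) for i in range(3)]
    return xp[0] / xp[2], xp[1] / xp[2]


def reprojection_error(x, K, P, X):
    """Squared pixel distance between an observed point x = (u, v) and the reprojection of X."""
    u, v = project(K, P, X)
    return (x[0] - u) ** 2 + (x[1] - v) ** 2


if __name__ == "__main__":
    K = [[150, 0, 300], [0, 150, 400], [0, 0, 1]]
    P0 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]   # first camera: [I | 0]
    print(project(K, P0, (1, 0, 5)))
    print(reprojection_error((333, 404), K, P0, (1, 0, 5)))
```

For the second image you would pass P' = [R | t] (and K') instead of P0.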
If the error is large, e.g. around 10^2, your intrinsic camera calibration or your rotation/translation is incorrect (perhaps both). Depending on your coordinate system, you can try to invert your projection matrices. To do so, you first need to make them square, since you cannot invert a 3x4 matrix (except with the pseudo-inverse): add the fourth row [0 0 0 1], compute the inverse, and remove the fourth row again.
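When the left 3x3 block is a rotation matrix, the add-a-row, invert, drop-the-row procedure has a closed form: the inverse of [R | t] is [R^T | -R^T t]. A small sketch, assuming R is orthonormal (for a general 3x4 matrix you would do the full 4x4 inversion instead); the function names are hypothetical.

```python
def invert_pose(P):
    """Invert a rigid 3x4 pose [R | t] (R a rotation matrix).

    Equivalent to appending the row [0 0 0 1], inverting the 4x4 matrix and
    dropping that row again; for a rotation the inverse is [R^T | -R^T t].
    """
    Rt = [[P[j][i] for j in range(3)] for i in range(3)]   # R transposed
    t = [P[i][3] for i in range(3)]
    new_t = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [new_t[i]] for i in range(3)]


def apply_pose(P, p):
    """Map a 3D point p through [R | t]."""
    return [sum(P[i][k] * p[k] for k in range(3)) + P[i][3] for i in range(3)]


if __name__ == "__main__":
    P = [[0, -1, 0, 1], [1, 0, 0, 2], [0, 0, 1, 3]]   # 90 degrees about z, then translate
    q = apply_pose(P, [4, 5, 6])
    print(q, apply_pose(invert_pose(P), q))            # second result is the original point
```

Applying a pose and then its inverse returns the original point, which makes a quick sanity check for your own matrices.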
One more note on the reprojection error: in general, it is the squared distance between your original point correspondence (in each image) and the reprojected position. You can take the square root to get the Euclidean distance between the two points.
To update your camera pose, update the translation first, then the rotation matrix:
t_ref += lambda * (R_ref * t);
R_ref = R * R_ref;
where t_ref and R_ref are your camera state, R and t are new calculated camera rotation and translation, and lambda is the scale factor.
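The update rule above can be sketched as follows, using plain 3x3 lists and hypothetical helper names. It follows the answer's convention exactly. The translation step uses the old R_ref, which is why the order matters. lam has to come from outside the essential matrix, because t from the decomposition is only known up to scale (it has unit length).

```python
def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def update_pose(R_ref, t_ref, R, t, lam):
    """One step of the update above: translation first (using the old R_ref), then rotation.

    R, t: relative motion from the essential-matrix decomposition (t has unit length);
    lam: scale factor for t, which the essential matrix cannot provide.
    """
    step = mat_vec(R_ref, t)                   # t_ref += lambda * (R_ref * t)
    t_new = [t_ref[i] + lam * step[i] for i in range(3)]
    R_new = mat_mul(R, R_ref)                  # R_ref = R * R_ref
    return R_new, t_new


if __name__ == "__main__":
    R_ref = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]    # current state: 90 degrees about z
    I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(update_pose(R_ref, [0, 0, 0], I, [1, 0, 0], 2.0))
```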

How to transform an image based on the position of camera

I'm trying to create a perspective projection of an image based on the look direction. I'm inexperienced in this field and can't manage to do it myself. Could you help me, please?
There is an image and an observer (camera). If the camera can be considered an object on an invisible sphere and the image a plane going through the middle of the sphere, then the camera position can be expressed as:
x = d cos(θ) cos(φ)
y = d sin(θ)
z = d sin(φ) cos(θ)
Where θ is latitude, φ is longitude and d is the distance (radius) from the middle of the sphere where the middle of the image is.
I found these formulae somewhere, but I'm not sure about the coordinates (it looks to me as if x should be z, but I guess it depends on the coordinate system).
Now, what I need to do is transform my image properly so it looks as if viewed from the camera (in the correct perspective). Could you tell me in a few words how this could be done? What steps should I take?
I'm developing an iOS app and I thought I could use the following method from QuartzCore, but I have no idea what angle I should pass to it and how to derive the new x, y, z coordinates from the camera position.
CATransform3D CATransform3DRotate (CATransform3D t, CGFloat angle,
CGFloat x, CGFloat y, CGFloat z)
So far I have successfully created a simple viewing perspective by:
using an identity matrix (as the CATransform3D parameter) with .m34 set to -1/1000,
rotating my image by the angle φ about the (0, 1, 0) vector,
concatenating the result with a rotation by θ about the (1, 0, 0) vector,
ignoring scaling based on d (I scale the image based on other criteria).
But the result was not what I wanted (which, in hindsight, was to be expected) :-/. The perspective looks realistic only as long as one of the two angles is close to 0. Therefore I thought there might be a way to calculate a proper angle and the x, y and z coordinates that achieve the correct transformation (though that is just my guess and might be wrong).
I think I managed to find a solution, but unfortunately it is based only on my own calculations, thoughts and experiments, so I have no idea whether it is correct. It seems to be OK, but you know...
So if the coordinate system is like this:
and the plane of the image to be transformed is the XY plane, with its centre at the origin of the system, then the following coordinates:
x = d sin(φ) cos(θ)
y = d sin(θ)
z = d cos(θ) cos(φ)
define a vector that starts at the origin of the coordinate system and points to the position of the camera observing the image. d can be set to 1, which gives a unit vector without further normalization. Theta is the angle in the ZY plane and phi is the angle in the ZX plane. Theta rises from 0° to 90° from the Z+ to the Y+ axis, whereas phi rises from 0° to 90° from the Z+ to the X+ axis (and to -90° in the opposite direction, in both cases).
Hence the transformation vector is:
x1 = -y / z
y1 = -x / z
z1 = 0.
I'm not sure about z1 = 0; however, a rotation around the Z axis seemed wrong to me.
The last thing to calculate is the angle by which the image has to be transformed. In my humble opinion this should be the angle between the vector that points to the camera (x, y, z) and the vector normal to the image, which is the Z axis (0, 0, 1).
For unit vectors, the dot product equals the cosine of the angle between them, so the angle is:
α = arccos(x * 0 + y * 0 + z * 1) = arccos(z).
Therefore the alpha angle and the x1, y1, z1 coordinates are the parameters of CATransform3DRotate method I mentioned in my question.
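As a hedged cross-check, using standard vector geometry rather than anything the answer states: the rotation that turns the unit normal n = (0, 0, 1) onto a unit vector v has axis n × v and angle arccos(n · v). For n = (0, 0, 1) that axis is (-y, x, 0). Its z component is always 0, which supports z1 = 0 and α = arccos(z). Its second component differs in sign from the -x / z above (dividing by z > 0 only rescales the axis). Whether Core Animation needs exactly this rotation, its inverse, or a flipped y axis depends on its coordinate conventions, which this sketch does not model, so test it against a rendered result. The function name is hypothetical.

```python
import math


def view_rotation(theta, phi):
    """Camera direction from latitude theta and longitude phi (the formulas
    above with d = 1), plus the axis-angle rotation turning the image
    normal n = (0, 0, 1) onto that direction."""
    x = math.sin(phi) * math.cos(theta)
    y = math.sin(theta)
    z = math.cos(theta) * math.cos(phi)
    # axis = n x v = (0*z - 1*y, 1*x - 0*z, 0*y - 0*x) = (-y, x, 0); zero when v == n
    axis = (-y, x, 0.0)
    angle = math.acos(max(-1.0, min(1.0, z)))   # n . v = z for unit vectors
    return (x, y, z), axis, angle


if __name__ == "__main__":
    v, axis, angle = view_rotation(math.radians(20), math.radians(35))
    print(v, axis, math.degrees(angle))
```

When both angles are 0 the axis is the zero vector and the angle is 0, so you would skip the rotation in that case rather than pass a zero axis.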
I would be grateful if somebody could tell me if this approach is correct. Thanks a lot!
