Hough Transform Equation - image-processing

I was wondering why the Hough Transform uses rho = x cos(theta) + y sin(theta) to represent a straight line (y = mx + b). I tried to work through this (and went to the Wikipedia article about it), but cannot find a way to derive one from the other.
Does anyone know how to derive one from the other?
Thank you in advance.

Derivation:
The equation x/a + y/b = 1:
defines a line
has x-intercept = a
has y-intercept = b
From trigonometry, recall how a ray rotated by angle t projects onto the x- and y-axes according to (angle = t, radius = 1) -> (x = cos(t), y = sin(t)).*
Draw the tangent line at the labelled point on the unit circle. Trigonometry (or even geometry with similar triangles) tells us that this tangent line intersects the axes at x = 1/cos(t) and y = 1/sin(t). Thus the line a distance 1 away from the origin has a = 1/cos(t) and b = 1/sin(t), and is therefore described by x/(1/cos(t)) + y/(1/sin(t)) = 1...
... which is just cos(t) x + sin(t) y = rho where rho=1
You can see that rho corresponds to how far the line is from the origin (either by playing around with the equation, or by noting that multiplication here just scales all values by the same amount, effectively rescaling the grid).
*see http://en.wikipedia.org/wiki/File:Unit_circle.svg for credit
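For completeness, here is the same intercept argument written out for a line at an arbitrary distance rho from the origin rather than 1 (my addition, making the scaling step in the previous paragraph explicit): the intercepts become a = rho/cos(t) and b = rho/sin(t), so

\[
\frac{x}{\rho/\cos t} + \frac{y}{\rho/\sin t} = 1
\quad\Longleftrightarrow\quad
x\cos t + y\sin t = \rho .
\]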

That's just a transform from a linear coordinate system to a rotational one. The reason for this is outlined in the Wikipedia article:
In the Hough transform, a main idea is to consider the characteristics of the straight line not as image points (x1, y1), (x2, y2), etc., but instead, in terms of its parameters, i.e., the slope parameter m and the intercept parameter b. Based on that fact, the straight line y = mx + b can be represented as a point (b, m) in the parameter space. However, one faces the problem that vertical lines give rise to unbounded values of the parameters m and b. For computational reasons, it is therefore better to use a different pair of parameters, denoted r and θ (theta), for the lines in the Hough transform.
And to transform between the two, use the equation y = -(cos(theta)/sin(theta)) x + r/sin(theta). Thus m = -(cos(theta)/sin(theta)) and b = r/sin(theta). These break down when sin(theta) = 0, i.e. theta = 0 (or pi), which is exactly the vertical-line case; that is why the rotational coordinate system is preferred (it has no problem with lines of infinite slope).
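As a quick illustration of that conversion (a minimal sketch with my own function name, not from the original answer), with the vertical-line case made explicit:

import math

def polar_line_to_slope_intercept(rho, theta):
    # Convert a line given as rho = x*cos(theta) + y*sin(theta) to y = m*x + b.
    # Returns None for vertical lines, where m and b are undefined.
    if abs(math.sin(theta)) < 1e-12:
        return None  # vertical line x = rho / cos(theta)
    m = -math.cos(theta) / math.sin(theta)
    b = rho / math.sin(theta)
    return m, b

print(polar_line_to_slope_intercept(1.0, math.pi / 2))  # horizontal line y = 1 -> (~0.0, 1.0)
print(polar_line_to_slope_intercept(1.0, 0.0))          # vertical line x = 1 -> None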

The polar coordinate system is a 2-D coordinate system with a reference point (like the origin) called the pole and a ray from the pole called the polar axis. Each point in the polar coordinate system is represented as (rho, theta), where rho is the distance between the pole (origin) and the point, and theta is the angle between the polar axis and the line joining the pole and the point. See here.
The Polar coordinate (rho, theta) can be converted to Cartesian coordinate (x,y) using the following trigonometric equations.
x = rho cos theta ----(1)
y = rho sin theta ----(2)
Refer here for more info.
How do we get the above equations?
The equations come from right-angled-triangle trigonometry.
See the picture here.
cos theta = adjacent-side/hypotenuse = x/rho, thus we get (1)
sin theta = opposite-side/hypotenuse = y/rho, thus we get (2)
and the Pythagorean theorem says
hypotenuse^2 = adjacent-side^2 + opposite-side^2, so
rho^2 = x^2 + y^2 ----(3)
Now let's derive the relationship between Cartesian coordinate (x,y) and Polar coordinate (rho,theta)
rho^2 = x^2 + y^2 ---- from (3)
rho^2 = x*x + y*y
rho^2 = x(rho cos theta) + y (rho sin theta) ---- from (1) and (2)
rho^2 = rho(x cos theta + y sin theta)
rho = x cos theta + y sin theta
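Note that for a line, rho and theta are then held fixed: they are the polar coordinates of the foot of the perpendicular from the origin to the line, and every point (x, y) on the line satisfies rho = x cos theta + y sin theta. A minimal sketch of equations (1)-(3) in Python (helper names are my own):

import math

def polar_to_cartesian(rho, theta):
    return rho * math.cos(theta), rho * math.sin(theta)   # equations (1) and (2)

def cartesian_to_polar(x, y):
    return math.hypot(x, y), math.atan2(y, x)             # equation (3) plus the angle

x, y = polar_to_cartesian(5.0, math.radians(30))
print(cartesian_to_polar(x, y))  # ~(5.0, 0.5235...), round-trips back to (rho, theta)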

Each (rho, theta) pair relates to an (x, y) pair on a given line: travelling a distance rho from the origin at angle theta lands you on an (x, y) coordinate of the line.
The (rho, theta) equation is used instead of y = mx + b so that a sequence of (rho, theta) values can be put into the equation without the computational problems that arise with the y = mx + b form when the slope is undefined (that is, when the line is vertical).
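To make that concrete, here is a minimal Hough voting loop (an illustrative sketch of my own, not from the answers above; the array sizes and quantization are my choices) that uses rho = x cos(theta) + y sin(theta) directly, so vertical edges need no special case:

import numpy as np

def hough_lines(edges, n_theta=180):
    # edges: 2-D array of 0/1 edge pixels. Returns (accumulator, thetas, rhos).
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(-90, 90, 180 / n_theta))
    rhos = np.arange(-diag, diag + 1)                      # rho can be negative
    acc = np.zeros((len(rhos), len(thetas)), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        for j, t in enumerate(thetas):
            rho = x * np.cos(t) + y * np.sin(t)
            acc[int(round(rho)) + diag, j] += 1            # vote
    return acc, thetas, rhos

Peaks in acc then correspond to the (rho, theta) pairs of the detected lines.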

Related

How to generate a probability distribution on an image

I have a question as follows:
Suppose I have an image (size = 360x640, rows by columns), and a center coordinate, say (20, 100). What I want is to generate a probability distribution that has its highest value at that center (20, 100), lower values in the neighbourhood, and much lower values farther from the center.
All I can figure out is to use a multivariate Gaussian (since the data is 2-D) with the mean set to the center (20, 100). But is that correct, and how do I design the covariance matrix?
Thanks!!
You could do it in 2D by generating a radial and an angular coordinate.
Along the lines of (a runnable Python version of the pseudocode):

import math, random

def sample_point(cx=20, cy=100, scale=1.0):
    # Box-Muller: radius and angle from two independent U(0,1) draws
    r = math.sqrt(-2.0 * math.log(1.0 - random.random()))
    a = 2.0 * math.pi * random.random()
    x = scale * r * math.cos(a)
    y = scale * r * math.sin(a)
    return (x + cx, y + cy)

where scale is a parameter that turns the unitless Gaussian into units applicable to your problem, and random.random() plays the role of U(0,1), a uniform random value in [0, 1).
Reference: Box-Muller sampling.
If you want a generic 2D Gaussian, meaning an ellipse in 2D, then you'll have to use different scales for X and Y and rotate the (x, y) vector by a predefined angle using the well-known rotation matrix.
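A possible sketch of that anisotropic, rotated version (parameter names are mine; the scales sx, sy and the angle phi are choices you would make for your problem):

import math, random

def sample_point_aniso(cx, cy, sx, sy, phi):
    # Box-Muller for a unit circular Gaussian
    r = math.sqrt(-2.0 * math.log(1.0 - random.random()))
    a = 2.0 * math.pi * random.random()
    x, y = sx * r * math.cos(a), sy * r * math.sin(a)      # different scales per axis
    # rotate by phi with the standard 2-D rotation matrix
    xr = x * math.cos(phi) - y * math.sin(phi)
    yr = x * math.sin(phi) + y * math.cos(phi)
    return xr + cx, yr + cy

print(sample_point_aniso(20, 100, 3.0, 10.0, math.radians(30)))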

How to find a line from polar coordinates (Hough Transform Confusion)

I recently started a CV course and am going through old homework (the current assignments aren't released). I've implemented a Hough lines function: I loop through each point, and if it's an edge pixel I loop through theta values from 0 to 180 (or -90 to 90), calculate rho, and finally store the vote in an array.
When I try to convert back from polar coordinates, I can find an (x, y) pair (using rho * cos(theta) and rho * sin(theta)), however I don't understand how to convert that into a line in Cartesian space. To have a line you need either two points, or a point and a direction (assuming a ray then, of course).
I just understand where the point is.
I've done some searching but can't seem to quite find the answer; folks tend to say "polar tells you x, then bam, you have a line in Cartesian", but I seem to be missing the connection where the "bam" happens.
What I mean is described here:
Explain Hough Transformation
and also Vector/line from polar coordinates,
where it's asked how to draw a line from polar coordinates. The responses boil down to "well, here's x and y", but to me they never mention the rest of the solution.
Is the line somehow related to y = mx+b where m is theta and b is rho?
If not how do I convert back to a line in cartesian space.
EDIT:
After reviewing Sunreef's answer, and trying to rearrange so that y was on its own side, I discovered this answer as well:
How to convert coordinates back to image (x,y) from hough transformation (rho, theta)?
It appears what I think I'm looking for is this
m = -cotθ
c = p*cosecθ
EDIT#2
I found some other examples on the net. Basically, yes, I'll need rho * cos(theta) and rho * sin(theta).
The other part that was messing me up was that I needed to convert to radians; once I did that, I started getting good results.
You are right that you can get a base point on the line as
(X0, Y0) = (rho * cos(theta), rho * sin(theta))
and you can find the (unit) direction vector of this line as the perpendicular to the normal:
(dx, dy) = (-sin(theta), cos(theta))
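For drawing, a minimal sketch (my own helper, not from the answer) that combines the base point and the direction into two endpoints; length just needs to exceed the image diagonal:

import math

def polar_to_segment(rho, theta, length=1000):
    x0, y0 = rho * math.cos(theta), rho * math.sin(theta)    # base point on the line
    dx, dy = -math.sin(theta), math.cos(theta)               # unit direction along the line
    p1 = (int(round(x0 + length * dx)), int(round(y0 + length * dy)))
    p2 = (int(round(x0 - length * dx)), int(round(y0 - length * dy)))
    return p1, p2   # e.g. pass p1, p2 to cv2.line(img, p1, p2, 255, 1)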
Taken from Wikipedia:
The non-radial line that crosses the radial line ϕ = ɣ perpendicularly at the point (r0, ɣ) has the equation: r(ϕ) = r0 * sec(ϕ - ɣ).
If I suppose that the coordinates you have for your line are ɣ and r0, then this says r(ϕ) * cos(ϕ - ɣ) = r0, and expanding the cosine of the difference gives:
r(ϕ) * cos(ϕ) * cos(ɣ) + r(ϕ) * sin(ϕ) * sin(ɣ) - r0 = 0
And we know that when translating polar to cartesian coordinates, if we have a point P(r, ϕ) in the polar plane, then its coordinates in the cartesian plane will be:
x = r * cos(ϕ)
y = r * sin(ϕ)
So the equation above becomes a line equation as follows:
x * cos(ɣ) + y * sin(ɣ) - r0 = 0
This is the equation of your line in cartesian coordinates.
(Tell me if you see some mistakes, I did that quickly)
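As a quick numeric sanity check of that derivation (a throwaway sketch; the values of ɣ and r0 are arbitrary): sample ϕ, compute r(ϕ) = r0 * sec(ϕ - ɣ), convert to Cartesian, and confirm x * cos(ɣ) + y * sin(ɣ) - r0 is (numerically) zero.

import math

gamma, r0 = math.radians(30), 2.0
for phi in (0.1, 0.4, 0.9):                      # keep phi - gamma away from pi/2
    r = r0 / math.cos(phi - gamma)               # r(phi) = r0 * sec(phi - gamma)
    x, y = r * math.cos(phi), r * math.sin(phi)
    print(x * math.cos(gamma) + y * math.sin(gamma) - r0)   # ~0 up to floating point error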

Cartesian to spherical coordinate conversion: the specific case when Φ is zero and θ is indeterminate (phase unwrapping)

The following is the conversion from spherical to Cartesian coordinates:
X = r cosθ sinΦ
Y = r sinθ sinΦ
Z = r cosΦ
We use the reverse computation to get the spherical coordinates from the Cartesian coordinates, defined as
r = √(X^2 + Y^2 + Z^2)
θ = atan(Y./X)
Φ = atan(√(X^2 + Y^2)./Z)
The problem arises when Y and X are both zero: θ can then take any arbitrary value, so in Matlab the computation results in NaN (not a number), which makes θ discontinuous. Is there any interpolation technique to remove this discontinuity, and how should θ be interpreted in this case?
θ is a matrix evaluated at various points, and the result has jumps and black patches that represent the discontinuity, whereas I need to generate an image with smooth variation. Please see the obtained θ and the correct θ variation by clicking on the links and suggest some changes.
Discontinuous_Theta_variation
Correct Theta variation
The formulas written here for converting from the Cartesian to the spherical coordinate system are correct, but you first need to understand their physical significance.
r is the distance of the point from the origin. θ is the angle from the positive x-axis to the line made by projecting the given point onto the XY plane. And Φ is the angle from the positive z-axis to the line joining the origin and the given point.
http://www.learningaboutelectronics.com/Articles/Cartesian-rectangular-to-spherical-coordinate-converter-calculator.php#answer
So, for a point whose X and Y coordinates are 0, the point lies on the z-axis, and hence its projection onto the XY plane is the origin itself. So we cannot exactly determine its angle from the X axis. But please note that, since the point lies on the Z axis, Φ = 0 or π (depending on whether Z is positive or negative).
So while coding this, you may adopt the approach of first checking Φ: if it is 0 or π, then set θ = 0 (by default).
I hope this serves the purpose.
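If it helps, a NumPy version of that approach (function and array names are my own; note that np.arctan2(0, 0) returns 0, so the X = Y = 0 case yields θ = 0 by convention instead of NaN):

import numpy as np

def cart2sph(X, Y, Z):
    r = np.sqrt(X**2 + Y**2 + Z**2)
    theta = np.arctan2(Y, X)                   # well defined everywhere; 0 when X == Y == 0
    phi = np.arctan2(np.sqrt(X**2 + Y**2), Z)  # 0 or pi on the z-axis, as described above
    return r, theta, phi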

Camera motion from corresponding images

I'm trying to calculate a new camera position based on the motion of corresponding images.
The images conform to the pinhole camera model.
As a matter of fact, I don't get useful results, so I'll describe my procedure and hope that somebody can help me.
I detect and describe features with SIFT, match them with OpenCV's FlannBasedMatcher, and calculate the fundamental matrix with OpenCV's findFundamentalMat (method RANSAC).
Then I calculate the essential matrix using the camera intrinsic matrix (K):
Mat E = K.t() * F * K;
I decompose the essential matrix to rotation and translation with singular value decomposition:
SVD decomp = SVD(E);
Matx33d W(0,-1,0,
1,0,0,
0,0,1);
Matx33d Wt(0,1,0,
-1,0,0,
0,0,1);
R1 = decomp.u * Mat(W) * decomp.vt;
R2 = decomp.u * Mat(Wt) * decomp.vt;
t1 = decomp.u.col(2); //u3
t2 = -decomp.u.col(2); //u3
Then I try to find the correct solution by triangulation. (This part is from http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/ so I think it should work correctly.)
The new position is then calculated with:
new_pos = old_pos + -R.t()*t;
where new_pos & old_pos are vectors (3x1), R the rotation matrix (3x3) and t the translation vector (3x1).
Unfortunately I get no useful results, so maybe someone has an idea of what could be wrong.
Here are some results (just in case someone can confirm that any of them is definitely wrong):
F = [8.093827077399547e-07, 1.102681999632987e-06, -0.0007939604310854831;
1.29246107737264e-06, 1.492629957878578e-06, -0.001211264339006535;
-0.001052930954975217, -0.001278667878010564, 1]
K = [150, 0, 300;
0, 150, 400;
0, 0, 1]
E = [0.01821111092414898, 0.02481034499174221, -0.01651092283654529;
0.02908037424088439, 0.03358417405226801, -0.03397110489649674;
-0.04396975675562629, -0.05262169424538553, 0.04904210357279387]
t = [0.2970648246214448; 0.7352053067682792; 0.6092828956013705]
R = [0.2048034356172475, 0.4709818957303019, -0.858039396912323;
-0.8690270040802598, -0.3158728880490416, -0.3808101689488421;
-0.4503860776474556, 0.8236506374002566, 0.3446041331317597]
First of all, you should check whether
x'^T * F * x = 0
holds for your point correspondences x' and x. Of course, this should only (approximately) be the case for the inliers of the fundamental matrix estimation with RANSAC.
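For example (a rough NumPy check of my own, not part of the original answer; x1 and x2 are assumed to be N x 3 arrays of homogeneous pixel coordinates of corresponding points):

import numpy as np

def epipolar_residuals(F, x1, x2):
    # |x2_i^T F x1_i| for each correspondence; inliers should be close to 0
    return np.abs(np.einsum('ij,jk,ik->i', x2, F, x1))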
Thereafter, you have to transform your point correspondences to normalized camera coordinates (NCC) like this
xn = inv(K) * x
xn' = inv(K') * x'
where K' is the intrinsic camera matrix of the second image and x' are the points of the second image. I think in your case it is K = K'.
With these NCCs you can decompose your essential matrix like you described. You triangulate the normalized camera coordinates and check the depth of your triangulated points. But be careful, in literature they say that one point is sufficient to get the correct rotation and translation. From my experience you should check a few points since one point can be an outlier even after RANSAC.
Before you decompose the essential matrix make sure that E=U*diag(1,1,0)*Vt. This condition is required to get correct results for the four possible choices of the projection matrix.
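A small NumPy sketch of that projection step (function name mine; E is only defined up to scale, so fixing the singular values to (1, 1, 0) is the usual convention):

import numpy as np

def enforce_essential(E):
    U, S, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt   # replace singular values with (1, 1, 0)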
When you've got the correct rotation and translation you can triangulate all your point correspondences (the inliers of the fundamental matrix estimation with RANSAC). Then, you should compute the reprojection error. Firstly, you compute the reprojected position like this
xp = K * P * X
xp' = K' * P' * X
where X is the computed (homogeneous) 3D position. P and P' are the 3x4 projection matrices. The projection matrix P of the first camera is normally the identity [I | 0]. P' = [R | t] is given by the rotation matrix in the first three rows and columns and the translation in the fourth column, so that P' is a 3x4 matrix. This only works if you express your 3D position in homogeneous coordinates, i.e. as a 4x1 vector instead of 3x1. Then xp and xp' are also homogeneous coordinates representing the (reprojected) 2D positions of your corresponding points.
I think that
new_pos = old_pos + -R.t()*t;
is incorrect: firstly, you only translate old_pos without rotating it, and secondly, you translate it with the wrong vector. The correct update is given below.
So, after you computed the reprojected points you can calculate the reprojection error. Since you are working with homogeneous coordinates you have to normalize them (xp = xp / xp(2), divide by last coordinate). This is given by
error = (x(0)-xp(0))^2 + (x(1)-xp(1))^2
If the error is large, such as 10^2, your intrinsic camera calibration or your rotation/translation is incorrect (perhaps both). Depending on your coordinate system you can try to invert your projection matrices. For that you first need to extend them to homogeneous form, since you cannot invert a 3x4 matrix (without the pseudo-inverse): add the fourth row [0 0 0 1], compute the inverse, and remove the fourth row again.
There is one more thing with reprojection error. In general, the reprojection error is the squared distance between your original point correspondence (in each image) and the reprojected position. You can take the square root to get the Euclidean distance between both points.
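Putting the reprojection-error computation into a compact NumPy sketch (the shapes are my assumptions: X is a 4xN array of homogeneous 3D points, x the measured pixel coordinates as Nx2):

import numpy as np

def reprojection_errors(K, P, X, x):
    # K: 3x3 intrinsics, P: 3x4 projection, X: 4xN homogeneous 3D points, x: Nx2 pixels
    xp = K @ P @ X                      # 3xN homogeneous reprojections
    xp = (xp[:2] / xp[2]).T             # divide by last coordinate -> Nx2
    return np.sum((x - xp) ** 2, axis=1)   # squared reprojection error per point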
To update your camera position, you have to update the translation first, then update the rotation matrix.
t_ref += lambda * (R_ref * t);
R_ref = R * R_ref;
where t_ref and R_ref are your camera state, R and t are new calculated camera rotation and translation, and lambda is the scale factor.

How to transform an image based on the position of camera

I'm trying to create a perspective projection of an image based on the look direction. I'm inexperienced in this field and can't manage to do it myself, however. Will you help me, please?
There is an image and an observer (camera). If the camera can be considered an object on an invisible sphere, and the image a plane going through the middle of the sphere, then the camera position can be expressed as:
x = d cos(θ) cos(φ)
y = d sin(θ)
z = d sin(φ) cos(θ)
Where θ is the latitude, φ is the longitude, and d is the distance (radius) from the middle of the sphere, which is where the middle of the image is.
I found these formulae somewhere, but I'm not sure about the coordinates (it looks to me as if x should be z, but I guess it depends on the coordinate system).
Now, what I need to do is make a proper transformation of my image so it looks as if viewed from the camera (in a proper perspective). Would you be so kind to tell me a few words how this could be done? What steps should I take?
I'm developing an iOS app and I thought I could use the following method from the QuartzCore. But I have no idea what angle I should pass to this method and how to derive the new x, y, z coordinates from the camera position.
CATransform3D CATransform3DRotate(CATransform3D t, CGFloat angle, CGFloat x, CGFloat y, CGFloat z)
So far I have successfully created a simple viewing perspective by:
using an identity matrix (as the CATransform3D parameter) with .m34 set to 1/-1000,
rotating my image by the angle of φ with the (0, 1, 0) vector,
concatenating the result with a rotation by θ and the (1, 0, 0) vector,
ignoring scaling based on d (I scale the image based on some other criteria).
But the result I got was not what I wanted (which was obvious) :-/. The perspective looks realistic as long as one of these two angles is close to 0. Therefore I thought there could be a way to calculate somehow a proper angle and the x, y and z coordinates to achieve a proper transformation (which might be wrong because it's just my guess).
I think I managed to find a solution, but unfortunately based on my own calculations, thoughts and experiments, so I have no idea if it is correct. Seems to be OK, but you know...
So if the coordinate system is like this:
and the plane of the image to be transformed goes through the X and Y axes, with its centre at the origin of the system, then the following coordinates:
x = d sin(φ) cos(θ)
y = d sin(θ)
z = d cos(θ) cos(φ)
define a vector that starts at the origin of the coordinate system and points to the position of the camera that is observing the image. d can be set to 1 so we get a unit vector at once, without further normalization. Theta is the angle in the ZY plane and phi is the angle in the ZX plane. Theta rises from 0° to 90° from the Z+ to the Y+ axis, whereas phi rises from 0° to 90° from the Z+ to the X+ axis (and to -90° in the opposite direction, in both cases).
Hence the transformation vector is:
x1 = -y / z
y1 = -x / z
z1 = 0.
I'm not sure about z1 = 0, however rotation around the Z axis seemed wrong to me.
The last thing to calculate is the angle by which the image has to be transformed. In my humble opinion this should be the angle between the vector that points to the camera (x, y, z) and the vector normal to the image, which is the Z axis (0, 0, 1).
The dot product of two vectors gives the cosine of the angle between them, so the angle is:
α = arccos(x * 0 + y * 0 + z * 1) = arccos(z).
Therefore the alpha angle and the x1, y1, z1 coordinates are the parameters of CATransform3DRotate method I mentioned in my question.
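For reference, here is a small sketch of exactly the arithmetic described above (it reproduces the formulas as stated rather than verifying them; angles are in radians, d = 1, and Python is used only for the arithmetic, since CATransform3DRotate itself is a QuartzCore call):

import math

def rotation_params(theta, phi):
    # camera direction for d = 1, as defined above
    x = math.sin(phi) * math.cos(theta)
    y = math.sin(theta)
    z = math.cos(theta) * math.cos(phi)
    axis = (-y / z, -x / z, 0.0)       # the (x1, y1, z1) proposed above; undefined when z == 0
    alpha = math.acos(z)               # angle between (x, y, z) and the image normal (0, 0, 1)
    return alpha, axis                 # arguments for CATransform3DRotate(t, alpha, x1, y1, z1)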
I would be grateful if somebody could tell me if this approach is correct. Thanks a lot!
