Explanation of rho and theta parameters in HoughLines - opencv

Can you give me a quick definition of rho and theta parameters in OpenCV's HoughLines function
void cv::HoughLines ( InputArray image,
OutputArray lines,
double rho,
double theta,
int threshold,
double srn = 0,
double stn = 0,
double min_theta = 0,
double max_theta = CV_PI
)
The only thing I found in the doc is:
rho: Distance resolution of the accumulator in pixels.
theta: Angle resolution of the accumulator in radians.
Do this mean that if I set rho=2 then 1/2 of my image's pixels will be ignored ... a kind of stride=2 ?

I have searched for this for hours and still haven't found a place where it is neatly explained. But picking up the pieces, I think I got it.
The algorithm goes over every edge pixel (result of Canny, for example) and calculates ρ using the equation ρ = x * cosθ + y * sinθ, for many values of θ.
The actual step of θ is defined by the function parameter, so if you use the usual math.pi / 180.0 value of theta, the algorithm will compute ρ 180 times in total for just one edge pixel in the image. If you would use a larger theta, there would be fewer calculations, fewer accumulator columns/buckets and therefore fewer lines found.
The other parameter ρ defines how "fat" a row of the accumulator is. With a value of 1, you are saying that you want the number of accumulator rows to be equal to the biggest ρ possible, which is the diagonal of the image you're processing. So if for some two values of θ you get close values for ρ, they will still go into separate accumulator buckets because you are going for precision. For a larger value of the parameter rho, those two values might end up in the same bucket, which will ultimately give you more lines because more buckets will have a large vote count and therefore exceed the threshold.
Some helpful resources:
http://docs.opencv.org/3.1.0/d6/d10/tutorial_py_houghlines.html
https://www.mathworks.com/help/vision/ref/houghtransform.html
https://www.youtube.com/watch?v=2oGYGXJfjzw

To detect lines with Hough Transform, the best way is to represents lines with an equation of two parameters rho and theta as shown on this image. The equation is the following :
x cos⁡(θ)+y sin⁡(θ)=ρ
where (x,y) are line parameters.
This writing in (θ,ρ) parameters allow the detection to be less position-depending than a writing as y=a*x+b
(θ,ρ) in this context give the discretization for these two parameters

Related

Hough Transform Accumulator to Cartesian

I'm studying a course on vision systems and one of the questions posed was;
For the accumulator shown;
Determine the most likely r,θ combination representing the straight line of the greatest strength in the original image.
From my understanding of the accumulator this would be r = 60, θ = 150 as the 41 votes is the highest number of votes in this cluster of large votes. Am I correct with this combination?
And hence calculate the equation of this line in the form y = mx + c
I'm not sure of the conversion steps required to convert the r = 60, θ = 150 to y = mx + c with the information given since r = 60, θ = 150 denotes 1 point on the line.
State the resolution of your answer and give your reasoning
I assume the resolution is got to do with some of the steps in the auscultation and not the actual resolution of the original image since that's irrelevant to the edges detected in the image.
Any guidance on the above 3 points would be greatly appreciated!
Yes, this is correct.
This is asking you what the slope and intercept are of the line given r and theta. r and theta are not one point on the line, they are one point of the accumulator. r and theta describe a line using the line equation in polar coordinates: . This is the cool thing about the hough transform, every line in one space, (i.e. image space) can be described by a point in another space (r, theta). This could be done with m and b from the line equation , but as we all know, m is undefined for vertical lines. This is the reason the polar line equation is used. It is important to note that the line described by the HT r and theta refers to a line from the origin extending to the actual line in the image. This means your image line y = mx + b equation will need to be orthogonal to the polar equation. The wiki article on the HT describes this well and shows examples. I would recommend drawing a diagram of your r and theta extending to a line like this:
Then use trig to get two points on the red line. Two points are enough to give you m and b from the line equation.
I'm not entirely sure what "resolution" refers to in this context. But it does seem like your line estimator will have some precision loss since r is every 20 mm and theta is every 15 degrees. Perhaps it is asking what degree of error you could get given an accumulator of this resolution.

Feature Scaling with Octave

I want to do feature scaling datasets by using means and standard deviations, and my code is below; but apparently it is not a univerisal code, since it seems only work with one dataset. Thus I am wondering what is wrong with my code, any help will be appreciated! Thanks!
X is the dataset I am currently using.
mu = mean(X);
sigma = std(X);
m = size(X, 1);
mu_matrix = ones(m, 1) * mu;
sigma_matrix = ones(m, 1) * sigma;
featureNormalize = (X-mu_matrix)/sigma;
Thank you for clarifying what you think the code should be doing in the comments.
My answer will effectively answer why what you think is happening is not what is happening.
First let's talk about the mean and std functions. When their input is a vector (whether this is vertically or horizontally aligned), then this will return a single number which is the mean or standard deviation of that vector respectively, as you might expect.
However, when the input is a matrix, then you need to know what it does differently. Unless you specify the direction (dimension) in which you should be calculating means / std, then it will calculate means along the rows, i.e. returning a single number for each column. Therefore, the end-result of this operation will be a horizontal vector.
Therefore, both mu and sigma will be horizontal vectors in your code.
Now let's move on to the 'matrix multiplication' operator (i.e. *).
When using the matrix multiplication operator, if you multiply a horizontal vector with a vertical vector (i.e. the usual matrix multiplication operation), your output is a single number (i.e. a scalar). However, if you reverse the orientations, as in, you multiply a vertical vector by a horizontal one, you will in fact be calculating a 'Kronecker product' instead. Since the output of the * operation is completely defined by the rows of the first input, and the columns of the second input, whether you're getting a matrix multiplication or a kronecker product is implicit and entirely dependent on the orientation of your inputs.
Therefore, in your case, the line mu_matrix = ones(m, 1) * mu; is not in fact appending a vector of ones, like you say. It is in fact performing the kronecker product between a vertical vector of ones, and the horizontal vector that is your mu, effectively creating an m-by-n matrix with mu repeated vertically for m rows.
Therefore, at the end of this operation, as the variable naming would suggest, mu_matrix is in fact a matrix (same with sigma_matrix), having the same size as X.
Your final step is X- mu_sigma, which gives you at each element, the difference between x and mu at that element. Then you "divide" with the sigma matrix.
Here is why I asked if you were sure you should be using ./ instead of /.
/ is the matrix division operator. With / You are effectively performing matrix multiplication by an inverse matrix, since D / S is mathematically equivalent to D * inv(S). It seems to me you should be using ./ instead, to simply divide each element by the standard deviation of that column (which is why you had to repeat the horizontal vector over m rows in sigma_matrix, so that you could use it for 'elementwise division'), since what you are trying to do is to normalise each row (i.e. observation) of a particular column, by the standard deviation that is specific to that column (i.e. feature).

OpenCV HoughLinesP threshold parameter

C++: void HoughLinesP(InputArray image, OutputArray lines, double rho, double theta, int threshold, double minLineLength=0, double maxLineGap=0 )
I have difficulties understanding the parameter in below. Could some explain it like one would for dummies type thing?.
threshold – Accumulator threshold parameter. Only those lines are returned that get enough votes ( >\texttt{threshold} ).
Seems you have not understood the Hough Algorithm. I hope a intro or short description on hough line detection algo below will surely help you.
Ref: Wiki,Tutorial.
Consider a edge (sobel or canny's output) image.
For each edge point (Xi,Yi) in the image calculate: Pi = XicosT+YiSinT
Increment the accumulator at A(Pi,T) = A(Pi,T) + 1.
where, Pi - Distance and T (theta) - Angle. Theta ranges from 0-360.
A(Pi,T) is referred as hough space. By finding the accumulator bins with the highest values, typically by looking for local maxima in the accumulator space, the most likely lines can be extracted. Usually finding highest values are done by a parameter threshold.
Try changing the threshold values, you will find significant change in the line detection.

Why is weight vector orthogonal to decision plane in neural networks

I am beginner in neural networks. I am learning about perceptrons.
My question is Why is weight vector perpendicular to decision boundary(Hyperplane)?
I referred many books but all are mentioning that weight vector is perpendicular to decision boundary but none are saying why?
Can anyone give me an explanation or reference to a book?
The weights are simply the coefficients that define a separating plane. For the moment, forget about neurons and just consider the geometric definition of a plane in N dimensions:
w1*x1 + w2*x2 + ... + wN*xN - w0 = 0
You can also think of this as being a dot product:
w*x - w0 = 0
where w and x are both length-N vectors. This equation holds for all points on the plane. Recall that we can multiply the above equation by a constant and it still holds so we can define the constants such that the vector w has unit length. Now, take out a piece of paper and draw your x-y axes (x1 and x2 in the above equations). Next, draw a line (a plane in 2D) somewhere near the origin. w0 is simply the perpendicular distance from the origin to the plane and w is the unit vector that points from the origin along that perpendicular. If you now draw a vector from the origin to any point on the plane, the dot product of that vector with the unit vector w will always be equal to w0 so the equation above holds, right? This is simply the geometric definition of a plane: a unit vector defining the perpendicular to the plane (w) and the distance (w0) from the origin to the plane.
Now our neuron is simply representing the same plane as described above but we just describe the variables a little differently. We'll call the components of x our "inputs", the components of w our "weights", and we'll call the distance w0 a bias. That's all there is to it.
Getting a little beyond your actual question, we don't really care about points on the plane. We really want to know which side of the plane a point falls on. While w*x - w0 is exactly zero on the plane, it will have positive values for points on one side of the plane and negative values for points on the other side. That's where the neuron's activation function comes in but that's beyond your actual question.
Intuitively, in a binary problem the weight vector points in the direction of the '1'-class, while the '0'-class is found when pointing away from the weight vector. The decision boundary should thus be drawn perpendicular to the weight vector.
See the image for a simplified example: You have a neural network with only 1 input which thus has 1 weight. If the weight is -1 (the blue vector), then all negative inputs will become positive, so the whole negative spectrum will be assigned to the '1'-class, while the positive spectrum will be the '0'-class. The decision boundary in a 2-axis plane is thus a vertical line through the origin (the red line). Simply said it is the line perpendicular to the weight vector.
Lets go through this example with a few values. The output of the perceptron is class 1 if the sum of all inputs * weights is larger than 0 (the default threshold), otherwise if the output is smaller than the threshold of 0 then the class is 0. Your input has value 1. The weight applied to this single input is -1, so 1 * -1 = -1 which is less than 0. The input is thus assigned class 0 (NOTE: class 0 and class 1 could have just been called class A or class B, don't confuse them with the input and weight values). Conversely, if the input is -1, then input * weight is -1 * -1 = 1, which is larger than 0, so the input is assigned to class 1. If you try every input value then you will see that all the negative values in this example have an output larger than 0, so all of them belong to class 1. All positive values will have an output of smaller than 0 and therefore will be classified as class 0. Draw the line which separates all positive and negative input values (the red line) and you will see that this line is perpendicular to the weight vector.
Also note that the weight vector is only used to modify the inputs to fit the wanted output. What would happen without a weight vector? An input of 1, would result in an output of 1, which is larger than the threshold of 0, so the class is '1'.
The second image on this page shows a perceptron with 2 inputs and a bias. The first input has the same weight as my example, while the second input has a weight of 1. The corresponding weight vector together with the decision boundary are thus changed as seen in the image. Also the decision boundary has been translated to the right due to an added bias of 1.
Here is a viewpoint from a more fundamental linear algebra/calculus standpoint:
The general equation of a plane is Ax + By + Cz = D (can be extended for higher dimensions). The normal vector can be extracted from this equation: [A B C]; it is the vector orthogonal to every other vector that lies on the plane.
Now if we have a weight vector [w1 w2 w3], then when do w^T * x >= 0 (to get positive classification) and w^T * x < 0 (to get negative classification). WLOG, we can also do w^T * x >= d. Now, do you see where I am going with this?
The weight vector is the same as the normal vector from the first section. And as we know, this normal vector (and a point) define a plane: which is exactly the decision boundary. Hence, because the normal vector is orthogonal to the plane, then so too is the weight vector orthogonal to the decision boundary.
Start with the simplest form, ax + by = 0, weight vector is [a, b], feature vector is [x, y]
Then y = (-a/b)x is the decision boundary with slope -a/b
The weight vector has slope b/a
If you multiply those two slopes together, result is -1
This proves decision boundary is perpendicular to weight vector
Although the question was asked 2 years ago, I think many students will have the same doubts. I reached this answer because I asked the same question.
Now, just think of X, Y (a Cartesian coordinate system is a coordinate system that specifies each point uniquely in a plane by a pair of numerical coordinates, which are the signed distances from the point to two fixed perpendicular directed lines [from Wikipedia]).
If Y = 3X, in geometry Y is perpendicular to X, then let w = 3, then Y = wX, w = Y/X and if we want to draw the relation between X, w we will have two perpendicular lines just like when we draw the relation between X, Y. So always think of the w-coefficient as perpendicular to X and Y.

How to smooth a cyclic column vector

This is an OpenCV2 question.
I have a matrix representing a closed space curve.
cv::Mat_<Point3f> points;
I want to smooth it (using, for example a Gaussian kernel).
I have tried using:
cv::Mat_<Point3f> result;
cv::GaussianBlur(points, result, cv::Size(4 * sigma, 1), sigma, sigma, cv::BORDER_WRAP);
But I get the error:
Assertion failed (columnBorderType != BORDER_WRAP)
What is the best way to convolve a cyclic vector in OpenCV? ("Best" should take into account space and time requirements.)
I found a way. I repeat the matrix, then blur, then extract a range.
GaussianBlur(repeat(points, 3, 1), ret, cv::Size(0,0), sigma);
int rows = points.rows;
result = Mat(result, Range(rows, 2 * rows - 1), Range::all());
This requires extra work (and extra space?).
Edit: I now manually expand points by copying (wrapping) as many points are required by the kernel. I then crop off the extra points. This is similar to the above, but wastes less space and time.

Resources