Given a set of points on an image, I want to detect groups of aligned points as shown in the figure:
How can I do this? Any help will be appreciated.
This is a good potential application of the Hough Transform. The Hough space for lines is (r, \theta) where r is the distance from origin to closest point on line and \theta is its orientation.
Each point in x-y space becomes a sinusoid in Hough space as shown in the Wiki article.
The places where all the sinusoids intersect correspond to a single line that passes through all the points. If the points are not perfectly collinear, the intersection will be "fuzzy".
The simplest algorithm to fit lines to points is to make a rectangular (r, \theta) accumulator array set to zero initially. Then trace a sinusoid for each point into this discrete (r, \theta) space, incrementing each accumulator element by a fixed amount. Find prospective line fits by looking for large array elements. The element coordinates give (r, \theta) for the fit.
Tracing the sinusoid is straightforward. If you have T accumulator bins on the \theta axis, then each corresponds to an angle k\pi/T for some 0 <= k < T. For each k in this range, calculate the distance from the origin to the closest point of a line with this orientation passing through the point; this gives an r value. If there are R bins on the r axis and rMax is the maximum value of r, then increment bin (floor(r/rMax*R), k).
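Here is a minimal NumPy sketch of that accumulator scheme (the function name, bin counts and the use of a signed r over [-r_max, r_max] are my own choices, not from the description above):

    import numpy as np

    def hough_accumulate(points, T=180, R=200):
        """Vote each point into a discrete (r, theta) accumulator.
        points : iterable of (x, y) pairs
        T      : number of theta bins covering [0, pi)
        R      : number of r bins covering [-r_max, r_max] (signed r variant)
        """
        pts = np.asarray(points, dtype=float)
        thetas = np.arange(T) * np.pi / T                        # one angle per theta bin
        r_max = max(np.hypot(pts[:, 0], pts[:, 1]).max(), 1e-9)  # largest possible |r|
        acc = np.zeros((R, T), dtype=int)
        for x, y in pts:
            # signed distance from the origin to the line through (x, y) with normal angle theta
            r = x * np.cos(thetas) + y * np.sin(thetas)
            r_bins = ((r + r_max) / (2 * r_max) * (R - 1)).astype(int)
            acc[r_bins, np.arange(T)] += 1                       # trace the point's sinusoid
        return acc, thetas, r_max

    # large entries of `acc` are candidate lines passing through many of the points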
As a start, you can try this:
List all lines that can be formed by selecting any two of these points (n(n-1)/2 lines for n points).
For any two of these lines, check whether they are aligned (i.e. their slopes differ by less than, say, 10 degrees).
For each aligned pair of lines, you can easily check whether other points are also aligned on these lines. These points will be the aligned points you need; a rough sketch follows below.
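A Python sketch of this brute-force idea (simplified so that each candidate line from step 1 directly collects the points lying near it, instead of comparing pairs of lines; the tolerances are arbitrary values of my own):

    import numpy as np
    from itertools import combinations

    def aligned_groups(points, dist_tol=2.0, min_points=4):
        """For every line defined by a pair of points, collect all points lying within
        dist_tol of that line; keep groups containing at least min_points points."""
        pts = np.asarray(points, dtype=float)
        groups = set()
        for i, j in combinations(range(len(pts)), 2):
            p, q = pts[i], pts[j]
            d = q - p
            norm = np.linalg.norm(d)
            if norm == 0:
                continue                                   # skip duplicate points
            nvec = np.array([-d[1], d[0]]) / norm          # unit normal of the line p-q
            dist = np.abs((pts - p) @ nvec)                # point-to-line distances
            members = np.flatnonzero(dist < dist_tol)
            if len(members) >= min_points:
                groups.add(frozenset(members.tolist()))    # deduplicate identical groups
        return groups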
I am trying to find corners of a square, potentially rotated shape, to determine the direction of its primary axes (horizontal and vertical) and be able to do a perspective transform (straighten it out).
From a prior processing stage I obtain the coordinates of a point (red dot in image) belonging to the shape. Next I do a flood-fill of the shape on a thresholded version of the image to determine its center (not shown) and area, by summing up X and Y of all filled pixels and dividing them by the area (number of pixels filled).
Given this information, what is an easy and reliable way to determine the corners of the shape (blue arrows)?
I was thinking about keeping track of P1, P2, P3, P4 where P1 is (minX, minY), P2 is (minX, maxY), P3 is (maxX, minY) and P4 is (maxX, maxY), so P1 is the point with the smallest value of X encountered, and of all those points, the one where Y is smallest too. Then sort them to get a clockwise ordering. But I'm not sure whether this is correct in all cases and efficient.
PS: I can't use OpenCV.
Looking at your image, the directions of the 2 axes of the 2D pattern coordinate system can be estimated from a histogram of gradient directions.
When creating such a histogram, 4 peaks will be clearly visible.
If the image is captured from the front (no perspective, which looks like your case), the angles between adjacent peaks are ideally all 90 degrees.
The directions of the 2 axes of the pattern coordinate system can then be estimated directly from those peaks.
After that, the 4 corners can simply be estimated from an axis-aligned bounding box (aligned with the estimated axes, of course).
If not (when the image has perspective), the 4 peaks indicate which edge lines lie along the axes of the pattern coordinates.
So, for example, you can estimate each corner location as the intersection of 2 lines along the edges.
What I eventually ended up doing is the following:
Trace the edges of the contour using Moore-Neighbour Tracing --> this gives me a sequence of points lying on the border of the rectangle.
During the trace, I observe changes in rectangular distance between the first and last points in a sliding window. The idea is inspired by the paper "The outline corner filter" by C. A. Malcolm (https://spie.org/Publications/Proceedings/Paper/10.1117/12.939248?SSO=1).
This gives me accurate results with low computational overhead and little memory.
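For illustration only, a rough sketch of the sliding-window idea; it compares the straight-line chord with the traced path length inside the window (rather than the rectangular distance), and the window size and ratio are hypothetical values, not taken from the paper:

    import numpy as np

    def corner_candidates(contour, window=9, ratio=0.85):
        """Flag indices of a closed, traced contour where the straight chord across a
        sliding window is noticeably shorter than the traced path inside it,
        i.e. where the boundary turns sharply."""
        pts = np.asarray(contour, dtype=float)
        n = len(pts)
        step = np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1)  # per-step arc length
        corners = []
        for i in range(n):
            j = (i + window) % n
            chord = np.linalg.norm(pts[j] - pts[i])             # straight-line distance
            path = step[np.arange(i, i + window) % n].sum()     # traced distance in the window
            if chord < ratio * path:                            # the path bends inside the window
                corners.append((i + window // 2) % n)           # corner near the window centre
        return corners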
In camera imaging, there are several terms for point coordinates.
World coordinates: [X, Y, Z] in physical units.
Image coordinates: [u, v] in pixels.
Do these coordinates become homogeneous coordinates by appending with a 1?
Sometimes in books and papers it is represented by [x, y, w]. When is w used, and when is 1 used?
In the function initUndistortRectifyMap (http://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html#void%20initUndistortRectifyMap(InputArray%20cameraMatrix,%20InputArray%20distCoeffs,%20InputArray%20R,%20InputArray%20newCameraMatrix,%20Size%20size,%20int%20m1type,%20OutputArray%20map1,%20OutputArray%20map2)),
the following process is applied:
Is there a term for coordinates of the form [x y 1]?
I don't understand why R can be applied to [x y 1]. In my view, R is a transformation in 3D. Is [x y 1] a 2D point or a 3D point?
[u v]->[x y]->[x y 1]->[X Y W]->[x' y']
The coordinates are processed according to the above chain. What is the principle behind it?
In 2-D perspective geometry, there are two main sets of coordinates; Cartesian coordinates (x,y) and homogeneous coordinates which are represented by a triple (x,y,z). This triple can be confusing---it's not a point in three dimensions like the Cartesian (x,y,z). Because of this, some authors use a different notation for homogeneous points, like [x,y,z] or (x:y:z), and this notation makes more sense for reasons we'll get into later.
The third coordinate exists for one purpose only, and that is to add some points to the domain, namely, points at infinity. For the double (x,y), there is no way to represent infinity, at least not with numbers and in ways that we can manipulate easily. But this is a problem for computer graphics since parallel lines are of course very prevalent, and in projective geometry parallel lines meet at a point at infinity. And parallel lines are important as the transformations used in computer graphics are line preserving. When we distort points with a homography or affine transformation, we move pixels in a way that maps lines to other lines. If those lines happen to be parallel, as they would remain under a Euclidean or affine transformation, the coordinate system we use needs to be able to represent that.
So we use homogeneous coordinates (x,y,z) for the sole purpose of including those points at infinity, which are represented by the triple (x,y,0). And since we can put a zero in this place for every Cartesian pair, it's like we have a point at infinity in every single direction (where the direction is given by the angle to that point).
But then, since we have the third value, which can be also any other number other than zero, what are all these additional points? What is the difference between (x,y,2) and (x,y,3) and so on? If the points (x,y,2) and (x,y,3) aren't points at infinity, they better be equal to some other Cartesian points. And luckily, there's a really simple way to map all these homogeneous triples to Cartesian pairs in a way that's nice: simply divide by the third coordinate. Then (x,y,3) gets mapped back into the Cartesian (x/3, y/3), and mapping (x,y,0) to Cartesian is undefined---which is perfect since that point at infinity doesn't exist in Cartesian coordinates.
Because of this scaling factor, homogeneous coordinates can be represented in an infinite number of ways. You can map the Cartesian point (x,y) to (x,y,1) in homogeneous coordinates, but you can also map (x,y) to (2x, 2y, 2). Note that if we divide by the third coordinate to go back to Cartesian coordinates, we end up with the same starting point. And that is true in general when you multiply by any non-zero scalar. So the idea is that Cartesian coordinates are represented uniquely by a single pair of values, whereas homogeneous coordinates can be represented in an infinite number of ways. This is why some authors use [x,y,z] or (x:y:z). The square bracket is often used in mathematics to define an equivalence relation, and for homogeneous coordinates, [x,y,z]~[sx,sy,sz] for non-zero s. And similarly, : is usually used as a ratio, so the ratio of the three values will be equivalent with any scalar s multiplying them. So whenever you want to transform from homogeneous coordinates to Cartesian, simply divide by the last number as it acts like a scaling factor, and then just pull off the (x,y) values. See my answer here for example.
So the simple way to move into homogeneous coordinates is to append a 1, but really, you could append a 1 and then multiply by any scalar; you wouldn't change anything. You could map (x,y) to (5x,5y,5), apply your transformation (sx',sy',s) = H * (5x,5y,5), and then obtain your Cartesian points as (sx',sy')/s = (x',y') all the same.
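A small NumPy sketch of that round trip (the matrix H here is an arbitrary made-up homography, just to show that scaling the homogeneous representative changes nothing):

    import numpy as np

    # A made-up 3x3 homography, only for illustration; any non-singular matrix behaves the same.
    H = np.array([[1.0,   0.2, 5.0],
                  [0.1,   1.0, 3.0],
                  [0.001, 0.0, 1.0]])

    def to_homogeneous(p):
        """(x, y) -> (x, y, 1); multiplying by any non-zero scalar gives the same point."""
        return np.array([p[0], p[1], 1.0])

    def to_cartesian(q):
        """(x, y, w) -> (x/w, y/w); undefined (a point at infinity) when w == 0."""
        return q[:2] / q[2]

    p = np.array([10.0, 20.0])
    q1 = H @ to_homogeneous(p)            # transform the usual representative (x, y, 1)
    q2 = H @ (5 * to_homogeneous(p))      # transform a scaled representative (5x, 5y, 5)
    print(to_cartesian(q1), to_cartesian(q2))   # both give the same Cartesian point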
Given the edge points of an object, let us say Obj = (xi, yi); i = 1, 2, 3, ....
How can we know if these edge points represent an ellipse or not?
As long as you have at least 5 points (the minimum needed to determine a general conic), you could try a least-squares fit:
See here:https://math.stackexchange.com/a/153150/104118
See section 7, "Fitting an Ellipse to 2D Points", in this document: http://www.geometrictools.com/Documentation/LeastSquaresFitting.pdf
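For illustration, a simplified algebraic least-squares fit of a general conic in NumPy (not the exact constrained method from the linked document); the residual threshold is a made-up, scale-dependent value, and normalising the points first would make it more robust:

    import numpy as np

    def fit_conic(points):
        """Least-squares fit of a general conic a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0.
        Needs at least 5 points; returns the unit-norm vector (a, b, c, d, e, f)."""
        pts = np.asarray(points, dtype=float)
        x, y = pts[:, 0], pts[:, 1]
        D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
        _, _, vt = np.linalg.svd(D)        # the last right singular vector minimises ||D p||
        return vt[-1]

    def looks_like_ellipse(points, residual_tol=1e-2):
        a, b, c, d, e, f = fit_conic(points)
        if b * b - 4 * a * c >= 0:         # an ellipse requires discriminant b^2 - 4ac < 0
            return False
        pts = np.asarray(points, dtype=float)
        x, y = pts[:, 0], pts[:, 1]
        residuals = a * x * x + b * x * y + c * y * y + d * x + e * y + f
        return np.abs(residuals).mean() < residual_tol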
Off the top of my head, I would calculate the axis with minimal variance (call it a) and the axis with maximal variance (call it b).
I would check that those axes are reasonably close to being perpendicular - if not, then it's probably not an ellipse. If they are close to being perpendicular, I would rotate the point cloud so that a and b are aligned with the x- and y-axes.
The next step would be to translate the point cloud so its center is at (0,0), and then check that each translated point lies close to the perimeter of an ellipse with axes a and b by plugging each point into the equation of the ellipse and checking that the value is close to 0.
This is all based on me reading "edge points" as just looking at the points used by edges. If the edges themselves are to be involved, you would have to check that the edges go "around the clock" as well.
Well I know this was loose... hope it made sense somehow :-).
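A rough NumPy sketch of this variance-axes idea; note that PCA eigenvectors are orthogonal by construction, so the perpendicularity check becomes implicit here, and the semi-axes are simply estimated from the extent of the rotated points:

    import numpy as np

    def ellipse_residuals(points):
        """Centre the points, rotate them onto their principal axes, and return how far
        each point is from the ellipse whose semi-axes match the rotated extent."""
        pts = np.asarray(points, dtype=float)
        centred = pts - pts.mean(axis=0)                      # centre at (0, 0)
        # eigenvectors of the covariance give the min/max-variance axes (always orthogonal)
        _, evecs = np.linalg.eigh(np.cov(centred.T))
        rotated = centred @ evecs                             # align those axes with x and y
        a = np.abs(rotated[:, 0]).max()                       # semi-axis estimates from extent
        b = np.abs(rotated[:, 1]).max()
        return rotated[:, 0]**2 / a**2 + rotated[:, 1]**2 / b**2 - 1.0

    # e.g. np.abs(ellipse_residuals(points)).mean() < 0.1 would suggest an ellipse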
Now I have a set of contour points. I have a ray L which starts at Pn and makes an angle of ALPHA clockwise from the horizontal axis. I want to calculate the length of the segment that starts at Pn and ends at the point where ray L intersects the contour, which in this case is a point between Pn-2 and Pn-3. How can I calculate this length efficiently?
No algorithm can solve this in faster than linear time, since the number of intersections may be linear, and so is the size of the output. I can suggest the following algorithm, which is quite convenient and efficient to implement:
Transfer the points to a coordinate system x', y' whose origin is Pn and whose x' axis is parallel to L. (In practice only the y' coordinate needs to be calculated; this requires 2 multiplications and 2 additions per point.)
Now find all the intersecting segments by searching for adjacent indices where the y' coordinate changes sign.
Calculate the intersection and length only for these segments; a sketch is given below.
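A Python/NumPy sketch of these steps, assuming the closed contour is given as an (N, 2) array and that a clockwise angle alpha means the ray direction (cos alpha, -sin alpha) in standard axes:

    import numpy as np

    def ray_contour_length(contour, n, alpha):
        """Distance from contour point P_n along a ray at angle alpha (clockwise from the
        horizontal axis, i.e. direction (cos alpha, -sin alpha)) to the nearest crossing
        of the closed contour. Returns inf if no crossing is found."""
        pts = np.asarray(contour, dtype=float)
        pts = pts - pts[n]                           # step 1: move the origin to P_n
        c, s = np.cos(alpha), np.sin(alpha)
        xp = pts[:, 0] * c - pts[:, 1] * s           # rotate so the ray lies on the +x' axis
        yp = pts[:, 0] * s + pts[:, 1] * c
        best = np.inf
        N = len(pts)
        for i in range(N):
            j = (i + 1) % N
            if yp[i] * yp[j] < 0:                    # step 2: the segment crosses the x' axis
                t = yp[i] / (yp[i] - yp[j])          # step 3: interpolate the crossing point
                x_cross = xp[i] + t * (xp[j] - xp[i])
                if x_cross > 0:                      # keep crossings in front of P_n, not behind
                    best = min(best, x_cross)
        return best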
You could just compute the intersection of ray L with all line segments consisting of any pair of neighbouring contour points.
Of course you might want to optimize this process by sorting by distance to Pn or whatever. Depending on the contour (concave shape?) there could be multiple intersections, so you have to choose the right one (inner, outer, ...).
Instead of computing the intersection, you could also draw the contour and the ray (e.g. using OpenCV) and find the point of intersection using a logical AND.
I'm trying to implement rectangle detection using the Hough transform, based on
this paper.
I programmed it using Matlab, but after detecting the pairs of parallel lines and the orthogonal pairs, I must detect the intersections of these pairs. My question is about these line intersections computed from Hough space.
I found the intersection points by solving four systems of equations. Do these intersection points lie in Cartesian or polar coordinate space?
For those of you wondering about the paper, it's:
Rectangle Detection based on a Windowed Hough Transform by Cláudio Rosito Jung and Rodrigo Schramm.
Now, according to the paper the intersection points are expressed in polar coordinates; obviously your implementation may be different (the only way to tell is to see your code).
Assuming you are being consistent with the paper's notation, your peaks should be expressed as (rho, theta) pairs.
You must then perform the peak pairing given by equation (3) in section 4.3 of the paper,
where one threshold is the angular threshold corresponding to parallel lines,
and the other is the normalized threshold corresponding to lines of similar length.
The accuracy of the Hough space should be dependent on two main factors.
The first is the resolution of the accumulator. The accumulator maps onto Hough space, and looping through the accumulator array requires dividing Hough space into a discrete grid.
The second factor in the accuracy of linear Hough space is the location of the origin in the original image. Look for a moment at what happens if you do a sweep of \theta for any given change in \rho. Near the origin, one of these sweeps will cover far fewer pixels than a sweep out near the edges of the image. The consequence is that near the edges of the image you need a much higher \rho-\theta resolution in your accumulator to achieve the same level of accuracy when transforming back to Cartesian.
The problem with increasing the resolution, of course, is that you will need more computational power and memory. Also, if you uniformly increase the accumulator resolution, you waste resolution near the origin where it is not needed.
Some ideas to help with this.
- Place the origin right at the center of the image, as opposed to using the natural bottom-left or top-left of the image in code.
- Try using the closest image you can get to a square. The more elongated an image is for a given area, the more pronounced the resolution trap becomes at the edges.
- Try dividing your image into 4/9/16 etc. different accumulators, each with an origin in the center of that sub-image. It will require a little overhead to link the results of each accumulator together for rectangle detection, but it should help spread the resolution more evenly.
- The ultimate solution would be to increase the resolution linearly depending on the distance from the origin. This can be achieved using the circle equation (x-a)^2 + (y-b)^2 = \rho^2, where x, y are the current pixel, a, b are your chosen origin, and \rho is the radius. Once the radius is known, adjust your accumulator resolution accordingly. You will have to keep track of the center of each \rho-\theta bin for transforming back to Cartesian.
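As a small illustration of the first suggestion, here is how the (rho, theta) computation changes when the origin is shifted to the image centre (a NumPy sketch; the function and parameter names are mine):

    import numpy as np

    def rho_theta_centred(x, y, image_shape, n_theta=180):
        """(rho, theta) values for pixel (x, y) with the Hough origin at the image centre
        instead of the top-left corner."""
        h, w = image_shape
        xc, yc = x - w / 2.0, y - h / 2.0                 # shift the origin to the centre
        thetas = np.arange(n_theta) * np.pi / n_theta
        rhos = xc * np.cos(thetas) + yc * np.sin(thetas)  # |rho| <= half the image diagonal
        return rhos, thetas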
The link to the referenced paper does not work, but if you used the standard Hough transform then the four intersection points will be expressed in Cartesian coordinates. In fact, the four lines detected with the Hough transform will be expressed using the "normal parametrization":
rho = x cos(theta) + y sin(theta)
so you will have four pairs (rho_i, theta_i) that identify your four lines. After checking for orthogonality (for example just by comparing the angles theta_i), you solve four systems of equations, each of the form:
rho_j = x cos(theta_j) + y sin(theta_j)
rho_k = x cos(theta_k) + y sin(theta_k)
where x and y are the unknowns that represent the Cartesian coordinates of the intersection point.
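A minimal NumPy sketch of solving one such 2x2 system (the function name is mine):

    import numpy as np

    def intersect(rho_j, theta_j, rho_k, theta_k):
        """Solve
            rho_j = x*cos(theta_j) + y*sin(theta_j)
            rho_k = x*cos(theta_k) + y*sin(theta_k)
        for the Cartesian intersection point (x, y); raises if the lines are parallel."""
        A = np.array([[np.cos(theta_j), np.sin(theta_j)],
                      [np.cos(theta_k), np.sin(theta_k)]])
        b = np.array([rho_j, rho_k])
        return np.linalg.solve(A, b)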
I am not a mathematician. I am willing to stand corrected...
From Hough 2) ... any line on the xy plane can be described as p = x cos theta + y sin theta. In this representation, p is the normal distance and theta is the normal angle of a straight line, ... In practical applications, the angles theta and distances p are quantized, and we obtain an array C(p, theta).
From the CRC Standard Math Tables, Analytic Geometry, "Polar Coordinates in a Plane" section ...
Such an ordered pair of numbers (r, theta) are called polar coordinates of the point p.
Straight lines: let p = distance of line from O, w = counterclockwise angle from OX to the perpendicular through O to the line. Normal form: r cos(theta - w) = p.
From this I conclude that the points lie in polar coordinate space.