I detected lane lines in opencv and calculated their angles (which are shown by read lines in the image), although they look almost the same angle, the angle calculated by the program shows quiet a difference with left line always greater than right.
I am using arctan(slope) to find angles.
Is is due to the fact that y-axis in MAT matrix is inverted?
I am trying to detect the difference in the lane line angles to detect turns and straight road. How can I do achieve my goal? which I can not right now because lines do not have same(but opposite) angle on the straight road.
Below is the image.
Image
The difference of the two angles is not close to zero, because the lines are not parallel in 2D, simple as that. You are comparing angles of 2D lines in the image plane!
What you want to do is to check how close is the sum of the angles to zero, i.e. fabs(angle1 + angle2). You probably also want to check if fabs(angle1) and fabs(angle2) are within a specific range.
Furthermore, you shouldn't use slopes, as the slope of a vertical line is infinity. You probably have 2D direction vectors for each line at some point. Either use atan2(dy, dx) to compute the angle for each line, or you could stick with the direction vectors, in the latter case adding the normalized direction vectors and comparing their angle to the vector (0, 1), which is the vertical line.
Be aware that all this assumes that the camera points into the direction of the (straight) lane.
Related
Say I have the lines shown in the image below, represented in polar coordinate format (rho and theta). These lines are the output of OpenCV's HoughLines function after some post processing. (Sorry I'm not allowed to embed images yet.)
What I want to do is, given any one line, find all of the lines that are perpendicular to that line, as shown in the second image below.
I understand how to do this with Cartesian lines, but I'm having trouble wrapping my mind around what properties of rho and theta the two lines would have to have to be perpendicular, although I understand how polar lines work at least fundamentally. Sorry if this is elementary stuff, but I'm having trouble finding any explanation of this online anywhere. Do I need to first convert the lines to Cartesian coordinates, or is there some simpler way to do this? Any help would be much appreciated, thanks!
To get perpendicular lines in polar coordinates, you simply take the theta for the first line, and find all lines whose theta = +/- 90° of the first theta.
You have to normalize the angles to be within 0°-360° or some other range, when comparing them.
So if line 1 has a theta line1.Theta
Then the angle to another line is a = (line2.Theta - line1.Theta)
and you want all lines where a is close to -90°, 90°, 270°, -270°, ...
depending on how you normalize your angles
let's say I am placing a small object on a flat floor inside a room.
First step: Take a picture of the room floor from a known, static position in the world coordinate system.
Second step: Detect the bottom edge of the object in the image and map the pixel coordinate to the object position in the world coordinate system.
Third step: By using a measuring tape measure the real distance to the object.
I could move the small object, repeat this three steps for every pixel coordinate and create a lookup table (key: pixel coordinate; value: distance). This procedure is accurate enough for my use case. I know that it is problematic if there are multiple objects (an object could cover an other object).
My question: Is there an easier way to create this lookup table? Accidentally changing the camera angle by a few degrees destroys the hard work. ;)
Maybe it is possible to execute the three steps for a few specific pixel coordinates or positions in the world coordinate system and perform some "calibration" to calculate the distances with the computed parameters?
If the floor is flat, its equation is that of a plane, let
a.x + b.y + c.z = 1
in the camera coordinates (the origin is the optical center of the camera, XY forms the focal plane and Z the viewing direction).
Then a ray from the camera center to a point on the image at pixel coordinates (u, v) is given by
(u, v, f).t
where f is the focal length.
The ray hits the plane when
(a.u + b.v + c.f) t = 1,
i.e. at the point
(u, v, f) / (a.u + b.v + c.f)
Finally, the distance from the camera to the point is
p = √(u² + v² + f²) / (a.u + b.v + c.f)
This is the function that you need to tabulate. Assuming that f is known, you can determine the unknown coefficients a, b, c by taking three non-aligned points, measuring the image coordinates (u, v) and the distances, and solving a 3x3 system of linear equations.
From the last equation, you can then estimate the distance for any point of the image.
The focal distance can be measured (in pixels) by looking at a target of known size, at a known distance. By proportionality, the ratio of the distance over the size is f over the length in the image.
Most vision libraries (including opencv) have built in functions that will take a couple points from a camera reference frame and the related points from a Cartesian plane and generate your warp matrix (affine transformation) for you. (some are fancy enough to include non-linearity mappings with enough input points, but that brings you back to your time to calibrate issue)
A final note: most vision libraries use some type of grid to calibrate off of ie a checkerboard patter. If you wrote your calibration to work off of such a sheet, then you would only need to measure distances to 1 target object as the transformations would be calculated by the sheet and the target would just provide the world offsets.
I believe what you are after is called a Projective Transformation. The link below should guide you through exactly what you need.
Demonstration of calculating a projective transformation with proper math typesetting on the Math SE.
Although you can solve this by hand and write that into your code... I strongly recommend using a matrix math library or even writing your own matrix math functions prior to resorting to hand calculating the equations as you will have to solve them symbolically to turn it into code and that will be very expansive and prone to miscalculation.
Here are just a few tips that may help you with clarification (applying it to your problem):
-Your A matrix (source) is built from the 4 xy points in your camera image (pixel locations).
-Your B matrix (destination) is built from your measurements in in the real world.
-For fast recalibration, I suggest marking points on the ground to be able to quickly place the cube at the 4 locations (and subsequently get the altered pixel locations in the camera) without having to remeasure.
-You will only have to do steps 1-5 (once) during calibration, after that whenever you want to know the position of something just get the coordinates in your image and run them through step 6 and step 7.
-You will want your calibration points to be as far away from eachother as possible (within reason, as at extreme distances in a vanishing point situation, you start rapidly losing pixel density and therefore source image accuracy). Make sure that no 3 points are colinear (simply put, make your 4 points approximately square at almost the full span of your camera fov in the real world)
ps I apologize for not writing this out here, but they have fancy math editing and it looks way cleaner!
Final steps to applying this method to this situation:
In order to perform this calibration, you will have to set a global home position (likely easiest to do this arbitrarily on the floor and measure your camera position relative to that point). From this position, you will need to measure your object's distance from this position in both x and y coordinates on the floor. Although a more tightly packed calibration set will give you more error, the easiest solution for this may simply be to have a dimension-ed sheet(I am thinking piece of printer paper or a large board or something). The reason that this will be easier is that it will have built in axes (ie the two sides will be orthogonal and you will just use the four corners of the object and used canned distances in your calibration). EX: for a piece of paper your points would be (0,0), (0,8.5), (11,8.5), (11,0)
So using those points and the pixels you get will create your transform matrix, but that still just gives you a global x,y position on axes that may be hard to measure on (they may be skew depending on how you measured/ calibrated). So you will need to calculate your camera offset:
object in real world coords (from steps above): x1, y1
camera coords (Xc, Yc)
dist = sqrt( pow(x1-Xc,2) + pow(y1-Yc,2) )
If it is too cumbersome to try to measure the position of the camera from global origin by hand, you can instead measure the distance to 2 different points and feed those values into the above equation to calculate your camera offset, which you will then store and use anytime you want to get final distance.
As already mentioned in the previous answers you'll need a projective transformation or simply a homography. However, I'll consider it from a more practical view and will try to summarize it short and simple.
So, given the proper homography you can warp your picture of a plane such that it looks like you took it from above (like here). Even simpler you can transform a pixel coordinate of your image to world coordinates of the plane (the same is done during the warping for each pixel).
A homography is basically a 3x3 matrix and you transform a coordinate by multiplying it with the matrix. You may now think, wait 3x3 matrix and 2D coordinates: You'll need to use homogeneous coordinates.
However, most frameworks and libraries will do this handling for you. What you need to do is finding (at least) four points (x/y-coordinates) on your world plane/floor (preferably the corners of a rectangle, aligned with your desired world coordinate system), take a picture of them, measure the pixel coordinates and pass both to the "find-homography-function" of your desired computer vision or math library.
In OpenCV that would be findHomography, here an example (the method perspectiveTransform then performs the actual transformation).
In Matlab you can use something from here. Make sure you are using a projective transformation as transform type. The result is a projective tform, which can be used in combination with this method, in order to transform your points from one coordinate system to another.
In order to transform into the other direction you just have to invert your homography and use the result instead.
I'm working on an object tracking application using openCV. I want to convert my pixel coordinates to world coordinates to get more meaningful information. I have read a lot about computing the perspective transform matrix, and I know about cv2.solvePnP. But I feel like my case should be special, because I'm tracking a runner on a track and field runway with the runway orthogonal to the camera's z-axis. I will set up the camera to ensure this.
If I just pick two points on the runway edge, I can calculate a linear conversion from pixels to world coords at that specific height (ground level) and distance from the camera (i.e. along that line). Then I reason that the runner will run on a line parallel to the runway at a different height and slightly different distance from the camera, but the lines should still be parallel in the image, because they will both be orthogonal to the camera z-axis. With all those constraints, I feel like I shouldn't need the normal number of points to track the runner on that particular axis. My gut says that 2-3 should be enough. Can anyone help me nail down the method here? Am I completely off track? With both height and distance from camera essentially fixed, shouldn't I be able to work with a much smaller set of correspondences?
Thanks, Bill
So, I think I've answered this one myself. It's true that only two correspondence points are needed given the following assumptions.
Assume:
World coordinates are set up with X-axis and Y-axis parallel to the ground plane. X-axis is parallel to the runway.
Camera is translated and possibly rotated about X-axis (angled downward), but no rotation around Y-axis(camera plane parallel to runway and x-axis) or Z-axis (camera is level with respect to ground).
Camera intrinsic parameters are known from camera calibration.
Method:
Pick two points in the ground plane with known coordinates in world and image. For example, two points on the runway edge as mentioned in original post. The line connecting the poitns in world coordinates should not be parallel with either X or Z axis.
Since Y=0 for these points, ignore the second column of the rotation/translation matrix, reducing the projection to a planar homography transform (3x3 matrix). Now we have 9 degrees of freedom.
The rotation assumptions will enforce a certain form on the rotation/translation matrix. Namely, the first column and first row will be the identity (1,0,0). This further reduces the number of degrees of freedom in the matrix to 5.
Constrain the values of the second column of the matrix such that cos^2(theta)+sin^2(theta) = 1. This reduces the number of unknowns to only 4. Two correspondence points will give us the 4 equations we need to calculate the homography matrix for the ground plane.
Factor out the camera intrinsic parameter matrix from the homography matrix, leaving the rotation/translation matrix for the ground plane.
Due to the rotation assumptions made earlier, the ignored column of the rotation/translation matrix can be easily constructed from the third column of the same matrix, which is the second column in the ground plane homography matrix.
Multiply back out with the camera intrinsic parameters to arrive at the final universal projection matrix (from only 2 correspondence points!)
My test implentation has worked quite well. Of course, it's sensitive to the accuracy of the two correspondence points provided, but that's kind of a given.
So I have a function called
function AngleToVector(speed,xAngle,yAngle,zAngle)
local angle = Vector3.new(xAngle,yAngle,zAngle)
--
-- calculations
--
local position = Vector3.new(SOMETHING,SOMETHING,SOMETHING)
return position
end
I decided to rewrite the entire question. In the picture, sqrt(27) is the distance a bullet travels in one second. Assume that I know the 3 angles that dictate where this line is pointing. I'm trying to find the length of the 3, red green and blue, dotted lines using my "speed" scalar, and my 3 angles which show the direction of my scalar.
You state that the final Vector3 that is returned should contain the values that the subject should step in to travel a certain speed. This is actually quite straight forward. The returned Vector3 should be the unit vector of the angle vector multiplied by the speed scalar.
This is more of a Math question than a programming question, and I am not familiar with the library you are using, so I will answer it mathematically. First, divide each component of the angle vector by the magnitude of the angle vector. This gives you a unit vector in that direction (e.g. 1 meter in that direction). Multiply this unit vector by the speed given, and that will give you speed steps in that direction. You can then add this new vector to the current position for the object to move.
I recommend doing some reading on Vectors and Matrices.
velocityVector = CFrame.Angles(xAngle,yAngle,zAngle).lookVector*speed
Is probably what you want. CFrame.lookVector gives you the unit Vector that Luke was talking about. CFrame makes everything a ton easier when you get used to it.
(This is a follow-up from this previous question).
I was able to successfully use OpenCV / Hough transforms to detect lines in pictures (scanned text); at first it would detect many many lines (at least one line per line of text), but by adjusting the 'threshold' parameter via trial-and-error, it now only detects "real" lines.
(The 'threshold' parameter is dependant on image size, which is a bit of a problem if one has to deal with images of different resolutions, but that's another story).
My problem is that the Hough transform sometimes detects two lines where there is only one; those two lines are very near one another and (apparently) parallel.
=> How can I identify that two lines are almost parallel and very near one another? (so that I can keep only one).
If you use the standard or multiscale hough, you will end up with the rho and theta coordinates of the lines in polar coordinates. Rho is the distance to the origin, and theta is normally the angle between the detected line and the Y axis. Without looking into the details of the hough transform in opencv, this is a general rule in those coordinates: two lines will be almost parallel and very near one another when:
- their thetas are nearly identical AND their rhos are nearly identical
OR
- their thetas are near 180 degrees apart AND their rhos are near each other's negative
I hope that makes sense.
That's interesting about the theta being the angle between the line and the y-axis.
Generally, the rho and theta values are visualized as being the angle from the x-axis to the line perpendicular to the line in question. The rho is then the length of this perpendicular line. Thus, a theta = 90 and rho = 20 would mean a horizontal line 20 pixels up from the origin.
A nice image is shown on Hough Transform question