What is the orientation/angle exactly? Between which vectors?
Is it the angle between the first frame and the second frame?
Estimating optical flow gives you a displacement field in pixels.
You get the displacement in the x direction and the displacement in the y direction.
A person walking from bottom left to top right might give you a mean flow of x = 5 and y = -5 (image y points down). Since you have a vector, you can calculate the angle as arctan(5/5) = 45 degrees.
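For example, a minimal sketch using OpenCV's dense Farneback flow (prev_gray and next_gray are assumed to be two consecutive grayscale frames, not anything from the question):

import numpy as np
import cv2

flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
dx, dy = flow[..., 0].mean(), flow[..., 1].mean()   # mean displacement in pixels
angle_deg = np.degrees(np.arctan2(-dy, dx))         # negate dy because image y points down
print("mean flow:", dx, dy, "direction (deg):", angle_deg)
# e.g. dx = 5, dy = -5 gives 45 degrees (up and to the right)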
I set up a stereo-vision system to triangulate a 3D point given two 2D points (corresponding to the same point) from two views. I have some questions about the interpretability of the results.
My calibration squares are 25 mm on a side, and after triangulating and normalizing the homogeneous coordinates (dividing the array of points by the fourth coordinate), I multiply all of them by 25 mm and divide by 10 (to get cm) to get the actual distance from the camera setup.
For example, the final coordinates I got were something like [-13.29, -5.94, 68.41]. How do I interpret this? 68.41 is the distance in the Z direction, -5.94 is the position in Y, and -13.29 is the position in X. But what is the origin here? By convention, is it the left camera? Or is it the center of the epipolar baseline? I am using OpenCV, for reference.
As I understand OpenCV's coordinate system, the left camera of a calibrated stereo pair is located at the origin, facing the Z direction.
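For reference, a minimal sketch of that pipeline with OpenCV's Python API (assumptions: P1 and P2 are the 3x4 projection matrices of the left and right cameras, pts1 and pts2 are matching 2xN pixel arrays, and, as in the question, the calibration object points were specified in units of 25 mm board squares):

import cv2

points4d = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4xN homogeneous points
points3d = points4d[:3] / points4d[3]                  # divide by the fourth coordinate
points_cm = points3d * 25.0 / 10.0                     # board units -> mm -> cm
# Each column is (X, Y, Z) in cm in the left camera's frame:
# origin at the left camera centre, Z pointing out of the lens.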
I have a pair of 2464x2056 pixel cameras that I have calibrated (with a stereo RMS of around 0.35), computed the disparity on a pair of images, and reprojected this to get the 3D point cloud. However, I've noticed that the Z axis is not in line with the optical centre of the camera.
This does kind of mess with some of the point cloud manipulation I'm hoping to do. Is this expected, or does it indicate that something has gone wrong along the way?
Below is the point cloud I've generated, plus the axes: the red, green and blue lines indicate the X, Y and Z axes respectively, coming out from the origin.
As you can see, the Z axis intercepts the point cloud between the head and the post. This corresponds to a pixel coordinate of approximately x = 637, y = 1028 when I fix the principal point during calibration to cx = 1232, cy = 1028. When I remove the CV_FIX_PRINCIPAL_POINT flag, the principal point is calculated as approximately cx = 1310, cy = 1074, and the Z axis intercepts at around x = 310, y = 1050.
Compared to the rectified image, where the midpoint x = 1232, y = 1028 is marked by a yellow cross over the mannequin's head, the intersection of the Z axis is significantly off from where I would expect.
Does anyone have any idea as to why this could be occurring? Any help would be greatly appreciated.
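As a quick sanity check (a sketch of my own, not a diagnosis): the 3D point that cv2.reprojectImageTo3D produces at the rectified principal point should have X and Y close to zero, i.e. it should lie on the Z axis. Here disparity and Q are assumed to be the usual outputs of the stereo pipeline (cv2.stereoRectify and block matching):

import cv2

points3d = cv2.reprojectImageTo3D(disparity, Q)   # HxWx3, left rectified camera frame
cx, cy = -Q[0, 3], -Q[1, 3]                       # rectified principal point stored in Q
X, Y, Z = points3d[int(round(cy)), int(round(cx))]
print("point under the principal point:", X, Y, Z)
# X or Y far from zero (with a valid disparity there) suggests the calibration
# or rectification is worth re-checking.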
I have a problem that has been puzzling me for the last few days. I have a camera pose obtained with OpenCV that is right-handed (X-right, Y-up, Z-back), and I would like to visualize it in Unity (X-right, Y-up, Z-forward), but I cannot really manage to get it right.
I tried to do it using either quaternions or matrices. It should just be a matter of mirroring the Z axis and setting the rotation of the camera's transform in Unity to the computed transformation; however, I cannot get the right conversion.
With quaternions I tried to mirror by negating the Z and W terms, and I obtained a coordinate system (X-right, Y-down, Z-forward); that makes sense, but it is not what I want to achieve. With matrices I think I should multiply my right-handed camera by an identity matrix with the element [2,2] set to -1; however, I don't get what I want.
I am definitely missing something, probably something really stupid I forgot :)
Has anybody a suggestion?
A quaternion can be thought of as a rotation around an axis a = (ax, ay, az) by an angle theta:
qx = ax * sin(theta/2)
qy = ay * sin(theta/2)
qz = az * sin(theta/2)
qw = cos(theta/2)
In a right-handed coordinate system a rotation of theta will be counter-clockwise, while in a left-handed coordinate system a rotation of theta will be clockwise (depending on your point-of-view, of course).
So to get your quaternion from a right-handed system to Unity's Left-Handed system you have to account for two factors:
The Z-Axis is negated
The direction of rotation is flipped from CCW to CW
The first factor is accounted for by negating the qz component of the quaternion. The second factor is accounted for by flipping the axis of rotation (rotating by 90 degrees around (1,0,0) is the inverse of rotating by 90 degrees around (-1,0,0)).
If your original right-handed quaternion is q and your left-handed quaternion is q', that means you end up with:
q'=(-qx, -qy, qz, qw)
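As a minimal sketch of that conversion (my own, using SciPy for the matrix-to-quaternion step; R is assumed to be the 3x3 camera rotation obtained from OpenCV, e.g. via cv2.Rodrigues):

import numpy as np
from scipy.spatial.transform import Rotation

def opencv_to_unity_quaternion(R):
    qx, qy, qz, qw = Rotation.from_matrix(R).as_quat()   # SciPy order: (x, y, z, w)
    # q' = (-qx, -qy, qz, qw): accounts for the negated Z axis and the CCW -> CW flip
    return np.array([-qx, -qy, qz, qw])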
Additional Note
Quaternions don't inherently have a handedness. A quaternion q applies equally well in RH or LH coordinate systems. However, when you apply the quaternion to a spatial vector, the resulting transformation takes on the handedness of the vector's space.
Let's say I have a pinhole camera with known intrinsic parameters (camera matrix and distortion coefficients). Let's say there is a point at a large enough distance from the camera that we can say it is at infinity.
Given the image coordinates of this point in pixels, I would like to calculate the camera rotation relative to the axis that connects the camera and this point (so the rotation is 0,0 if the camera is pointed at this point and it lies at the optical center of the image).
How can this be done using OpenCV?
Many thanks!
You need to specify an additional constraint - rotating the camera from its current pose to one that aligns the optical axis with an arbitrary ray leaves the camera free to rotate about the ray itself (i.e. it leaves the "roll" angle unspecified).
Let's assume that you want the roll to be zero, i.e. that you want the motion to be a pure pan-tilt. This has a unique solution as long as the ray you want to align to is not parallel to the vertical image axis (in which case pan and roll are the same motion).
Then the solution is computed as follows. Let's use the OpenCV camera frame: let Z = [0, 0, 1]' (where ' means transpose) be the camera focal axis, oriented going out of the lens, Y = [0, 1, 0]' the vertical axis going down, and X = Y x Z (where 'x' is the cross product) the horizontal camera axis going toward the right of the image. So "pan" is a rotation about Y, and "tilt" is a rotation about X.
Let U = [u1, u2, u3]', with ||U|| = 1, be the ray you want to rotate to. You want to apply a pan that brings Z onto the plane Puy defined by the vectors U and Y, then apply a tilt that brings Z onto U.
The angle of the first rotation is (angle between Z and Puy) = [90 deg - (angle between Z and Y x U)]. This is because Y x U is orthogonal to Puy. Look up the expressions for computing the angle between vectors on Wikipedia or elsewhere online. Once you have the angle (or its cosine and sine), the rotation about Y can be expressed as a standard rotation matrix Ry.
The angle of the second rotation, about X once Z is on Puy, is the angle between Z and U after Ry is applied to Z, or equivalently, between Z and inv(Ry) * U. Compute the angle between the vectors, and use it to build a standard rotation matrix about X, Rx.
The final transformation is then Rx * Ry.
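Not the exact construction above, but an equivalent closed-form sketch of the same zero-roll idea: back-project the pixel to a unit ray with OpenCV, then read the pan and tilt angles directly off its components. K, dist and the pixel (u, v) are placeholders; the composition order below comes from applying the tilt in the already-panned frame.

import numpy as np
import cv2

def pan_tilt_for_pixel(K, dist, u, v):
    # Undistort the pixel and back-project it to a unit ray U in the camera frame.
    xy = cv2.undistortPoints(np.array([[[u, v]]], np.float64), K, dist).reshape(2)
    U = np.array([xy[0], xy[1], 1.0])
    U /= np.linalg.norm(U)

    pan = np.arctan2(U[0], U[2])    # rotation about Y (right of image positive)
    tilt = -np.arcsin(U[1])         # rotation about X (Y points down in OpenCV)

    cp, sp, ct, st = np.cos(pan), np.sin(pan), np.cos(tilt), np.sin(tilt)
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, ct, -st], [0, st, ct]])
    R = Ry @ Rx                     # composed so that R @ [0, 0, 1] equals U
    return pan, tilt, R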
I'm trying to implement rectangle detection using the Hough transform, based on this paper.
I programmed it using Matlab, but after the detection of parallel line pairs and orthogonal pairs, I must detect the intersections of these pairs. My question is about the accuracy of the two-line intersections in Hough space.
I found the intersection points by solving four systems of equations. Do these intersection points lie in Cartesian or polar coordinate space?
For those of you wondering about the paper, it's:
Rectangle Detection based on a Windowed Hough Transform by Cláudio Rosito Jung and Rodrigo Schramm.
Now, according to the paper the intersection points are expressed in polar coordinates; obviously your implementation may be different (the only way to tell is to show us your code).
Assuming you are being consistent with his notation, your peaks should be expressed in polar form, i.e. as (rho_k, theta_k) pairs. You must then perform the peak pairing given by equation (3) in section 4.3 of the paper, which pairs peaks using an angular threshold corresponding to parallel lines and a normalized threshold corresponding to lines of similar length.
The accuracy of the Hough space should be dependent on two main factors.
The first is the resolution of the accumulator. The accumulator maps onto Hough space, and looping through the accumulator array requires that the accumulator divide Hough space into a discrete grid.
The second factor affecting accuracy in linear Hough space is the location of the origin in the original image. Look for a moment at what happens if you do a sweep of \theta for any given change in \rho. Near the origin, one of these sweeps will cover far fewer pixels than a sweep out near the edges of the image. This has the consequence that near the edges of the image you need a much higher \rho-\theta resolution in your accumulator to achieve the same level of accuracy when transforming back to Cartesian.
The problem with increasing the resolution, of course, is that you will need more computational power and memory. Also, if you uniformly increase the accumulator resolution, you waste resolution near the origin where it is not needed.
Some ideas to help with this:

- Place the origin right at the center of the image, as opposed to using the natural bottom-left or top-left of an image in code (a sketch of this follows the list).
- Try using the closest image you can get to a square. The more elongated an image is for a given area, the more pronounced the resolution trap becomes at the edges.
- Try dividing your image into 4/9/16 etc. different accumulators, each with an origin in the center of that sub-image. It will require a little overhead to link the results of each accumulator together for rectangle detection, but it should help spread the resolution more evenly.
- The ultimate solution would be to increase the resolution linearly depending on the distance from the origin. This can be achieved using the circle equation (x - a)^2 + (y - b)^2 = \rho^2, where x, y are the current pixel, a, b are your chosen origin, and \rho is the radius. Once the radius is known, adjust your accumulator resolution accordingly. You will have to keep track of the center of each \rho-\theta bin for transforming back to Cartesian.
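A minimal NumPy sketch of the first suggestion (measuring \rho from the image centre rather than the top-left corner); edges is assumed to be a binary edge map, and the function name and parameters are mine, not from the answer:

import numpy as np

def hough_accumulator_centered(edges, rho_res=1.0, theta_res=np.pi / 180):
    h, w = edges.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0        # chosen origin: the image centre
    thetas = np.arange(0, np.pi, theta_res)
    rho_max = np.hypot(cx, cy)                   # largest possible distance from the centre
    rhos = np.arange(-rho_max, rho_max + rho_res, rho_res)
    acc = np.zeros((len(rhos), len(thetas)), dtype=np.int32)

    ys, xs = np.nonzero(edges)                   # edge pixel coordinates
    xs_c, ys_c = xs - cx, ys - cy                # shift so rho is measured from the centre
    for j, t in enumerate(thetas):
        rho_vals = xs_c * np.cos(t) + ys_c * np.sin(t)
        idx = np.clip(np.round((rho_vals + rho_max) / rho_res).astype(int), 0, len(rhos) - 1)
        np.add.at(acc, (idx, j), 1)              # one vote per edge pixel for this theta
    return acc, rhos, thetas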
The link to the referenced paper does not work, but if you used the standard Hough transform then the four intersection points will be expressed in Cartesian coordinates. In fact, the four lines detected with the Hough transform will be expressed using the "normal parametrization":
rho = x cos(theta) + y sin(theta)
so you will have four pairs (rho_i, theta_i) that identify your four lines. After checking for orthogonality (for example, just by comparing the angles theta_i) you solve four equation systems, each of the form:
rho_j = x cos(theta_j) + y sin(theta_j)
rho_k = x cos(theta_k) + y sin(theta_k)
where x and y are the unknowns that represent the Cartesian coordinates of the intersection point.
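For example, a small NumPy sketch solving one such 2x2 system for two lines (rho_j, theta_j) and (rho_k, theta_k):

import numpy as np

def intersect(rho_j, theta_j, rho_k, theta_k):
    A = np.array([[np.cos(theta_j), np.sin(theta_j)],
                  [np.cos(theta_k), np.sin(theta_k)]])
    b = np.array([rho_j, rho_k])
    x, y = np.linalg.solve(A, b)    # Cartesian coordinates of the intersection
    return x, y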
I am not a mathematician. I am willing to stand corrected...
From Hough 2) ... any line on the xy plane can be described as p = x cos theta + y sin theta. In this representation, p is the normal distance and theta is the normal angle of a straight line, ... In practical applications, the angles theta and distances p are quantized, and we obtain an array C(p, theta).
From CRC Standard Math Tables, Analytic Geometry, Polar Coordinates in a Plane section ...
Such an ordered pair of numbers (r, theta) are called polar coordinates of the point p.
Straight lines: let p = distance of line from O, w = counterclockwise angle from OX to the perpendicular through O to the line. Normal form: r cos(theta - w) = p.
From this I conclude that the points lie in polar coordinate space.