Problem: I'm trying to align two frames of a moving video.
I'm currently trying to use the function "cvCalcOpticalFlowLK" and the result outputs velocity vectors of x and y in the form of a "CvArr".
So I obtained the result, but I'm not sure how to use these vector arrays.
My question is this: how do I know the velocity of each pixel? Is it just the value stored at that particular point in each array?
Note: I would've used one of the other optical flow functions such as cvCalcOpticalFlowPyrLK(), as it is much easier, but I want the dense optical flow.
Apparently my original assumption was true. The "velx" and "vely" outputs from the optical flow function are the actual velocities for each pixel. To extract them, I accessed the pixel in the raw data and pulled the value. There are 2 ways to do this.
1. cvGet2D(): this way is slower, but if you only need to access 1 pixel it's okay.
2. Direct pointer access: ((float*)(image->imageData + y*image->widthStep))[x];
(image is an IplImage, x and y are the column and row of the pixel. velx and vely are single-channel 32-bit float images, so the row pointer is cast to float* rather than uchar*.)
If you need the motion vectors for each pixel, then you need to compute what's called 'dense optical flow'. Starting from OpenCV 2.1, there is a function to do exactly that: calcOpticalFlowFarneback.
velx and vely are the optical flow (the per-pixel image displacement between the two frames), not the actual velocity.
The method you used is obsolete. Use calcOpticalFlowFarneback() instead.
I have been running the Python implementation code of Dense Optical Flow given in the official documentation page. At one particular line of the code, they use
mag, ang = cv2.cartToPolar(flow[...,0], flow[...,1]).
When I print the values of mag, I get these -
Please check this image for the output I'm getting
I have no idea how to make sense of this output.
My end objective is to use optical flow to get a resultant or an average motion value for every frame.
Quoting the same OpenCV tutorial you use
We get a 2-channel array with optical flow vectors, (u,v).
That is the output of the dense optical flow. Basically it tells you how each of the points moved in a vectorial way. (u,v) is just the cartesian representation of a vector and it can be converted to polar coordinates, this means an angle and the magnitude.
The angle is the orientation where the pixel moved. And the magnitude is the distance that the pixel moved.
In many algorithms you may use the magnitude to decide whether the pixel moved (less than 1 means no movement, for example). Or, if you are tracking an object whose initial position you know (meaning the pixel positions of the object), you may find where the majority of the pixels are moving to, and use that info to determine the new position.
BTW, cartToPolar returns the angles in radians unless angleInDegrees is specified. Here is an extract of the documentation:
cv2.cartToPolar(x, y[, magnitude[, angle[, angleInDegrees]]]) → magnitude, angle
angleInDegrees must be True if you need it in degrees.
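To get one motion value per frame, as asked, the per-pixel vectors can be averaged. Below is a minimal, library-free sketch, not the tutorial's code: `cart_to_polar` and `frame_motion` are hypothetical helper names, the flow field is assumed to be a nested list of (u, v) tuples, and the angle is wrapped to [0, 2π) on the assumption that this matches the range cartToPolar reports.

```python
import math

def cart_to_polar(u, v, degrees=False):
    """Magnitude and angle of one flow vector (u, v).

    The angle is wrapped to [0, 2*pi), or [0, 360) when degrees=True.
    """
    mag = math.hypot(u, v)
    ang = math.atan2(v, u) % (2 * math.pi)
    if degrees:
        ang = math.degrees(ang)
    return mag, ang

def frame_motion(flow):
    """Summarise a dense flow field given as rows of (u, v) tuples.

    Returns (mean magnitude, mean vector). The mean magnitude measures how
    much motion there is; the mean vector is the resultant direction and can
    be near zero when opposite motions cancel out.
    """
    vectors = [uv for row in flow for uv in row]
    n = len(vectors)
    mean_mag = sum(math.hypot(u, v) for u, v in vectors) / n
    mean_u = sum(u for u, _ in vectors) / n
    mean_v = sum(v for _, v in vectors) / n
    return mean_mag, (mean_u, mean_v)

# Tiny made-up 2x2 flow field, for illustration only.
flow = [[(3.0, 4.0), (0.0, 0.0)],
        [(-3.0, 4.0), (0.0, 2.0)]]
mean_mag, mean_vec = frame_motion(flow)
```

With the real NumPy array returned by calcOpticalFlowFarneback, the same two summaries would be the mean of `mag` and the per-channel mean of `flow`.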
Assuming the static scene, with a single camera moving exactly sideways at small distance, there are two frames and a following computed optic flow (I use opencv's calcOpticalFlowFarneback):
Here scatter points are detected features, which are painted in pseudocolor with depth values (red is little depth, close to the camera, blue is more distant). Now, I obtain those depth values by simply inverting optic flow magnitude, like d = 1 / flow. Seems kinda intuitive, in a motion-parallax-way - the brighter the object, the closer it is to the observer. So there's a cube, exposing a frontal edge and a bit of a side edge to the camera.
But then I'm trying to project those feature points from the camera plane to real-life coordinates to make a kind of top-view map, where X = (x * d) / f and Y = d (d is depth, x is the pixel coordinate, f is the focal length, and X and Y are real-life coordinates). And here's what I get:
Well, doesn't look cubic to me. Looks like the picture is skewed to the right. I've spent some time thinking about why, and it seems that 1 / flow is not an accurate depth metric. Playing with different values, say, if I use 1 / power(flow, 1 / 3), I get a better picture:
But, of course, the power of 1 / 3 is just a magic number out of my head. The question is, what is the relationship between optic flow and depth in general, and how am I supposed to estimate it for a given scene? We're only considering camera translation here. I've stumbled upon some papers, but had no luck finding a general equation yet. Some, like that one, propose a variation of 1 / flow, which isn't going to work, I guess.
What bothers me a little is that simple geometry points me to 1 / flow answer too. Like, optic flow is the same (in my case) as disparity, right? Then using this formula I get d = Bf / (x2 - x1), where B is distance between two camera positions, f is focal length, x2-x1 is precisely the optic flow. Focal length is a constant, and B is constant for any two given frames, so that leaves me with 1 / flow again multiplied by a constant. Do I misunderstand something about what optic flow is?
For a static scene, moving a camera precisely sideways by a known amount is exactly the same as a stereo camera setup. From this, you can indeed estimate depth, if your system is calibrated.
Note that calibration in this sense is rather broad. To get really accurate depth, you will in the end need to supply a scale parameter on top of the regular calibration stuff you have in OpenCV, or else there is a single uniform scale ambiguity in the 3D reconstruction (this last step is often called going to the "metric" reconstruction from only the "Euclidean").
Another thing which is a part of broad calibration is lens distortion compensation. Before anything else, you probably want to force your cameras to behave like pin-hole cameras (which real-world cameras usually don't).
With that said, optical flow is definitely very different from a metric depth map. Even if you properly calibrate and rectify your system first, optical flow is still not equivalent to disparity estimation. If your system is rectified, there is no point in doing a full optical flow estimation (such as Farnebäck), because the problem is then constrained to the horizontal lines of the image. Doing a full optical flow estimation (giving 2 d.o.f.) after rectification will likely introduce more error.
A great reference for all this stuff is the classic "Multiple View Geometry in Computer Vision"
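For the rectified, purely sideways case, the stereo relation d = Bf / (x2 - x1) and the top-view projection from the question can be sketched as follows. This is a minimal sketch under stated assumptions: an ideal pinhole camera, horizontal flow used as disparity, and the pixel coordinate measured from the principal point `cx`. The names `depth_from_disparity`, `top_view` and `cx` are illustrative and not from the original posts.

```python
def depth_from_disparity(disparity_px, baseline, focal_px):
    """Depth along the optical axis: d = B * f / |disparity|.

    disparity_px: horizontal displacement x2 - x1 in pixels
    baseline:     distance B between the two camera positions
    focal_px:     focal length f in pixels
    """
    disparity = abs(disparity_px)
    if disparity == 0:
        raise ValueError("zero disparity: point at infinity or no motion")
    return baseline * focal_px / disparity

def top_view(x_px, disparity_px, baseline, focal_px, cx):
    """Map a pixel column and its disparity to top-view coordinates (X, Y).

    x_px is measured from the image corner, so the principal point cx
    (an assumed calibration value) is subtracted before back-projecting.
    """
    d = depth_from_disparity(disparity_px, baseline, focal_px)
    X = (x_px - cx) * d / focal_px
    return X, d

# Made-up numbers: B = 0.1 (scene units), f = 500 px, principal point at 320 px.
X, Y = top_view(x_px=420.0, disparity_px=10.0, baseline=0.1, focal_px=500.0, cx=320.0)
```

If x is taken from the image corner without subtracting cx, every X gets an extra offset of cx * d / f that grows with depth, which shears the top view sideways.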
I need to read the displacement of each pixel in each stage using the Optical flow - Simple Flow tracking algorithm.
I tried the code mentioned here:
How to make Simpleflow work
The code works fine. However, I don't know what the flow array contains, because its values have a strange format. Does it contain the displacement, the new position of the pixel, or neither? And is there any way to read these values in order to track the pixel?
After reading their paper, SimpleFlow: A Non-iterative, Sublinear Optical Flow Algorithm, the authors clearly state that the resulting flow matrix contains the displacement of each pixel. That is, if the pixel p1 in the image Ft is at position (x, y), then the position of p1 in Ft+i is (x+u, y+v), where (u, v) are the values saved in the resulting flow matrix for that pixel of Ft.
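Reading the flow as a displacement, tracking a pixel is a single addition. A minimal sketch, assuming the flow is stored row-major as `flow[y][x] = (u, v)`; the nested-list layout and the name `track_pixel` are illustrative only:

```python
def track_pixel(flow, x, y):
    """Return where pixel (x, y) of the first frame lands in the next frame.

    flow is indexed as flow[row][column], i.e. flow[y][x], and each entry is
    the displacement (u, v) in pixels.
    """
    u, v = flow[y][x]
    return x + u, y + v

# Made-up 2x2 flow field, for illustration only.
flow = [[(0.0, 0.0), (1.5, -0.5)],
        [(2.0, 1.0), (0.0, 0.0)]]
new_pos = track_pixel(flow, 1, 0)
```

The result is generally not an integer position, so tracking over several frames needs rounding or interpolation when looking up the next flow value.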
I do have two sets of points and I want to find the best transformation between them.
In OpenCV, you have the following function:
Mat H = Calib3d.findHomography(src_points, dest_points);
that returns you a 3x3 Homography matrix, using RANSAC. My problem is now, that I only need translation and rotation (& maybe scale), I don't need affine and perspective.
The thing is, my points are only in 2D.
(1) Is there a function to compute something like a homography but with less degrees of freedom?
(2) If there is none, is it possible to extract a 3x3 matrix that does only translation and rotation from the 3x3 homography matrix?
Thanks in advance for any help!
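OpenCV's estimateRigidTransform function is exactly what you need: it returns translation, rotation and scale (pass false for the fullAffine flag). And it DOES use RANSAC (see the source code to be sure of it). In newer OpenCV versions, estimateAffinePartial2D fits the same 4-DOF transform; estimateRigidTransform is no longer available in OpenCV 4.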
Homography is for 2D points, the third dimension is just for casting points in 3 dim homogeneous coordinates and performing perspective effects. You can always cast points back:
homogeneous [x, y, w]
cartesian [x/w, y/w]
However, since you estimate more degrees of freedom (8 for a homography, 6 for a full affine transform) instead of 4 DOF (similarity), your result is pretty different from what you expect with 4 DOF. A more flexible transformation will fit more points in RANSAC at the expense of distortions in the transformations you care about. Bottom line: don't try to decompose H; instead fit a similarity or an isometry (also called rigid or Euclidean). The reason they are absent from the library is that they can be expressed in closed form, even with the correct least-squares metric in point coordinates, and thus don't require non-linear optimization. In other words, they are very simple.
If you only have rotation and translation, I wrote a quick function to find them (no RANSAC though). It is probably similar to rigidTransform but (hopefully) more understandable.
With scale there is still a closed form solution, but slightly different formulas for translation and scaling. See Learning similarity parameters, p. 25
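One way to write the closed-form fit is to treat 2D points as complex numbers, so that a similarity is w = a*z + b with complex a (scale and rotation) and b (translation). The sketch below is not the function referred to above and is not taken from the cited notes; `fit_similarity` is a hypothetical name, and it is a plain least-squares fit with no RANSAC or outlier handling.

```python
import cmath

def fit_similarity(src, dst, with_scale=True):
    """Least-squares 2D similarity (or rigid) fit between matched points.

    src, dst: equal-length lists of (x, y) pairs, with src[i] matching dst[i].
    Returns (scale, angle in radians, (tx, ty)) such that
    dst ~ scale * R(angle) * src + t.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    n = len(zs)
    z_mean = sum(zs) / n
    w_mean = sum(ws) / n
    # Correlation of the centred point sets; its phase is the rotation angle.
    num = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    if den == 0 or num == 0:
        raise ValueError("degenerate point configuration")
    # With scale: a = num / den. Rigid only: keep the phase, force |a| = 1.
    a = num / den if with_scale else num / abs(num)
    b = w_mean - a * z_mean
    return abs(a), cmath.phase(a), (b.real, b.imag)

# Made-up points: scale 2, rotate by 90 degrees, translate by (3, -1).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(3.0, -1.0), (3.0, 1.0), (1.0, -1.0)]
scale, angle, t = fit_similarity(src, dst)
```

To make it robust to outliers, the same function could be called on random pairs of matches inside a RANSAC loop, since two point pairs are enough to determine a similarity.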
I am working on a project to detect the 3D location of the object. I have two cameras set up at two corners of the room and I have obtained the Fundamental matrix between them. These cameras are internally calibrated. My images are 2592 X 1944
K = [1228 0 3267
0 1221 538
0 0 1 ]
F = [-1.098e-7 3.50715e-7 -0.000313
2.312e-7 2.72256e-7 4.629e-5
0.000234 -0.00129250 1 ]
Now, how do I proceed so that, given a 3D point in space, I can get the points in the images which correspond to the same object in the room? If I can obtain the right projection matrices (with correct scale), I can later use them as inputs to OpenCV's triangulatePoints function to obtain the location of the object.
I have been stuck on this for a long time, so please help me.
From what I gather, you have obtained the Fundamental matrix through some means of calibration? Either way, with the fundamental matrix (or the calibration rig itself) you can obtain the pose difference via decomposition of the Essential matrix. Once you have that, you can use matched feature points (using a feature extractor and descriptor like SURF, BRISK, ...) to identify which points in one image belong to the same object point as another feature point in the other image.
With that information, you should be able to triangulate away.
Sorry, this doesn't fit in a comment,
so @user2167617, here is a reply to your comment.
Pretty much. A few pointers, though: the singular values should be (s,s,0), so (1.3, 1.05, 0) is a pretty good guess. About the R: technically, this is right, ignoring signs. It might very well be that you get a rotation matrix which does not satisfy the constraint determinant(R) = 1 but has determinant -1 instead. You might want to multiply it by -1 in that case. Generally, if you run into problems with this approach, try to determine the Essential Matrix using the 5-point algorithm (implemented in the very newest version of OpenCV; you will have to build it yourself). The scale is indeed impossible to obtain with this information. However, it's all to scale: if you define, for example, the distance between the cameras as 1 unit, then everything will be measured in that unit.
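The two bookkeeping steps around the decomposition can be sketched without any library: forming E from F and the intrinsics, and flipping a candidate R whose determinant is -1. This is a minimal sketch, not OpenCV's API. The helper names are hypothetical, the convention x2^T F x1 = 0 is assumed, and the SVD step itself is left out.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose3(m):
    """Transpose of a 3x3 matrix."""
    return [list(row) for row in zip(*m)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def essential_from_fundamental(F, K1, K2):
    """E = K2^T * F * K1, assuming the convention x2^T F x1 = 0."""
    return matmul3(transpose3(K2), matmul3(F, K1))

def as_proper_rotation(R):
    """Negate R if its determinant is negative, so that det(R) becomes +1.

    Negating a 3x3 matrix flips the sign of its determinant, which is the
    'multiply it by -1' fix described above.
    """
    if det3(R) < 0:
        return [[-v for v in row] for row in R]
    return R

# Made-up candidate with determinant -1, as can come out of the decomposition.
R_candidate = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, -1.0]]
R_fixed = as_proper_rotation(R_candidate)
```

If both cameras share the same intrinsics, K1 and K2 are simply the same matrix K.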
Maybe it would be simpler to use the cv::reprojectImageTo3D function? It will give you 3D coordinates.