What does the output array contain in calcOpticalFlowSF - opencv

I need to read the displacement of each pixel in each stage using the Optical flow - Simple Flow tracking algorithm.
I tried the code mentioned here:
How to make Simpleflow work
The code works fine. However, I don't know what the flow array contains, because its values have a strange format. Does it contain the displacement or the new position of the pixel, or neither of them? And is there any way to read these values in order to track the pixel?
Thanks!

Having read their paper, SimpleFlow: A Non-iterative, Sublinear Optical Flow Algorithm, I can say that the authors clearly state that the resulting flow matrix contains the displacement of each pixel: if pixel p1 is at position (x, y) in frame Ft, then its position in frame Ft+1 is (x+u, y+v), where (u, v) are the values saved in the flow matrix for that pixel of Ft.
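So to track a pixel, you simply add the stored (u, v) to its coordinates. A minimal Python sketch of reading the values, assuming flow is the H x W x 2 float array produced by the calcOpticalFlowSF call from the linked question:

def track_pixel(flow, x, y):
    # flow[y, x] holds the displacement (u, v) stored for this pixel
    u, v = flow[y, x]
    # its (sub-pixel) position in the next frame
    return x + u, y + v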

Related

What exactly is the output when we run the dense optical flow (Farneback)?

I have been running the Python implementation of dense optical flow given in the official documentation. At one particular line of the code, they use
mag, ang = cv2.cartToPolar(flow[...,0], flow[...,1]).
When I print the values of mag, I get the output shown in the linked image.
I have no idea how to make sense of this output.
My end objective is to use optical flow to get a resultant or an average motion value for every frame.
Quoting the same OpenCV tutorial you are using:
We get a 2-channel array with optical flow vectors, (u,v).
That is the output of the dense optical flow. Basically, it tells you how each point moved, expressed as a vector. (u,v) is just the Cartesian representation of a vector, and it can be converted to polar coordinates, i.e. an angle and a magnitude.
The angle is the direction in which the pixel moved, and the magnitude is the distance the pixel moved.
In many algorithms you can use the magnitude to decide whether a pixel moved at all (for example, a magnitude of less than 1 means no movement). Or, if you are tracking an object whose initial position you know (i.e. the pixel positions of the object), you can find where the majority of its pixels are moving to and use that information to determine the new position.
By the way, cartToPolar returns the angles in radians unless specified otherwise. Here is an extract from the documentation:
cv2.cartToPolar(x, y[, magnitude[, angle[, angleInDegrees]]]) → magnitude, angle
angleInDegrees must be True if you need it in degrees.
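For your end objective (an average motion value per frame), a rough sketch along the lines of the tutorial (the video path is a placeholder and the Farneback parameters are just the tutorial's defaults):

import cv2

cap = cv2.VideoCapture("video.mp4")   # placeholder path
ret, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # dense flow: one (u, v) vector per pixel
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # average motion for this frame, in pixels per frame
    print("mean motion:", mag.mean())
    prev_gray = gray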

Rotating 1d FFT to get 2D FFT?

I have a blurry image with a sharp edge and I want to use the profile of that sharp edge to estimate the point spread function (PSF) of the imaging system (assuming that it is symmetric). The profile of the edge gives me the "edge spread function" (ESF) and the derivative of that gives me the "line spread function" (LSF). I am trying to follow these directions that I found in an old paper on how to convert from the LSF to the PSF:
"If we form the one-dimensional Fourier transform of the LSF and rotate the resulting curve about its vertical axis, the surface thus generated proves to be the two-dimensional fourier transform of the PSF. Hence it is merely necessary to take a two-dimensional inverse Fourier transform to obtain the PSF"
I can't seem to get this to work. The 2D FFT of a PSF-like function (for example a 2D Gaussian) has lots of alternating positive and negative values, but if I rotate a 1D FFT, I get concentric rings of positive or negative values, and the inverse transform looks nothing like a point spread function. Am I missing a step or misunderstanding something? Any help would be appreciated! Thanks!
Edit: Here is some code showing my attempt to follow the procedure described
;generate x array
x=findgen(1000)/999*50-25
;generate gaussian test function in 1D
;P[0] = peak value
;P[1] = centroid
;P[2] = sigma
;P[3] = base level
P=[1.0,0.0,4.0,0.0]
test1d=gaussian_1d(x,P)
;Take the FFT of the test function
fft1d=fft(test1d)
;create an array with the frequency values for the FFT array, following the conventions used by IDL
;This piece of code to find freq is straight from IDL documentation: http://www.exelisvis.com/docs/FFT.html
N=n_elements(fft1d)
T=x[1]-x[0] ;T = sampling interval
fftx=(findgen((N-1)/2)+1)
is_N_even=(N MOD 2) EQ 0
if (is_N_even) then $
freq=[0.0,fftx,N/2,-N/2+fftx]/(N*T) $
else $
freq=[0.0,fftx,-(N/2+1)+fftx]/(N*T)
;Create a 1000x1000 array where each element holds the distance from the center
dim=1000
center=[(dim-1)/2.0,(dim-1)/2.0]
xarray=cmreplicate(findgen(dim),dim)
yarray=transpose(cmreplicate(findgen(dim),dim))
rarray=sqrt((xarray-center[0])^2+(yarray-center[1])^2)
rarray=rarray/max(rarray)*max(freq) ;scale rarray so max value is equal to highest freq in 1D FFT
;rotate the 1d FFT about zero to get a 2d array by interpolating the 1D function to the frequency values in the 2d array
fft2d=rarray*0.0
fft2d(findgen(n_elements(rarray)))=interpol(fft1d,freq,rarray(findgen(n_elements(rarray))))
;Take the inverse fourier transform of the 2d array
psf=fft(fft2d,/inverse)
;shift the PSF to be centered in the image
psf=shift(psf,500,500)
window,0,xsize=1000,ysize=1000
tvscl,abs(psf) ;visualize the absolute value of the result from the inverse 2d FFT
I don't know IDL, but I think your problem here is that you're taking the FFT of signals that are centered, where by default the function expects 0-frequency components at the beginning of the array.
A quick search for the proper way to do this in IDL indicates the CENTER keyword is what you're looking for.
CENTER
Set this keyword to shift the zero-frequency component to the center of the spectrum. In the forward direction, the resulting Fourier transform has the zero-frequency component shifted to the center of the array. In the reverse direction, the input is assumed to be a centered Fourier transform, and the coefficients are shifted back before performing the inverse transform.
Without letting the FFT routine know where the center of your signal is, the signal appears shifted by N/2. In the transform domain that shift becomes a strong linear phase, which makes the values appear to alternate between positive and negative.
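To see the effect outside IDL, here is a small numpy sketch (not the poster's code): a bell curve centred in the array produces an alternating-sign spectrum, while shifting its peak to index 0 before the transform gives the smooth spectrum you would expect.

import numpy as np

x = np.linspace(-25, 25, 1000)
g = np.exp(-x**2 / (2 * 4.0**2))             # Gaussian centred in the array

raw = np.fft.fft(g)                          # peak sits near index 500, not index 0
recentred = np.fft.fft(np.fft.ifftshift(g))  # move the peak to index 0 first

print(np.real(raw[:6]))         # signs alternate: the N/2 shift shows up as phase
print(np.real(recentred[:6]))   # smooth, essentially the Gaussian's transform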
Ok, looks like I have solved the problem. The main issue seems to be that I needed to use the absolute value of the FFT results, rather than the complex array that is returned by default. Using the /CENTER keyword also helped make the indexing of the FFT result much simpler than IDL's default. Here is the working version of the code:
;generate x array
x=findgen(1000)/999*50-25
;generate lorentzian test function in 1D
;P[0] = peak value
;P[1] = centroid
;P[2] = fwhm
;P[3] = base level
P=[1.0,0.0,2,0.0]
test1d=lorentzian_1d(x,P)
;Take the FFT of the test function
fft1d=abs(fft(test1d,/center))
;Create an array of frequencies corresponding to the FFT result
N=n_elements(fft1d)
T=x[1]-x[0] ;T = sampling interval
freq=findgen(N)/(N*T)-N/(2*N*T)
;Create an array where each element holds the distance from the center
dim=1000
center=[(dim-1)/2.0,(dim-1)/2.0]
xarray=cmreplicate(findgen(dim),dim)
yarray=transpose(cmreplicate(findgen(dim),dim))
rarray=sqrt((xarray-center[0])^2+(yarray-center[1])^2)
rarray=rarray/max(rarray)*max(freq) ;scale rarray so max value is equal to highest freq in 1D FFT
;rotate the 1d FFT about zero to get a 2d array by interpolating the 1D function to the frequency values in the 2d array
fft2d=rarray*0.0
fft2d(findgen(n_elements(rarray)))=interpol(fft1d,freq,rarray(findgen(n_elements(rarray))))
;Take the inverse fourier transform of the 2d array
psf=abs(fft(fft2d,/inverse,/center))
;shift the PSF to be centered in the image
psf=shift(psf,dim/2.0,dim/2.0)
psf=psf/max(psf)
window,0,xsize=1000,ysize=1000
tvscl,real_part(psf) ;visualize the resulting PSF
;Test the performance by integrating the PSF in one dimension to recover the LSF
psftotal=total(psf,1)
plot,x*sqrt(2),psftotal/max(psftotal),thick=2,linestyle=2
oplot,x,test1d
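For readers who do not use IDL, here is a rough numpy translation of the same procedure (a sketch under the same assumptions, using np.fft.fftshift/ifftshift in place of the /CENTER keyword; the Lorentzian parameters mirror the ones above):

import numpy as np

n = 1000
x = np.linspace(-25, 25, n)
T = x[1] - x[0]                                    # sampling interval

lsf = 1.0 / (1.0 + (x / 1.0)**2)                   # Lorentzian LSF: peak 1, FWHM 2
fft1d = np.abs(np.fft.fftshift(np.fft.fft(lsf)))   # centred magnitude spectrum
freq = np.fft.fftshift(np.fft.fftfreq(n, d=T))     # matching centred frequency axis

# distance of every pixel of an n x n grid from the centre, scaled to the 1D range
yy, xx = np.indices((n, n)) - (n - 1) / 2.0
r = np.sqrt(xx**2 + yy**2)
r = r / r.max() * freq.max()

# "rotate" the 1D spectrum about its vertical axis by radial interpolation
fft2d = np.interp(r, freq, fft1d)

# inverse 2D transform of the centred spectrum, then re-centre and normalise
psf = np.abs(np.fft.ifft2(np.fft.ifftshift(fft2d)))
psf = np.fft.fftshift(psf)
psf /= psf.max()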

Optical flow and focus of expansion in OpenCV

I am currently working on a project that requires me to find the focus of expansion using optical flow.
I currently have the optical flow and am using the formula from pages 13-14 of this paper:
http://www.dgp.toronto.edu/~donovan/stabilization/opticalflow.pdf
I take two frames from a video and find pyramids from both using buildOpticalFlowPyramid then find the keypoints using goodFeaturesToTrack. Using these I then calculate the sparse optical flow with calcOpticalFlowPyrLK. All three of these methods are provided by OpenCV.
The problem I have hit is that I need both the pixel location and the flow vector for each keypoint in the image to fill the A and b matrices. Would the pixel location be just the location of the keypoint in the original image? And is the flow vector then the difference between the initial location and the new point?
Yes, that is precisely so. Using the terms/variables as per the paper and the following link,
http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html#calcopticalflowpyrlk
p_i = (x,y) are the prevPts (points in the original image),
v = (u,v) are the flow vectors obtained by subtracting points in prevPts from those in nextPts.
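A minimal Python sketch of that bookkeeping (the frame file names are placeholders and the feature parameters are illustrative):

import cv2

# any two consecutive grayscale frames will do
prev_gray = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
next_gray = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# p_i: keypoints in the first frame
prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                   qualityLevel=0.01, minDistance=7)

# track them into the second frame
next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, prev_pts, None)

# keep only the points that were tracked successfully
good_prev = prev_pts[status.ravel() == 1].reshape(-1, 2)
good_next = next_pts[status.ravel() == 1].reshape(-1, 2)

flow_vectors = good_next - good_prev   # v = (u, v) for each keypoint p_i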

Distance to the object using stereo camera

Is there a way to calculate the distance to specific object using stereo camera?
Is there an equation or something to get distance using disparity or angle?
NOTE: Everything described here can be found in the Learning OpenCV book in the chapters on camera calibration and stereo vision. You should read these chapters to get a better understanding of the steps below.
One approach that does not require you to measure all the camera intrinsics and extrinsics yourself is to use OpenCV's calibration functions. The camera intrinsics (lens distortion/skew etc.) can be calculated with cv::calibrateCamera, while the extrinsics (the relation between the left and right camera) can be calculated with cv::stereoCalibrate. These functions take a number of points in pixel coordinates and try to map them to real-world object coordinates. OpenCV has a neat way to get such points: print out a black-and-white chessboard and use the cv::findChessboardCorners/cv::cornerSubPix functions to extract them. Around 10-15 image pairs of the chessboard should do.
The matrices calculated by the calibration functions can be saved to disk so you don't have to repeat this process every time you start your application. You get some neat matrices here that allow you to create a rectification map (cv::stereoRectify/cv::initUndistortRectifyMap), which can later be applied to your images using cv::remap. You also get a neat matrix called Q, which is a disparity-to-depth matrix.
The reason to rectify your images is that once the process is complete for a pair of images (assuming your calibration is correct), every pixel/object in one image can be found on the same row in the other image.
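A condensed Python sketch of that calibration and rectification pipeline (chessboard size, file patterns and flags are placeholders; the cv::cornerSubPix refinement is omitted for brevity):

import cv2
import numpy as np
import glob

pattern = (9, 6)                                   # inner corners of the printed chessboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_pts, left_pts, right_pts = [], [], []
for fl, fr in zip(sorted(glob.glob("left_*.png")), sorted(glob.glob("right_*.png"))):
    imgL = cv2.imread(fl, cv2.IMREAD_GRAYSCALE)
    imgR = cv2.imread(fr, cv2.IMREAD_GRAYSCALE)
    okL, cornersL = cv2.findChessboardCorners(imgL, pattern)
    okR, cornersR = cv2.findChessboardCorners(imgR, pattern)
    if okL and okR:
        obj_pts.append(objp)
        left_pts.append(cornersL)
        right_pts.append(cornersR)

size = imgL.shape[::-1]   # (width, height)

# intrinsics per camera, then the extrinsics relating the two cameras
_, M1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
_, M2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
_, M1, d1, M2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, M1, d1, M2, d2, size,
    flags=cv2.CALIB_FIX_INTRINSIC)

# rectification transforms and the disparity-to-depth matrix Q
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(M1, d1, M2, d2, size, R, T)
map1L, map2L = cv2.initUndistortRectifyMap(M1, d1, R1, P1, size, cv2.CV_32FC1)
map1R, map2R = cv2.initUndistortRectifyMap(M2, d2, R2, P2, size, cv2.CV_32FC1)

# apply the maps to a new image pair
rectL = cv2.remap(imgL, map1L, map2L, cv2.INTER_LINEAR)
rectR = cv2.remap(imgR, map1R, map2R, cv2.INTER_LINEAR)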
There are a few ways you can go from here, depending on what kind of features you are looking for in the image. One way is to use OpenCV's stereo correspondence functions, such as Stereo Block Matching or Semi-Global Block Matching. This will give you a disparity map for the entire image, which can be transformed to 3D points using the Q matrix (cv::reprojectImageTo3D).
The downside of this is that unless there is a lot of texture information in the image, OpenCV isn't really very good at building a dense disparity map (you will get gaps in it where it couldn't find the correct disparity for a given pixel), so another approach is to find the points you want to match yourself. Say you find the feature/object at x=40, y=110 in the left image and at x=22 in the right image (since the images are rectified, they should have the same y-value). The disparity is calculated as d = 40 - 22 = 18.
Construct a cv::Point3f(x,y,d), in our case (40,110,18). Find other interesting points the same way, then send all of the points to cv::perspectiveTransform with the Q matrix as the transformation matrix (essentially this function is cv::reprojectImageTo3D, but for sparse disparity maps), and the output will be points in an XYZ coordinate system with the left camera at the center.
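For the sparse case, the worked example above looks roughly like this in Python (Q is the disparity-to-depth matrix from the rectification step; the file name is a placeholder):

import cv2
import numpy as np

Q = np.load("Q.npy")   # disparity-to-depth matrix saved after cv2.stereoRectify

# (x, y, disparity) triplets: the worked example above, plus any other matches you find
points = np.float32([[40, 110, 18]]).reshape(-1, 1, 3)

xyz = cv2.perspectiveTransform(points, Q)   # 3D points with the left camera at the origin
print(xyz)                                  # units follow those used for the chessboard points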
I am still working on it, so I will not post entire source code yet. But I will give you a conceptual solution.
You will need the following data as input (for both cameras):
camera position
camera point of interest (point at which camera is looking)
camera resolution (horizontal and vertical)
camera field of view angles (horizontal and vertical)
You can measure the last one yourself by placing the camera on a piece of paper, drawing two lines, and measuring the angle between them.
Cameras do not have to be aligned in any way, you only need to be able to see your object in both cameras.
Now calculate a vector from each camera to your object. You have (X,Y) pixel coordinates of the object from each camera, and you need to calculate a vector (X,Y,Z). Note that in the simple case, where the object is seen right in the middle of the camera, the solution would simply be (camera.PointOfInterest - camera.Position).
Once you have both vectors pointing at your target, the lines defined by these vectors should cross at a single point in an ideal world. In the real world they will not, because of small measurement errors and the limited resolution of the cameras. So use the link below to calculate the distance vector between the two lines.
Distance between two lines
In that link: P0 is your first camera position, Q0 is your second camera position, and u and v are vectors starting at the camera positions and pointing at your target.
You are not interested in the actual distance they calculate. You need the vector Wc; we can assume that the object sits at the midpoint of Wc. Once you have the position of your object in 3D space, you can also compute whatever distance you like.
I will post the entire source code soon.
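In the meantime, here is a minimal Python sketch of that closest-point computation, using the same notation as the linked page (P0, Q0 are the camera positions, u and v the direction vectors toward the target):

import numpy as np

def object_position(P0, u, Q0, v):
    # midpoint of the shortest segment Wc between the lines P0 + s*u and Q0 + t*v
    P0, u, Q0, v = [np.asarray(p, dtype=float) for p in (P0, u, Q0, v)]
    w0 = P0 - Q0
    a, b, c = u @ u, u @ v, v @ v
    d, e = u @ w0, v @ w0
    denom = a * c - b * b           # near zero means the lines are almost parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return (P0 + s * u + Q0 + t * v) / 2.0   # assumed object position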
I have source code that detects a human face and returns not only the depth but also the real-world coordinates, with the left camera (or the right camera, I can't remember which) as the origin. It is adapted from the source code of "Learning OpenCV" and from some websites I referred to in order to get it working. The result is generally quite accurate.

How to extract velocity vectors from dense optical flow?

Problem: I'm trying to align two frames of a moving video.
I'm currently trying to use the function "cvCalcOpticalFlowLK", which outputs the x and y velocity vectors in the form of a "CvArr".
So I obtained the result, but I'm not sure how to use these vector arrays.
My question is this: how do I know the velocity of each pixel? Is it just the value stored at that pixel in the output arrays?
Note: I would have used the other optical flow functions such as cvCalcOpticalFlowPyrLK(), as they are much easier, but I want dense optical flow.
Apparently my original assumption was true. The "velx" and "vely" outputs from the optical flow function are the actual velocities for each pixel. To extract them, I accessed the pixel values in the raw data. There are two ways to do this.
cvGet2D() -- this way is slower, but if you only need to access one pixel it's fine.
or
((float*)(velx->imageData + y*velx->widthStep))[x];
(velx and vely are the 32-bit floating-point, single-channel IplImages produced by the optical flow function; y and x are the row and column of the pixel. Note that the row pointer must be cast to float*, not uchar*, because the velocity images are 32F.)
If you need the motion vectors for each pixel, then you need to compute what's called 'dense optical flow'. Starting from OpenCV 2.1, there is a function to do exactly that: calcOpticalFlowFarneback.
See the link below:
http://opencv.itseez.com/modules/video/doc/motion_analysis_and_object_tracking.html?highlight=calcopticalflowfarneback#cv2.calcOpticalFlowFarneback
velx and vely are the optical flow, not the actual velocity.
The method you used is obsolete. Use calcOpticalFlowFarneback() instead.
