I was wondering if Lucas-Kanade optical flow would be the best way to estimate the velocity of a baseball/cricket ball being hit. I want to get a 3D vector for the initial velocity of the ball to plug into a physics engine. Would I need two cameras, one for the x, y components and another for the x/y and z components, or can it be done with one?
I wouldn't rely on optical flow; it will be too noisy. It is much more reliable to take the ball's positions in two successive frames and difference them. For a 3D velocity vector you need two cameras, i.e. stereoscopy.
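A minimal sketch of that stereo approach in OpenCV C++ (my own illustration, not a tested implementation): it assumes you already have the 3x4 projection matrices P1 and P2 from a stereo calibration, plus the ball's pixel position in both views at two consecutive timestamps.

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>

// Triangulate one 3D point from its pixel positions in the two calibrated views.
static cv::Point3d triangulateBall(const cv::Mat& P1, const cv::Mat& P2,
                                   const cv::Point2d& uv1, const cv::Point2d& uv2)
{
    cv::Mat pts1 = (cv::Mat_<double>(2, 1) << uv1.x, uv1.y);
    cv::Mat pts2 = (cv::Mat_<double>(2, 1) << uv2.x, uv2.y);
    cv::Mat Xh;                                    // 4x1 homogeneous result
    cv::triangulatePoints(P1, P2, pts1, pts2, Xh);
    Xh.convertTo(Xh, CV_64F);
    Xh /= Xh.at<double>(3);                        // de-homogenize
    return { Xh.at<double>(0), Xh.at<double>(1), Xh.at<double>(2) };
}

// Velocity = (position at t1 - position at t0) * fps; units follow the calibration.
cv::Point3d ballVelocity(const cv::Mat& P1, const cv::Mat& P2,
                         const cv::Point2d& cam1_t0, const cv::Point2d& cam2_t0,
                         const cv::Point2d& cam1_t1, const cv::Point2d& cam2_t1,
                         double fps)
{
    cv::Point3d X0 = triangulateBall(P1, P2, cam1_t0, cam2_t0);
    cv::Point3d X1 = triangulateBall(P1, P2, cam1_t1, cam2_t1);
    return (X1 - X0) * fps;
}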
It's now standard practice to fuse the measurements from accelerometers and gyros through a Kalman filter, for applications like self-balancing two-wheel carts, for example: http://www.mouser.com/applications/sensor_solutions_mems/
The accelerometer gives a reading of the tilt angle through arctan(a_x/a_y). It's very confusing to use the term "acceleration" here, since what it really means is the projection of gravity along the device's axes (though I understand that, physically, gravity really is just acceleration).
Here is the big problem: when the cart tries to move, the motor drives the cart and creates a non-trivial acceleration in the horizontal direction. This makes a_x no longer just the projection of gravity along the device's x-axis; in fact, it makes the measured tilt angle appear larger. How is this handled? Given the maturity of the Segway, there must be some existing ways to handle it. Does anybody have some pointers?
Thanks,
Yang
You are absolutely right. You can estimate the pitch and roll angles from the projection of the gravity vector. You can obtain the gravity vector from a motionless accelerometer, but if the accelerometer moves, it measures gravity plus a linear acceleration component, and the main problem here is to separate the gravity component from the linear accelerations. The best way to do this is to pass the accelerometer signal through a low-pass filter.
Please refer to Low-Pass Filter: The Basics or Android Accelerometer: Low-Pass Filter Estimated Linear Acceleration to learn more about low-pass filters.
A sensor fusion algorithm should be interesting for you as well.
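For concreteness, here is a minimal sketch of such a split (my own illustration, not taken from the linked articles): a first-order exponential low-pass filter whose smoothed output is treated as gravity, with the remainder treated as linear acceleration. The smoothing factor alpha is an assumed value you would tune for your sample rate.

#include <array>
#include <cmath>

struct GravityFilter {
    double alpha = 0.9;                               // closer to 1.0 = heavier smoothing (assumed value)
    std::array<double, 3> gravity {0.0, 0.0, 0.0};

    // Feed one raw accelerometer sample; returns the estimated linear-acceleration part.
    std::array<double, 3> update(const std::array<double, 3>& accel) {
        std::array<double, 3> linear;
        for (int i = 0; i < 3; ++i) {
            gravity[i] = alpha * gravity[i] + (1.0 - alpha) * accel[i];  // low-pass = gravity estimate
            linear[i]  = accel[i] - gravity[i];                          // residual = linear acceleration
        }
        return linear;
    }

    // Tilt angle from the filtered gravity vector, as in the question: arctan(a_x / a_y).
    double tiltAngle() const { return std::atan2(gravity[0], gravity[1]); }
};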
I have two calibrated cameras looking at an overlapping scene. I am trying to estimate the pose of camera2 with respect to camera1 (because camera2 can be moving; but both camera1 and 2 will always have some features that are overlapping).
I am identifying features using SIFT, then computing the fundamental matrix and eventually the essential matrix. Once I solve for R and t (one of the four possible solutions), I obtain the translation only up to scale. Is it possible to somehow compute the translation in real-world units? There are no objects of known size in the scene, but I do have the calibration data for both cameras. I've gone through some material on structure from motion and stereo pose estimation, but the concept of scale and its relation to real-world translation is confusing me.
Thanks!
This is the classical scale problem with structure from motion.
The short answer is that you must have some other source of information in order to resolve scale.
This information can be about points in the scene (e.g. terrain map), or some sensor reading from the moving camera (IMU, GPS, etc.)
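As a rough sketch of what that looks like in practice (my own illustration, and it assumes the simpler case of shared intrinsics K and that an external sensor tells you the metric distance travelled between the two views): the unit-norm translation returned by cv::recoverPose is simply rescaled by that known distance.

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

void metricPose(const std::vector<cv::Point2f>& pts1,
                const std::vector<cv::Point2f>& pts2,
                const cv::Mat& K,              // shared intrinsics, for illustration only
                double knownDistanceMeters,    // e.g. from IMU/GPS between the two views
                cv::Mat& R, cv::Mat& t_metric)
{
    cv::Mat E = cv::findEssentialMat(pts1, pts2, K, cv::RANSAC, 0.999, 1.0);
    cv::Mat t_unit;
    cv::recoverPose(E, pts1, pts2, K, R, t_unit);   // ||t_unit|| == 1, scale is unknown
    t_metric = t_unit * knownDistanceMeters;        // scale resolved from the external source
}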
I need to get the u, v flow components so that I can compute an obstacle avoidance strategy for blind people.
I will divide the frame into two halves and sum up the flow components u + v in each of them. The avoidance strategy will be that the blind person moves away from the half that has the higher flow value.
The function calcOpticalFlowPyrLK in OpenCV returns the positions of the points in the new frame; however, I need the u and v components.
How can that be achieved? Also, is there a better avoidance strategy than this one using only an RGB camera?
As for getting the u and v components, I suggest a simple subtraction of the point coordinates before and after tracking. You can try to speed it up, for example, by putting all the points into two-channel matrices and subtracting the matrices from each other.
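A minimal sketch of that subtraction (my own illustration): track the points with calcOpticalFlowPyrLK and take the per-point difference between the new and old positions as (u, v).

#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

void flowComponents(const cv::Mat& prevGray, const cv::Mat& nextGray,
                    std::vector<cv::Point2f>& prevPts,
                    std::vector<cv::Point2f>& flow)   // flow[i] = (u, v) of point i
{
    std::vector<cv::Point2f> nextPts;
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, nextGray, prevPts, nextPts, status, err);

    flow.clear();
    for (size_t i = 0; i < prevPts.size(); ++i)
        if (status[i])
            flow.push_back(nextPts[i] - prevPts[i]);  // u = dx, v = dy
}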
As for a better way, here is an article I based my master's thesis on. There is a trick in it that uses the amount of optical flow for obstacle detection.
I am using the standard OpenCV functions to calibrate a camera for its intrinsic parameters. In order to obtain good results, I know we have to use images of the chessboard from different angles (considering different planes in 3D). This is stated in all the documentation and papers, but I really don't understand why it is so important for us to consider different planes, and whether there is an optimal number of planes to consider for the best calibration results.
I would be glad if you can point me to some paper or documentation that explains this. (I think Zhang's paper talks about it, but it is mathematically intensive and was hard to digest.)
Thanks
Mathematically, a unique solution for the intrinsic parameters (up to scale) is defined only if you have 3 or more distinct images of the planar target. See page 6 of Zhang's paper: "If n images of the model plane are observed, by stacking n such equations as (8) we have Vb = 0 ; (9) where V is a 2n×6 matrix. If n ≥ 3, we will have in general a unique solution b defined up to a scale factor..."
There isn't an "optimal" number of planes, where data are concerned, the more you have the merrier you are. But as the solution starts to converge, the marginal gain in calibration accuracy due to adding an extra image becomes negligible. Of course, this assumes that the images show planes well separated in both pose and location.
See also this other answer of mine for practical tips.
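To tie the n >= 3 requirement to the API, here is a minimal sketch (my own illustration, with assumed boardSize and squareSize values for your target) of calibrating from several chessboard views in different poses using cv::calibrateCamera:

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

double calibrateFromViews(const std::vector<cv::Mat>& grayViews,
                          cv::Size boardSize, float squareSize,
                          cv::Mat& K, cv::Mat& distCoeffs)
{
    // One set of 3D corner coordinates on the board plane (z = 0).
    std::vector<cv::Point3f> boardCorners;
    for (int r = 0; r < boardSize.height; ++r)
        for (int c = 0; c < boardSize.width; ++c)
            boardCorners.emplace_back(c * squareSize, r * squareSize, 0.0f);

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    for (const cv::Mat& view : grayViews) {
        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(view, boardSize, corners)) {
            objectPoints.push_back(boardCorners);
            imagePoints.push_back(corners);
        }
    }

    std::vector<cv::Mat> rvecs, tvecs;   // one board pose per accepted view
    return cv::calibrateCamera(objectPoints, imagePoints, grayViews[0].size(),
                               K, distCoeffs, rvecs, tvecs);  // returns RMS reprojection error
}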
If you're looking for a little intuition, here's an example of why one plane isn't enough. Imagine your calibration chessboard is tilted away from you at a 45° angle.
When you move up the chessboard by 1 meter in the +y direction, you also move away from the camera by 1 meter in the +z direction. The y and z movements are effectively tied to each other for all of our training points, so if we just look at points on this one plane, there's no way to tease apart the effects of y movement vs. z movement.
For example, from this one plane, we can't tell the difference between these scenarios:
The camera has perspective distortion such that things appear smaller in the image as they move in the world's +y direction.
The camera focal length is such that things appear smaller in the image as they move in the world's +z direction.
Any mixture of the effects in #1 and #2.
Mathematically, this ambiguity means that there are many equally possible solutions when OpenCV tries to fit a camera matrix to match the data. (Note that the 45° angle was not important. Any plane you choose will have the same problem: training examples' (x,y,z) dimensions are coupled together, so you can't separate their effects.)
One last note: if you make enough assumptions about the camera matrix (e.g. no perspective distortion, x and y scale identically, etc.), then you end up with fewer unknowns (in an extreme case, maybe you're just calculating the focal length), and in that case you could calibrate with just one plane.
I am doing a project to detect moving objects from a moving camera with optical flow. To detect the real motion of any moving object I need to compensate for the ego-motion of the camera. Can anybody suggest a simple way to do so? I use OpenCV C and C++ for my project.
Hi, actually if you use optical flow you do not necessarily need to compensate for the ego-motion. It is possible to create long-term trajectories and cluster them; look at the LDOF or MORLOF publications. But if you want to compensate for the ego-motion, then (see the sketch after this list):
detect points to track using GFTT (good features to track) or simply a point grid
compute motion vectors via Lucas-Kanade or another local optical flow method
compute the affine or perspective transformation matrix from the tracked correspondences; cv::getAffineTransform or cv::getPerspectiveTransform work for minimal point sets, while RANSAC-based estimators such as cv::findHomography are a good robust choice
compensate ego-motion with the transformation matrix by using cv::warpAffine or cv::warpPerspective
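Putting those four steps together, here is a minimal sketch (my own assembly, not the original answer's code; I use cv::findHomography with RANSAC as the robust perspective estimator) that warps the previous frame with the estimated ego-motion so the residual difference highlights independently moving objects.

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

cv::Mat compensateEgoMotion(const cv::Mat& prevGray, const cv::Mat& currGray)
{
    // 1) Points to track: good features (a regular point grid would also work).
    std::vector<cv::Point2f> prevPts, currPts;
    cv::goodFeaturesToTrack(prevGray, prevPts, 500, 0.01, 8);

    // 2) Motion vectors via pyramidal Lucas-Kanade.
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts, status, err);

    std::vector<cv::Point2f> src, dst;
    for (size_t i = 0; i < status.size(); ++i)
        if (status[i]) { src.push_back(prevPts[i]); dst.push_back(currPts[i]); }

    // 3) Perspective transform robust to outliers (RANSAC); the dominant motion
    //    is assumed to be the camera's ego-motion.
    cv::Mat H = cv::findHomography(src, dst, cv::RANSAC, 3.0);

    // 4) Warp the previous frame into the current frame's coordinates.
    cv::Mat stabilized;
    cv::warpPerspective(prevGray, stabilized, H, currGray.size());

    // The absolute difference now mostly contains independently moving objects.
    cv::Mat motion;
    cv::absdiff(stabilized, currGray, motion);
    return motion;
}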