I am working on a project to detect moving objects from a moving camera using optical flow. To find the real motion of any moving object I need to compensate for the ego-motion of the camera. Can anybody suggest a simple way to do so? I use OpenCV with C and C++ for my project.
Hi, actually, if you use optical flow you do not necessarily need to compensate for the ego-motion. It is possible to create long-term trajectories and cluster them. Look at the publications on LDOF or MORLOF. But if you want to compensate for the ego-motion, then:
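detect points to track using GFT (good features to track) or simply a point grid
compute the motion vectors via Lucas-Kanade or another local optical flow method
compute the affine or perspective transformation matrix from the motion vectors. Note that cv::getAffineTransform and cv::getPerspectiveTransform accept exactly three and four point pairs respectively, so with many noisy vectors a RANSAC-based estimator such as cv::estimateAffine2D or cv::findHomography is the better fit
compensate the ego-motion with the transformation matrix by using cv::warpAffine or cv::warpPerspective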
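As a minimal sketch of what the compensation step amounts to for sparse motion vectors (rather than warping whole frames), the code below subtracts the motion predicted by a global affine model from each tracked point's observed motion. It is plain Python with no OpenCV calls; the 2x3 matrix A, the sample points and the 2-pixel threshold are hypothetical values chosen for illustration, and in practice A would come from the estimation step.

```python
import math

def apply_affine(A, p):
    """Map point p = (x, y) with a 2x3 affine matrix A (nested lists)."""
    x, y = p
    return (A[0][0] * x + A[0][1] * y + A[0][2],
            A[1][0] * x + A[1][1] * y + A[1][2])

def residual_motion(A, prev_pts, next_pts):
    """Per-point motion left over once the camera model A is subtracted."""
    residuals = []
    for p, q in zip(prev_pts, next_pts):
        px, py = apply_affine(A, p)  # where p would land if it were static
        residuals.append((q[0] - px, q[1] - py))
    return residuals

def moving_points(A, prev_pts, next_pts, threshold=2.0):
    """Indices of points whose residual motion exceeds `threshold` pixels."""
    res = residual_motion(A, prev_pts, next_pts)
    return [i for i, (dx, dy) in enumerate(res)
            if math.hypot(dx, dy) > threshold]

# Hypothetical data: the camera shifts the whole image by (5, -3) pixels,
# and the last point additionally moves by (10, 0) on its own.
A = [[1.0, 0.0, 5.0],
     [0.0, 1.0, -3.0]]
prev_pts = [(10.0, 10.0), (50.0, 20.0), (30.0, 40.0)]
next_pts = [(15.0, 7.0), (55.0, 17.0), (45.0, 37.0)]
movers = moving_points(A, prev_pts, next_pts)
```

Points that follow the camera model end up with a residual close to zero, so only independently moving points survive the threshold.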
How is it possible to determine an object's 3D position using one camera and OpenCV when the camera is kept at (say) 45 degrees with respect to the ground?
Two types of motion can be applied to a camera in the 3D world: translation and rotation. It is not possible to infer depth from a single camera if there is no translation. You should look into stereo vision for the details.
Put simply, you need to recover the essential matrix E = [t]_x R, where [t]_x is the skew-symmetric cross-product matrix of the translation t and R is the rotation. If t = 0, as with a single camera that does not translate, E vanishes and there is no way to recover depth with classical stereo vision.
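To make the degenerate case explicit, the cross-product matrix can be written out. The component names t_1, t_2, t_3 are notation assumed here, not taken from the answer above:

```latex
E = [t]_\times R, \qquad
[t]_\times =
\begin{pmatrix}
0 & -t_3 & t_2 \\
t_3 & 0 & -t_1 \\
-t_2 & t_1 & 0
\end{pmatrix},
\qquad
t = 0 \;\Rightarrow\; [t]_\times = 0 \;\Rightarrow\; E = 0 .
```

With E identically zero, the epipolar constraint holds for every pair of points and carries no depth information.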
However, there are some methods that use the depth of a training dataset to infer the depth of a test image. Please check this slide. They published their code for Matlab; however, you can easily implement it yourself.
If you want a more accurate result, you can use deep learning models to estimate the depth of the pixels in an input image. There are some open-source models available, such as this one. However, note that the bts model is trained on the KITTI dataset, which is recorded from an autonomous-vehicle perspective. To get better results, you need a dataset that is relevant to your application, and can then use a framework such as bts to train a depth estimation model on it. This model will give you a point cloud with (x,y,z) coordinates from a single image.
I am trying to track multiple people using a video camera. I do not want to use blob segmentation techniques.
What I want to do:
Perform background subtraction to obtain a mask isolating the people's motion.
Perform grid-based optical flow on those areas.
What would be my best bet?
I am struggling with the implementation. I have tried blob detection and also some sparse optical flow examples. Sparse flow didn't really do it for me, as I wasn't getting enough feature points from goodFeaturesToTrack(). I would like to end up with at least 20 trackable points per person, which is why I think a grid-based method would be better for me. I will use the resulting motion vectors to classify different people (possibly by clustering on magnitude and direction).
I am using OpenCV 3 with Python 3.5, but am still quite noobish in this field.
Would appreciate some guidance immensely!
For sparse optical flow (in OpenCV, the pyramidal Lucas-Kanade method) you do not have to use good-features-to-track to get the positions.
The calcOpticalFlowPyrLK function lets you estimate the motion at predefined positions, and you can supply these yourself.
So just initialize a grid of cv::Point2f yourself: create a list of points, set their positions to the grid points located on your blobs, and run calcOpticalFlowPyrLK().
The idea of the good-features-to-track method is that it gives you the points where the calcOpticalFlowPyrLK() result is more likely to be accurate, which is at image locations with corner-like, well-textured structures. But in my experience this does not always give the optimal feature point set. I prefer to use regular grids as feature point sets.
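A minimal sketch of building such a grid in Python, assuming the foreground mask is available as a 2D array indexed as mask[y][x]. The helper name grid_points_in_mask, the step size and the toy mask are hypothetical and only illustrate the idea:

```python
def grid_points_in_mask(mask, step):
    """Return (x, y) grid positions, `step` pixels apart, where mask is non-zero.

    `mask` is a 2D sequence indexed as mask[y][x], e.g. a foreground mask
    from background subtraction.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    points = []
    # Start half a step in so the grid does not hug the image border.
    for y in range(step // 2, height, step):
        for x in range(step // 2, width, step):
            if mask[y][x]:
                points.append((float(x), float(y)))
    return points

# Hypothetical 8x8 mask with one foreground blob: columns 2..5, rows 2..7.
mask = [[1 if 2 <= x <= 5 and 2 <= y <= 7 else 0 for x in range(8)]
        for y in range(8)]
points = grid_points_in_mask(mask, step=2)
```

With OpenCV's Python bindings, such a list is typically converted to a float32 NumPy array of shape (N, 1, 2) before it is passed to cv2.calcOpticalFlowPyrLK as the previous points. A smaller step gives more points per person at a higher tracking cost.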
I am currently working on estimating the pose of one camera with respect to another using OpenCV, in a setup where camera1 is fixed and camera2 is free to move. I know the intrinsics of both cameras. My pose estimation module uses epipolar geometry, computing the essential matrix with the five-point algorithm to find the R and t of camera2 with respect to camera1, but I would like to get the metric translation. To help achieve this, I have two GPS modules, one on camera1 and one on camera2. For now, assume that camera1's GPS is flawless and accurate while camera2's GPS exhibits some XY noise; I need a way to use the OpenCV pose estimate on top of this noisy GPS to get an accurate final translation.
Given that info, my question has two parts:
Because the extrinsics between the cameras keep changing, would it be possible to use bundle adjustment to refine my pose?
And can I somehow incorporate my (noisy) GPS measurements in a bundle adjustment framework as an initial estimate, and obtain a more accurate estimate of metric translation as my end result?
1) No, bundle adjustment serves a different purpose, and you would not be able to work with it anyway because you would have an unknown scale for every pair you process with the five-point algorithm. You should instead use a perspective-n-point algorithm after the first pair of images.
2) Yes, it's called sensor fusion and you need to first calibrate (or know) the transformation between your GPS sensor coordinates and your camera coordinates. There is an open source framework you can use.
I am doing a project on 3D rendering of a scene. I am using OpenCV. The steps I am doing are like this:
Taking two images of a scene.
Calculating object correspondence using SURF feature matching.
Calculating the fundamental matrix.
Calculating the disparity image.
Now I have two questions:
After calculating the fundamental matrix, how can I calculate the Q matrix? (I can't calibrate the camera)
How can I render in 3D using OpenCV or any other library?
For the 3D part, you can render your scene with OpenGL or with PCL. You have two options:
For each pixel, you make a point with the right color extracted from the camera's image. This will give you a point cloud which can be processed with PCL (for 3D features extraction for example).
You apply a triangulation algorithm, but in order to apply this algorithm you must have the extrinsic matrices of your camera.
You can find more information about these techniques here:
Point Cloud technique
Triangulation algorithm
If you want to use OpenGL, you have to open a valid OpenGL context. I recommend the SFML library or Qt. These libraries are very easy to use and have good documentation. Both have tutorials about 3D rendering with OpenGL.
You can get the Q matrix from stereo rectification via the OpenCV function:
cv::stereoRectify
I think you want the Q matrix to reconstruct the 3D scene. However, you can also reconstruct it from the intrinsic parameters via:
X = (u-cu)*base/d
Y = (v-cv)*base/d
Z = f*base/d
where (u,v) is a 2D point in the image coordinate system, (cu,cv) is the principal point of the camera, f is the focal length, base is the baseline, d is the disparity and (X,Y,Z) is a 3D point in the camera coordinate system.
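The three formulas translate directly into a small function. This is a sketch in plain Python: the name reproject, the rejection of non-positive disparities and the sample camera values are assumptions added for illustration.

```python
def reproject(u, v, d, f, cu, cv, base):
    """Back-project pixel (u, v) with disparity d into camera coordinates.

    f, cu, cv and d are in pixels; base is the baseline and sets the unit
    of the result. Returns None where the disparity is not positive,
    i.e. where there is no valid match.
    """
    if d <= 0:
        return None
    X = (u - cu) * base / d
    Y = (v - cv) * base / d
    Z = f * base / d
    return (X, Y, Z)

# Hypothetical rig: f = 700 px, principal point (320, 240), baseline 0.1 m.
point = reproject(u=420, v=240, d=35.0, f=700.0, cu=320.0, cv=240.0, base=0.1)
```

Applying this to every pixel of the disparity image, and attaching the colour from the camera image, gives the point cloud to visualize.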
For the visualization, it is possible to use PCL or VTK (PCL's visualization is based on VTK, but I find PCL simpler to use).
If you just want to have a look at the output, you can use software such as MeshLab.
Cheers
I need to find a marker like the ones used in Augmented Reality.
Like this:
I have a solid background in algebra and calculus, but no experience whatsoever in image processing. My thing is PHP, SQL and stuff.
I just want this to work. I've read the theory behind this, and it's extremely hard for me to see how it maps to code.
The main idea is to do this as a batch process, so no interactivity is needed. What do you suggest?
Input : The sample image.
Output: Coordinates and normal vector in 3D of the marker.
The use for this will be linking images that have the same marker to spatialize them, a primitive version of photosync, we could say. Just a carousel of pinned images, with the marker acting as the pin.
The reps given allowed me to post images, thanks.
You can always look at open source libraries such as ARToolkit and see how they work, but generally, in order to get the 3D coordinates of the marker, you would need to:
Do the camera calibration.
Find the marker in the image, using local features for example.
Using the calibrated camera parameters and the 2D coordinates of the marker, estimate its 3D coordinates.
I've never implemented something similar myself, but I think this is the general concept you should apply in your method.
Your problem can be solved by perspective n point camera pose estimation. When you can reasonably assume that all correspondences are correct, a linear algorithm should do.
Since the marker is planar, you can also recover the displacement from the homography between the model plane and the image plane (link). As usual, best results are obtained by iterative algorithms (link).
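Pose solvers of this kind, for example OpenCV's solvePnP, return the pose as a rotation vector and a translation vector. The sketch below shows how the requested output (marker position and normal) could be read off such a pose in plain Python. It assumes the marker model lies in its own z = 0 plane with the origin at its centre; the function names and the sample pose are hypothetical.

```python
import math

def rodrigues(rvec):
    """Convert an axis-angle rotation vector into a 3x3 rotation matrix."""
    theta = math.sqrt(sum(comp * comp for comp in rvec))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (comp / theta for comp in rvec)  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v,      kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v,      ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def marker_pose(rvec, tvec):
    """Marker centre and unit normal in camera coordinates.

    Assumes the marker model lies in its own z = 0 plane with the origin
    at its centre, so the normal is the rotated z-axis (third column of R).
    """
    R = rodrigues(rvec)
    normal = (R[0][2], R[1][2], R[2][2])
    return tuple(tvec), normal

# Hypothetical pose: marker 2 units in front of the camera, rotated
# 90 degrees about the camera's x-axis.
centre, normal = marker_pose((math.pi / 2, 0.0, 0.0), (0.0, 0.0, 2.0))
```

Since this runs per image with no user interaction, it fits the batch-processing requirement; the centre and normal can then be stored alongside each image.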