Is it necessary to calibrate the camera if I were to implement a natural marker tracker?
Actually I don't quite get the idea of camera calibration although I have read that it is required for augmenting 3d/2d objects onto the image feed.
Camera-calibration means finding intrinsic parameters of the camera. It is necessary, of course, if you want to detect natural features sucessfully, and here I explain why.
Then you only have to look for extrinsic parameters. You only have to do calibration once, as the camera is always the same (considering you cannot zoom in/out, change focal length, etc). Without camera calibration you will have many problems for the natural features tracking task, as it is more challenging than fiducials tracking.
In the link I passed you, you will also find how to calculate the pose from a planar marker. It is theoretical, but you can find a lot of code in the web. If you need more help tell me, I can explain in more detail if necessary.
Strictly speaking, you could detect features, do pattern matching to recognize the marker and then track those features without camera calibration. Calibration allows us to determine both intrinsic (e.g. distortion coefficients) and extrinsic (e.g. rotation) camera parameters, which are required when someone is to determine marker boundaries or perform 3D pose estimation.
Is it necessary? No.
Is it useful? You bet. The rule of thumb is ALWAYS, if you can perform camera calibration for your stationary camera, do it.
You can do many things with such information: remove distortion, get distance in some type of metric space, ... Most trackers have an underlying assumption/models, these models are best fit when the data is in a space where the model makes sense. Camera calibration is one easy way to achieve this.
Related
I have a multi-camera system where the field of views are mostly non-overlapping. I have been researching on methods to calibrate the camera extrinsics and the first thing I'm going to try is to take a picture of a chessboard at a known location and use solvePnP from OpenCV to find the extrinsic rotation and translation vectors for each camera separately (following the method described in the answer here).
My problem is, this method uses only one measurement and as every measurement it is prone to errors. I assume that by taking multiple measurements, either by changing the position or the orientation of the chessboard, the accuracy can be improved. But what would be the best way to combine the rotation and translation obtained from the different measurements? A simple average?
In theory I would think that an option could be using solvePnP on all the points at the same time. Since I am calculating extrinsics the camera can't be moved so I would have to change to position and/or orientation of the board for each picture and measure the 3D points positions as accurately as possible each time.
I'm also wondering if using two chessboards in the same picture would be a possible solution, even if OpenCV doesn't seem to support multiple chessboard detection.
Is there a better way to measure extrinsics or anything that I'm missing?
The purpose of calibration is to calibrated distortion the image.
What's main source of this distortion in the image when the lens is used, for example fish-eyes lens?
Q1-You think we are going to identify some of the objects and using fish-eyes lenses in order to cover a wide view of the environment, Do we need to calibrate the camera? That is, we must correct the image distortions and then identify the objects? Does the corrected image still cover the same amount of objects? If it's not cover all objects of distorted image, then what is the point of taking a wide-angle lens? Wouldn't it be better to use the same flat lens without having to calibrate the camera?
Q2-For calculating the distortion param like intrinsic and extrinsic param and etc, Is need to calculate parameters for all of camera with same specifics independently? That's, the finding parameters of distortion for one camera can be correctly work with other camera with same specifics?
Q1 Answer : You need to dewarp the image/video that comes out of the camera. There are some libraries that do it for you. You can also calibrate the dewarping according to your needs.
When dewarping the fisheye input, the corners of the video feed are a little lost. This won't be a huge loss.
Q2 Answer : Usually you don't have to do a different dewarping configuration based on your camera. But if you want to finetune it, there are parameters for it.
FFmpeg has lens correction filter, the parameters to finetune are also present in the link.
I am trying to reconstruct the real-world coordinates of 3D points from two images taken from the same camera. The camera is not calibrated, but the movement (translation and rotation) is known. In short:
Requirement:
No calibration
Extra constraints other than image point correspondences:
Known camera translation and rotation
Same camera used in all views
I understand that, from image point correspondences alone, a scene can be reconstructed only up to a projective transformation. With more constraints, an affine or similarity reconstruction may be done. In my case, I need a similarity reconstruction.
Given the above constraints, is a similarity reconstruction possible? If possible, how should I go about doing it?
I have tried to attack the problem from a few angles. Since I am not mathematically fluent, I try to use opencv as much as possible.
findFundamentalMat() from the two images, hopefully extract the two camera matrices somehow, then triangulatePoints(). As you could have guessed, I got stuck in the middle, unable to obtain camera matrices from fundamental matrix.
The textbook "Multiple View Geometry in Computer Vision" (by Hartley and Zisserman) gives an expression (p.256, Result 9.14) that expresses the camera matrices in terms of fundamental matrix and one of the epipoles. However, without knowing the camera's intrinsic parameters (requirement: no calibration), I don't see how I can get the epipole.
I also try to treat my problem as a stereo system and use opencv's stereo*** functions. But they all seem to require human intervention to calibrate, which violates my requirement.
So, that's why I ask the question here today. The key is still, given those extra constraints, is a similarity reconstruction possible? I am not smart enough to understand the wealth of knowledge out there, and not able to come up with my own solution. Any help is appreciated.
I am a beginner when it comes to computer vision so I apologize in advance. Basically, the idea I am trying to code is that given two cameras that can simulate a multiple baseline stereo system; I am trying to estimate the pose of one camera given the other.
Looking at the same scene, I would incorporate some noise in the pose of the second camera, and given the clean image from camera 1, and slightly distorted/skewed image from camera 2, I would like to estimate the pose of camera 2 from this data as well as the known baseline between the cameras. I have been reading up about homography matrices and related implementation in opencv, but I am just trying to get some suggestions about possible approaches. Most of the applications of the homography matrix that I have seen talk about stitching or overlaying images, but here I am looking for a six degrees of freedom attitude of the camera from that.
It'd be great if someone can shed some light on these questions too: Can an approach used for this be extended to more than two cameras? And is it also possible for both the cameras to have some 'noise' in their pose, and yet recover the 6dof attitude at every instant?
Let's clear up your question first. I guess You are looking for the pose of the camera relative to another camera location. This is described by Homography only for pure camera rotations. For General motion that includes translation this is described by rotation and translation matrices. If the fields of view of the cameras overlap the task can be solved with structure from motion which still estimates only 5 dof. This means that translation is estimated up to scale. If there is a chessboard with known dimensions in the cameras' field of view you can easily solve for 6dof by running a PnP algorithm. Of course, cameras should be calibrated first. Finally, in 2008 Marc Pollefeys came up with an idea how to estimate 6 dof from two moving cameras with non-overlapping fields of view without using any chess boards. To give you more detail please tell a bit for the intended appljcation you are looking for.
I want to find the depth map for stereo images.At present i am working on the internet image,I want to take stereo images so that i can work on it by my own.How to take best stereo images without much noise.I have single camera.IS it necessary to do rectification?How much distance must be kept between the cameras?
Not sure I've understood your problem correclty - will try anyway
I guess your currently working with images from middlebury or something similar. If you want to use similar algorithms you have to rectify your images because they are based on the assumption that corresponding pixels are on the same line in all images. If you actually want depth images (!= disparity images) you also need to get the camera extrinsics.
Your setup should have two cameras and you have to make sure that they don't change there relative position/orientation - otherwise your rectification will break apart. In the first step you have to calibrate your system to get intrinsic and extrinsic camera parameters. For that you can either use some tool or roll your own with (for example) OpenCV (calib-module). Print out a calibration board to calibrate your system. Afterwards you can take images and use the calibration to rectify the images.
Regarding color-noise:
You could make your aperture very small and use high exposure times. In my own opinion this is useless because real world situations have to deal with such things anyway.
In short, there are plenty of stereo images on the internet that are already rectified. If you want to take your own stereo images you have to follow these three steps:
The relationship between distance to the object z (mm) and disparity in pixels D is inverse: z=fb/D, where f is focal length in pixels and b is camera separation in mm. Select b such that you have at least several pixels of disparity;
If you know camera intrinsic matrix and compensated for radial distortions you still have to rectify your images in order to ensure that matches are located in the same row. For this you need to find a fundamental matrix, recover essential matrix, apply rectifying homographies and update your intrinsic camera parameters... or use stereo pairs from the Internet.
The low level of noise in the camera image is helped by brightly illuminated scenes, large aperture, large pixel size, etc.; however, depending on your set up you still can end up with a very noisy disparity map. The way to reduce this noise is to trade-off with accuracy and use larger correlation windows. Another way to clean up a disparity map is to use various validation techniques such as
error validation;
uniqueness validation or back-and-force validation
blob-noise supression, etc.
In my experience:
-I did the rectification, so I had to obtain the fundamental matrix, and this may not be correct with some image pairs.
-Better resolution of your camera is better for the matching, I use OpenCV and it has an implementation of BRISK descriptor, it was useful for me.
-Try to cover the same area and try not to do unnecessary rotations.
-Once you understand the Theory, OpenCV is a good friend. Here is some result, but I am still working on it:
Depth map:
Rectified images: