I am using OpenCV and I am a newbie to the entire thing.
I have a scenario: I am projecting onto a wall, and I am building a kind of robot which has a camera. I want to know how I can process the image so that I can get the real-world coordinates of the blobs tracked by my camera.
First of all, you need to calibrate the intrinsics of the camera. Use checkerboard patterns printed on cardboard to do this; OpenCV has methods for this, although there are also ready-made tools.
To get an idea, I have written some Python code to calibrate from a live video stream: move the cardboard in front of the camera at different angles and distances. Take a look here: http://svn.ioctl.eu/pub/opencv/py-camera_intrinsic/
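For orientation, a minimal sketch of such an intrinsic calibration with the OpenCV Python bindings (the board size, square size and capture device below are placeholders, not values from the linked script):

```python
import cv2
import numpy as np

# Checkerboard with 9x6 inner corners; square size in whatever unit you want (e.g. mm).
pattern_size = (9, 6)
square_size = 25.0

# 3D coordinates of the board corners in the board's own frame (z = 0).
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
cap = cv2.VideoCapture(0)        # live stream; move the board to different angles and distances
while len(img_points) < 20:      # in practice, space the captures out rather than taking every frame
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1),
                                   (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01))
        obj_points.append(objp)
        img_points.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, gray.shape[::-1], None, None)
print("camera matrix:\n", K, "\ndistortion:", dist.ravel())
```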
Then you need to calibrate the extrinsics of the camera, that is, the pose of the camera with respect to your world coordinates. You can place some markers on the wall, define the 3D positions of those markers and let OpenCV calibrate the extrinsics from them (cvFindExtrinsicCameraParams2).
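cvFindExtrinsicCameraParams2 is the old C API; in the Python bindings the same step is cv2.solvePnP. A sketch, assuming you have measured the 3D positions of four or more wall markers and found their pixel locations (all numbers below are placeholders):

```python
import cv2
import numpy as np

# Known 3D positions of markers on the wall, in your world coordinates (e.g. mm),
# and the corresponding pixel positions detected in the camera image (placeholders).
world_pts = np.array([[0, 0, 0], [800, 0, 0], [800, 600, 0], [0, 600, 0]], dtype=np.float64)
image_pts = np.array([[102, 85], [540, 92], [531, 410], [96, 402]], dtype=np.float64)

# Intrinsics and distortion from the calibration step above (placeholder values here).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(world_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)        # 3x3 rotation; tvec is the translation
# A world point X maps into the camera frame as R @ X + tvec.
```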
In my sample code, I compute the extrinsics with respect to the checkerboard so I can render a teapot in the correct perspective of the camera. You will have to adjust this to your needs.
I assume you project only onto a flat surface, so you know the geometry needed to recover 3D coordinates for your detected blobs. Find the blobs in your camera image; then, knowing the intrinsics, the extrinsics and the surface geometry, cast a ray for each blob from the camera and compute its intersection with that geometry. The intersection is the 3D point in world space where the blob is projected.
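To make the ray-casting step concrete, a sketch assuming the wall is the plane z = 0 in world coordinates and K, R, tvec come from the steps above (lens distortion is ignored for brevity; undistort the pixel first if it matters):

```python
import numpy as np

def blob_to_wall(u, v, K, R, tvec):
    """Intersect the viewing ray through pixel (u, v) with the wall plane z = 0 (world frame)."""
    cam_center = -R.T @ tvec.reshape(3)                 # camera position in world coordinates
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction in the camera frame
    ray_world = R.T @ ray_cam                           # same direction in the world frame
    s = -cam_center[2] / ray_world[2]                   # solve (cam_center + s * ray_world).z == 0
    return cam_center + s * ray_world                   # 3D point of the blob on the wall
```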
There are a number of calibration tutorials to calibrate camera images of chessboards in EMGU (OpenCV). They all end up calibrating and then undistorting an image for display. That's cool and all but I need to do machine vision where I am taking an image, identifying the location of a corner or blob or feature in the image and then translating the location of that feature in pixels into real world X, Y coordinates.
Pixel -> mm.
Is this possible with EMGU? If so, how? I'd hate to spend a bunch of time learning EMGU and then not be able to do this crucial function.
Yes, it's certainly possible; this is the "bread and butter" of OpenCV.
The calibration you are describing, in terms of removing distortions, is a prerequisite to this process. After that, the following applies:
The Intrinsic calibration, or "camera matrix" is the first of two required matrices. The second is the Extrinsic calibration of the camera which is essentially the 6 DoF transform that describes the physical location of the sensor center relative to a coordinate reference frame.
The distortion coefficients, the intrinsic calibration and the extrinsic calibration are all available from a single function in Emgu.CV: CvInvoke.CalibrateCamera. This process is best explained, I'm sure, by one of the many tutorials you have described.
After that it's as simple as CvInvoke.ProjectPoints to apply the transforms above; note that it maps 3D coordinates to 2D pixel locations, so going from pixels back to world coordinates additionally needs known scene geometry (e.g. a plane) to invert the projection.
The key to doing this successfully is providing comprehensive IInputArray objectPoints and IInputArray imagePoints to CvInvoke.CalibrateCamera. Be sure to cause "excitation" by using many images from many different perspectives.
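The Emgu calls are thin wrappers over the corresponding OpenCV functions, so a Python sketch of the same round trip may help. The numbers below are placeholders, and the pixel-to-mm inversion assumes the points of interest lie on the calibration plane z = 0:

```python
import cv2
import numpy as np

# K, dist, rvec, tvec as obtained from cv2.calibrateCamera (Emgu: CvInvoke.CalibrateCamera).
K = np.array([[900.0, 0.0, 640.0], [0.0, 900.0, 360.0], [0.0, 0.0, 1.0]])   # placeholder values
dist = np.zeros(5)
rvec = np.array([0.1, -0.2, 0.05])
tvec = np.array([10.0, -5.0, 500.0])

# World (mm) -> pixel: cv2.projectPoints (Emgu: CvInvoke.ProjectPoints).
world_mm = np.array([[[25.0, 50.0, 0.0]]])
pixels, _ = cv2.projectPoints(world_mm, rvec, tvec, K, dist)

# Pixel -> mm is only defined on a known surface. For the calibration plane z = 0,
# the mapping is the homography H = K [r1 r2 t], which can be inverted.
R, _ = cv2.Rodrigues(rvec)
H = K @ np.column_stack((R[:, 0], R[:, 1], tvec))
uv1 = np.array([pixels[0, 0, 0], pixels[0, 0, 1], 1.0])
xy1 = np.linalg.inv(H) @ uv1
print(xy1[:2] / xy1[2])   # recovers approximately [25, 50] mm (distortion ignored on the way back)
```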
According to the documentation of stereoRectify in OpenCV, one can rectify an image given two camera matrices, their distortion coefficients, and a rotation-translation from one camera to the other.
I would like to rectify an image I took using my own camera to the stereo setup from the KITTI dataset. From their calibration files, I know the camera matrix and the pre-rectification image size for all the cameras. All their data is calibrated to their camera_0.
From this PNG, I know the position of each of their cameras relative to the front wheels of the car and relative to ground.
I can also do a monocular calibration on my camera and get a camera matrix and distortion coefficients.
I am having trouble coming up with the rotation and translation matrix/vector between the coordinate systems of the first and the second cameras, i.e. from their camera to mine or vice-versa.
I positioned my camera on top of my car at almost exactly the same height and almost exactly the same distance from the center of the front wheels, as shown in the PNG.
However, now I am at a loss as to how I can create the joint rotation-translation matrix. In a normal stereo calibration, these are returned by the stereoCalibrate function.
I looked at some references about coordinate transformation but I don't have sufficient practice in them to figure it out on my own.
Any suggestions or references are appreciated!
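One way to approach this, if each camera's pose with respect to a common world frame (here, the vehicle) is known or measured: compose the relative rotation-translation directly and feed it to stereoRectify. A sketch under that assumption, with placeholder pose and intrinsic values rather than real KITTI numbers:

```python
import cv2
import numpy as np

# World-to-camera poses, x_cam = R_i @ x_world + t_i (placeholder values, not real KITTI numbers).
R1, t1 = np.eye(3), np.array([0.0, 0.0, 0.0])       # reference camera (e.g. KITTI camera_0)
R2, t2 = np.eye(3), np.array([-0.54, 0.02, 0.01])   # my camera, position measured by hand

# Relative transform from camera 1 to camera 2: x_c2 = R @ x_c1 + T
R = R2 @ R1.T
T = t2 - R @ t1

K1 = np.array([[720.0, 0.0, 620.0], [0.0, 720.0, 188.0], [0.0, 0.0, 1.0]])   # placeholder intrinsics
K2 = np.array([[900.0, 0.0, 640.0], [0.0, 900.0, 360.0], [0.0, 0.0, 1.0]])
d1 = d2 = np.zeros(5)
image_size = (1242, 375)

R1r, R2r, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, d1, K2, d2, image_size, R, T)
```

A hand-measured translation with an assumed identical orientation will only be approximately right; if you can capture shared views of a calibration target with both cameras, stereoCalibrate will give a much more accurate R and T.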
How important is it to do camera calibration for ArUco? What if I don't calibrate the camera? What if I use calibration data from another camera? Do you need to recalibrate if the camera focus changes? What is the practical way of doing calibration for a consumer application?
Before answering your questions, let me introduce some generic concepts related to camera calibration. A camera is a sensor that captures the 3D world and projects it onto a 2D image; this is a transformation from 3D to 2D performed by the camera. The following OpenCV doc is a good reference to understand how this process works and the camera parameters involved. You can find detailed ArUco documentation in the following document.
In general, the camera model used by the main libraries is the pinhole model. In its simplified form (without considering radial distortion), the camera transformation is represented by the equation below; the OpenCV docs illustrate the whole projection process with a figure.
In summary:
P_im = K · [R|T] · P_world
Where:
P_im: the 2D points projected into the image
P_world: the 3D points in the world
K is the camera intrinsics matrix (this depends on the camera lens parameters; every time you change the camera focus, for example, the focal lengths fx and fy within this matrix change)
R and T are the extrinsics of the camera. They represent the rotation and translation matrices of the camera, respectively. These are basically the matrices that describe the camera position/orientation in the 3D world.
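For a concrete feel, the same equation in a few lines of numpy (the values are made up; the point is only the shape of the computation):

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])       # intrinsics: fx, fy, cx, cy
R = np.eye(3)                                # extrinsic rotation (camera aligned with the world here)
T = np.array([[0.0], [0.0], [2.0]])          # extrinsic translation: world origin 2 m in front

P_world = np.array([[0.1], [0.2], [0.0], [1.0]])   # homogeneous 3D point
P_im_h = K @ np.hstack((R, T)) @ P_world           # = K [R|T] P_world
u, v = (P_im_h[:2] / P_im_h[2]).ravel()            # divide by depth to get pixel coordinates
print(u, v)                                        # 360.0 320.0
```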
Now, let's go through your questions one by one:
How important is it to do camera calibration for ArUco?
Camera calibration is important in ArUco (or any other AR library) because you need to know how the camera maps the 3D world to the 2D image so you can project your augmented objects onto the physical world.
What if I don't calibrate the camera?
Camera calibration is the process of obtaining the camera parameters: the intrinsic and extrinsic parameters. The first are in general fixed and depend on the camera's physical characteristics, unless you change some setting such as the focus; in that case you have to recalculate them. Otherwise, if you are working with a camera that has a fixed focal distance, you only have to calculate them once.
The second depend on the camera's location/orientation in the world. Each time you move the camera, the R and T matrices change and you have to recalculate them. This is where libraries such as ArUco come in handy, because using markers you can obtain these values automatically.
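For illustration, a sketch of how these extrinsics fall out of marker detection with the cv2.aruco module. API details vary between OpenCV versions (estimatePoseSingleMarkers is the older interface; newer versions favour solvePnP on the marker corners), and the image and intrinsics below are placeholders:

```python
import cv2
import numpy as np

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])   # intrinsics from calibration
dist = np.zeros(5)
marker_length = 0.05   # marker side length in metres

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
frame = cv2.imread("frame.png")                    # placeholder input image
corners, ids, _ = cv2.aruco.detectMarkers(frame, dictionary)

if ids is not None:
    # One rvec/tvec per marker: the transform from the marker's frame into the camera frame,
    # i.e. the extrinsics R, T when the marker defines the world coordinate system.
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(corners, marker_length, K, dist)
    R, _ = cv2.Rodrigues(rvecs[0])
```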
In a few words, if you don't calibrate the camera you won't be able to project objects onto the physical world at the exact location (which is essential for AR).
What if I use calibration data from another camera?
It won't work; this is similar to using an uncalibrated camera.
Do you need to recalibrate if the camera focus changes?
Yes, you have to recalculate the intrinsic parameters because the focal distance changes in this case.
What is the practical way of doing calibration for a consumer application?
It depends on your application, but in general you have to provide some method for manual recalibration. There are also methods for automatic calibration using a known 3D pattern.
I have two images of the same object from different views. I want to perform a camera calibration, but from what I have read so far I need to have 3D world points to get the camera matrix.
I am stuck at this step; can someone explain it to me?
Popular camera calibration methods use 2D-3D point correspondences to determine the projective properties (intrinsic parameters) and the pose of a camera (extrinsic parameters). The most simple approach is the Direct Linear Transformation (DLT).
You might have seen that planar chessboards are often used for camera calibration. The 3D coordinates of its corners can be chosen by the user. Many people choose the chessboard to lie in the x-y plane, i.e. [x, y, 0]'. However, the 3D coordinates need to be consistent.
Coming back to your object: span your own 3D coordinate system over the object and find at least six spots whose 3D positions you can determine easily. Once you have those, you have to find their corresponding 2D positions (pixels) in your two images.
There are complete examples in OpenCV. Maybe you get a better picture when reading the code.
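A sketch of that idea, using the Direct Linear Transformation mentioned above; the correspondences are placeholder numbers, so with real measurements you would substitute your own:

```python
import cv2
import numpy as np

def dlt(world_pts, image_pts):
    """Estimate the 3x4 projection matrix P from >= 6 2D-3D correspondences (basic DLT)."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    return vt[-1].reshape(3, 4)        # null-space vector = flattened P (defined up to scale)

# At least six spots measured on the object in your own 3D coordinate system,
# and their pixel positions in one of the images (placeholder numbers):
world_pts = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10), (10, 10, 0), (10, 0, 10)]
image_pts = [(320, 240), (410, 238), (322, 150), (300, 260), (412, 149), (388, 259)]

P = dlt(world_pts, image_pts)
K, R, c_h = cv2.decomposeProjectionMatrix(P)[:3]   # intrinsics, rotation, homogeneous camera centre
K /= K[2, 2]                                        # normalise so that K[2, 2] == 1
```

Repeat with the pixel positions from the second image to get that camera's pose; once the intrinsics are known, cv2.solvePnP is the more robust tool for the pose part.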
Suppose I've got two images taken by the same camera. I know the 3d position of the camera and the 3d angle of the camera when each picture was taken. I want to extract some 3d data from the images on the portion of them that overlaps. It seems that OpenCV could help me solve this problem, but I can't seem to find where my camera position and angle would be used in their method stack. Help? Is there some other C library that would be more helpful? I don't even know what keywords to search for on the web. What's the technical term for overlapping image content?
You need to learn a little more about camera geometry and stereo rig geometry. Unless your camera was mounted on a special rig, it's rather doubtful that its pose at each image can be specified with just one angle and a point; rather, you'd need three angles (e.g. roll, pitch, yaw). Also, if you want your reconstruction to be metrically accurate, you need to calibrate the focal length of the camera accurately (at a minimum).
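Once the poses (as full rotations) and the intrinsics are nailed down, the reconstruction step itself can look like the sketch below: build one projection matrix per image and triangulate matched pixels. All numbers are placeholders, and matching the points between the two images is a separate problem:

```python
import cv2
import numpy as np

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])   # calibrated intrinsics

def projection_matrix(K, rvec, tvec):
    """P = K [R|t] for a camera with world-to-camera rotation rvec (Rodrigues) and translation tvec."""
    R, _ = cv2.Rodrigues(np.asarray(rvec, dtype=np.float64))
    return K @ np.hstack((R, np.reshape(tvec, (3, 1))))

P1 = projection_matrix(K, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
P2 = projection_matrix(K, [0.0, 0.2, 0.0], [-0.5, 0.0, 0.0])   # second pose: rotated and shifted

# Pixel coordinates of the same scene points matched between image 1 and image 2 (2xN arrays).
pts1 = np.array([[300.0, 350.0], [220.0, 260.0]])
pts2 = np.array([[380.0, 430.0], [225.0, 265.0]])

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous 4xN result
X = X_h[:3] / X_h[3]                              # 3D points in world coordinates
```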