When we compute the pose of the camera with respect to a primitive like a marker, a 3D model, etc., the origin of that primitive is usually precisely known, for example the origin of a chessboard or a marker (in blue).
Now the question is: where is the origin of the camera (in black)? With respect to which reference is the translation vector of the pose expressed? How can we determine where it is?
The optical center is meant to be on the optical axis (ideally it projects to the center of the image), at a distance from the sensor equal to the focal length, which can be expressed in pixel units (knowing the pixel size).
You can see where the optical axis lies (it is the symmetry axis of the lens), but the optical center is somewhere inside the camera.
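For example, with made-up numbers, the conversion from a physical focal length to the pixel-unit value that ends up in the camera matrix is just a division by the pixel size:

```python
# Rough sketch of the conversion; the numbers are illustrative only.
f_mm = 8.0              # physical focal length of the lens, in mm
pixel_size_mm = 0.0048  # pixel pitch of the sensor, in mm (about 4.8 um)

fx = f_mm / pixel_size_mm   # focal length in pixel units
print(fx)                   # ~1667 px, the kind of value found in a camera matrix
```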
OpenCV uses the pinhole camera model to model cameras. The origin of the 3D coordinate system used in OpenCV, for camera calibration and other purposes, is the camera itself, or more specifically, the pinhole of the camera model. It is the point where all light rays that enter the camera converge, and it is also called the "centre of projection".
Real cameras with lenses do not actually have a pinhole. But by analysing images taken with the camera, it is possible to compute a pinhole model which models the real camera's optics very closely. That is what OpenCV does when it calibrates your camera. As @Yves Daoust said, the pinhole of this model (and hence the 3D coordinate origin) will be a 3D point somewhere inside your camera (or possibly behind it, depending on its focal length), but it is not possible for OpenCV to say exactly where it is relative to your camera's body, because OpenCV knows nothing about the physical size or shape of your camera or its sensor.
Even if you knew exactly where the origin is relative to your camera's body, it probably would not be of much use, because you can't take any physical measurements with respect to a point that is located inside your camera without taking it apart! Really, you can do everything you need to do in OpenCV without knowing this detail.
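As an illustration, here is a minimal sketch (all numeric values invented) of what that means in practice: the tvec returned by solvePnP is the position of the marker origin expressed in a coordinate frame whose origin is that centre of projection.

```python
import numpy as np
import cv2

# Hypothetical intrinsics from a previous calibration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# Corners of a 10 cm square marker, with the marker's own origin at its centre.
obj_pts = np.array([[-0.05, -0.05, 0.0],
                    [ 0.05, -0.05, 0.0],
                    [ 0.05,  0.05, 0.0],
                    [-0.05,  0.05, 0.0]])
# Their detected pixel positions (made-up numbers for the sketch).
img_pts = np.array([[300.0, 220.0],
                    [340.0, 221.0],
                    [341.0, 260.0],
                    [299.0, 259.0]])

ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist)
# tvec is the marker origin expressed in the camera frame, whose origin is
# the centre of projection (the model's pinhole), not a point you can
# locate on the camera body.
print(tvec.ravel())
```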
Related
I was trying to do a lens calibration using OpenCV for a camera with variable zoom and focus (a broadcast camera). I managed to acquire decent parameters for the lens (focal length, k1, k2); however, I got stuck at the nodal offset.
As I understand it, the nodal point of a lens is the point at which light rays converge. This causes a shift in the object's distance from the camera along the Z-coordinate. Basically, when I run cv::SolvePnP with my known parameters, the distance from the object to the camera is not exactly the same as it is in the real world. For example, I zoomed the camera in completely and brought the pattern into focus. OpenCV estimates that the pattern is roughly 3 meters away from the camera, but when I measure it with a laser measuring tool it is 1.8 meters. This is not the case with a wide lens setting, because the nodal offset is then really small.
The question is: is there any method to measure the nodal offset of the camera using a pattern, without measuring the distance of the pattern from the camera?
What I tried
I used a pan, tilt and roll tripod that can report the rotation of the camera. I put a pattern in front of the camera and captured it several times at different angles, hoping to see some difference in position when I transform the pattern using the rotation from the tripod.
I also noticed that Unreal Engine estimates a nodal offset using a pattern, by placing a CG object and aligning it with the video [link]. However, I thought there might be a different way to achieve this without having a CG object.
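For concreteness, this is roughly the check I was attempting. The helper names are mine, and the axis order and sign conventions of the tripod angles are my own assumption:

```python
import numpy as np
import cv2

def tripod_rotation(pan_deg, tilt_deg, roll_deg):
    """Rotation reported by the tripod head, as a matrix. The axis order and
    signs are assumptions that have to be matched to the actual rig."""
    p, t, r = np.radians([pan_deg, tilt_deg, roll_deg])
    Rpan, _ = cv2.Rodrigues(np.array([0.0, p, 0.0]))
    Rtilt, _ = cv2.Rodrigues(np.array([t, 0.0, 0.0]))
    Rroll, _ = cv2.Rodrigues(np.array([0.0, 0.0, r]))
    return Rpan @ Rtilt @ Rroll

def pattern_in_tripod_frame(angles, obj_pts, img_pts, K, dist):
    """solvePnP gives the pattern position in the camera frame; rotating it
    back by the tripod rotation should give the same vector for every capture
    if the head really rotates about the centre of projection."""
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist)
    R = tripod_rotation(*angles)
    return R @ tvec.reshape(3)   # or R.T @ tvec, depending on the convention

# I compare pattern_in_tripod_frame(...) across captures; a systematic drift,
# mostly along Z, is the signature I would expect the nodal offset to leave.
```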
I use single-camera calibration with a checkerboard, and I used one fixed position of the camera to do the calibration. Now my question is: if I use the same position but change the height of the camera, do I need to do the calibration again? If not, will I get the same result at a different camera height?
In my case, I changed the height of the camera but the position otherwise stayed the same, and I got a different result after changing the height. So I was wondering whether I need to calibrate the camera again or not.
Please help me out.
Generally speaking, and to achieve the greatest accuracy, you will need to recalibrate the camera whenever it is moved. However, if the lens mount is rigid enough w.r.t. the sensor, you may get away with only updating the extrinsic calibration, especially if your accuracy requirements are modest.
To see why this is the case, notice that, unless you have a laboratory-grade rig holding and moving the camera, you can't change only the height. With a standard tripod, for example, there will in general be motion along all three axes amounting to a significant fraction of the sensor's size, which will show up as a visible shift of several pixels with respect to the scene.
Things get worse / more complicated when you also add rotation to re-orient the field of view, since a mechanical mount will not, in general, rotate the camera around its optical center (i.e. the exit pupil of the lens), and therefore every rotation necessarily comes with an additional translation.
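To make that last point concrete, here is a small numeric sketch (the pivot offset and the pan angle are invented): if the head rotates the camera about a pivot that sits, say, 10 cm away from the optical centre, the optical centre itself moves, so the extrinsic translation changes as well.

```python
import numpy as np
import cv2

# Pivot of the tripod head, expressed in the camera frame; assumed to sit
# 10 cm behind the optical centre along the optical axis.
pivot = np.array([0.0, 0.0, -0.10])

# A 20-degree pan about the camera's vertical axis, as a rotation matrix.
R, _ = cv2.Rodrigues(np.array([0.0, np.radians(20.0), 0.0]))

# A point x rotating about `pivot` goes to R @ (x - pivot) + pivot, so the
# optical centre (the origin of the camera frame) is displaced by:
induced_t = (np.eye(3) - R) @ pivot
print(induced_t)   # a few centimetres of motion accompanying the rotation
```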
In your specific case, since you are only interested in measurements on a plane, and therefore can compute everything using homographies, refining the extrinsic calibration amounts to just recomputing the world-to-image scale. This can easily be achieved by taking one or more images of objects of known size on the plane - a calibration checkerboard is just such an object.
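A minimal sketch of that refinement, assuming the intrinsics K and distortion coefficients from the original calibration are still valid and that a checkerboard of known square size lies on the measurement plane (the helper name, board dimensions and square size are placeholders of mine):

```python
import numpy as np
import cv2

def refresh_plane_mapping(image, K, dist, board_size=(9, 6), square_m=0.025):
    """Re-estimate the plane-to-image mapping after the camera has moved,
    keeping the old intrinsics."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, board_size)
    if not found:
        raise RuntimeError("checkerboard not found")

    # World coordinates of the board corners on the plane (Z = 0), in metres.
    obj = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
    obj *= square_m

    # New extrinsics of the moved camera with respect to the plane; together
    # with K they give the updated world-to-image homography (and its scale).
    ok, rvec, tvec = cv2.solvePnP(obj, corners, K, dist)
    R, _ = cv2.Rodrigues(rvec)
    H = K @ np.column_stack((R[:, 0], R[:, 1], tvec.ravel()))
    return H / H[2, 2]
```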
Based on the documentation of stereoRectify in OpenCV, one can rectify an image based on two camera matrices, their distortion coefficients, and a rotation-translation from one camera to the other.
I would like to rectify an image I took using my own camera to the stereo setup from the KITTI dataset. From their calibration files, I know the camera matrix and the image size before rectification for all of the cameras. All their data is calibrated to their camera_0.
From this PNG, I know the position of each of their cameras relative to the front wheels of the car and relative to the ground.
I can also do a monocular calibration on my camera and get a camera matrix and distortion coefficients.
I am having trouble coming up with the rotation and translation matrix/vector between the coordinate systems of the first and the second cameras, i.e. from their camera to mine or vice-versa.
I positioned my camera on top of my car at almost exactly the same height and almost exactly the same distance from the center of the front wheels, as shown in the PNG.
However, now I am at a loss as to how I can create the joint rotation-translation matrix. In a normal stereo calibration, these are returned by the stereoCalibrate function.
I looked at some references on coordinate transformations, but I don't have enough practice with them to figure it out on my own.
Any suggestions or references are appreciated!
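For reference, this is the kind of composition I think is needed, written as a sketch with placeholder numbers standing in for the measurements from the PNG and with both cameras assumed to have the same orientation; corrections are welcome:

```python
import numpy as np

def make_T(R, t):
    """4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Pose of KITTI camera_0 in a vehicle-fixed frame (placeholder numbers of the
# kind the setup PNG provides: metres from the front wheels / ground).
T_vehicle_cam0 = make_T(np.eye(3), np.array([0.27, 0.00, 1.65]))

# Pose of my own camera in the same vehicle frame (hand-measured, placeholder).
T_vehicle_mycam = make_T(np.eye(3), np.array([0.27, 0.06, 1.68]))

# Transform taking points from my camera's frame into camera_0's frame,
# i.e. the joint rotation-translation I am looking for.
T_cam0_mycam = np.linalg.inv(T_vehicle_cam0) @ T_vehicle_mycam
R = T_cam0_mycam[:3, :3]
t = T_cam0_mycam[:3, 3]
print(R, t)
```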
I have 2 images of the same object from different views. I want to perform a camera calibration, but from what I have read so far I need 3D world points to get the camera matrix.
I am stuck at this step; can anyone explain it to me?
Popular camera calibration methods use 2D-3D point correspondences to determine the projective properties (intrinsic parameters) and the pose of a camera (extrinsic parameters). The simplest approach is the Direct Linear Transformation (DLT).
You might have seen that planar chessboards are often used for camera calibration. The 3D coordinates of its corners can be chosen by the user. Many people choose the chessboard to lie in the x-y plane, i.e. [x, y, 0]'. However, the 3D coordinates need to be consistent.
Coming back to your object: span your own 3D coordinate system over the object and find at least six spots whose 3D positions you can determine easily. Once you have those, you have to find their corresponding 2D (pixel) positions in your two images.
There are complete examples in OpenCV; maybe you will get a better picture by reading the code.
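A minimal sketch of that workflow, assuming you have measured the 3D coordinates of at least six spots on the object and clicked their pixel positions in both images (all numbers below are placeholders):

```python
import numpy as np
import cv2

# 3D coordinates of six identifiable spots on the object, measured in a
# coordinate system you define yourself (placeholder values, in metres).
object_points = np.array([[0.0, 0.0, 0.0],
                          [0.2, 0.0, 0.0],
                          [0.2, 0.1, 0.0],
                          [0.0, 0.1, 0.0],
                          [0.1, 0.05, 0.12],
                          [0.05, 0.02, 0.15]], np.float32)

# Their pixel positions, clicked by hand in each of the two images
# (placeholder numbers).
image_points_1 = np.array([[321., 241.], [402., 238.], [405., 288.],
                           [318., 292.], [362., 200.], [338., 188.]], np.float32)
image_points_2 = np.array([[289., 250.], [371., 255.], [369., 305.],
                           [286., 301.], [333., 214.], [305., 198.]], np.float32)

image_size = (640, 480)   # (width, height) of the images

# With non-planar object points OpenCV cannot initialise the intrinsics by
# itself, so provide a rough initial camera matrix and pass
# CALIB_USE_INTRINSIC_GUESS.
K_init = np.array([[700., 0., image_size[0] / 2.],
                   [0., 700., image_size[1] / 2.],
                   [0., 0., 1.]])
flags = (cv2.CALIB_USE_INTRINSIC_GUESS
         | cv2.CALIB_ZERO_TANGENT_DIST
         | cv2.CALIB_FIX_K3)          # few points: keep distortion simple

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    [object_points, object_points],
    [image_points_1, image_points_2],
    image_size, K_init, None, flags=flags)

print("RMS reprojection error:", rms)
print("camera matrix:\n", K)          # rvecs/tvecs are the per-image poses
```

Fixing the tangential and higher-order distortion terms is a deliberate choice here: with only a handful of correspondences per image, the full distortion model is poorly constrained.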
Suppose I've got two images taken by the same camera. I know the 3D position of the camera and the 3D angle of the camera when each picture was taken. I want to extract some 3D data from the portion of the images that overlaps. It seems that OpenCV could help me solve this problem, but I can't seem to find where my camera position and angle would be used in their method stack. Help? Is there some other C library that would be more helpful? I don't even know what keywords to search for on the web. What's the technical term for overlapping image content?
You need to learn a little more about camera geometry and stereo rig geometry. Unless your camera was mounted on a special rig, it's rather doubtful that its pose at each image can be specified with just one angle and a point; rather, you'd need three angles (e.g. roll, pitch, yaw). Plus, if you want your reconstruction to be metrically accurate, you need to calibrate the focal length of the camera accurately (at a minimum).
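Once you do have calibrated intrinsics and a full pose (three angles plus a position) per image, the usual pipeline is to match features between the two images in the overlapping region and triangulate them; "stereo matching" and "triangulation" are the terms to search for. A rough sketch, with the helper name mine and every numeric value a placeholder:

```python
import numpy as np
import cv2

def projection_matrix(K, R, cam_pos):
    """P = K [R | t] with t = -R @ cam_pos, for a camera located at cam_pos
    whose world-to-camera rotation is R."""
    t = -R @ cam_pos
    return K @ np.hstack((R, t.reshape(3, 1)))

# Assumed calibrated intrinsics (placeholder focal length / principal point).
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

R1, _ = cv2.Rodrigues(np.zeros(3))                 # first view: no rotation
R2, _ = cv2.Rodrigues(np.array([0.0, 0.2, 0.0]))   # second view: slight yaw
P1 = projection_matrix(K, R1, np.array([0.0, 0.0, 0.0]))
P2 = projection_matrix(K, R2, np.array([0.5, 0.0, 0.0]))  # 0.5 m baseline

# Matched pixel coordinates of the same scene points in both images, as
# 2 x N arrays (invented numbers standing in for real feature matches).
pts1 = np.array([[300.0, 350.0],    # x coordinates in image 1
                 [240.0, 260.0]])   # y coordinates in image 1
pts2 = np.array([[280.0, 330.0],    # x coordinates in image 2
                 [238.0, 258.0]])   # y coordinates in image 2

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)    # 4 x N homogeneous points
X = (X_h[:3] / X_h[3]).T                           # N x 3 points in world frame
print(X)
```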