Camera rotation and translation from tracking? - opencv

What is the best way to find camera rotation and translation from tracking the scene without calibrating the camera?

I'm not sure how you will skip calibration because you get the translation and rotation from the calibration process. Maybe instead of using two cameras you'll compare frame 0 to frame 1000 from the same (moving) camera, but the calibration process is the same.
Check out Chapter 11 in the Learning OpenCV book. If you want to skip lens correction, check the section heading "Computing extrinsics only." The function is cvFindExtrinsicCameraParams2.

Related

Estimation of nodal offset with pattern and OpenCV

I was trying to do a lens calibration using OpenCV for a camera with variable zoom and focus (a broadcast camera). I managed to acquire decent parameters for a lens (focal length, k1, k2); however, I got stuck at the nodal offset.
As I understand it, the nodal point of a lens is the point at which light rays converge. It causes a shift of an object along the camera's Z-coordinate. Basically, when I run cv::solvePnP with my known parameters, the estimated distance from an object to the camera is not exactly what it is in the world. For example, I fully zoomed in with the camera and brought it into focus: OpenCV estimates that the pattern is roughly 3 meters away from the camera, but when I measure it with a laser measuring tool, it's 1.8 meters. This is not the case with a wide lens, because the nodal offset is then really small.
The question is: is there any method to measure the nodal offset of the camera using a pattern, without measuring the pattern's distance from the camera?
What I tried
I used a pan/tilt/roll tripod that can report the rotation of the camera. I put a pattern in front of the camera and captured it several times at different angles, hoping to see some difference in position when I transform the pattern using the rotation from the tripod.
I also noticed that Unreal Engine estimates the nodal offset using a pattern by placing a CG object and aligning it with the video [link]. However, I thought there might be a way to achieve this without a CG object.

Stereo calibration, do extrinsics change if the lens changes?

I have a stereo camera setup. Typically I would calibrate the intrinsics of each camera, and then use that result to calibrate the extrinsics, i.e. the baseline between the cameras.
What happens now if I change for example the focus or zoom on the lenses? Of course I will have to re-calibrate the intrinsics, but what about the extrinsics?
My first thought would be no, since the actual body of the camera didn't move. But on second thought, doesn't the focal point within the camera change with the changed focus? And isn't the extrinsic calibration actually the calibration between the two focal points of the cameras?
In short: should I re-calibrate the extrinsics of my setup after changing the intrinsics?
Thanks for any help!
Yes, you should.
It's about the optical center of each camera. Different lenses put that in different places (but hopefully along the optical axis).

Difference between stereo camera calibration vs two single camera calibrations using OpenCV

I have a vehicle with two cameras, left and right. Is there a difference between me calibrating each camera separately vs me performing "stereo calibration" ? I am asking because I noticed in the OpenCV documentation that there is a stereoCalibrate function, and also a stereo calibration tool for MATLAB. If I do separate camera calibration on each and then perform a depth calculation using the undistorted images of each camera, will the results be the same ?
I am not sure what the difference is between the two methods. I performed normal camera calibration for each camera separately.
For intrinsics, it doesn't matter. The added information ("pair of cameras") might make the calibration a little better though.
Stereo calibration gives you the extrinsics, i.e. transformation matrices between cameras. That's for... stereo vision. If you don't perform stereo calibration, you would lack the extrinsics, and then you can't do any depth estimation at all, because that requires the extrinsics.
TL;DR
You need stereo calibration if you want 3D points.
Long answer
There is a huge difference between single and stereo camera calibration.
The output of single camera calibration is intrinsic parameters only (i.e. the 3x3 camera matrix and a number of distortion coefficients, depending on the model used). In OpenCV this is accomplished by cv2.calibrateCamera. You may check my custom library that helps reduce the boilerplate.
The output of stereo calibration is the intrinsics of both cameras plus the extrinsic parameters.
In OpenCV this is done with cv2.stereoCalibrate. OpenCV fixes the world origin in the first camera and then you get a rotation matrix R and translation vector t to go from the first camera (origin) to the second one.
So, why do we need extrinsics? If you are using a stereo system for 3D scanning, then you need those (and the intrinsics) to do triangulation and obtain 3D points in space: if you know the projections of a general point p onto both cameras, you can calculate its position.
To add something to what @Christoph correctly answered before, the intrinsics should be almost the same; however, cv2.stereoCalibrate may improve the calculation of the intrinsics if the flag CALIB_FIX_INTRINSIC is not set. This happens because the system composed of the two cameras and the calibration board is solved as a whole by numerical optimization.

Creating a Trajectory using a 360 camera video without use of GPS, IMU, sensor, ROS or LIDAR

The input is a video created using a 360 camera (Samsung Gear 360). I need to plot a trajectory (without the use of ground-truth poses) as I move around an indoor location, i.e. I need to know the camera locations and plot them accordingly.
Firstly, camera calibration was done by capturing 21 pictures of a chessboard; using OpenCV methods, the camera matrix (a 3x3 matrix which includes fx, fy, cx, cy, and the skew factor) was obtained and then written to a text file.
What I have tried: feature detection (ORB, SIFT, AKAZE, ...) and matching (FLANN and brute force). It works well for a single space but fails when the video covers a multi-storey building. Tested on this multi-storey building: https://youtu.be/6DPFcKoHiak
An example of camera motion estimation that is required: https://arxiv.org/pdf/2003.08056.pdf
Any help on how to plot camera poses using VSLAM, visual odometry, or any other method would be appreciated.

How frequent do you need to do camera calibration for ArUco?

How important is it to do camera calibration for ArUco? What if I don't calibrate the camera? What if I use calibration data from another camera? Do you need to recalibrate if the camera's focus changes? What is a practical way of doing calibration for a consumer application?
Before answering your questions, let me introduce some generic concepts related to camera calibration. A camera is a sensor that captures the 3D world and projects it onto a 2D image. This is a transformation from 3D to 2D performed by the camera. The OpenCV documentation is a good reference for understanding how this process works and the camera parameters involved. You can find detailed ArUco documentation in the following document.
In general, the camera model used by the main libraries is the pinhole model. In the simplified form of this model (without considering lens distortion), the camera transformation is represented by the following equation (from the OpenCV docs, which also include a figure illustrating the whole projection process):

P_im = K・[R|T]・P_world
Where:
P_im: the 2D point projected onto the image
P_world: the 3D point in the world
K is the camera intrinsics matrix (it depends on the lens parameters; every time you change the camera focus, for example, the focal distances fx and fy within this matrix change)
R and T are the extrinsics of the camera. They represent the rotation and translation matrices of the camera, respectively. These are basically the matrices that represent the camera's position/orientation in the 3D world.
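The equation can be checked numerically. Writing the extrinsics as a 3x4 matrix [R|T], a hypothetical point projects as follows (all values below are made up for illustration):

```python
import numpy as np

# Sketch of P_im = K @ [R|T] @ P_world with made-up values
K = np.array([[800.0, 0, 320],       # fx, skew, cx
              [0, 800.0, 240],       # fy, cy
              [0, 0, 1]])
R = np.eye(3)                        # camera aligned with world axes
T = np.array([[0.0], [0.0], [5.0]])  # world origin 5 units in front

P_world = np.array([[0.5], [0.25], [0.0], [1.0]])  # homogeneous 3D point
RT = np.hstack([R, T])                             # 3x4 extrinsic matrix

p = K @ RT @ P_world             # homogeneous image coordinates
p_im = (p[:2] / p[2]).ravel()    # divide by the third coordinate
print(p_im)                      # -> [400. 280.]
```

The final division by the third homogeneous coordinate is what makes the mapping 3D-to-2D: points farther away (larger Z) land closer to the principal point (cx, cy).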
Now, let's go through your questions one by one:
How important it is to do camera calibration for ArUco?
Camera calibration is important in ArUco (or any other AR library) because you need to know how the camera maps the 3D world onto the 2D image so you can project your augmented objects onto the physical world.
What if I dont calibrate the camera?
Camera calibration is the process of obtaining the camera parameters: intrinsic and extrinsic. The former are generally fixed and depend on the camera's physical characteristics, unless you change some setting such as the focus; in that case you have to recalculate them. Otherwise, if you are working with a camera that has a fixed focal distance, you only have to calculate them once.
The latter depend on the camera's location/orientation in the world. Each time you move the camera, the R and T matrices change and you have to recalculate them. This is where libraries such as ArUco come in handy, because by using markers you can obtain these values automatically.
In a few words: if you don't calibrate the camera, you won't be able to project objects onto the physical world at the exact location (which is essential for AR).
What if I use calibration data from other camera?
It won't work; that is similar to using an uncalibrated camera.
Do you need to recalibrate if camera focuses change?
Yes, you have to recalculate the intrinsic parameters because the focal distance changes in this case.
What is the practical way of doing calibration for consumer application?
It depends on your application, but in general you have to provide some method for manual re-calibration. There are also methods for automatic calibration using a 3D pattern.
