Can an RGB-D camera work on micro-scale objects? - image-processing

Excuse me everyone, I have a question about RGB-D cameras. I want to know whether an RGB-D camera can estimate the depth of small objects (0.5 mm to 5 mm deep).


How frequently do you need to do camera calibration for ArUco?

How important is it to do camera calibration for ArUco? What if I don't calibrate the camera? What if I use calibration data from another camera? Do you need to recalibrate if the camera focus changes? What is the practical way of doing calibration for a consumer application?
Before answering your questions, let me introduce some generic concepts related to camera calibration. A camera is a sensor that captures the 3D world and projects it onto a 2D image. This is a transformation from 3D to 2D performed by the camera. The OpenCV documentation on camera calibration is a good reference for understanding how this process works and which camera parameters are involved. Detailed ArUco documentation is also available.
In general, the camera model used by the main libraries is the pinhole model. In the simplified form of this model (without considering radial distortion), the camera transformation is represented by the projection equation given in the OpenCV docs.
The OpenCV docs also include a figure that illustrates the whole projection process.
In summary:
P_im = K · [R | T] · P_world
P_im: 2D point projected in the image
P_world: 3D point in the world
K is the camera intrinsics matrix. It depends on the camera lens parameters: every time you change the camera focus, for example, the focal lengths fx and fy within this matrix change.
R and T are the extrinsics of the camera. They represent the rotation and translation of the camera, respectively. These basically describe the camera position and orientation in the 3D world.
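To make the summary above concrete, here is a minimal sketch of the distortion-free pinhole projection using NumPy. The intrinsic values (fx = fy = 800, cx = 320, cy = 240) and the identity pose are made-up assumptions for illustration, not values from any real calibration, and `project_points` is a hypothetical helper, not an OpenCV or ArUco function.

```python
import numpy as np

def project_points(K, R, t, points_world):
    """Project Nx3 world points to Nx2 pixels with a distortion-free pinhole model."""
    points_world = np.asarray(points_world, dtype=float)
    # World frame -> camera frame: X_cam = R @ X_world + t
    points_cam = points_world @ R.T + t
    # Camera frame -> homogeneous image coordinates
    uvw = points_cam @ K.T
    # Perspective division by the third coordinate
    return uvw[:, :2] / uvw[:, 2:3]

# Assumed intrinsics (illustrative only)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)      # camera axes aligned with the world axes
t = np.zeros(3)    # camera at the world origin

pixels = project_points(K, R, t, [[0.0, 0.0, 2.0], [0.1, -0.05, 2.0]])
print(pixels)
```

A point on the optical axis lands on the principal point (cx, cy), and points off the axis are shifted in proportion to fx and fy, which is why these values matter for placing augmented objects.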
Now, let's go through your questions one by one:
How important is it to do camera calibration for ArUco?
Camera calibration is important in ArUco (or any other AR library) because you need to know how the camera maps the 3D world to the 2D image so that you can project your augmented objects onto the physical world.
What if I don't calibrate the camera?
Camera calibration is the process of obtaining the camera parameters: intrinsic and extrinsic. The intrinsic parameters are in general fixed and depend on the physical properties of the camera, unless you change a setting such as the focus, in which case you have to recalculate them. If you are working with a fixed-focus camera, you only have to calculate them once.
The extrinsic parameters depend on the camera location and orientation in the world. Each time you move the camera, R and T change and you have to recalculate them. This is where libraries such as ArUco come in handy, because with markers you can obtain these values automatically.
In short, if you don't calibrate the camera you won't be able to project objects onto the physical world at the exact location, which is essential for AR.
What if I use calibration data from another camera?
It won't work; this is similar to using an uncalibrated camera.
Do you need to recalibrate if the camera focus changes?
Yes, you have to recalculate the intrinsic parameters because the focal length changes in this case.
What is the practical way of doing calibration for a consumer application?
It depends on your application, but in general you have to provide some method for manual recalibration. There are also methods for automatic calibration using a 3D pattern.
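As a rough illustration of why stale intrinsics matter, the sketch below projects the same point with the focal length from an old calibration and with a slightly different focal length after a focus change. Both focal values and the point position are invented for the example; the real size of the shift depends on your lens.

```python
def project_u(fx, cx, x, z):
    """Horizontal pixel coordinate of a camera-frame point (x, z) under the pinhole model."""
    return fx * x / z + cx

cx = 320.0               # assumed principal point (pixels)
x, z = 0.2, 1.0          # point 0.2 m to the right of the optical axis, 1 m away

fx_calibrated = 800.0    # assumed value stored from the original calibration
fx_actual = 840.0        # assumed true value after the focus changed

u_expected = project_u(fx_calibrated, cx, x, z)  # where the app draws the overlay
u_actual = project_u(fx_actual, cx, x, z)        # where the point really appears
error_px = u_actual - u_expected

print(f"overlay is off by {error_px:.1f} pixels")
```

The error grows with the distance of the point from the image center, so an overlay can look right in the middle of the frame and drift toward the edges.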

3D image reconstruction from 3 fixed cameras?

I have seen some 3D facial scanning devices that use 3 cameras to build a 3D picture of a face.
Is there a specific angle at which these cameras should be fixed for this?
Is there any SDK or tool in this domain that could simplify producing a 3D image from these fixed cameras?
The smaller the angle between the cameras, the less depth information you will get from them. So the angle is important, but I cannot say that it needs to be exactly x degrees.
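One way to see this is the first-order depth error of a rectified two-camera pair, dZ ≈ Z² / (f · B) · dd, where B is the baseline and dd the disparity error. This is a simplified sketch that assumes two parallel cameras rather than a real three-camera rig; the focal length, subject distance, and baselines are made-up values.

```python
import math

def depth_error(z, baseline, focal_px, disparity_error_px=1.0):
    """First-order depth error of a rectified stereo pair: dZ ~ Z^2 / (f * B) * dd."""
    return z * z / (focal_px * baseline) * disparity_error_px

def convergence_angle_deg(z, baseline):
    """Angle between the two viewing rays to a point centered at distance z."""
    return math.degrees(2.0 * math.atan(baseline / (2.0 * z)))

z = 0.5            # assumed distance to the face, in meters
focal_px = 800.0   # assumed focal length, in pixels

for baseline in (0.02, 0.10, 0.30):
    angle = convergence_angle_deg(z, baseline)
    err_mm = depth_error(z, baseline, focal_px) * 1000.0
    print(f"baseline {baseline:.2f} m: angle {angle:.1f} deg, "
          f"depth error {err_mm:.2f} mm per pixel of disparity error")
```

A wider baseline (a larger angle between the rays) reduces the depth error, at the cost of less overlap and harder matching between the views.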

How to convert an image taken from one camera position into the image seen from another virtual camera position

Is it possible to transform a 2D image from one camera (camera1) into the image seen from the viewpoint of another camera (camera2, which is a virtual camera), given that I know both cameras' poses? I looked up some techniques, including homography transformation, but they do not seem to help.
Here is the information I have and don't have.
- Known: camera1 pose, camera2 pose (= transformation matrix between the two cameras), camera parameters for both cameras
- Unknown: object pose
If the 3D pose of the object in the original image is known, the conversion is easy. However, you cannot assume that the 3D pose (depth) information is available in my setting.
I believe there is a way because it is already used in car navigation, but I'm curious about the general way to achieve this transformation. If this is a theoretically impossible problem, please let me know too.

Human height estimation using one mono calibrated camera

I am working on an algorithm to estimate the height of detected people in a video, and I'm stuck.
The part that I have working is the detection of people using the HoG algorithm, so I have a bounding box for every person in the frame. And I have calibrated the camera, so I have my intrinsic and extrinsic camera parameters.
The problem is that now I have a formula for the perspective projection with 2 unknowns: height of the object and the distance from the object to the camera. I am using one mono web camera to detect people so I have no information about the distance from the object to the camera. And the height is what I'm trying to estimate, so I don't have that as well.
I know this problem is solvable if I use a Kinect or a stereo camera to get the distance, but I'm limited to a single mono web camera.
Does anyone have an idea on how to approach this problem? I have read about using reference objects but I can't figure out how to use them to help my problem.
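The ambiguity described above can be shown with the simplest form of the perspective relation, pixel height = f · H / Z. This is a minimal sketch that assumes an upright person facing a camera whose optical axis is horizontal; the focal length and the (height, distance) pairs are invented values.

```python
def pixel_height(focal_px, height_m, distance_m):
    """Image height in pixels of an upright object under the pinhole model."""
    return focal_px * height_m / distance_m

focal_px = 800.0  # assumed focal length, in pixels

# Different (height, distance) pairs with the same height-to-distance ratio
candidates = [(1.5, 3.0), (1.8, 3.6), (2.0, 4.0)]
heights_px = [pixel_height(focal_px, h, d) for h, d in candidates]

for (h, d), px in zip(candidates, heights_px):
    print(f"height {h:.1f} m at {d:.1f} m -> {px:.0f} px")
```

All three cases produce the same bounding-box height, so one more constraint (a known distance, a reference object of known size at the same depth, or similar) is needed before the real height can be recovered.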

two images with camera position and angle to 3d data?

Suppose I've got two images taken by the same camera. I know the 3d position of the camera and the 3d angle of the camera when each picture was taken. I want to extract some 3d data from the images on the portion of them that overlaps. It seems that OpenCV could help me solve this problem, but I can't seem to find where my camera position and angle would be used in their method stack. Help? Is there some other C library that would be more helpful? I don't even know what keywords to search for on the web. What's the technical term for overlapping image content?
You need to learn a little more about camera geometry and stereo rig geometry. Unless your camera was mounted on a special rig, it's rather doubtful that its pose at each image can be specified with just an angle and a point. Rather, you'd need three angles (e.g. roll, pitch, yaw). Plus, if you want your reconstruction to be metrically accurate, you need to accurately calibrate the focal length of the camera (at a minimum).
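As a sketch of where the position and the three angles end up, the code below builds a 3x4 projection matrix P = K [R | t] for one image. The rotation order (Rz · Ry · Rx), the convention that the angles describe the camera-to-world rotation, and all numeric values are assumptions for illustration; your own data may use a different convention. `rotation_from_rpy` and `projection_matrix` are hypothetical helpers, not OpenCV functions.

```python
import numpy as np

def rotation_from_rpy(roll, pitch, yaw):
    """Rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

def projection_matrix(K, R_cam_to_world, camera_position):
    """Build P = K [R | t] with R = R_cam_to_world^T and t = -R @ C."""
    R = R_cam_to_world.T
    t = -R @ np.asarray(camera_position, dtype=float)
    return K @ np.hstack([R, t.reshape(3, 1)])

# Assumed intrinsics and pose (illustrative only)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P = projection_matrix(K, rotation_from_rpy(0.0, 0.0, 0.0), [0.0, 0.0, -2.0])

# Project the world origin as a homogeneous point
uvw = P @ np.array([0.0, 0.0, 0.0, 1.0])
pixel = uvw[:2] / uvw[2]
print(pixel)
```

With one such matrix per image and a set of matched pixel coordinates in the overlapping region, a triangulation routine such as OpenCV's `triangulatePoints` can recover 3D points. "Stereo matching" and "triangulation" are useful search terms.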
