Required tolerance for camera calibration target - opencv

In reading about and experimenting with camera calibration I haven't seen any mention of the required tolerance for the placement of calibration targets. For example, say I have a field of view of 200 mm x 30 mm and I want to be able to measure the position of objects in this field to within 1 mm. I will calibrate my camera using a grid pattern and the OpenCV calibrateCamera flow. Say my calibration target is a printed chessboard grid with 5 mm pitch. What is the tolerance on that 5 mm spacing between corners on my target? Does a tighter tolerance result in a more accurate pixel to real-world transformation? Does a tighter tolerance result in better distortion removal?
Note that I'm measuring objects on a 2D plane, with no depth measurement, and unfortunately I don't have the ability to move the calibration target around and take multiple views of it. So I'm talking specifically about calibrating using a single view.

Calibration using a single view is a poor idea, generally speaking, because of the small number of independent samples it entails, so the tolerance on the calibration grid manufacture may well be the least of your worries. But if you must...
The controlling factor here is the sensor's dot pitch. Given the nominal focal length of your lens, and that you want your calibration RMSE to be on the order of a few tenths of a pixel, you can work out the angle spanned by, say, 1/10 of a pixel along the sensor's horizontal axis. Back-projecting that at the nominal distance between the lens's exit pupil and the target gives you a length in the 3D world that measures the uncertainty in a target corner's location at the calibration optimum. Your physical target points should be known at least as accurately, and normally better.
Example:
Setup: dot pitch 5 µm, 16 mm focal length lens, 200 mm working distance to target.
Backprojected 1/10 pixel: 200/16 × 0.5 µm ≈ 6 µm.
Backprojected 1/2 pixel: 200/16 × 2.5 µm ≈ 31 µm.
You can loosen that if you assume perfect chi-square scaling of the errors with the square root of the number of data points. If you have, say, 100 corners, you can multiply that by 10, i.e. ~300 µm for 1/2 pixel.
Note that with tolerances like these, temperature control (for both camera and target) may become a factor to take into account.
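The back-projection arithmetic above can be sketched in a few lines. This is a minimal sketch assuming an ideal pinhole model (a length on the sensor scales by distance/focal length at the target) and, for the loosened figure, the ideal square-root averaging just described; the function name is hypothetical.

```python
import math


def backprojected_tolerance_um(pixel_pitch_um, focal_mm, distance_mm,
                               pixel_fraction, n_corners=1):
    """Length at the target (in µm) spanned by a fraction of a pixel.

    Pinhole similar triangles: a length s on the sensor corresponds to
    s * distance / focal at the target. With n_corners > 1 the tolerance
    is loosened by sqrt(n_corners), assuming the errors average down ideally.
    """
    sensor_length_um = pixel_fraction * pixel_pitch_um
    return sensor_length_um * distance_mm / focal_mm * math.sqrt(n_corners)


# Setup from the example: 5 µm pixels, 16 mm lens, 200 mm working distance.
print(backprojected_tolerance_um(5, 16, 200, 0.1))       # 6.25
print(backprojected_tolerance_um(5, 16, 200, 0.5))       # 31.25
print(backprojected_tolerance_um(5, 16, 200, 0.5, 100))  # 312.5
```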

Related

Will bad camera calibration affect pixel coordinates?

I am working with TurtleBots and ROS, and using a camera to find the pixel positions of a marker in the camera image. I've moved over from simulation to a physical system. The issue I'm having is that the pixel positions in my physical system do not match the pixel positions in the simulated system, even though the marker and everything else are in the same positions as in the simulation. There was a shift in the vertical pixel position of about 40 pixels, while everything else (the height between the camera and marker, the marker position, and the distance between the marker and camera) was the same in both the physical and the simulated system. The simulated system does not need a camera calibration matrix; it is assumed to be ideal.
The resolution I'm using is 640x480, so I expected the principal point to be at cx=320 and cy=240. In the camera calibration matrix I was using for the physical system, cx was around 318, which is pretty close, but cy was around 202, which is far from what I expected. I also noticed that the vertical shift in pixel positions is about the same number of pixels as this error in cy.
So is it right to assume that the error in the center pixel in the calibration could be causing the error in the pixel positions?
I have been trying to calibrate a USB camera (Logitech C920 I think) and I've been using the camera_calibrator ROS package found here http://wiki.ros.org/camera_calibration to calibrate the camera. I think the camera calibration did not go that well, seeing as I always have a pretty big error in either cx or cy. Here are the calibration matrices.
First calibration matrix, used 15x10 vertices with size 0.25
Recalibrated but did not actually use this yet, calibrated with 8x6 size 0.25
Same as previous, some difference between the two
The checkerboards were printed on A4 paper.
Thanks in advance.
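One way to check the reasoning in the question is to project the same 3D point with two pinhole intrinsic matrices that differ only in cx and cy. This is a minimal sketch with no lens distortion; the focal length (640 px) and the point coordinates are made-up values, not taken from the question.

```python
def project(point_cam, fx, fy, cx, cy):
    """Ideal pinhole projection of a 3D point given in camera coordinates."""
    X, Y, Z = point_cam
    return fx * X / Z + cx, fy * Y / Z + cy


point = (0.25, 0.125, 2.0)  # hypothetical marker position in metres
ideal = project(point, 640, 640, 320, 240)  # simulated camera
real = project(point, 640, 640, 318, 202)   # physical camera's calibration

print(ideal)               # (400.0, 280.0)
print(real)                # (398.0, 242.0)
print(ideal[1] - real[1])  # 38.0
```

For any point, the vertical offset is exactly the difference in cy, so a 38 px difference in the principal point shows up as the ~40 px shift described. Whether 202 is the real principal point or the result of a poor calibration cannot be decided from this alone.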
I believe the answer to your question comes down to how to perform a better camera calibration.
Quoting from Calib.io:
Choose the right size calibration target.
Perform calibration at the approximate working distance (WD) of your final application.
The target should have a high feature count.
Collect images from different areas and tilts.
Use good lighting.
Calibration is only as accurate as the calibration target used. Use laser or inkjet printed targets only to validate and test.
Per sample, proper mounting of calibration target and camera.
Remove bad observations. Carefully inspect reprojection errors.
Obtaining a low re-projection error does not equal a good camera calibration. Be careful of overfitting.
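To act on "Remove bad observations. Carefully inspect reprojection errors.", one option is to compute an RMS reprojection error per view and flag the outliers. This is a minimal sketch: in OpenCV the projected points would typically come from cv2.projectPoints using the per-view rotation and translation returned by cv2.calibrateCamera, whereas here both point lists are plain (u, v) tuples, and the "2x the median" threshold is an arbitrary assumption, not a Calib.io rule.

```python
import math
import statistics


def rms_error(observed, projected):
    """RMS pixel distance between detected and reprojected corners of one view."""
    squared = [(ou - pu) ** 2 + (ov - pv) ** 2
               for (ou, ov), (pu, pv) in zip(observed, projected)]
    return math.sqrt(sum(squared) / len(squared))


def flag_bad_views(per_view_rms, factor=2.0):
    """Indices of views whose RMS error exceeds factor x the median (heuristic)."""
    median = statistics.median(per_view_rms)
    return [i for i, err in enumerate(per_view_rms) if err > factor * median]


print(flag_bad_views([0.21, 0.25, 0.19, 0.9]))  # [3]
```

Flagged views are candidates for re-inspection or removal before recalibrating; as the last tip warns, a low error after removal does not by itself prove a good calibration.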

Is there a way to find mm per pixel value for a camera?

I need to implement dimensional inspection of an object with a tolerance of 20 microns using image processing. To measure the dimension in mm, I need the mm-per-pixel value for pixel-to-mm conversion.
Camera and lens Specifications:
5 MP Matrix vision camera (2592 x 1944)
25 mm lens
How I tried to do it:
I used a 30 cm ruler to get the actual field of view in mm covered by the camera. I plotted the image (loaded with OpenCV) using Matplotlib, as shown in the figure.
Image for scaling
From the image I got 31 mm as the actual width covered by the camera, and the camera resolution is 2592 x 1944. So I obtained mm/pixel = 31/2592 = 0.011959876.
But I want to know whether using a centimeter scale is the correct way to find the mm/pixel value, especially when a tolerance of 20 microns is needed in dimensional inspection. If it is not, a procedure for finding the mm/pixel value would be really helpful.
I believe what you are doing is really borderline. First of all, to be as precise as possible I would use the right (or left) edge of the leftmost and rightmost ruler ticks, as I sketched here:
and then use the pixel distance between them to calculate the mm/pixel calibration value. Even with this method, 20 µm is really tough to achieve. Let's say we can determine the ruler tick edge position with a precision of 2 pixels (very optimistic); then you would have an error of about 31 mm/2580 × 2, which is about 25 µm.
If you really need 20 µm calibration precision, I would go for a microscope calibration target. I have always used one of those for this kind of calibration task.
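The arithmetic in this answer, as a small sketch. The 2580 px tick-to-tick span and the 2 px edge uncertainty are the illustrative values used above; the function names are hypothetical.

```python
def mm_per_pixel(span_mm, span_px):
    """Scale factor from a known physical span and its length in pixels."""
    return span_mm / span_px


def span_error_mm(span_mm, span_px, edge_uncertainty_px):
    """Error over the full span if the measured pixel span is off by edge_uncertainty_px."""
    return mm_per_pixel(span_mm, span_px) * edge_uncertainty_px


print(mm_per_pixel(31, 2592))              # ~0.01196 mm/pixel over the full sensor width
print(span_error_mm(31, 2580, 2) * 1000)   # ~24 µm, already above the 20 µm tolerance
```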
20 microns over a field of view of 31 mm (31000 µm) corresponds to 1.7 pixels, so your measurement error must be smaller than that. This is a stringent requirement, and your ruler and manual procedure are not appropriate.
In the first place, you should check the magnitude of the lens distortion, which could very well exceed these 1.7 pixels. You will need a precise calibration procedure that can fit a deformation model to the image. For this purpose you should use a certified calibration target such as a grid of dots or a chessboard pattern.
At the same time as the calibration software measures and compensates the distortion, it will provide the scale factor between physical units (knowing the grid spacing) and pixels. You can measure feature locations on the target by blob analysis or gauging techniques, then use least-squares fitting of a model.
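The 1.7 pixel figure follows from dividing the tolerance by the size of one pixel in object space. A one-line sketch, assuming the 31 mm field spans the full 2592 px sensor width (the function name is hypothetical):

```python
def tolerance_in_pixels(tolerance_um, fov_um, n_pixels):
    """How many pixels a physical tolerance spans across the field of view."""
    return tolerance_um / (fov_um / n_pixels)


print(tolerance_in_pixels(20, 31000, 2592))  # ~1.67 pixels
```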
Software packages made for machine vision applications do contain such tools.
Also be aware that there can be a bias in the dimensional measurement of the object due to mis-location of the edges. Simply moving the light source can result in variations of the measured size.
If your objects are always the same and at the same place in the field of view, a cheap solution is to establish a repeatable measurement procedure in pixels, and physically measure one of the parts. This will give you a scale factor valid in the same conditions.
But simply moving the object will have a noticeable effect, both by changing the light reflection/shadows on edges and by having a different distortion.

Measure distance to object with a single camera in a static scene

Let's say I am placing a small object on a flat floor inside a room.
First step: Take a picture of the room floor from a known, static position in the world coordinate system.
Second step: Detect the bottom edge of the object in the image and map the pixel coordinate to the object position in the world coordinate system.
Third step: Measure the real distance to the object with a measuring tape.
I could move the small object, repeat these three steps for every pixel coordinate and create a lookup table (key: pixel coordinate; value: distance). This procedure is accurate enough for my use case. I know that it is problematic if there are multiple objects (one object could hide another).
My question: Is there an easier way to create this lookup table? Accidentally changing the camera angle by a few degrees destroys the hard work. ;)
Maybe it is possible to execute the three steps for a few specific pixel coordinates or positions in the world coordinate system and perform some "calibration" to calculate the distances with the computed parameters?
If the floor is flat, its equation is that of a plane; let it be
a·x + b·y + c·z = 1
in camera coordinates (the origin is the optical center of the camera, XY is parallel to the focal plane and Z is the viewing direction).
Then the ray from the camera center through the image point with pixel coordinates (u, v), measured from the principal point, is given by
t·(u, v, f), t > 0,
where f is the focal length in pixels.
The ray hits the plane when
(a·u + b·v + c·f)·t = 1,
i.e. at the point
(u, v, f) / (a·u + b·v + c·f).
Finally, the distance from the camera to the point is
p = √(u² + v² + f²) / (a·u + b·v + c·f).
This is the function that you need to tabulate. Assuming that f is known, you can determine the unknown coefficients a, b, c by taking three non-aligned points and measuring their image coordinates (u, v) and distances p. Each point gives the equation a·u + b·v + c·f = √(u² + v² + f²) / p, which is linear in a, b, c, so three points give a 3x3 system of linear equations.
From the last equation, you can then estimate the distance for any point of the image.
The focal length can be measured (in pixels) by looking at a target of known size at a known distance. By proportionality, the ratio of the distance to the size equals the ratio of f to the length of the target in the image.
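A minimal pure-Python sketch of this answer's procedure: estimate f from a target of known size, fit a, b, c from three measured points, then evaluate p for any pixel. It assumes (u, v) are pixel offsets from the principal point and ignores lens distortion; the function names and the synthetic numbers (a level camera with the floor 1500 mm below it, image y pointing down) are hypothetical.

```python
import math


def focal_from_target(size_px, distance, size):
    """f in pixels, by proportionality: distance / size = f / size_px."""
    return size_px * distance / size


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(samples, f):
    """Solve a*u + b*v + c*f = sqrt(u^2 + v^2 + f^2) / p for (a, b, c).

    samples: three (u, v, p) tuples from non-aligned image points.
    Uses Cramer's rule on the 3x3 linear system.
    """
    A = [[u, v, f] for u, v, _ in samples]
    rhs = [math.sqrt(u * u + v * v + f * f) / p for u, v, p in samples]
    d = _det3(A)
    if abs(d) < 1e-9:
        raise ValueError("image points are (nearly) aligned")
    coeffs = []
    for col in range(3):
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][col] = rhs[r]
        coeffs.append(_det3(Ai) / d)
    return tuple(coeffs)


def distance_to_floor_point(u, v, f, abc):
    """p = sqrt(u^2 + v^2 + f^2) / (a*u + b*v + c*f)."""
    a, b, c = abc
    return math.sqrt(u * u + v * v + f * f) / (a * u + b * v + c * f)


f = focal_from_target(size_px=250, distance=2000, size=500)  # 1000.0 px
true_abc = (0.0, 1 / 1500, 0.0)  # synthetic plane used to fake measurements
samples = [(u, v, distance_to_floor_point(u, v, f, true_abc))
           for u, v in [(0, 100), (150, 60), (-200, 80)]]
abc = fit_plane(samples, f)  # recovers true_abc up to rounding
print(distance_to_floor_point(50, 90, f, abc))
```

In practice the three (u, v, p) samples would come from the tape measurements described in the question rather than from a known plane.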
Most vision libraries (including OpenCV) have built-in functions that take a few points from the camera reference frame and the corresponding points on a Cartesian plane and generate the warp matrix for you (an affine transformation, or a perspective transformation when the camera views the floor at an angle). Some are fancy enough to include non-linear mappings given enough input points, but that brings you back to your time-to-calibrate issue.
A final note: most vision libraries calibrate off some type of grid, i.e. a checkerboard pattern. If you wrote your calibration to work off such a sheet, you would only need to measure the distance to one target object, as the transformations would be calculated from the sheet and the target would just provide the world offsets.
I believe what you are after is called a Projective Transformation. The link below should guide you through exactly what you need.
Demonstration of calculating a projective transformation with proper math typesetting on the Math SE.
Although you can solve this by hand and write the result into your code, I strongly recommend using a matrix math library, or even writing your own matrix math functions, rather than hand-calculating the equations: you would have to solve them symbolically to turn them into code, which is lengthy and prone to miscalculation.
Here are just a few tips that may help you with clarification (applying it to your problem):
-Your A matrix (source) is built from the 4 xy points in your camera image (pixel locations).
-Your B matrix (destination) is built from your measurements in the real world.
-For fast recalibration, I suggest marking points on the ground to be able to quickly place the cube at the 4 locations (and subsequently get the altered pixel locations in the camera) without having to remeasure.
-You will only have to do steps 1-5 (once) during calibration, after that whenever you want to know the position of something just get the coordinates in your image and run them through step 6 and step 7.
-You will want your calibration points to be as far away from each other as possible (within reason: at extreme distances, in a vanishing-point situation, you start rapidly losing pixel density and therefore source image accuracy). Make sure that no 3 points are collinear (simply put, make your 4 points approximately square at almost the full span of your camera FOV in the real world).
P.S. I apologize for not writing this out here, but the Math SE has proper math typesetting and it looks much cleaner!
Final steps to applying this method to this situation:
In order to perform this calibration, you will have to set a global home position (it is likely easiest to pick this arbitrarily on the floor and measure your camera position relative to that point). From this position, you will need to measure your object's distance in both the x and y directions on the floor. Although a more tightly packed calibration set will give you more error, the easiest solution may simply be a sheet of known dimensions (I am thinking of a piece of printer paper, a large board or something similar). This is easier because it has built-in axes: the two sides are orthogonal, and you just use the four corners of the sheet with their known distances in your calibration. E.g. for a US letter sheet (8.5 x 11 in), your points would be (0,0), (0,8.5), (11,8.5), (11,0).
Using those points and the corresponding pixel coordinates creates your transform matrix, but that still only gives you a global x,y position on axes that may be hard to measure along (they may be skewed, depending on how you measured/calibrated). So you will need to calculate your camera offset:
object in real world coords (from steps above): x1, y1
camera coords (Xc, Yc)
dist = sqrt( pow(x1-Xc,2) + pow(y1-Yc,2) )
If it is too cumbersome to try to measure the position of the camera from global origin by hand, you can instead measure the distance to 2 different points and feed those values into the above equation to calculate your camera offset, which you will then store and use anytime you want to get final distance.
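Recovering the camera position (Xc, Yc) from its measured floor distances to two known points amounts to intersecting two circles. A minimal sketch, assuming the reference points and distances are in the same floor coordinates and units; it returns up to two candidates, and you keep the one on the side of the line where the camera actually is. The function name is hypothetical.

```python
import math


def camera_position_from_two_distances(p1, d1, p2, d2):
    """Intersect circle(p1, d1) with circle(p2, d2); return 0, 1 or 2 points."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    base = math.hypot(dx, dy)
    if base == 0 or base > d1 + d2 or base < abs(d1 - d2):
        return []  # coincident reference points, or the circles do not meet
    # Distance from p1 along the baseline to the chord joining the intersections.
    a = (d1 * d1 - d2 * d2 + base * base) / (2 * base)
    h = math.sqrt(max(d1 * d1 - a * a, 0.0))
    mx, my = x1 + a * dx / base, y1 + a * dy / base
    if h == 0:
        return [(mx, my)]
    # Offset perpendicular to the baseline, on either side.
    ox, oy = -dy * h / base, dx * h / base
    return [(mx + ox, my + oy), (mx - ox, my - oy)]


print(camera_position_from_two_distances((0, 0), 5, (6, 0), 5))
# [(3.0, 4.0), (3.0, -4.0)]
```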
As already mentioned in the previous answers, you'll need a projective transformation, or simply a homography. However, I'll consider it from a more practical point of view and try to keep it short and simple.
So, given the proper homography you can warp your picture of a plane such that it looks like you took it from above (like here). Even simpler you can transform a pixel coordinate of your image to world coordinates of the plane (the same is done during the warping for each pixel).
A homography is basically a 3x3 matrix, and you transform a coordinate by multiplying it with the matrix. You may now think: wait, a 3x3 matrix and 2D coordinates? You'll need to use homogeneous coordinates.
However, most frameworks and libraries will do this handling for you. What you need to do is find (at least) four points (x/y coordinates) on your world plane/floor (preferably the corners of a rectangle, aligned with your desired world coordinate system), take a picture of them, measure their pixel coordinates and pass both sets to the "find homography" function of your computer vision or math library.
In OpenCV that would be findHomography (here is an example); the function perspectiveTransform then performs the actual transformation.
In Matlab you can use something from here. Make sure you are using a projective transformation as transform type. The result is a projective tform, which can be used in combination with this method, in order to transform your points from one coordinate system to another.
In order to transform into the other direction you just have to invert your homography and use the result instead.
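For readers without OpenCV at hand, here is a dependency-free sketch of what a homography fit does in the minimal case of exactly four point pairs (cv2.findHomography additionally handles more points with least squares and robust methods such as RANSAC). It fixes h33 = 1, solves the resulting 8x8 linear system, and maps points through H in homogeneous coordinates. The function names and the pixel coordinates in the usage lines are made up. Fitting with src and dst swapped gives the mapping in the other direction, equivalent to inverting H.

```python
def solve_linear(A, b):
    """Solve A x = b (n x n) by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate points (e.g. three of them collinear)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - acc) / M[r][r]
    return x


def homography_from_4(src, dst):
    """3x3 H (with h33 = 1) mapping each src (x, y) to the matching dst (X, Y)."""
    A, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        # X * (h6*x + h7*y + 1) = h0*x + h1*y + h2, and likewise for Y.
        A.append([x, y, 1, 0, 0, 0, -x * X, -y * X])
        b.append(X)
        A.append([0, 0, 0, x, y, 1, -x * Y, -y * Y])
        b.append(Y)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(H, x, y):
    """Map (x, y) through H using homogeneous coordinates."""
    X = H[0][0] * x + H[0][1] * y + H[0][2]
    Y = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return X / w, Y / w


# Hypothetical pixel positions of the four corners of a US letter sheet.
pixels = [(102, 405), (118, 210), (530, 198), (560, 398)]
sheet_in = [(0, 0), (0, 8.5), (11, 8.5), (11, 0)]
H = homography_from_4(pixels, sheet_in)      # image -> floor (inches)
print(apply_h(H, 330, 300))                  # floor position of one pixel
H_inv = homography_from_4(sheet_in, pixels)  # floor -> image
```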

Camera projection for lines orthogonal to camera z-axis

I'm working on an object tracking application using OpenCV. I want to convert my pixel coordinates to world coordinates to get more meaningful information. I have read a lot about computing the perspective transform matrix, and I know about cv2.solvePnP. But I feel like my case should be special, because I'm tracking a runner on a track and field runway, with the runway orthogonal to the camera's z-axis. I will set up the camera to ensure this.
If I just pick two points on the runway edge, I can calculate a linear conversion from pixels to world coords at that specific height (ground level) and distance from the camera (i.e. along that line). Then I reason that the runner will run on a line parallel to the runway at a different height and slightly different distance from the camera, but the lines should still be parallel in the image, because they will both be orthogonal to the camera z-axis. With all those constraints, I feel like I shouldn't need the normal number of points to track the runner on that particular axis. My gut says that 2-3 should be enough. Can anyone help me nail down the method here? Am I completely off track? With both height and distance from camera essentially fixed, shouldn't I be able to work with a much smaller set of correspondences?
Thanks, Bill
So, I think I've answered this one myself. It's true that only two correspondence points are needed given the following assumptions.
Assume:
World coordinates are set up with the X-axis and Z-axis parallel to the ground plane (Y is vertical, so ground points have Y=0). The X-axis is parallel to the runway.
Camera is translated and possibly rotated about the X-axis (angled downward), but there is no rotation around the Y-axis (camera plane parallel to the runway and X-axis) or the Z-axis (camera is level with respect to the ground).
Camera intrinsic parameters are known from camera calibration.
Method:
Pick two points in the ground plane with known coordinates in world and image, for example two points on the runway edge as mentioned in the original post. The line connecting the points in world coordinates should not be parallel to either the X or the Z axis.
Since Y=0 for these points, ignore the second column of the rotation/translation matrix, reducing the projection to a planar homography (a 3x3 matrix). Now we have 9 unknown entries.
The rotation assumptions enforce a certain form on the rotation/translation matrix: the first column and first row of the rotation part are (1,0,0). This further reduces the number of unknowns in the matrix to 5.
Constrain the remaining rotation entries (the second column of the ground-plane matrix) such that cos²θ + sin²θ = 1. This reduces the number of unknowns to only 4. Two correspondence points give us the 4 equations we need to calculate the homography matrix for the ground plane.
Factor out the camera intrinsic parameter matrix from the homography matrix, leaving the rotation/translation matrix for the ground plane.
Due to the rotation assumptions made earlier, the ignored column of the rotation/translation matrix can be easily constructed from the third column of the same matrix, which is the second column in the ground plane homography matrix.
Multiply back out with the camera intrinsic parameters to arrive at the final universal projection matrix (from only 2 correspondence points!)
My test implementation has worked quite well. Of course, it's sensitive to the accuracy of the two correspondence points provided, but that's kind of a given.
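The same four unknowns (tilt θ and translation t) can also be solved in closed form. This sketch eliminates them directly instead of building and factoring the ground-plane homography, so treat it as an illustration under the stated assumptions, not as the poster's implementation: world Y is vertical and ground points are (X, 0, Z); the camera is rotated only about X with R = [[1, 0, 0], [0, cos θ, -sin θ], [0, sin θ, cos θ]] (this sign convention is an assumption); lens distortion is ignored; fx, fy, cx, cy come from calibration. The function name is hypothetical. It generally yields two candidates; the depth check removes poses with points behind the camera, and any remaining ambiguity has to be resolved with prior knowledge such as the approximate tilt.

```python
import math


def pose_from_two_ground_points(pix1, world1, pix2, world2, fx, fy, cx, cy):
    """Candidate (theta, (tx, ty, tz)) such that P_cam = R(theta) P_world + t.

    pix1, pix2: (u, v) pixel coordinates of two ground points.
    world1, world2: their (X, Z) ground coordinates (world Y = 0).
    """
    # Normalized image coordinates (lens distortion ignored).
    x1, y1 = (pix1[0] - cx) / fx, (pix1[1] - cy) / fy
    x2, y2 = (pix2[0] - cx) / fx, (pix2[1] - cy) / fy
    (X1, Z1), (X2, Z2) = world1, world2
    # Per point: x*(c*Z + tz) = X + tx and y*(c*Z + tz) = -s*Z + ty.
    # Eliminating tx, ty and tz leaves A*cos(theta) + B*sin(theta) = C.
    A = (y1 - y2) * (x1 * Z1 - x2 * Z2) + (x1 - x2) * (y2 * Z2 - y1 * Z1)
    B = (x1 - x2) * (Z2 - Z1)
    C = (y1 - y2) * (X1 - X2)
    R = math.hypot(A, B)
    if R == 0 or abs(C) > R * (1 + 1e-12):
        return []  # degenerate point placement or inconsistent data
    phi = math.atan2(B, A)
    delta = math.acos(max(-1.0, min(1.0, C / R)))
    thetas = [phi + delta] if delta == 0 else [phi + delta, phi - delta]
    solutions = []
    for theta in thetas:
        theta = math.atan2(math.sin(theta), math.cos(theta))  # wrap to (-pi, pi]
        c, s = math.cos(theta), math.sin(theta)
        # Recover tz from whichever image-coordinate difference is larger.
        if abs(x1 - x2) >= abs(y1 - y2):
            tz = (X1 - X2 - c * (x1 * Z1 - x2 * Z2)) / (x1 - x2)
        else:
            tz = (c * (y2 * Z2 - y1 * Z1) + s * (Z2 - Z1)) / (y1 - y2)
        tx = x1 * (c * Z1 + tz) - X1
        ty = y1 * (c * Z1 + tz) + s * Z1
        # Keep only poses with both points in front of the camera.
        if c * Z1 + tz > 0 and c * Z2 + tz > 0:
            solutions.append((theta, (tx, ty, tz)))
    return solutions
```

Once θ and t are known, the full projection matrix follows as K [R | t], matching the last step of the method.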

soft binning in SIFT

According to Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110:
"It is important to avoid all boundary affects in which the descriptor abruptly changes as a sample shifts smoothly from being within one histogram to another or from one orientation to another. Therefore, trilinear interpolation is used to distribute the value of each gradient sample into adjacent histogram bins. In other words, each entry into a bin is multiplied by a weight of 1−d for each dimension, where d is the distance of the sample from the central value of the bin as measured in units of the histogram bin spacing."
I am calculating the orientation t and the location (x, y) of each gradient, which are floating-point values. Currently, I just add the gradient magnitude to the 3D histogram bin values[t][x][y], where t, x and y are rounded down to the nearest integer. But according to the paper, I have to distribute the gradient magnitude to adjacent bins, and I am not sure how to distribute it.
I found my answer at the following link:
HOG Trilinear Interpolation of Histogram Bins
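For reference, here is a minimal sketch of the weighting described in the quote: each sample adds magnitude × (1 − d) per dimension to the 8 surrounding bins of a 3D histogram. It assumes the caller has already converted t, x and y into continuous bin coordinates in which integer values are bin centres, that orientation wraps around while spatial contributions falling outside the grid are dropped, and that the histogram is a nested list indexed hist[t][x][y]. These conventions are assumptions, not necessarily Lowe's exact implementation.

```python
import math


def add_trilinear(hist, t, x, y, magnitude):
    """Distribute magnitude over the 8 bins around continuous position (t, x, y).

    Each neighbouring bin gets magnitude * (1 - d) per dimension, where d is
    the distance from the sample to that bin's centre in bin units.
    """
    n_t, n_x, n_y = len(hist), len(hist[0]), len(hist[0][0])
    t0, x0, y0 = math.floor(t), math.floor(x), math.floor(y)
    dt, dx, dy = t - t0, x - x0, y - y0
    for it, wt in ((t0, 1 - dt), (t0 + 1, dt)):
        it %= n_t  # orientation is circular
        for ix, wx in ((x0, 1 - dx), (x0 + 1, dx)):
            if not 0 <= ix < n_x:
                continue  # spatial bin outside the descriptor grid
            for iy, wy in ((y0, 1 - dy), (y0 + 1, dy)):
                if not 0 <= iy < n_y:
                    continue
                hist[it][ix][iy] += magnitude * wt * wx * wy


n_orient, n_cells = 8, 4
hist = [[[0.0] * n_cells for _ in range(n_cells)] for _ in range(n_orient)]
add_trilinear(hist, 0.25, 1.5, 2.0, 1.0)
print(hist[0][1][2], hist[1][2][2])  # 0.375 0.125
```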
