Aruco scales coordinates wrong - opencv

I am using the (newly released) ArUco 2.0.7 to track some markers.
The camera that I am using is mounted to the ceiling facing down, so I only need the x and y coordinates.
It can view an area of 2.6 m by 1.5 m. If I understand the documentation correctly, I supply the side length of the markers in an arbitrary unit, and the pose output will be in the same unit.
The markers have a side length of 19.5 cm. As I want my result in meters, I have that value set to 0.195.
However, the results I obtain are not correct. If I place the markers right in the corners of the field of view of the camera, they are not at the corresponding expected x and y coordinates.
I am placing the global origin on one of the corners of the field of view, e.g. (0,0) is the bottom left corner. This is done by transforming all incoming positions into that marker's coordinate system using the matrix transforms obtained by getRTMatrix().
Everything seems to be working, except the x and y coordinates are in a wrong unit or scaled. The rotation works perfectly.
Am I missing something? Or can I not expect a good accuracy? The error is significant, e.g. when it should be (2.6,1.5), it is displayed as (1.8, 1), which is roughly an error of 33%.

After some more thought, I figured out that my camera had simply been calibrated with a smaller distance between the calibration board and the lens than my use case requires.
This caused the distortion coefficients to be wrong, which gave me a bogus scale.
I re-calibrated using the aruco_calibration tool and am now accurate to roughly 3 or 4 cm, which is good enough for me.
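The change of origin described in the question (expressing every marker position in the coordinate system of one reference marker) can be sketched as follows. This is only an illustration in plain Python, assuming each pose is available as a 4x4 rigid transform from marker to camera coordinates, such as the matrix the question obtains from getRTMatrix(). The helper names and the numeric poses are hypothetical.

```python
import math

def invert_rigid(T):
    """Invert a 4x4 rigid transform [R t; 0 1] given as nested lists."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]   # R transposed
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def pose_in_reference(T_cam_ref, T_cam_marker):
    """Pose of a marker expressed in the reference marker's frame."""
    return matmul4(invert_rigid(T_cam_ref), T_cam_marker)

def make_pose(yaw_deg, tx, ty, tz):
    """Hypothetical helper: rotation about z by yaw_deg plus a translation."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

# Made-up poses: reference marker in one corner of the 2.6 m x 1.5 m view,
# second marker in the opposite corner, camera 3 m above the floor.
T_cam_ref = make_pose(0.0, -1.3, -0.75, 3.0)
T_cam_marker = make_pose(90.0, 1.3, 0.75, 3.0)

T_ref_marker = pose_in_reference(T_cam_ref, T_cam_marker)
x, y = T_ref_marker[0][3], T_ref_marker[1][3]
print(round(x, 3), round(y, 3))
```

If the translations reported by the pose estimation are scaled incorrectly (for example because of a poor calibration, as in the answer above), the relative position computed this way inherits the same scale error, while the rotation part stays unaffected.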

Related

OpenCV - Determine detected objects angle from "center" of frame

I am working on a machine vision project and need to determine the angle of an object in x and y relative to the center of the frame (center, in my mind, being where the camera is pointed). I originally did NOT do a camera calibration; instead I calculated the angle per pixel by taking a picture of a dense grid and doing some simple math. While doing some object tracking I noticed some strange behaviour, which I suspected was due to distortion. I also noticed that an object that should have been dead center in my frame was not: the camera had to be shifted, or its angle changed, for that to be true.
I performed a calibration in OpenCV and got a principal point of (363.31, 247.61) at a resolution of 640x480. The angle per pixel obtained from cv2.calibrationMatrixValues() was very close to what I had calculated, but up to this point I had been assuming that the center of the frame was at (640/2, 480/2). I'm hoping that someone can confirm: going forward, do I assume that my (0,0) in Cartesian coordinates is now at the principal point? Perhaps I can use my new camera matrix to correct the image so that my original assumption is true? Or am I out to lunch and in need of some direction on how to achieve this?
Also, was my assumption of 640/2 correct, or should it technically have been (640-1)/2? Thanks all!
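Under the usual pinhole camera model the optical axis passes through the principal point, so angles are normally measured from there rather than from the geometric center of the frame. A minimal sketch of that computation follows. The principal point is the one quoted above; the focal lengths FX and FY are hypothetical placeholders, and the pixel is assumed to be already undistorted.

```python
import math

# Principal point from the calibration quoted in the question.
CX, CY = 363.31, 247.61
# Hypothetical focal lengths in pixels (not given in the question).
FX, FY = 600.0, 600.0

def pixel_to_angles(u, v, fx=FX, fy=FY, cx=CX, cy=CY):
    """Angles (degrees) between the optical axis and the ray through an
    undistorted pixel (u, v), horizontally and vertically."""
    angle_x = math.degrees(math.atan2(u - cx, fx))
    angle_y = math.degrees(math.atan2(v - cy, fy))
    return angle_x, angle_y

# The principal point itself lies on the optical axis.
angles_at_principal_point = pixel_to_angles(CX, CY)
# The geometric frame center (320, 240) is slightly off-axis for this camera.
angles_at_frame_center = pixel_to_angles(320.0, 240.0)
print(angles_at_principal_point, angles_at_frame_center)
```

Note that the angle is not exactly proportional to the pixel offset: a constant "angle per pixel" is only a good approximation near the optical axis.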

Overhead camera's pose estimation with OpenCV SolvePnP results in a few centimeter off height

I'd like to get the pose (translation: x, y, z and rotation: Rx, Ry, Rz in World coordinate system) of the overhead camera. I got many object points and image points by moving the ChArUco calibration board with a robotic arm (like this https://www.youtube.com/watch?v=8q99dUPYCPs). Because of that, I already have exact positions of all the object points.
In order to feed many points to solvePnP, I set the first detected pattern (ChArUco board) as the first object and used it as the object coordinate system's origin. Then, I added the detected object points (from the second pattern to the last) to the first detected object points' coordinate system (the origin of the object frame is the origin of the first object).
After I got the transformation between the camera and the object's coordinate frame, I calculated the camera's pose based on that transformation.
The result looked pretty good at first, but when I measured the camera's absolute pose with a ruler or a tape measure, I noticed that the extrinsic calibration result was around 15-20 millimeters off in the z direction (the height of the camera), though almost correct for the others (x, y, Rx, Ry, Rz). The result was the same even when I changed the range of the object points by moving the robotic arm differently; the height always ended up a few centimeters off.
Has anyone experienced a similar problem before? I'd like to know anything I can try. What is the common mistake when the depth direction (z) is inaccurate?
I don't know how you measure the z but I believe that what you're measuring with the ruler is not z but the euclidean distance which is computed like so:
d=std::sqrt(x*x+y*y+z*z);
As an example, take x=2; y=2; z=2;
then d ≈ 3.46, so d - z ≈ 1.46 is the kind of difference you would get between z and the ruler measurement, which could explain the 15-20 millimeter offset you reported for the z direction.

Measure distance to object with a single camera in a static scene

Let's say I am placing a small object on a flat floor inside a room.
First step: Take a picture of the room floor from a known, static position in the world coordinate system.
Second step: Detect the bottom edge of the object in the image and map the pixel coordinate to the object position in the world coordinate system.
Third step: By using a measuring tape measure the real distance to the object.
I could move the small object, repeat these three steps for every pixel coordinate, and create a lookup table (key: pixel coordinate; value: distance). This procedure is accurate enough for my use case. I know that it is problematic if there are multiple objects (one object could cover another).
My question: Is there an easier way to create this lookup table? Accidentally changing the camera angle by a few degrees destroys the hard work. ;)
Maybe it is possible to execute the three steps for a few specific pixel coordinates or positions in the world coordinate system and perform some "calibration" to calculate the distances with the computed parameters?
If the floor is flat, its equation is that of a plane. Let it be
a.x + b.y + c.z = 1
in the camera coordinates (the origin is the optical center of the camera, XY forms the focal plane and Z the viewing direction).
Then a ray from the camera center to a point on the image at pixel coordinates (u, v) is given by
(u, v, f).t
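where f is the focal length (in pixels) and (u, v) are measured from the principal point, so that (0, 0) lies on the optical axis.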
The ray hits the plane when
(a.u + b.v + c.f) t = 1,
i.e. at the point
(u, v, f) / (a.u + b.v + c.f)
Finally, the distance from the camera to the point is
p = √(u² + v² + f²) / (a.u + b.v + c.f)
This is the function that you need to tabulate. Assuming that f is known, you can determine the unknown coefficients a, b, c by taking three non-aligned points, measuring the image coordinates (u, v) and the distances, and solving a 3x3 system of linear equations.
From the last equation, you can then estimate the distance for any point of the image.
The focal distance can be measured (in pixels) by looking at a target of known size, at a known distance. By proportionality, the ratio of the distance over the size is f over the length in the image.
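The calibration described above (three measured distances, then a 3x3 linear solve for a, b, c) can be sketched in plain Python as follows. It assumes that (u, v) are pixel coordinates relative to the principal point and that f is known in pixels. The function names, the focal length, and the "true" plane used to generate the sample measurements are all made up for illustration.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def distance(u, v, f, plane):
    """p = sqrt(u^2 + v^2 + f^2) / (a.u + b.v + c.f)"""
    a, b, c = plane
    return math.sqrt(u * u + v * v + f * f) / (a * u + b * v + c * f)

def fit_plane(samples, f):
    """Solve for (a, b, c) from three (u, v, measured_distance) samples.

    Each sample gives one linear equation:
        a.u + b.v + c.f = sqrt(u^2 + v^2 + f^2) / p
    The 3x3 system is solved with Cramer's rule.
    """
    rows = [[u, v, f] for u, v, _ in samples]
    rhs = [math.sqrt(u * u + v * v + f * f) / p for u, v, p in samples]
    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("the three calibration points must not be aligned")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][col] = rhs[i]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)

# Made-up setup: focal length in pixels and a plane used to simulate
# the three tape-measure readings.
F = 800.0
TRUE_PLANE = (0.0, 0.5, 0.1)
pixels = [(100.0, 200.0), (-150.0, 250.0), (50.0, 300.0)]
samples = [(u, v, distance(u, v, F, TRUE_PLANE)) for u, v in pixels]

fitted = fit_plane(samples, F)
# Distance for a pixel that was not part of the calibration.
estimated = distance(0.0, 220.0, F, fitted)
print(fitted, estimated)
```

With real measurements, using more than three points and a least-squares fit would likely be more robust to tape-measure and pixel-picking errors.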
Most vision libraries (including OpenCV) have built-in functions that take a few points from the camera reference frame and the corresponding points on a Cartesian plane and generate the warp matrix (affine transformation) for you. (Some are fancy enough to include non-linear mappings given enough input points, but that brings you back to your time-to-calibrate issue.)
A final note: most vision libraries use some type of grid to calibrate off of, i.e. a checkerboard pattern. If you wrote your calibration to work off of such a sheet, then you would only need to measure distances to one target object, as the transformations would be calculated from the sheet and the target would just provide the world offsets.
I believe what you are after is called a Projective Transformation. The link below should guide you through exactly what you need.
Demonstration of calculating a projective transformation with proper math typesetting on the Math SE.
Although you can solve this by hand and write that into your code... I strongly recommend using a matrix math library or even writing your own matrix math functions prior to resorting to hand calculating the equations as you will have to solve them symbolically to turn it into code and that will be very expansive and prone to miscalculation.
Here are just a few tips that may help you with clarification (applying it to your problem):
-Your A matrix (source) is built from the 4 xy points in your camera image (pixel locations).
-Your B matrix (destination) is built from your measurements in the real world.
-For fast recalibration, I suggest marking points on the ground to be able to quickly place the cube at the 4 locations (and subsequently get the altered pixel locations in the camera) without having to remeasure.
-You will only have to do steps 1-5 (once) during calibration, after that whenever you want to know the position of something just get the coordinates in your image and run them through step 6 and step 7.
-You will want your calibration points to be as far away from each other as possible (within reason: at extreme distances in a vanishing-point situation you start rapidly losing pixel density and therefore source image accuracy). Make sure that no 3 points are collinear (simply put, make your 4 points approximately square, at almost the full span of your camera FOV in the real world).
ps I apologize for not writing this out here, but they have fancy math editing and it looks way cleaner!
Final steps to applying this method to this situation:
In order to perform this calibration, you will have to set a global home position (likely easiest to do this arbitrarily on the floor and measure your camera position relative to that point). From this position, you will need to measure your object's distance in both x and y coordinates on the floor. Although a more tightly packed calibration set will give you more error, the easiest solution for this may simply be to use a dimensioned sheet (I am thinking of a piece of printer paper, a large board, or something similar). The reason this is easier is that it has built-in axes (i.e. the two sides are orthogonal, so you can just use the four corners of the object and use canned distances in your calibration). Example: for a piece of paper your points would be (0,0), (0,8.5), (11,8.5), (11,0).
So using those points and the pixels you get will create your transform matrix, but that still just gives you a global x,y position on axes that may be hard to measure on (they may be skew depending on how you measured/ calibrated). So you will need to calculate your camera offset:
object in real world coords (from steps above): x1, y1
camera coords (Xc, Yc)
dist = sqrt( pow(x1-Xc,2) + pow(y1-Yc,2) )
If it is too cumbersome to try to measure the position of the camera from global origin by hand, you can instead measure the distance to 2 different points and feed those values into the above equation to calculate your camera offset, which you will then store and use anytime you want to get final distance.
As already mentioned in the previous answers you'll need a projective transformation or simply a homography. However, I'll consider it from a more practical view and will try to summarize it short and simple.
So, given the proper homography you can warp your picture of a plane such that it looks like you took it from above (like here). Even simpler you can transform a pixel coordinate of your image to world coordinates of the plane (the same is done during the warping for each pixel).
A homography is basically a 3x3 matrix and you transform a coordinate by multiplying it with the matrix. You may now think, wait 3x3 matrix and 2D coordinates: You'll need to use homogeneous coordinates.
However, most frameworks and libraries will do this handling for you. What you need to do is find (at least) four points (x/y coordinates) on your world plane/floor (preferably the corners of a rectangle aligned with your desired world coordinate system), take a picture of them, measure the pixel coordinates, and pass both to the "find-homography function" of your computer vision or math library of choice.
In OpenCV that would be findHomography; here is an example (the method perspectiveTransform then performs the actual transformation).
In Matlab you can use something from here. Make sure you are using a projective transformation as transform type. The result is a projective tform, which can be used in combination with this method, in order to transform your points from one coordinate system to another.
In order to transform into the other direction you just have to invert your homography and use the result instead.
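To make the four-point procedure concrete, here is a small dependency-free sketch that estimates the homography from exactly four correspondences and then maps a pixel to floor coordinates. In practice a library routine such as OpenCV's findHomography would normally be used instead. The function names are hypothetical, the world points are the sheet-of-paper corners mentioned in an earlier answer, and the pixel coordinates are invented.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def find_homography(src, dst):
    """3x3 homography mapping four src (x, y) points onto four dst points.

    Assumes the bottom-right entry can be normalised to 1.
    """
    A, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y])
        b.append(X)
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y])
        b.append(Y)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(H, x, y):
    """Transform one point using homogeneous coordinates."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

# Invented pixel positions of the four sheet corners in the image ...
pixel_pts = [(210, 400), (250, 180), (520, 190), (580, 420)]
# ... and their floor coordinates (letter-size sheet, in inches).
world_pts = [(0, 0), (0, 8.5), (11, 8.5), (11, 0)]

H = find_homography(pixel_pts, world_pts)
mapped = [apply_homography(H, x, y) for x, y in pixel_pts]
print(mapped)
```

Swapping the two point lists in the call to find_homography gives the mapping in the opposite direction, which is equivalent to inverting the matrix.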

Find Distance between barcode and camera?

Is it possible to find the distance between the detected qr bar code (square) and the camera, if the size of the actual bar code and the (x,y) of all the corners of the bar code detected by the camera are known?
I want the method to work even if the camera is at an angle from the barcode.
I tried using a simple equation like f=d*z/D, where f is the focal length of the camera, D is the size of the object, d is the width of the detected object in pixels, and z is the distance between the camera and the barcode. First, I calculate the focal length by using data from a known distance and then get the z values accordingly.
While the above method works pretty well, it has a lot of error if the camera is at an angle.
Is there a better method to do this?
Also, I can use only one camera, using two cameras is not an option.
Use your current formula (which you state works well) against the longest side and its opposite, then average the results.
Alternatively, just average the lengths of the longest side and its opposite. Since z is inversely proportional to d, the two approaches agree closely as long as the two lengths are similar.
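The side-averaging suggestion can be sketched like this, reusing the question's relation f=d*z/D rearranged to z = f*D/d. The function names and the numbers in the example are hypothetical, and the four corners are assumed to be given in order around the code.

```python
import math

def side_lengths(corners):
    """Pixel lengths of the four sides of a quadrilateral whose corners
    are listed in order (clockwise or counter-clockwise)."""
    return [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]

def estimate_distance(corners, real_size, focal_px):
    """Distance from z = f * D / d, where d is the mean pixel length of
    the longest side and the side opposite to it."""
    sides = side_lengths(corners)
    longest = max(range(4), key=lambda k: sides[k])
    d = 0.5 * (sides[longest] + sides[(longest + 2) % 4])
    return focal_px * real_size / d

# Made-up example: a 5 cm code seen head-on as a 100 px square by a
# camera with an 800 px focal length.
corners = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
z = estimate_distance(corners, real_size=0.05, focal_px=800.0)
print(z)
```

This remains an approximation when the code is strongly tilted, since perspective foreshortening is only averaged out rather than modelled.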
First you have to know the camera angle.
If you cannot read that parameter from a device, you can estimate it from other measurements.
For example, you know that a barcode is rectangular. So by detecting it you obtain four corners, and from those you can estimate a homography matrix. Knowing the homography matrix, you can simplify your problem by multiplying the coordinates by the inverse homography.
The homography matrix is widely used in camera calibration when a known pattern, such as a chessboard, is presented.

Marker Tracking + perspective warp of marker

I'm tracking a marker with ARToolKit+. I receive a model view matrix that looks about right. Now I'd like to warp the image in a way that the marker looks just like it would look if I looked straight at it. But whatever I do, the result looks just extremely distorted. I know that ARToolKit stores the 4x4 matrix in column major order, so I fixed that for OpenCV.
What I tried so far was:
1) fix the order to row major order
2) calculate the inverse with cvInvert (although transposing the 3x3 rotation part + inverting the translation should suffice)
3) use that matrix with cvWarpPerspective
Am I doing something wrong?
tl;dr:
I want this: https://www.youtube.com/watch?v=qZ-LU-C2p2Q
I get some distorted lines and lots of black instead.
Your problem is in converting from 4x4 to 3x3. The short answer is that you want to drop the 3rd column and bottom row to make the 3x3 and then premultiply with your camera matrix. For a longer explanation see here
Clarification
The pose you get from ARTK represents a transform from one place to another. When I say "the initial image appears without rotation" I meant that your transform goes from an initial state which has no rotation about the x or y axis to the current state. That is a fine assumption for most augmented reality applications, I mentioned it just to be thorough.
As for why you can drop the 3rd column. Since you are transforming a plane, your z coordinate can be completely expressed by your x and y coordinates given the equation of your plane. If we assume that initially there is no rotation then your initial z coordinate is a constant value. If there is rotation then z is not constant but it varies deterministically in x and y according to its plane equation which can still be expressed in one matrix (though you don't need that). Since in your case your 4x4 transform is probably expressing the transform from the marker lying flat at z = 0 to its current position, the 3rd column of your 4x4 matrix does nothing (it all gets multiplied by 0) so it can be dropped without affecting the result.
In short: forget about the rotation stuff; it's more complicated than you need. Just realize that the transform is from initial coordinates to final coordinates and your initial coordinates are always
[x,y,0,1]
which makes your third column irrelevant.
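As a sanity check of the argument above, the following sketch builds the 3x3 matrix by dropping the third column and bottom row of a 4x4 pose and premultiplying by the camera matrix, then compares it with the full projection for points on the z = 0 plane. The camera matrix K, the pose T, and the function names are invented for illustration, and T is written in row-major order (a column-major matrix from ARToolKit would need transposing first).

```python
import math

def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def pose_to_homography(K, T):
    """H = K [r1 r2 t]: drop the 3rd column and bottom row of the 4x4
    pose T, then premultiply by the 3x3 camera matrix K."""
    reduced = [[T[r][0], T[r][1], T[r][3]] for r in range(3)]
    return matmul(K, reduced)

def project_full(K, T, x, y, z):
    """Reference projection of a 3D point using the full 3x4 pose."""
    p = matmul(matmul(K, T[:3]), [[x], [y], [z], [1.0]])
    return p[0][0] / p[2][0], p[1][0] / p[2][0]

def project_planar(H, x, y):
    """Projection of a marker-plane point (x, y, 0) using only H."""
    p = matmul(H, [[x], [y], [1.0]])
    return p[0][0] / p[2][0], p[1][0] / p[2][0]

# Hypothetical camera matrix and pose (30 degree tilt about the x axis).
a = math.radians(30.0)
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
T = [[1.0, 0.0, 0.0, 0.1],
     [0.0, math.cos(a), -math.sin(a), -0.2],
     [0.0, math.sin(a), math.cos(a), 2.0],
     [0.0, 0.0, 0.0, 1.0]]

H = pose_to_homography(K, T)
print(project_full(K, T, 0.5, 0.25, 0.0), project_planar(H, 0.5, 0.25))
```

Both projections agree for any point with z = 0, which is exactly why the third column can be discarded for a planar marker.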
Update
I'm sorry! I just re-read your question and realized you just want to warp the marker so it looks like a straight-on view; I got caught up in describing a general transform from 4x4 to 3x3. The 4x4 transform you get from ARTK is not the transform that will de-warp the marker; it is the transform that moves the marker from the origin to its final position. To de-warp the marker like you asked, the process is similar but slightly different. I haven't done that before, but here is my guess.
First, you need to get the 4x4 transform between where the marker is in world space and where you would like it to appear to be after warping it. Right now the transform goes from the origin to the marker location. To change the transform to go from some point farther down the z axis (say 100) to the marker location, define the transform:
initial_marker_pose = [1,0,0,0
0,1,0,0
0,0,1,100
0,0,0,1];
Now you have the transform from the origin to what you want as your "initial" position, and the transform from the origin to your "final" position. To get the transform from initial to final, simply compute
initial_to_final = origin_to_marker*initial_marker_pose.inv();
Now you would follow the process outlined in the link I gave you, in this case your initial zpos is no longer 0, it is 100. Then when you are finished you will need to invert your 3x3 matrix. That is because this process takes you from a straight on view to the one defined by the pose from ARTK and you want the opposite of that. You will need to experiment with the initial z position. The smaller it is, the larger your marker will appear after de-warping.
Hopefully that works, sorry for the confusion about your question.

Resources