OpenCV - Determine detected object's angle from "center" of frame

I am working on a machine vision project and need to determine the angle of an object in x and y relative to the center of the frame (the center, in my mind, being where the camera is pointed). I originally did NOT do a camera calibration (I calculated the angle per pixel by taking a picture of a dense grid and doing some simple math). While doing some object tracking I noticed some strange behaviour, which I suspected was due to distortion. I also noticed that an object that should be dead center of my frame was not: the camera had to be shifted or the angle changed for that to be true.
I performed a calibration in OpenCV and got a principal point of (363.31, 247.61) with a resolution of 640x480. The angle per pixel obtained from cv2.calibrationMatrixValues() was very close to what I had calculated, but up to this point I was assuming the center of the frame was at 640/2, 480/2. I'm hoping that someone can confirm, but going forward do I assume that my (0,0) in Cartesian coordinates is now at the principal point? Perhaps I can use my new camera matrix to correct the image so my original assumption is true? Or am I out to lunch and in need of some direction on how to achieve this.
Also, was my assumption of 640/2 correct, or should it technically have been (640-1)/2? Thanks all!
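For illustration, here is a minimal sketch of what treating the calibrated principal point as "straight ahead" could look like. The principal point is the one quoted above, but the focal lengths and distortion coefficients are made-up placeholders; cv2.undistortPoints removes the lens distortion and normalizes, so the resulting angles are measured from the optical axis (the principal point) rather than from the geometric middle of the image:

```python
import numpy as np
import cv2

# Intrinsics in the shape returned by cv2.calibrateCamera. The principal point
# is the one quoted above; the focal lengths and distortion are placeholders.
K = np.array([[600.0,   0.0, 363.31],
              [  0.0, 600.0, 247.61],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)

def pixel_to_angles(u, v, K, dist):
    """Angles (degrees) of the ray through pixel (u, v), measured from the optical axis."""
    pts = np.array([[[u, v]]], dtype=np.float64)
    # undistortPoints removes lens distortion and normalizes, so (x, y) lie on the z = 1 plane
    x, y = cv2.undistortPoints(pts, K, dist)[0, 0]
    return np.degrees(np.arctan(x)), np.degrees(np.arctan(y))

# A pixel at the principal point comes out at (0, 0) degrees, i.e. "straight ahead"
print(pixel_to_angles(363.31, 247.61, K, dist))
print(pixel_to_angles(320.0, 240.0, K, dist))  # the naive 640/2, 480/2 centre is not
```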

Related

Overhead camera's pose estimation with OpenCV SolvePnP results in a height that is a few centimeters off

I'd like to get the pose (translation: x, y, z and rotation: Rx, Ry, Rz in World coordinate system) of the overhead camera. I got many object points and image points by moving the ChArUco calibration board with a robotic arm (like this https://www.youtube.com/watch?v=8q99dUPYCPs). Because of that, I already have exact positions of all the object points.
In order to feed many points to solvePnP, I set the first detected pattern (ChArUco board) as the first object and used it as the object coordinate system's origin. Then, I added the detected object points (from the second pattern to the last) to the first detected object points' coordinate system (the origin of the object frame is the origin of the first object).
After I got the transformation between the camera and the object's coordinate frame, I calculated the camera's pose based on that transformation.
The result looked pretty good at first, but when I measured the camera's absolute pose with a ruler or a tape measure, I noticed that the extrinsic calibration result was around 15-20 millimeters off in the z direction (the height of the camera), though almost correct for the others (x, y, Rx, Ry, Rz). The result was the same even when I changed the range of the object points by moving the robotic arm differently; it always ended up a few centimeters off in height.
Has anyone experienced a similar problem before? I'd like to know anything I can try. What is a common mistake when the depth direction (z) is inaccurate?
I don't know how you measure z, but I believe that what you're measuring with the ruler is not z but the Euclidean distance, which is computed like so:
d=std::sqrt(x*x+y*y+z*z);
Let's take an example: if x=2, y=2, z=2, then d will be about 3.5, so 3.5 - 2 = 1.5 is the kind of difference you get between z and the ruler reading, which is why you saw it as around 15-20 millimeters off in the z direction.
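As a rough sketch of the two quantities being compared, assuming rvec/tvec come from cv2.solvePnP and map object coordinates into the camera frame (the numbers here are arbitrary placeholders):

```python
import numpy as np
import cv2

# rvec, tvec as returned by cv2.solvePnP (they map object coordinates into the
# camera frame); the numbers here are arbitrary placeholders.
rvec = np.array([[0.1], [0.2], [0.3]])
tvec = np.array([[0.5], [0.4], [2.0]])

R, _ = cv2.Rodrigues(rvec)
cam_pos = (-R.T @ tvec).ravel()     # camera position expressed in the object/world frame

z_height = cam_pos[2]               # the "z" the extrinsic calibration reports
ruler = np.linalg.norm(cam_pos)     # the straight-line distance a tape measure gives
print(z_height, ruler, ruler - z_height)
```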

Aruco scales coordinates wrong

I am using the (newly released) ArUco 2.0.7 to track some markers.
The camera that I am using is mounted to the ceiling facing down, so I only need the x and y coordinates.
It can view an area of 2.6m by 1.5m. If I understand the documentation correctly, when I supply the side length of the markers I'm using in an arbitrary unit, the output pose will be in the same unit.
So the markers have a side length of 19.5 cm. As I want my result in meters, I have that value set to 0.195.
However, the results I obtain are not correct. If I place the markers right in the corners of the field of view of the camera, they are not at the corresponding expected x and y coordinates.
I am placing the global origin at one of the corners of the field of view, e.g. (0,0) is the bottom left corner. This is done by transforming all incoming positions into that marker's coordinate system using the matrix transforms obtained from getRTMatrix().
Everything seems to be working, except the x and y coordinates are in a wrong unit or scaled. The rotation works perfectly.
Am I missing something? Or can I not expect good accuracy? The error is significant, e.g. when it should be (2.6, 1.5), it is displayed as (1.8, 1), which is roughly an error of 33%.
After some more thought I figured out that my camera had simply been calibrated using a smaller distance from the calibration board to the lens than what I need for my use case.
This caused the distortion coefficients to be wrong, thus giving me a bogus scale.
I re-calibrated using the aruco_calibration tool and am now accurate to roughly 3 or 4 cm, which is good enough for me.
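For reference, the re-basing described in the question (expressing every marker pose relative to the corner marker) can be sketched with 4x4 homogeneous matrices. This assumes the matrices built from getRTMatrix() all map marker coordinates into the camera frame, which is worth double-checking against the ArUco documentation:

```python
import numpy as np

def pose_in_origin_frame(T_cam_origin, T_cam_marker):
    """Re-express a marker pose relative to the origin marker.

    Both inputs are 4x4 homogeneous matrices mapping marker coordinates into
    the camera frame (e.g. built from getRTMatrix() output).
    """
    return np.linalg.inv(T_cam_origin) @ T_cam_marker

# Toy example with placeholder matrices
T_origin = np.eye(4)
T_marker = np.eye(4)
T_marker[:3, 3] = [1.8, 1.0, 0.0]
print(pose_in_origin_frame(T_origin, T_marker)[:3, 3])  # marker position relative to the origin marker
```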

Camera pose and reflections using OpenCV's solvePnP

I'm trying to use the function solvePnP to estimate the relative position of a camera. My question is this: when choosing world coordinates, do I need to be careful to choose them so that there is no reflection when transforming them to camera coordinates? Or will OpenCV correct that for me?
Details: I'm filming a tennis court and was originally setting the world coordinate origin to be the centre of the court, with the x-axis pointing parallel to the net towards the left, the y-axis pointing forwards along the court, and the z-axis pointing upwards. If I've understood correctly, solvePnP will transform these coordinates to a system with its origin at some point behind the top left corner of an image, with the x-axis pointing downwards on the image, the y-axis pointing to the right, and the z-axis pointing forwards into the scene. However, this transformation would definitely involve a reflection; must I swap the x and y axes of my world coordinates to avoid this, or is it fine to leave them as they are? (Also, let me know if I'm making a big mistake and solvePnP actually puts the origin at a point behind the centre of the image rather than one behind the top left corner...)
Assuming that you have a camera calibration matrix (and that the calibration was done assuming a right-handed coordinate system all along), and correct correspondences between the tennis court features in the image and the CAD features:
You need to select the reference frame on the tennis court such that it is a right-handed coordinate system, so that your solution from solvePnP provides the pose and position of the tennis court reference frame with respect to the camera coordinate system (which is by default a right-handed coordinate system).
Hope it helps
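A minimal sketch of that setup, with made-up intrinsics and pixel detections, just to show a right-handed court frame going into cv2.solvePnP; the recovered rotation is always a proper rotation (determinant +1), so solvePnP itself never introduces a reflection:

```python
import numpy as np
import cv2

# Right-handed world frame at the court centre: x along the net, y along the
# court length, z up. Check: cross(x, y) == z.
assert np.allclose(np.cross([1, 0, 0], [0, 1, 0]), [0, 0, 1])

# Four court corner points in that frame (doubles court, metres).
object_points = np.array([
    [-5.485, -11.885, 0.0],
    [ 5.485, -11.885, 0.0],
    [ 5.485,  11.885, 0.0],
    [-5.485,  11.885, 0.0],
], dtype=np.float64)

# Matching pixel detections and intrinsics would come from your own pipeline;
# these are placeholder values.
image_points = np.array([[100, 400], [540, 400], [600, 80], [40, 80]], dtype=np.float64)
K = np.array([[800.0, 0, 320.0], [0, 800.0, 240.0], [0, 0, 1.0]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)
# R is a proper rotation (det(R) == +1); no axis swapping is needed as long as
# the world frame chosen above is right-handed.
print(ok, np.linalg.det(R))
```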

Is there a reverse function of lookAt for glMatrix?

I am using glMatrix to code WebGL and want to get the eye position, focal point and up direction from the existing projection and view matrix (kind of like the reverse of the lookAt function). Is there any way to do this?
I didn't implement one, no. I'm not even sure that you could decompose it into the original vectors, for that matter. The lookAt point could be anywhere along a ray from the origin, and how would you determine what the appropriate up vector was? I'm thinking this is a one-way algorithm (just too lazy to prove it!)
Beyond that, however, I question whether you would want to do this even if there were a method for it. I'd be willing to bet that it's almost always more beneficial to track the values you're using and manipulate them rather than to try and pull them back and forth from matrix to vectors and back.
Yes and No: Yes you can invert the model view transformation and no you will not get exactly all three vectors the same.
The model view transformation of lookAt is very similar to the connectTo operation as used in CSG models. It is mounting your scene in front of your camera. This is done by translation and three axis rotations. The eye point is translated to (0,0,0) and all further rotation is done around it. You can easily derive the eye point by transforming (0,0,0) with the inverse matrix.
But the center point is just used for adjusting the axis of view along the -Z axis. In OpenGL the eye faces -Z. The distance between center and eye is lost. So you can easily get a center point along your axis of view if you define the distance yourself. Let's say we want a distance of d. Then we just need to transform (0,0,-d) with the inverse matrix and we get a valid center point, though not exactly the same one. The center point defines only two rotation angles, the camera pan and tilt.
Even worse is the reconstruction of the up vector. It is only used for the roll angle of the camera and thus contributes only one scalar value. For the inverse transformation you could therefore choose not just any positive value along the Y axis, but any point in the YZ plane with a positive Y value. To get an up vector perfectly normal to the viewing axis and of length 1, we just transform (0,1,0) with the inverse matrix. Remember to transform it as a vector this time (not as a point).
Now we have eye, center and up reconstructed in a way that gives exactly the same result from lookAt next time. But since this matrix contains only 6 values of information (translation, pan, tilt, roll) we had to choose 3 values that were lost (the distance from center to eye, and the size and angle of the up vector in the camera's YZ plane).
The model view matrix can of course encode other transformations (any affine one), but the lookAt function uses this matrix only for translation and rotation. It adjusts the scene in front of the camera without distorting it.
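A sketch of that reconstruction in plain matrix form (numpy here; the same algebra applies to a glMatrix mat4, keeping its column-major storage in mind), where d is the eye-to-center distance you choose yourself:

```python
import numpy as np

def decompose_view(V, d=1.0):
    """Recover an eye/center/up triple that reproduces the view matrix V.

    V maps world coordinates to eye coordinates (as produced by lookAt);
    d is the eye-to-center distance you have to choose yourself.
    """
    inv = np.linalg.inv(V)
    eye    = (inv @ np.array([0.0, 0.0,  0.0, 1.0]))[:3]  # camera position (a point)
    center = (inv @ np.array([0.0, 0.0, -d,   1.0]))[:3]  # some point along the -Z viewing axis
    up     = (inv @ np.array([0.0, 1.0,  0.0, 0.0]))[:3]  # transformed as a vector (w = 0)
    return eye, center, up
```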

Finding distance from camera to object of known size

I am trying to write a program using opencv to calculate the distance from a webcam to a one inch white sphere. I feel like this should be pretty easy, but for whatever reason I'm drawing a blank. Thanks for the help ahead of time.
You can use triangle similarity to calibrate the camera's focal length and then find the distance.
You know your ball's size: D units (e.g. cm). Place it at a known distance Z, say 1 meter = 100cm, in front of the camera and measure its apparent width in pixels. Call this width d.
The focal length of the camera f (which is slightly different from camera to camera) is then f=d*Z/D.
When you see this ball again with this camera, and its apparent width is d' pixels, then by triangle similarity, you know that f/d'=Z'/D and thus: Z'=D*f/d' where Z' is the ball's current distance from the camera.
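As a small worked example of that calibration-then-measurement procedure (all numbers are placeholders):

```python
# Calibration step: a ball of known diameter D, at a known distance Z,
# appears d pixels wide. All numbers below are placeholders.
D = 2.54        # ball diameter in cm (one inch)
Z = 100.0       # known calibration distance in cm
d = 38.0        # measured apparent width in pixels at that distance

f = d * Z / D   # effective focal length in pixels

# Measurement step: the same ball now appears d_new pixels wide
d_new = 19.0
Z_new = D * f / d_new   # estimated distance in cm (about 200 here, since d halved)
print(f, Z_new)
```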
To my mind you will need a camera model (i.e. a calibration) if you want to measure distances or other real-world quantities.
The pinhole camera model is simple, linear and gives good results (but won't correct distortions, whether they are radial or tangential).
If you don't use that, then you'll only be able to compute a disparity/depth map (for instance if you use stereo vision), but it is relative and doesn't give you an absolute measurement, only which object is behind or in front of another...
Therefore, I think the answer is: you will need to calibrate it somehow. Maybe you could ask the user to bring the sphere toward the camera until the image plane is perfectly filled with the ball, and with prior knowledge of the ball's size, you'll then be able to compute the distance...
Julien,
