OpenCV Calibration

I am trying to find the camera extrinsics from six chessboard images with OpenCV. My output is:
<Extrinsic_Parameters type_id="opencv-matrix"><rows>8</rows><cols>6</cols><dt>d</dt><data>
1.7261576010447846e-01 3.1158880577193560e-01 1.2720406228471280e-02 -1.1592911113815259e+02 -2.2406582979927950e+02 8.1420941356557194e+02
3.9346701007260626e-01 6.9003564777197379e-01 -1.3469982321835601e+00 -8.6826153197023956e+01 1.6051013708505607e+02 7.2011353507821275e+02
-2.7069532546117758e-01 2.2148718738805429e-01 3.6171900096916804e-01 -5.2552433323678208e+01 -2.0518988227964823e+02 6.4574752691945832e+02
-1.0899083613916538e-01 2.7649799933587743e-01 1.4885762528126442e-03 -1.6853141436409484e+02 -1.3942460278197393e+02 6.7605597445537410e+02
-2.6583012230541703e-01 4.5957884020124173e-01 1.3218319619927716e+00 1.4402207754903725e+02 -2.3447541083582803e+02 6.3522213346840215e+02
4.3950055218555362e-01 3.1960642369670372e-01 1.6428303526609498e+00 3.6431577304606526e+02 -1.3627914048252993e+02 6.7111839770203858e+02
2.2222642699596459e-01 3.7120066692341575e-01 1.8680367378145426e+00 7.2471813700978785e+01 -1.4921069561111017e+02 7.8381396962300209e+02
-5.5897629846263175e-02 5.0568582309311438e-01 1.7593139413155239e+00 1.8515422152916943e+02 -1.7998599745081759e+02 6.3305731735881432e+02</data></Extrinsic_Parameters>
And:
<Distortion_Coefficients type_id="opencv-matrix"><rows>5</rows><cols>1</cols><dt>d</dt><data>
-3.0246684876115215e-01 1.5880077551448199e-01 0. 0. -6.9087472680662912e-02</data></Distortion_Coefficients>
Now, how do I calculate the real-world coordinates of a point (u, v) on the screen?

With the extrinsic and intrinsic parameters I have:
Pc = [R|t] * Pw
where Pc and Pw are the camera and world coordinates, R is the rotation matrix and t is the translation vector.
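Each row of Extrinsic_Parameters appears to hold a Rodrigues rotation vector (first three values) followed by a translation vector (last three) for one view. Note that a single pixel (u, v) only defines a ray, so to recover a unique world point you need an extra constraint, e.g. that the point lies on the chessboard plane Zw = 0. A minimal OpenCV C++ sketch under that assumption (K, rvec and tvec are assumed to be loaded from the calibration output, and the pixel is assumed to be already undistorted):

#include <opencv2/opencv.hpp>

// Sketch: back-project an undistorted pixel (u, v) onto the chessboard plane Zw = 0.
// K is the 3x3 intrinsic matrix; rvec/tvec come from one row of Extrinsic_Parameters.
cv::Point3d pixelToBoardPlane(const cv::Mat& K, const cv::Mat& rvec,
                              const cv::Mat& tvec, double u, double v)
{
    cv::Mat R;
    cv::Rodrigues(rvec, R);                       // 3x1 rotation vector -> 3x3 matrix
    cv::Mat H(3, 3, CV_64F);                      // plane homography K * [r1 r2 t]
    R.col(0).copyTo(H.col(0));
    R.col(1).copyTo(H.col(1));
    tvec.reshape(1, 3).copyTo(H.col(2));
    H = K * H;
    cv::Mat p = H.inv() * (cv::Mat_<double>(3, 1) << u, v, 1.0);
    double w = p.at<double>(2);
    return cv::Point3d(p.at<double>(0) / w, p.at<double>(1) / w, 0.0);  // world coords on the board
}

If the pixel has not been undistorted yet, pass it through cv::undistortPoints with the Distortion_Coefficients first.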

Related

How to use OpenCV triangulatePoints and GPS data properly?

I am trying to estimate the 3D position of a world coordinate from 2 frames. The frames are captured with the same camera from different positions. The problem is, the estimation is wrong.
I have
Camera Intrinsic parameters
K = [4708.29296875, 0, 1218.51806640625;
0, 4708.8935546875, 1050.080322265625;
0, 0, 1]
Translation and Rotation data:
Frame X-Coord Y-Coord Z-Coord(altitude) Pitch Roll Yaw
1 353141.23 482097.85 38.678 0.042652439 1.172694124 16.72142499
2 353141.82 482099.69 38.684 0.097542931 1.143224387 16.79931141
Note: the GPS data uses a Cartesian coordinate system; the X, Y, Z coordinates are in meters, based on the British National Grid system.
To get the rotation matrix I used
https://stackoverflow.com/a/56666686/16432598 which is based on http://www.tobias-weis.de/triangulate-3d-points-from-3d-imagepoints-from-a-moving-camera/.
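For reference, a rough sketch of how such a rotation matrix can be composed from yaw/pitch/roll. The Z-Y-X order used here is only an assumption; the correct convention depends on how the GPS/IMU defines its angles, so treat this as illustrative rather than as the linked method:

#include <cmath>
#include <opencv2/opencv.hpp>

// Illustrative only: rotation from yaw/pitch/roll in degrees, composed as Rz(yaw) * Ry(pitch) * Rx(roll).
cv::Mat rotationFromYPR(double yawDeg, double pitchDeg, double rollDeg)
{
    const double y = yawDeg   * CV_PI / 180.0;
    const double p = pitchDeg * CV_PI / 180.0;
    const double r = rollDeg  * CV_PI / 180.0;
    cv::Mat Rz = (cv::Mat_<double>(3, 3) << std::cos(y), -std::sin(y), 0,
                                            std::sin(y),  std::cos(y), 0,
                                            0,            0,           1);
    cv::Mat Ry = (cv::Mat_<double>(3, 3) << std::cos(p),  0, std::sin(p),
                                            0,            1, 0,
                                           -std::sin(p),  0, std::cos(p));
    cv::Mat Rx = (cv::Mat_<double>(3, 3) << 1, 0,            0,
                                            0, std::cos(r), -std::sin(r),
                                            0, std::sin(r),  std::cos(r));
    return Rz * Ry * Rx;
}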
Using the above data I calculate the extrinsic parameters and the projection matrices as follows.
Rt0 = [-0.5284449976982357, 0.308213375891041, -0.7910438668806931, 353141.21875;
-0.8478960766271159, -0.2384055118949635, 0.4735346398506075, 482097.84375;
-0.04263950806535898, 0.9209600028339713, 0.3873171123665929, 38.67800140380859]
Rt1 = [-0.4590975294881605, 0.3270290779984009, -0.8260032933114635, 353141.8125;
-0.8830316937622665, -0.2699087096524321, 0.3839326975722462, 482099.6875;
-0.097388326965866, 0.905649640091175, 0.4126914624432091, 38.68399810791016]
P = K * Rt;
P1 = [-2540.030877954028, 2573.365272473235, -3252.513377560185, 1662739447.059914;
-4037.427278644764, -155.5442017945203, 2636.538291686695, 2270188044.171295;
-0.04263950806535898, 0.9209600028339713, 0.3873171123665929, 38.67800140380859]
P2 = [-2280.235105924588, 2643.299156802081, -3386.193495224041, 1662742249.915956;
-4260.36781710715, -319.9665173096691, 2241.257388910372, 2270196732.490808;
-0.097388326965866, 0.905649640091175, 0.4126914624432091, 38.68399810791016]
triangulatePoints(Points2d, projection_matrices, out);
Now, I pick the same point in both images for triangulation
p2d_1(205,806) and p2d_2(116,813)
For the 3D position of this particular point I expect something like:
[353143.7, 482130.3, 40.80]
whereas I calculate
[549845.5109014747, -417294.6070425579, -201805.410744677]
I know that my intrinsic parameters and GPS data are very accurate.
Can anybody tell me what is missing, or what I am doing wrong here?
Thanks
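For reference, cv::triangulatePoints expects the two 3x4 projection matrices plus the matched pixels (as 2xN arrays), and returns homogeneous points that still have to be divided by w. A minimal sketch of that call, with P1 and P2 as computed in the question:

#include <opencv2/opencv.hpp>

// Sketch: triangulate a single correspondence and convert the homogeneous result to 3D.
cv::Point3d triangulateOne(const cv::Mat& P1, const cv::Mat& P2,
                           const cv::Point2d& x1, const cv::Point2d& x2)
{
    cv::Mat pts1 = (cv::Mat_<double>(2, 1) << x1.x, x1.y);   // pixel in frame 1
    cv::Mat pts2 = (cv::Mat_<double>(2, 1) << x2.x, x2.y);   // pixel in frame 2
    cv::Mat X;                                               // 4x1 homogeneous point
    cv::triangulatePoints(P1, P2, pts1, pts2, X);
    double w = X.at<double>(3, 0);
    return cv::Point3d(X.at<double>(0, 0) / w,
                       X.at<double>(1, 0) / w,
                       X.at<double>(2, 0) / w);
}

// e.g. triangulateOne(P1, P2, cv::Point2d(205, 806), cv::Point2d(116, 813));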

opencv transform tilt view to plan view

Here is the picture I take with my USB camera. The camera is at an angle to the horizontal; the target is at the bottom, with parallel and orthogonal lines delimiting rectangles. The Post-it is a control marker for the center rectangle.
I then apply several processing steps in order to correct the 'tilt' of the view and to extract lines.
Here is the line extraction without any transform:
{"type":"toGray"} => mat.cvtColor( cv4.COLOR_BGR2GRAY);
{"type":"toBlur","size":10} => mat.gaussianBlur( new cv4.Size( size, size),0);
{"type":"toCanny","low":50,"high":150} => mat.canny( low_threshold, high_threshold);
{"type":"getLines","rho":1,"theta":0.017453292222222222,"threshold":15,"min_line_length":50,"max_line_gap":20 }] => let lines = mat.houghLinesP( rho, theta, threshold, min_line_length, max_line_gap);
The result is:
Now I want to correct the tilt of the view, using the 'warpAffine' function, before extracting lines.
I select four points of the centered rectangle in order to build two three-point arrays (src, dst):
matTransf = cv4.getAffineTransform( srcPoints, dstPoints);
resultMat = mat.warpAffine( matTransf, new cv4.Size( mat.cols, mat.rows));
The result is the following:
Where is the mistake?
I have also tried:
// four points at each corner of the rectangle, srcPoints for the picture, and dstPoints for the theoretical shape
// With getPerspectiveTransform
matTransf = cv4.getPerspectiveTransform( srcPoints, dstPoints);
resultMat = mat.warpPerspective( matTransf, new cv4.Size( mat.cols, mat.rows));
// With findHomography
let result = cv4.findHomography( srcPoints, dstPoints);
matTransf = result.homography;
resultMat = mat.warpPerspective( matTransf, new cv4.Size( mat.cols, mat.rows));
The result is:
Best regards.
The transformation is not an affine transform; it is a perspective transform described by a homography. Select in the image the four corners of a physical rectangle, map them to the corners of a rectangle with the same aspect ratio as the physical one, estimate the homography from them (findHomography), and finally warp (warpPerspective).
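In OpenCV C++ the same pipeline looks roughly like this (a sketch only; the source corners are hypothetical values you would pick in the image, and the destination rectangle should match the real aspect ratio):

#include <vector>
#include <opencv2/opencv.hpp>

// Sketch: remove the perspective tilt of the central rectangle with a homography.
cv::Mat rectifyRectangle(const cv::Mat& img)
{
    std::vector<cv::Point2f> srcPts = { {412, 615}, {860, 610}, {905, 830}, {365, 840} };  // picked corners (placeholders)
    std::vector<cv::Point2f> dstPts = { {0, 0}, {400, 0}, {400, 300}, {0, 300} };          // target rectangle, same aspect ratio
    cv::Mat H = cv::findHomography(srcPts, dstPts);   // with exactly 4 points, getPerspectiveTransform works too
    cv::Mat out;
    cv::warpPerspective(img, out, H, img.size());
    return out;
}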

How to convert camera extrinsic matrix to SCNCamera position and rotation

I'm trying to achieve Augmented Reality with SceneKit.
I got an intrinsic camera matrix and an extrinsic matrix by estimating the pose of a marker with ArUco (the OpenCV augmented-reality library).
I set up the SCNCamera's projectionTransform with parameters derived from the intrinsic matrix (fovy, aspect, zNear, zFar).
Normally in OpenGL, world coordinates relative to the camera are computed with the ModelView matrix, but SceneKit has no such thing as a modelView matrix.
So I calculated the inverse of the extrinsic matrix to get the camera pose relative to the world (marker) coordinates.
I think I have the correct camera position from that inverse matrix, which contains the rotation and translation.
However, I cannot get the camera's rotation from it.
Do you have any ideas?
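For what it's worth, since the ArUco pose (rvec, tvec) maps marker coordinates into camera coordinates, the camera pose in the marker/world frame is just the inverse: rotation R^T and position -R^T * t. A small OpenCV C++ sketch of that decomposition (assuming rvec and tvec come from the marker pose estimation, as double-precision Mats):

#include <opencv2/opencv.hpp>

// Sketch: camera pose in marker/world coordinates from the ArUco extrinsic (rvec, tvec).
void cameraPoseFromExtrinsic(const cv::Mat& rvec, const cv::Mat& tvec,
                             cv::Mat& camRotation, cv::Mat& camPosition)
{
    cv::Mat R;
    cv::Rodrigues(rvec, R);                      // marker -> camera rotation
    camRotation = R.t();                         // camera orientation in the marker frame
    camPosition = -R.t() * tvec.reshape(1, 3);   // camera position in the marker frame
}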
SceneKit has the same view matrices that you've come across in OpenGL; they're just a little hidden until you start toying with shaders. A little too hidden, IMO.
You seem to have most of this figured out. The projection matrix comes from your camera projectionTransform, and the view matrix comes from the inverse of your camera matrix SCNMatrix4Invert(cameraNode.transform). In my case everything was in world coordinates making my model matrix a simple identity matrix.
The code I ended up using to get the classic model-view-projection matrix was something like...
let projection = camera.projectionTransform()
let view = SCNMatrix4Invert(cameraNode.transform)
let model = SCNMatrix4Identity
let viewProjection = SCNMatrix4Mult(view, projection)
let modelViewProjection = SCNMatrix4Mult(model, viewProjection)
For some reason I found SCNMatrix4Mult(...) took arguments in a different order than I was expecting (e.g. opposite to GLKMatrix4Multiply(...)).
I'm still not 100% on this, so would welcome edits/tips. Using this method I was unable to get the SceneKit MVP matrix (as passed to shader) to match up with that calculated by the code above... but it was close enough for what I needed.
#lock's answer looks good with a couple additions:
(1) access SCNNode worldTransform instead of transform in case the cameraNode is animated or parented:
let view = SCNMatrix4Invert(cameraNode.presentationNode.worldTransform)
(2) the code doesn't account for the view's aspect ratio. e.g., assuming a perspective projection, you'll want to do:
perspMatrix.m11 /= viewportAR; // if using Yfov -> adjust Y
/* or, */
perspMatrix.m22 *= viewportAR; // if using Xfov -> adjust X
where viewportAR = viewport.width / viewport.height
Another way to do it is to have one node with a renderer delegate in the scene, and retrieve SceneKit's matrices from that delegate (they are passed as options):
FOUNDATION_EXTERN NSString * const SCNModelTransform;
FOUNDATION_EXTERN NSString * const SCNViewTransform;
FOUNDATION_EXTERN NSString * const SCNProjectionTransform;
FOUNDATION_EXTERN NSString * const SCNNormalTransform;
FOUNDATION_EXTERN NSString * const SCNModelViewTransform;
FOUNDATION_EXTERN NSString * const SCNModelViewProjectionTransform;

How to convert TangoXyzIjData into a matrix of z-values

I am currently using a Project Tango tablet for robotic obstacle avoidance. I want to create a matrix of z-values as they would appear on the Tango screen, so that I can use OpenCV to process the matrix. When I say z-values, I mean the distance each point is from the Tango. However, I don't know how to extract the z-values from the TangoXyzIjData and organize the values into a matrix. This is the code I have so far:
public void action(TangoPoseData poseData, TangoXyzIjData depthData) {
    byte[] buffer = new byte[depthData.xyzCount * 3 * 4];
    try {
        FileInputStream fileStream = new FileInputStream(
                depthData.xyzParcelFileDescriptor.getFileDescriptor());
        fileStream.read(buffer, depthData.xyzParcelFileDescriptorOffset, buffer.length);
        fileStream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    Mat m = new Mat(depthData.ijRows, depthData.ijCols, CvType.CV_8UC1);
    m.put(0, 0, buffer);
}
Does anyone know how to do this? I would really appreciate help.
The short answer is it can't be done, at least not simply. The XYZij struct in the Tango API does not work completely yet. There is no "ij" data. Your retrieval of buffer will work as you have it coded. The contents are a set of X, Y, Z values for measured depth points, roughly 10000+ each callback. Each X, Y, and Z value is of type float, so not CV_8UC1. The problem is that the points are not ordered in any way, so they do not correspond to an "image" or xy raster. They are a random list of depth points. There are ways to get them into some xy order, but it is not straightforward. I have done both of these:
render them to an image, with the depth encoded as color, and pull out the image as pixels
use the model/view/perspective from OpenGL and multiply out the locations of each point and then figure out their screen space location (like OpenGL would during rendering). Sort the points by their xy screen space. Instead of the calculated screen-space depth just keep the Z value from the original buffer.
or
wait until (if) the XYZij struct is fixed so that it returns ij values.
I too wish to use Tango for object avoidance for robotics. I've had some success by simplifying the use case to be only interested in the distance of any object located at the center view of the Tango device.
In Java:
private Double centerCoordinateMax = 0.020;
private TangoXyzIjData xyzIjData;

final FloatBuffer xyz = xyzIjData.xyz;
double cumulativeZ = 0.0;
int numberOfPoints = 0;
for (int i = 0; i < xyzIjData.xyzCount; i += 3) {
    float x = xyz.get(i);
    float y = xyz.get(i + 1);
    if (Math.abs(x) < centerCoordinateMax &&
            Math.abs(y) < centerCoordinateMax) {
        float z = xyz.get(i + 2);
        cumulativeZ += z;
        numberOfPoints++;
    }
}
Double distanceInMeters;
if (numberOfPoints > 0) {
    distanceInMeters = cumulativeZ / numberOfPoints;
} else {
    distanceInMeters = null;
}
Put simply, this code takes the average distance over a small square located at the origin of the x and y axes.
centerCoordinateMax = 0.020 was determined to work based on observation and testing. The square typically contains 50 points in ideal conditions and fewer when held close to the floor.
I've tested this using version 2 of my tango-caminada application and the depth measuring seems quite accurate. Standing 1/2 meter from a doorway, I slid towards the open door and the distance changed from 0.5 meters to 2.5 meters, which is the wall at the end of the hallway.
Simulating a robot being navigated I moved the device towards a trash can in the path until 0.5 meters separation and then rotated left until the distance was more than 0.5 meters and proceeded forward. An oversimplified simulation, but the basis for object avoidance using Tango depth perception.
You can do this by using the camera intrinsics to convert XY coordinates to normalized values -- see this post - Google Tango: Aligning Depth and Color Frames - it's talking about texture coordinates but it's exactly the same problem.
Once normalized, move to screen space (1280x720); the Z coordinate can then be used to generate a pixel value for OpenCV to chew on. You'll need to decide on your own how to color pixels that don't correspond to depth points, and ideally decide that before you use the depth information to further colorize pixels.
The main thing is to remember that the raw coordinates returned are already using the basis vectors you want, i.e. you do not want the pose attitude or location.
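A rough OpenCV C++ sketch of that projection step (the 1280x720 size and the fx, fy, cx, cy names stand in for the actual Tango color-camera intrinsics; pixels with no depth sample are left at 0 for you to handle):

#include <vector>
#include <opencv2/opencv.hpp>

// Sketch: splat an unordered XYZ point cloud (camera frame, Z forward, meters)
// into a CV_32FC1 depth image using pinhole intrinsics.
cv::Mat depthImageFromCloud(const std::vector<cv::Point3f>& points,
                            float fx, float fy, float cx, float cy)
{
    cv::Mat depth(720, 1280, CV_32FC1, cv::Scalar(0));    // 0 = no depth sample at this pixel
    for (const cv::Point3f& p : points)
    {
        if (p.z <= 0) continue;                            // behind or on the camera plane
        int u = cvRound(fx * p.x / p.z + cx);
        int v = cvRound(fy * p.y / p.z + cy);
        if (u < 0 || u >= depth.cols || v < 0 || v >= depth.rows) continue;
        float& d = depth.at<float>(v, u);
        if (d == 0.0f || p.z < d) d = p.z;                 // keep the nearest point per pixel
    }
    return depth;
}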

problems in 3D reconstruction by triangulation opencv

I'm currently working on a stereo vision project, in which I'm supposed to reconstruct 3D points from correspondences found in each camera view, and for that I'm using OpenCV 2.4.7 for C++.
I was able to correctly calibrate both cameras, compute the fundamental matrix, compute the re-projection matrix and also rectify the images.
My problem lies in the final part of the project, which is computing 3D world coordinates from 2D point correspondences. I already tried using cv::triangulatePoints, but the results were points with coordinates (0, 0, 0), no matter what the input points were. I also tried the linear triangulation algorithm by Hartley & Sturm, but that didn't give me good results either.
Could somebody give me a hint on which function I should use? Or maybe some tips on how to correctly implement the ones I've talked about? My biggest problem is finding good documentation on the internet, which is why I decided to ask here.
Thank you!
I also tried cv::triangulatePoints and it calculates garbage. I was forced to implement a linear triangulation method manually, which returns the triangulated 3D point given a stereo pixel correspondence:
Mat triangulate_Linear_LS(Mat mat_P_l, Mat mat_P_r, Mat warped_back_l, Mat warped_back_r)
{
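// Linear least-squares (DLT) triangulation: each camera view contributes two rows to A*X = b,
// built from x*P.row(2) - P.row(0) and y*P.row(2) - P.row(1), where (x, y) is the
// dehomogenized pixel; the resulting 4x3 system is solved with SVD.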
Mat A(4,3,CV_64FC1), b(4,1,CV_64FC1), X(3,1,CV_64FC1), X_homogeneous(4,1,CV_64FC1), W(1,1,CV_64FC1);
W.at<double>(0,0) = 1.0;
A.at<double>(0,0) = (warped_back_l.at<double>(0,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,0) - mat_P_l.at<double>(0,0);
A.at<double>(0,1) = (warped_back_l.at<double>(0,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,1) - mat_P_l.at<double>(0,1);
A.at<double>(0,2) = (warped_back_l.at<double>(0,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,2) - mat_P_l.at<double>(0,2);
A.at<double>(1,0) = (warped_back_l.at<double>(1,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,0) - mat_P_l.at<double>(1,0);
A.at<double>(1,1) = (warped_back_l.at<double>(1,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,1) - mat_P_l.at<double>(1,1);
A.at<double>(1,2) = (warped_back_l.at<double>(1,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,2) - mat_P_l.at<double>(1,2);
A.at<double>(2,0) = (warped_back_r.at<double>(0,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,0) - mat_P_r.at<double>(0,0);
A.at<double>(2,1) = (warped_back_r.at<double>(0,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,1) - mat_P_r.at<double>(0,1);
A.at<double>(2,2) = (warped_back_r.at<double>(0,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,2) - mat_P_r.at<double>(0,2);
A.at<double>(3,0) = (warped_back_r.at<double>(1,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,0) - mat_P_r.at<double>(1,0);
A.at<double>(3,1) = (warped_back_r.at<double>(1,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,1) - mat_P_r.at<double>(1,1);
A.at<double>(3,2) = (warped_back_r.at<double>(1,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,2) - mat_P_r.at<double>(1,2);
b.at<double>(0,0) = -((warped_back_l.at<double>(0,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,3) - mat_P_l.at<double>(0,3));
b.at<double>(1,0) = -((warped_back_l.at<double>(1,0)/warped_back_l.at<double>(2,0))*mat_P_l.at<double>(2,3) - mat_P_l.at<double>(1,3));
b.at<double>(2,0) = -((warped_back_r.at<double>(0,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,3) - mat_P_r.at<double>(0,3));
b.at<double>(3,0) = -((warped_back_r.at<double>(1,0)/warped_back_r.at<double>(2,0))*mat_P_r.at<double>(2,3) - mat_P_r.at<double>(1,3));
solve(A,b,X,DECOMP_SVD);
vconcat(X,W,X_homogeneous);
return X_homogeneous;
}
The input parameters are the two 3x4 camera projection matrices and the corresponding left/right homogeneous pixel coordinates (3x1 Mats).
