Does ARCore Pose give us the ModelView matrix (model to camera) or the Model matrix (model to world)?

I'm trying to use ARCore poses in COLMAP for 3D reconstruction.
I realize that the pose returned by ARCore is in the OpenGL coordinate system. I'm trying to convert it to the OpenCV coordinate system, as required by COLMAP.
These are the steps I've done:
Normalize the quaternion and convert it to a 3x3 rotation matrix.
Negate the second and third columns by multiplying with the [[1,0,0],[0,-1,0],[0,0,-1]] matrix (since OpenGL uses column-major order).
Transpose the 3x3 rotation matrix to get row-major order.
Convert it back to a quaternion.
Even after doing this, the camera positions in the reconstruction are wrong.
After a bit of reading, I thought it might be because ARCore returns the model transform, which transforms points from model to world coordinates, whereas COLMAP requires poses in world-to-camera coordinates. I was wondering whether I could solve it by converting the ARCore poses to world-to-camera coordinates before doing everything else.
Is this the correct method? If yes, how do I get the View matrix? If not, what am I doing wrong here?
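As a point of comparison, here is a minimal numpy/scipy sketch of one common conversion, assuming the ARCore pose is a camera-to-world transform with OpenGL camera axes (x right, y up, z backward) and that the target is a COLMAP-style world-to-camera pose with OpenCV camera axes (x right, y down, z forward); the quaternion layout and function name are illustrative:

import numpy as np
from scipy.spatial.transform import Rotation

def arcore_to_colmap(q_xyzw, t_xyz):
    # Hypothetical helper: ARCore camera-to-world pose (OpenGL axes)
    # -> COLMAP-style world-to-camera pose (OpenCV axes)
    R_cw = Rotation.from_quat(q_xyzw).as_matrix()   # camera-to-world rotation
    t_cw = np.asarray(t_xyz, dtype=float)

    # Re-express the camera axes: OpenGL (y up, z backward) -> OpenCV (y down, z forward)
    R_cw = R_cw @ np.diag([1.0, -1.0, -1.0])

    # Invert the pose: COLMAP stores the world-to-camera rotation and translation
    R_wc = R_cw.T
    t_wc = -R_wc @ t_cw

    # Note: COLMAP's images.txt expects the quaternion ordered as qw qx qy qz
    return Rotation.from_matrix(R_wc).as_quat(), t_wc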

Related

Conversion from OpenGL to OpenCV

What I have
I'm generating images using the standard perspective camera in Unity. The camera is aimed at the ground plane (in Unity this is the xz-plane), see image. From this I need to remove the perspective so that all crop rows are parallel to each other.
Method
The warpPerspective() function from OpenCV can be used to remove perspective from an image. All information is known, such as field of view, rotation, position, ..., and thus I know how a 3D point maps onto the 2D plane and vice versa. The problem is that OpenCV uses another system: in OpenCV the transformation should be a 3x3 matrix, while the transformation matrix from Unity is a 4x4 matrix. Is there a conversion between the two? Or should I think of another strategy?
EDIT
I cannot use the orthographic camera in Unity.
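As a sketch of one possible conversion, assuming the ground is the world plane y = 0 and that K, R, t are OpenCV-style intrinsics and world-to-camera extrinsics derived from the Unity camera (handedness differences between Unity and OpenCV are glossed over here): for points on that plane the 4x4 projection collapses to a 3x3 homography, which is exactly what warpPerspective() needs.

import cv2
import numpy as np

def plane_to_image_homography(K, R, t):
    # For world points on the plane y = 0, [X, 0, Z, 1] projects as
    #   x ~ K [r1 r3 t] [X, Z, 1]^T
    # so the plane-to-image mapping is a 3x3 homography.
    H = K @ np.column_stack((R[:, 0], R[:, 2], np.asarray(t).ravel()))
    return H / H[2, 2]

def top_down_view(img, K, R, t, pixels_per_unit=100, out_size=(1000, 1000)):
    # Map output pixels to plane coordinates (a real setup would also add an
    # offset to centre the region of interest), then through H into the image.
    H = plane_to_image_homography(K, R, t)
    S = np.diag([1.0 / pixels_per_unit, 1.0 / pixels_per_unit, 1.0])
    M = np.linalg.inv(H @ S)              # image -> output pixels
    return cv2.warpPerspective(img, M, out_size)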
Fixed
Solved the issue by constructing a ray from the camera origin through each pixel and looking for an intersection with the ground plane. After this I discretised the ground plane into a grid with the same resolution as the original image. Points that map to the same cell are accumulated.
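A minimal numpy sketch of that ray-casting step, assuming an OpenCV-style pinhole camera with intrinsics K, a camera-to-world rotation R_cam_to_world and camera centre C in world coordinates, and the ground plane y = 0 (names are illustrative):

import numpy as np

def pixel_to_ground(u, v, K, R_cam_to_world, C):
    # Ray direction through the pixel, in camera and then in world coordinates
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    d_world = R_cam_to_world @ d_cam
    # Intersect C + s * d_world with the plane y = 0
    # (assumes the ray is not parallel to the plane)
    s = -C[1] / d_world[1]
    return C + s * d_world

The discretisation step then just bins the resulting ground points into a grid and accumulates the pixel values that fall into each cell.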
If you cannot use Unity's orthographic camera, what I would try is to imitate the C++ code from the examples in the OpenCV documentation you linked. Another approach is to undo the projection for the points in question by multiplying by the inverse of the projection matrix (the inverse of the transformation applied to those points); a matrix multiplied by its inverse is the identity, so the projection transformation would be removed. I think that should be possible; you can dig into how to obtain/change the projection matrix by checking this. The point would be to undo the projection transformation. Then you would need to obtain the corresponding orthographic projection matrix and apply it to obtain the positions you're after. That should be the same thing that Unity's orthographic camera does.
To understand the projection matrix at the lowest level, this source is awesome.
I think that in the camera component you just need to change the projection from perspective to orthographic.

How to obtain extrinsic matrix from ARKit camera?

I want to convert pixel coordinates into real-world coordinates. I found that the ARKit API provides a function in ARCamera called viewMatrix():
Returns a transform matrix for converting from world space to camera space.
Can this function be used to obtain the extrinsic matrix of the camera?
This may help:
self.sceneView.session.currentFrame?.camera.transform
The position and orientation of the camera in world coordinate space.
.transform documentation
You can directly extract the eulerAngles from this, but you will have to parse the translation yourself.
How come you manually want to project pixels into world positions? (The transform alone isn't going to help you there obviously).
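In any case, as a general illustration (not ARKit-specific) of how the camera-to-world transform relates to the extrinsic/view matrix, here is a small numpy sketch, assuming the 4x4 matrix has been copied out of camera.transform into a plain array with the translation in the last column:

import numpy as np

def extrinsic_from_camera_to_world(transform):
    # transform: 4x4 camera-to-world matrix (camera pose in world space)
    T = np.asarray(transform, dtype=float).reshape(4, 4)
    camera_position = T[:3, 3]           # translation column
    extrinsic = np.linalg.inv(T)         # world-to-camera (view) matrix
    return extrinsic, camera_position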

solvePnP with Unity3D

I have a real/physical stick with an IR camera attached to it and some IR LEDs that form a pattern, which I'm using to make a virtual stick move in the same way as the physical one.
For that, I'm using OpenCV in Python and sending the rotation and translation vectors calculated by solvePnP to Unity.
I'm struggling to understand how I can use the results given by the solvePnP function in my 3D world.
So far what I did is use the solvePnP function to get the rotation and translation vectors, and then use the rotation vector to move my stick in the 3D world:
transform.rotation = Quaternion.Euler(new Vector3(x, z, y));
It seems to work okay when my stick is positioned at a certain angle and if I move slowly... but most of the time it moves everywhere.
Looking for answers online, most people do several more steps after solvePnP; from what I understand:
Using Rodrigues to convert the rotation vector to a rotation matrix
Copy the rotation matrix and translation vector into an extrinsic matrix
Invert the extrinsic matrix
I understand that these steps would be necessary if I were working with matrices as in OpenGL, but what about Unity3D? Are these extra steps necessary? Or can I directly use the vectors given by the solvePnP function? (I doubt it, as the results I'm getting so far aren't good.)
This is old, but the answer to the question "what about Unity3D? Are these extra steps necessary? Or can I directly use the vectors given by the solvePnP function?" is:
- No, you can't use them directly. I tried converting rvec using Quaternion.Euler and, as you've posted, the results were bad.
- Yes, you have to use Rodrigues, which converts rvec correctly into a rotation matrix.
- About inverting the extrinsic matrix: it depends.
If your object is at (0,0,0) in world space and you want to place the camera, you have to invert the transform resulting from tvec and rvec in order to get the desired result.
If, on the other hand, your camera has a fixed position and you want to position the object relative to it, you have to apply the camera's localToWorld matrix to the transform resulting from rvec and tvec in order to get the desired result.
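A minimal OpenCV/numpy sketch of those steps (building the extrinsic matrix from rvec/tvec and inverting it to obtain the camera pose); mapping the result into Unity's left-handed coordinate system is left out here and would need an additional axis flip:

import cv2
import numpy as np

def camera_pose_from_pnp(rvec, tvec):
    # rvec, tvec as returned by cv2.solvePnP(object_points, image_points, K, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)                 # rotation vector -> 3x3 rotation matrix

    extrinsic = np.eye(4)                      # object/world -> camera transform
    extrinsic[:3, :3] = R
    extrinsic[:3, 3] = np.asarray(tvec).ravel()

    # If the object sits at the origin and you want to place the camera,
    # invert the extrinsic to get the camera pose in object/world space.
    return np.linalg.inv(extrinsic)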

Triangulation to find distance to the object - Image to world coordinates

Localization of an object specified in the image.
I am working on a computer vision project to find the distance to an object using stereo images. I followed these steps using OpenCV to achieve my objective:
1. Calibration of the camera
2. SURF matching to find the fundamental matrix
3. Rotation and translation vectors using SVD, as described in the Hartley and Zisserman book
4. stereoRectify to get the projection matrices P1, P2 and rotation matrices R1, R2. The rotation matrices can also be found from the homography: R = CameraMatrix.inv() * H * CameraMatrix.
Problems:
I triangulated the point using the least-squares triangulation method to find the real distance to the object. It returns a value of the form [0.79856, 0.354541, 0.258]. How will I map it to real-world coordinates to find the distance to the object?
http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/
Alternative approach:
Find the disparity of the object between the two images and find the depth using the formula:
Depth = (focal length * baseline) / disparity
For the disparity we have to perform rectification first, and the points must be undistorted. My rectified images are black.
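As a quick illustration of that formula with made-up numbers (focal length in pixels, baseline in meters), small disparities at long range produce large and noisy depth estimates:

focal_length_px = 700.0     # hypothetical focal length in pixels
baseline_m = 0.12           # hypothetical baseline in meters
disparity_px = 1.3          # hypothetical disparity in pixels

depth_m = focal_length_px * baseline_m / disparity_px
print(depth_m)              # about 64.6 m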
Please help me out. It is important.
Here is a detailed explanation of how I implemented the code:
1. Calibration of the camera using a circles grid to get the camera matrix and distortion coefficients. The code is given on GitHub (Android).
2. Take two pictures of a car, first from the left and then from the right. Take the sub-image and calculate the fundamental matrix, essential matrix, rotation matrix and translation matrix...
3. I have tried the projection in two ways:
Take the first camera's projection as the identity matrix, build the second 3x4 projection from the rotation and translation matrices, and perform triangulation.
Get the projection matrices P1 and P2 from stereoRectify and perform triangulation.
My object is 65 meters away from the camera and I don't know how to recover this true distance from the triangulation result [0.79856, 0.354541, 0.258].
Question: Do I have to do some extra calibration to get the result? My code does not make use of the geometric size of the object.
So you already computed the triangulation? Well, then you have points in camera coordinates, i.e. in the coordinate frame centered on one of the cameras (the left or right one depending on how your code is written and the order in which you feed your images to it).
What more do you want? The vector length (square root of the sum of the squared coordinates) of those points is their estimated distance from the same camera. If you want their position in some other "world" coordinate system, you need to give the coordinate transform between that system and the camera - presumably through a calibration procedure.
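A small OpenCV/numpy sketch of that last step, assuming K is the camera matrix and R, t the relative pose of the second camera with respect to the first; note that if t was recovered from the essential matrix it is only known up to scale, so the distances come out in those arbitrary units unless the baseline is known in meters:

import cv2
import numpy as np

def distances_from_triangulation(K, R, t, pts1, pts2):
    # pts1, pts2: 2xN arrays of matched pixel coordinates in the two images
    P1 = K @ np.hstack((np.eye(3), np.zeros((3, 1))))   # first camera at the origin
    P2 = K @ np.hstack((R, np.asarray(t).reshape(3, 1)))

    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)     # 4xN homogeneous points
    X = (X_h[:3] / X_h[3]).T                            # Nx3 Euclidean points

    return np.linalg.norm(X, axis=1)                    # distance to the first camera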

Finding Rotation matrices between two cameras for "Stereorectify"

So I have a depth map and the extrinsics and intrinsics of the camera. I want to get back the 3D points and the surface normals. I am using the function reprojectImageTo3D. In the stereoRectify function, in order to find Q, how do I get the rotation matrix between the 1st and the 2nd cameras' coordinate systems? I have the individual rotation matrices and translation vectors, but how do I get them for "between the cameras"? Also, this would give me the 3D points. Is there a method to generate the surface normals?
Given that you have the extrinsic matrix of both cameras, can't you simply take the inverse extrinsic matrix of camera 1, multiplied by the extrinsic matrix of camera 2?
Also, for a direct relation between the two cameras, take a look at the fundamental matrix (or, more specifically, the essential matrix). See if you can find a copy of the book Multiple View Geometry by Hartley and Zisserman.
As for the surface normals, you can compute those yourself by computing cross products on the corners of triangles. However, you then first need the reconstructed 3D point cloud.
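A small numpy sketch of both suggestions, assuming E1 and E2 are 4x4 world-to-camera extrinsic matrices (the product order below follows that convention):

import numpy as np

def relative_pose(E1, E2):
    # Transform that maps camera-1 coordinates into camera-2 coordinates:
    # camera 1 -> world -> camera 2
    T = E2 @ np.linalg.inv(E1)
    R_12, t_12 = T[:3, :3], T[:3, 3]     # the R and T inputs for stereoRectify
    return R_12, t_12

def triangle_normal(p0, p1, p2):
    # Unit surface normal of a triangle given by three 3D points
    n = np.cross(p1 - p0, p2 - p0)
    return n / np.linalg.norm(n)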

Resources