Finding image based on closest camera pose - machine-learning

I have a camera matrix world of a 3D scene (let's call it A):
[[ 6.86118087e-01 -7.27490186e-01 1.11022302e-16 0.00000000e+00]
[ 3.82193007e-01 3.60457832e-01 8.50881106e-01 0.00000000e+00]
[-6.19007654e-01 -5.83804917e-01 5.25358300e-01 0.00000000e+00]
[-2.89658661e+00 5.19178303e-01 -3.50478367e-02 1.00000000e+00]]
And, I have 1000 2D images of the scene. For each image, I have pose (camera to world) as follows:
[[-9.55421e-01 1.19616e-01 -2.69932e-01 2.65583e+00]
[ 2.95248e-01 3.88339e-01 -8.72939e-01 2.98160e+00]
[ 4.07581e-04 -9.13720e-01 -4.06343e-01 1.36865e+00]
[ 0.00000e+00 0.00000e+00 0.00000e+00 1.00000e+00]]
I want to find the image that is closest to A by comparing the pose matrices. How do I compare A with the camera pose of each image?
And what counts as "closest" here?
Just a little background: we asked annotators to write descriptions of an object in a 3D scene. Along with that, we also captured camera parameters, such as the matrix world (see A above), center, lookat, and dof, so that we can estimate where in the 3D scene they were looking when they wrote the description. So now I am trying to create a training set of images (one image per scene, chosen from the 1000 available) and would like to find the image that best matches the recorded camera parameters.
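One way to compare two 4x4 poses is to split each into its translation (camera centre) and rotation parts and combine a Euclidean distance with a rotation angle; the closest image is the one with the smallest combined distance. Below is a minimal numpy sketch. Note the assumptions: A as printed has its translation in the bottom row (row-vector convention) while the image poses have it in the last column, so A is transposed first; the weighting rot_weight between metres and radians is a free choice; and any difference in camera-axis conventions between the two sources (e.g. OpenGL vs OpenCV viewing direction) would still have to be handled separately.

import numpy as np

# Matrix A from the question. As printed, its translation sits in the bottom
# row (row-vector convention); transposing gives the usual column-vector
# camera-to-world form. If A is already column-vector form, drop the transpose.
A = np.array([
    [ 6.86118087e-01, -7.27490186e-01,  1.11022302e-16,  0.0],
    [ 3.82193007e-01,  3.60457832e-01,  8.50881106e-01,  0.0],
    [-6.19007654e-01, -5.83804917e-01,  5.25358300e-01,  0.0],
    [-2.89658661e+00,  5.19178303e-01, -3.50478367e-02,  1.0],
])
A_c2w = A.T

def pose_distance(T_a, T_b, rot_weight=1.0):
    # Translation term: distance between the two camera centres.
    t_dist = np.linalg.norm(T_a[:3, 3] - T_b[:3, 3])
    # Rotation term: geodesic angle of the relative rotation R_a^T R_b.
    cos_angle = np.clip((np.trace(T_a[:3, :3].T @ T_b[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return t_dist + rot_weight * np.arccos(cos_angle)

# poses: the list of 1000 camera-to-world matrices, one per image, e.g.
# best_idx = min(range(len(poses)), key=lambda i: pose_distance(A_c2w, poses[i]))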

Related

View mapping between two images of the same scene taken from the same camera using homography in OpenCV, except the camera positions are not parallel

I am trying to understand how to map points between two images of the same scene taken from different camera positions (apologies for the rough sketch and handwriting: a sample image taken from cam1 and a sample image taken from cam2). Since the two cameras used are the same model (a Logitech camera), I assume camera calibration isn't required. With the help of SIFT descriptors and feature matching, I feed the good matches from the images into homography estimation with RANSAC and get a 3x3 matrix. To verify the view mapping, I select a few objects (say, the bins in the image) in the cam1 image and try to map the same objects into the cam2 image with the 3x3 matrix using warp_perspective, but the outputs aren't good: I selected the top-left and bottom-right corners of the objects (i.e. the bins) in the cam1 image and tried to draw a bounding box around the corresponding object in the cam2 image.
But as visible in the view-map output image, the bounding boxes do not line up with the bins.
I want to understand where I am going wrong. Is it the different camera positions, meaning homography shouldn't be used here at all? Do I have to use multiple homographies, or do I need to know the translation between the camera positions? Very confused. Thank you.
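For reference, here is a minimal OpenCV sketch of the pipeline described above (SIFT matches, RANSAC homography), using cv2.perspectiveTransform to map the selected corner points rather than warp_perspective, which warps whole images. The file names and the example corner coordinates are placeholders. As the answer below explains, even a correctly estimated homography will only place the boxes correctly if the matched points and the bins lie on (nearly) the same plane.

import cv2
import numpy as np

img1 = cv2.imread("cam1.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("cam2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Ratio-test filtering of the matches.
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Map the two selected corners of a bin (top-left, bottom-right) into the
# cam2 image; the coordinates here are made-up examples.
corners_cam1 = np.float32([[100, 200], [180, 300]]).reshape(-1, 1, 2)
corners_cam2 = cv2.perspectiveTransform(corners_cam1, H)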
A homography maps a plane to a plane. It can only be used if all of the matches lie on a plane in the real world (e.g. on a planar wall), or if the feature points are located so far from both cameras that the transformation between the cameras can be expressed as a pure rotation. See this link for further explanation.
In your case the objects are located at different depths, so you need to perform stereo calibration of the cameras and then compute a depth map to be able to map pixels from one camera into the other.
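A rough sketch of that route with standard OpenCV calls is below, assuming the stereo calibration has already been done and saved; the calibration file name and image names are placeholders.

import cv2
import numpy as np

# Hypothetical file holding the stereoCalibrate() outputs.
calib = np.load("stereo_calib.npz")
K1, D1, K2, D2, R, T = (calib[k] for k in ("K1", "D1", "K2", "D2", "R", "T"))

img1 = cv2.imread("cam1.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("cam2.jpg", cv2.IMREAD_GRAYSCALE)
size = (img1.shape[1], img1.shape[0])

# Rectify both views so that epipolar lines become horizontal.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
m1x, m1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
m2x, m2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
rect1 = cv2.remap(img1, m1x, m1y, cv2.INTER_LINEAR)
rect2 = cv2.remap(img2, m2x, m2y, cv2.INTER_LINEAR)

# Disparity via semi-global block matching, then per-pixel 3D points.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = sgbm.compute(rect1, rect2).astype(np.float32) / 16.0
points_3d = cv2.reprojectImageTo3D(disparity, Q)   # (X, Y, Z) per pixel, cam1 frame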

Deprojection without Intrinsic Camera Matrix

I am trying to verify a solution to deprojecting a pixel point (u,v) into a 3D world location (x,y,z) using only the camera's extrinsic rotation and translation in addition to (u,v).
The proposed solution
I have modeled the problem in Unreal, where I have a virtual camera with world position (1077,1133,450) and rotation yaw=90, pitch=345, roll=0 degrees. I have an object of known 3D position (923,2500,0) seen by the 1280x720 camera at pixel location (771,426) or frame center position (131,-66).
The transpose of my rotation matrix is:
[[ 5.91458986e-17 9.65925826e-01 -0.00000000e+00]
[-1.00000000e+00 6.12323400e-17 0.00000000e+00]
[-1.58480958e-17 -2.58819045e-01 9.65925826e-01]]
My Tx_Ty_Tz matrix is:
[[-1094.39396119]
[ 1077. ]
[ -141.42464373]]
My dx_dy_dz matrix is:
[[ -63.75110454]
[-131. ]
[ 18.0479828 ]]
And I end up with location (-1593,50,0) as the deprojected world coordinate, which is clearly wrong. Is one of my matrices incorrectly calculated? If not, is the method provided flawed / incomplete?
The proposed solution in the link does not appear to be correct, as the intrinsic camera matrix is required for deprojection: without it there is no way to turn a pixel coordinate into a ray direction in space.
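For what it's worth, here is a generic sketch of deprojection onto a known plane once the intrinsics are available. The 90 degree horizontal field of view used to build K is only an assumption (it is not given in the question), and the rotation and camera centre must be expressed in a consistent right-handed, z-up convention, which is not how Unreal reports them out of the box.

import numpy as np

def deproject_to_plane(u, v, K, R_wc, C, plane_z=0.0):
    # Back-project the pixel to a ray direction in the camera frame,
    # rotate it into the world frame, and intersect with the plane z = plane_z.
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    ray_world = R_wc @ ray_cam
    s = (plane_z - C[2]) / ray_world[2]   # ray parameter at the plane
    return C + s * ray_world

# Assumed intrinsics for a 1280x720 camera with a 90 degree horizontal FOV.
w, h = 1280, 720
f = (w / 2.0) / np.tan(np.deg2rad(90.0) / 2.0)
K = np.array([[f, 0.0, w / 2.0],
              [0.0, f, h / 2.0],
              [0.0, 0.0, 1.0]])

# Usage would be: deproject_to_plane(771, 426, K, R_wc, C=np.array([1077, 1133, 450])),
# with R_wc the camera-to-world rotation built from yaw/pitch/roll in the same
# axis convention as the world coordinates.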

How to map points from the left camera to the right camera using the rotation (R) and translation (t) between two cameras obtained from stereoCalibrate()?

I have the R|t between two cameras, estimated using the stereoCalibrate() function from OpenCV. From stereoCalibrate() we get R1, t1 and R2, t2 for each camera respectively, as well as the R, t between the two cameras and the two intrinsic matrices K1 and K2, one for each camera.
I tried to map points from one camera to the other using the estimated R|t (between the two cameras). However, I failed to map even the points which I used for estimating R|t. I also tried to map using depth data, but failed. Any idea how to map points from one camera to the other?
I tried Pose estimation of 2nd camera of a calibrated stereo rig, given 1st camera pose, but didn't have success.
The "mapping" you seek requires knowledge of the 3D geometry of the scene. This can be inferred from a depth map, i.e. an image associated to a camera, whose pixel values equal the distance from the camera of the scene object seen through each pixel. The depth map itself can be computed from a stereo algorithm.
In some special cases the mapping can be computed without knowledge of the scene geometry. These include:
The camera displacement is a pure rotation (or, more generally, the translation between the cameras is very small compared to the distance of the scene objects from the cameras). In this case the image mapping is a homography.
The scene lies in a plane. In this case, too, the image mapping is a homography.
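In the general case, a minimal sketch of the per-pixel mapping looks like the following. It assumes the OpenCV convention in which stereoCalibrate() returns R, T such that a point x1 in camera-1 coordinates maps to x2 = R @ x1 + T in camera-2 coordinates, and that the depth of the pixel (its Z coordinate in camera 1) is known, e.g. from a depth map.

import numpy as np

def map_pixel_cam1_to_cam2(u, v, depth, K1, K2, R, T):
    # Lift the pixel to a 3D point in camera 1 (depth is the Z coordinate
    # along the optical axis), move it into camera 2's frame, and reproject.
    x1 = depth * (np.linalg.inv(K1) @ np.array([u, v, 1.0]))
    x2 = R @ x1 + T.ravel()
    uv2 = K2 @ x2
    return uv2[:2] / uv2[2]   # pixel coordinates in camera 2

Lens distortion is ignored here; if the distortion coefficients are significant, undistort the pixel first (e.g. with cv2.undistortPoints) before lifting it to 3D.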

Finding a projector's real-world position (using OpenCV)

I'm currently trying to find the 3D position of a projector in a real-world coordinate system whose origin is, for example, the corner of a wall. I've used the Open Frameworks addon called ofxCvCameraProjectorCalibration, which is based on OpenCV functions, namely the calibrateCamera and stereoCalibrate methods. The application outputs the following:
camera intrinsic matrix (distortion coefficients included);
projector intrinsic matrix (distortion coefficients included);
camera->projector extrinsic matrix.
My initial idea was, while calibrating the camera, to place the chessboard pattern at the corner of the wall and extract the extrinsic parameters ([R|T] matrix) for that particular calibration image.
After calibrating both the camera and the projector, do I have all the data necessary to recover the position of the projector in real-world coordinates? If so, what is the matrix manipulation required to get it?
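If the chessboard-at-the-corner idea works out, one possible composition is sketched below. It assumes the wall-corner extrinsics follow the usual OpenCV convention Xc = R_wc @ Xw + t_wc (world point to camera coordinates) and that the camera->projector extrinsics from stereoCalibrate are R_cp, t_cp with Xp = R_cp @ Xc + t_cp; the variable names are mine, not the addon's.

import numpy as np

def projector_centre_world(R_wc, t_wc, R_cp, t_cp):
    # Compose world -> camera with camera -> projector:
    #   Xp = R_cp @ (R_wc @ Xw + t_wc) + t_cp = R_wp @ Xw + t_wp
    R_wp = R_cp @ R_wc
    t_wp = R_cp @ t_wc.ravel() + t_cp.ravel()
    # The projector centre is the world point that maps to Xp = 0.
    return -R_wp.T @ t_wp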

How to obtain the world coordinates of an image

After calibrating a camera using Jean-Yves Bouguet's Camera Calibration Toolbox and checkerboard patterns printed on cardboard, I've obtained the extrinsic and intrinsic parameters, and I can use this information to find camera coordinates:
Pc = R * Pw + T
After that, how do I obtain the world coordinates of an image point using Pc and the calibration parameters?
Thanks in advance.
EDIT
The goal is to use the calibrated camera parameters to measure planar objects with a calibrated camera. To perform this task, I don't know how to use the camera parameters. In other words, I have to convert the pixel coordinates of the image to world coordinates using the calibrated parameters. I already have the parameters and the new image. How can I do this conversion?
Thanks in advance.
I was thinking about the problem and came to this conclusion:
You can't find the object's size from a single shot. When you have no idea how far the object is from your camera, you can't say anything about its size. The calibration only tells you how far the image plane is from the camera (the focal length) and the opening angles of the lens. When the focal length changes, the calibration changes too.
But there are some possibilities:
How to get the real life size of an object from an image, when not knowing the distance between object and the camera?
So, as I understand it, you can approximate the size of the objects.
Your problem can be solved if (and only if) you can express the plane of your object in calibrated camera coordinates.
The calibration procedure outputs, along with the camera intrinsic parameters K, a coordinate transform matrix Qwc_i = [Rwc_i | Twc_i] for every calibration image, which expresses the location and pose of a particular scene coordinate frame in the camera coordinates of that calibration image. IIRC, in Jean-Yves' toolbox this is the frame attached to the top-left corner of the calibration checkerboard.
So, if your planar object is on the same plane as the checkerboard in one of the calibration images, all you have to do in order to find its location in space is intersect the checkerboard plane with camera rays cast from the camera center (0,0,0) to the pixels into which the object is imaged.
If your object is NOT in one of those planes, all you can do is infer the object's own plane from additional information, if available, e.g. from a feature of known size and shape.
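A small sketch of that ray-plane intersection, assuming the extrinsics [R|T] of the relevant calibration image follow Pc = R * Pw + T (as above), that the checkerboard plane is z = 0 in its own frame, and that lens distortion has already been removed from the pixel coordinates:

import numpy as np

def pixel_to_checkerboard_plane(u, v, K, R, T):
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray direction from the camera centre
    n = R[:, 2]                                    # checkerboard plane normal in camera coords
    s = n.dot(T.ravel()) / n.dot(d)                # ray parameter where the ray meets the plane
    Pc = s * d                                     # intersection point in camera coordinates
    return R.T @ (Pc - T.ravel())                  # same point in checkerboard (world) coordinates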
