In OpenCV's solvePnP, what should I pass for objectPoints?

OpenCV docs for solvePnP
In an augmented reality app, I detect the image in the scene, so I know imagePoints. But the object I'm looking for (objectPoints) is a virtual marker that is just stored in memory to search for in the scene, so I don't know where it is in space. The book I'm reading (Mastering OpenCV with Practical Computer Vision Projects) passes it as if the marker is a 1x1 matrix, and it works fine. How? Doesn't solvePnP need to know the size of the object and its projection so that it knows how much scale is applied?

Assuming you're looking for a physical object, you should pass the 3D coordinates of the points on the model which are mapped (by projection) to the 2D points in the image. You can use any reference frame; the rvec and tvec returned by solvePnP describe the transform that takes points from that reference frame into the camera's coordinate frame, i.e. the object's position and orientation as seen from the camera.
If you want the camera's position and orientation in the object's reference frame instead, invert that transform: the rotation becomes R^T and the camera position becomes -R^T * tvec.
For example, for a cube object of size 2x2x2, the visible corners may be something like: {-1,-1,-1},{1,-1,-1},{1,1,-1}.....
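On the scale question: a pinhole projection cannot tell a small marker close to the camera from a large one further away, so the translation comes back in whatever units objectPoints uses. The sketch below illustrates this in plain Python without calling OpenCV; the `project` and `square_marker` helpers, the focal length and the principal point are made-up values for illustration only.

```python
def project(points, t, f=800.0, cx=320.0, cy=240.0):
    """Pinhole projection of 3D points shifted by t (no rotation, no distortion)."""
    pixels = []
    for X, Y, Z in points:
        Xc, Yc, Zc = X + t[0], Y + t[1], Z + t[2]
        pixels.append((f * Xc / Zc + cx, f * Yc / Zc + cy))
    return pixels


def square_marker(side):
    """Corners of a planar square marker centred on its own origin (Z = 0)."""
    h = side / 2.0
    return [(-h, -h, 0.0), (h, -h, 0.0), (h, h, 0.0), (-h, h, 0.0)]


# A unit marker 5 units away and an 8 cm marker 0.4 m away (same pose, scaled
# by 0.08) land on the same pixels.
unit_pixels = project(square_marker(1.0), t=(0.2, -0.1, 5.0))
real_pixels = project(square_marker(0.08), t=(0.016, -0.008, 0.4))
same_image = all(
    abs(a - b) < 1e-6
    for p, q in zip(unit_pixels, real_pixels)
    for a, b in zip(p, q)
)
print(same_image)
```

So with a 1x1 marker the recovered translation is expressed in marker widths; multiplying it by the real side length gives metric units.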

You have to pass the 3D coordinates of the real-world object that you want to match against the image. The scale and rotation of the result depend on the coordinate system (and units) that you use.
This is not as difficult as it sounds. See this blog post on head pose estimation for more details with code.

Related

Finding the real world coordinates of an object from a camera

I am trying to find the coordinates of an object detected by a single camera, using OpenCV. The camera will be mounted on a drone, looking straight down at the surface.
I have:
- The camera's coordinates from the GPS sensor on the drone.
- The camera's height.
- The camera's intrinsic parameters.
3D Reconstruction formula
According to this formula, I need the extrinsic parameters to find the real-world coordinates. I am supposed to use OpenCV's solvePnP method to find the extrinsic parameters. As far as I know, the extrinsic parameters describe the camera's location, but my camera will be on the drone and its location will change. Are the extrinsic parameters constant, just like the intrinsic parameters?
Is there any other way to do this calculation?
What you're trying to do is what is called monoplotting. In order to estimate XY real world coordinates from a single image you need to know the following:
X0, Y0, Z0 real world coordinates of your camera
Exterior orientation of your camera (Pitch, Roll and Yaw)
Interior orientation of your camera (focal length, principal point in x and y direction, radial and tangential distortion parameters)
The XYZ of your camera and the exterior orientation are typically stored in the EXIF data of your image. This does however depend on the drone. There are some good Python modules for extracting this information from the images. I use exifread.
The interior orientation can be more difficult to get, as you need to perform a camera calibration, which you can do with OpenCV. You should be able to use the tutorial Yunus linked in his comment. However, there is a shortcut if you use photogrammetry software from Pix4D: you can use their camera database, which stores information about many different drones and their interior orientations. The values are not perfect but should be alright for many use cases, see link.
When you have all of these parameters, you need to do the following:
1. Undistort your images.
2. Create image coordinates of the points whose real-world coordinates you want to know, using the undistorted image.
3. Create rotation matrices for the rotations about the X, Y and Z axes and multiply them together (RX * RY * RZ).
4. Apply the collinearity equations.
Regarding 1: you can use cv2.undistort for this; the link Yunus provided has a tutorial for this as well.
Regarding 3: OpenCV can probably provide this matrix for you, but writing the function yourself is quite easy and good for understanding what is going on. See Wikipedia: link. It can be a little confusing which of the pitch, roll and yaw angles to use for which matrix. This depends on how your camera's exterior coordinate system is defined. The typical convention is that pitch is around x, roll is around y and yaw is around z.
Regarding 4: the collinearity equations depend on the rotation matrices you created, see: link. The term Z-Z0 is just the negative of the relative height above the object whose coordinates you are trying to find. So if you do not have the relative height of your drone, you need to know the height of your object and subtract the drone's height from the object's height (you'll get a negative number).
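Steps 3 and 4 could be sketched roughly as follows in plain Python. This is only one possible convention, and it is an assumption, not something fixed by the text above: R = RX(pitch) * RY(roll) * RZ(yaw) rotates camera axes into world axes, the camera looks along its own -z axis, and the image coordinates (x, y) are already undistorted, measured from the principal point, and in the same units as the focal length f. The function names are hypothetical.

```python
import math


def matmul(a, b):
    """3x3 matrix product for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_xyz(pitch, roll, yaw):
    """R = RX(pitch) * RY(roll) * RZ(yaw), angles in radians."""
    cx, sx = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(roll), math.sin(roll)
    cz, sz = math.cos(yaw), math.sin(yaw)
    rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return matmul(matmul(rx, ry), rz)


def image_to_ground(x, y, f, camera_xyz, R, ground_z):
    """Inverse collinearity equations: intersect the ray through image
    point (x, y) with the horizontal plane Z = ground_z."""
    x0, y0, z0 = camera_xyz
    # Ray direction in world axes: R * (x, y, -f).
    d = [R[i][0] * x + R[i][1] * y - R[i][2] * f for i in range(3)]
    scale = (ground_z - z0) / d[2]  # (Z - Z0) divided by the ray's z component
    return x0 + scale * d[0], y0 + scale * d[1]


# Camera 100 m above flat ground, looking straight down, 4 mm focal length;
# the image point lies 2 mm to the right of the principal point.
R = rotation_xyz(0.0, 0.0, 0.0)
print(image_to_ground(0.002, 0.0, 0.004, (1000.0, 2000.0, 100.0), R, 0.0))
```

If the results come out mirrored or rotated, the angle-to-axis assignment or the sign of the camera's viewing axis probably differs from what is assumed here.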
I hope this helps and points you in the right direction.

Conversion from OpenGL to OpenCV

What I have
I'm generating images using the standard perspective camera in Unity. The camera is aimed at the ground plane (in Unity it's the xz-plane), see image. From this I need to remove the perspective so that all crop rows are parallel to each other.
Method
The warpPerspective() function from OpenCV can be used to remove perspective from an image. All the information is known (field of view, rotation, position, ...), and thus I know how a 3D point maps onto the 2D plane and vice versa. The problem is that OpenCV uses another convention: warpPerspective() expects a 3x3 matrix, while the transformation matrix from Unity is a 4x4 matrix. Is there a conversion between the two? Or should I think of another strategy?
EDIT
I can not use the orthographic camera in unity.
Fixed
Solved the issue by constructing a ray from the camera origin through each pixel and looking for its intersection with the ground plane. After this I discretised the ground plane into a grid with the same resolution as the original image. Points that map to the same cell are accumulated.
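A rough Python sketch of that ray-casting approach, for readers who want to try it outside Unity. Everything here is an assumption made for illustration: Unity-style axes (x right, y up, z forward), a camera that is only tilted downwards by `pitch_down` radians, a focal length given in pixels, and averaging as the way to "accumulate" pixels that share a cell. It is not the poster's actual code.

```python
import math
from collections import defaultdict


def ground_hit(u, v, width, height, focal_px, cam_pos, pitch_down):
    """Intersect the ray through pixel (u, v) with the ground plane y = 0.
    Returns (x, z) on the ground, or None if the ray never reaches it."""
    # Ray in camera axes: x right, y up, z forward.
    dx, dy, dz = u - width / 2.0, height / 2.0 - v, focal_px
    # Rotate into world axes for a camera tilted down by pitch_down.
    c, s = math.cos(pitch_down), math.sin(pitch_down)
    wx, wy, wz = dx, c * dy - s * dz, s * dy + c * dz
    if wy >= 0:
        return None  # parallel to the ground or pointing at the sky
    k = -cam_pos[1] / wy
    return cam_pos[0] + k * wx, cam_pos[2] + k * wz


def top_down(pixels, cell, **camera):
    """Accumulate pixel values into square ground cells of size `cell`.
    `pixels` maps (u, v) to a value; returns {(col, row): mean value}."""
    sums = defaultdict(lambda: [0.0, 0])
    for (u, v), value in pixels.items():
        hit = ground_hit(u, v, **camera)
        if hit is None:
            continue
        key = (math.floor(hit[0] / cell), math.floor(hit[1] / cell))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


camera = dict(width=100, height=100, focal_px=50.0,
              cam_pos=(0.5, 10.0, 0.5), pitch_down=math.pi / 2)
grid = top_down({(50, 50): 2.0, (51, 50): 4.0, (55, 50): 7.0}, 1.0, **camera)
print(grid)
```

Cells that no ray hits stay empty, so a real implementation would likely need some interpolation or hole filling for the far end of the field.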
If you cannot use Unity's orthographic camera, I would try to imitate the C++ code from the examples in the OpenCV documentation you linked. Another approach is to undo the projection of the points by multiplying by the inverse of the transformation matrix that was applied to them. A matrix multiplied by its inverse is the identity, so the projection transformation would be removed. I think that should be possible; you can look into how to obtain or change the projection matrix by checking this. The point is to undo the perspective projection. Then you would need to obtain the corresponding orthographic projection matrix and apply it to get the positions you're after. That should be the same thing that Unity's orthographic camera does.
To understand the projection matrix at the lowest level, this source is awesome.
I think that in the camera component you just need to change the projection from perspective to orthographic.

How to calculate translation matrix?

I have 2D image data with the respective camera location in latitude and longitude. I want to translate pixel coordinates to 3D world coordinates. I have access to the intrinsic calibration parameters and to yaw, pitch and roll. Using yaw, pitch and roll I can derive the rotation matrix, but I don't understand how to calculate the translation matrix. As I am working from a data set, I don't have physical access to the camera. Please help me derive the translation matrix.
Cannot be done at all if you don't have the elevation of the camera with respect to the ground (AGL or ASL) or another way to resolve the scale from the image (e.g. by identifying in the image an object of known size, for example a soccer stadium in an aerial image).
Assuming you can resolve the scale, the next question is how precisely you can (or want to) model the terrain. For a first approximation you can use a standard geodetic ellipsoid (e.g. WGS-84). For higher precision, especially for images shot from lower altitudes, you will need to use a DTM (digital terrain model) and register it to the images. Either way, it is a standard back-projection problem: you compute the ray from the camera centre to the pixel, transform it into world coordinates, then intersect it with the ellipsoid or DTM.
There are plenty of open source libraries to help you do that in various languages (e.g. GeographicLib).
Edited to add suggestions:
Express your camera location in ECEF.
Transform the ray from the camera into ECEF as well, taking into account the camera rotation. You can do both transformations using a library, e.g. nVector.
Then proceed to intersect the ray with the ellipsoid, as explained in this answer.
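For illustration, the first and last of these steps can be written out in plain Python without a library. The sketch uses the standard closed-form geodetic-to-ECEF conversion and a ray/ellipsoid intersection with the WGS-84 constants; it does not use nVector or GeographicLib, the function names are hypothetical, and it assumes the ray direction has already been rotated into ECEF.

```python
import math

WGS84_A = 6378137.0                    # semi-major axis in metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_B = WGS84_A * (1.0 - WGS84_F)    # semi-minor axis in metres


def geodetic_to_ecef(lat_deg, lon_deg, h):
    """Latitude/longitude in degrees and ellipsoidal height in metres to ECEF."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    e2 = WGS84_F * (2.0 - WGS84_F)     # first eccentricity squared
    n = WGS84_A / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    return ((n + h) * math.cos(lat) * math.cos(lon),
            (n + h) * math.cos(lat) * math.sin(lon),
            (n * (1.0 - e2) + h) * math.sin(lat))


def ray_ellipsoid(origin, direction):
    """Nearest intersection of origin + s * direction (s > 0) with the
    WGS-84 ellipsoid, or None if the ray misses it."""
    # Scale the axes so that the ellipsoid becomes a unit sphere.
    o = (origin[0] / WGS84_A, origin[1] / WGS84_A, origin[2] / WGS84_B)
    d = (direction[0] / WGS84_A, direction[1] / WGS84_A, direction[2] / WGS84_B)
    a = sum(q * q for q in d)
    b = 2.0 * sum(p * q for p, q in zip(o, d))
    c = sum(p * p for p in o) - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    s = (-b - math.sqrt(disc)) / (2.0 * a)   # smaller root = nearer hit
    if s <= 0:
        return None
    return tuple(p + s * q for p, q in zip(origin, direction))


# Camera 1000 m above the equator at longitude 0, looking straight down.
camera = geodetic_to_ecef(0.0, 0.0, 1000.0)
print(ray_ellipsoid(camera, (-1.0, 0.0, 0.0)))
```

The hit point is in ECEF; converting it back to latitude and longitude is a separate step, and a library is the more reliable choice for that direction.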

How to obtain extrinsic matrix from ARKit camera?

I want to convert pixel coordinates into real-world coordinates. I found that the ARKit API provides a function in ARCamera called viewMatrix():
Returns a transform matrix for converting from world space to camera space
Can this function be used to obtain the extrinsic matrix for the camera?
This may help:
self.sceneView.session.currentFrame?.camera.transform
The position and orientation of the camera in world coordinate space.
.transform documentation
You can directly extract the eulerAngles from this, but will have to parse the translation yourself.
How come you manually want to project pixels into world positions? (The transform alone isn't going to help you there obviously).
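Going by the two documentation quotes above, camera.transform is the camera's pose in world space, so a world-to-camera (extrinsic) matrix is its inverse. For a rigid transform that inverse has a simple closed form, sketched here numerically in Python rather than Swift. The `invert_rigid` helper is hypothetical, and the matrix is written in row-major mathematical notation with the translation in the last column, which is an assumption you should check against how your matrices are stored.

```python
def invert_rigid(pose):
    """Invert a 4x4 rigid transform [[R, t], [0, 1]] given as nested lists.
    The inverse is [[R^T, -R^T t], [0, 1]]."""
    r = [row[:3] for row in pose[:3]]
    t = [row[3] for row in pose[:3]]
    rt = [[r[j][i] for j in range(3)] for i in range(3)]               # R^T
    ti = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]   # -R^T t
    return [rt[0] + [ti[0]], rt[1] + [ti[1]], rt[2] + [ti[2]],
            [0.0, 0.0, 0.0, 1.0]]


# Camera rotated 90 degrees about z and located at (1, 2, 3) in world space.
camera_to_world = [[0.0, -1.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0, 2.0],
                   [0.0, 0.0, 1.0, 3.0],
                   [0.0, 0.0, 0.0, 1.0]]
world_to_camera = invert_rigid(camera_to_world)
print(world_to_camera)
```

The camera's own position, (1, 2, 3) in this example, maps to the origin under the world-to-camera matrix, which is a quick sanity check for the direction of the transform.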

how to obtain the world coordinates of an image

After calibrating a camera using Jean-Yves Bouguet's Camera Calibration Toolbox and checkerboard patterns printed on cardboard, I've obtained the extrinsic and intrinsic parameters. I can use this information to find camera coordinates:
Pc = R * Pw + T
After that, how do I obtain the world coordinates of an image point using Pc and the calibration parameters?
thanks in advance.
EDIT
The goal is to use the calibrated camera parameters to measure planar objects. I don't know how to use the camera parameters to perform this task. In other words, I have to convert the pixel coordinates of the image to world coordinates using the calibrated parameters. I already have the parameters and the new image. How can I do this conversion?
thanks in advance.
I thought about the problem and came to this conclusion:
You can't find the object's size. The problem is that from a single shot, when you have no idea how far the object is from your camera, you can't say anything about its size. The calibration only tells you how far the image plane is from the camera centre (the focal length) and the opening angles of the lens. When the focal length changes, the calibration changes too.
But there are some possibilities:
How to get the real life size of an object from an image, when not knowing the distance between object and the camera?
So as I understand it, you can approximate the size of the objects.
Your problem can be solved if (and only if) you can express the plane of your object in calibrated camera coordinates.
The calibration procedure outputs, along with the camera intrinsic parameters K, a coordinate transform matrix Qwc_i = [Rwc_i | Twc_i] for every calibration image, which expresses the location and pose of a particular scene coordinate frame in camera coordinates for that calibration image. IIRC, in Jean-Yves Bouguet's toolbox this is the frame attached to the top-left corner of the calibration checkerboard.
So, if your planar object is on the same plane as the checkerboard in one of the calibration images, all you have to do in order to find its location in space is intersect the checkerboard plane with camera rays cast from the camera center (0,0,0) to the pixels into which the object is imaged.
If your object is NOT in one of those planes, all you can do is infer the object's own plane from additional information, if available, e.g. from a feature of known size and shape.
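The ray/plane intersection for the checkerboard case can be sketched in a few lines of Python. The assumptions, none of which come from the toolbox itself: the pixel has already been undistorted, the intrinsics are reduced to fx, fy, cx, cy with no skew, R and T are the extrinsics of the calibration image in the sense of Pc = R * Pw + T, and the checkerboard plane is Z = 0 in world coordinates. The function name is hypothetical.

```python
def pixel_to_plane(u, v, fx, fy, cx, cy, R, T):
    """World coordinates of the point on the plane Z_w = 0 seen at pixel (u, v).

    Solves s * ray = R * Pw + T for Pw with Pw_z = 0, where
    ray = K^-1 * (u, v, 1) is the viewing ray in camera coordinates."""
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    rt_ray = [sum(R[k][i] * ray[k] for k in range(3)) for i in range(3)]  # R^T ray
    rt_t = [sum(R[k][i] * T[k] for k in range(3)) for i in range(3)]      # R^T T
    s = rt_t[2] / rt_ray[2]            # depth scale that makes Pw_z vanish
    return [s * rt_ray[i] - rt_t[i] for i in range(3)]


# Checkerboard plane parallel to the image plane, 2 units in front of the camera.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 2.0]
point = pixel_to_plane(520.0, 240.0, 800.0, 800.0, 320.0, 240.0, R, T)
print(point)
```

Converting two pixels this way and taking the distance between the resulting world points gives a length on the plane, in the same units as the checkerboard square size used during calibration.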

Resources