Perspective Transform (Homography) on RGB-D images - OpenCV

I'm using an RGB-D camera (Intel RealSense D345) to implement a table-top projected augmented reality system. Using chessboard calibration I obtain a transformation matrix, which I use to transform each incoming frame using warpPerspective from OpenCV. It works really well for the color frames. The problem is: am I allowed to do this for depth images as well, considering that depth images are 3D geometric data? What's the right way to apply a transformation matrix to depth images?
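A minimal sketch of the color-frame path described above (everything here is a placeholder: in practice H would come from something like cv2.findHomography on the detected chessboard corners, and the frame from the camera stream):

    import cv2
    import numpy as np

    # Placeholder inputs: H is the 3x3 transformation matrix from the
    # chessboard calibration, color_frame one incoming RGB image.
    H = np.eye(3, dtype=np.float64)
    color_frame = np.zeros((720, 1280, 3), np.uint8)

    # Warp the color frame into the table-top / projector coordinate system.
    warped_color = cv2.warpPerspective(color_frame, H, (1280, 720))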

Related

Use EMGU to get "real world" coordinates of pixel values

There are a number of calibration tutorials to calibrate camera images of chessboards in EMGU (OpenCV). They all end up calibrating and then undistorting an image for display. That's cool and all but I need to do machine vision where I am taking an image, identifying the location of a corner or blob or feature in the image and then translating the location of that feature in pixels into real world X, Y coordinates.
Pixel -> mm.
Is this possible with EMGU? If so, how? I'd hate to spend a bunch of time learning EMGU and then not be able to do this crucial function.
Yes, it's certainly possible; this is the "bread and butter" of OpenCV.
The calibration you are describing, in terms of removing distortions, is a prerequisite to this process. After which, the following applies:
The intrinsic calibration, or "camera matrix", is the first of two required matrices. The second is the extrinsic calibration of the camera, which is essentially the 6 DoF transform that describes the physical location of the sensor center relative to a coordinate reference frame.
The distortion coefficients, intrinsic calibration, and extrinsic calibration are all available from a single function in Emgu.CV: CvInvoke.CalibrateCamera. This process is best explained by one of the many tutorials you have described.
After that, CvInvoke.ProjectPoints applies the transforms above to map 3D world coordinates to 2D pixel locations; to go the other way (pixels to millimetres) you additionally need a constraint such as the feature lying on a known plane, so that the projection can be inverted.
The key to doing this successfully is providing comprehensive IInputArray objectPoints and IInputArray imagePoints to CvInvoke.CalibrateCamera. Be sure to cause "excitation" by using many images from many different perspectives.
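For reference, the equivalent flow in OpenCV's Python bindings (which the EMGU calls wrap) might look roughly like this; the input lists are assumed to come from your chessboard detections, and the helper name is made up:

    import cv2
    import numpy as np

    def calibrate_and_project(objpoints, imgpoints, image_size):
        # objpoints: list of (N,3) float32 chessboard corners in mm,
        # imgpoints: list of (N,1,2) float32 detected corners per image.
        # Distortion coefficients, intrinsics (K) and per-view extrinsics
        # (rvecs/tvecs) all come from one call.
        rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
            objpoints, imgpoints, image_size, None, None)

        # projectPoints maps 3D world points (mm) to 2D pixels using the
        # intrinsics and the extrinsics of the chosen view.
        world_pts = np.array([[0.0, 0.0, 0.0], [25.0, 0.0, 0.0]], np.float32)
        pixels, _ = cv2.projectPoints(world_pts, rvecs[0], tvecs[0], K, dist)
        return K, dist, pixels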

How to rectify my own image to the cameras of the KITTI dataset using OpenCV

Based on the documentation of stereoRectify from OpenCV, one can rectify an image based on two camera matrices, their distortion coefficients, and a rotation-translation from one camera to the other.
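For reference, a rough sketch of that call in Python; every value below is a placeholder for the corresponding KITTI or own-camera parameter:

    import cv2
    import numpy as np

    # Placeholder intrinsics/distortion for the two cameras and the
    # rotation/translation from camera 1 to camera 2.
    K1 = K2 = np.array([[700.0, 0, 640], [0, 700.0, 180], [0, 0, 1]])
    dist1 = dist2 = np.zeros(5)
    R, T = np.eye(3), np.array([0.5, 0.0, 0.0])
    image_size = (1280, 375)

    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
        K1, dist1, K2, dist2, image_size, R, T)

    # The rectification is then applied per camera with remap.
    map1, map2 = cv2.initUndistortRectifyMap(
        K1, dist1, R1, P1, image_size, cv2.CV_32FC1)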
I would like to rectify an image I took using my own camera to the stereo setup from the KITTI dataset. From their calibration files, I know the camera matrices and the pre-rectification image sizes of all the cameras. All their data is calibrated to their camera_0.
From this PNG, I know the position of each of their cameras relative to the front wheels of the car and relative to ground.
I can also do a monocular calibration on my camera and get a camera matrix and distortion coefficients.
I am having trouble coming up with the rotation and translation matrix/vector between the coordinate systems of the first and the second cameras, i.e. from their camera to mine or vice-versa.
I positioned my camera on top of my car at almost exactly the same height and almost exactly the same distance from the center of the front wheels, as shown in the PNG.
However, now I am at a loss as to how I can create the joint rotation-translation matrix. In a normal stereo calibration, these are returned by the stereoCalibrate function.
I looked at some references about coordinate transformation but I don't have sufficient practice in them to figure it out on my own.
Any suggestions or references are appreciated!

Homography when camera translation (for stitching)

I have a camera with which I take two captures. I want to reconstruct the two images into one image.
I only translate the camera and take images of a flat TV screen. I heard that homography only works when the camera does a rotation.
What should I do when I only have a translation?
Because you are imaging a planar surface (in your case a TV screen), all images of it taken with a perspective camera will be related by homographies. This holds whether your camera is translating and/or rotating. Therefore, to stitch different images of the surface, you don't need to do any 3D geometry processing (essential matrix computation, triangulation, etc.).
To solve your problem you need to do the following:
You determine the homography between your images. Because you only have two images, you can select the first one as the 'source' and the second one as the 'target', and compute the homography from target to source. This is classically done with feature detection and robust homography fitting. Let's denote this homography by the 3x3 matrix H.
You warp your target image to your source using H. You can do this in OpenCV with the warpPerspective method.
You merge your source and warped target using a blending function (a rough sketch of these steps follows below).
An open source project for doing exactly these steps is here.
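A minimal sketch of these three steps, assuming the images have enough texture for feature matching (all parameters are illustrative and the blending is deliberately naive):

    import cv2
    import numpy as np

    def stitch_pair(source, target):
        # 1. Detect and match features, then fit H (target -> source)
        #    robustly with RANSAC.
        orb = cv2.ORB_create(2000)
        kp_s, des_s = orb.detectAndCompute(source, None)
        kp_t, des_t = orb.detectAndCompute(target, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des_t, des_s)
        pts_t = np.float32([kp_t[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        pts_s = np.float32([kp_s[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, mask = cv2.findHomography(pts_t, pts_s, cv2.RANSAC, 3.0)

        # 2. Warp the target into the source frame.
        h, w = source.shape[:2]
        warped = cv2.warpPerspective(target, H, (w, h))

        # 3. Blend (here a crude 50/50 mix; a real stitcher would feather).
        return cv2.addWeighted(source, 0.5, warped, 0.5, 0)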
If your TV lacks distinct features or there is a lot of background clutter, the method for estimating H might not be very robust. If this is the case, you could manually click four or more correspondences on the TV in the target and source images, and compute H using OpenCV's findHomography method. Note that your correspondences cannot be completely arbitrary: no three of them should be collinear (otherwise H cannot be computed reliably). They should also be clicked as accurately as possible, because errors will affect the final stitch and cause ghosting artefacts.
An important caveat applies if your camera has significant lens distortion: in that case your images will not be exactly related by homographies. You can deal with this by calibrating your camera using OpenCV and then pre-processing your images to undo the lens distortion (using OpenCV's undistort method).
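A minimal sketch of that pre-processing step, assuming K and the distortion coefficients were already obtained from a calibration (the values here are made up):

    import cv2
    import numpy as np

    # Placeholder calibration results from cv2.calibrateCamera.
    K = np.array([[800.0, 0, 640], [0, 800.0, 360], [0, 0, 1]])
    dist = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])

    img = np.zeros((720, 1280, 3), np.uint8)   # placeholder input image
    undistorted = cv2.undistort(img, K, dist)  # remove lens distortion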

3D rendering in OpenCV

I am doing a project on 3D rendering of a scene. I am using OpenCV. The steps I am doing are like this:
Taking two images of a scene.
Calculating object correspondence using SURF feature matching.
Calculating camera fundamental matrix.
Calculating the Disparity image.
Now I have two questions:
After calculating the fundamental matrix, how can I calculate the Q matrix? (I can't calibrate the camera.)
How can I render in 3D using OpenCV or any other library?
For the 3D part, you can render your scene with OpenGL or with PCL. You have two options:
For each pixel, you create a 3D point with the right color extracted from the camera's image. This gives you a point cloud which can be processed with PCL (for 3D feature extraction, for example); a minimal sketch of saving such a cloud to a file follows after this list.
You apply a triangulation algorithm, but in order to apply this algorithm you must have the extrinsic matrices of your camera.
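A minimal sketch of the first option: once you have the 3D points and their colors, you can dump them to an ASCII PLY file that PCL or MeshLab can load (this helper is illustrative, not part of either library):

    import numpy as np

    def write_colored_ply(path, points, colors):
        # points: (N,3) float array of X,Y,Z; colors: (N,3) uint8 RGB.
        header = (
            "ply\nformat ascii 1.0\n"
            f"element vertex {len(points)}\n"
            "property float x\nproperty float y\nproperty float z\n"
            "property uchar red\nproperty uchar green\nproperty uchar blue\n"
            "end_header\n")
        with open(path, "w") as f:
            f.write(header)
            for (x, y, z), (r, g, b) in zip(points, colors):
                f.write(f"{x} {y} {z} {int(r)} {int(g)} {int(b)}\n")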
You can find more information about these techniques here:
Point Cloud technique
Triangulation algorithm
If you want to use OpenGL, you have to open a valid OpenGL context. I recommend the SFML library or Qt. These libraries are very easy to use and have good documentation. Both have tutorials about 3D rendering with OpenGL.
You can obtain the Q matrix from stereo rectification via the OpenCV method:
cv::stereoRectify
I think you want the Q matrix in order to reconstruct the 3D scene. However, you can also reconstruct directly from the intrinsic parameters via:
X = (u-cu)*base/d
Y = (v-cv)*base/d
Z = f*base/d
where (u,v) is a 2D point in the image coordinate system, (cu,cv) is the principal point of the camera, f is the focal length, base is the baseline, d is the disparity and (X,Y,Z) is a 3D point in the camera coordinate system.
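A minimal sketch of these formulas applied to a whole disparity map (the function name is made up); if you have the Q matrix from stereoRectify instead, cv2.reprojectImageTo3D(disparity, Q) does the equivalent in one call:

    import numpy as np

    def disparity_to_3d(disp, f, base, c_u, c_v):
        # disp: (H, W) disparity map; f: focal length in pixels;
        # base: baseline; (c_u, c_v): principal point.
        h, w = disp.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        d = np.where(disp > 0, disp, np.nan)   # ignore invalid disparities
        X = (u - c_u) * base / d
        Y = (v - c_v) * base / d
        Z = f * base / d
        return X, Y, Z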
For the visualization, it is possible to use PCL or VTK (PCL's visualization is based on VTK, but for me it was simpler to use).
If you just want to have a look at the output, you can use software like MeshLab.
Cheers

Camera Calibration

I am using OpenCV and am a newbie to the entire thing.
I have a scenario: I am projecting onto a wall, and I am building a kind of robot which has a camera. I wanted to know how I can process the image so that I can get the real-world coordinates of the blobs tracked by my camera.
First of all, you need to calibrate the intrinsics of the camera. Use checkerboard patterns printed onto cardboard to do this; OpenCV has methods for this, although there are ready-made tools as well.
To get an idea, I have written some Python code to calibrate from a live video stream; move the cardboard in front of the camera at different angles and distances. Take a look here: http://svn.ioctl.eu/pub/opencv/py-camera_intrinsic/
Then you need to calibrate the extrinsics of the camera, that is, the pose of the camera with respect to your world coordinates. You can place some markers on the wall, define the 3D positions of those markers, and let OpenCV compute the extrinsics from them (cvFindExtrinsicCameraParams2).
In my sample code, I calculate the extrinsics with respect to the checkerboard so I can render a teapot in the correct perspective of the camera. You will have to adjust this to your needs.
I assume you project only onto a flat surface. You have to know this geometry to get the 3D coordinates of a detected blob. You can then find the blobs in your camera image and, knowing the intrinsics, extrinsics, and geometry, cast a ray from the camera through each blob and compute the intersection of that ray with your known geometry. That intersection is the 3D point in world space where the blob is projected.
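A rough sketch of that last step, assuming the wall is the world plane Z = 0 and the extrinsics (rvec, tvec) come from the extrinsic calibration above; the function and variable names are illustrative, not the author's code:

    import cv2
    import numpy as np

    def blob_pixel_to_wall_point(pixel, K, dist, rvec, tvec):
        # Undistort and normalize the pixel into a ray in camera coordinates.
        p = np.array([[pixel]], np.float32)
        xn, yn = cv2.undistortPoints(p, K, dist)[0, 0]
        ray_cam = np.array([xn, yn, 1.0])

        # Camera -> world: rotation from rvec, camera center in world coords.
        R, _ = cv2.Rodrigues(rvec)
        cam_center = -R.T @ tvec.reshape(3)
        ray_world = R.T @ ray_cam

        # Intersect the ray with the assumed wall plane Z = 0.
        t = -cam_center[2] / ray_world[2]
        return cam_center + t * ray_world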
