Does Kinect Infrared View Have an offset with the Kinect Depth View

Does Kinect Infrared View Have an offset with the Kinect Depth View - opencv

I am working on a Kinect project using the infrared view and depth view. In the infrared view, using CVBlob library, I am able to extract some 2D points of interest. I want to find the depth of these 2D points. So I thought that I can use the depth view directly, something like this:
coordinates3D[0] = coordinates2D[0];
coordinates3D[1] = coordinates2D[1];
coordinates3D[2] = (USHORT*)(LockedRect.pBits)
[(int)coordinates2D[1] * Width + (int)coordinates2D[0]] >> 3;
I don't think this is the right formula to get the depth.
I am able to visualize the 2D points of interest in the depth view. If I get a point (x, y) in infrared view, then I draw it as a red point in the depth view at (x, y)
I noticed that the red points are not where I expect them to be (on an object). There is a systematic error in their locations.
I was of the opinion that the depth view and infrared views have one-to-one correspondence unlike the correspondence between the color view and depth view.
Is this indeed true or is there an offset between the IR and depth views? If there is an offset, can I somehow get the right depth value?

Depth and Color streams are not taken from the same point so they do not correspond to each other perfectly. Also they FOV (field of view) is different.
cameras
IR/Depth FOV 58.5° x 45.6°
Color FOV 62.0° x 48.6°
distance between cameras 25mm
my corrections for 640x480 resolution for both streams
if (valid depth)
{
ax=(((x+10-xs2)*241)>>8)+xs2;
ay=(((y+30-ys2)*240)>>8)+ys2;
}
x,y are in coordinates in depth image
ax,ay are out coordinates in color image
xs,ys = 640,480
xs2,ys2 = 320,240
as you can see my kinect has also y-offset which is weird (even bigger then x-offset). My conversion works well on ranges up to 2 m I did not measure it further but it should work even then
do not forget to correct space coordinates from depth and depth image coordinates
pz=0.8+(float(rawdepth-6576)*0.00012115165336374002280501710376283);
px=-sin(58.5*deg*float(x-xs2)/float(xs))*pz;
py=+sin(45.6*deg*float(y-ys2)/float(ys))*pz;
pz=-pz;
where px,py,pz is point coordinate in [m] in space relative to kinect
I use coordinate system for camera with opposite Z direction therefore the negation of sign
PS. I have old model 1414 so newer models have probably different calibration parameters

There is no offset between the "IR View" and "Depth View". Primarily because they are the same thing.
The Kinect has 2 cameras. A RGB color camera and a depth camera, which uses an IR blaster to generate a field light field that is used when processing the data. These give you a color video stream and a depth data stream; there is no "IR view" separate from the depth data.
UPDATE:
They are actually the same thing. What you are referring to as a "depth view" is simply a colorized version of of the "IR view"; the black-and-white image is the "raw" data, while the color image is a processed version of the same data.
In the Kinect for Windows Toolkit, have a look in the KinectWpfViewers project (if you installed the KinectExplorer-WPF example, it should be there). In there is the KinectDepthViewer and the DepthColorizer classes. They will demonstrate how the colorized "depth view" is created.
UPDATE 2:
Per comments below what I've said above is almost entirely junk. I'll likely go edit it out or just delete my answer in full in the near future, until then it shall stand as a testament to my once invalid beliefs on what was coming from where.
Anyways... Have a look at the CoordinateMapper class as another possible solution. The link will take you to the managed code docs (which is what I'm familiar with), I'm looking around the C++ docs to see if I can find the equivalent.
I've used this to map the standard color and depth views. It may also map the IR view just as well (I wouldn't see why not), but I'm not 100% sure of that.

I created a blog showing the IR and Depth views:
http://aparajithsairamkinect.blogspot.com/2013/06/kinect-infrared-and-depth-views_6.html

This code works for many positions of the trackers from the Kinect:
coordinates3D[0] = coordinates2D[0];
coordinates3D[1] = coordinates2D[1];
coordinates3D[2] = (USHORT*)(LockedRect.pBits)
[(int)(coordinates2D[1] + 23) * Width + (int)coordinates2D[0]] >> 3;

Related

Finding the intrinsic parameters of a camera without a chessboard

I need to find the intrinsic parameters of a CCTV camera using a set of historic footage images (That is all I got, no control on the environment, thus no chessboard calibration).
The good news is that I have the access to some ground-truth real-world coordinates, visible in most of the images.
Just wondering if there is any solid approach to come up with the camera intrinsic parameters.
P.S. I already found the homography matrix using cv2.findHomography in Python.
P.S. I have already tested QTcalib on two machines, but it is unable to visualize the images in the first place. Not sure what is wrong with it.
Thanks in advance.

intrinsic parameters contain both fx fy cx cy and skew with additional distortion parameters k1-k5 r1-r2.
Assuming you have no distortion and cx and cy are perfectly in the center. Image origin at top left as a normal understanding of the image. As you say you know some ground truth level 3D points.3D measurements are with respect to camera optical axis. Then this 3D point P can be projected into camera image plane called p. The P p O(the camera optical center) with center lines forms isosceles triangle.
fx / （p_x-cx) = P_z / P_x
fx = （p_x-cx) * P_z / P_x
The same goes for the fy. and usually fx and fy are the same.
This is under the perfect assumption that you don't have distortion on camera. If you start to have distortion, then you need to find enough sample points all over the image to form distortion understanding as shown below. One or 2 points won't give you the whole picture understanding.
There are some cheats in some papers that using sea vanishing lines(see ref, it is a series of works) or perfect 3D building vanishing points to detect the distortion. We start from extrinsic to intrinsic and it can get some good guess after some trial eventually. But it is very much in research and can not apply to general cases.
Ref: Han Wang, Wei Mou, Xiaozheng Mou, Shenghai Yuan, Soner Ulun, Shuai Yang and Bok-Suk Shin, An Automatic Self-Calibration Approach for Wide Baseline Stereo Cameras Using Sea Surface Images, unmanned system

If all you have is a video and a few 3d points, your best bet is probably to matchmove it, that is, do a manually assisted bundle adjustment using a 3D computer graphics environment, e.g. Blender. There are a lot of tutorials online on how to do it (example). To add the 3d points as constraints, you build some shapes representing them in the virtual world (e.g. some small spheres) and place them so that their relative positions match the ground truth you have, then add them to the tracker solution.

Opencv get accurate real world coordinates from 2 known parallel planes

So I have been tinkering a little bit with opencv and I want to be able to use a camera image to get the position of certain objects that are lying flat on plane. These objects are simple shapes such as circles squares etc. They all have the same height of 5cm. To be able to relate real world points to pixels on the camera I painted 4 white squares on the plane with known distances between them.
So the steps I have been taking are:
Initialization:
Calibrate my camera using a checkerboard image and save the calibration data.
Get the input image. call cv::undistort with the calibration data for my camera.
Find the center points of the 4 squares in the image and pass that data and the real world coordinates of the squares to the cv::solvePnP function. Save the rvec and tvec return parameters.
Warp the perspective of the image so you can get a top down view from the image. This is essentially following this tutorial: https://docs.opencv.org/3.4.1/d9/dab/tutorial_homography.html
Use the resulting image to again find the 4 white squares and then calculate a "pixels per meter" translation constant which can relate a certain amount of difference in pixels between points to the real world distance on the plane where the 4 squares are.
Finding object, This is done after initialization:
Get the input image. call cv::undistort with the calibration data for my camera.
Warp the perspective of the image so you can get a top down view from the image. This is the same as step 4 during initialisation.
Find the centerpoint of the object to detect.
Since the centerpoint of the object is on a higher plane then where I calibrated I use the following formula to correct this(d = is the pixel offset from the center of the image. camHeight is the cameraHeight I measured by using a tape measure. h is height of the object):
d = x - (h * (x / camHeight))
So here for an illustration how I got this formule:
But still the coordinates are not matching up...
So I am wondering at all if this is the correct. Specifically I have the following questions:
Is using cv::undistort before using cv::solvenPnP correct? cv::solvePnP also takes the camera calibration data as input so I'm not sure if I have to pass an undistorted image to it or not.
Similar to 1. During Finding object I call cv::undistort -> cv::warpPerspective. Is this undistort necessary here?
Is my calculation to correct for the parallel planes in step 4 correct? I feel like I am missing something but I can't see what. One thing I am wondering is whether I can get the camera height from opencv once solvePnp is done.
I am a newbie to CV so If anything else is totally wrong please also point it out to me.
Thank you for reading this wall of text!

OpenCV cvRemap Cropping Image

So I am very new to OpenCV (2.1), so please keep that in mind.
So I managed to calibrate my cheap web camera that I am using (with a wide angle attachment), using the checkerboard calibration method to produce the intrinsic and distortion coefficients.
I then have no trouble feeding these values back in and producing image maps, which I then apply to a video feed to correct the incoming images.
I run into an issue however. I know when it is warping/correcting the image, it creates several skewed sections, and then formats the image to crop out any black areas. My question then is can I view the complete warped image, including some regions that have black areas? Below is an example of the black regions with skewed sections I was trying to convey if my terminology was off:
An image better conveying the regions I am talking about can be found here! This image was discovered in this post.
Currently: The cvRemap() returns basically the yellow box in the image linked above, but I want to see the whole image as there is relevant data I am looking to get out of it.
What I've tried: Applying a scale conversion to the image map to fit the complete image (including stretched parts) into frame
CvMat *intrinsic = (CvMat*)cvLoad( "Intrinsics.xml" );
CvMat *distortion = (CvMat*)cvLoad( "Distortion.xml" );
cvInitUndistortMap( intrinsic, distortion, mapx, mapy );
cvConvertScale(mapx, mapx, 1.25, -shift_x); // Some sort of scale conversion
cvConvertScale(mapy, mapy, 1.25, -shift_y); // applied to the image map
cvRemap(distorted,undistorted,mapx,mapy);
The cvConvertScale, when I think I have aligned the x/y shift correctly (guess/checking), is somehow distorting the image map making the correction useless. There might be some math involved here I am not correctly following/understanding.
Does anyone have any other suggestions to solve this problem, or what I might be doing wrong? I've also tried trying to write my own code to fix distortion issues, but lets just say OpenCV knows already how to do it well.

From memory, you need to use InitUndistortRectifyMap(cameraMatrix,distCoeffs,R,newCameraMatrix,map1,map2), of which InitUndistortMap is a simplified version.
cvInitUndistortMap( intrinsic, distort, map1, map2 )
is equivalent to:
cvInitUndistortRectifyMap( intrinsic, distort, Identity matrix, intrinsic,
map1, map2 )
The new parameters are R and newCameraMatrix. R species an additional transformation (e.g. rotation) to perform (just set it to the identity matrix).
The parameter of interest to you is newCameraMatrix. In InitUndistortMap this is the same as the original camera matrix, but you can use it to get that scaling effect you're talking about.
You get the new camera matrix with GetOptimalNewCameraMatrix(cameraMat, distCoeffs, imageSize, alpha,...). You basically feed in intrinsic, distort, your original image size, and a parameter alpha (along with containers to hold the result matrix, see documentation). The parameter alpha will achieve what you want.
I quote from the documentation:
The function computes the optimal new camera matrix based on the free
scaling parameter. By varying this parameter the user may retrieve
only sensible pixels alpha=0, keep all the original image pixels if
there is valuable information in the corners alpha=1, or get something
in between. When alpha>0, the undistortion result will likely have
some black pixels corresponding to “virtual” pixels outside of the
captured distorted image. The original camera matrix, distortion
coefficients, the computed new camera matrix and the newImageSize
should be passed to InitUndistortRectifyMap to produce the maps for
Remap.
So for the extreme example with all the black bits showing you want alpha=1.
In summary:
call cvGetOptimalNewCameraMatrix with alpha=1 to obtain newCameraMatrix.
use cvInitUndistortRectifymap with R being identity matrix and newCameraMatrix set to the one you just calculated
feed the new maps into cvRemap.

Distance to the object using stereo camera

Is there a way to calculate the distance to specific object using stereo camera?
Is there an equation or something to get distance using disparity or angle?

NOTE: Everything described here can be found in the Learning OpenCV book in the chapters on camera calibration and stereo vision. You should read these chapters to get a better understanding of the steps below.
One approach that do not require you to measure all the camera intrinsics and extrinsics yourself is to use openCVs calibration functions. Camera intrinsics (lens distortion/skew etc) can be calculated with cv::calibrateCamera, while the extrinsics (relation between left and right camera) can be calculated with cv::stereoCalibrate. These functions take a number of points in pixel coordinates and tries to map them to real world object coordinates. CV has a neat way to get such points, print out a black-and-white chessboard and use the cv::findChessboardCorners/cv::cornerSubPix functions to extract them. Around 10-15 image pairs of chessboards should do.
The matrices calculated by the calibration functions can be saved to disc so you don't have to repeat this process every time you start your application. You get some neat matrices here that allow you to create a rectification map (cv::stereoRectify/cv::initUndistortRectifyMap) that can later be applied to your images using cv::remap. You also get a neat matrix called Q, which is a disparity-to-depth matrix.
The reason to rectify your images is that once the process is complete for a pair of images (assuming your calibration is correct), every pixel/object in one image can be found on the same row in the other image.
There are a few ways you can go from here, depending on what kind of features you are looking for in the image. One way is to use CVs stereo correspondence functions, such as Stereo Block Matching or Semi Global Block Matching. This will give you a disparity map for the entire image which can be transformed to 3D points using the Q matrix (cv::reprojectImageTo3D).
The downfall of this is that unless there is much texture information in the image, CV isn't really very good at building a dense disparity map (you will get gaps in it where it couldn't find the correct disparity for a given pixel), so another approach is to find the points you want to match yourself. Say you find the feature/object in x=40,y=110 in the left image and x=22 in the right image (since the images are rectified, they should have the same y-value). The disparity is calculated as d = 40 - 22 = 18.
Construct a cv::Point3f(x,y,d), in our case (40,110,18). Find other interesting points the same way, then send all of the points to cv::perspectiveTransform (with the Q matrix as the transformation matrix, essentially this function is cv::reprojectImageTo3D but for sparse disparity maps) and the output will be points in an XYZ-coordinate system with the left camera at the center.

I am still working on it, so I will not post entire source code yet. But I will give you a conceptual solution.
You will need the following data as input (for both cameras):
camera position
camera point of interest (point at which camera is looking)
camera resolution (horizontal and vertical)
camera field of view angles (horizontal and vertical)
You can measure the last one yourself, by placing the camera on a piece of paper and drawing two lines and measuring an angle between these lines.
Cameras do not have to be aligned in any way, you only need to be able to see your object in both cameras.
Now calculate a vector from each camera to your object. You have (X,Y) pixel coordinates of the object from each camera, and you need to calculate a vector (X,Y,Z). Note that in the simple case, where the object is seen right in the middle of the camera, the solution would simply be (camera.PointOfInterest - camera.Position).
Once you have both vectors pointing at your target, lines defined by these vectors should cross in one point in ideal world. In real world they would not because of small measurement errors and limited resolution of cameras. So use the link below to calculate the distance vector between two lines.
Distance between two lines
In that link: P0 is your first cam position, Q0 is your second cam position and u and v are vectors starting at camera position and pointing at your target.
You are not interested in the actual distance, they want to calculate. You need the vector Wc - we can assume that the object is in the middle of Wc. Once you have the position of your object in 3D space you also get whatever distance you like.
I will post the entire source code soon.

I have the source code for detecting human face and returns not only depth but also real world coordinates with left camera (or right camera, I couldn't remember) being origin. It is adapted from source code from "Learning OpenCV" and refer to some websites to get it working. The result is generally quite accurate.

Given normal map in world space what is a suitable algorithm to find edges?

If I have the vertex normals of a normal scene showing up as colours in a texture in world space is there a way to calculate edges efficiently or is it mathematically impossible? I know it's possible to calculate edges if you have the normals in view space but I'm not sure if it is possible to do so if you have the normals in world space (I've been trying to figure out a way for the past hour..)
I'm using DirectX with HLSL.

if ( normalA dot normalB > cos( maxAngleDiff )
then you have an edge. It won't be perfect but it will definitely find edges that other methods won't.
Or am i misunderstanding the problem?
Edit: how about, simply, high pass filtering the image?

I assume you are trying to make cartoon style edges for a cell shader?
If so, simply make a dot product of the world space normal with the world space pixel position minus camera position. As long as your operands are all in the same space you should be ok.
float edgy = dot(world_space_normal, pixel_world_pos - camera_world_pos);
If edgy is near 0, it's an edge.
If you want a screen space sized edge you will need to render additional object id information on another surface and post process the differences to the color surface.

It will depend on how many colors your image contain, and how they merge: sharp edges, dithered, blended,...
Since you say you have the vertex normals I am assuming that you can access the color-information on a single plane.
I have used two techniques with varying success:
I searched the image for local areas of the same color (RGB) and then used the highest of R, G or B to find the 'edge' - that is where the selected R,G or B is no longer the highest value;
the second method I used is to reduce the image to 16 colors internally, and it is easy to find the outlines in this case.
To construct vectors would then depend on how fine you want the granularity of your 'wireframe'-image to be.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart