How to get back the coordinate points corresponding to the intensity values obtained from a Faster R-CNN object detection process?

As a result of the Faster R-CNN method of object detection, I have obtained a set of boxes of intensity values (each bounding box can be thought of as a 3D matrix with a depth of 3 for RGB intensity, a width and a height, which can then be converted into a 2D matrix by taking the grayscale) corresponding to the region containing the object. What I want to do is obtain, for each cell of intensity inside the bounding box, the corresponding coordinate point in the original image. Any ideas how to do so?

From what I understand, you got an R-CNN model that outputs cropped pieces of the input image and you now want to trace those output crops back to their coordinates in the original image.
What you can do is simply use a patch-similarity measure to find the original position.
Since the output crop should be pixel-identical to the corresponding region of the original image, a plain pixel-wise distance is enough:
Find the place in the image with the smallest distance (should be zero) and from that you can find your desired coordinates.
In Python:
import numpy as np

# brute-force template match: slide the crop over the original image and keep
# the position with the smallest pixel-wise difference (should be zero)
d_min = np.inf
coord = None
crop_h, crop_w = crop.shape[0], crop.shape[1]
for x in range(org_image.shape[0] - crop_h + 1):
    for y in range(org_image.shape[1] - crop_w + 1):
        d = np.sum(np.abs(org_image[x:x+crop_h, y:y+crop_w].astype(np.int32) - crop))
        if d < d_min:
            d_min = d
            coord = [x, y]   # top-left corner of the matching region
However, your model should already have that information available (after all, it crops the output based on some coordinates), so it would help if you added some details about your implementation.
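For example, if the detector were a torchvision Faster R-CNN (a hypothetical stand-in for the asker's implementation, not necessarily what they used), the predictions already contain the box corners in original-image pixel coordinates, so no template matching is needed; a minimal sketch:

import torch
import torchvision

# hypothetical example: a pretrained torchvision detector, not the asker's model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 480, 640)            # stand-in for a real RGB image tensor in [0, 1]
with torch.no_grad():
    output = model([image])[0]             # one dict per input image

# output["boxes"] holds [x1, y1, x2, y2] in original-image pixel coordinates
for box, score in zip(output["boxes"], output["scores"]):
    if score > 0.5:
        x1, y1, x2, y2 = box.tolist()
        print(x1, y1, x2, y2)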

Related

How to match RGB image pixels with corresponding point cloud points

I have a color image and the corresponding point cloud captured by an OAK-D camera (see the image below), and I want to relate the pixels in the color image to the corresponding values in the point cloud.
How can I get this information? For instance, if I have a pixel at (200, 250) in the color image, how do I find the corresponding point in the point cloud?
Any help would be appreciated.
It sounds like you want to project a 2D image to a 3D point cloud using the computed disparity map. To do this you will also need to know your camera intrinsics. Since you are using the OAK-D, you should be able to get everything you need with the following piece of code.
import depthai as dai

# pipeline, monoLeft and monoRight come from the usual DepthAI pipeline setup (not shown)
with dai.Device(pipeline) as device:
    calibData = device.readCalibration()
    # get right intrinsic matrix
    w, h = monoRight.getResolutionSize()
    K_right = calibData.getCameraIntrinsics(dai.CameraBoardSocket.RIGHT, dai.Size2f(w, h))
    # get left intrinsic matrix
    w, h = monoLeft.getResolutionSize()
    K_left = calibData.getCameraIntrinsics(dai.CameraBoardSocket.LEFT, dai.Size2f(w, h))
    # rectification rotations and stereo baseline
    R_left = calibData.getStereoLeftRectificationRotation()
    R_right = calibData.getStereoRightRectificationRotation()
    x_baseline = calibData.getBaselineDistance()
Once you have all your camera parameters, you should be able to use OpenCV to approach this.
First, you will need to construct the Q matrix (the 4x4 disparity-to-depth mapping matrix).
You will need to provide
The left and right intrinsic calibration matrices
The Translation vector from the coordinate system of the first camera to the second camera
The Rotation matrix from the coordinate system of the first camera to the second camera
Here's a coded example:
import numpy as np
import cv2

# assumption: no lens distortion is modelled, so zero distortion coefficients are used
dist_coeffs = np.zeros(5)
# assumption: the cameras are offset purely along the x-axis by the baseline distance
T = np.array([x_baseline, 0.0, 0.0])

R1, R2, P1, P2, Q, roi_left, roi_right = cv2.stereoRectify(
    cameraMatrix1=np.array(K_left),   # left intrinsic matrix
    cameraMatrix2=np.array(K_right),  # right intrinsic matrix
    distCoeffs1=dist_coeffs,
    distCoeffs2=dist_coeffs,
    imageSize=imageSize,              # pass in the image size (width, height)
    R=np.array(R_left),               # rotation matrix from camera 1 to camera 2
    T=T)                              # translation vector from camera 1 to camera 2
Next, you will need to reproject the image to 3D using the known disparity map and the Q matrix. OpenCV makes this easy:
xyz = cv2.reprojectImageTo3D(disparity, Q)
This will give you an array of 3D points. The array has shape (rows, columns, 3), where the 3 corresponds to the (x, y, z) coordinate of the point cloud. Now you can use a pixel location to index into xyz and find its corresponding (x, y, z) point.
pix_row = 200
pix_col = 250
point_cloud_coordinate = xyz[pix_row, pix_col, :]
See the docs for more details
cv2.stereoRectify()
cv2.reprojectImageTo3D()

OpenCV: get accurate real-world coordinates from 2 known parallel planes

So I have been tinkering a little bit with OpenCV and I want to be able to use a camera image to get the position of certain objects that are lying flat on a plane. These objects are simple shapes such as circles, squares, etc. They all have the same height of 5 cm. To be able to relate real-world points to pixels on the camera, I painted 4 white squares on the plane with known distances between them.
So the steps I have been taking are:
Initialization:
Calibrate my camera using a checkerboard image and save the calibration data.
Get the input image and call cv::undistort with the calibration data for my camera.
Find the center points of the 4 squares in the image and pass that data along with the real-world coordinates of the squares to the cv::solvePnP function. Save the rvec and tvec return parameters (a sketch of this call follows the list).
Warp the perspective of the image so you get a top-down view. This essentially follows this tutorial: https://docs.opencv.org/3.4.1/d9/dab/tutorial_homography.html
Use the resulting image to again find the 4 white squares and then calculate a "pixels per meter" conversion constant that relates a pixel distance between points to the real-world distance on the plane where the 4 squares lie.
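For reference, here is a minimal sketch of the cv::solvePnP call from step 3 using OpenCV's Python bindings (all values below are illustrative placeholders, not the asker's data):

import cv2
import numpy as np

# placeholder intrinsics from the checkerboard calibration (step 1)
camera_matrix = np.array([[800.0,   0.0, 320.0],
                          [  0.0, 800.0, 240.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(5)              # placeholder distortion coefficients

# real-world coordinates of the 4 painted squares on the plane (Z = 0), e.g. in metres
object_points = np.array([[0.0, 0.0, 0.0],
                          [0.5, 0.0, 0.0],
                          [0.5, 0.5, 0.0],
                          [0.0, 0.5, 0.0]], dtype=np.float32)

# the corresponding square centers found in the image (pixel coordinates)
image_points = np.array([[310.0, 180.0],
                         [640.0, 182.0],
                         [655.0, 470.0],
                         [305.0, 468.0]], dtype=np.float32)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)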
Finding the object (this is done after initialization):
Get the input image and call cv::undistort with the calibration data for my camera.
Warp the perspective of the image so you get a top-down view. This is the same as step 4 during initialization.
Find the center point of the object to detect.
Since the center point of the object is on a higher plane than the one I calibrated on, I use the following formula to correct for this (x is the measured pixel offset from the center of the image, d is the corrected offset, camHeight is the camera height I measured with a tape measure, and h is the height of the object):
d = x - (h * (x / camHeight))
Here is an illustration of how I got this formula:
But still the coordinates are not matching up...
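For what it's worth, here is the correction from step 4 above as a tiny Python sketch (variable names and numbers are mine, purely to illustrate the formula):

# d = x - (h * (x / camHeight)): shrink the measured offset by the ratio h / camHeight
def correct_offset(x, cam_height, h):
    return x - h * (x / cam_height)

# e.g. an offset of 200 px with the camera 150 cm above the plane and a 5 cm tall object
d = correct_offset(200.0, 150.0, 5.0)   # 200 - 5 * 200 / 150 = 193.33 px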
So I am wondering if this approach is correct at all. Specifically, I have the following questions:
Is using cv::undistort before cv::solvePnP correct? cv::solvePnP also takes the camera calibration data as input, so I'm not sure whether I have to pass an undistorted image to it or not.
Similar to 1: during "Finding the object" I call cv::undistort -> cv::warpPerspective. Is the undistort necessary here?
Is my calculation to correct for the parallel planes in step 4 correct? I feel like I am missing something but I can't see what. One thing I am wondering is whether I can get the camera height from OpenCV once solvePnP is done.
I am a newbie to CV, so if anything else is totally wrong, please also point it out to me.
Thank you for reading this wall of text!

Take pixels from input image (subsampling of image pixels)

How can I take pixels from an input image using Gaussian sub-sampling (like a shotgun pattern)?
I want the locations of the sampled pixels to be concentrated in the middle of the image, like a shotgun pattern, because I do not want to extract features for every pixel in the image. The output should be the coordinates of the sampled pixels. I will be thankful if you guide me.
Is there any function or code that I can get help from?
Your help is appreciated.
If you are looking for a method to define a Region of Interest (ROI) of an image in MATLAB in order to perform some operation on a restricted area, remember that x coordinates correspond to columns and y to rows (MATLAB reads images as matrices).
To cut an image from x1 to x2 and from y1 to y2, try something like
ROI = image(y1:y2, x1:x2);
How to determine these 4 values is up to you, since there is no specific example.
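Separately, if the goal is the Gaussian "shotgun" sampling described in the question, a minimal NumPy sketch (my own illustration, with made-up parameters) could look like this:

import numpy as np

def gaussian_subsample(height, width, n_samples=500, sigma_frac=0.15, seed=0):
    # draw pixel coordinates from a 2D Gaussian centred on the image centre
    rng = np.random.default_rng(seed)
    rows = rng.normal(height / 2.0, sigma_frac * height, n_samples)
    cols = rng.normal(width / 2.0, sigma_frac * width, n_samples)
    # round and keep only samples that fall inside the image
    rows = np.clip(np.round(rows), 0, height - 1).astype(int)
    cols = np.clip(np.round(cols), 0, width - 1).astype(int)
    return np.stack([rows, cols], axis=1)   # (row, col) coordinates of the sampled pixels

coords = gaussian_subsample(480, 640)       # e.g. for a 480x640 image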

Understanding Disparity Map in OpenCV

Can somebody explain to me what exactly a disparity map returns? There is not much given in the documentation, and I have a few questions related to it.
Does it return difference values of pixels with respect to both images?
How do I use the disparity values in the formula for depth estimation, i.e.
Depth = focalLength * Baseline / Disparity
I have read somewhere that the disparity map gives a function of depth f(z).
Please explain what that means. If depth is purely an absolute value, how can it be given as a function, or is it a function with respect to the pixels?
The difference d = pl − pr of two corresponding image points is called disparity.
Here, pl is the position of the point in the left stereo image and pr is the position of the point in the right stereo image.
For parallel optical axes, the disparity is d = xl − xr, so searching for depth information is equivalent to searching for the disparity, i.e. for the corresponding pixel, and the distance is inversely proportional to the disparity.
The disparity values are visualized in a so-called disparity map: each disparity value for each pixel of the reference image (here: the left one) is coded as a grayscale value. Pixels that do not have any correspondence are also assigned a grayscale value (here: black). The so-called ground-truth map is a disparity map that contains the ideal solution of the correspondence problem.
Relation between Disparity and Depth information:
The following image represents two cameras (left and right) and shows how to find the depth of a point p(x_w, z_w).
The resulting depth is given by Z = focalLength * Baseline / Disparity, so it can be seen that the depth is inversely proportional to the disparity.
UPDATE:
To calculate the disparity, you need two images: (1) the left image and (2) the right image. Let's say there is a pixel at position (60, 30) in the left image and the same pixel is at position (40, 30) in the right image; then your disparity will be 60 − 40 = 20. So the disparity map gives you the difference between the positions of corresponding pixels in the left and right images. If a pixel is present in the left image but absent in the right image, then the value at that position in the disparity map will be zero. Once you have the disparity value for each pixel of the left image, you can easily calculate the depth using the formula given at the end of my answer.
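As a small numerical illustration of that last step (the focal length must be in pixels and the baseline in whatever unit you want the depth in; the numbers here are made up):

import numpy as np

focal_length_px = 800.0                  # assumed focal length in pixels
baseline_m = 0.075                       # assumed baseline in metres

disparity = np.array([[20.0, 10.0],
                      [ 5.0,  0.0]])     # toy disparity map

# Depth = focalLength * Baseline / Disparity; zero disparity means no valid depth here
with np.errstate(divide="ignore"):
    depth_m = np.where(disparity > 0, focal_length_px * baseline_m / disparity, np.inf)

print(depth_m)   # the pixel with disparity 20 is at 800 * 0.075 / 20 = 3.0 metres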

Display 3D matrix with its thickness information

I have a problem plotting a 3D matrix. Assume that I have one image of size 384x384. In a loop, I create about 10 images of the same size, store them in a 3D matrix, and plot the 3D matrix inside the loop. The slice thickness is 0.69 (the distance between two slices), so I want to show the thickness on the z coordinate. But it does not work well: the slice-distance visualization is not correct, and everything appears blue. I want to fix the visualization and remove the blue color. Could you help me fix it in MATLAB code? Thank you so much.
for slice = 1 : 10
    Img = getImage(); % get one 2D image
    if slice == 1
        image3D = Img;
    else
        image3D = cat(3, image3D, Img);
    end
    % Plot image
    figure(1)
    [x, y, z] = meshgrid(1:384, 1:384, 1:slice);
    scatter3(x(:), y(:), z(:).*0.69, 90, image3D(:), 'filled')
end
The blue color can be fixed by changing the colormap. Right now you are setting the color of each plotted point to the corresponding value in image3D with the default colormap of jet, which shows lower values as blue. Try adding colormap gray; after you plot, or whichever colormap you desire.
I'm not sure what you mean by "the slice distance visualization is not correct". If each slice has a thickness of 0.69, then the image values are an integral of all the values within each voxel of thickness 0.69. So what you are displaying is a point at the centroid of each voxel that represents the integral of the values within that voxel. Your z scale seems correct, as the voxel centroids will be spaced 0.69 apart, although it won't start at zero.
I think a more accurate z-scale would be to use ((0:slice-1) + 0.5) * 0.69 as your z vector. This would put the edge of the lowest slice at zero and center each point directly on the centroid of its voxel.
I still don't think this will give you the visualization you are looking for. 3D data is most easily viewed by looking at slices of it. You can check out MATLAB's slice function, which lets you make nice displays like this one:
slice view http://people.rit.edu/pnveme/pigf/ThreeDGraphics/thrd_threev_slice_1.gif
