Orthographic Projection in OpenCV for Projecting 3D Points - opencv

I have the following inputs:
A color image of size 480 x 848 pixels
An aligned depth image of size 480 x 848 pixels
Camera intrinsic parameters
A transformation from the camera to my frame located at the top
Consider the camera looking at an object from an angle, and assume that we have defined a frame at the top of this object. I want to transform the color and depth images from the camera to this frame, as if the camera were mounted at this frame.
The 3D points (a point cloud with x, y, and z values but no color) can be obtained from the depth image and the camera parameters. I want to transform these 3D points (with color) into the top frame. Because these 3D points are actual points in 3D space, I believe this is just an orthographic projection.
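Roughly, the back-projection and rigid transform I have in mind look like this (a sketch that ignores color; depth_scale is an assumption about how the 16-bit depth image encodes metres):
import numpy as np

def depth_to_top_frame(depth, K, Rt, depth_scale=0.001):
    # Back-project the depth image with the intrinsics, then apply the
    # 4x4 camera-to-top-frame transform Rt.
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    v, u = np.indices(depth.shape)              # pixel rows (v) and columns (u)
    z = depth.astype(np.float32) * depth_scale  # metric depth per pixel
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack((x, y, z, np.ones_like(z)), axis=-1).reshape(-1, 4)
    return (Rt @ pts.T).T[:, :3]                # Nx3 points in the top frame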
Sample Code
In [1]: import numpy as np
In [2]: import cv2 as cv
In [3]: cv.__version__
Out[3]: '4.2.0'
In [4]: image = cv.imread("image.png")
In [5]: image.shape, image.dtype
Out[5]: ((480, 848, 3), dtype('uint8'))
In [6]: depth = cv.imread("depth.png", cv.CV_16UC1)
In [7]: depth.shape, depth.dtype
Out[7]: ((480, 848), dtype('uint16'))
In [8]: mask = np.ones_like(depth) * 255
In [9]: mask = mask.astype(np.uint8)
In [10]: mask.shape, mask.dtype
Out[10]: ((480, 848), dtype('uint8'))
In [11]: # define transformation from camera to my frame located at top
In [12]: Rt = np.array([[ 1. , -0. , 0. , 0. ],
...: [ 0. , 0.89867918, 0.43860659, -0.191 ],
...: [-0. , -0.43860659, 0.89867918, 0.066 ],
...: [ 0. , 0. , 0. , 1. ]])
In [13]: # camera focal lengths and principal point
In [14]: cameraMatrix = np.array([[ 428.12915039, 0, 418.72729492 ],
...: [ 0, 427.6109314, 238.20678711 ],
...: [ 0, 0, 1 ]])
In [15]: # camera distortion parameters (k1, k2, t1, t2, k3)
In [16]: distCoeff = np.array([-0.05380916, 0.0613398, -0.00064336, 0.00040269, -0.01984365])
In [17]: warpedImage, warpedDepth, warpedMask = cv.rgbd.warpFrame(image, depth, mask, Rt, cameraMatrix, distCoeff)
In [18]: cv.imwrite("warpedImage.png", warpedImage)
Out[18]: True
In [19]: cv.imwrite("warpedDepth.png", warpedDepth)
Out[19]: True
Frame Visualization
The camera is located at camera_color_optical_frame, looking at the object at an angle.
The top frame, named my_frame, is situated on top of the object.
The object sits at the workspace frame.
Input Images
Color Image
Depth Image
Output Images
Warped Image
Warped Depth
Expected Output
The output image should be similar to a picture taken by the camera from the top position. A sample image is shown below. We know that we cannot get precisely this image; nevertheless, it is shown just for reference.
Notice carefully that this image does not contain the red color attached to the object's walls.

Assuming you are fine with blank space due to parts not seen in the image, with using another package, and with some extra processing time, you can use Open3D to transform (basically, rotate) the RGBD image by the required amount.
First, create an Open3D RGBD image:
import open3d as o3d
color_raw = o3d.io.read_image("image.png")
depth_raw = o3d.io.read_image("depth.png")
rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(
    color_raw, depth_raw)
Then convert it to a point cloud (I tried transforming the RGBD image directly, but that didn't work, hence the conversion):
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(
    rgbd_image,
    o3d.camera.PinholeCameraIntrinsic(
        o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault))
# Flip it, otherwise the point cloud will be upside down
pcd.transform([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]])
o3d.visualization.draw_geometries([pcd], zoom=0.5)
Replace the default camera parameters with your own. Then apply the transformation between the camera frame and your top frame to the point cloud.
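A sketch using the intrinsics and Rt from the question (untested against your data):
# Build the intrinsics from the question's cameraMatrix instead of the
# PrimeSense default; arguments are (width, height, fx, fy, cx, cy).
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    848, 480, 428.12915039, 427.6109314, 418.72729492, 238.20678711)
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, intrinsic)
pcd.transform(Rt)  # Rt: the 4x4 camera-to-top-frame transform from the question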
Next, if you have to convert the pcd back to an RGBD image, follow this example.
Secondly, here is a similar unanswered question where the user ends up using a perspective transform for 2-D images.
PS. new here, suggestions welcome

Related

Pixel per mm calculation of an image using camera calibration matrix and object distance is not same as pixel dimension of object in MS Paint

I want to calculate the number of pixels per grid square (i.e. pixels per 11 mm) of the checkerboard. I am doing this to validate that the mm/pixel value I obtain using the calibration matrix and the formula (below) matches what I see when I open the image in MS Paint.
The checkerboard image (1920x1080 resolution) shows a board of 11 x 7 squares, each square 11 x 11 mm, at a distance of 500 mm (picture below).
I compute the calibration matrix using this code:
import cv2
import numpy as np
import pathlib
#from utils import *
import glob
from argparse import ArgumentParser

topview_image_path = 'checkerboard_top\*.png'
camera_orientation = 'topview'

if camera_orientation == 'topview':
    image_path = topview_image_path

def calibrate_chessboard(folder):
    # Defining the dimensions of checkerboard
    CHECKERBOARD = (6, 9)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    # Creating vector to store vectors of 3D points for each checkerboard image
    objpoints = []
    # Creating vector to store vectors of 2D points for each checkerboard image
    imgpoints = []
    # Defining the world coordinates for 3D points
    objp = np.zeros((1, CHECKERBOARD[0] * CHECKERBOARD[1], 3), np.float32)
    objp[0, :, :2] = np.mgrid[0:CHECKERBOARD[0], 0:CHECKERBOARD[1]].T.reshape(-1, 2)
    prev_img_shape = None
    # Extracting path of individual image stored in a given directory
    print("image path:", image_path)
    images = glob.glob(image_path)
    #images = glob.glob(f'{folder}/*.png')
    # if len(images) == 0:
    #     images = glob.glob(f'{folder}/*.jpg')
    # print(images)
    for fname in images:
        img = cv2.imread(fname)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Find the chess board corners
        # If desired number of corners are found in the image then ret = true
        ret, corners = cv2.findChessboardCorners(
            gray, CHECKERBOARD,
            cv2.CALIB_CB_ADAPTIVE_THRESH + cv2.CALIB_CB_FAST_CHECK + cv2.CALIB_CB_NORMALIZE_IMAGE)
        """
        If desired number of corner are detected,
        we refine the pixel coordinates and display
        them on the images of checker board
        """
        if ret == True:
            objpoints.append(objp)
            # refining pixel coordinates for given 2d points.
            corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
            imgpoints.append(corners2)
            # Draw and display the corners
            img = cv2.drawChessboardCorners(img, CHECKERBOARD, corners2, ret)
        cv2.imshow('img', img)
        cv2.waitKey(0)
    cv2.destroyAllWindows()
    h, w = img.shape[:2]
    """
    Performing camera calibration by
    passing the value of known 3D points (objpoints)
    and corresponding pixel coordinates of the
    detected corners (imgpoints)
    """
    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
    return [ret, mtx, dist, rvecs, tvecs]

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--type', dest='type', type=str, default='topview',
                        help='is the image topview or sideview?')
    parser.add_argument('--folder', dest='folder', type=str, default='cal_images/checkerboard_topview',
                        help='is the image topview or sideview?')
    args = parser.parse_args()
    WIDTH = 6
    HEIGHT = 9
    # Calibrate
    ret, mtx, dist, rvecs, tvecs = calibrate_chessboard(args.folder)
    print(mtx)
    print(dist)
    mtx_list = ["calibration matrix:\n", str(mtx),
                "\ndistortion matrix:", str(dist)]
    txt_file = "matrix_" + camera_orientation + ".txt"
    with open(txt_file, mode='wt', encoding='utf-8') as myfile:
        myfile.write('\n'.join(mtx_list))
and get the calibration matrix as:
M = [[2.86276094e+03 0.00000000e+00 8.23315889e+02]
     [0.00000000e+00 2.86846709e+03 5.80987675e+02]
     [0.00000000e+00 0.00000000e+00 1.00000000e+00]]
This gives me the focal length (f) in pixel units (M[0][0]), i.e. about 2862.
I then calculate the size in pixels (X_sizepx) of an 11 mm checkerboard square (X_sizemm) at distance Z (in my case 500 mm) using the formula:
X_sizepx = (f / Z) * X_sizemm
Substituting the values f = 2862, Z = 500, X_sizemm = 11, I get 62.94. So, as per the OpenCV calibration and the formula, 11 mm should be ~63 pixels.
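For reference, the same substitution as a quick check (values taken from above):
f = 2862.0       # focal length in pixels, M[0][0]
Z = 500.0        # object distance in mm
X_sizemm = 11.0  # checkerboard square size in mm

X_sizepx = (f / Z) * X_sizemm
print(X_sizepx)  # ≈ 63 pixels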
I then open the checkerboard image in MS Paint to check the pixel dimension of a square, and it shows 41 pixels (image below).
This is a big difference if I were to use the camera calibration matrix and the formula. Is there something I am doing wrong?
Note: even if Z is slightly off, it can never be more than 530 mm.

Detect rectangular table from ASUS Xtion Live Pro IR image

I have an IR image with a resolution of 240 x 320 and datatype float32, as you can see here:
The raw npy image is here.
My objective is to detect the table (brown contour) in order to crop this region as an ROI.
What I have tried so far is to do:
Histogram equalization to increase contrast,
Gaussian Blurring to reduce the noise, and
contour detection to find the rectangular table in the middle of the image.
Note that the table in my case is on wheels and hence might move slightly, so I want to detect it dynamically rather than use a fixed position inside the image.
The code I have used is:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
import cv2
import random as rng
path = ""
# Read the numpy array
ir_raw = np.load(path+"ir.npy") # (240, 320) float32
ir = np.uint8((ir_raw/ir_raw.max()) * 255)
# Histogram equalization (Contrast Adjustment)
heq_ir = cv2.equalizeHist(ir)
# Gaussian smoothing (Noise Removal)
g_blur_ir = cv2.GaussianBlur(heq_ir, (5,5), 0)
# Otsu Thresholding
_, thresh_ir = cv2.threshold(g_blur_ir, 120, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Find contours
contours, hierarchy = cv2.findContours(thresh_ir, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Draw contours
rng.seed(12345)
drawing = np.zeros((thresh_ir.shape[0], thresh_ir.shape[1], 3), dtype=np.uint8)
for i in range(len(contours)):
    color = (rng.randint(0, 256), rng.randint(0, 256), rng.randint(0, 256))
    cv2.drawContours(drawing, contours, i, color, 2, cv2.LINE_8, hierarchy, 0)
plt.subplot(121)
plt.imshow(heq_ir)
plt.title("IR")
plt.subplot(122)
plt.imshow(drawing)
plt.title("IR contours")
plt.show()
Can you please tell me how I can detect the rectangular table in order to crop it as an ROI? Thanks in advance.
Assuming the following: the table always has the same area (well, duh), and in the case below it is always the largest object detected (though the code should be easy to amend to filter out objects of different areas).
I've filtered out all contours that have a different area to the table contour (by a process of elimination until I found the table, then set the area threshold accordingly) and then fitted a rectangle to that contour. It is possible to fit skewed rectangles in OpenCV as well, but I haven't in this case. (EDIT: code for skewed rectangles added.)
Your ROI boundary is in boundRect
for i in range(len(contours)):
    if cv2.contourArea(contours[i]) > 10000:
        color = (rng.randint(0, 256), rng.randint(0, 256), rng.randint(0, 256))
        cv2.drawContours(drawing, contours, i, color, 2, cv2.LINE_8, hierarchy, 0)

        # For unrotated bounding rectangle
        boundRect = cv2.boundingRect(contours[i])
        cv2.rectangle(drawing, (int(boundRect[0]), int(boundRect[1])),
                      (int(boundRect[0] + boundRect[2]), int(boundRect[1] + boundRect[3])),
                      (255, 255, 255), 2)

        # For minimal bounding rectangle that will rotate with table rotation
        rect = cv2.minAreaRect(contours[i])
        box = cv2.boxPoints(rect)
        box = np.int0(box)
        cv2.drawContours(drawing, [box], 0, (255, 0, 255), 2)

        print(cv2.contourArea(contours[i]))
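For completeness, once boundRect is known, the crop itself is a plain NumPy slice (a sketch, assuming the equalized IR image heq_ir is the one you want to crop):
# boundRect is (x, y, w, h) in pixel coordinates
x, y, w, h = boundRect
roi = heq_ir[y:y + h, x:x + w]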
Output:

Why cannot draw lines after Canny Edge Detection?

Actually, I am a noob at computer vision. Sorry in advance.
I want to detect the edges of a tram lane. Mostly the code works well, but sometimes it cannot even draw a line, and I don't know why.
The cropped_image function (region_of_interest in the code below) just crops the polygonal area of the current frame.
The display_lines function draws lines whose absolute angle is between 30 and 90 degrees; it uses cv2.line to draw them.
Here is the code:
_, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)  # convert image to gray to be one layer
blur = cv2.GaussianBlur(gray, (1, 1), 0)  # to reduce noise in gray scale image
canny = cv2.Canny(blur, 150, 200, apertureSize=3)
cropped_image = region_of_interest(canny)  # simply, it crops bottom of image
lines = cv2.HoughLinesP(cropped_image, 1, np.pi / 180, 100, np.array([]),
                        minLineLength=5, maxLineGap=5)
hough_bundler = HoughBundler()
lines_merged = hough_bundler.process_lines(lines, cropped_image)
line_image = display_lines(frame, lines_merged)
combo_image = cv2.addWeighted(frame, 0.8, line_image, 1, 1)
cv2.imshow('test', combo_image)
To see it: HoughBundler
Expected: expected img
Canny: canny img of wrong result
Result: wrong result
First of all, I'd start by fixing the cv2.GaussianBlur() line. You've used a 1x1 kernel, which doesn't do anything; you need at least a 3x3 kernel. Look into how convolutions are applied if you want to know why a 1x1 filter has no effect.
Secondly, I would play with the Canny aperture size to suit my needs. Also, after edge detection you can use cv2.erode() with a 3x3 or 5x5 kernel so that you don't get broken lines in the image.
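A minimal sketch of those suggestions (the file name and kernel sizes are placeholders to experiment with; the dilation at the end is my suggested alternative to erode for joining gaps):
import cv2
import numpy as np

frame = cv2.imread("frame.png")                 # placeholder for a frame from your video
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# A 1x1 Gaussian kernel is a no-op; use at least 3x3 (5x5 shown here).
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# Tune the Canny thresholds and aperture size for your footage.
canny = cv2.Canny(blur, 150, 200, apertureSize=3)

# A small morphological step can close gaps in the edge map before HoughLinesP;
# dilation (or a closing) is one option to try here.
kernel = np.ones((3, 3), np.uint8)
edges = cv2.dilate(canny, kernel, iterations=1)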

Inverse Perspective Transform?

I am trying to find the bird's eye image from a given image. I also have the rotations and translations (and the intrinsic matrix) required to convert it into the bird's eye plane. My aim is to find an inverse homography matrix (3x3).
rotation_x = np.asarray([[1, 0, 0, 0],
                         [0, np.cos(R_x), -np.sin(R_x), 0],
                         [0, np.sin(R_x), np.cos(R_x), 0],
                         [0, 0, 0, 1]], np.float32)
translation = np.asarray([[1, 0, 0, 0],
                          [0, 1, 0, 0],
                          [0, 0, 1, -t_y / (dp_y * np.sin(R_x))],
                          [0, 0, 0, 1]], np.float32)
intrinsic = np.asarray([[s_x * f / dp_x, 0, 0, 0],
                        [0, 1 * f / dp_y, 0, 0],
                        [0, 0, 1, 0]], np.float32)
# The projection matrix to lift image coordinates from (x, y, 1) to (x, y, 0, 1)
# in the 3-D domain; not sure if this is the right approach
projection = np.asarray([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 0],
                         [0, 0, 1]], np.float32)
homography_matrix = intrinsic @ translation @ rotation_x @ projection
inv = cv2.warpPerspective(source_image, homography_matrix, (w, h),
                          flags=cv2.INTER_CUBIC | cv2.WARP_INVERSE_MAP)
My question is: is this the right approach? I can manually set a suitable t_y, R_x, but not for the (t_y, R_x) which is provided.
First premise: your bird's eye view will be correct only for one specific plane in the image, since a homography can only map planes (including the plane at infinity, corresponding to a pure camera rotation).
Second premise: if you can identify a quadrangle in the first image that is the projection of a rectangle in the world, you can directly compute the homography that maps the quad into the rectangle (i.e. the "bird's eye view" of the quad), and warp the image with it, setting the scale so the image warps to a desired size. No need to use the camera intrinsics. Example: you have the image of a building with rectangular windows, and you know the width/height ratio of these windows in the world.
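A sketch of that second premise (the corner coordinates below are placeholders; the chosen rectangle size just sets the output scale):
import cv2
import numpy as np

# Image corners of the quad (e.g. a window) and the corresponding rectangle
# corners, listed in the same order.
quad = np.float32([[120, 300], [520, 280], [560, 470], [90, 500]])
rect = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])

H = cv2.getPerspectiveTransform(quad, rect)
# birds_eye = cv2.warpPerspective(image, H, (400, 300))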
Sometimes you can't find rectangles, but your camera is calibrated, and thus the problem you describe comes into play. Let's do the math. Assume the plane you are observing in the given image is Z=0 in world coordinates. Let K be the 3x3 intrinsic camera matrix and [R, t] the 3x4 matrix representing the camera pose in XYZ world frame, so that if Pc and Pw represent the same 3D point respectively in camera and world coordinates, it is Pc = R*Pw + t = [R, t] * [Pw.T, 1].T, where .T means transposed. Then you can write the camera projection as:
s * p = K * [R, t] * [Pw.T, 1].T
where s is an arbitrary scale factor and p is the pixel that Pw projects onto. But if Pw=[X, Y, Z].T is on the Z=0 plane, the 3rd column of R only multiplies zeros, so we can ignore it. If we then denote with r1 and r2 the first two columns of R, we can rewrite the above equation as:
s * p = K * [r1, r2, t] * [X, Y, 1].T
But K * [r1, r2, t] is a 3x3 matrix that transforms points on a 3D plane to points on the camera plane, so it is a homography.
If the plane is not Z=0, you can repeat the same argument replacing [R, t] with [R, t] * inv([Rp, tp]), where [Rp, tp] is the coordinate transform that maps a frame on the plane, with the plane normal being the Z axis, to the world frame.
Finally, to obtain the bird's eye view, you select a rotation R whose third column (the components of the world's Z axis in camera frame) is opposite to the plane's normal.
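Putting the math above into code (a sketch; K, R and t are assumed to come from your calibration and pose, with the observed plane at Z=0 in world coordinates):
import numpy as np

def plane_homography(K, R, t):
    # s * p = K * [r1, r2, t] * [X, Y, 1]^T, so H is the 3x3 plane-to-image homography
    return K @ np.column_stack((R[:, 0], R[:, 1], t.reshape(3)))

# Warping to the bird's eye view with OpenCV (WARP_INVERSE_MAP applies H^-1):
# H = plane_homography(K, R, t)
# birds_eye = cv2.warpPerspective(source_image, H, (w, h),
#                                 flags=cv2.INTER_CUBIC | cv2.WARP_INVERSE_MAP)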

Field of view of a GoPro camera

I have calibrated my GoPro Hero 4 Black using the Camera Calibration Toolbox for Matlab and calculated its fields of view and focal length using OpenCV's calibrationMatrixValues(). These, however, differ from GoPro's specifications: instead of 118.2°/69.5° FOVs I get 95.4°/63.4°, and a focal length of 2.8 mm instead of 17.2 mm. Obviously something is wrong.
I suppose the calibration itself is correct since image undistortion seems to be working well.
Can anyone please give me a hint where I made a mistake? I am posting my code below.
Thanks.
Code
cameraMatrix = new Mat(3, 3, 6);
for (int i = 0; i < cameraMatrix.height(); i++)
    for (int j = 0; j < cameraMatrix.width(); j++) {
        cameraMatrix.put(i, j, 0);
    }
cameraMatrix.put(0, 0, 582.18394);
cameraMatrix.put(0, 2, 663.50655);
cameraMatrix.put(1, 1, 582.52915);
cameraMatrix.put(1, 2, 378.74541);
cameraMatrix.put(2, 2, 1.);

org.opencv.core.Size size = new org.opencv.core.Size(1280, 720);

// output parameters
double[] fovx = new double[1];
double[] fovy = new double[1];
double[] focLen = new double[1];
double[] aspectRatio = new double[1];
Point ppov = new Point(0, 0);

org.opencv.calib3d.Calib3d.calibrationMatrixValues(cameraMatrix, size,
        6.17, 4.55, fovx, fovy, focLen, ppov, aspectRatio);

System.out.println("FoVx: " + fovx[0]);
System.out.println("FoVy: " + fovy[0]);
System.out.println("Focal length: " + focLen[0]);
System.out.println("Principal point of view; x: " + ppov.x + ", y: " + ppov.y);
System.out.println("Aspect ratio: " + aspectRatio[0]);
Results
FoVx: 95.41677635378488
FoVy: 63.43170132212425
Focal length: 2.8063085232812504
Principal point of view; x: 3.198308916796875, y: 2.3934605770833333
Aspect ratio: 1.0005929569269807
GoPro specifications
https://gopro.com/help/articles/Question_Answer/HERO4-Field-of-View-FOV-Information
Edit
Matlab calibration results
Focal Length: fc = [ 582.18394 582.52915 ] ± [ 0.77471 0.78080 ]
Principal point: cc = [ 663.50655 378.74541 ] ± [ 1.40781 1.13965 ]
Skew: alpha_c = [ -0.00028 ] ± [ 0.00056 ] => angle of pixel axes = 90.01599 ± 0.03208 degrees
Distortion: kc = [ -0.25722 0.09022 -0.00060 0.00009 -0.01662 ] ± [ 0.00228 0.00276 0.00020 0.00018 0.00098 ]
Pixel error: err = [ 0.30001 0.28188 ]
One of the images used for calibration
And the undistorted image
You have entered 6.17 mm and 4.55 mm for the sensor size in OpenCV, which corresponds to an aspect ratio of 1.36, whereas your resolution (1280x720) has a ratio of about 1.78 (approximately 16:9 format).
Did you crop your image before MATLAB calibration?
The pixel size seems to be 1.55 µm according to this GoPro page (which is, by the way, astonishingly small!). If pixels are square, and they should be on this type of commercial camera, that means your inputs are not consistent. The computed sensor size should be:
[Sensor width, Sensor height] = [1280, 720] * 1.55e-3 mm = [1.98, 1.12] mm
Even considering the maximal video resolution, which is 3840 x 2160, we obtain [5.95, 3.35] mm, still different from your input.
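The same arithmetic as a quick check (pixel pitch taken from the GoPro page above):
pixel_pitch_mm = 1.55e-3
print(1280 * pixel_pitch_mm, 720 * pixel_pitch_mm)    # ≈ 1.98 mm x 1.12 mm
print(3840 * pixel_pitch_mm, 2160 * pixel_pitch_mm)   # ≈ 5.95 mm x 3.35 mm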
Please see this explanation about equivalent focal length to understand why the actual focal length of the camera is not 17.2 mm but 17.2 * 5.95 / 36 ≈ 2.8 mm. In that case, compute the FOV using, for instance, the formulas here. You will indeed find values of 93.5°/61.7° (close to your outputs, but still not what is written in the specifications, probably because of optical distortion due to the wide angle).
What I do not understand, though, is how the returned focal distance can be right when the sensor size entered is wrong. Could you give more info and/or send an image?
Edits after question updates
On these cameras, with a working resolution of 1280x720, the image is downsampled but not cropped, so what I said above about sensor dimensions does not apply. The sensor size to consider is indeed the one you used (6.17 x 4.55), as explained in your first comment.
The FOV is constrained by the calibration matrix inputs (fx, fy, cx, cy) given in pixels and the resolution. You can check it by typing:
2 * DEGREES(ATAN(1280 / (2 * 582.18394))) (= 95.416776...°)
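Or the same check in Python:
import math

fov_x = 2 * math.degrees(math.atan(1280 / (2 * 582.18394)))  # ≈ 95.42°
fov_y = 2 * math.degrees(math.atan(720 / (2 * 582.52915)))   # ≈ 63.43°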
This FOV value is smaller than expected, but judging by the undistorted image, your MATLAB distortion model is right and the calibration is correct. The barrel distortion due to the wide angle seems well corrected by the rewarp you applied.
However, the MATLAB toolbox uses a pinhole model, which is linear and cannot account for effects such as lens distortion. I assume this from the page:
https://fr.mathworks.com/help/vision/ug/camera-calibration.html
Hence, my best guess is that unless you find a model which fits the GoPro camera more accurately (maybe a wide-angle lens model), the MATLAB calibration will return an intrinsic camera matrix corresponding to the "linear", undistorted image, and the FOV will indeed be smaller (in the case of barrel distortion). You will have to apply the distortion coefficients associated with the calibration to retrieve the actual FOV value.
We can see in the corrected image that the side parts of the FOV get pushed out of bounds. If you had warped the image entirely, you would find that some undistorted pixel coordinates exceed [-1280/2; +1280/2] (horizontally, and likewise vertically). Then, replacing org.opencv.core.Size(1280, 720) with the most extreme ranges obtained, you would hopefully retrieve the GoPro website values.
In conclusion, I think you can rely on the focal distance value you obtained if you make measurements near the center of your image; otherwise there is too much distortion and it doesn't apply.
