Perspective Transform using paper - opencv

I've got an image from a phone camera with a sheet of paper in it. Some coordinates are also marked in the image so I can measure the distance between them. Since the aspect ratio of the paper is known in advance (0.7072135785007072), I want to correct the distortion so that the whole image looks as if it were taken from a top view. I collect the four corners of the paper and apply OpenCV's getPerspectiveTransform as follows:
import math
import numpy as np
import cv2

# Paper corners in the source image (top-left, top-right, bottom-right, bottom-left)
pts1 = np.float32([[ 717.,  664.],
                   [1112.,  660.],
                   [1117., 1239.],
                   [ 730., 1238.]])
ratio = 0.7072135785007072
# Paper height in pixels, taken from the length of its right edge
cardH = math.sqrt((pts1[2][0] - pts1[1][0]) ** 2 + (pts1[2][1] - pts1[1][1]) ** 2)
cardW = ratio * cardH
# Destination rectangle anchored at the paper's top-left corner
pts2 = np.float32([[pts1[0][0], pts1[0][1]], [pts1[0][0] + cardW, pts1[0][1]],
                   [pts1[0][0] + cardW, pts1[0][1] + cardH], [pts1[0][0], pts1[0][1] + cardH]])
M = cv2.getPerspectiveTransform(pts1, pts2)
With this matrix M I transform the whole image as follows:
transformed = np.zeros((image.shape[1], image.shape[0]), dtype=np.uint8)
dst = cv2.warpPerspective(image, M, transformed.shape)
_ = cv2.rectangle(dst, (int(pts2[0][0]), int(pts2[0][1])),
                  (int(pts2[2][0]), int(pts2[2][1])), (0, 255, 0), 2)
The problem is that this corrects the perspective of the paper but distorts the rest of the image, and I don't know why. The input image is this and the corresponding output image is this. In the input image the points M and O are aligned horizontally, but to my surprise, after transforming the whole image, the points M and O are no longer aligned horizontally. Why is that happening?

Related

how to match rgb image pixels with corresponding pointcloud points

I have a color image and the corresponding point cloud captured by an OAK-D camera (see the image below), and I want to relate pixels in the color image to the corresponding values in the point cloud.
How can I get this information? For instance, given a pixel at (200, 250) in the color image, how do I find the corresponding point in the point cloud?
Any help would be appreciated.
It sounds like you want to project a 2D image to a 3D point cloud using the computed disparity map. To do this you will also need to know your camera intrinsics. Since you are using the OAK-D, you should be able to get everything you need with the following piece of code.
with dai.Device(pipeline) as device:
    calibData = device.readCalibration()

    # get right intrinsic matrix
    w, h = monoRight.getResolutionSize()
    K_right = calibData.getCameraIntrinsics(dai.CameraBoardSocket.RIGHT, dai.Size2f(w, h))

    # get left intrinsic matrix
    w, h = monoLeft.getResolutionSize()
    K_left = calibData.getCameraIntrinsics(dai.CameraBoardSocket.LEFT, dai.Size2f(w, h))

    # rectification rotations and stereo baseline
    R_left = calibData.getStereoLeftRectificationRotation()
    R_right = calibData.getStereoRightRectificationRotation()
    x_baseline = calibData.getBaselineDistance()
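One small practical note (an assumption about the rest of the pipeline rather than something from the question): the calibration values come back from DepthAI as plain Python lists, and the cv2.stereoRectify call below also needs an imageSize, so it is convenient to convert everything to NumPy first:

import numpy as np

K_left = np.array(K_left)        # 3x3 left intrinsic matrix
K_right = np.array(K_right)      # 3x3 right intrinsic matrix
R_left = np.array(R_left)
R_right = np.array(R_right)
imageSize = (w, h)               # width/height of the mono frames, used by stereoRectify below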
Once you have all your camera parameters, you should be able to use OpenCV to approach this.
First you will need to construct the Q matrix (or rectified transformation matrix).
You will need to provide:
The left and right intrinsic calibration matrices
The Translation vector from the coordinate system of the first camera to the second camera
The Rotation matrix from the coordinate system of the first camera to the second camera
Here's a coded example:
import numpy as np
import cv2

Q = np.zeros((4, 4))
cv2.stereoRectify(cameraMatrix1=K_left,     # left intrinsic matrix
                  cameraMatrix2=K_right,    # right intrinsic matrix
                  distCoeffs1=np.zeros(5),  # distortion coefficients (zeros here; use your real ones if available)
                  distCoeffs2=np.zeros(5),
                  imageSize=imageSize,      # pass in the image size
                  R=R_left,                 # rotation matrix from camera 1 to camera 2
                  T=np.array([x_baseline, 0.0, 0.0]),  # translation from camera 1 to camera 2 (assuming a purely horizontal baseline)
                  R1=None,
                  R2=None,
                  P1=None,
                  P2=None,
                  Q=Q)                      # Q is filled in place
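If you prefer, you can also assemble Q by hand. The layout below follows the form given in the OpenCV stereoRectify documentation, where fx is the rectified focal length, (cx1, cy) the left principal point, cx2 the right principal point, and Tx the stereo baseline; this is only a sketch, and the sign of Tx must match your disparity convention (OpenCV itself uses a negative Tx for a left-to-right rig):

fx = K_left[0, 0]
cx1, cy = K_left[0, 2], K_left[1, 2]
cx2 = K_right[0, 2]
Tx = x_baseline                      # stereo baseline; may need to be negated

# Reprojection matrix: [X, Y, Z, W]^T = Q @ [u, v, disparity, 1]^T
Q = np.array([[1, 0, 0,         -cx1],
              [0, 1, 0,         -cy],
              [0, 0, 0,          fx],
              [0, 0, -1.0 / Tx, (cx1 - cx2) / Tx]])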
Next you will need to reproject the image to 3D, using the known disparity map and the Q matrix. OpenCV makes this easy:
xyz = cv2.reprojectImageTo3D(disparity, Q)
This will give you an array of 3D points. The array has shape (rows, columns, 3), where the 3 corresponds to the (x, y, z) coordinate of the point cloud. Now you can use a pixel location to index into xyz and find its corresponding (x, y, z) point.
pix_row = 200
pix_col = 250
point_cloud_coordinate = xyz[pix_row, pix_col, :]
See the docs for more details
cv2.stereoRectify()
cv2.reprojectImageTo3D()

Find image that fit together the best

Given a batch of images, I have to find the images that fit together best, like in the example given below, but my solutions are not working:
Left image
Right image
I tried first with the Google Cloud Vision API, but it wasn't giving good results. Then I trained a model with Ludwig, but it will take forever to try all the possible combinations of images, as I have 2500 left images and 2500 right images.
Is there a way to find this out, or to reduce the number of candidate pairs so that I can use it in my model?
This solution looks at a pair of images. The algorithm evaluates whether the shapes in the images will mesh like a key and a lock. My answer does not attempt to align the images.
The first step is to find the contours in the images:
left = cv2.imread('/home/stephen/Desktop/left.png')
right = cv2.imread('/home/stephen/Desktop/right.png')
# Resize
left = cv2.resize(left, (320, 320))
gray = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
# Threshold so findContours has a binary image to work on (the threshold value may need tuning)
_, thresh = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)
_, left_contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Approximate
left_contour = left_contours[0]
epsilon = 0.005 * cv2.arcLength(left_contour, True)
left_contour = cv2.approxPolyDP(left_contour, epsilon, True)
# (repeat the same steps on the right image to get right_contour)
What is a contour? A contour is just a list of points that lie on the perimeter of a shape. The contour for a triangle will have 3 points and a length of 3. The distance between the points will be the length of each leg in the triangle.
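To make that concrete, here is a tiny self-contained sketch (the triangle and its vertex coordinates are made up for illustration) that draws a triangle, approximates its contour, and prints the three leg lengths:

import cv2
import numpy as np

# Draw a filled white triangle on a black canvas
canvas = np.zeros((200, 200), dtype=np.uint8)
triangle = np.array([[20, 180], [180, 180], [100, 20]], dtype=np.int32)
cv2.fillPoly(canvas, [triangle], 255)

# OpenCV 4.x returns (contours, hierarchy); 3.x returns three values
contours, _ = cv2.findContours(canvas, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
approx = cv2.approxPolyDP(contours[0], 0.01 * cv2.arcLength(contours[0], True), True)
print(len(approx))                     # 3 points, one per corner
for i in range(len(approx)):
    a = approx[i][0]
    b = approx[(i + 1) % len(approx)][0]
    print(np.linalg.norm(a - b))       # length of each leg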
Similarly, the distances between the peaks and valleys will match in your images. To compute this distance, I found the distance between the contour points. Because of the way that the images are aligned I only used the horizontal distance.
left_dx = []
for point in range(len(left_contour) - 1):
    a = left_contour[point][0]
    b = left_contour[point + 1][0]
    dist = a[0] - b[0]
    left_dx.append(dist)

right_dx = []
for point in range(len(right_contour) - 1):
    a = right_contour[point][0]
    b = right_contour[point + 1][0]
    # Use the negative of the distance because this is the key hole, not the key
    # ('distance' is a point-to-point distance helper that is not shown in this snippet)
    dist = -distance(a, b)
    right_dx.append(dist)

# Reverse so they will fit
right_dx.reverse()
At this point you can sort of see that the contours line up. If you have better images, the contours will line up in this step. I used SciPy to interpolate and check whether the functions line up. If the two functions do line up, then the objects in the images will mesh.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

left_x_values = list(range(len(left_dx)))
x = np.array(left_x_values)
y = np.array(left_dx)
left_x_new = np.linspace(x.min(), x.max(), 500)
f = interp1d(x, y, kind='quadratic')
left_y_smooth = f(left_x_new)
plt.plot(left_x_new, left_y_smooth, c='g')
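The snippet above only builds and plots the left profile. Here is one way you could actually score how well the two profiles line up (a sketch of my own, not part of the original answer, assuming right_dx was built as shown earlier; the correlation is used as a rough "mesh" score):

# Resample the right profile onto the same number of samples
right_x = np.arange(len(right_dx))
right_x_new = np.linspace(right_x.min(), right_x.max(), 500)
right_y_smooth = interp1d(right_x, np.array(right_dx), kind='quadratic')(right_x_new)

# Correlation between the two smoothed profiles: values near 1 mean the
# key and the keyhole profiles mesh well
score = np.corrcoef(left_y_smooth, right_y_smooth)[0, 1]
print(score)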
I tried this again on a pair of images that I generated myself:
The contours:
The distances between contour points:
Fitting the contours:

How to get back the co-ordinate points corresponding to the intensity points obtained from a faster r-cnn object detection process?

As a result of the Faster R-CNN object detection, I have obtained a set of boxes of intensity values corresponding to the region containing the object (each bounding box can be thought of as a 3D matrix with a depth of 3 for RGB intensity, a width, and a height, which can then be converted into a 2D matrix by taking the grayscale). What I want to do is obtain the corresponding coordinate points in the original image for each cell of intensity inside the bounding box. Any ideas how to do so?
From what I understand, you have an R-CNN model that outputs cropped pieces of the input image, and you now want to trace those output crops back to their coordinates in the original image.
What you can do is simply use a patch-similarity measure to find the original position.
Since the output crop should look exactly like itself in the original image, you can just use a pixel-based distance: for each candidate position, sum the absolute differences between the crop and the image patch under it.
Find the place in the image with the smallest distance (it should be zero) and from that you can read off your desired coordinates.
In Python:
import numpy as np

d_min = np.inf
crop_size = crop.shape
for x in range(org_image.shape[0] - crop_size[0]):
    for y in range(org_image.shape[1] - crop_size[1]):
        # Sum of absolute differences between the crop and the image patch at (x, y)
        patch = org_image[x:x + crop_size[0], y:y + crop_size[1]]
        d = np.sum(np.abs(patch.astype(np.int32) - crop))
        if d <= d_min:
            d_min = d
            coord = [x, y]
However, your model should already have that information available (after all, it crops the output based on some coordinates), so it might help if you add some details about your implementation.

OpenCv warpPerspective meaning of elements of homography

I have a question regarding the meaning of the elements of a projective transformation matrix, e.g. in a homography used by OpenCV warpPerspective.
I know the basics of an affine transformation, but here I'm more interested in the projective part, meaning the elements A31 and A32 in the matrix shown below:
A11 A12 A13
A21 A22 A23
A31 A32 1
I played around with the values a bit, keeping all the other elements fixed. Meaning:
1 0 0
0 1 0
A31 A32 1
so that only the projective elements remain.
But what exactly do the elements A31 and A32 cause? For example, A13 and A23 are responsible for the horizontal and vertical translation.
Is there a simple explanation for these two elements, along the lines of "having a positive value means ...., having a negative value means ...."? Something like that.
I hope someone can help me.
Newton's descriptions are correct, but it might be helpful to actually see the transformations to understand what's going on, and how they might work together with other values in the transformation matrix to make a bit more sense. I'll give some python/OpenCV examples with animations to show what these values do.
import numpy as np
import cv2
img = cv2.imread('img1.png')
h, w = img.shape[:2]
# initializations
max_m20 = 2e-3
nsteps = 50
M = np.eye(3)
So here I'm setting the transformation matrix to be the identity (no transformation). We want to see the effect of changing the element at (2, 0) in the transformation matrix M, so we'll animate by looping through nsteps values linearly spaced between 0 and max_m20.
for m20 in np.linspace(0, max_m20, nsteps):
    M[2, 0] = m20
    warped = cv2.warpPerspective(img, M, (w, h))
    cv2.imshow('warped', warped)
    k = cv2.waitKey(1)
    if k & 0xFF == ord('q'):
        break
I applied this on an image taken from Oxford's Visual Geometry Group.
So indeed, we can see that this is similar to either rotating your camera around a point aligned with the left edge of the image, or rotating the image itself around an axis. However, it is a little different from that. Note that the top edge stays along the top the whole time, which is a little strange. If we were really rotating around an axis as above, we would expect the top edge to start coming down on the right side too. Like this:
Well, if you're thinking in terms of transformations, one easy way to get this view is to take the transformation above and add some skew distortion, so that the top right side is pushed down as the bottom right corner is pushed up. And that's exactly how this view was created:
M = np.eye(3)
max_m20 = 2e-3
max_m10 = 0.6

for m20, m10 in zip(np.linspace(0, max_m20, nsteps), np.linspace(0, max_m10, nsteps)):
    M[2, 0] = m20
    M[1, 0] = m10
    warped = cv2.warpPerspective(img, M, (w, h))
    cv2.imshow('warped', warped)
    k = cv2.waitKey(1)
    if k & 0xFF == ord('q'):
        break
So the right way to think about the perspective in these matrices is, IMO, with the skew entries and the last row together. Those are the two places in the homography matrix where angles actually get modified*; otherwise, it's just rotation, scaling, and translation---all of which are angle preserving.
*Note: Actually, angles can be changed in one more way that I didn't mention. Affine transformations allow for non-uniform scaling, which means you can stretch a shape in width and not in height or vice versa, which would also change the angles. Imagine you had a triangle and stretched it only in width; the angles would change. So it turns out that non-uniform scaling (i.e. when the diagonal entries A11 and A22 have different values) can also modify angles, in addition to the perspective change and shearing distortions.
Note that in these examples, the same applies to the second entry in the last row together with the other skew location; the only difference is that it happens at the top instead of the left side. Negative values in both cases are akin to rotating the plane along that axis towards the camera instead of away from it.
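For completeness, here is the same kind of animation for that second entry, M[2, 1], using the same image, window, and nsteps as above (max_m21 is just a value I picked for illustration); you should see the analogous effect along the top edge:

M = np.eye(3)
max_m21 = 2e-3

for m21 in np.linspace(0, max_m21, nsteps):
    M[2, 1] = m21
    warped = cv2.warpPerspective(img, M, (w, h))
    cv2.imshow('warped', warped)
    k = cv2.waitKey(1)
    if k & 0xFF == ord('q'):
        break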
The (3,1) and (3,2) elements of the homography matrix change the plane of the image; that's the difference between affine and homography matrices. For instance, A31 changes the plane of your image along the left edge. It's like sticking your image to a stick, like a flag, and rotating it: positive is clockwise and negative is the reverse. The other element does the same from the top edge. But together, they set a plane for your image. That's the simplest way I could put it.

how to get the result in template matching code?

I am a beginner in computer vision. I am currently working on a project to find the match between two images using matchTemplate in iOS. The problem I am facing is finding a way to determine whether the two images match or not, although matchTemplate itself is working well. I thought of taking a percentage from the result matrix but I did not know how, and minMaxLoc did not work for me either.
If anyone can help me or give me an idea I would really appreciate it, because I am at a desperate point now.
Here is the code:
UIImage* image1 = [UIImage imageNamed:@"1.png"];
UIImage* image2 = [UIImage imageNamed:@"Image002.png"];

// Convert UIImage* to cv::Mat
UIImageToMat(image1, MatImage1);
UIImageToMat(image2, MatImage2);
MatImage1.resize(100, 180);
MatImage2.resize(100, 180);

if (!MatImage1.empty())
{
    // Convert the images to grayscale
    // (we can also use BGRA2GRAY: Blue, Green, Red and Alpha/opacity)
    cv::cvtColor(MatImage1, grayImage1, cv::COLOR_BGRA2GRAY);
    cv::cvtColor(MatImage2, grayImage2, cv::COLOR_BGRA2GRAY);
}

// Create the result matrix
int result_cols = grayImage1.cols;
int result_rows = grayImage1.rows;
result.create(result_cols, result_rows, CV_32FC1);

// Do the matching and normalize
matchTemplate(grayImage1, grayImage2, result, CV_TM_SQDIFF_NORMED);

// Normalize
normalize(result, result, 0, 100, cv::NORM_MINMAX, -1);

// Threshold
cv::threshold(result, result, 30, 0, CV_THRESH_TOZERO);
The intent of matchTemplate(...) is that the template is usually smaller than the image. The template is then moved across the image as a sliding window and a 'matching score' is calculated in some way e.g. using cross-correlation or squared difference.
So if the input image is 10x10 and the template is 3x3, the template is first positioned so that its top left corner is at the top left corner of the image (the centre of the template is at pixel (1,1), indexing from 0). The matching score is calculated, then the template slides one pixel to the right so its centre is at (2,1), and we match again. When the template's centre reaches (8,1) we slide it down to the next row, back to (1,2), and repeat.
The output result of this process is an 8x8 matrix where the value at each position represents the matching score for when the template was at that point. The size of the output image is W-w+1 x H-h+1 where WxH is the size of the image and wxh is the size of the template.
You can then use minMaxLoc to work out which are the highest and lowest scores in the output matrix and, depending on the matching measure you use, one of these will be the most likely location of the template within the image.
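To make that concrete (a Python sketch, since the logic is the same regardless of the iOS wrapper; 'scene.png' and 'patch.png' are placeholder file names), note the shape of the result and how minMaxLoc picks the best position:

import cv2

scene = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)   # the W x H image to search in
patch = cv2.imread('patch.png', cv2.IMREAD_GRAYSCALE)   # the w x h template, smaller than the scene

result = cv2.matchTemplate(scene, patch, cv2.TM_SQDIFF_NORMED)
print(result.shape)   # (H - h + 1, W - w + 1): one score per template position

min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
# For TM_SQDIFF / TM_SQDIFF_NORMED the *minimum* is the best match
print(min_val, min_loc)   # min_loc is the top-left corner of the best match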
Now you are resizing your template and image to the same size:
MatImage1.resize(100 , 180);
MatImage2.resize(100 , 180);
which means that there is only one place that the template can be located within the image and your output matrix should be a 1x1 grid.
You are also using CV_TM_SQDIFF_NORMED, which is the normalised squared difference. For this measure a lower score is better, i.e. the closer the value in your 1x1 output matrix is to 0, the closer the match between your template and your image.
Because the score is normalised, a perfect match gives 0 and increasingly dissimilar images give larger values (with the unnormalised CV_TM_SQDIFF, for a 100x180 pair the worst case would be 100x180x255², i.e. every pixel maximally different, as when one image is entirely black and the other entirely white). This should help you work out a sensible threshold below which you would say your template matched the image.
Since you only have a 1x1 output, though, there is little value in normalising or thresholding the result matrix; you can simply compare the single score against a threshold, as sketched below.
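For the special case in the question, where the template and the image are resized to the same dimensions, the whole comparison collapses to that single normalised score. A minimal Python sketch (the 0.1 threshold is just an illustrative value you would tune for your data):

import cv2

img1 = cv2.imread('1.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('Image002.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.resize(img2, (img1.shape[1], img1.shape[0]))  # same size -> 1x1 result matrix

score = cv2.matchTemplate(img1, img2, cv2.TM_SQDIFF_NORMED)[0, 0]
# TM_SQDIFF_NORMED: 0 means identical, larger means more different
if score < 0.1:
    print('match, score =', score)
else:
    print('no match, score =', score)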
