I can read text from an image using OCR. However, it works line by line.
I want to now group text based on solid lines surrounding the text.
For example, consider the rectangle banners below. I can read the text line by line. Fine! Now I want to group the lines by Board A, B, C and hold them in some data structure, so that I can identify which lines belong to which board. It is given that the images will be diagrams like this, with solid lines around each block of text.
Please guide me on the right approach.
As mentioned in the comments by Yunus, you need to crop sub-images and feed them to an OCR module individually. An additional step could be ordering of the contours.
Approach:
Obtain binary image and invert it
Find contours
Crop sub-images based on the bounding rectangle for each contour
Feed each sub-image to OCR module (I used easyocr for demonstration)
Store text for each board in a dictionary
Code:
# Libraries import
import cv2
from easyocr import Reader

reader = Reader(['en'])

img = cv2.imread('board_text.jpg', 1)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# inverse binary
th = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# find contours and sort them from left to right
contours, hierarchy = cv2.findContours(th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
contours = sorted(contours, key=lambda x: [cv2.boundingRect(x)[0], cv2.boundingRect(x)[1]])

# initialize dictionary
board_dictionary = {}

# iterate each contour and crop its bounding box
for i, c in enumerate(contours):
    x, y, w, h = cv2.boundingRect(c)
    crop_img = img[y:y+h, x:x+w]
    # feed cropped image to easyOCR module
    results = reader.readtext(crop_img)
    # result is output per line
    # create a list to append all lines in cropped image to it
    board_text = []
    for (bbox, text, prob) in results:
        board_text.append(text)
    # convert list of lines to single string
    board_para = ' '.join(board_text)
    # print(board_para)
    # store string within a dictionary
    board_dictionary[str(i)] = board_para
Dictionary Output:
board_dictionary
{'0': 'Board A Board A contains Some Text, That goes Here Some spaces and then text again', '1': 'Board B Board B has some text too but sparse.', '2': 'Board €C Board C is wide and contains text with white spaces '}
Drawing each contour:
img2 = img.copy()
for i, c in enumerate(contours):
    x, y, w, h = cv2.boundingRect(c)
    img2 = cv2.rectangle(img2, (x, y), (x + w, y + h), (0, 255, 0), 3)
Note:
While working on different images, make sure the contour ordering is correct (one way to enforce a reading order is sketched below).
The choice of OCR module is yours; pytesseract and easyocr are the options I know.
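A minimal sketch of one way to enforce a top-to-bottom, left-to-right reading order on the contours; the helper name and the 50-pixel row tolerance are my own assumptions and would need tuning per image:
import cv2

def sort_contours_reading_order(contours, row_tol=50):
    # pair each contour with its bounding box and sort by the top (y) coordinate
    boxes = [cv2.boundingRect(c) for c in contours]
    paired = sorted(zip(contours, boxes), key=lambda cb: cb[1][1])
    # group contours whose top coordinates are within row_tol pixels into rows
    rows, current = [], [paired[0]]
    for cb in paired[1:]:
        if abs(cb[1][1] - current[-1][1][1]) <= row_tol:
            current.append(cb)
        else:
            rows.append(current)
            current = [cb]
    rows.append(current)
    # within each row, sort left to right by the x coordinate
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda cb: cb[1][0]))
    return [cb[0] for cb in ordered]
You could call this on the contours returned by cv2.findContours before cropping.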
This can be done by performing the following steps:
Find the shapes.
Compute the shape centers.
Find the text boxes.
Compute the text box centers.
Associate the text boxes with the shapes based on distance.
The code is as follows:
import cv2
from easyocr import Reader
import math

shape_number = 2

image = cv2.imread("./ueUco.jpg")
deep_copy = image.copy()

image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(image_gray, 150, 255, cv2.THRESH_BINARY)
thresh = 255 - thresh

shapes, hierarchy = cv2.findContours(image=thresh, mode=cv2.RETR_EXTERNAL, method=cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(image=deep_copy, contours=shapes, contourIdx=-1, color=(0, 255, 0), thickness=2, lineType=cv2.LINE_AA)

shape_centers = []
for shape in shapes:
    row = int((shape[0][0][0] + shape[3][0][0]) / 2)
    column = int((shape[3][0][1] + shape[2][0][1]) / 2)
    center = (row, column, shape)
    shape_centers.append(center)

# cv2.imshow('Shapes', deep_copy)
# cv2.waitKey(0)
# cv2.destroyAllWindows()

languages = ['en']
reader = Reader(languages, gpu=True)
results = reader.readtext(image)

def cleanup_text(text):
    return "".join([c if ord(c) < 128 else "" for c in text]).strip()

for (bbox, text, prob) in results:
    text = cleanup_text(text)
    (tl, tr, br, bl) = bbox
    tl = (int(tl[0]), int(tl[1]))
    tr = (int(tr[0]), int(tr[1]))
    br = (int(br[0]), int(br[1]))
    bl = (int(bl[0]), int(bl[1]))

    column = int((tl[0] + tr[0]) / 2)
    row = int((tr[1] + br[1]) / 2)
    center = (row, column, bbox)

    distances = []
    for iteration, shape_center in enumerate(shape_centers):
        shape_row = shape_center[0]
        shape_column = shape_center[1]
        dist = int(math.dist([column, row], [shape_row, shape_column]))
        distances.append(dist)

    min_value = min(distances)
    min_index = distances.index(min_value)

    if min_index == shape_number:
        cv2.rectangle(image, tl, br, (0, 255, 0), 2)
        cv2.putText(image, text, (tl[0], tl[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

cv2.imshow("Image", image)
cv2.waitKey(0)
cv2.imwrite(f"image_{shape_number}.jpg", image)
cv2.destroyAllWindows()
The output looks like this.
Please note that this solution is almost complete. You just have to collect the text contained in each shape and put it in your desired data structure.
Note: shape_number represents the shape that you want to consider.
There is another solution that I would like you to work on.
Find all the text boxes.
Compute the centers for text boxes.
Run k-means clustering on the centers.
I would prefer the second solution, but for the time being I implemented the first one.
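A minimal sketch of that second idea, assuming scikit-learn is available and that the number of boards (3 here) is known in advance; only the centers of the easyocr text boxes are clustered:
import numpy as np
from easyocr import Reader
from sklearn.cluster import KMeans

reader = Reader(['en'])
results = reader.readtext('board_text.jpg')

# centers of all detected text boxes
centers = np.array([[(tl[0] + br[0]) / 2, (tl[1] + br[1]) / 2]
                    for ((tl, tr, br, bl), text, prob) in results])

# cluster the centers; n_clusters is assumed to equal the number of boards
labels = KMeans(n_clusters=3, n_init=10).fit_predict(centers)

# group the recognized text by cluster label
boards = {}
for (bbox, text, prob), label in zip(results, labels):
    boards.setdefault(int(label), []).append(text)
print(boards)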
Currently I'm working on a project where I need to measure the width of a car fuse wire. In order to achieve that, I need to detect and localize the fuse in the image.
My plan is to find the bounding rectangle region containing the fuse and then search for the wire contours at a fixed position within that region.
I have already tried ORB and BRISK feature-based template matching, but the results were not acceptable. Can anyone suggest some possible methods to solve this task?
We can start by applying the Canny operator to see the features of the image. The result is:
The aim is to calculate the width, so we only need the left and right outer edges of the fuse; we don't need the inner lines. To remove the inner features we can smooth the image.
How do we accurately calculate the width? Which part of the features can we take as a reference? What if we consider the base? The base features are:
How do we find the base feature coordinates?
The blue point is the one with the highest y coordinate value.
The red point is the one with the highest x coordinate value.
For all detected line coordinates, we need to find the highest y coordinate with its corresponding x coordinate, and the highest x coordinate with its corresponding y coordinate.
For detecting line coordinates we can use the fast line detector. The result will be:
We can calculate the Euclidean distance, which will be: 146.49 pixels.
The idea is based on finding the base and then calculating the Euclidean distance.
Update
The orientation of the fuse can be random.
First, we need to get the fuse part of the image.
Second, we need to get the Canny features (or apply any other filtering method).
At this point we need to find the left (blue dot) and right (red dot) parts of the fuse:
If we connect them:
We will have an approximate length of the fuse.
So how do we find the left and right parts of the fuse?
Finding left part:
1. From the current x1, x2 tuples
2. If min(x1, x2) < x_min
3. x_min = min(x1, x2)
Finding right part:
1. From the current x1, x2 tuples
2. If max(x1, x2) > x_max
3. x_max = max(x1, x2)
This is my idea for approaching the problem. You can modify for better results.
Code:
# Load libraries
import cv2
import numpy as np

# Load the image
img = cv2.imread("E8XlZ.jpg")

# Get the image dimension
(h, w) = img.shape[:2]

# Convert to hsv
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Get the binary-mask
msk = cv2.inRange(hsv, np.array([0, 24, 161]), np.array([77, 255, 217]))

# Display the mask
cv2.imshow("msk", msk)
cv2.waitKey(0)

# Smooth the image
gauss = cv2.GaussianBlur(msk, (21, 21), 0)

# Canny features
cny = cv2.Canny(gauss, 50, 200)

# Display canny features
cv2.imshow("cny", cny)
cv2.waitKey(0)

# Initialize line-detector
lns = cv2.ximgproc.createFastLineDetector().detect(cny)

# Initialize temporary variables
x_min, x_max, y_min, y_max = w, 0, 0, 0

# Detect the lines
for line in lns:
    # Get current coordinates
    x1 = int(line[0][0])
    y1 = int(line[0][1])
    x2 = int(line[0][2])
    y2 = int(line[0][3])

    # Get maximum coordinates
    if max(x1, x2) > x_max:
        x_max = max(x1, x2)
        y_max = y1 if x_max == x1 else y2

    if min(x1, x2) < x_min:
        x_min = min(x1, x2)
        y_min = y1 if x_min == x1 else y2

# Draw the points
cv2.circle(img, (x_min, int((y_min + y_max)/2)), 3, (255, 0, 0), 5)
cv2.circle(img, (x_max, int((y_min + y_max)/2)), 3, (0, 0, 255), 5)

# Write coordinates to the console
print("Coordinates: ({}, {})->({}, {})".format(x_min, int((y_min + y_max)/2), x_max, int((y_min + y_max)/2)))

# Draw the minimum and maximum coordinates
cv2.line(img, (x_min, int((y_min + y_max)/2)), (x_max, int((y_min + y_max)/2)), (0, 255, 0), 5)

# Calculate the euclidean distance
pt1 = np.array((x_min, int((y_min + y_max)/2)))
pt2 = np.array((x_max, int((y_min + y_max)/2)))
dist = np.linalg.norm(pt1 - pt2)
print("Result: %.2f pixel" % dist)

# Display the result
cv2.imshow("img", img)
cv2.waitKey(0)
I have a picture like this:
And then I transform it into a binary image and use Canny to detect the edges of the picture:
gray = cv.cvtColor(image, cv.COLOR_RGB2GRAY)
edges = cv.Canny(gray, 50, 150)  # this call is missing from the original snippet; the threshold values are assumed
edge = Image.fromarray(edges)
And then I get the result as:
I want to get the area of region 2, like this:
My solution is to use HoughLines to find the lines in the picture and calculate the area of the triangle formed by those lines. However, this approach is not precise because the closed area is not a perfect triangle. How can I get the area of region 2?
A simple approach using floodFill and countNonZero could be the following code snippet. My standard quote on contourArea from the help:
The function computes a contour area. Similarly to moments, the area is computed using the Green formula. Thus, the returned area and the number of non-zero pixels, if you draw the contour using drawContours or fillPoly, can be different. Also, the function will most certainly give a wrong results for contours with self-intersections.
Code:
import cv2
import numpy as np

# Input image
img = cv2.imread('images/YMMEE.jpg', cv2.IMREAD_GRAYSCALE)

# Needed due to JPG artifacts
_, temp = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)

# Dilate to better detect contours
temp = cv2.dilate(temp, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))

# Find largest contour
cnts, _ = cv2.findContours(temp, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
largestCnt = []
for cnt in cnts:
    if len(cnt) > len(largestCnt):
        largestCnt = cnt

# Determine center of area of largest contour
M = cv2.moments(largestCnt)
x = int(M["m10"] / M["m00"])
y = int(M["m01"] / M["m00"])

# Initialize mask for flood filling
width, height = temp.shape
mask = np.ones((width + 2, height + 2), np.uint8) * 255
mask[1:width, 1:height] = 0

# Generate intermediate image, draw largest contour, flood filled
temp = np.zeros(temp.shape, np.uint8)
temp = cv2.drawContours(temp, largestCnt, -1, 255, cv2.FILLED)
_, temp, mask, _ = cv2.floodFill(temp, mask, (x, y), 255)
temp = cv2.morphologyEx(temp, cv2.MORPH_OPEN, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))

# Count pixels in desired region
area = cv2.countNonZero(temp)

# Put result on original image
img = cv2.putText(img, str(area), (x, y), cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, 255)

cv2.imshow('Input', img)
cv2.imshow('Temp image', temp)
cv2.waitKey(0)
Temporary image:
Result image:
Caveat: findContours has some problems on the right side, where the line is very close to the bottom image border, possibly resulting in some omitted pixels.
Disclaimer: I'm new to Python in general, and specially to the Python API of OpenCV (C++ for the win). Comments, improvements, highlighting Python no-gos are highly welcome!
There is a very simple way to find this area, if you make some assumptions that are met in the example image:
The area to be found is bounded on top by a line
Any additional lines in the image are above the line of interest
There are no discontinuities in the line
In this case, the area of the region of interest is given by the sum of the lengths from the bottom of the image to the first set pixel. We can compute this with:
import numpy as np
import matplotlib.pyplot as pp
img = pp.imread('/home/cris/tmp/YMMEE.jpg')
img = np.flip(img, axis=0)
pos = np.argmax(img, axis=0)
area = np.sum(pos)
print('Area = %d\n'%area)
This prints Area = 22040.
np.argmax finds the first set pixel on each column of the image, returning the index. By first using np.flip, we flip this axis so that the first pixel is actually the one on the bottom. The index corresponds to the number of pixels between the bottom of the image and the line (not including the set pixel).
Thus, we're computing the area under the line. If you need to include the line itself in the area, add pos.shape[0] to the area (i.e. the number of columns).
Currently I'm working on a project where I try to find the corners of a rectangle's surface in a photo using OpenCV (Python, Java or C++).
I've selected the desired surface by filtering by color, then obtained a mask and passed it to cv2.findContours:
cnts, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnt = sorted(cnts, key=cv2.contourArea, reverse=True)[0]
peri = cv2.arcLength(cnt, True)
approx = cv2.approxPolyDP(cnt, 0.02 * peri, True)

if len(approx) == 4:
    cv2.drawContours(mask, [approx], -1, (255, 0, 0), 2)
This gives me an inaccurate result:
Using cv2.HoughLines I've managed to get 4 straight lines that accurately describe the surface. Their intersections are exactly what I need:
edged = cv2.Canny(mask, 10, 200)
hLines = cv2.HoughLines(edged, 2, np.pi/180, 200)
lines = []
for rho, theta in hLines[0]:
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a*rho
    y0 = b*rho
    x1 = int(x0 + 1000*(-b))
    y1 = int(y0 + 1000*(a))
    x2 = int(x0 - 1000*(-b))
    y2 = int(y0 - 1000*(a))
    cv2.line(mask, (x1, y1), (x2, y2), (255, 0, 0), 2)
    lines.append([[x1, y1], [x2, y2]])
The question is: is it possible to somehow tweak findContours?
Another solution would be to find coordinates of intersections. Any clues for this approach are welcome :)
Can anybody give me a hint how to solve this problem?
Finding the intersections is not as trivial a problem as it seems, but before the intersection points can be found, the following problems should be considered:
The most important thing is to choose the right parameters for the HoughLines function, since it can return anywhere from 0 to an infinite number of lines (we need exactly 4).
Since we do not know in what order these lines come, they need to be compared with each other.
Because of the perspective, parallel lines are no longer parallel, so each line will have a point of intersection with the others. A simple solution would be to filter out the coordinates located outside the photo. But it may happen that an undesirable intersection falls within the photo.
The coordinates should be sorted. Depending on the task, it could be done in different ways.
cv2.HoughLines will return an array with the values of rho and theta for each line.
Now the problem becomes a system of equations for all lines in pairs:
def intersections(edged):
    # Height and width of a photo with a contour obtained by Canny
    h, w = edged.shape

    hl = cv2.HoughLines(edged, 2, np.pi/180, 190)[0]

    # Number of lines. If n != 4, the parameters should be tuned
    n = hl.shape[0]

    # Matrix with the values of cos(theta) and sin(theta) for each line
    T = np.zeros((n, 2), dtype=np.float32)
    # Vector with values of rho
    R = np.zeros((n), dtype=np.float32)

    T[:, 0] = np.cos(hl[:, 1])
    T[:, 1] = np.sin(hl[:, 1])
    R = hl[:, 0]

    # Number of combinations of all lines
    c = n*(n-1)//2

    # Matrix with the obtained intersections (x, y)
    XY = np.zeros((c, 2))
    # Finding intersections between all pairs of lines
    k = 0
    for i in range(n):
        for j in range(i+1, n):
            XY[k, :] = np.linalg.inv(T[[i, j], :]).dot(R[[i, j]])
            k += 1

    # filtering out the coordinates outside the photo
    XY = XY[(XY[:, 0] > 0) & (XY[:, 0] <= w) & (XY[:, 1] > 0) & (XY[:, 1] <= h)]

    # XY = order_points(XY) # obtained points should be sorted
    return XY
here is the result:
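The order_points call is commented out above and not defined in the snippet; a minimal sketch of one possible implementation, sorting the four intersections into top-left, top-right, bottom-right, bottom-left order (the sum/difference trick is my own assumption, not part of the original answer):
import numpy as np

def order_points(pts):
    # pts: (4, 2) array of intersection coordinates
    pts = np.asarray(pts, dtype=np.float32)
    ordered = np.zeros((4, 2), dtype=np.float32)
    s = pts.sum(axis=1)              # x + y
    d = np.diff(pts, axis=1)[:, 0]   # y - x
    ordered[0] = pts[np.argmin(s)]   # top-left: smallest sum
    ordered[2] = pts[np.argmax(s)]   # bottom-right: largest sum
    ordered[1] = pts[np.argmin(d)]   # top-right: smallest difference
    ordered[3] = pts[np.argmax(d)]   # bottom-left: largest difference
    return ordered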
It is possible to:
Select the longest contour
Break it into segments and group them by gradient
Fit lines to the largest four groups
Find the intersection points
But then, Hough transform does nearly the same thing. Is there any particular reason for not using it?
Intersection points of lines are very easy to calculate. A high-school coordinate geometry lesson can provide you with the algorithm.
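As an illustration of that last point, here is a minimal sketch (my own example, not from the answer) that intersects two lines, each given by two points, using the determinant form of the line-line intersection formula:
def line_intersection(p1, p2, p3, p4):
    # line A passes through p1 and p2, line B through p3 and p4; points are (x, y)
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None  # the lines are parallel
    px = ((x1*y2 - y1*x2) * (x3 - x4) - (x1 - x2) * (x3*y4 - y3*x4)) / denom
    py = ((x1*y2 - y1*x2) * (y3 - y4) - (y1 - y2) * (x3*y4 - y3*x4)) / denom
    return (px, py)

print(line_intersection((0, 0), (4, 4), (0, 4), (4, 0)))  # -> (2.0, 2.0)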
I'm trying to find the corners in an image. I don't need the contours, only the 4 corners. I will change the perspective using the 4 corners.
I'm using OpenCV, but I need to know the steps to find the corners and which functions to use.
My images will be like this (without the red points; I will paint the points afterwards):
EDITED:
After the suggested steps, I wrote the code. (Note: I'm not using pure OpenCV, I'm using JavaCV, but the logic is the same.)
// Load two images and allocate other structures (I'm using another image)
IplImage colored = cvLoadImage("res/scanteste.jpg", CV_LOAD_IMAGE_UNCHANGED);
IplImage gray = cvCreateImage(cvGetSize(colored), IPL_DEPTH_8U, 1);
IplImage smooth = cvCreateImage(cvGetSize(colored), IPL_DEPTH_8U, 1);

// Step 1 - Convert from RGB to grayscale (cvCvtColor)
cvCvtColor(colored, gray, CV_RGB2GRAY);

// 2 - Smooth (cvSmooth)
cvSmooth(gray, smooth, CV_BLUR, 9, 9, 2, 2);

// 3 - cvThreshold - What values?
cvThreshold(gray, gray, 155, 255, CV_THRESH_BINARY);

// 4 - Detect edges (cvCanny) - What values?
int N = 7;
int aperature_size = N;
double lowThresh = 20;
double highThresh = 40;
cvCanny(gray, gray, lowThresh*N*N, highThresh*N*N, aperature_size);

// 5 - Find contours (cvFindContours)
int total = 0;
CvSeq contour2 = new CvSeq(null);
CvMemStorage storage2 = cvCreateMemStorage(0);
CvMemStorage storageHull = cvCreateMemStorage(0);
total = cvFindContours(gray, storage2, contour2, Loader.sizeof(CvContour.class), CV_RETR_CCOMP, CV_CHAIN_APPROX_NONE);
if (total > 1) {
    while (contour2 != null && !contour2.isNull()) {
        if (contour2.elem_size() > 0) {
            // 6 - Approximate contours with linear features (cvApproxPoly)
            CvSeq points = cvApproxPoly(contour2, Loader.sizeof(CvContour.class), storage2, CV_POLY_APPROX_DP, cvContourPerimeter(contour2)*0.005, 0);
            cvDrawContours(gray, points, CvScalar.BLUE, CvScalar.BLUE, -1, 1, CV_AA);
        }
        contour2 = contour2.h_next();
    }
}
So, I want to find the corners, but I don't know how to use corner functions like cvCornerHarris and others.
First, check out /samples/c/squares.c in your OpenCV distribution. This example provides a square detector, and it should be a pretty good start on how to detect corner-like features. Then, take a look at OpenCV's feature-oriented functions like cvCornerHarris() and cvGoodFeaturesToTrack().
The above methods can return many corner-like features - most will not be the "true corners" you are looking for. In my application, I had to detect squares that had been rotated or skewed (due to perspective). My detection pipeline consisted of:
Convert from RGB to grayscale (cvCvtColor)
Smooth (cvSmooth)
Threshold (cvThreshold)
Detect edges (cvCanny)
Find contours (cvFindContours)
Approximate contours with linear features (cvApproxPoly)
Find "rectangles" which were structures that: had polygonalized contours possessing 4 points, were of sufficient area, had adjacent edges were ~90 degrees, had distance between "opposite" vertices was of sufficient size, etc.
Step 7 was necessary because a slightly noisy image can yield many structures that appear rectangular after polygonalization. In my application, I also had to deal with square-like structures that appeared within, or overlapped the desired square. I found the contour's area property and center of gravity to be helpful in discerning the proper rectangle.
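A minimal Python sketch of such a rectangle filter, in the spirit of OpenCV's squares sample; the area and angle thresholds are assumptions, not values from this answer:
import cv2
import numpy as np

def angle_cos(p0, p1, p2):
    # cosine of the angle at p1 formed by the segments p1-p0 and p1-p2
    d1 = (p0 - p1).astype(np.float64)
    d2 = (p2 - p1).astype(np.float64)
    return abs(np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2)))

def looks_like_rectangle(contour, min_area=1000, max_cos=0.3):
    # polygonalize the contour and keep only 4-point polygons of sufficient area
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.02 * peri, True)
    if len(approx) != 4 or cv2.contourArea(approx) < min_area:
        return False
    pts = approx.reshape(4, 2)
    # adjacent edges should be roughly perpendicular (cosine close to 0)
    cosines = [angle_cos(pts[i - 1], pts[i], pts[(i + 1) % 4]) for i in range(4)]
    return max(cosines) < max_cos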
At first glance, to a human eye there are 4 corners. But in computer vision, a corner is considered to be a point that has a large gradient change in intensity across its neighborhood. The neighborhood can be a 4-pixel neighborhood or an 8-pixel neighborhood.
In the equation provided to find the gradient of intensity, a 4-pixel neighborhood has been considered (SEE DOCUMENTATION).
Here is my approach for the image in question. I have the code in python as well:
import os
import cv2
import numpy as np

path = r'C:\Users\selwyn77\Desktop\Stack\corner'
filename = 'env.jpg'
img = cv2.imread(os.path.join(path, filename))
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  #--- convert to grayscale
It is a good choice to always blur the image to remove the weaker gradient changes and preserve the more intense ones. I opted for the bilateral filter which, unlike the Gaussian filter, doesn't blur all the pixels in the neighborhood. Rather, it blurs pixels which have an intensity similar to that of the central pixel. In short, it preserves edges/corners of high gradient change but blurs regions that have minimal gradient changes.
bi = cv2.bilateralFilter(gray, 5, 75, 75)
cv2.imshow('bi',bi)
To a human there is not much of a difference compared to the original image, but it does matter. Now finding possible corners:
dst = cv2.cornerHarris(bi, 2, 3, 0.04)
dst is an array (with the same 2D shape as the image) containing eigenvalues obtained from the final equation mentioned HERE.
Now a threshold has to be applied to select those corners beyond a certain value. I will use the one in the documentation:
#--- create a black image to see where those corners occur ---
mask = np.zeros_like(gray)
#--- applying a threshold and turning those pixels above the threshold to white ---
mask[dst>0.01*dst.max()] = 255
cv2.imshow('mask', mask)
The white pixels are regions of possible corners. You can find many corners neighboring each other.
To draw the selected corners on the image:
img[dst > 0.01 * dst.max()] = [0, 0, 255] #--- [0, 0, 255] --> Red ---
cv2.imshow('dst', img)
(Red colored pixels are the corners, not so visible)
In order to get an array of all pixels with corners:
coordinates = np.argwhere(mask)
UPDATE
The variable coordinates is an array of arrays. Converting it to a list of lists:
coor_list = [l.tolist() for l in list(coordinates)]
Converting the above to a list of tuples:
coor_tuples = [tuple(l) for l in coor_list]
I have an easy and rather naive way to find the 4 corners. I simply calculated the distance of each corner to every other corner. I preserved those corners whose distance exceeded a certain threshold.
Here is the code:
import math

thresh = 50

def distance(pt1, pt2):
    (x1, y1), (x2, y2) = pt1, pt2
    dist = math.sqrt((x2 - x1)**2 + (y2 - y1)**2)
    return dist

coor_tuples_copy = coor_tuples
i = 1
for pt1 in coor_tuples:
    print(' I :', i)
    for pt2 in coor_tuples[i::1]:
        print(pt1, pt2)
        print('Distance :', distance(pt1, pt2))
        if distance(pt1, pt2) < thresh:
            coor_tuples_copy.remove(pt2)
    i += 1
Prior to running the snippet above coor_tuples had all corner points:
[(4, 42),
(4, 43),
(5, 43),
(5, 44),
(6, 44),
(7, 219),
(133, 36),
(133, 37),
(133, 38),
(134, 37),
(135, 224),
(135, 225),
(136, 225),
(136, 226),
(137, 225),
(137, 226),
(137, 227),
(138, 226)]
After running the snippet I was left with 4 corners:
[(4, 42), (7, 219), (133, 36), (135, 224)]
UPDATE 2
Now all you have to do is just mark these 4 points on a copy of the original image.
img2 = img.copy()
for pt in coor_tuples:
    cv2.circle(img2, tuple(reversed(pt)), 3, (0, 0, 255), -1)
cv2.imshow('Image with 4 corners', img2)
Here's an implementation using cv2.goodFeaturesToTrack() to detect corners. The approach is
Convert image to grayscale
Perform canny edge detection
Detect corners
Optionally perform 4-point perspective transform to get top-down view of image
Using this starting image,
After converting to grayscale, we perform canny edge detection
Now that we have a decent binary image, we can use cv2.goodFeaturesToTrack()
corners = cv2.goodFeaturesToTrack(canny, 4, 0.5, 50)
For the parameters, we give it the canny image, set the maximum number of corners to 4 (maxCorners), use a minimum accepted quality of 0.5 (qualityLevel), and set the minimum possible Euclidean distance between the returned corners to 50 (minDistance). Here's the result
Now that we have identified the corners, we can perform a 4-point perspective transform to obtain a top-down view of the object. We first order the points clockwise then draw the result onto a mask.
Note: We could have just found contours on the Canny image instead of doing this step to create the mask, but pretend we only had the 4 corner points to work with
Next we find contours on this mask and filter using cv2.arcLength() and cv2.approxPolyDP(). The idea is that if the contour has 4 points, then it must be our object. Once we have this contour, we perform a perspective transform
Finally we rotate the image depending on the desired orientation. Here's the result
Code for only detecting corners
import cv2

image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
canny = cv2.Canny(gray, 120, 255, 1)

corners = cv2.goodFeaturesToTrack(canny, 4, 0.5, 50)

for corner in corners:
    x, y = corner.ravel()
    cv2.circle(image, (int(x), int(y)), 5, (36, 255, 12), -1)

cv2.imshow('canny', canny)
cv2.imshow('image', image)
cv2.waitKey()
Code for detecting corners and performing perspective transform
import cv2
import numpy as np

def rotate_image(image, angle):
    # Grab the dimensions of the image and then determine the center
    (h, w) = image.shape[:2]
    (cX, cY) = (w / 2, h / 2)

    # grab the rotation matrix (applying the negative of the
    # angle to rotate clockwise), then grab the sine and cosine
    # (i.e., the rotation components of the matrix)
    M = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])

    # Compute the new bounding dimensions of the image
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))

    # Adjust the rotation matrix to take into account translation
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY

    # Perform the actual rotation and return the image
    return cv2.warpAffine(image, M, (nW, nH))

def order_points_clockwise(pts):
    # sort the points based on their x-coordinates
    xSorted = pts[np.argsort(pts[:, 0]), :]

    # grab the left-most and right-most points from the sorted
    # x-coordinate points
    leftMost = xSorted[:2, :]
    rightMost = xSorted[2:, :]

    # now, sort the left-most coordinates according to their
    # y-coordinates so we can grab the top-left and bottom-left
    # points, respectively
    leftMost = leftMost[np.argsort(leftMost[:, 1]), :]
    (tl, bl) = leftMost

    # now, sort the right-most coordinates according to their
    # y-coordinates so we can grab the top-right and bottom-right
    # points, respectively
    rightMost = rightMost[np.argsort(rightMost[:, 1]), :]
    (tr, br) = rightMost

    # return the coordinates in top-left, top-right,
    # bottom-right, and bottom-left order
    return np.array([tl, tr, br, bl], dtype="int32")

def perspective_transform(image, corners):
    def order_corner_points(corners):
        # Separate corners into individual points
        # Index 0 - top-right
        #       1 - top-left
        #       2 - bottom-left
        #       3 - bottom-right
        corners = [(corner[0][0], corner[0][1]) for corner in corners]
        top_r, top_l, bottom_l, bottom_r = corners[0], corners[1], corners[2], corners[3]
        return (top_l, top_r, bottom_r, bottom_l)

    # Order points in clockwise order
    ordered_corners = order_corner_points(corners)
    top_l, top_r, bottom_r, bottom_l = ordered_corners

    # Determine width of new image which is the max distance between
    # (bottom right and bottom left) or (top right and top left) x-coordinates
    width_A = np.sqrt(((bottom_r[0] - bottom_l[0]) ** 2) + ((bottom_r[1] - bottom_l[1]) ** 2))
    width_B = np.sqrt(((top_r[0] - top_l[0]) ** 2) + ((top_r[1] - top_l[1]) ** 2))
    width = max(int(width_A), int(width_B))

    # Determine height of new image which is the max distance between
    # (top right and bottom right) or (top left and bottom left) y-coordinates
    height_A = np.sqrt(((top_r[0] - bottom_r[0]) ** 2) + ((top_r[1] - bottom_r[1]) ** 2))
    height_B = np.sqrt(((top_l[0] - bottom_l[0]) ** 2) + ((top_l[1] - bottom_l[1]) ** 2))
    height = max(int(height_A), int(height_B))

    # Construct new points to obtain top-down view of image in
    # top_r, top_l, bottom_l, bottom_r order
    dimensions = np.array([[0, 0], [width - 1, 0], [width - 1, height - 1],
                           [0, height - 1]], dtype="float32")

    # Convert to Numpy format
    ordered_corners = np.array(ordered_corners, dtype="float32")

    # Find perspective transform matrix
    matrix = cv2.getPerspectiveTransform(ordered_corners, dimensions)

    # Return the transformed image
    return cv2.warpPerspective(image, matrix, (width, height))

image = cv2.imread('1.png')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
canny = cv2.Canny(gray, 120, 255, 1)

corners = cv2.goodFeaturesToTrack(canny, 4, 0.5, 50)
c_list = []
for corner in corners:
    x, y = corner.ravel()
    c_list.append([int(x), int(y)])
    cv2.circle(image, (int(x), int(y)), 5, (36, 255, 12), -1)

corner_points = np.array([c_list[0], c_list[1], c_list[2], c_list[3]])
ordered_corner_points = order_points_clockwise(corner_points)
mask = np.zeros(image.shape, dtype=np.uint8)
cv2.fillPoly(mask, [ordered_corner_points], (255, 255, 255))
mask = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)

cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

for c in cnts:
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.015 * peri, True)
    if len(approx) == 4:
        transformed = perspective_transform(original, approx)
        result = rotate_image(transformed, -90)

cv2.imshow('canny', canny)
cv2.imshow('image', image)
cv2.imshow('mask', mask)
cv2.imshow('transformed', transformed)
cv2.imshow('result', result)
cv2.waitKey()
Find contours with the RETR_EXTERNAL option (gray -> Gaussian filter -> Canny edge -> find contours).
Find the largest contour -> this will be the edge of the rectangle.
Find the corners with a little calculation.
Mat m; // image file
findContours(m, contours_, hierachy_, RETR_EXTERNAL);
auto it = max_element(contours_.begin(), contours_.end(),
    [](const vector<Point> &a, const vector<Point> &b) {
        return a.size() < b.size(); });

Point2f xy[4] = {{9000, 9000}, {0, 1000}, {1000, 0}, {0, 0}};
for (auto &[x, y] : *it) {
    if (x + y < xy[0].x + xy[0].y) xy[0] = {x, y};
    if (x - y > xy[1].x - xy[1].y) xy[1] = {x, y};
    if (y - x > xy[2].y - xy[2].x) xy[2] = {x, y};
    if (x + y > xy[3].x + xy[3].y) xy[3] = {x, y};
}
The array xy now holds the four corners.
I was able to extract four corners this way.
Apply HoughLines to the Canny image - you will get a list of points.
Apply convex hull to this set of points.
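A minimal sketch of that idea; cv2.HoughLinesP is used here so the detected segments directly yield endpoints for cv2.convexHull, and all parameter values and the file name are assumptions:
import cv2
import numpy as np

img = cv2.imread('1.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
canny = cv2.Canny(gray, 120, 255)

# probabilistic Hough transform returns line segments as (x1, y1, x2, y2)
segments = cv2.HoughLinesP(canny, 1, np.pi / 180, threshold=60,
                           minLineLength=30, maxLineGap=10)

# collect both endpoints of every detected segment
points = np.array([[x, y] for seg in segments
                   for (x, y) in (seg[0][:2], seg[0][2:])], dtype=np.int32)

# the convex hull of the endpoints approximates the outer quadrilateral
hull = cv2.convexHull(points)
cv2.polylines(img, [hull], True, (0, 255, 0), 2)
cv2.imshow('hull', img)
cv2.waitKey()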