Related
I can read text from an image using OCR. However, it works line by line.
I want to now group text based on solid lines surrounding the text.
For example, consider I have below rectangle banners. I can read text line by line. Fine! Now I want to group them by Board A,B,C and hold them in some data structure, so that I can identify, which lines belong to which board. It is given that images would be diagrams like this with solid lines around each block of text.
Please guide me on the right approach.
As mentioned in the comments by Yunus, you need to crop sub-images and feed them to an OCR module individually. An additional step could be ordering of the contours.
Approach:
Obtain binary image and invert it
Find contours
Crop sub-images based on the bounding rectangle for each contour
Feed each sub-image to OCR module (I used easyocr for demonstration)
Store text for each board in a dictionary
Code:
# Libraries import
import cv2
from easyocr import Reader
reader = Reader(['en'])
img = cv2.imread('board_text.jpg',1)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# inverse binary
th = cv2.threshold(gray,127,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)[1]
# find contours and sort them from left to right
contours, hierarchy = cv2.findContours(th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
contours = sorted(contours, key=lambda x: [cv2.boundingRect(x)[0], cv2.boundingRect(x)[1]])
#initialize dictionary
board_dictionary = {}
# iterate each contour and crop bounding box
for i, c in enumerate(cnts):
x,y,w,h = cv2.boundingRect(c)
crop_img = img[y:y+h, x:x+w]
# feed cropped image to easyOCR module
results = reader.readtext(crop_img)
# result is output per line
# create a list to append all lines in cropped image to it
board_text = []
for (bbox, text, prob) in results:
board_text.append(text)
# convert list of words to single string
board_para = ' '.join(board_text)
#print(board_para)
# store string within a dictionary
board_dictionary[str(i)] = board_para
Dictionary Output:
board_dictionary
{'0': 'Board A Board A contains Some Text, That goes Here Some spaces and then text again', '1': 'Board B Board B has some text too but sparse.', '2': 'Board €C Board C is wide and contains text with white spaces '}
Drawing each contour
img2 = img.copy()
for i, c in enumerate(cnts):
x,y,w,h = cv2.boundingRect(c)
img2 = cv2.rectangle(img2, (x, y), (x + w, y + h), (0,255,0), 3)
Note:
While working on different images make sure the ordering is correct.
Choice of OCR module is yours pytesseract and easyocr are the options I know.
This can be done by performing following steps:
Find the shapes.
Compute the shape centers.
Find the text boxes.
Compute the text boxes centers.
Associate the textboxes with shapes based on distance.
The code is as follows:
import cv2
from easyocr import Reader
import math
shape_number = 2
image = cv2.imread("./ueUco.jpg")
deep_copy = image.copy()
image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(image_gray, 150, 255, cv2.THRESH_BINARY)
thresh = 255 - thresh
shapes, hierarchy = cv2.findContours(image=thresh, mode=cv2.RETR_EXTERNAL, method=cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(image=deep_copy, contours=shapes, contourIdx=-1, color=(0, 255, 0), thickness=2, lineType=cv2.LINE_AA)
shape_centers = []
for shape in shapes:
row = int((shape[0][0][0] + shape[3][0][0])/2)
column = int((shape[3][0][1] + shape[2][0][1])/2)
center = (row, column, shape)
shape_centers.append(center)
# cv2.imshow('Shapes', deep_copy)
# cv2.waitKey(0)
# cv2.destroyAllWindows()
languages = ['en']
reader = Reader(languages, gpu = True)
results = reader.readtext(image)
def cleanup_text(text):
return "".join([c if ord(c) < 128 else "" for c in text]).strip()
for (bbox, text, prob) in results:
text = cleanup_text(text)
(tl, tr, br, bl) = bbox
tl = (int(tl[0]), int(tl[1]))
tr = (int(tr[0]), int(tr[1]))
br = (int(br[0]), int(br[1]))
bl = (int(bl[0]), int(bl[1]))
column = int((tl[0] + tr[0])/2)
row = int((tr[1] + br[1])/2)
center = (row, column, bbox)
distances = []
for iteration, shape_center in enumerate(shape_centers):
shape_row = shape_center[0]
shape_column = shape_center[1]
dist = int(math.dist([column, row], [shape_row, shape_column]))
distances.append(dist)
min_value = min(distances)
min_index = distances.index(min_value)
if min_index == shape_number:
cv2.rectangle(image, tl, br, (0, 255, 0), 2)
cv2.putText(image, text, (tl[0], tl[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
cv2.imshow("Image", image)
cv2.waitKey(0)
cv2.imwrite(f"image_{shape_number}.jpg", image)
cv2.destroyAllWindows()
The output looks like this.
Please note that this solution is almost complete. You just have to compute the text embodied in each shape and put it in your desired data structure.
Note: shape_number represents the shape that you want to consider.
There is another solution that I would like you to work on.
Find all the text boxes.
Compute the centers for text boxes.
Run k-means clustering on the centers.
I would prefer the second solution but for the time being, I implemented the first one.
Problem description
We are trying to match a scanned image onto a template image:
Example of a scanned image:
Example of a template image:
The template image contains a collection of hearts varying in size and contour properties (closed, open left and open right). Each heart in the template is a Region of Interest for which we know the location, size, and contour type. Our goal is to match a scanned onto the template so that we can extract these ROIs in the scanned image. In the scanned image, some of these hearts are crossed, and they will be presented to a classifier that decides if they are crossed or not.
Our approach
Following a tutorial on PyImageSearch, we have attempted to use ORB to find matching keypoints (code included below). This should allow us to compute a perspective transform matrix that maps the scanned image on the template image.
We have tried some preprocessing steps such as thresholding and/or blurring the scanned image. We have also tried to increase the maximum number of features as much as possible.
The problem
The method fails to work for our image set. This can be seen in the following image:
It appears that a lot of keypoints are mapped to the wrong part of the template image, so the transform matrix is not calculated correctly.
Is ORB the right technique to use here, or are there parameters of the algorithm that could be fine-tuned to improve performance? It feels like we are missing out on something simple that should make it work, but we really don't know how to go forward with this approach :).
We are trying out an alternative technique where we cross-correlate the scan with individual heart shapes. This should give an image with peaks at the heart locations. By drawing a bounding box around these peaks we hope to map that bounding box on the bounding box of the template (I can elaborat on this upon request)
Any suggestions are greatly appreciated!
import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np
# Preprocessing parameters
THRESHOLD = True
BLUR = False
# ORB parameters
MAX_FEATURES = 4048
KEEP_PERCENT = .01
SHOW_DEBUG = True
# Convert both the input image and template to grayscale
scan_file = r'scan.jpg'
template_file = r'template.jpg'
scan = cv.imread(scan_file)
template = cv.imread(template_file)
scan_gray = cv.cvtColor(scan, cv.COLOR_BGR2GRAY)
template_gray = cv.cvtColor(template, cv.COLOR_BGR2GRAY)
if THRESHOLD:
_, scan_gray = cv.threshold(scan_gray, 127, 255, cv.THRESH_BINARY)
_, template_gray = cv.threshold(template_gray, 127, 255, cv.THRESH_BINARY)
if BLUR:
scan_gray = cv.blur(scan_gray, (5, 5))
template_gray = cv.blur(template_gray, (5, 5))
# Use ORB to detect keypoints and extract (binary) local invariant features
orb = cv.ORB_create(MAX_FEATURES)
(kps_template, desc_template) = orb.detectAndCompute(template_gray, None)
(kps_scan, desc_scan) = orb.detectAndCompute(scan_gray, None)
# Match the features
#method = cv.DESCRIPTOR_MATCHER_BRUTEFORCE_HAMMING
#matcher = cv.DescriptorMatcher_create(method)
#matches = matcher.match(desc_scan, desc_template)
bf = cv.BFMatcher(cv.NORM_HAMMING)
matches = bf.match(desc_scan, desc_template)
# Sort the matches by their distances
matches = sorted(matches, key = lambda x : x.distance)
# Keep only the top matches
keep = int(len(matches) * KEEP_PERCENT)
matches = matches[:keep]
if SHOW_DEBUG:
matched_visualization = cv.drawMatches(scan, kps_scan, template, kps_template, matches, None)
plt.imshow(matched_visualization)
Based on the clarifications provided by #it_guy, I have attempted to find all the crossed hearts using just the scanned image. I would have to try the algorithm on more images to check whether this approach will generalize or not.
Binarize the scanned image.
gray_image = cv2.cvtColor(rgb_image, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray_image, 180, 255, cv2.THRESH_BINARY_INV)
Perform dilation to close small gaps in the outline of the hearts, and the curves representing crosses. Note - The structuring element np.ones((1,2), np.uint8 can be changed by running the algorithm through multiple images and finding the most suitable structuring element.
closing_original = cv2.morphologyEx(original_binary, cv2.MORPH_DILATE, np.ones((1,2), np.uint8)).
Find all the contours in the image. The contours include all hearts and the triangle at the bottom. We eliminate other contours like dots by placing constraints on the height and width of contours to filter them. Further, we also use contour hierachies to eliminate inner contours in cross hearts.
contours_original, hierarchy_original = cv2.findContours(closing_original, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
We iterate through each of the filtered contours.
Contour with normal heart -
Contour with crossed heart -
Let us observe the difference between these two types of hearts. If we look at the transition from white-to-black pixel and black-to-white pixel ( from top to bottom ) inside the normal heart, we see that for majority of the image columns the number of such transitions are 4. ( Top border - 2 transitions, bottom border - 2 transitions )
white-to-black pixel - (255, 255, 0, 0, 0)
black-to-white pixel - (0, 0, 255, 255, 255)
But, in the case of the crossed heart, the number of transitions for majority of the columns must be 6, since the crossed curve / line adds two more transitions inside the heart (black-to-white first, then white-to-black). Hence, among all image columns which have greater than or equal to 4 such transitions, if more than 40% of the columns have 6 transitions, then the given contour represents a crossed contour. Result -
Code -
import cv2
import numpy as np
def convert_to_binary(rgb_image):
gray_image = cv2.cvtColor(rgb_image, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray_image, 180, 255, cv2.THRESH_BINARY_INV)
return gray_image, thresh
original = cv2.imread('original.jpg')
height, width = original.shape[:2]
original_gray, original_binary = convert_to_binary(original) # Get binary image
cv2.imwrite("binary.jpg", original_binary)
closing_original = cv2.morphologyEx(original_binary, cv2.MORPH_DILATE, np.ones((1,2), np.uint8)) # Close small gaps in the binary image
cv2.imwrite("closed.jpg", closing_original)
contours_original, hierarchy_original = cv2.findContours(closing_original, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE) # Get all the contours
bounding_rects_original = [cv2.boundingRect(c) for c in contours_original] # Get all contour bounding boxes
orig_boxes = list()
all_contour_image = original.copy()
for i, (x, y, w, h) in enumerate(bounding_rects_original):
if h > height / 2 or w > width / 2: # Eliminate extremely large contours
continue
if h < w / 2 or w < h / 2: # Eliminate vertical / horuzontal lines
continue
if w * h < 200: # Eliminate small area contours
continue
if hierarchy_original[0][i][3] != -1: # Eliminate contours created by heart crosses
continue
orig_boxes.append((x, y, w, h))
cv2.rectangle(all_contour_image, (x,y), (x + w, y + h), (0, 255, 0), 3)
# cv2.imshow("warped", closing_original)
cv2.imwrite("all_contours.jpg", all_contour_image)
final_image = original.copy()
for x, y, w, h in orig_boxes:
cropped_image = closing_original[y - 2 :y + h + 2, x: x + w] # Get the heart binary image
col_pixel_diffs = np.abs(np.diff(cropped_image.T.astype(np.int16))/255) # Obtain all consecutive pixel differences in all the columns
column_sums = np.sum(col_pixel_diffs, axis=1) # Get the sum of each column's transitions. This results in an array of size equal
# to the number of columns, each element representing the number of black-white and white-black transitions.
percent_crosses = np.sum(column_sums >= 6)/ np.sum(column_sums >= 4) # Percentage of columns with 6 transitions among columns with 4 transitions
if percent_crosses > 0.4: # Crossed heart criterion
cv2.rectangle(final_image, (x,y), (x + w, y + h), (0, 255, 0), 3)
cv2.imwrite("crossed_heart.jpg", cropped_image)
else:
cv2.imwrite("normal_heart.jpg", cropped_image)
cv2.imwrite("all_crossed_hearts.jpg", final_image)
This approach can be tested on more images to find its accuracy.
I have a picture like this:
And then I transform it into binary image and use canny to detect edge of the picture:
gray = cv.cvtColor(image, cv.COLOR_RGB2GRAY)
edge = Image.fromarray(edges)
And then I get the result as:
I want to get the area of 2 like this:
My solution is to use HoughLines to find lines in the picture and calculate the area of triangle formed by lines. However, this way is not precise because the closed area is not a standard triangle. How to get the area of region 2?
A simple approach using floodFill and countNonZero could be the following code snippet. My standard quote on contourArea from the help:
The function computes a contour area. Similarly to moments, the area is computed using the Green formula. Thus, the returned area and the number of non-zero pixels, if you draw the contour using drawContours or fillPoly, can be different. Also, the function will most certainly give a wrong results for contours with self-intersections.
Code:
import cv2
import numpy as np
# Input image
img = cv2.imread('images/YMMEE.jpg', cv2.IMREAD_GRAYSCALE)
# Needed due to JPG artifacts
_, temp = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)
# Dilate to better detect contours
temp = cv2.dilate(temp, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))
# Find largest contour
cnts, _ = cv2.findContours(temp, cv2.RETR_EXTERNAL , cv2.CHAIN_APPROX_NONE)
largestCnt = []
for cnt in cnts:
if (len(cnt) > len(largestCnt)):
largestCnt = cnt
# Determine center of area of largest contour
M = cv2.moments(largestCnt)
x = int(M["m10"] / M["m00"])
y = int(M["m01"] / M["m00"])
# Initiale mask for flood filling
width, height = temp.shape
mask = img2 = np.ones((width + 2, height + 2), np.uint8) * 255
mask[1:width, 1:height] = 0
# Generate intermediate image, draw largest contour, flood filled
temp = np.zeros(temp.shape, np.uint8)
temp = cv2.drawContours(temp, largestCnt, -1, 255, cv2.FILLED)
_, temp, mask, _ = cv2.floodFill(temp, mask, (x, y), 255)
temp = cv2.morphologyEx(temp, cv2.MORPH_OPEN, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))
# Count pixels in desired region
area = cv2.countNonZero(temp)
# Put result on original image
img = cv2.putText(img, str(area), (x, y), cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, 255)
cv2.imshow('Input', img)
cv2.imshow('Temp image', temp)
cv2.waitKey(0)
Temporary image:
Result image:
Caveat: findContours has some problems one the right side, where the line is very close to the bottom image border, resulting in possibly omitting some pixels.
Disclaimer: I'm new to Python in general, and specially to the Python API of OpenCV (C++ for the win). Comments, improvements, highlighting Python no-gos are highly welcome!
There is a very simple way to find this area, if you take some assumptions that are met in the example image:
The area to be found is bounded on top by a line
Any additional lines in the image are above the line of interest
There are no discontinuities in the line
In this case, the area of the region of interest is given by the sum of the lengths from the bottom of the image to the first set pixel. We can compute this with:
import numpy as np
import matplotlib.pyplot as pp
img = pp.imread('/home/cris/tmp/YMMEE.jpg')
img = np.flip(img, axis=0)
pos = np.argmax(img, axis=0)
area = np.sum(pos)
print('Area = %d\n'%area)
This prints Area = 22040.
np.argmax finds the first set pixel on each column of the image, returning the index. By first using np.flip, we flip this axis so that the first pixel is actually the one on the bottom. The index corresponds to the number of pixels between the bottom of the image and the line (not including the set pixel).
Thus, we're computing the area under the line. If you need to include the line itself in the area, add pos.shape[0] to the area (i.e. the number of columns).
I would like to detect these points in this graph , also I wanted to detect the lines ,
I searched for edge detection and corner detection (such as Harris Corner Detector ), but I don't know how to handle such graph , I only need to know a sudo algorithm , or steps of going through such problem
Detect the vertices - segment by color( r>>max(g,b) ) and then apply the median or minimum filter of the appropriate size, or simply binary erode few times. Then just label the remaining connected blobs.
Detect the lines - use the simplified Hough Transform. Basicaly, draw a virtual line from the center of each vertix to all others and count the red pixels along the line. If there are plenty of them - the line exists, otherwise the two vertices are not connected.
Something like that:
import numpy as np
from scipy.misc import imshow, imsave, imread
from scipy.ndimage import filters, morphology, measurements
from skimage.draw import line
img = imread("laGK6.jpg")
r = img[:,:, 0]
g = img[:,:, 1]
b = img[:,:, 2]
mask = (r.astype(np.float)-np.maximum(g,b) ) > 20
mask2 = morphology.binary_erosion(mask)
mask2 = morphology.binary_erosion(mask2)
mask2 = morphology.binary_erosion(mask2)
mask2 = morphology.binary_erosion(mask2)
mask2 = morphology.binary_dilation(mask2)
label, numfeatures = measurements.label(mask2)
mc = measurements.center_of_mass(mask2, label, range(1,numfeatures+1) )
mask3 = np.zeros_like(mask2)
for p in mc:
mask3[p[0], p[1]]=255
arr = range(numfeatures)
connections=[]
for i in range( numfeatures):
arr.remove(i)
for j in arr:
rr,cc = line(mc[i][0], mc[i][1], mc[j][0], mc[j][1])
mask3[rr,cc]=255
ms = np.sum(mask[rr,cc]).astype(np.float)/len(rr)
if ms > 0.9:
connections.append((i,j))
print "vertices: ", mc
print "connections: ", connections
This outputs the following:
vertices: [(76.551724137931032, 288.72413793103448),
(76.568181818181813, 613.61363636363637), (138.72727272727272,
126.04545454545455), (139.33333333333334, 450.33333333333331), (265.18181818181819, 207.5151515151515), (264.96666666666664,
369.53333333333336), (265.41379310344826, 694.51724137931035), (265.51724137931035, 45.379310344827587), (327.57692307692309,
532.42307692307691)]
connections: [(0, 4), (0, 5), (1, 6), (1, 8), (2, 4), (2, 7), (3, 5), (3, 8)]
I am also working on a project that detects shapes in a drawing. I am not sure if it will solve your problem as well but here is what I have done for such problems.
I am assuming that you need the value of X and Y coordinates of those edge points
first thing you need is the X and Y values of the complete shape
next inside a loop put an if condition saying "get this point if
Y[i]<Y[i+1] and Y[i]<Y[i-1]". point whose next and previous points have Y greater than the value of current Y.
this condition will give you the X and Y values of the edge points.
Good Luck
If the graph is always the same color, and the vertices are always marked with squares, you can threshold the image by its color to detect lines and vertices. Then look for connected sets of pixels whose width and height are exactly the ones you can just measure.
I want to do something similar to the levels function in Photoshop, but can't find the right openCV functions.
Basically I want to stretch the greys in an image to go from almost white to practically black instead of from almost white to slightly greyer, while leaving white as white and black as black (I am using greyscale images).
The following python code fully implements Photoshop Adjustments -> Levels dialog.
Change the values for each channel to the desired ones.
img is input rgb image of np.uint8 type.
inBlack = np.array([0, 0, 0], dtype=np.float32)
inWhite = np.array([255, 255, 255], dtype=np.float32)
inGamma = np.array([1.0, 1.0, 1.0], dtype=np.float32)
outBlack = np.array([0, 0, 0], dtype=np.float32)
outWhite = np.array([255, 255, 255], dtype=np.float32)
img = np.clip( (img - inBlack) / (inWhite - inBlack), 0, 255 )
img = ( img ** (1/inGamma) ) * (outWhite - outBlack) + outBlack
img = np.clip( img, 0, 255).astype(np.uint8)
I think this is a function mapping input levels to output levels as shown below in the figure.
For example, the orange curve is a straight line from (a, c) to (b, d), blue curve is a straight line from (a, d) to (b, c) and green curve is a non-linear function from (a,c) to (b, d).
We can define the blue curve as (x - a)/(y - d) = (a - b)/(d - c).
Limiting values of a, b, c and d depend on the range supported by the channel that you are applying this transformation to. For gray scale this is [0, 255].
For example, if you want a transformation like (a, d) = (10, 200), (b, c) = (250, 50) for a gray scale image,
y = -150*(x-10)/240 + 200 for x [10, 250]
y = x for [0, 10) and (250, 255] if you want remaining values unchanged.
You can use a lookup table in OpenCV (LUT function) to calculate the output levels and apply this transformation to your image or the specific channel. You can apply any piecewise transformation this way.
I don't know what are the "Photoshop levels". But from description, I think you should try the following:
Convert your image to YUV using cvtColor. Y will represent the intensity plane. (You can also use Lab, Luv, or any similar colorspace with separate intensity component).
Split the planes using split, so that the intensity plane will be a separate image.
Call equalizeHist on the intensity plane
Merge the planes back together using merge
Details on histogram equalization can be found here
Also note, that there's an implementation of somewhat improved histogram equalization method - CLAHE (but I can't find a better link than this, also #berak suggested a good link on the topic)