Adjusting pytesseract parameters - opencv

Note: I am migrating this question from Data Science Stack Exchange, where it received little exposure.
I am trying to implement an OCR solution to identify the numbers read from the picture of a screen.
I am adapting this pyimagesearch tutorial to my problem.
Because I am dealing with a dark background, I first invert the image, before converting it to grayscale and thresholding it:
inverted_cropped_image = cv2.bitwise_not(cropped_image)
gray = get_grayscale(inverted_cropped_image)
thresholded_image = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)[1]
Then I call pytesseract's image_to_data function to output a dictionary containing the different text regions and their confidence intervals:
from pytesseract import Output
results = pytesseract.image_to_data(thresholded_image, output_type=Output.DICT)
Finally I iterate over results and plot them when their confidence exceeds a user defined threshold (70%). What bothers me, is that my script identifies everything in the image except the number that I would like to recognize (1227.938).
My first guess is that the image_to_data parameters are not set properly.
Checking this website, I selected a page segmentation mode (psm) of 11 (sparse text) and tried whitelisting numbers only (tessedit_char_whitelist=0123456789m.'):
results = pytesseract.image_to_data(thresholded_image, config='--psm 11 --oem 3 -c tessedit_char_whitelist=0123456789m.', output_type=Output.DICT)
Alas, this is even worse, and the script now identifies nothing at all!
Do you have any suggestion? Am I missing something obvious here?
EDIT #1:
At Ann Zen's request, here's the code used to obtain the first image:
import imutils
import cv2
import matplotlib.pyplot as plt
import numpy as np
import pytesseract
from pytesseract import Output
def get_grayscale(image):
return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
filename = "IMAGE.JPG"
cropped_image = cv2.imread(filename)
inverted_cropped_image = cv2.bitwise_not(cropped_image)
gray = get_grayscale(inverted_cropped_image)
thresholded_image = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)[1]
results = pytesseract.image_to_data(thresholded_image, config='--psm 11 --oem 3 -c tessedit_char_whitelist=0123456789m.', output_type=Output.DICT)
color = (255, 255, 255)
for i in range(0, len(results["text"])):
x = results["left"][i]
y = results["top"][i]
w = results["width"][i]
h = results["height"][i]
text = results["text"][i]
conf = int(results["conf"][i])
print("Confidence: {}".format(conf))
if conf > 70:
print("Confidence: {}".format(conf))
print("Text: {}".format(text))
print("")
text = "".join([c if ord(c) < 128 else "" for c in text]).strip()
cv2.rectangle(cropped_image, (x, y), (x + w, y + h), color, 2)
cv2.putText(cropped_image, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX,1.2, color, 3)
cv2.imshow('Image', cropped_image)
cv2.waitKey(0)
EDIT #2:
Rarely have I spent reputation points so well! All three replies posted so far helped me refine my algorithm.
First, I wrote a Tkinter program allowing me to manually crop the image around the number of interest (modifying the one found in this SO post)
Then I used Ann Zen's idea of narrowing down the search area around the fractional part. I am using her nifty process function to prepare my grayscale image for contour extraction: contours, _ = cv2.findContours(process(img_gray), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE). I am using RETR_EXTERNAL to avoid dealing with overlapping bounding rectangles.
I then sorted my contours from left to right. Bounding rectangles exceeding a user-defined threshold are associated with the integral part (white rectangles); otherwise they are associated with the fractional part (black rectangles).
I then extracted the characters using Esraa's approach i.e. applying a Gaussian blur prior to calling Tesseract. I used a much larger kernel (15x15 vs 3x3) to achieve this.
I am not out of the woods yet, but hopefully I will get better results by using Ahx's adaptive thresholding.

The Concept
As you have probably heard, pytesseract is not good at detecting text of different sizes on the same line as one piece of text. In your case, you want to detect the 1227.938, where the 1227 is much larger than the .938.
One way to go about solving this is to have the program estimate where the .938 is, and enlarge that part of the image. After that, pytesseract will have no problem in returning the text.
The Code
import cv2
import numpy as np
import pytesseract
def process(img):
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(img_gray, 200, 255, cv2.THRESH_BINARY)
img_canny = cv2.Canny(thresh, 100, 100)
kernel = np.ones((3, 3))
img_dilate = cv2.dilate(img_canny, kernel, iterations=2)
return cv2.erode(img_dilate, kernel, iterations=2)
img = cv2.imread("image.png")
img_copy = img.copy()
hh = 50
contours, _ = cv2.findContours(process(img), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
for cnt in contours:
if 20 * hh < cv2.contourArea(cnt) < 30 * hh:
x, y, w, h = cv2.boundingRect(cnt)
ww = int(hh / h * w)
src_seg = img[y: y + h, x: x + w]
dst_seg = img_copy[y: y + hh, x: x + ww]
h_seg, w_seg = dst_seg.shape[:2]
dst_seg[:] = cv2.resize(src_seg, (ww, hh))[:h_seg, :w_seg]
gray = cv2.cvtColor(img_copy, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY)
results = pytesseract.image_to_data(thresh)
for b in map(str.split, results.splitlines()[1:]):
if len(b) == 12:
x, y, w, h = map(int, b[6: 10])
cv2.putText(img, b[11], (x, y + h + 15), cv2.FONT_HERSHEY_COMPLEX, 0.6, 0)
cv2.imshow("Result", img)
cv2.waitKey(0)
The Output
Here is the input image:
And here is the output image:
As you have said in your post, the only part you need the the decimal 1227.938. If you want to filter out the rest of the detected text, you can try tweaking some parameters. For example, replacing the 180 from _, thresh = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY) with 230 will result in the output image:
The Explanation
Import the necessary libraries:
import cv2
import numpy as np
import pytesseract
Define a function, process(), that will take in an image array, and return a binary image array that is the processed version of the image that will allow proper contour detection:
def process(img):
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(img_gray, 200, 255, cv2.THRESH_BINARY)
img_canny = cv2.Canny(thresh, 100, 100)
kernel = np.ones((3, 3))
img_dilate = cv2.dilate(img_canny, kernel, iterations=2)
return cv2.erode(img_dilate, kernel, iterations=2)
I'm sure that you don't have to do this, but due to a problem in my environment, I have to add pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' before I can call the pytesseract.image_to_data() method, or it throws an error:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
Read in the original image, make a copy of it, and define the rough height of the large part of the decimal:
img = cv2.imread("image.png")
img_copy = img.copy()
hh = 50
Detect the contours of the processed version of the image, and add a filter that roughly filters out the contours so that the small text remains:
contours, _ = cv2.findContours(process(img), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
for cnt in contours:
if 20 * hh < cv2.contourArea(cnt) < 30 * hh:
Define the bounding box of each contour that didn't get filtered out, and use the properties to enlarge those parts of the image to the height defined for the large text (making sure to also scale the width accordingly):
x, y, w, h = cv2.boundingRect(cnt)
ww = int(hh / h * w)
src_seg = img[y: y + h, x: x + w]
dst_seg = img_copy[y: y + hh, x: x + ww]
h_seg, w_seg = dst_seg.shape[:2]
dst_seg[:] = cv2.resize(src_seg, (ww, hh))[:h_seg, :w_seg]
Finally, we can use the pytesseract.image_to_data() method to detect the text. Of course, we'll need to threshold the image again:
gray = cv2.cvtColor(img_copy, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY)
results = pytesseract.image_to_data(thresh)
for b in map(str.split, results.splitlines()[1:]):
if len(b) == 12:
x, y, w, h = map(int, b[6: 10])
cv2.putText(img, b[11], (x, y + h + 15), cv2.FONT_HERSHEY_COMPLEX, 0.6, 0)
cv2.imshow("Result", img)
cv2.waitKey(0)

I have been working with Tesseract for quite some time, so let me clarify something for you. Tesseract is extremely helpful if you're trying to recognize text in documents more than any other computer vision projects. It usually needs a binarized image to get a good output. Therefore, you will always need some image pre-processing.
However, after several trials in the past with all page segmentation modes, I realized that it fails when font size differs on the same line without having a space. Sometimes PSM 6 is helpful if the difference is low, but in your condition, you may try an alternative. If you don't care about the decimals, you may try the following solution:
img = cv2.imread(r'E:\Downloads\Iwzrg.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_blur = cv2.GaussianBlur(gray, (3,3),0)
_,thresh = cv2.threshold(img_blur,200,255,cv2.THRESH_BINARY_INV)
# If using a fixed camera
new_img = thresh[0:100, 80:320]
text = pytesseract.image_to_string(new_img, lang='eng', config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789')
OUTPUT: 1227

I would like to recommend applying another image processing method.
Because I am dealing with a dark background, I first invert the image, before converting it to grayscale and thresholding it:
You applied global thresholding and couldn't achieve the desired result.
Then you can apply either adaptive-thresholding or inRange
For the given image, if we apply the inRange threshold:
To be able to recognize the image as accurately as possible we can add a border to the top of the image and resize the image (Optional)
In the OCR section, check if the detected region contains a digit
if text.isdigit():
Then display on the image:
The result is nearly the desired value. Now you can try with the other suggested methods to find the exact value.
The problem is .938 recognized as 235, maybe resizing using different values might improve the result.
Code:
from cv2 import imread, cvtColor, COLOR_BGR2HSV as HSV, inRange, getStructuringElement, resize
from cv2 import imshow, waitKey, MORPH_RECT, dilate, bitwise_and, rectangle, putText
from cv2 import copyMakeBorder as addBorder, BORDER_CONSTANT as CONSTANT, FONT_HERSHEY_SIMPLEX
from numpy import array
from pytesseract import image_to_data, Output
bgr = imread("Iwzrg.png")
resized = resize(bgr, (800, 600), fx=0.75, fy=0.75)
bordered = addBorder(resized, 200, 0, 0, 0, CONSTANT, value=0)
hsv = cvtColor(bordered, HSV)
mask = inRange(hsv, array([0, 0, 250]), array([179, 255, 255]))
kernel = getStructuringElement(MORPH_RECT, (50, 30))
dilated = dilate(mask, kernel, iterations=1)
thresh = 255 - bitwise_and(dilated, mask)
data = image_to_data(thresh, output_type=Output.DICT)
for i in range(0, len(data["text"])):
x = data["left"][i]
y = data["top"][i]
w = data["width"][i]
h = data["height"][i]
text = data["text"][i]
if text.isdigit():
print("Text: {}".format(text))
print("")
text = "".join([c if ord(c) < 128 else "" for c in text]).strip()
rectangle(thresh, (x, y), (x + w, y + h), (0, 255, 0), 2)
putText(thresh, text, (x, y - 10), FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
imshow("", thresh)
waitKey(0)

Related

How to increase word-spacing threshold in Tesseract OCR?

I am trying to detect 'words' (code-names) in an image file using pytesseract.
As you can see below, the complete word is not detected and is broken due to spacing. The actual images I want to annotate have many such words and each one has similar kind of spacing issue. I have tried changing the psm and provided custom parameters as suggested in some existing posts, but no luck.
How can I increase the word-spacing threshold in tesseract so that it ignores such spaces?
Here is the code I've written:
import cv2 as cv
import pytesseract
from pytesseract import Output
import re
custom_config = r'--psm 3 textord_space_size_is_variable=1 tosp_min_sane_kn_sp=3'
img = cv.imread('text-detection.jpg')
img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
a = pytesseract.image_to_data(img_gray, output_type=Output.DICT, config=custom_config)
pattern = '69.'
a_boxes = len(a['text'])
for i in range(a_boxes):
if re.match(pattern, a['text'][i]):
(x, y, w, h) = (a['left'][i], a['top'][i], a['width'][i], a['height'][i])
img = cv.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv.imwrite('result_image.png', img)
Input image:
Output:
Expected output is the word 4"-P6910509-151440Y is marked.

issue of the recognize people by their clothes color with not severe illumination environments

I am interested in the human following using a real robot.
I'd like to use the color of clothes as a key feature to identify the target person in front of the robot to follow him/ her but I am suffering due to it is a weak feature with a very simple illumination changing. So, I need to alter this algorithm to another or update values (RGB) online in real-time but I don't have enough experience with image processing.
this is my full code for color detection:
import cv2
import numpy as np
from imutils.video import FPS
# capturing video through webcam
import time
cap = cv2.VideoCapture(0)
width = cap.get(3) # float
height = cap.get(4) # float
print width, height
time.sleep(2.0)
fps = FPS().start()
while (1):
_, img = cap.read()
if _ is True:
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
else:
continue
# blue color
blue_lower = np.array([99, 115, 150], np.uint8)
blue_upper = np.array([110, 255, 255], np.uint8)
blue = cv2.inRange(hsv, blue_lower, blue_upper)
kernal = np.ones((5, 5), "uint8")
blue = cv2.dilate(blue, kernal)
res_blue = cv2.bitwise_and(img, img, mask=blue)
# Tracking blue
(_, contours, hierarchy) = cv2.findContours(blue, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
for pic, contour in enumerate(contours):
area = cv2.contourArea(contour)
if (area > 300):
x, y, w, h = cv2.boundingRect(contour)
img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
cv2.putText(img, "Blue Colour", (x, y), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0))
cv2.imshow("Color Tracking", img)
if cv2.waitKey(10) & 0xFF == ord('q'):
cap.release()
cv2.destroyAllWindows()
break
fps.update()
# stop the timer and display FPS information
fps.stop()
# print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
# print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
these are the outputs:
1- recognize a person by his clothes color
2- it is lost, the illumination changing is a very simple not severe
Any ideas or suggestions will be appreciated
It does look like you need to use a bit more advanced color similarity function to handle complex cases. Delta E will be the right starting point.
Proper threshold or several colors with associated thresholds will help to achieve pretty accurate results:
See the list of colours on the right side
Complete example.

Moroccan License Plate Recognition (LPR) using OpenCV and Tesseract

I'm working on a project about recognizing moroccan license plates which look like this image :
Moroccan License Plate
Please how can I use OpenCV to cut the license plate out and Tesseract to read the numbers and arabic letter in the middle.
I have looked into this research paper : https://www.researchgate.net/publication/323808469_Moroccan_License_Plate_recognition_using_a_hybrid_method_and_license_plate_features
I have installed OpenCV and Tesseract for python in Windows 10. When I run the tesseract on the text only part of the license plate using "fra" language I get 7714315l Bv. How can I separate the data?
Edit:
The arabic letters we use in Morocco are :
أ ب ت ج ح د هـ
The expected result is : 77143 د 6
The vertical lines are irrelevant, I have to use them to separate the image and read data separately.
Thanks in advance!
You can use HoughTransform since the two vertical lines are irrelevant, to crop the image:
import numpy as np
import cv2
image = cv2.imread("lines.jpg")
grayImage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
dst = cv2.Canny(grayImage, 0, 150)
cv2.imwrite("canny.jpg", dst)
lines = cv2.HoughLinesP(dst, 1, np.pi / 180, 50, None, 60, 20)
lines_x = []
# Get height and width to constrain detected lines
height, width, channels = image.shape
for i in range(0, len(lines)):
l = lines[i][0]
# Check if the lines are vertical or not
angle = np.arctan2(l[3] - l[1], l[2] - l[0]) * 180.0 / np.pi
if (l[2] > width / 4) and (l[0] > width / 4) and (70 < angle < 100):
lines_x.append(l[2])
# To draw the detected lines
#cv2.line(image, (l[0], l[1]), (l[2], l[3]), (0, 0, 255), 3, cv2.LINE_AA)
#cv2.imwrite("lines_found.jpg", image)
# Sorting to get the line with the maximum x-coordinate for proper cropping
lines_x.sort(reverse=True)
crop_image = "cropped_lines"
for i in range(0, len(lines_x)):
if i == 0:
# Cropping to the end
img = image[0:height, lines_x[i]:width]
else:
# Cropping from the start
img = image[0:height, 0:lines_x[i]]
cv2.imwrite(crop_image + str(i) + ".jpg", img)
I am sure you know now how to get the middle part ;)
Hope it helps!
EDIT:
Using some morphological operations, you can also extract the characters individually:
import numpy as np
import cv2
image = cv2.imread("lines.jpg")
grayImage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
dst = cv2.Canny(grayImage, 50, 100)
dst = cv2.morphologyEx(dst, cv2.MORPH_RECT, np.zeros((5,5), np.uint8),
iterations=1)
cv2.imwrite("canny.jpg", dst)
im2, contours, heirarchy = cv2.findContours(dst, cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_NONE)
for i in range(0, len(contours)):
if cv2.contourArea(contours[i]) > 200:
x,y,w,h = cv2.boundingRect(contours[i])
# The w constrain to remove the vertical lines
if w > 10:
cv2.rectangle(image, (x, y), (x+w, y+h), (0, 0, 255), 1)
cv2.imwrite("contour.jpg", image)
Result:
This what I achieved by now...
The detection on second image was made by using the code found here: License plate detection with OpenCV and Python
Full code (which work from the third image an on) is this:
import cv2
import numpy as np
import tesserocr as tr
from PIL import Image
image = cv2.imread("cropped.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('gray', image)
thresh = cv2.adaptiveThreshold(gray, 250, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 255, 1)
cv2.imshow('thresh', thresh)
kernel = np.ones((1, 1), np.uint8)
img_dilation = cv2.dilate(thresh, kernel, iterations=1)
im2, ctrs, hier = cv2.findContours(img_dilation.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
sorted_ctrs = sorted(ctrs, key=lambda ctr: cv2.boundingRect(ctr)[0])
clean_plate = 255 * np.ones_like(img_dilation)
for i, ctr in enumerate(sorted_ctrs):
x, y, w, h = cv2.boundingRect(ctr)
roi = img_dilation[y:y + h, x:x + w]
# these are very specific values made for this image only - it's not a factotum code
if h > 70 and w > 100:
rect = cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
clean_plate[y:y + h, x:x + w] = roi
cv2.imshow('ROI', rect)
cv2.imwrite('roi.png', roi)
img = cv2.imread("roi.png")
blur = cv2.medianBlur(img, 1)
cv2.imshow('4 - blur', blur)
pil_img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
api = tr.PyTessBaseAPI()
try:
api.SetImage(pil_img)
boxes = api.GetComponentImages(tr.RIL.TEXTLINE, True)
text = api.GetUTF8Text()
finally:
api.End()
# clean the string a bit
text = str(text).strip()
plate = ""
# 77143-1916 ---> NNNNN|symbol|N
for char in text:
firstSection = text[:5]
# the arabic symbol is easy because it's nearly impossible for the OCR to misunderstood the last 2 digit
# so we have that the symbol is always the third char from the end (right to left)
symbol = text[-3]
lastChar = text[-1]
plate = firstSection + "[" + symbol + "]" + lastChar
print(plate)
cv2.waitKey(0)
For arabic symbols you should install additional languages from TesseractOCR (and possibly use the version 4 of it).
Output: 77143[9]6
The number between brackets is the arabic symbol (undetected).
Hope I helped you.

Extract face rectangle from ID card

I’m researching the subject of extracting the information from ID cards and have found a suitable algorithm to locate the face on the front. As it is, OpenCV has Haar cascades for that, but I’m unsure what can be used to extract the full rectangle that person is in instead of just the face (as is done in https://github.com/deepc94/photo-id-ocr). The few ideas that I’m yet to test are:
Find second largest rectangle that’s inside the card containing the face rect
Do “explode” of the face rectangle until it hits the boundary
Play around with filters to see what can be seen
What can be recommended to try here as well? Any thoughts, ideas or even existing examples are fine.
Normal approach:
import cv2
import numpy as np
import matplotlib.pyplot as plt
image = cv2.imread("a.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_,thresh = cv2.threshold(gray,128,255,cv2.THRESH_BINARY)
cv2.imshow("thresh",thresh)
thresh = cv2.bitwise_not(thresh)
element = cv2.getStructuringElement(shape=cv2.MORPH_RECT, ksize=(7, 7))
dilate = cv2.dilate(thresh,element,6)
cv2.imshow("dilate",dilate)
erode = cv2.erode(dilate,element,6)
cv2.imshow("erode",erode)
morph_img = thresh.copy()
cv2.morphologyEx(src=erode, op=cv2.MORPH_CLOSE, kernel=element, dst=morph_img)
cv2.imshow("morph_img",morph_img)
_,contours,_ = cv2.findContours(morph_img,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
areas = [cv2.contourArea(c) for c in contours]
sorted_areas = np.sort(areas)
cnt=contours[areas.index(sorted_areas[-3])] #the third biggest contour is the face
r = cv2.boundingRect(cnt)
cv2.rectangle(image,(r[0],r[1]),(r[0]+r[2],r[1]+r[3]),(0,0,255),2)
cv2.imshow("img",image)
cv2.waitKey(0)
cv2.destroyAllWindows()
I found the first two biggest contours are the boundary, the third biggest contour is the face. Result:
There is also another way to investigate the image, using sum of pixel values by axises:
x_hist = np.sum(morph_img,axis=0).tolist()
plt.plot(x_hist)
plt.ylabel('sum of pixel values by X-axis')
plt.show()
y_hist = np.sum(morph_img,axis=1).tolist()
plt.plot(y_hist)
plt.ylabel('sum of pixel values by Y-axis')
plt.show()
Base on those pixel sums over 2 asixes, you can crop the region you want by setting thresholds for it.
Haarcascades approach (The most simple)
# Using cascade Classifiers
import numpy as np
import cv2
# We point OpenCV's CascadeClassifier function to where our
# classifier (XML file format) is stored
face_classifier = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
# Load our image then convert it to grayscale
image = cv2.imread('./your/image/path.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('Original image', image)
# Our classifier returns the ROI of the detected face as a tuple
# It stores the top left coordinate and the bottom right coordiantes
faces = face_classifier.detectMultiScale(gray, 1.3, 5)
# When no faces detected, face_classifier returns and empty tuple
if faces is ():
print("No faces found")
# We iterate through our faces array and draw a rectangle
# over each face in faces
for (x, y, w, h) in faces:
x = x - 25 # Padding trick to take the whole face not just Haarcascades points
y = y - 40 # Same here...
cv2.rectangle(image, (x, y), (x + w + 50, y + h + 70), (27, 200, 10), 2)
cv2.imshow('Face Detection', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Link to the haarcascade_frontalface_default file
update to #Sanix darker code,
# Using cascade Classifiers
import numpy as np
import cv2
img = cv2.imread('link_to_your_image')
face_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
scale_percent = 60 # percent of original size
width = int(img.shape[1] * scale_percent / 100)
height = int(img.shape[0] * scale_percent / 100)
dim = (width, height)
# resize image
image = cv2.resize(img, dim, interpolation = cv2.INTER_AREA)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# face classifier
faces = face_classifier.detectMultiScale(gray, 1.3, 5)
# When no faces detected, face_classifier returns and empty tuple
if faces is ():
print("No faces found")
# We iterate through our faces array and draw a rectangle
# over each face in faces
for (x, y, w, h) in faces:
x = x - 25 # Padding trick to take the whole face not just Haarcascades points
y = y - 40 # Same here...
cv2.rectangle(image, (x, y), (x + w + 50, y + h + 70), (27, 200, 10), 2)
cv2.imshow('Face Detection', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
# if you want to crop the face use below code
for (x, y, width, height) in faces:
roi = image[y:y+height, x:x+width]
cv2.imwrite("face.png", roi)

How to find number from image in OCR?

I'm trying to get the number contours from an image.
Original image is in number_img:
After I've used the following code:
gray = cv2.cvtColor(number_img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (1, 1), 0)
ret, thresh = cv2.threshold(blur, 70, 255, cv2.THRESH_BINARY_INV)
img2, contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
area = cv2.contourArea(c)
[x, y, w, h] = cv2.boundingRect(c)
if (area > 50 and area < 1000):
[x, y, w, h] = cv2.boundingRect(c)
cv2.rectangle(number_img, (x, y), (x + w, y + h), (0, 0, 255), 2)
Since there are small boxes in between, I tried to limit with height:
if (area > 50 and area < 1000) and h > 50:
[x, y, w, h] = cv2.boundingRect(c)
cv2.rectangle(number_img, (x, y), (x + w, y + h), (0, 0, 255), 2)
What other ways should I do to get the best contours of number to do OCR?
Thanks.
Just tried in Matlab. Hopefully you can adapt the code to OpenCV and tweak some parameters. It is not clear the right most blob is a number or not.
img1 = imread('DSYEW.png');
% first we can convert the grayscale image you provided to a binary
% (logical) image. It is always the best option in image preprocessing.
% Here I used the threshold .28 based on your image. But you may change it
% for a general solution.
img = im2bw(img1,.28);
% Then we can use the Matlab 'regionprops' command to identify the
% individual blobs in binary image. 'regionprops' gives us an output, the
% Area of the each blob.
s = regionprops(imcomplement(img));
% Now as you did, we can filter out the bounding boxes with an area
% threshold. I used 350 originally. But it can be changed for a better
% output.
s([s.Area] < 350) = [];
% Now we draw each bounding box on the image.
figure; imshow(img);
for k = 1 : length(s)
bb = s(k).BoundingBox;
rectangle('Position', [bb(1),bb(2),bb(3),bb(4)],...
'EdgeColor','r','LineWidth',2 )
end
Output image:
Update 1:
Just changed the area parameter in the above code as follows. Unfortunately I don't have Python OpenCV in my Mac. But, it is all about tweaking the parameters in you code.
s([s.Area] < 373) = [];
Output image:
Update 2:
Numbers 3 and 4 in the above figure were detected as one digit. If you look at carefully you are see that 3 and 4 are connected with each other, and that is why above code detected it as a single digit. So I used the imdilate function to get rid of that. Next, in your code, even the white holes inside some digits were detected as digits. To eliminate that we can fill the holes using imfill in Matlab.
Updated code:
img1 = imread('TCXeuO9.png');
img = im2bw(img1,.28);
img = imcomplement(img);
img = imfill(img,'holes');
img = imcomplement(img);
se = strel('line',2,90);
img = imdilate(img, se);
s = regionprops(imcomplement(img));
s([s.Area] < 330) = [];
figure; imshow(img);
for k = 1 : length(s)
bb = s(k).BoundingBox;
rectangle('Position', [bb(1),bb(2),bb(3),bb(4)],...
'EdgeColor','r','LineWidth',2 )
end
Output image:

Resources