Tesseract OCR not recognizing perfectly cut out characters - opencv

So I am working on a project in which it is necessary to read characters off of license plates. Given an image of (just) the license plate I'm using openCV to segment the characters and get their bounding boxes. Then the individual characters are cut out and I'd like to use Tesseract to recognize what the characters are.
Problem is: I'm getting really bad results, even though the characters seem perfectly cut out by openCV. I've included some example images below. Tesseract either fails to detect any character at all, or detects entirely wrong characters (I don't mean it confuses a 0 with an O, or 1 and l...it, detects 7, as an example, if there is a 4 clearly visible).
Is there anything I am doing wrong, or have I misunderstood the options I am setting? Help would be greatly appreciated, as I'm not seeing why Tesseract shouldn't recognize these characters.
(I'm using Tesseract OCR v4, in the LSTM mode)

You can recognize by pytesseract in two-steps
Applying adaptive-threshold
Setting page-segmentation-mode to 6
Adaptive-threshold
Here, the algorithm determines the threshold for a pixel based on a small region around it. So we get different thresholds for different regions of the same image which gives better results for images with varying illumination. source
Adaptive-threshold result below
Adaptive-threshold result below
pytesseract result below
pytesseract result below
4
9
Code:
import cv2
import pytesseract
img_lst = ["four.png", "nine.png"]
for pth in img_lst:
img = cv2.imread(pth)
img = cv2.resize(img, (28, 28))
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 47, 2)
txt = pytesseract.image_to_string(thr, config="--psm 6 digits")
print(txt)

Related

Using Tesseract to read dates from a small images

I have a rather small set of images which contains dates. The size might be a problem, but I'd say that the quality is OK. I have followed the guidelines to provide the clearest image I can to the engine. After resizing, apply filters, lots of trial and error, etc. I came up with an image that is almost properly read. I put an example below:
Now, this is read as “9 MAR 2021\n\x0c. Not bad, but the first 2 is read as ". At this point I think I'm misusing part of the power of Tesseract. After all, I know what it should expect, i.e. something as "%d %b %Y".
Is there a way to tell Tesseract that it should try to find the best match given this strong constraint? Providing this metadata to the engine should heavily facilitate the task. I have been reading the documentation, but I can't find the way to do this.
I'm using pytesseract on Tesseract 4.1. with Pytyon 3.9.
You need to know the followings:
Improving the quality of the output
When to apply - dilation or erosion
Centering the image
Now if we center the image (by adding borders):
We up-sample the image without losing any pixel.
Second, we need to make the characters in the image bold to make the OCR result accurate.
Now OCR:
29 MAR 2021
Code:
import cv2
import pytesseract
# Load the image
img = cv2.imread("xsGBK.jpg")
# Center the image
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[0, 0, 0])
# Convert to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Dilate
gry = cv2.dilate(gry, None, iterations=1)
# OCR
print(pytesseract.image_to_string(gry))
# Display
cv2.imshow("", gry)
cv2.waitKey(0)

Count lines in image

I am planning to use Opencv on Raspberry pi 3 with camera to count lines in the following image
It will be used in machine which produce threads. If one (or more) will be lost it will stop the machine.
Now i am wondering how to do that...?
I will do a loop to capture the images
I will crop the images to see only the part with lines
I will convert it to black&white
how to count them ? in a loop - check pixel value change? Or there is a better/faster idea?
Thank you for advice!
EDIT
P.S.
I used cv2.findContours (answer from Jeru Luke).
I've putted an A4 sheet with black lines in front of camera. I works ok in while loop BUT... i have 43 lines on sheet. When camera detects some differences i wrote the results to file. Sometimes i have 710,800,67 etc.
Pls look on file with values i have https://www.dropbox.com/s/jnn4w8mq3rrtppo/bledy.txt?dl=0
lines....The error is permanet for some few secounds. Tere is noting wronge when i have 43,43,43,43,44,43,43,43 (the one is wrong) because i watch few values before i put error. But when the are hundreds of bad values i have no idea...
I have something relatively simpler. It does not involve any for loops hence requires less time. I used the concept of counting the contours in the image after finding an appropriate threshold. I found the perfect threshold through trial-and-error.
I have the approach in python:
import cv2
path = 'C:/Users/Desktop/stack/contour/'
img = cv2.imread(path + 'lines.png', 0)
cv2.imshow('original Image', img)
ret, thresh = cv2.threshold(img, 80, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('thresh1', thresh)
_, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print('Number of lines:', len(contours))
cv2.waitKey(0)
cv2.destroyAllWindows()
Note:
As you can see there are no for loops involved. There is no need to count the number of pixel changes as well. Each presumed line becomes a contour. Using `len(contours) you get the number of lines present.
Using Hough Line transform would work well only If the lines are straight. Since the lines in the provided image are slanted it won't find perfect lines. This statement is more emphasized in the comments by #MarkSetchell.
Use Hough Lines Transform to detect the lines and just count the number of countours you find.
Here there's a tutorial for your problem (since you didn't specify the Language, this is in python).
OpenCV Tutorial Hough Lines

OMR: evaluate filled circle

I'm implementing OMR system for test papers. But faced with problems when determining filled circles. I've succeeded in getting these grayscale regions of interest .
The problems are:
- Binary thresholding (adaptive and fixed) and counting non zero pixels gives a lot of errors because of letters in a circles and different brightness of photos made by mobile cameras.
- Also tried technique described in this survey that uses average grayscale values of a circle do mark it filled or not, but the brightness of an image is not uniform because of different light sources when people take photos be their cameras and I got a lot of wrong results.
- People also doesn't follow rules such us filling the whole circle, algorithm also need to be robust in such cases.
Sample images
I already have about 10 GBs of samples, so may be machine learning or other statistical methods will be useful.
Does anybody know other methods to classify a circle as filled?
Since it is not a straight forward problem, it needs lot of tweaking to make it robust. But I would like suggest you a good starting point. You can play with it and try to make it work.
import numpy as np
import cv2
image_ori = cv2.imread("circle_opt.png")
lower_bound = np.array([0, 0, 0])
upper_bound = np.array([255, 255, 195])
image = image_ori
mask = cv2.inRange(image_ori, lower_bound, upper_bound)
masked_red = cv2.bitwise_and(image, image, mask=mask)
kernel = np.ones((3,3),np.uint8)
closing = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
contours = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)[0]
contours.sort(key=lambda x:cv2.boundingRect(x)[0])
print len(contours)
for c in contours:
(x,y),r = cv2.minEnclosingCircle(c)
center = (int(x),int(y))
r = int(r)
if 10 <= r <= 15:
cv2.circle(image,center,r,(0,255,0),2)
# cv2.imwrite('omr_processed.png', image_ori)
cv2.imshow("original",image_ori)
cv2.waitKey(0)
The result I got from my code on the image you shared was this
You can apply thresholds to these green circled patches and then count non-zeros to get if the circle is marked or not. You can play with lower and upper_bound to try to make the solution robust.
Hope this helps! Good luck on your problem solving :)

How to improve OCR of text written on vehicles?

I am trying to do OCR of vehicles such as trains or trucks to identify the numbers and characters written on them. (Please note this is not license plate identification OCR)
I took this image. The idea is to be able to extract the text - BN SF 721 734 written on it.
For pre-processing, I first converted this image to grayscale and then converted it to a binarized image which looks something like this
I wrote some code in tesseract.
myimg = "image.png"
image = Image.open(myimg)
with PyTessBaseAPI() as api:
api.SetImage(image)
api.Recognize()
words = api.GetUTF8Text()
print words
print api.AllWordConfidences()
This code gave me a blank output with a confidence value of 95 which means that tesseract was 95% confident that no text exists in this image.
Then I used the setrectangle api in Tesseract to restrict OCR on a particular window within the image instead of trying to do OCR on the entire image.
myimg = "image.png"
image = Image.open(myimg)
with PyTessBaseAPI() as api:
api.SetImage(image)
api.SetRectangle(665,445,75,40)
api.Recognize()
words = api.GetUTF8Text()
print words
print api.AllWordConfidences()
print "----"
The coordinates 665, 445, 75 and 40 correspond to a rectangle which contains the text BNSF 721 734 in the image.
665 - top, 445- left, 75 - width and 40 - height.
The output I got was this:
an s
m,m
My question is how do I improve the results? I played around with the values in the setrectangle function and the results varied a bit but all of them were equally bad.
Is there a way to improve this?
If you are interested in how I converted the images to binarized images, I used OpenCV
img = cv2.imread(image)
grayscale_img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
(thresh, im_bw) = cv2.threshold(grayscale_img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
thresh = 127
binarized_img = cv2.threshold(grayscale_img, thresh, 255, cv2.THRESH_BINARY)[1]
I suggest finding the contours in your cropped rectangle and setting some parameters to match the contours of your characters. For example: contours with area larger or smaller then some thresholds. Then draw one by one contour on an empty bitmap and perform OCR.
I know it's seems like a lot of work, but it gives you better and more robust results.
Good luck!

Preprocessing image for Tesseract OCR with OpenCV

I'm trying to develop an App that uses Tesseract to recognize text from documents taken by a phone's cam. I'm using OpenCV to preprocess the image for better recognition, applying a Gaussian blur and a Threshold method for binarization, but the result is pretty bad.
Here is the the image I'm using for tests:
And here the preprocessed image:
What others filter can I use to make the image more readable for Tesseract?
I described some tips for preparing images for Tesseract here:
Using tesseract to recognize license plates
In your example, there are several things going on...
You need to get the text to be black and the rest of the image white (not the reverse). That's what character recognition is tuned on. Grayscale is ok, as long as the background is mostly full white and the text mostly full black; the edges of the text may be gray (antialiased) and that may help recognition (but not necessarily - you'll have to experiment)
One of the issues you're seeing is that in some parts of the image, the text is really "thin" (and gaps in the letters show up after thresholding), while in other parts it is really "thick" (and letters start merging). Tesseract won't like that :) It happens because the input image is not evenly lit, so a single threshold doesn't work everywhere. The solution is to do "locally adaptive thresholding" where a different threshold is calculated for each neighbordhood of the image. There are many ways of doing that, but check out for example:
Adaptive gaussian thresholding in OpenCV with cv2.adaptiveThreshold(...,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,...)
Local Otsu's method
Local adaptive histogram equalization
Another problem you have is that the lines aren't straight. In my experience Tesseract can handle a very limited degree of non-straight lines (a few percent of perspective distortion, tilt or skew), but it doesn't really work with wavy lines. If you can, make sure that the source images have straight lines :) Unfortunately, there is no simple off-the-shelf answer for this; you'd have to look into the research literature and implement one of the state of the art algorithms yourself (and open-source it if possible - there is a real need for an open source solution to this). A Google Scholar search for "curved line OCR extraction" will get you started, for example:
Text line Segmentation of Curved Document Images
Lastly: I think you would do much better to work with the python ecosystem (ndimage, skimage) than with OpenCV in C++. OpenCV python wrappers are ok for simple stuff, but for what you're trying to do they won't do the job, you will need to grab many pieces that aren't in OpenCV (of course you can mix and match). Implementing something like curved line detection in C++ will take an order of magnitude longer than in python (* this is true even if you don't know python).
Good luck!
Scanning at 300 dpi (dots per inch) is not officially a standard for OCR (optical character recognition), but it is considered the gold standard.
Converting image to Greyscale improves accuracy in reading text in general.
I have written a module that reads text in Image which in turn process the image for optimum result from OCR, Image Text Reader .
import tempfile
import cv2
import numpy as np
from PIL import Image
IMAGE_SIZE = 1800
BINARY_THREHOLD = 180
def process_image_for_ocr(file_path):
# TODO : Implement using opencv
temp_filename = set_image_dpi(file_path)
im_new = remove_noise_and_smooth(temp_filename)
return im_new
def set_image_dpi(file_path):
im = Image.open(file_path)
length_x, width_y = im.size
factor = max(1, int(IMAGE_SIZE / length_x))
size = factor * length_x, factor * width_y
# size = (1800, 1800)
im_resized = im.resize(size, Image.ANTIALIAS)
temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.jpg')
temp_filename = temp_file.name
im_resized.save(temp_filename, dpi=(300, 300))
return temp_filename
def image_smoothening(img):
ret1, th1 = cv2.threshold(img, BINARY_THREHOLD, 255, cv2.THRESH_BINARY)
ret2, th2 = cv2.threshold(th1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
blur = cv2.GaussianBlur(th2, (1, 1), 0)
ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
return th3
def remove_noise_and_smooth(file_name):
img = cv2.imread(file_name, 0)
filtered = cv2.adaptiveThreshold(img.astype(np.uint8), 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 41,
3)
kernel = np.ones((1, 1), np.uint8)
opening = cv2.morphologyEx(filtered, cv2.MORPH_OPEN, kernel)
closing = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel)
img = image_smoothening(img)
or_image = cv2.bitwise_or(img, closing)
return or_image
Note: this should be a comment to Alex I answer, but it's too long so i put it as answer.
from "An Overview of the Tesseract OCR engine, by Ray Smith, Google Inc." at https://github.com/tesseract-ocr/docs/blob/master/tesseracticdar2007.pdf
"Processing follows a traditional step-by-step
pipeline, but some of the stages were unusual in their
day, and possibly remain so even now. The first step is
a connected component analysis in which outlines of
the components are stored. This was a computationally
expensive design decision at the time, but had a
significant advantage: by inspection of the nesting of
outlines, and the number of child and grandchild
outlines, it is simple to detect inverse text and
recognize it as easily as black-on-white text. Tesseract
was probably the first OCR engine able to handle
white-on-black text so trivially."
So it seems it's not needed to have black text on white background, and should work the opposite too.
You can play around with the configuration of the OCR by changing the --psm and --oem values, in your case specifically I will suggest using
--psm 3
--oem 2
you can also look at the following link for further details
here
I guess you have used the generic approach for Binarization, that is the reason whole image is not binarized uniformly. You can use Adaptive Thresholding technique for binarization. You can also do some skew correction, perspective correction, noise removal for better results.
Refer to this medium article, to know about the above-mentioned techniques along with code samples.
For wavy text like yours there is this fantastic Python code on GitHub, which transforms the text to straight lines: https://github.com/tachylatus/page_dewarp.git (this is the most updated version of MZucker's original post and the mechanics are explained here:https://mzucker.github.io/2016/08/15/page-dewarping.html)

Resources