Preprocessing image for Tesseract OCR with OpenCV - opencv

I'm trying to develop an app that uses Tesseract to recognize text from documents taken with a phone's camera. I'm using OpenCV to preprocess the image for better recognition, applying a Gaussian blur and a threshold method for binarization, but the result is pretty bad.
Here is the image I'm using for tests:
And here the preprocessed image:
What other filters can I use to make the image more readable for Tesseract?

I described some tips for preparing images for Tesseract here:
Using tesseract to recognize license plates
In your example, there are several things going on...
You need to get the text to be black and the rest of the image white (not the reverse); that's what character recognition is tuned for. Grayscale is OK, as long as the background is mostly full white and the text mostly full black; the edges of the text may be gray (antialiased), and that may help recognition (but not necessarily - you'll have to experiment).
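As a minimal sketch of a crude polarity check (the file name is a placeholder), you could invert an image that is mostly dark:

import cv2

img = cv2.imread('page.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical input file
# If the mean intensity is low, the page is probably white-on-black: invert it.
if img.mean() < 127:
    img = cv2.bitwise_not(img)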
One of the issues you're seeing is that in some parts of the image the text is really "thin" (and gaps in the letters show up after thresholding), while in other parts it is really "thick" (and letters start merging). Tesseract won't like that :) It happens because the input image is not evenly lit, so a single threshold doesn't work everywhere. The solution is to do "locally adaptive thresholding", where a different threshold is calculated for each neighborhood of the image. There are many ways of doing that; check out, for example (a minimal sketch follows the list below):
Adaptive gaussian thresholding in OpenCV with cv2.adaptiveThreshold(...,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,...)
Local Otsu's method
Local adaptive histogram equalization
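As a sketch of the first option (the file name, block size, and constant are placeholders to tune per image):

import cv2

img = cv2.imread('page.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical input file
# blockSize=31 and C=10 are only starting points; experiment with both.
binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 31, 10)
cv2.imwrite('binary.png', binary)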
Another problem you have is that the lines aren't straight. In my experience Tesseract can handle a very limited degree of non-straight lines (a few percent of perspective distortion, tilt or skew), but it doesn't really work with wavy lines. If you can, make sure that the source images have straight lines :) Unfortunately, there is no simple off-the-shelf answer for this; you'd have to look into the research literature and implement one of the state of the art algorithms yourself (and open-source it if possible - there is a real need for an open source solution to this). A Google Scholar search for "curved line OCR extraction" will get you started, for example:
Text line Segmentation of Curved Document Images
Lastly: I think you would do much better to work with the Python ecosystem (ndimage, skimage) than with OpenCV in C++. The OpenCV Python wrappers are OK for simple stuff, but for what you're trying to do they won't do the job; you will need to grab many pieces that aren't in OpenCV (of course, you can mix and match). Implementing something like curved line detection in C++ will take an order of magnitude longer than in Python (and this is true even if you don't know Python).
Good luck!

Scanning at 300 dpi (dots per inch) is not officially a standard for OCR (optical character recognition), but it is considered the gold standard.
Converting the image to grayscale generally improves the accuracy of reading text.
I have written a module that reads the text in an image and, in turn, processes the image for optimal OCR results: Image Text Reader.
import tempfile

import cv2
import numpy as np
from PIL import Image

IMAGE_SIZE = 1800
BINARY_THRESHOLD = 180

def process_image_for_ocr(file_path):
    # TODO: Implement using opencv
    temp_filename = set_image_dpi(file_path)
    im_new = remove_noise_and_smooth(temp_filename)
    return im_new

def set_image_dpi(file_path):
    im = Image.open(file_path)
    length_x, width_y = im.size
    # Upscale small images so the text is large enough for OCR.
    factor = max(1, int(IMAGE_SIZE / length_x))
    size = factor * length_x, factor * width_y
    # size = (1800, 1800)
    im_resized = im.resize(size, Image.LANCZOS)  # Image.ANTIALIAS in older Pillow
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.jpg')
    temp_filename = temp_file.name
    im_resized.save(temp_filename, dpi=(300, 300))
    return temp_filename

def image_smoothening(img):
    # Fixed threshold, then Otsu, then a light blur and Otsu again.
    ret1, th1 = cv2.threshold(img, BINARY_THRESHOLD, 255, cv2.THRESH_BINARY)
    ret2, th2 = cv2.threshold(th1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    blur = cv2.GaussianBlur(th2, (1, 1), 0)
    ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return th3

def remove_noise_and_smooth(file_name):
    img = cv2.imread(file_name, 0)
    filtered = cv2.adaptiveThreshold(img.astype(np.uint8), 255,
                                     cv2.ADAPTIVE_THRESH_MEAN_C,
                                     cv2.THRESH_BINARY, 41, 3)
    kernel = np.ones((1, 1), np.uint8)
    opening = cv2.morphologyEx(filtered, cv2.MORPH_OPEN, kernel)
    closing = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel)
    img = image_smoothening(img)
    # Combine the adaptive-threshold result with the smoothed image.
    or_image = cv2.bitwise_or(img, closing)
    return or_image
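A possible usage sketch (pytesseract and the input file name are my assumptions, not part of the module above):

import pytesseract

# 'document.jpg' is a hypothetical input file.
processed = process_image_for_ocr('document.jpg')
print(pytesseract.image_to_string(processed))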

Note: this should be a comment on Alex I's answer, but it's too long, so I'm posting it as an answer.
from "An Overview of the Tesseract OCR engine, by Ray Smith, Google Inc." at https://github.com/tesseract-ocr/docs/blob/master/tesseracticdar2007.pdf
"Processing follows a traditional step-by-step
pipeline, but some of the stages were unusual in their
day, and possibly remain so even now. The first step is
a connected component analysis in which outlines of
the components are stored. This was a computationally
expensive design decision at the time, but had a
significant advantage: by inspection of the nesting of
outlines, and the number of child and grandchild
outlines, it is simple to detect inverse text and
recognize it as easily as black-on-white text. Tesseract
was probably the first OCR engine able to handle
white-on-black text so trivially."
So it seems black text on a white background is not required; white-on-black should work too.

You can play around with the configuration of the OCR by changing the --psm and --oem values; in your case specifically, I would suggest using
--psm 3
--oem 2
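For example, with pytesseract (a sketch; the file name is a placeholder):

import cv2
import pytesseract

img = cv2.imread('page.png')  # hypothetical input file
print(pytesseract.image_to_string(img, config='--psm 3 --oem 2'))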
You can also look at the following link for further details: here

I guess you have used a generic (global) approach to binarization; that is the reason the whole image is not binarized uniformly. You can use an adaptive thresholding technique for binarization. You can also do some skew correction, perspective correction, and noise removal for better results.
Refer to this Medium article to learn about the above-mentioned techniques, along with code samples.
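As an illustration of the skew-correction step, here is a common minAreaRect-based sketch (my own, not from the linked article; note that the angle convention of minAreaRect changed around OpenCV 4.5, so the angle handling may need adjusting):

import cv2
import numpy as np

img = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)  # hypothetical input file
# Make text pixels non-zero, then fit a rotated rectangle around them.
thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
if angle < -45:  # classic pre-4.5 angle handling
    angle = -(90 + angle)
else:
    angle = -angle
h, w = img.shape
M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
deskewed = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)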

For wavy text like yours, there is this fantastic Python code on GitHub which transforms the text to straight lines: https://github.com/tachylatus/page_dewarp.git (this is the most updated version of MZucker's original post, and the mechanics are explained here: https://mzucker.github.io/2016/08/15/page-dewarping.html)

Related

Robust estimation of volume of transparent liquid using image processing

I'm working on a project which involves determining the volume of a transparent liquid (or air if it proves easier) in a confined space.
The images I'm working with are a background image of the container without any liquid and a foreground image which may also be empty in rare cases, but most times is partly filled with some amount of liquid.
While it may seem like a pretty straightforward smooth and threshold approach, it proves somewhat more difficult.
I'm working with a set with tons of these image pairs of background and foreground images, and I can't seem to find an approach that is robust enough to be applied to all images in the set.
My work so far involves smoothing and thresholding the image and applying closing to wrap it up.
import cv2 as cv
import numpy as np

bg_image = cv.imread("bg_image", 0)
fg_image = cv.imread("fg_image", 0)
# Smooth, threshold, then close small gaps in the binary mask.
blur_fg = cv.GaussianBlur(fg_image, (5, 5), sigmaX=0, sigmaY=0)
thresholded_image = cv.threshold(blur_fg, 186, 255, cv.THRESH_BINARY_INV)[1]
kernel = np.ones((4, 2), np.uint8)
closing = cv.morphologyEx(thresholded_image, cv.MORPH_CLOSE, kernel)
The results vary, here is an example when it goes well:
In other examples, it doesn't go as well:
Aside from that, I have also tried:
Subtraction of the background and foreground images
Contrast stretching
Histogram equalization
Other thresholding techniques such as Otsu
The main issue is that the pixel intensities in air and liquid sometimes overlap (and the contrast is pretty low in general), causing inaccurate estimations. I am leaning towards utilizing the edge that occurs between the liquid and the air, but I'm not really sure how.
I don't want to overflow with information here so I'm leaving it at that. I am grateful for any suggestions and can provide more information if necessary.
EDIT:
Here are some sample images to play around with.
Here is an approach whereby you calculate the mean of each column of pixels in your image, then calculate the gradient of the means:
#!/usr/bin/env python3
import cv2
import numpy as np
import matplotlib.pyplot as plt
filename = 'fg1.png'
# Load image as greyscale and calculate means of each column of pixels
im = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
means = np.mean(im, axis=0)
# Calculate the gradient of the means
y = np.gradient(means)
# Plot the gradient of the means
xdata = np.arange(0, y.shape[0])
plt.plot(xdata, y, 'bo') # blue circles
plt.title(f'Gradient of Column Means for "{filename}"')
plt.xlabel('x')
plt.ylabel('Gradient of Column Means')
plt.grid(True)
plt.show()
If you just plot the means of all columns, without taking the gradient, you get this:
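One possible way to turn this into a level estimate (my addition, not part of the original answer) is to take the column with the largest gradient magnitude as the liquid/air boundary:

import numpy as np

# Continuing from the script above, where y is the gradient of the column means.
boundary_x = int(np.argmax(np.abs(y)))
print('Largest change in column means at x =', boundary_x)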

Count lines in image

I am planning to use OpenCV on a Raspberry Pi 3 with a camera to count the lines in the following image.
It will be used in a machine which produces threads; if one (or more) is lost, it will stop the machine.
Now I am wondering how to do that...?
I will do a loop to capture the images
I will crop the images to see only the part with lines
I will convert it to black&white
How do I count them? In a loop, checking for pixel value changes? Or is there a better/faster idea?
Thank you for the advice!
EDIT
P.S.
I used cv2.findContours (answer from Jeru Luke).
I put an A4 sheet with black lines in front of the camera. It works OK in a while loop, BUT... I have 43 lines on the sheet, and when the camera detects some differences I write the results to a file. Sometimes I get 710, 800, 67, etc.
Please look at the file with the values I get: https://www.dropbox.com/s/jnn4w8mq3rrtppo/bledy.txt?dl=0
The error persists for a few seconds at a time. There is nothing wrong when I get 43,43,43,43,44,43,43,43 (only the one value is wrong), because I watch a few values before raising an error. But when there are hundreds of bad values, I have no idea...
I have something relatively simpler. It does not involve any for loops, hence requiring less time. I used the concept of counting the contours in the image after finding an appropriate threshold; I found the right threshold through trial and error.
Here is the approach in Python:
import cv2

path = 'C:/Users/Desktop/stack/contour/'
img = cv2.imread(path + 'lines.png', 0)
cv2.imshow('original Image', img)

# Binarize so each line becomes a white blob on a black background.
ret, thresh = cv2.threshold(img, 80, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('thresh1', thresh)

# OpenCV 3.x returns (image, contours, hierarchy); 2.x/4.x drop the first value.
_, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print('Number of lines:', len(contours))

cv2.waitKey(0)
cv2.destroyAllWindows()
Note:
As you can see, there are no for loops involved, and there is no need to count the number of pixel changes. Each presumed line becomes a contour, so len(contours) gives the number of lines present.
Using the Hough line transform would work well only if the lines are straight. Since the lines in the provided image are slanted, it won't find perfect lines. This point is emphasized in the comments by @MarkSetchell.
Use the Hough lines transform to detect the lines and just count the number of lines you find.
Here is a tutorial for your problem (since you didn't specify the language, this is in Python).
OpenCV Tutorial Hough Lines
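If the lines are straight enough, a counting sketch with cv2.HoughLinesP might look like this (all parameters are guesses to tune, and as noted above it will struggle with slanted or wavy lines):

import cv2
import numpy as np

img = cv2.imread('lines.png', cv2.IMREAD_GRAYSCALE)  # hypothetical input file
edges = cv2.Canny(img, 50, 150, apertureSize=3)
# threshold/minLineLength/maxLineGap are starting values; tune for your image.
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=100, maxLineGap=10)
print('Detected segments:', 0 if lines is None else len(lines))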

OMR: evaluate filled circle

I'm implementing an OMR system for test papers, but I am facing problems when determining filled circles. I've succeeded in getting these grayscale regions of interest.
The problems are:
- Binary thresholding (adaptive and fixed) and counting non-zero pixels gives a lot of errors, because of the letters inside the circles and the different brightness of photos taken by mobile cameras.
- I also tried the technique described in this survey, which uses the average grayscale value of a circle to mark it as filled or not, but the brightness of an image is not uniform because of the different light sources when people take photos with their cameras, and I got a lot of wrong results.
- People also don't follow rules, such as filling the whole circle, and the algorithm needs to be robust in such cases.
Sample images
I already have about 10 GB of samples, so maybe machine learning or other statistical methods will be useful.
Does anybody know other methods to classify a circle as filled?
Since it is not a straightforward problem, it needs a lot of tweaking to make it robust. But I would like to suggest a good starting point. You can play with it and try to make it work.
import numpy as np
import cv2

image_ori = cv2.imread("circle_opt.png")
lower_bound = np.array([0, 0, 0])
upper_bound = np.array([255, 255, 195])
image = image_ori

# Keep everything that is not too bright (the printed circles and marks).
mask = cv2.inRange(image_ori, lower_bound, upper_bound)
masked_red = cv2.bitwise_and(image, image, mask=mask)

# Morphological opening to remove small specks (despite the variable name).
kernel = np.ones((3, 3), np.uint8)
closing = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

# OpenCV 2.x/4.x return (contours, hierarchy); in 3.x use index [1] instead.
contours = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)[0]
contours = sorted(contours, key=lambda x: cv2.boundingRect(x)[0])
print(len(contours))

for c in contours:
    (x, y), r = cv2.minEnclosingCircle(c)
    center = (int(x), int(y))
    r = int(r)
    if 10 <= r <= 15:  # keep only contours whose enclosing circle is bubble-sized
        cv2.circle(image, center, r, (0, 255, 0), 2)

# cv2.imwrite('omr_processed.png', image_ori)
cv2.imshow("original", image_ori)
cv2.waitKey(0)
The result I got from my code on the image you shared was this:
You can apply thresholds to these green-circled patches and then count non-zero pixels to decide whether a circle is marked or not. You can play with lower_bound and upper_bound to try to make the solution robust.
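As a rough sketch of that counting step (continuing from the loop above, using one detected center and r; the 0.5 cut-off is an arbitrary assumption to tune on your samples):

import cv2
import numpy as np

# Build a filled-disc mask for one detected circle (center, r from the loop).
gray = cv2.cvtColor(image_ori, cv2.COLOR_BGR2GRAY)
mask = np.zeros(gray.shape, np.uint8)
cv2.circle(mask, center, r, 255, -1)
# Otsu-binarize so ink is white, then count ink pixels inside the disc.
binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
fill_ratio = cv2.countNonZero(binary & mask) / float(cv2.countNonZero(mask))
print('filled' if fill_ratio > 0.5 else 'empty')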
Hope this helps! Good luck on your problem solving :)

How to use OpenCV to read a traditional thermometer

I'm new to OpenCV and looking for direction on the best approach to reading a traditional thermometer using computer vision. Any guidance, general approach, sample code? Thanks for any consideration on this very broad question.
So I guess, more specifically: how do you narrow your contours to your area of interest, such as having bounding boxes around just the numbers in the attached image? Thanks for any consideration. [1]: http://eofdreams.com/photo/thermometer/05/ "thermometer"
import cv2
import numpy as np

img = cv2.imread('thermometer.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.bilateralFilter(gray, 11, 17, 17)  # smooth while keeping edges
edges = cv2.Canny(gray, 50, 150, apertureSize=3)

# OpenCV 2.x/4.x return (contours, hierarchy); 3.x returns three values.
contours, hierarchy = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
keys = [i for i in range(48, 58)]  # ASCII codes for the digits 0-9
# cnts = sorted(contours, key=cv2.contourArea, reverse=True)[:10]

for cnt in contours:
    # if cv2.contourArea(cnt) > 50:
    [x, y, w, h] = cv2.boundingRect(cnt)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
    roi = img[y:y + h, x:x + w]
    roismall = cv2.resize(roi, (10, 10))

cv2.imshow('norm', img)
key = cv2.waitKey(0)
cv2.imwrite('houghlines3.jpg', edges)
Yeah, that is pretty general. I don't know what computer vision is, but I'm guessing it's some software that looks at said thermometer.
So first of all, think in terms of what this software can understand. I'm going to guess that it can pick up on a color change, so you should be able to know when the colors go from red to white (or whatever it is when the thermometer is not red). The program may or may not be smart enough to read the numbers indicating the temperature along the thermometer (I'm assuming this is a vertical thermometer). If the numbers are written on glass or a curved surface, the software probably won't be able to read them. However, if they are black letters on a flat white background, you may be in luck. Can you then find the closest number to where the red transitions to white? If not, you may need to calibrate ahead of time the temperature associated with various heights. In that case you will essentially be ignoring the written numbers and hardcoding them into your program.
Good luck!
Assuming it's a static image, you can calculate a scale of x pixels = y degrees (well, approximately). You can detect the high point of the mercury with simple colour detection: convert the image to HSV, filter with inRange to leave just the red, then find the smallest y value and check it against your scale.
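A sketch of that colour-detection idea (the HSV red ranges and the file name are approximations to tune):

import cv2
import numpy as np

img = cv2.imread('thermometer.jpg')  # hypothetical input file
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Red wraps around the hue axis, so combine two approximate ranges.
mask = cv2.inRange(hsv, (0, 100, 100), (10, 255, 255)) | \
       cv2.inRange(hsv, (170, 100, 100), (180, 255, 255))
ys, xs = np.nonzero(mask)
if ys.size:
    print('Top of the mercury column at row', ys.min())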
Assuming a constant linear scale, where each x pixels along the y-axis equals y degrees, will be easier than trying to detect the digits and read them. Though, to answer your question: as your digits are constant, I'd recommend cropping around the known positions of the numbers and template matching, but that's still pointless, as the numbers won't change, so you can just hardcode the positions!
Or, if it's a real-world scenario, either use a digital thermometer and detect the LCD digits with template matching, or connect a thermometer to the computer.

How to detect simple geometric shapes using OpenCV

I have this project where I need (on iOS) to detect simple geometric shapes inside an image.
After searching the internet I have concluded that the best tool for this is OpenCV. The thing is that up until two hours ago I had no idea what OpenCV is, and I have never done anything even remotely involving image processing. My main experience is JS/HTML, C#, SQL, Objective-C...
Where do I start with this?
I have found this answer, which I was able to digest, and from reading other material I understand that OpenCV should return an array of shapes with their points/corners. Is this true? Also, how will it represent a circle or a half circle?
Also what about the shape orientation?
Do you know of any Demo iOS project that can demonstrate a similar functionality?
If you have only these regular shapes, there is a simple procedure, as follows:
Find contours in the image (the image should be binary, as given in your question).
Approximate each contour using the approxPolyDP function.
Check the number of elements in the approximated contour of each shape; this is what recognizes the shape. E.g., a square will have 4 and a pentagon will have 5. Circles will have more; how many, I don't know, so we find it empirically. (I got 16 for a circle and 9 for a half-circle.)
Now assign a color per shape, run the code on your test image, check each contour's vertex count, and fill it with the corresponding color.
Below is my example in Python:
import numpy as np
import cv2

img = cv2.imread('shapes.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, 1)

# OpenCV 2.x/4.x return (contours, hierarchy); 3.x returns three values.
contours, h = cv2.findContours(thresh, 1, 2)

for cnt in contours:
    # Approximate the contour; the vertex count identifies the shape.
    approx = cv2.approxPolyDP(cnt, 0.01 * cv2.arcLength(cnt, True), True)
    print(len(approx))
    if len(approx) == 5:
        print("pentagon")
        cv2.drawContours(img, [cnt], 0, 255, -1)
    elif len(approx) == 3:
        print("triangle")
        cv2.drawContours(img, [cnt], 0, (0, 255, 0), -1)
    elif len(approx) == 4:
        print("square")
        cv2.drawContours(img, [cnt], 0, (0, 0, 255), -1)
    elif len(approx) == 9:
        print("half-circle")
        cv2.drawContours(img, [cnt], 0, (255, 255, 0), -1)
    elif len(approx) > 15:
        print("circle")
        cv2.drawContours(img, [cnt], 0, (0, 255, 255), -1)

cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Below is the output:
Remember, it works only for regular shapes.
Alternatively, to find circles you can use cv2.HoughCircles. You can find a tutorial here.
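A minimal cv2.HoughCircles sketch (all parameters are rough guesses to tune):

import cv2

img = cv2.imread('shapes.png', cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 5)  # HoughCircles is sensitive to noise
circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=50, param2=30, minRadius=10, maxRadius=100)
if circles is not None:
    print('Circles found:', circles.shape[1])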
Regarding iOS, the OpenCV devs are developing some iOS samples this summer, so visit their site (www.code.opencv.org) and contact them.
You can find slides of their tutorial here: http://code.opencv.org/svn/gsoc2012/ios/trunk/doc/CVPR2012_OpenCV4IOS_Tutorial.pdf
The answer depends on the presence of other shapes, level of noise if any and invariance you want to provide for (e.g. rotation, scaling, etc). These requirements will define not only the algorithm but also required pre-procesing stages to extract features.
Template matching, as suggested above, works well when shapes aren't rotated or scaled and when there are no similar shapes around; in other words, it finds the best translation in the image at which the template is located:
double minVal, maxVal;
Point minLoc, maxLoc;
Mat image, templ, result;  // "templ": "template" is a reserved word in C++
matchTemplate(image, templ, result, CV_TM_CCOEFF_NORMED);
minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc); // maxLoc is the answer
Geometric hashing is a good method to get invariance in terms of rotation and scaling; this method would require extraction of some contour points.
Generalized Hough transform can take care of invariance, noise and would have minimal pre-processing but it is a bit harder to implement than other methods. OpenCV has such transforms for lines and circles.
In the case when the number of shapes is limited, calculating moments or counting convex hull vertices may be the easiest solution: OpenCV structural analysis
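For the moments/convex-hull idea, a small sketch (the threshold value and the OpenCV 4.x findContours signature are assumptions):

import cv2

img = cv2.imread('shapes.png', cv2.IMREAD_GRAYSCALE)
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    hull = cv2.convexHull(cnt)
    # Few hull vertices suggest a polygon; many suggest a circle-like shape.
    print('hull vertices:', len(hull), 'area:', cv2.moments(cnt)['m00'])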
You can also use template matching to detect shapes inside an image.
