I have a rather small set of images that contain dates. The size might be a problem, but I'd say the quality is OK. I have followed the guidelines to provide the clearest image I can to the engine. After resizing, applying filters, lots of trial and error, etc., I came up with an image that is almost properly read. I put an example below:
Now, this is read as "9 MAR 2021\n\x0c". Not bad, but the leading 2 is not picked up. At this point I think I'm misusing part of the power of Tesseract. After all, I know what it should expect, i.e. something like "%d %b %Y".
Is there a way to tell Tesseract that it should try to find the best match given this strong constraint? Providing this metadata to the engine should make the task much easier. I have been reading the documentation, but I can't find a way to do this.
I'm using pytesseract with Tesseract 4.1 and Python 3.9.
You need to know the following:
Improving the quality of the output
When to apply dilation or erosion
Centering the image
Now if we center the image (by adding borders):
First, we up-sample the image without losing any pixels.
Second, we need to make the characters in the image bolder so the OCR result is accurate.
Now OCR:
29 MAR 2021
Code:
import cv2
import pytesseract
# Load the image
img = cv2.imread("xsGBK.jpg")
# Center the image
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[0, 0, 0])
# Convert to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Dilate
gry = cv2.dilate(gry, None, iterations=1)
# OCR
print(pytesseract.image_to_string(gry))
# Display
cv2.imshow("", gry)
cv2.waitKey(0)
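On the original question of telling Tesseract what pattern to expect: as far as I know there is no way to hand it a date format like "%d %b %Y" directly, but you can narrow the search space with a character whitelist. A minimal, hedged sketch (the --psm 7 "single text line" mode and the whitelist contents are assumptions to tune, and the LSTM engine only partially honors whitelists):

import cv2
import pytesseract

img = cv2.imread("xsGBK.jpg")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Treat the image as a single text line and only allow digits and uppercase letters,
# which covers dates such as "29 MAR 2021". (Assumed config, not from the answer above.)
config = "--psm 7 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(pytesseract.image_to_string(gry, config=config))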
I am totally new to image analysis and have tried a lot with ImageJ and QuPath, but unfortunately I can't find a proper way into it. Here is an example image I would like to quantify:
Does anyone have a recommendation for which software I should use, or for how I can quantify those little "dots" and also find their positions?
I tried it with ImageJ, but the image quality is so bad that it does not allow thresholding…
With thresholding it seems impossible to quantify just the “little dots”…
Kind regards
You can solve it in a simple way using OpenCV, or you can go further with a more sophisticated approach like this one.
import cv2
import numpy as np

img = cv2.imread('img.png')

# Threshold the grayscale image: everything below th becomes black background,
# everything else becomes a white "dot" pixel.
mask = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
th = 35
mask[mask < th] = 0
mask[mask > 0] = 255

# Show the original and the mask side by side.
mask = np.stack([mask, mask, mask], axis=2)
result = np.hstack((img, mask))
cv2.namedWindow("peaks", cv2.WINDOW_NORMAL)
cv2.imshow("peaks", result)
cv2.waitKey(0)
cv2.destroyAllWindows()
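To actually quantify the dots and get their positions (not just display the mask), a possible follow-up is a connected-component analysis on the same mask. This is a sketch, not part of the answer above; the area bounds are assumptions you would tune to your image:

import cv2

img = cv2.imread('img.png')
mask = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
th = 35
mask[mask < th] = 0
mask[mask > 0] = 255

# Label connected white blobs in the binary mask and collect their stats.
n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask, connectivity=8)

dots = []
min_area, max_area = 2, 200          # assumed bounds for "little dots", tune for your data
for lbl in range(1, n):              # label 0 is the background
    if min_area <= stats[lbl, cv2.CC_STAT_AREA] <= max_area:
        dots.append(tuple(centroids[lbl]))

print(f"Found {len(dots)} dots")
for cx, cy in dots:
    print(f"dot at x={cx:.1f}, y={cy:.1f}")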
I find cv2 complicated and slow. I would do something like this:
from osgeo import gdal
thresh = 100
raster = gdal.Open(r'image.jpg')
# Extract a raster band (each band is a primary colour).
# ReadAsArray() also accepts a window (xoff, yoff, xsize, ysize); leave it empty to read the whole image.
img = raster.GetRasterBand(1).ReadAsArray()
binary_img = img > thresh
GDAL is a great package, but it has many dependencies and is a bit harder to install, especially on Windows.
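To get from binary_img to a dot count and positions, a possible follow-up with scipy (a sketch under the assumption that scipy is installed; it reuses binary_img from the snippet above):

from scipy import ndimage

# Label each connected True-region in the binary image.
labeled, num_dots = ndimage.label(binary_img)
positions = ndimage.center_of_mass(binary_img, labeled, range(1, num_dots + 1))

print(f"Found {num_dots} dots")
for y, x in positions:
    print(f"dot at x={x:.1f}, y={y:.1f}")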
So I am working on a project in which it is necessary to read characters off of license plates. Given an image of (just) the license plate I'm using openCV to segment the characters and get their bounding boxes. Then the individual characters are cut out and I'd like to use Tesseract to recognize what the characters are.
Problem is: I'm getting really bad results, even though the characters seem perfectly cut out by OpenCV. I've included some example images below. Tesseract either fails to detect any character at all, or detects entirely wrong characters (I don't mean it confuses a 0 with an O, or a 1 with an l... it detects a 7, for example, where a 4 is clearly visible).
Is there anything I am doing wrong, or have I misunderstood the options I am setting? Help would be greatly appreciated, as I'm not seeing why Tesseract shouldn't recognize these characters.
(I'm using Tesseract OCR v4, in the LSTM mode)
You can recognize the characters with pytesseract in two steps:
Applying adaptive thresholding
Setting the page segmentation mode to 6
Adaptive-threshold
Here, the algorithm determines the threshold for a pixel based on a small region around it. So we get different thresholds for different regions of the same image which gives better results for images with varying illumination. source
The adaptive-threshold results for the two sample images are shown below, and pytesseract reads them as:
4
9
Code:
import cv2
import pytesseract
img_lst = ["four.png", "nine.png"]
for pth in img_lst:
    img = cv2.imread(pth)
    img = cv2.resize(img, (28, 28))
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY_INV, 47, 2)
    txt = pytesseract.image_to_string(thr, config="--psm 6 digits")
    print(txt)
I'm working on a project which involves determining the volume of a transparent liquid (or air if it proves easier) in a confined space.
The images I'm working with are a background image of the container without any liquid and a foreground image which may be also be empty in rare cases, but most times is partly filled with some amount of liquid.
While it may seem like a pretty straightforward smooth and threshold approach, it proves somewhat more difficult.
I'm working with a set with tons of these image pairs of background and foreground images, and I can't seem to find an approach that is robust enough to be applied to all images in the set.
My work so far involves smoothing and thresholding the image and applying closing to wrap it up.
import cv2 as cv
import numpy as np

bg_image = cv.imread("bg_image", 0)
fg_image = cv.imread("fg_image", 0)

# Smooth, threshold, and close the foreground image.
blur_fg = cv.GaussianBlur(fg_image, (5, 5), sigmaX=0, sigmaY=0)
thresholded_image = cv.threshold(blur_fg, 186, 255, cv.THRESH_BINARY_INV)[1]
kernel = np.ones((4, 2), np.uint8)
closing = cv.morphologyEx(thresholded_image, cv.MORPH_CLOSE, kernel)
The results vary, here is an example when it goes well:
In other examples, it doesn't go as well:
Aside from that, I have also tried:
Subtraction of the background and foreground images
Contrast stretching
Histogram equalization
Other thresholding techniques such as Otsu
The main issue is that the pixel intensities in air and liquid sometimes overlap (and the contrast is pretty low in general), causing inaccurate estimations. I am leaning towards utilizing the edge that occurs between the liquid and the air, but I'm not really sure how.
I don't want to overflow with information here so I'm leaving it at that. I am grateful for any suggestions and can provide more information if necessary.
EDIT:
Here are some sample images to play around with.
Here is an approach whereby you calculate the mean of each column of pixels in your image, then calculate the gradient of the means:
#!/usr/bin/env python3
import cv2
import numpy as np
import matplotlib.pyplot as plt
filename = 'fg1.png'
# Load image as greyscale and calculate means of each column of pixels
im = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
means = np.mean(im, axis=0)
# Calculate the gradient of the means
y = np.gradient(means)
# Plot the gradient of the means
xdata = np.arange(0, y.shape[0])
plt.plot(xdata, y, 'bo') # blue circles
plt.title(f'Gradient of Column Means for "{filename}"')
plt.xlabel('x')
plt.ylabel('Gradient of Column Means')
plt.grid(True)
plt.show()
If you just plot the means of all columns, without taking the gradient, you get this:
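Once you have the gradient of the column means, one way to turn it into a number is to take the column where the absolute gradient peaks as the liquid/air boundary. This is a hedged sketch of that idea (it assumes a single dominant vertical edge in the image, which will not hold for every image in the set):

import cv2
import numpy as np

im = cv2.imread('fg1.png', cv2.IMREAD_GRAYSCALE)
means = np.mean(im, axis=0)
gradient = np.gradient(means)

# The column with the strongest change in mean intensity is a candidate boundary.
boundary_col = int(np.argmax(np.abs(gradient)))
fill_fraction = boundary_col / im.shape[1]
print(f"Strongest edge at column {boundary_col} of {im.shape[1]} ({fill_fraction:.1%} across the image)")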
I am trying to do OCR of vehicles such as trains or trucks to identify the numbers and characters written on them. (Please note this is not license plate identification OCR)
I took this image. The idea is to be able to extract the text - BN SF 721 734 written on it.
For pre-processing, I first converted this image to grayscale and then converted it to a binarized image which looks something like this
I wrote some code using the Tesseract Python API:
from PIL import Image
from tesserocr import PyTessBaseAPI

myimg = "image.png"
image = Image.open(myimg)
with PyTessBaseAPI() as api:
    api.SetImage(image)
    api.Recognize()
    words = api.GetUTF8Text()
    print(words)
    print(api.AllWordConfidences())
This code gave me a blank output with a confidence value of 95 which means that tesseract was 95% confident that no text exists in this image.
Then I used the SetRectangle API in Tesseract to restrict OCR to a particular window within the image instead of trying to do OCR on the entire image.
myimg = "image.png"
image = Image.open(myimg)
with PyTessBaseAPI() as api:
    api.SetImage(image)
    api.SetRectangle(665, 445, 75, 40)
    api.Recognize()
    words = api.GetUTF8Text()
    print(words)
    print(api.AllWordConfidences())
    print("----")
The coordinates 665, 445, 75 and 40 correspond to a rectangle which contains the text BNSF 721 734 in the image.
665 is the top, 445 the left, 75 the width, and 40 the height.
The output I got was this:
an s
m,m
My question is: how do I improve the results? I played around with the values in the SetRectangle function and the results varied a bit, but all of them were equally bad.
Is there a way to improve this?
If you are interested in how I converted the images to binarized images, I used OpenCV:

import cv2

img = cv2.imread(myimg)
grayscale_img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
(thresh, im_bw) = cv2.threshold(grayscale_img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
thresh = 127
binarized_img = cv2.threshold(grayscale_img, thresh, 255, cv2.THRESH_BINARY)[1]
I suggest finding the contours in your cropped rectangle and setting some parameters to match the contours of your characters, for example contours with an area larger or smaller than some thresholds. Then draw the contours one by one on an empty bitmap and perform OCR.
I know it seems like a lot of work, but it gives you better and more robust results.
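A rough sketch of that idea with OpenCV and pytesseract (the file name, area bounds, and --psm 10 single-character mode are assumptions to tune; it also assumes the OpenCV 4 findContours return signature):

import cv2
import numpy as np
import pytesseract

# Hypothetical cropped plate region, binarized so the characters become white.
img = cv2.imread("plate.png", cv2.IMREAD_GRAYSCALE)
_, thr = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(thr, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

min_area, max_area = 50, 5000   # assumed bounds for character-sized contours
for cnt in sorted(contours, key=lambda c: cv2.boundingRect(c)[0]):  # left to right
    if not (min_area <= cv2.contourArea(cnt) <= max_area):
        continue
    # Draw this contour alone on an empty bitmap and OCR it as a single character.
    canvas = np.zeros_like(thr)
    cv2.drawContours(canvas, [cnt], -1, 255, thickness=cv2.FILLED)
    canvas = cv2.bitwise_not(canvas)  # Tesseract prefers dark text on a white background
    txt = pytesseract.image_to_string(canvas, config="--psm 10")
    print(txt.strip())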
Good luck!
I'm trying to develop an app that uses Tesseract to recognize text from documents taken by a phone's camera. I'm using OpenCV to preprocess the image for better recognition, applying a Gaussian blur and a thresholding method for binarization, but the result is pretty bad.
Here is the the image I'm using for tests:
And here the preprocessed image:
What other filters can I use to make the image more readable for Tesseract?
I described some tips for preparing images for Tesseract here:
Using tesseract to recognize license plates
In your example, there are several things going on...
You need to get the text to be black and the rest of the image white (not the reverse). That's what character recognition is tuned on. Grayscale is ok, as long as the background is mostly full white and the text mostly full black; the edges of the text may be gray (antialiased) and that may help recognition (but not necessarily - you'll have to experiment)
One of the issues you're seeing is that in some parts of the image, the text is really "thin" (and gaps in the letters show up after thresholding), while in other parts it is really "thick" (and letters start merging). Tesseract won't like that :) It happens because the input image is not evenly lit, so a single threshold doesn't work everywhere. The solution is to do "locally adaptive thresholding" where a different threshold is calculated for each neighborhood of the image. There are many ways of doing that, but check out for example:
Adaptive Gaussian thresholding in OpenCV with cv2.adaptiveThreshold(..., cv2.ADAPTIVE_THRESH_GAUSSIAN_C, ...) (a minimal sketch follows this list)
Local Otsu's method
Local adaptive histogram equalization
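A minimal example of the first option, adaptive Gaussian thresholding with OpenCV; the file name, block size, and constant are assumptions you would tune for your own photos:

import cv2

img = cv2.imread("document.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input photo

# A different threshold is computed for each pixel from a Gaussian-weighted
# neighborhood, which copes with uneven lighting across the page.
binarized = cv2.adaptiveThreshold(img, 255,
                                  cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                  cv2.THRESH_BINARY,
                                  blockSize=31, C=10)   # assumed values, tune per image

cv2.imwrite("binarized.jpg", binarized)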
Another problem you have is that the lines aren't straight. In my experience Tesseract can handle a very limited degree of non-straight lines (a few percent of perspective distortion, tilt or skew), but it doesn't really work with wavy lines. If you can, make sure that the source images have straight lines :) Unfortunately, there is no simple off-the-shelf answer for this; you'd have to look into the research literature and implement one of the state of the art algorithms yourself (and open-source it if possible - there is a real need for an open source solution to this). A Google Scholar search for "curved line OCR extraction" will get you started, for example:
Text line Segmentation of Curved Document Images
Lastly: I think you would do much better to work with the Python ecosystem (ndimage, skimage) than with OpenCV in C++. The OpenCV Python wrappers are ok for simple stuff, but for what you're trying to do they won't do the job; you will need to grab many pieces that aren't in OpenCV (of course you can mix and match). Implementing something like curved line detection in C++ will take an order of magnitude longer than in Python (* this is true even if you don't know Python).
Good luck!
Scanning at 300 dpi (dots per inch) is not officially a standard for OCR (optical character recognition), but it is considered the gold standard.
Converting the image to grayscale generally improves accuracy in reading text.
I have written a module that reads text in an image and processes the image for optimal OCR results: Image Text Reader.
import tempfile
import cv2
import numpy as np
from PIL import Image

IMAGE_SIZE = 1800
BINARY_THRESHOLD = 180

def process_image_for_ocr(file_path):
    # TODO : Implement using opencv
    temp_filename = set_image_dpi(file_path)
    im_new = remove_noise_and_smooth(temp_filename)
    return im_new

def set_image_dpi(file_path):
    im = Image.open(file_path)
    length_x, width_y = im.size
    factor = max(1, int(IMAGE_SIZE / length_x))
    size = factor * length_x, factor * width_y
    # size = (1800, 1800)
    im_resized = im.resize(size, Image.ANTIALIAS)
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.jpg')
    temp_filename = temp_file.name
    im_resized.save(temp_filename, dpi=(300, 300))
    return temp_filename

def image_smoothening(img):
    ret1, th1 = cv2.threshold(img, BINARY_THRESHOLD, 255, cv2.THRESH_BINARY)
    ret2, th2 = cv2.threshold(th1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    blur = cv2.GaussianBlur(th2, (1, 1), 0)
    ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return th3

def remove_noise_and_smooth(file_name):
    img = cv2.imread(file_name, 0)
    filtered = cv2.adaptiveThreshold(img.astype(np.uint8), 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                     cv2.THRESH_BINARY, 41, 3)
    kernel = np.ones((1, 1), np.uint8)
    opening = cv2.morphologyEx(filtered, cv2.MORPH_OPEN, kernel)
    closing = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel)
    img = image_smoothening(img)
    or_image = cv2.bitwise_or(img, closing)
    return or_image
Note: this should be a comment on Alex I's answer, but it's too long, so I am posting it as an answer.
From "An Overview of the Tesseract OCR Engine" by Ray Smith, Google Inc., at https://github.com/tesseract-ocr/docs/blob/master/tesseracticdar2007.pdf:
"Processing follows a traditional step-by-step
pipeline, but some of the stages were unusual in their
day, and possibly remain so even now. The first step is
a connected component analysis in which outlines of
the components are stored. This was a computationally
expensive design decision at the time, but had a
significant advantage: by inspection of the nesting of
outlines, and the number of child and grandchild
outlines, it is simple to detect inverse text and
recognize it as easily as black-on-white text. Tesseract
was probably the first OCR engine able to handle
white-on-black text so trivially."
So it seems it's not necessary to have black text on a white background; it should work the other way around too.
You can play around with the configuration of the OCR by changing the --psm and --oem values. In your case specifically, I suggest using:
--psm 3
--oem 2
You can also look at the following link for further details: here
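With pytesseract, those values can be passed through the config argument. A minimal sketch (the file name is a placeholder):

import pytesseract
from PIL import Image

img = Image.open("image.png")   # hypothetical input
# --psm 3: fully automatic page segmentation; --oem 2: legacy + LSTM engines combined
text = pytesseract.image_to_string(img, config="--psm 3 --oem 2")
print(text)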
I guess you have used a generic approach to binarization; that is why the whole image is not binarized uniformly. You can use adaptive thresholding for binarization. You can also do some skew correction, perspective correction, and noise removal for better results.
Refer to this Medium article to learn about the above-mentioned techniques, along with code samples.
For wavy text like yours, there is this fantastic Python code on GitHub which transforms the text into straight lines: https://github.com/tachylatus/page_dewarp.git (this is the most up-to-date version of MZucker's original post, and the mechanics are explained here: https://mzucker.github.io/2016/08/15/page-dewarping.html)