I am trying to do OCR of vehicles such as trains or trucks to identify the numbers and characters written on them. (Please note this is not license plate identification OCR)
I took this image. The idea is to be able to extract the text - BN SF 721 734 written on it.
For pre-processing, I first converted this image to grayscale and then converted it to a binarized image which looks something like this
I wrote some code in tesseract.
myimg = "image.png"
image = Image.open(myimg)
with PyTessBaseAPI() as api:
api.SetImage(image)
api.Recognize()
words = api.GetUTF8Text()
print words
print api.AllWordConfidences()
This code gave me a blank output with a confidence value of 95 which means that tesseract was 95% confident that no text exists in this image.
Then I used the setrectangle api in Tesseract to restrict OCR on a particular window within the image instead of trying to do OCR on the entire image.
myimg = "image.png"
image = Image.open(myimg)
with PyTessBaseAPI() as api:
api.SetImage(image)
api.SetRectangle(665,445,75,40)
api.Recognize()
words = api.GetUTF8Text()
print words
print api.AllWordConfidences()
print "----"
The coordinates 665, 445, 75 and 40 correspond to a rectangle which contains the text BNSF 721 734 in the image.
665 - top, 445- left, 75 - width and 40 - height.
The output I got was this:
an s
m,m
My question is how do I improve the results? I played around with the values in the setrectangle function and the results varied a bit but all of them were equally bad.
Is there a way to improve this?
If you are interested in how I converted the images to binarized images, I used OpenCV
img = cv2.imread(image)
grayscale_img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
(thresh, im_bw) = cv2.threshold(grayscale_img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
thresh = 127
binarized_img = cv2.threshold(grayscale_img, thresh, 255, cv2.THRESH_BINARY)[1]
I suggest finding the contours in your cropped rectangle and setting some parameters to match the contours of your characters. For example: contours with area larger or smaller then some thresholds. Then draw one by one contour on an empty bitmap and perform OCR.
I know it's seems like a lot of work, but it gives you better and more robust results.
Good luck!
Related
So I am working on a project in which it is necessary to read characters off of license plates. Given an image of (just) the license plate I'm using openCV to segment the characters and get their bounding boxes. Then the individual characters are cut out and I'd like to use Tesseract to recognize what the characters are.
Problem is: I'm getting really bad results, even though the characters seem perfectly cut out by openCV. I've included some example images below. Tesseract either fails to detect any character at all, or detects entirely wrong characters (I don't mean it confuses a 0 with an O, or 1 and l...it, detects 7, as an example, if there is a 4 clearly visible).
Is there anything I am doing wrong, or have I misunderstood the options I am setting? Help would be greatly appreciated, as I'm not seeing why Tesseract shouldn't recognize these characters.
(I'm using Tesseract OCR v4, in the LSTM mode)
You can recognize by pytesseract in two-steps
Applying adaptive-threshold
Setting page-segmentation-mode to 6
Adaptive-threshold
Here, the algorithm determines the threshold for a pixel based on a small region around it. So we get different thresholds for different regions of the same image which gives better results for images with varying illumination. source
Adaptive-threshold result below
Adaptive-threshold result below
pytesseract result below
pytesseract result below
4
9
Code:
import cv2
import pytesseract
img_lst = ["four.png", "nine.png"]
for pth in img_lst:
img = cv2.imread(pth)
img = cv2.resize(img, (28, 28))
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 47, 2)
txt = pytesseract.image_to_string(thr, config="--psm 6 digits")
print(txt)
I am planning to use Opencv on Raspberry pi 3 with camera to count lines in the following image
It will be used in machine which produce threads. If one (or more) will be lost it will stop the machine.
Now i am wondering how to do that...?
I will do a loop to capture the images
I will crop the images to see only the part with lines
I will convert it to black&white
how to count them ? in a loop - check pixel value change? Or there is a better/faster idea?
Thank you for advice!
EDIT
P.S.
I used cv2.findContours (answer from Jeru Luke).
I've putted an A4 sheet with black lines in front of camera. I works ok in while loop BUT... i have 43 lines on sheet. When camera detects some differences i wrote the results to file. Sometimes i have 710,800,67 etc.
Pls look on file with values i have https://www.dropbox.com/s/jnn4w8mq3rrtppo/bledy.txt?dl=0
lines....The error is permanet for some few secounds. Tere is noting wronge when i have 43,43,43,43,44,43,43,43 (the one is wrong) because i watch few values before i put error. But when the are hundreds of bad values i have no idea...
I have something relatively simpler. It does not involve any for loops hence requires less time. I used the concept of counting the contours in the image after finding an appropriate threshold. I found the perfect threshold through trial-and-error.
I have the approach in python:
import cv2
path = 'C:/Users/Desktop/stack/contour/'
img = cv2.imread(path + 'lines.png', 0)
cv2.imshow('original Image', img)
ret, thresh = cv2.threshold(img, 80, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('thresh1', thresh)
_, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print('Number of lines:', len(contours))
cv2.waitKey(0)
cv2.destroyAllWindows()
Note:
As you can see there are no for loops involved. There is no need to count the number of pixel changes as well. Each presumed line becomes a contour. Using `len(contours) you get the number of lines present.
Using Hough Line transform would work well only If the lines are straight. Since the lines in the provided image are slanted it won't find perfect lines. This statement is more emphasized in the comments by #MarkSetchell.
Use Hough Lines Transform to detect the lines and just count the number of countours you find.
Here there's a tutorial for your problem (since you didn't specify the Language, this is in python).
OpenCV Tutorial Hough Lines
I'm trying to develop an App that uses Tesseract to recognize text from documents taken by a phone's cam. I'm using OpenCV to preprocess the image for better recognition, applying a Gaussian blur and a Threshold method for binarization, but the result is pretty bad.
Here is the the image I'm using for tests:
And here the preprocessed image:
What others filter can I use to make the image more readable for Tesseract?
I described some tips for preparing images for Tesseract here:
Using tesseract to recognize license plates
In your example, there are several things going on...
You need to get the text to be black and the rest of the image white (not the reverse). That's what character recognition is tuned on. Grayscale is ok, as long as the background is mostly full white and the text mostly full black; the edges of the text may be gray (antialiased) and that may help recognition (but not necessarily - you'll have to experiment)
One of the issues you're seeing is that in some parts of the image, the text is really "thin" (and gaps in the letters show up after thresholding), while in other parts it is really "thick" (and letters start merging). Tesseract won't like that :) It happens because the input image is not evenly lit, so a single threshold doesn't work everywhere. The solution is to do "locally adaptive thresholding" where a different threshold is calculated for each neighbordhood of the image. There are many ways of doing that, but check out for example:
Adaptive gaussian thresholding in OpenCV with cv2.adaptiveThreshold(...,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,...)
Local Otsu's method
Local adaptive histogram equalization
Another problem you have is that the lines aren't straight. In my experience Tesseract can handle a very limited degree of non-straight lines (a few percent of perspective distortion, tilt or skew), but it doesn't really work with wavy lines. If you can, make sure that the source images have straight lines :) Unfortunately, there is no simple off-the-shelf answer for this; you'd have to look into the research literature and implement one of the state of the art algorithms yourself (and open-source it if possible - there is a real need for an open source solution to this). A Google Scholar search for "curved line OCR extraction" will get you started, for example:
Text line Segmentation of Curved Document Images
Lastly: I think you would do much better to work with the python ecosystem (ndimage, skimage) than with OpenCV in C++. OpenCV python wrappers are ok for simple stuff, but for what you're trying to do they won't do the job, you will need to grab many pieces that aren't in OpenCV (of course you can mix and match). Implementing something like curved line detection in C++ will take an order of magnitude longer than in python (* this is true even if you don't know python).
Good luck!
Scanning at 300 dpi (dots per inch) is not officially a standard for OCR (optical character recognition), but it is considered the gold standard.
Converting image to Greyscale improves accuracy in reading text in general.
I have written a module that reads text in Image which in turn process the image for optimum result from OCR, Image Text Reader .
import tempfile
import cv2
import numpy as np
from PIL import Image
IMAGE_SIZE = 1800
BINARY_THREHOLD = 180
def process_image_for_ocr(file_path):
# TODO : Implement using opencv
temp_filename = set_image_dpi(file_path)
im_new = remove_noise_and_smooth(temp_filename)
return im_new
def set_image_dpi(file_path):
im = Image.open(file_path)
length_x, width_y = im.size
factor = max(1, int(IMAGE_SIZE / length_x))
size = factor * length_x, factor * width_y
# size = (1800, 1800)
im_resized = im.resize(size, Image.ANTIALIAS)
temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.jpg')
temp_filename = temp_file.name
im_resized.save(temp_filename, dpi=(300, 300))
return temp_filename
def image_smoothening(img):
ret1, th1 = cv2.threshold(img, BINARY_THREHOLD, 255, cv2.THRESH_BINARY)
ret2, th2 = cv2.threshold(th1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
blur = cv2.GaussianBlur(th2, (1, 1), 0)
ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
return th3
def remove_noise_and_smooth(file_name):
img = cv2.imread(file_name, 0)
filtered = cv2.adaptiveThreshold(img.astype(np.uint8), 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 41,
3)
kernel = np.ones((1, 1), np.uint8)
opening = cv2.morphologyEx(filtered, cv2.MORPH_OPEN, kernel)
closing = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel)
img = image_smoothening(img)
or_image = cv2.bitwise_or(img, closing)
return or_image
Note: this should be a comment to Alex I answer, but it's too long so i put it as answer.
from "An Overview of the Tesseract OCR engine, by Ray Smith, Google Inc." at https://github.com/tesseract-ocr/docs/blob/master/tesseracticdar2007.pdf
"Processing follows a traditional step-by-step
pipeline, but some of the stages were unusual in their
day, and possibly remain so even now. The first step is
a connected component analysis in which outlines of
the components are stored. This was a computationally
expensive design decision at the time, but had a
significant advantage: by inspection of the nesting of
outlines, and the number of child and grandchild
outlines, it is simple to detect inverse text and
recognize it as easily as black-on-white text. Tesseract
was probably the first OCR engine able to handle
white-on-black text so trivially."
So it seems it's not needed to have black text on white background, and should work the opposite too.
You can play around with the configuration of the OCR by changing the --psm and --oem values, in your case specifically I will suggest using
--psm 3
--oem 2
you can also look at the following link for further details
here
I guess you have used the generic approach for Binarization, that is the reason whole image is not binarized uniformly. You can use Adaptive Thresholding technique for binarization. You can also do some skew correction, perspective correction, noise removal for better results.
Refer to this medium article, to know about the above-mentioned techniques along with code samples.
For wavy text like yours there is this fantastic Python code on GitHub, which transforms the text to straight lines: https://github.com/tachylatus/page_dewarp.git (this is the most updated version of MZucker's original post and the mechanics are explained here:https://mzucker.github.io/2016/08/15/page-dewarping.html)
It is possible obtain only 5 objects (one per sign) by applying find_contour (opencv module) in this image: https://docs.google.com/file/d/0ByS6Z5WRz-h2WHEzNnJucDlRR2s/edit ?
Now I obtain 64 objects
After that I want to retrieve Humoments and make a comparison with other images.
For now i'd try only with the same image a little bit translated, for testing it returns they are the same.
My question I how can I obtain only 5 objects for applying humoments or if there are other solutions to calculate humoments fot the image?
import cv2
im = cv2.imread('Sassatelli 1984 n. 165 mod1.jpg')
imgray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(imgray, (0,0), 5)
cv2.imshow('Blur', blur)
cv2.waitKey()
th = 20
edges = cv2.Canny(blur, th, th*3)
cv2.imshow('canny',edges)
cv2.waitKey()
contours, hierarchy = cv2.findContours(edges, cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
print('objects found')
print(len(contours))
cnt = contours[0]
cv2.drawContours(blur,contours,-1,(0,255,0),3)
cv2.imshow('draw contours',blur)
cv2.waitKey()
moments = cv2.moments(cnt)
Case 1: Problem with saving image in jpg format
When you save a black-and-white-only (ie pixel values 0 and 255 only) image in jpg format, there is lossy compression, which changes the pixel values. If you want to see it, create such an image, save it in jpg, open the saved image and zoom to black-white edge. You can see a pixel value change.
So when you find contours, you expect there is only white objects, but in reality, there is some mid-values also, which is also considered as contours. It increases number of contours.
So to avoid this problem,
Better save images in png or any other lossless format etc.
Apply a threshold, (with a values of 127 or as you like) to make image real binary one before finding contours.
This is much more explained here : What does result of 'list(contour)' denote?
Case 2: Problem with white background
OpenCV findcontours() is designed to find white objects in black background. So if your background is white, it is also treated as one object. So invert the image before finding contours.
Case 3 : Problem with holes in objects
If you have holes in your object, it is also considered as an object. So if you want only external boundary of the objects, use cv2.RETR_EXTERNAL flag for findcontours() function.
Sample Code:
import cv2
import numpy as np
img = cv2.imread('sof.jpg')
gray = cv2.imread('sof.jpg',0)
ret,thresh = cv2.threshold(gray,127,255,cv2.THRESH_BINARY_INV)
thresholded and inverted image :
Now find the contours, draw it, check the number of contours:
cv2.drawContours(img,contours,-1,(0,255,0),2)
cv2.imshow('img',img),cv2.waitKey(0),cv2.destroyAllWindows()
Result :
NOTE :
Here, I have taken only external contours. If you want to remove internal holes from these objects, you will need to use cv2.RETR_TREE or cv2.RETR_CCOMP flags, and check their hierarchy, and remove them. It is explained in this link : Contours 5 : Hierarchy
I'm trying to use OpenCV to "parse" screenshots from the iPhone game Blocked. The screenshots are cropped to look like this:
I suppose for right now I'm just trying to find the coordinates of each of the 4 points that make up each rectangle. I did see the sample file squares.c that comes with OpenCV, but when I run that algorithm on this picture, it comes up with 72 rectangles, including the rectangular areas of whitespace that I obviously don't want to count as one of my rectangles. What is a better way to approach this? I tried doing some Google research, but for all of the search results, there is very little relevant usable information.
The similar issue has already been discussed:
How to recognize rectangles in this image?
As for your data, rectangles you are trying to find are the only black objects. So you can try to do a threshold binarization: black pixels are those ones which have ALL three RGB values less than 40 (I've found it empirically). This simple operation makes your picture look like this:
After that you could apply Hough transform to find lines (discussed in the topic I referred to), or you can do it easier. Compute integral projections of the black pixels to X and Y axes. (The projection to X is a vector of x_i - numbers of black pixels such that it has the first coordinate equal to x_i). So, you get possible x and y values as the peaks of the projections. Then look through all the possible segments restricted by the found x and y (if there are a lot of black pixels between (x_i, y_j) and (x_i, y_k), there probably is a line probably). Finally, compose line segments to rectangles!
Here's a complete Python solution. The main idea is:
Apply pyramid mean shift filtering to help threshold accuracy
Otsu's threshold to get a binary image
Find contours and filter using contour approximation
Here's a visualization of each detected rectangle contour
Results
import cv2
image = cv2.imread('1.png')
blur = cv2.pyrMeanShiftFiltering(image, 11, 21)
gray = cv2.cvtColor(blur, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
peri = cv2.arcLength(c, True)
approx = cv2.approxPolyDP(c, 0.015 * peri, True)
if len(approx) == 4:
x,y,w,h = cv2.boundingRect(approx)
cv2.rectangle(image,(x,y),(x+w,y+h),(36,255,12),2)
cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.waitKey()
I wound up just building on my original method and doing as Robert suggested in his comment on my question. After I get my list of rectangles, I then run through and calculate the average color over each rectangle. I check to see if the red, green, and blue components of the average color are each within 10% of the gray and blue rectangle colors, and if they are I save the rectangle, if they aren't I discard it. This process gives me something like this:
From this, it's trivial to get the information I need (orientation, starting point, and length of each rectangle, considering the game window as a 6x6 grid).
The blocks look like bitmaps - why don't you use simple template matching with different templates for each block size/color/orientation?
Since your problem is the small rectangles I would start by removing them.
Since those lines are much thinner than the borders of the rectangles I would start by applying morphological operations on the image.
Using a structural element that looks like this:
element = [ 1 1
1 1 ]
should remove lines that are less than two pixels wide. After the small lines are removed the rectangle finding algorithm of OpenCV will most likely do the rest of the job for you.
The erosion can be done in OpenCV by the function cvErode
Try one of the many corner detectors like harris corner detector. also it is in general a good idea to try that at multiple resolutions : so do some preprocessing of of varying magnification.
It appears that you want some sort of color dominated square then you can suppress the other colors, by first using something like cvsplit .....and then thresholding the color...so only that region remains....follow that with a cropping operation ...I think that could work as well ....