I'm building an iOS application (take a picture and run OCR on it) using Tesseract (an OCR library), and it works very well on well-printed numbers and characters in common fonts.
The problem I am having is that if I try it on a 7-segment display, it gives very poor results.
So my question is: Does anyone know how I can approach this problem? Is there a way for Tesseract to recognize these characters?
I too had great difficulty getting Tesseract to recognize digits from images of LCD displays.
I had some marginal success by preprocessing the images with ImageMagick to overlay a copy of the image on itself with a slight vertical shift to fill in the gaps between segments:
$ composite -compose Multiply -geometry +0+3 foo.tif foo.tif foo2.png
In the end, though, my saving grace was the "Seven Segment Optical Character Recognition" binary: http://www.unix-ag.uni-kl.de/~auerswal/ssocr/
Many thanks to the author, Erik Auerswald, for this code!
I haven't tried OCRing a 7-segment display, but I suspect the problem is that the characters are not connected components; in my experience Tesseract does not handle disconnected glyphs well.
Simple erosion (image preprocessing) might help by connecting the segments, but you would have to test it and play with the kernel size to avoid distorting the digits too much.
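For example, here is a minimal OpenCV sketch of that idea (assuming dark digits on a light background; the kernel size is a guess you would tune per display):

import cv2

# Load the display photo and binarize it so the digit segments become
# white foreground on a black background (Otsu picks the threshold).
img = cv2.imread("display.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Dilate with a small kernel to bridge the gaps between segments so each
# digit becomes one connected component. Too large a kernel merges
# neighbouring digits; too small and the gaps remain.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 5))
connected = cv2.dilate(binary, kernel, iterations=1)

# Invert back to dark-text-on-light before handing the image to Tesseract.
cv2.imwrite("display_connected.png", cv2.bitwise_not(connected))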
I am trying to read the following captcha images with ImageMagick, with no success so far. I am happy to use either ImageMagick or OpenCV to solve this captcha.
So far I have tried erode, Gaussian blur and the paint function, but I am still not getting the whole word out cleanly before Tesseract can process the image.
I have also tried Tesseract's character whitelist, but I guess it needs a cleaner image before it can even use that whitelist.
The best that i have reached is this image:
Command used: magick.exe c:\e793df3c-b831-11e6-88e4-544635854505.jpg -negate -morphology erode rectangle:1 -negate -threshold 25% -paint 1 c:\ofdbmf-2.jpg
Is it impossible?
For those who are interested:
There are two ways to accomplish it:
Method #1: If you have the captcha source available
If you already have the source available, you can look up the fonts that the source is using.
Since we have the source code, we can modify it to save out a large number (probably more than 10,000) of CAPTCHA images along with the expected answer for each image.
You can use a simple ‘for’ loop and save all pictures with correct answer as the filename.
This will be your training data.
Then, from here, split the image into individual letters and map each one back to the corresponding letter in the filename; that way you will have many images of each letter in different angles and shapes. You can use OpenCV here: threshold the image and then find the contours.
One problem that you might face here is that you would have overlapping letters, for that a simple hack here is to say that if a single contour area is a lot wider than it is tall, that means we probably have two letters squished together. In that case, we can just split the conjoined letter in half down the middle and treat it as two separate letters.
Now that we have a way to extract individual letters, you can run it across all the CAPTCHA images. The goal is to collect different variations of each letter. We can save each letter in its own folder to keep things organized.
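A rough OpenCV (4.x) sketch of that extraction step; the file layout and the width/height heuristic are assumptions you would adapt to your captchas:

import os
import cv2

def extract_letters(captcha_path, out_dir):
    # The captcha filename is assumed to hold the correct answer, e.g. "A3FK.png".
    answer = os.path.splitext(os.path.basename(captcha_path))[0]

    gray = cv2.imread(captcha_path, cv2.IMREAD_GRAYSCALE)
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # OpenCV 4.x returns (contours, hierarchy)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        # Heuristic from above: a contour much wider than tall is probably
        # two touching letters, so split it down the middle.
        if w / float(h) > 1.25:
            boxes.append((x, y, w // 2, h))
            boxes.append((x + w // 2, y, w // 2, h))
        else:
            boxes.append((x, y, w, h))

    boxes.sort(key=lambda b: b[0])  # left-to-right order matches the answer string
    for (x, y, w, h), letter in zip(boxes, answer):
        letter_dir = os.path.join(out_dir, letter)
        os.makedirs(letter_dir, exist_ok=True)
        count = len(os.listdir(letter_dir))
        cv2.imwrite(os.path.join(letter_dir, "%d.png" % count), gray[y:y + h, x:x + w])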
Finally, you can use a simple convolutional neural network architecture with two convolutional layers and two fully connected layers (a rough sketch follows below).
With enough training data, this approach can get you close to a 100% success rate in identifying the captcha letters/numbers.
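As a rough Keras sketch of a network along those lines (the input size and class count are assumptions to adapt to your letter crops):

from tensorflow.keras import layers, models

NUM_CLASSES = 32           # however many distinct letters/digits your captcha uses
INPUT_SHAPE = (20, 20, 1)  # resized single-letter crops, grayscale

model = models.Sequential([
    layers.Conv2D(20, (5, 5), padding="same", activation="relu", input_shape=INPUT_SHAPE),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(50, (5, 5), padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(500, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, batch_size=32)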
Method #2: If you don't have the source
This requires a lot more work. To start with, make sure you have a background in:
1) Python
2) Keras
3) TensorFlow
4) OpenCV
If you do, your first step is to download as many captcha images as you can. I usually look in the Network tab of the Chrome developer tools, find the path the captchas are served from, and then download them in a loop (a rough sketch follows below).
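For example, a minimal download loop (the URL is a placeholder for whatever path you find in the Network tab):

import os
import requests

CAPTCHA_URL = "https://example.com/captcha"  # placeholder: the path found in the Network tab

os.makedirs("captchas", exist_ok=True)
for i in range(10000):
    resp = requests.get(CAPTCHA_URL)
    if resp.ok:
        # Each request usually returns a freshly generated captcha image.
        with open("captchas/raw_%05d.jpg" % i, "wb") as f:
            f.write(resp.content)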
Then use OpenCV to distill the images you have downloaded: create blobs, threshold, and find contours (the same extraction step sketched in Method #1).
Finally comes the training part, followed by testing and validation.
For more info: https://mathematica.stackexchange.com/questions/143691/crack-captcha-using-deep-learning?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
I have to recognize letters drawn on the screen. For this I am taking an image of the letter drawn on the screen and passing it to Tesseract. Only about 20% of the results are correct; it does not recognize all the characters. How can I train Tesseract to recognize characters drawn on the screen correctly in iOS?
Thanks
Not only with Tesseract: whatever SDK you use, you will not get 100% accurate results. Accuracy depends mostly on the clarity of the image; the clearer the image, the higher the accuracy of the output.
Suggestion
Tesseract gave me very low accuracy. I had a really bad experience finding something that would recognize text in images for me, until I found Vuforia.
It is really great software, based on OpenGL and well documented, but since it is non-ARC it is a bit difficult to modify its classes, so you will need to do some research.
Here you can also check a review of what it does: Vuforia's Power
I have to OCR a table from a PDF document. I wrote a simple Python + OpenCV script to get the individual cells. After that, a new problem arose: the text is antialiased and not of good quality.
Tesseract's recognition rate is very low. I've tried to preprocess the images with adaptive thresholding, but the results weren't much better.
I've tried a trial version of ABBYY FineReader and indeed it gives fine output, but I don't want to use non-free software.
I wonder if some preprocessing would solve the issue, or whether it is necessary to write and train another OCR system.
If you look closely at your antialiased text samples, you'll notice that the edges contain a lot of red and blue:
This suggests that the antialiasing is taking place inside your computer, which has used subpixel rendering to optimise the results for your LCD monitor.
If so, it should be quite easy to extract the text at a higher resolution. For example, you can use ImageMagick to extract images from PDF files at 300 dpi by using a command line like the following:
convert -density 300 source.pdf output.png
You could even try loading the PDF in your favourite viewer and copying the text directly to the clipboard.
Addendum:
I tried converting your sample text back into its original pixels and applying the scaling technique mentioned in the comments. Here are the results:
Original image:
After scaling 300% and applying simple threshold:
After smart scaling and thresholding:
As you can see, some of the letters are still a bit malformed, but I think there's a better chance of reading this with Tesseract.
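If you want to reproduce that scaling step programmatically, here is a minimal OpenCV sketch; the scale factor and interpolation choice are guesses you would tune:

import cv2

gray = cv2.imread("cell.png", cv2.IMREAD_GRAYSCALE)

# Scale up 3x before binarizing; cubic interpolation keeps the antialiased
# edges smooth, so the threshold produces cleaner letter shapes.
big = cv2.resize(gray, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
_, clean = cv2.threshold(big, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("cell_for_tesseract.png", clean)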
I'm developing an app which can recognize license plates (ANPR). The first step is to extract the license plates from the image. I am using OpenCV to detect the plates based on width/height ratio, and this works pretty well:
But as you can see, the OCR results are pretty bad.
I am using tesseract in my Objective C (iOS) environment. These are my init variables when starting the engine:
// Init the Tesseract engine with the trained-data path and language.
tesseract = new tesseract::TessBaseAPI();
int initRet = tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], [language UTF8String]);
// Restrict recognition to the characters that can appear on a plate.
tesseract->SetVariable("tessedit_char_whitelist", "BCDFGHJKLMNPQRSTVWXYZ0123456789-");
// Penalize words that are not in the (frequent) dictionary.
tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
tesseract->SetVariable("language_model_penalty_non_dict_word", "1"); // note: no trailing space in the variable name
// Don't load the system dictionary; plates are not dictionary words.
tesseract->SetVariable("load_system_dawg", "0");
How can I improve the results? Do I need to let OpenCV do more image manipulation? Or is there something I can improve with tesseract?
Two things will fix this completely:
Remove everything which is not text from the image. You need to use some computer vision to find the plate area (for example by color, etc.) and then mask out all of the background. You want the input to Tesseract to be black and white, where the text is black and everything else is white.
Remove skew (as mentioned by FrankPI above). Tesseract is actually supposed to cope with skew (see the "Tesseract OCR Engine" overview by R. Smith), but on the other hand it doesn't always work, especially if you have a single line as opposed to a few paragraphs. So removing skew manually first is always good, if you can do it reliably. You will probably know the exact shape of the bounding trapezoid of the plate from step 1, so this should not be too hard. While removing skew you can also remove perspective: all license plates (usually) use the same font, and if you warp them to the same perspective-free shape, the letter shapes will be exactly the same, which helps text recognition.
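A minimal sketch of that perspective removal, assuming you already have the four plate corners from step 1 (the corner order and output size are assumptions):

import cv2
import numpy as np

def rectify_plate(image, corners, width=520, height=110):
    # `corners` are the four plate corners found in step 1, ordered
    # top-left, top-right, bottom-right, bottom-left.
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]], dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)
    # Warp the quadrilateral plate region to a flat, fixed-size rectangle,
    # removing both skew and perspective in one step.
    return cv2.warpPerspective(image, matrix, (width, height))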
Some further pointers...
Don't try to code this at first: take a really easy to OCR (ie: from directly in front, no perspective) picture of a plate, edit it in photoshop (or gimp) and run it through tesseract on the commandline. Keep editing in different ways until this works. For example: select by color (or flood select the letter shapes), fill with black, invert selection, fill with white, perspective transform so corners of plate are a rectangle, etc. Take a bunch of pictures, some harder (maybe from odd angles, etc). Do this with all of them. Once this works completely, think about how to make a CV algorithm that does the same thing you did in photoshop :)
P.S. Also, it is better to start with a higher-resolution image if possible. It looks like the text in your example is around 14 pixels tall. Tesseract works pretty well with 12-point text at 300 dpi, which is about 50 pixels tall, and it works much better at 600 dpi. Try to make your letter height at least 50, and preferably 100, pixels.
P.P.S. Are you doing anything to train Tesseract? I think you have to: the font here is different enough to be a problem. You probably also need something to recognize (and not penalize) dashes, which will be very common in your texts; it looks like in the second example "T-" is recognized as H.
I don't know Tesseract very well, but I have some information about OCR in general. Here we go.
In an OCR task you need to be sure that your training data uses the same font that you are trying to recognize. Or, if you are trying to recognize multiple fonts, be sure that you have those fonts in your training data to get the best performance.
As far as I know, Tesseract applies OCR in a few different ways: in one, you give it an image with multiple letters in it and let Tesseract do the segmentation; in the other, you give it already-segmented letters and only expect it to recognize each letter. Maybe you can try switching the mode you are using (a rough example follows below).
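For example, with the pytesseract wrapper you can switch between those two modes through Tesseract's page segmentation mode (the file names here are placeholders):

import pytesseract
from PIL import Image

# --psm 7: treat the image as a single text line (Tesseract segments the letters).
line_text = pytesseract.image_to_string(Image.open("word.png"), config="--psm 7")

# --psm 10: treat the image as a single character (you did the segmentation yourself).
char_text = pytesseract.image_to_string(Image.open("letter.png"), config="--psm 10")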
If you are training the recognizer yourself, be sure that you have enough examples, and an equal number of each letter, in your training data.
Hope this helps.
I've been working on an iOS app; if you need to improve the results you should train Tesseract OCR. This improved my results by 90%; before training, the OCR results were pretty bad.
So I used this gist in the past to train Tesseract OCR with a license plate font.
If you are interested, I open-sourced this project some weeks ago on GitHub.
Here is my real-world example of trying OCR on my old power meter. I would like to use your OpenCV code so that OpenCV does the automatic cropping of the image, and I'll write the image-cleaning scripts.
The first image is the original (cropped power meter numbers).
The second image is slightly cleaned up in GIMP, giving around 50% OCR accuracy in Tesseract.
The third image is the completely cleaned image: 100% recognized by OCR without any training!
License plates can now be recognized easily with an ML model. I have created the Core ML model; you can find it here. You just need to split out the characters at 28×28 resolution through the Vision framework and send each image to a VNImageRequestHandler, as shown below:
let handler = VNImageRequestHandler(cgImage: imageUI.cgImage!, options: [:])
You will get the desired results by using my Core ML model. Use this link for better clarification, but use my model, which I created specifically for license plate recognition, for better results.
I hope someone will be able to help me.
I have pairs of black-and-white images that resulted from scanning texts with a large scanner (the resulting files are up to 500 MB). The texts scanned are nearly identical, and I need to check whether there are any substantial differences.
Obviously I cannot compare pixel by pixel, since scanning the same page into a BMP gives a slightly different result every time I scan.
Does anyone know of any library, open source or commercial, that I can buy or download and build a .NET application around?
Thank you in advance for your help.
Helen.
Use perceptual hashing. It checks whether two images are similar.
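For example, with the imagehash library (a sketch; the distance threshold is something you would tune on scans you know are identical):

from PIL import Image
import imagehash

# Perceptual hashes of visually similar images differ in only a few bits,
# so the Hamming distance between them works as a similarity score.
hash_a = imagehash.phash(Image.open("scan_a.png"))
hash_b = imagehash.phash(Image.open("scan_b.png"))

distance = hash_a - hash_b  # Hamming distance between the two hashes
print("similar" if distance <= 5 else "substantially different", distance)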
You can also compute feature descriptors using one of the many algorithms available in OpenCV and just compare the vector distances. Consider the images the same if the distance is below some threshold.
You can try GIST, SURF, SIFT, etc. (some are also scale- and rotation-invariant).
If you're working with text only, you could OCR both images and compare the extracted text.
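A minimal sketch of that idea with pytesseract and difflib (the similarity threshold is an assumption):

import difflib
import pytesseract
from PIL import Image

text_a = pytesseract.image_to_string(Image.open("scan_a.png"))
text_b = pytesseract.image_to_string(Image.open("scan_b.png"))

# A ratio close to 1.0 means the extracted texts are nearly identical.
similarity = difflib.SequenceMatcher(None, text_a, text_b).ratio()
print("similarity:", round(similarity, 3))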