Preprocess image in GPUImage before OCR - iOS

I want to preprocess this image on iOS.
What kind of filters can we apply to this kind of image?
I want to remove the double-quote characters before and after the numbers, as well as the last character, as I marked in the boxes.
I have tried whitelisting
tesseract.charWhitelist = @"0123456789";
and
tesseract.charBlacklist = @"\":";
I have used GPUImage lib for preprocessing.
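For reference, the setting behind that wrapper property is Tesseract's tessedit_char_whitelist variable, which works the same in any binding. A minimal sketch with pytesseract (an assumption, since the question uses an iOS wrapper; the file path is a placeholder):

import pytesseract
from PIL import Image

# Restrict recognition to digits so stray quote marks are never emitted.
text = pytesseract.image_to_string(
    Image.open("meter.png"),  # placeholder path
    config="-c tessedit_char_whitelist=0123456789",
)
print(text)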

Related

Template matching for colored image input

I have working code for template matching, but it only works if the input image is converted to grayscale. Is it possible to do template matching that also considers the color of the template that needs to be found in the given image?
inputImg = cv2.imread("location")
template = cv2.imread("location")
Yes, you can do it but why?
The idea of converting to grayscale is to apply the selected edge-detection algorithm to find the features of the input image.
Since you are working on features, the chance of finding the template image in the original image is higher. As a result, converting to grayscale has two advantages: accuracy and computational complexity.
The matchTemplate method also works for RGB images, but then you need to find the image characteristics for three different channels. And you cannot be sure whether your features are robust, since most edge-detection algorithms are designed for grayscale images.
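For reference, a minimal sketch of template matching directly on colour images; matchTemplate accepts multi-channel input and aggregates the score over the channels (the file paths are placeholders):

import cv2

# Load both images in colour (BGR); the paths are placeholders.
inputImg = cv2.imread("scene.png")
template = cv2.imread("template.png")

# Match on all three channels at once.
result = cv2.matchTemplate(inputImg, template, cv2.TM_CCOEFF_NORMED)
minVal, maxVal, minLoc, maxLoc = cv2.minMaxLoc(result)

# Draw a box around the best match.
h, w = template.shape[:2]
cv2.rectangle(inputImg, maxLoc, (maxLoc[0] + w, maxLoc[1] + h), (0, 255, 0), 2)
print("best match score:", maxVal, "at", maxLoc)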

Tesseract / OCR / OpenCV : Need to read captcha

I am trying to read the following captcha images with magick, with no success so far. I am OK to use either magick or OpenCV to solve this captcha.
So far I have tried erode, Gaussian blur, and the paint function, but I am still not getting the whole word before Tesseract can process the image.
I have also tried using the character whitelist of Tesseract, but I guess it needs something before it can even use that whitelist.
The best that I have reached is this image:
Command used: magick.exe c:\e793df3c-b831-11e6-88e4-544635854505.jpg -negate -morphology erode rectangle:1 -negate -threshold 25% -paint 1 c:\ofdbmf-2.jpg
Is it impossible?
For those who are interested:
There are two ways to accomplish it:
Method #1: If you have the captcha source available
If you already have the source available, you can look for the fonts that the source is using.
In this method, since we have the source code, we can try to modify it to save out as many CAPTCHA images as possible (probably more than 10,000), along with the expected answer for each image.
You can use a simple 'for' loop and save all pictures with the correct answer as the filename.
This will be your training data.
Then, from here, split the image into individual letters and reference each back to the letter in the filename; that way you will have many images of the same letter, created at different angles and in different shapes. You can use OpenCV blobs here: threshold the image and then find the contours.
One problem that you might face here is overlapping letters. A simple hack is to say that if a single contour area is a lot wider than it is tall, we probably have two letters squished together. In that case, we can just split the conjoined letters in half down the middle and treat them as two separate letters, as in the sketch below.
Now that we have a way to extract individual letters, you can run it across all the CAPTCHA images. The goal is to collect different variations of each letter. We can save each letter in its own folder to keep things organized.
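A minimal OpenCV sketch of that extraction step (threshold, find contours, split overly wide boxes); the file path and the width/height ratio are assumptions to tune:

import cv2

# Load one CAPTCHA and threshold it (path is a placeholder).
image = cv2.imread("captcha.png", cv2.IMREAD_GRAYSCALE)
_, thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# Find the outer contours (OpenCV 4 return signature).
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

letter_boxes = []
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    if w / h > 1.25:
        # Much wider than tall: probably two squished letters,
        # so split the box in half down the middle.
        half = w // 2
        letter_boxes.append((x, y, half, h))
        letter_boxes.append((x + half, y, half, h))
    else:
        letter_boxes.append((x, y, w, h))

# Sort left-to-right so each crop lines up with the answer in the filename.
letter_boxes.sort(key=lambda box: box[0])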
Finally, you can use a simple convolutional neural network architecture with two convolutional layers and two fully-connected layers.
This way you can get close to a 100% success rate in identifying the captcha letters/numbers.
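A minimal Keras sketch of that architecture (the 28x28 input size and the class count are assumptions):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

num_classes = 36  # assumption: 26 letters + 10 digits

model = Sequential([
    # Two convolutional layers, each followed by pooling...
    Conv2D(20, (5, 5), padding="same", activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(50, (5, 5), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    # ...then two fully-connected layers.
    Flatten(),
    Dense(500, activation="relu"),
    Dense(num_classes, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])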
Method #2: If you don't have the source
Pretty much, you have to do a lot of work now. To start with, make sure you have a background in:
1) Python
2) Keras
3) TensorFlow
4) OpenCV
If you do, then make your first step to download as many captcha images as you can. I usually look at the Network tab in the Google Chrome developer tools, find the path to the captchas, and then put that in a loop to start downloading them.
Then, use OpenCV to distill the images that you have downloaded by creating blobs, thresholding, and finding contours, as in the sketch under Method #1.
Finally comes the training part, and then testing and validation.
For more info: https://mathematica.stackexchange.com/questions/143691/crack-captcha-using-deep-learning

Using Tesseract on MTG cards

My first hurdle so far is that running vanilla Tesseract on images of MTG cards doesn't recognize the card title (honestly, that's all I need, because I can use that text to pull the rest of the card info from a database). I think the issue might be that I need to train Tesseract to recognize the font used in MTG cards, but I'm wondering if it might be an issue with Tesseract not looking for, or not detecting, text in a section of the image (specifically the title).
Edit: including an image of an MTG card for reference: http://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=175263&type=card
OK, so after asking on Reddit programming forums, I think I found an answer that I am going to pursue:
The training feature of Tesseract is indeed for improving recognition rates for unusual fonts, but that's probably not the reason you have low success.
The environment the text is in is not well controlled: the card background can be a texture in one of five colours, plus artifacts and lands. Tesseract grayscales the image before processing, so the contrast between the text and the background is not sufficient.
You could put your cards through a preprocessor which mutes coloured areas to white and enhances monotones. That should increase the contrast so tesseract can make out the characters.
If anyone still following this believes the above path to be the wrong one to start down, please say so.
TLDR
I believe you're on the right track, doing preprocessing.
But you will need to do both preprocessing AND Tesseract training.
Preprocessing
Basically, you want to get the title text, and only the title text, for Tesseract to read. I suggest you follow the steps below:
Identify the borders of the card.
Cut out the title area for further processing.
Convert the image to black and white.
Use contours to identify the exact text area, and crop it out.
Send the image you got to Tesseract.
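A rough OpenCV sketch of steps 3-4, assuming steps 1-2 already produced a crop of the title area (the coordinates and paths are placeholders):

import cv2
import numpy as np

card = cv2.imread("card.png")  # placeholder path

# Steps 1-2 are assumed done; these title-area coordinates are placeholders.
title = card[20:70, 20:300]

# Step 3: convert to black and white.
gray = cv2.cvtColor(title, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# Step 4: use the contours to find the exact text area and crop it out.
contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
x, y, w, h = cv2.boundingRect(np.vstack(contours))

# Step 5: this crop is what you hand to Tesseract.
cv2.imwrite("title_text.png", bw[y:y + h, x:x + w])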
How to create a basic preprocessing pipeline is shown in the YouTube video Automatic MTG card sorting: Part 2 - Automatic perspective correction with OpenCV. Also have a look at the third part in that series.
With this said, there are a number of problems you will encounter. How do you handle split cards? Will your algorithm manage white borders? What if the card is rotated or upside-down? Just to name a few.
Need for Training
But even if you manage to create a perfect preprocessing algorithm, you will still have to train Tesseract. This is due to the special text font used on the cards (which happens to differ depending on the age of the card!).
Consider the card "Kinjalli's Caller".
http://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=435169&type=card
Note how similar the "j" is to the "i". An untrained Tesseract tends to mix them up.
Conclusion
All this considered, my answer to you is that you need to do both preprocessing of the card image AND to train Tesseract.
If you're still interested I would like to suggest that you have a look at this MTG-card reading project on GitHub. That way you don't have to reinvent the wheel.
https://github.com/klanderfri/ReadMagicCard

Using tesseract to recognize license plates

I'm developing an app which can recognize license plates (ANPR). The first step is to extract the license plates from the image. I am using OpenCV to detect the plates based on width/height ratio, and this works pretty well:
But as you can see, the OCR results are pretty bad.
I am using tesseract in my Objective C (iOS) environment. These are my init variables when starting the engine:
// Init the Tesseract engine.
tesseract = new tesseract::TessBaseAPI();
int initRet = tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], [language UTF8String]);
// Restrict output to characters that can appear on a plate.
tesseract->SetVariable("tessedit_char_whitelist", "BCDFGHJKLMNPQRSTVWXYZ0123456789-");
// Penalize dictionary-based corrections and skip the system dictionary,
// since plates are not natural-language words.
tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
tesseract->SetVariable("language_model_penalty_non_dict_word", "1");
tesseract->SetVariable("load_system_dawg", "0");
How can I improve the results? Do I need to let OpenCV do more image manipulation? Or is there something I can improve with tesseract?
Two things will fix this completely:
Remove everything which is not text from the image. You need to use some CV to find the plate area (for example by colour, etc.) and then mask out all of the background. You want the input to Tesseract to be black and white, where text is black and everything else is white.
Remove skew (as mentioned by FrankPI above). Tesseract is actually supposed to work okay with skew (see the "Tesseract OCR Engine" overview by R. Smith), but on the other hand it doesn't always work, especially if you have a single line as opposed to a few paragraphs. So removing skew manually first is always good, if you can do it reliably. You will probably know the exact shape of the bounding trapezoid of the plate from step 1, so this should not be too hard. In the process of removing skew, you can also remove perspective: all license plates (usually) use the same font, and if you scale them to the same (perspective-free) shape the letter shapes will be exactly the same, which helps text recognition.
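A minimal OpenCV sketch of that skew/perspective removal, assuming the four plate corners are already known from step 1 (the coordinates and path are placeholders):

import cv2
import numpy as np

frame = cv2.imread("frame.png")  # placeholder path

# Corners of the plate's bounding trapezoid from step 1 (placeholders),
# ordered top-left, top-right, bottom-right, bottom-left.
corners = np.float32([[110, 200], [420, 190], [430, 280], [115, 295]])

# Map the trapezoid to a fixed, perspective-free rectangle so the
# letter shapes become comparable across plates.
width, height = 520, 110
target = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
matrix = cv2.getPerspectiveTransform(corners, target)
plate = cv2.warpPerspective(frame, matrix, (width, height))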
Some further pointers...
Don't try to code this at first: take a really easy-to-OCR picture of a plate (i.e. from directly in front, no perspective), edit it in Photoshop (or GIMP), and run it through Tesseract on the command line. Keep editing in different ways until this works. For example: select by colour (or flood-select the letter shapes), fill with black, invert the selection, fill with white, perspective-transform so the corners of the plate form a rectangle, etc. Take a bunch of pictures, some harder (maybe from odd angles, etc.). Do this with all of them. Once this works completely, think about how to make a CV algorithm that does the same thing you did in Photoshop :)
P.S. Also, it is better to start with a higher-resolution image if possible. It looks like the text in your example is around 14 pixels tall. Tesseract works pretty well with 12-point text at 300 dpi, which is about 50 pixels tall, and it works much better at 600 dpi. Try to make your letter size at least 50, preferably 100 pixels.
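A one-step sketch of that upscaling (the measured text height and path are placeholders):

import cv2

img = cv2.imread("plate.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

text_height = 14     # measured letter height in the source (placeholder)
target_height = 100  # aim for 50-100 px tall letters
scale = target_height / text_height
img = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)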
P.P.S. Are you doing anything to train Tesseract? I think you have to do that; the font here is different enough to be a problem. You probably also need something to recognize (and not penalize) dashes, which will be very common in your texts; it looks like in the second example "T-" is recognized as H.
I don't know Tesseract very well, but I have some information about OCR. Here we go.
In an OCR task you need to be sure that your training data uses the same font that you are trying to recognize. Or, if you are trying to recognize multiple fonts, be sure that you have those fonts in your training data to get the best performance.
As far as I know, Tesseract applies OCR in a few different ways: either you give it an image which has multiple letters in it and let Tesseract do the segmentation, or you give pre-segmented letters to Tesseract and only expect it to recognize each letter. Maybe you can try to change the one which you are using.
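In Tesseract those two ways correspond to different page segmentation modes; a minimal sketch with pytesseract (an assumption, since the question uses the C++ API; the paths are placeholders):

import pytesseract
from PIL import Image

# Default: let Tesseract segment a multi-letter image by itself.
text = pytesseract.image_to_string(Image.open("plate.png"))

# Or hand over one pre-segmented letter; --psm 10 treats the
# image as a single character.
letter = pytesseract.image_to_string(Image.open("letter.png"), config="--psm 10")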
If you are training the recognizer yourself, be sure that you have enough samples, and an equal number of each letter, in your training data.
Hope this helps.
I've been working on an iOS app; if you need to improve the results you should train Tesseract OCR. This improved results by 90% for me. Before training, the OCR results were pretty bad.
So, I used this gist in the past to train Tesseract OCR with a licence plate font.
If you are interested, I open-sourced this project some weeks ago on GitHub.
Here is my real-world example of trying out OCR on my old power meter. I would like to use your OpenCV code so that OpenCV does the automatic cropping of the image, and I'll do the image-cleaning scripts.
The first image is the original image (cropped power meter numbers).
The second image is a slightly cleaned-up image in GIMP, giving around 50% OCR accuracy in Tesseract.
The third image is the completely cleaned image - 100% recognized by OCR without any training!
Nowadays a license plate can be easily recognized with a Core ML model. I have created the Core ML model; you can find it here. You just need to split the characters into 28×28-resolution images through the Vision framework and send each image to a VNImageRequestHandler, as given below:
let handler = VNImageRequestHandler(cgImage: imageUI.cgImage!, options: [:])
You will get the desired results by using my Core ML model. Use this link for better clarification, but use my model for better results in license plate recognition. I have also created the Core ML model for license plate recognition.

How to train tesseract to recognize small numbers in low DPI?

I get the data from video, so there is no way for me to rescan the image, but I can scale it if necessary.
I only have a limited number of characters, 1234567890:, but I have no control over the DPI of the original image or the font.
I tried to train Tesseract but without any visible effect; the test project is located at https://github.com/ssbarnea/tesseract-sample but the current results are really bad.
Example of original image being captured:
Example of postprocessed image for OCR:
How can I improve the OCR process in this case?
You can try to add some extra space at the edges of the image; sometimes that helps Tesseract, as sketched below. However, open-source OCR engines are very sensitive to the source image DPI.
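A minimal sketch of that padding with OpenCV (the 20 px border size is an assumption to tune; the path is a placeholder):

import cv2

img = cv2.imread("digits.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Add a white border around the image; a little margin often helps Tesseract.
padded = cv2.copyMakeBorder(img, 20, 20, 20, 20, cv2.BORDER_CONSTANT, value=255)
cv2.imwrite("digits_padded.png", padded)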
