I am currently working on an iOS project. The app I am developing is supposed to take a picture of a cheque (check) and read the CMC7 number printed at the bottom of it.
I am currently using OpenCV, because the work done on the project before I arrived was based on it, but:
Is OpenCV better than Tesseract for this kind of job?
The difficulty lies in the font that is used, which is this one:
http://www.dafont.com/fr/cmc7.font
As you can imagine, ordinary OCR can't recognize this font because of its shape. I think the best way to do this job is to use the barcode-like structure of the font to recognize it, rather than the shape of the characters.
The thing is that, from what I know, Tesseract can recognize different kinds of fonts and can be trained on a specific font, but what about this font used for CMC7?
If I want to work on the barcode, is there a way to do it with Tesseract, or can it only be used for font recognition?
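A minimal sketch of the barcode-style idea, assuming the character strip has already been binarized and reduced to one boolean per column (true when the column contains ink). As I understand the CMC7 layout, each character is made of seven vertical strokes separated by six gaps, two of them wide and four narrow, which gives 15 possible patterns for the ten digits and five control symbols. The code only turns gap widths into a 'W'/'N' pattern; the table mapping patterns to characters has to come from the CMC7 specification and is not reproduced here. `findBars` and `gapPattern` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One vertical stroke, as the half-open column range [start, end).
struct Bar {
    int start;
    int end;
};

// Find runs of consecutive ink columns (true = column contains ink).
std::vector<Bar> findBars(const std::vector<bool>& inkColumns) {
    std::vector<Bar> bars;
    const int n = static_cast<int>(inkColumns.size());
    int i = 0;
    while (i < n) {
        if (!inkColumns[i]) { ++i; continue; }
        const int start = i;
        while (i < n && inkColumns[i]) ++i;
        bars.push_back({start, i});
    }
    return bars;
}

// Classify the six gaps between seven consecutive bars (starting at index
// `first`) as 'W' (wide) or 'N' (narrow), using the midpoint between the
// smallest and largest gap as threshold. Returns an empty string unless
// exactly two gaps are wide, the structure assumed for a CMC7 character.
std::string gapPattern(const std::vector<Bar>& bars, std::size_t first) {
    assert(first + 7 <= bars.size());
    std::vector<int> gaps;
    for (std::size_t k = first; k < first + 6; ++k)
        gaps.push_back(bars[k + 1].start - bars[k].end);
    const int lo = *std::min_element(gaps.begin(), gaps.end());
    const int hi = *std::max_element(gaps.begin(), gaps.end());
    const double threshold = (lo + hi) / 2.0;
    std::string pattern;
    int wideCount = 0;
    for (int g : gaps) {
        const bool wide = g > threshold;
        if (wide) ++wideCount;
        pattern += wide ? 'W' : 'N';
    }
    return wideCount == 2 ? pattern : std::string();
}
```

The midpoint threshold is crude and assumes the print scale is roughly constant within one character; it does not address the varying stroke heights raised in the answer below.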
We have the same problem. I don't think it is possible to extract features from CMC7 in a barcode manner, because the strokes have different heights and positions inside the character cell. I'm not familiar with Tesseract, but for any kind of correlator you should choose features that clearly separate the categories while having low variance between samples of the same category. We are thinking of using scale-invariant features such as LBP, HOG or eigenvectors, to avoid losing data after interpolation.
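As an illustration of one of the features mentioned above, here is a minimal sketch of the basic 3x3 Local Binary Pattern operator on a grayscale buffer: each of the 8 neighbours contributes one bit, and the codes are collected into a 256-bin histogram. `Gray` and `lbpHistogram` are hypothetical names, not an OpenCV API; practical descriptors usually compute such histograms over a grid of cells and concatenate them.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Grayscale image stored row-major.
struct Gray {
    int width, height;
    std::vector<std::uint8_t> pixels;
    std::uint8_t at(int x, int y) const { return pixels[y * width + x]; }
};

// For every interior pixel, set bit k when neighbour k is >= the centre,
// then count how often each of the 256 possible codes occurs.
// Border pixels are skipped because they lack a full neighbourhood.
std::array<int, 256> lbpHistogram(const Gray& img) {
    assert(img.pixels.size() == static_cast<std::size_t>(img.width) * img.height);
    static const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
    static const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};
    std::array<int, 256> hist{};
    for (int y = 1; y + 1 < img.height; ++y) {
        for (int x = 1; x + 1 < img.width; ++x) {
            const std::uint8_t centre = img.at(x, y);
            int code = 0;
            for (int k = 0; k < 8; ++k)
                if (img.at(x + dx[k], y + dy[k]) >= centre) code |= 1 << k;
            ++hist[code];
        }
    }
    return hist;
}
```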
Most game anti-cheat systems use heuristic approaches, such as detecting the signatures of known binaries or preventing third-party library injection. Valve, however, uses deep learning to combat cheating: it feeds its model data such as view angles and fire rate, and this works quite well.
My question is: how do I make something like that, but with images instead of data?
Consider this example:
Not cheating:
Cheating:
Is it possible to make a model like that?
Well, images are also just data.
You can break each image down into its pixels, and each pixel into raw numbers, e.g. its RGB values.
This way you could build a network whose inputs are the converted pixel values of your image.
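A minimal sketch of that conversion, assuming an interleaved 8-bit RGB buffer: every channel of every pixel becomes one network input, scaled from [0, 255] to [0, 1]. `RgbImage` and `toInputVector` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleaved 8-bit RGB image (R, G, B, R, G, B, ...), row-major.
struct RgbImage {
    int width, height;
    std::vector<std::uint8_t> data;  // size = width * height * 3
};

// Flatten the image into one float per channel per pixel, scaled to [0, 1].
std::vector<float> toInputVector(const RgbImage& img) {
    assert(img.data.size() == static_cast<std::size_t>(img.width) * img.height * 3);
    std::vector<float> input;
    input.reserve(img.data.size());
    for (std::uint8_t v : img.data) input.push_back(v / 255.0f);
    return input;
}
```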
In this example, the model would probably just learn to recognize the large spikes of those vibrant colors, since those values differ a lot from the usual environment.
If the question aims to achieve some kind of "visual cheat detection" and is not specifically about deep learning, you could simply check the image's pixels manually if you know the color of your "cheat" overlay, or simply detect differences, and flag them that way.
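A minimal sketch of the manual check, assuming you know the overlay's RGB color: count the pixels within a per-channel tolerance of that color and flag the frame when they exceed a chosen fraction of the image. The color, tolerance and fraction are assumptions you would tune on real screenshots; `countOverlayPixels` and `looksLikeCheat` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Count pixels whose color is within `tolerance` of `overlay` on every channel.
int countOverlayPixels(const std::vector<Rgb>& pixels, Rgb overlay, int tolerance) {
    assert(tolerance >= 0);
    int count = 0;
    for (const Rgb& p : pixels) {
        if (std::abs(p.r - overlay.r) <= tolerance &&
            std::abs(p.g - overlay.g) <= tolerance &&
            std::abs(p.b - overlay.b) <= tolerance)
            ++count;
    }
    return count;
}

// Flag a frame when more than `maxFraction` of its pixels match the overlay.
bool looksLikeCheat(const std::vector<Rgb>& pixels, Rgb overlay, int tolerance,
                    double maxFraction) {
    if (pixels.empty()) return false;
    return countOverlayPixels(pixels, overlay, tolerance) > maxFraction * pixels.size();
}
```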
I want to implement an iOS app that uses OCR to compute poker stats (you put your cards on a table, take a picture with your iPhone camera, and then magic happens). I know that OpenCV for iOS is the way to go, but I can't find any code sample that also recognizes the suit (spade, heart, club, diamond) of the cards. How can I do it?
There are different ways of "understanding" the picture, and each has its own pros and cons. Template matching will not be a good idea, because the cards differ: a very round heart and a somewhat sharp, pointy heart are the same suit, but for template matching they would be two totally different "hearts". If you are sure that the user is going to input 2 cards, then you would rather crop the cards and separate them. This can be done with simple color detection (or use the Canny edge detector to detect the edges). Then you want to search for all the suits and find which one gives the best result. You can use the BOW (bag of words) approach (google it a little): it is about building a visual vocabulary, and from the frequency of the visual words alone you should be able to tell which suit is which.
Generally nothing can give you a 100% guarantee, but with BOW you can get some interesting results.
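A minimal sketch of the BOW scoring step, assuming you already have local descriptors for the suit region (from whatever keypoint detector you use) and a vocabulary of cluster centres learned offline, typically with k-means. Each descriptor votes for its nearest visual word, the votes are normalized into a histogram, and the suit whose reference histogram is closest wins. `bowHistogram`, `closestSuit` and the L1 comparison are illustrative choices, not a specific library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// Assign each descriptor to its nearest visual word and return the
// normalized word-frequency histogram.
Vec bowHistogram(const std::vector<Vec>& descriptors, const std::vector<Vec>& vocabulary) {
    assert(!vocabulary.empty());
    Vec hist(vocabulary.size(), 0.0f);
    for (const Vec& desc : descriptors) {
        std::size_t best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t w = 0; w < vocabulary.size(); ++w) {
            const float dist = squaredDistance(desc, vocabulary[w]);
            if (dist < bestDist) { bestDist = dist; best = w; }
        }
        hist[best] += 1.0f;
    }
    if (!descriptors.empty())
        for (float& h : hist) h /= static_cast<float>(descriptors.size());
    return hist;
}

// Index of the suit whose reference histogram is closest in L1 distance.
std::size_t closestSuit(const Vec& hist, const std::vector<Vec>& suitHistograms) {
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t s = 0; s < suitHistograms.size(); ++s) {
        assert(suitHistograms[s].size() == hist.size());
        float dist = 0.0f;
        for (std::size_t i = 0; i < hist.size(); ++i) {
            const float t = hist[i] - suitHistograms[s][i];
            dist += t < 0 ? -t : t;
        }
        if (dist < bestDist) { bestDist = dist; best = s; }
    }
    return best;
}
```

The reference histograms would be built the same way from training crops of each suit, for example by averaging their histograms.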
My first hurdle so far is that running vanilla Tesseract on images of MTG cards doesn't recognize the card title (honestly, that's all I need, because I can use that text to pull the rest of the card info from a database). I think the issue might be that I need to train Tesseract to recognize the font used on MTG cards, but I'm wondering whether it might instead be Tesseract not looking at, or not detecting, the text in a section of the image (specifically the title).
Edit: including an image of an MTG card for reference: http://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=175263&type=card
OK, so after asking on the Reddit programming forums, I think I found an answer that I am going to pursue:
The training feature of Tesseract is indeed for improving recognition rates on unusual fonts, but that's probably not the reason you have low success.
The environment the text is in is not well controlled - the card background can be a texture in one of five colours plus artifacts and lands. Tesseract greyscales the image before processing, so the contrast between the text and background is not sufficient.
You could put your cards through a preprocessor which mutes coloured areas to white and enhances monotones. That should increase the contrast so tesseract can make out the characters.
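A minimal sketch of such a preprocessor, assuming 8-bit RGB input. It approximates how colourful a pixel is as max(r, g, b) - min(r, g, b), turns strongly coloured pixels white, and stretches the contrast of the remaining near-grey ("monotone") pixels around mid-grey. `muteColours`, the chroma threshold and the gain are hypothetical and would need tuning on real card scans.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Returns one grey value per input pixel: 255 for coloured pixels, a
// contrast-stretched grey for near-grey pixels.
std::vector<std::uint8_t> muteColours(const std::vector<Rgb>& pixels, int maxChroma, double gain) {
    assert(maxChroma >= 0 && gain >= 0.0);
    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (const Rgb& p : pixels) {
        const int hi = std::max({p.r, p.g, p.b});
        const int lo = std::min({p.r, p.g, p.b});
        if (hi - lo > maxChroma) {  // coloured background: mute to white
            out.push_back(255);
            continue;
        }
        const int grey = (p.r + p.g + p.b) / 3;
        // Push dark greys darker and light greys lighter, clamped to [0, 255].
        const int stretched = static_cast<int>(128 + (grey - 128) * gain);
        out.push_back(static_cast<std::uint8_t>(std::min(255, std::max(0, stretched))));
    }
    return out;
}
```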
If anyone still following this believes the above path is the wrong one to start down, please say so.
TLDR
I believe you're on the right track, doing preprocessing.
But you will need to do both preprocessing AND training of Tesseract.
Preprocessing
Basically, you want to get the title text, and only the title text, for Tesseract to read. I suggest you follow the steps below:
Identify the borders of the card.
Cut out the title area for further processing.
Convert the image to black'n'white.
Use contours to identify the exact text area, and crop it out.
Send the image you got to Tesseract.
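A minimal sketch of steps 2 to 4 on a grayscale buffer, assuming the card has already been located and perspective-corrected (step 1). The title band's row range is a hypothetical value you would measure on your own scans, Otsu's method is one common way to pick the black-and-white threshold, and a plain bounding box of dark pixels stands in for the contour step. None of these names are OpenCV's API.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

struct Gray {
    int width, height;
    std::vector<std::uint8_t> pixels;  // row-major, 0 = black, 255 = white
};

// Step 2: keep rows [top, bottom) of the corrected card (hypothetical band).
Gray cropRows(const Gray& img, int top, int bottom) {
    assert(0 <= top && top < bottom && bottom <= img.height);
    Gray out{img.width, bottom - top, {}};
    out.pixels.assign(img.pixels.begin() + top * img.width,
                      img.pixels.begin() + bottom * img.width);
    return out;
}

// Step 3: Otsu's method picks the grey level that maximises the
// between-class variance of the histogram.
int otsuThreshold(const Gray& img) {
    std::array<long long, 256> hist{};
    for (std::uint8_t v : img.pixels) ++hist[v];
    const long long total = static_cast<long long>(img.pixels.size());
    double sumAll = 0.0;
    for (int t = 0; t < 256; ++t) sumAll += t * static_cast<double>(hist[t]);
    double sumDark = 0.0, bestVariance = -1.0;
    long long weightDark = 0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        weightDark += hist[t];
        sumDark += t * static_cast<double>(hist[t]);
        if (weightDark == 0) continue;
        const long long weightLight = total - weightDark;
        if (weightLight == 0) break;
        const double diff = sumDark / weightDark - (sumAll - sumDark) / weightLight;
        const double variance = static_cast<double>(weightDark) * weightLight * diff * diff;
        if (variance > bestVariance) { bestVariance = variance; best = t; }
    }
    return best;
}

// Pixels above the threshold become white (255), the rest black (0).
Gray binarize(const Gray& img, int threshold) {
    Gray out = img;
    for (std::uint8_t& v : out.pixels) v = v > threshold ? 255 : 0;
    return out;
}

struct Box { int x0, y0, x1, y1; bool empty; };

// Step 4 (simplified): inclusive bounding box of all black pixels.
Box inkBoundingBox(const Gray& bw) {
    Box box{bw.width, bw.height, -1, -1, true};
    for (int y = 0; y < bw.height; ++y)
        for (int x = 0; x < bw.width; ++x)
            if (bw.pixels[y * bw.width + x] == 0) {
                box.x0 = std::min(box.x0, x);
                box.y0 = std::min(box.y0, y);
                box.x1 = std::max(box.x1, x);
                box.y1 = std::max(box.y1, y);
                box.empty = false;
            }
    return box;
}
```

In OpenCV itself, the corresponding calls would be `cv::threshold` with the `cv::THRESH_OTSU` flag, followed by `cv::findContours` and `cv::boundingRect`.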
How to create basic preprocessing is shown in the YouTube video "Automatic MTG card sorting: Part 2 - Automatic perspective correction with OpenCV". Also have a look at the third part of that series.
That said, there are a number of problems you will encounter, to name just a few: How do you handle split cards? Will your algorithm manage white borders? What if the card is rotated or upside down?
Need for Training
But even if you manage to create a perfect preprocessing algorithm, you will still have to train Tesseract. This is due to the special fonts used on the cards (which, as it happens, differ depending on the age of the card!).
Consider the card "Kinjalli's Caller".
http://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=435169&type=card
Note how similar the "j" is to the "i". An untrained Tesseract tends to mix them up.
Conclusion
All this considered, my answer to you is that you need to do both preprocessing of the card image AND to train Tesseract.
If you're still interested, I suggest you have a look at this MTG card reading project on GitHub. That way you don't have to reinvent the wheel.
https://github.com/klanderfri/ReadMagicCard
I'm developing an app that can recognize license plates (ANPR). The first step is to extract the license plates from the image. I am using OpenCV to detect the plates based on their width/height ratio, and this works pretty well:
But as you can see, the OCR results are pretty bad.
I am using Tesseract in my Objective-C (iOS) environment. These are my init variables when starting the engine:
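// Initialize the Tesseract engine (Objective-C++, e.g. in a .mm file).
tesseract = new tesseract::TessBaseAPI();
int initRet = tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], [language UTF8String]);
if (initRet != 0) {  // Init returns 0 on success
    NSLog(@"Tesseract initialization failed (code %d)", initRet);
}
// Only allow characters that can appear on these plates.
tesseract->SetVariable("tessedit_char_whitelist", "BCDFGHJKLMNPQRSTVWXYZ0123456789-");
// Language-model penalties for words missing from the (frequent-word) dictionaries.
tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
// The variable name must not contain a trailing space, or SetVariable will not find it.
tesseract->SetVariable("language_model_penalty_non_dict_word", "1");
// Do not load the system word dictionary.
tesseract->SetVariable("load_system_dawg", "0");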
How can I improve the results? Do I need to let OpenCV do more image manipulation? Or is there something I can improve with tesseract?
Two things will fix this completely:
Remove everything that is not text from the image. You need to use some CV to find the plate area (for example by color, etc.) and then mask out all of the background. You want the input to Tesseract to be black and white, where the text is black and everything else is white.
Remove skew (as mentioned by FrankPI above). Tesseract is actually supposed to cope with skew (see the "Tesseract OCR Engine" overview by R. Smith), but it doesn't always work, especially when you have a single line as opposed to a few paragraphs. So removing skew manually first is always good, if you can do it reliably. You will probably know the exact shape of the plate's bounding trapezoid from step 1, so this should not be too hard. While removing skew, you can also remove perspective: license plates (usually) all use the same font, so if you scale them to the same perspective-free shape, the letter shapes will be essentially identical, which helps text recognition.
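When the plate corners are known, a perspective warp from them is the direct route (in OpenCV, `cv::getPerspectiveTransform` followed by `cv::warpPerspective`). When they are not, one common alternative, not the method described above, is a projection-profile search: try candidate angles, project the ink pixels onto the rotated vertical axis, and keep the angle whose row histogram is most sharply peaked. A minimal sketch, assuming you already have the coordinates of the black text pixels; `estimateSkewDegrees`, the angle range and the step are hypothetical choices.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

struct Point { int x, y; };

// Returns the candidate angle (in degrees) at which the ink collapses into
// the fewest, fullest rows. Rotating the image by the negative of this
// angle straightens the text line.
double estimateSkewDegrees(const std::vector<Point>& ink, double maxDeg, double stepDeg) {
    assert(maxDeg >= 0.0 && stepDeg > 0.0);
    const double kPi = 3.14159265358979323846;
    const int steps = static_cast<int>(std::lround(2.0 * maxDeg / stepDeg));
    double bestAngle = 0.0;
    long long bestScore = -1;
    for (int i = 0; i <= steps; ++i) {
        const double deg = -maxDeg + i * stepDeg;
        const double a = deg * kPi / 180.0;
        // Row coordinate of each ink pixel after rotating by -a.
        std::map<long, long long> rows;
        for (const Point& p : ink)
            ++rows[std::lround(p.y * std::cos(a) - p.x * std::sin(a))];
        // Sum of squared row counts: largest when ink sits in few rows.
        long long score = 0;
        for (const auto& kv : rows) score += kv.second * kv.second;
        if (score > bestScore) { bestScore = score; bestAngle = deg; }
    }
    return bestAngle;
}
```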
Some further pointers...
Don't try to code this at first: take a picture of a plate that is really easy to OCR (i.e. shot from directly in front, with no perspective), edit it in Photoshop (or GIMP), and run it through Tesseract on the command line. Keep editing in different ways until this works. For example: select by color (or flood-select the letter shapes), fill with black, invert the selection, fill with white, apply a perspective transform so the corners of the plate form a rectangle, etc. Take a bunch of pictures, some harder (maybe from odd angles, etc.), and do this with all of them. Once this works completely, think about how to make a CV algorithm that does the same thing you did in Photoshop :)
P.S. It is also better to start with a higher-resolution image if possible. The text in your example looks to be around 14 pixels tall. Tesseract works pretty well with 12-point text at 300 dpi, which is about 50 pixels tall, and it works much better at 600 dpi. Try to make your letters at least 50, preferably 100, pixels tall.
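The numbers above follow from 1 point = 1/72 inch: 12 pt at 300 dpi is 12 × 300 / 72 = 50 px, and at 600 dpi it is 100 px. A small helper for the same arithmetic (hypothetical function names), which also gives the scale factor needed to bring 14 px text up to a 50 px target:

```cpp
#include <cassert>

// Height in pixels of text set at `points` and captured at `dpi`
// (1 point = 1/72 inch).
double pointsToPixels(double points, double dpi) {
    return points * dpi / 72.0;
}

// Factor by which to scale an image so that text currently `currentPx`
// tall becomes `targetPx` tall.
double upscaleFactor(double currentPx, double targetPx) {
    assert(currentPx > 0.0);
    return targetPx / currentPx;
}
```

For 14 px text and a 50 px target the factor is about 3.6. Upscaling a small crop is only a fallback, though: interpolation cannot recover detail the camera did not capture, which is why starting from a higher-resolution image is preferable.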
P.P.S. Are you doing anything to train Tesseract? I think you have to, as the font here is different enough to be a problem. You probably also need something to recognize (and not penalize) dashes, which will be very common in your texts; it looks like in the second example "T-" is recognized as "H".
I don't know Tesseract very well, but I know a few things about OCR in general. Here we go.
In an OCR task, you need to be sure that your training data has the same font you are trying to recognize. If you are trying to recognize multiple fonts, make sure all of them are in your training data to get the best performance.
As far as I know, Tesseract can apply OCR in a few different ways. In one, you give it an image containing multiple letters and let Tesseract do the segmentation. In another, you give Tesseract already-segmented letters and only expect it to recognize each letter. Maybe you can try switching the one you are using.
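If you want to try the second mode, the letters have to be segmented first. A minimal sketch, assuming a clean binarized single-line plate image: cut wherever a column contains no ink (a vertical projection profile). Touching or broken characters defeat this simple rule. `segmentColumns` is a hypothetical name; each resulting crop could then be passed to Tesseract in single-character page segmentation mode (`--psm 10` on the command line, `PSM_SINGLE_CHAR` in the C++ API).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Binary image: 1 = ink, 0 = background, row-major.
struct Binary {
    int width, height;
    std::vector<std::uint8_t> ink;
};

// Returns the [start, end) column range of each run of inked columns.
std::vector<std::pair<int, int>> segmentColumns(const Binary& img) {
    assert(img.ink.size() == static_cast<std::size_t>(img.width) * img.height);
    std::vector<std::pair<int, int>> spans;
    int start = -1;
    // Iterate one column past the end so a run touching the right edge is closed.
    for (int x = 0; x <= img.width; ++x) {
        bool hasInk = false;
        if (x < img.width)
            for (int y = 0; y < img.height && !hasInk; ++y)
                hasInk = img.ink[y * img.width + x] != 0;
        if (hasInk && start < 0) start = x;
        if (!hasInk && start >= 0) {
            spans.emplace_back(start, x);
            start = -1;
        }
    }
    return spans;
}
```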
If you are training the recognizer yourself, make sure you have enough samples, and roughly equal numbers, of each letter in your training data.
Hope this helps.
I've been working on an iOS app. If you need to improve the results, you should train Tesseract OCR; this gave me a 90% improvement. Before training, the OCR results were pretty bad.
So, I used this gist in the past to train Tesseract OCR with a license plate font.
If you are interested, I open-sourced this project on GitHub a few weeks ago.
Here is my real-world example of trying out OCR on my old power meter. I would like to use your OpenCV code so that OpenCV crops the image automatically, and I'll write the image-cleaning scripts.
The first image is the original image (cropped power meter numbers).
The second image is slightly cleaned up in GIMP: around 50% OCR accuracy in Tesseract.
The third image is completely cleaned: 100% recognized by OCR without any training!
License plates can now be easily recognized with a Core ML model (.mlmodel). I have created the model; you can find it here. You just need to split the characters into 28x28 images using the Vision framework and send each image to VNImageRequestHandler, as shown below:
let handler = VNImageRequestHandler(cgImage: imageUI.cgImage!, options: [:])
You will get the desired results using my Core ML model. Use this link for further clarification, but use my model for better results in license plate recognition.
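The answer does not say exactly how the characters should be prepared before they reach the model. A minimal sketch of one plausible step, assuming each character has already been cropped to a grayscale buffer: pad it to a square so its aspect ratio survives, then resize it to 28x28 with nearest-neighbour sampling. `padToSquare`, `resizeNearest` and the background value are hypothetical; check the model's expected input (size, color, inversion) before relying on this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Gray {
    int width, height;
    std::vector<std::uint8_t> pixels;  // row-major
};

// Centre the character on a square canvas filled with `background`.
Gray padToSquare(const Gray& src, std::uint8_t background) {
    const int side = std::max(src.width, src.height);
    Gray dst{side, side, std::vector<std::uint8_t>(static_cast<std::size_t>(side) * side, background)};
    const int ox = (side - src.width) / 2;
    const int oy = (side - src.height) / 2;
    for (int y = 0; y < src.height; ++y)
        for (int x = 0; x < src.width; ++x)
            dst.pixels[(y + oy) * side + (x + ox)] = src.pixels[y * src.width + x];
    return dst;
}

// Nearest-neighbour resize to outW x outH (e.g. 28 x 28).
Gray resizeNearest(const Gray& src, int outW, int outH) {
    assert(src.width > 0 && src.height > 0 && outW > 0 && outH > 0);
    Gray dst{outW, outH, std::vector<std::uint8_t>(static_cast<std::size_t>(outW) * outH)};
    for (int y = 0; y < outH; ++y) {
        const int sy = y * src.height / outH;
        for (int x = 0; x < outW; ++x) {
            const int sx = x * src.width / outW;
            dst.pixels[y * outW + x] = src.pixels[sy * src.width + sx];
        }
    }
    return dst;
}
```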
I am building an iOS app that incorporates image matching as a key feature. The problem is that the images I need to recognize are small 10x10 orienteering plaques with simple, large text on them. They can be quite reflective and will be outdoors (so the lighting conditions will vary). Sample image:
There will be up to 15 of these images in the pool, and all I really need to detect is the text, in order to log where the user has been.
The problem I am facing is that the image-matching software I have tried (Aurasma and, slightly more successfully, arlabs) can't distinguish between them, as it is primarily built to work with detailed images.
I need to accurately detect which plaque is being scanned. I have considered using GPS to refine the selection, but the only reliable way I have found is to get the user to enter the text manually. One of the key attractions we have built the product around is being able to detect these plaques, which are already in place, without having to set up any additional material.
Can anyone suggest a piece of software that would work (and is iOS-friendly), or a method of detection that would be effective and interactive/pleasing for the user?
Sample environment:
http://www.orienteeringcoach.com/wp-content/uploads/2012/08/startfinishscp.jpeg
The environment can vary substantially: the plaques are placed basically anywhere a plaque could be positioned (fences, walls and posts, in either wooded or open areas), but overwhelmingly outdoors.
I'm not an iOS programmer, but I will try to answer from an algorithmic point of view. Essentially, you have a detection problem ("Where is the plaque?") and a classification problem ("Which one is it?"). Asking the user to keep the plaque in a predefined region is certainly a good idea. This solves the detection problem, which is often harder to solve with limited resources than the classification problem.
For classification, I see two alternatives:
The classic "computer vision" route would be feature extraction and classification. Local Binary Patterns and HOG are feature extractors known to be fast enough for mobile (the former more so than the latter), and they are not too complicated to implement. Classifiers, however, are non-trivial, and you would probably have to search for an appropriate iOS library.
Alternatively, you could try to binarize the image, i.e. classify pixels as "plate" / white or "text" / black. Then you can use an error-tolerant similarity measure for comparing your binarized image with a binarized reference image of the plaque. The chamfer distance measure is a good candidate. It essentially boils down to comparing the distance transforms of your two binarized images. This is more tolerant to misalignment than comparing binary images directly. The distance transforms of the reference images can be pre-computed and stored on the device.
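A minimal sketch of the chamfer comparison, assuming both images are already binarized, aligned to the same predefined region and resized to the same dimensions. It uses the classic two-pass 3-4 chamfer approximation of the Euclidean distance transform, then averages the reference's distance values at the query's text pixels; the plaque with the lowest score wins. `chamferTransform` and `chamferScore` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct Binary {
    int width, height;
    std::vector<std::uint8_t> text;  // 1 = text pixel, 0 = plate background
};

// Distance from every pixel to the nearest text pixel, in units of 1/3 px
// (3 per horizontal/vertical step, 4 per diagonal step).
std::vector<int> chamferTransform(const Binary& img) {
    const int W = img.width, H = img.height;
    const int kFar = 1 << 28;  // "infinity" that cannot overflow when 4 is added
    std::vector<int> d(img.text.size());
    for (std::size_t i = 0; i < d.size(); ++i) d[i] = img.text[i] ? 0 : kFar;
    auto at = [&](int x, int y) -> int& { return d[static_cast<std::size_t>(y) * W + x]; };
    // Forward pass: propagate from the left and from the row above.
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            int& v = at(x, y);
            if (x > 0) v = std::min(v, at(x - 1, y) + 3);
            if (y > 0) {
                v = std::min(v, at(x, y - 1) + 3);
                if (x > 0) v = std::min(v, at(x - 1, y - 1) + 4);
                if (x + 1 < W) v = std::min(v, at(x + 1, y - 1) + 4);
            }
        }
    // Backward pass: propagate from the right and from the row below.
    for (int y = H - 1; y >= 0; --y)
        for (int x = W - 1; x >= 0; --x) {
            int& v = at(x, y);
            if (x + 1 < W) v = std::min(v, at(x + 1, y) + 3);
            if (y + 1 < H) {
                v = std::min(v, at(x, y + 1) + 3);
                if (x + 1 < W) v = std::min(v, at(x + 1, y + 1) + 4);
                if (x > 0) v = std::min(v, at(x - 1, y + 1) + 4);
            }
        }
    return d;
}

// Mean distance (in pixels) from the query's text pixels to the reference
// text. `refTransform` can be precomputed once per plaque.
double chamferScore(const Binary& query, const std::vector<int>& refTransform) {
    assert(refTransform.size() == query.text.size());
    long long sum = 0, n = 0;
    for (std::size_t i = 0; i < query.text.size(); ++i)
        if (query.text[i]) { sum += refTransform[i]; ++n; }
    if (n == 0) return std::numeric_limits<double>::infinity();
    return (sum / 3.0) / n;
}
```

This score is one-directional: it penalizes query text that is absent from the reference, but not reference text missing from the query, so averaging the scores in both directions is a common refinement.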
Personally, I would try the second approach. A (non-mobile) prototype of the second approach is relatively easy to code and evaluate with a good image processing library (OpenCV, Matlab + Image Processing Toolbox, Python, etc).
I managed to find a solution that is working quite well. It's not fully optimized yet, but I think that's just a matter of tweaking filters, as I'll explain later on.
Initially I tried to set up OpenCV, but it was very time-consuming, with a steep learning curve; it did, however, give me an idea. The key to my problem is really detecting the characters within the image and ignoring the background, which was basically just noise. OCR was designed exactly for this purpose.
I found the free library Tesseract (https://github.com/ldiqual/tesseract-ios-lib) easy to use, with plenty of customizability. At first the results were very random, but applying a sharpening filter, a monochrome filter and a color inversion worked well to clean up the text. Next, I marked out a target area on the UI and used it to cut out the rectangle of the image to process. Processing is slow on large images, and this cut the time dramatically. The OCR filter allowed me to restrict the allowable characters, and as the plaques follow a standard configuration, this narrowed down the possible results and improved accuracy.
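Since there are at most 15 known plaques, one further step that may help (an addition, not part of the solution described above) is to snap each noisy OCR result to the nearest known plaque code by edit distance, and to reject results that are too far from all of them. A minimal sketch; `closestPlaque`, the plaque codes and the `maxEdits` limit are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Levenshtein edit distance (insertions, deletions, substitutions).
int editDistance(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int substitute = prev[j - 1] + (a[i - 1] != b[j - 1]);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, substitute});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Closest known plaque code, or an empty string if even the best match
// needs more than `maxEdits` edits.
std::string closestPlaque(const std::string& ocr, const std::vector<std::string>& plaques,
                          int maxEdits) {
    assert(maxEdits >= 0);
    std::string best;
    int bestDist = maxEdits + 1;
    for (const std::string& p : plaques) {
        const int d = editDistance(ocr, p);
        if (d < bestDist) { bestDist = d; best = p; }
    }
    return best;
}
```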
So far it's been successful with the grey-background plaques, but I haven't found the right filter for the red and white editions. My goal is to add color detection and remove the need to feed in the data type.